ECCV Caption: Correcting False Negatives by Collecting Machine-and-Human-Verified Image-Caption Associations for MS-COCO.- MOTCOM: The Multi-Object Tracking Dataset Complexity Metric.- How to Synthesize a Large-Scale and Trainable Micro-Expression Dataset?.- A Real World Dataset for Multi-View 3D Reconstruction.- REALY: Rethinking the Evaluation of 3D Face Reconstruction.- Capturing, Reconstructing, and Simulating: The UrbanScene3D Dataset.- 3D CoMPaT: Composition of Materials on Parts of 3D Things.- PartImageNet: A Large, High-Quality Dataset of Parts.- A-OKVQA: A Benchmark for Visual Question Answering Using World Knowledge.- OOD-CV: A Benchmark for Robustness to Out-of-Distribution Shifts of Individual Nuisances in Natural Images.- Facial Depth and Normal Estimation Using Single Dual-Pixel Camera.- The Anatomy of Video Editing: A Dataset and Benchmark Suite for AI-Assisted Video Editing.- StyleBabel: Artistic Style Tagging and Captioning.- PANDORA: A Panoramic Detection Dataset for Object with Orientation.- FS-COCO: Towards Understanding of Freehand Sketches of Common Objects in Context.- Exploring Fine-Grained Audiovisual Categorization with the SSW60 Dataset.- The Caltech Fish Counting Dataset: A Benchmark for Multiple-Object Tracking and Counting.- A Dataset for Interactive Vision-Language Navigation with Unknown Command Feasibility.- BRACE: The Breakdancing Competition Dataset for Dance Motion Synthesis.- Dress Code: High-Resolution Multi-Category Virtual Try-On.- A Data-Centric Approach for Improving Ambiguous Labels with Combined Semi-Supervised Classification and Clustering.- ClearPose: Large-Scale Transparent Object Dataset and Benchmark.- When Deep Classifiers Agree: Analyzing Correlations between Learning Order and Image Statistics.- AnimeCeleb: Large-Scale Animation CelebHeads Dataset for Head Reenactment.- MUGEN: A Playground for Video-Audio-Text Multimodal
Understanding and GENeration.- A Dense Material Segmentation Dataset for Indoor and Outdoor Scene Parsing.- MimicME: A Large Scale Diverse 4D Database for Facial Expression Analysis.- Delving into Universal Lesion Segmentation: Method, Dataset, and
Benchmark.- Large Scale Real-World Multi-person Tracking.- D2-TPred: Discontinuous Dependency for Trajectory Prediction under Traffic Lights.- The Missing Link: Finding Label Relations across Datasets.- Learning Omnidirectional Flow in 360° Video via Siamese Representation.- VizWiz-FewShot: Locating Objects in Images Taken by People with
Visual Impairments.- TRoVE: Transforming Road Scene Datasets into Photorealistic Virtual
Environments.- Trapped in Texture Bias? A Large Scale Comparison of Deep Instance
Segmentation.- Deformable Feature Aggregation for Dynamic Multi-modal 3D Object
Detection.- WeLSA: Learning to Predict 6D Pose from Weakly Labeled Data Using
Shape Alignment.- Graph R-CNN: Towards Accurate 3D Object Detection with
Semantic-Decorated Local Graph.- MPPNet: Multi-Frame Feature Intertwining with Proxy Points for 3D Temporal Object Detection.- Long-Tail Detection with Effective Class-Margins.- Semi-Supervised Monocular 3D Object Detection by Multi-View Consistency.- PTSEFormer: Progressive Temporal-Spatial Enhanced TransFormer towards Video Object Detection.