FMM: Special Session on Foundation Models for Multimedia.- Removing Stray-Light for Wild-Field Fundus Image Fusion based on Large Generative Models.- Training-free Region Prediction with Stable Diffusion.- Mitigating Fine-Grained Hallucination by Fine-Tuning Large Vision-Language Models with Caption Rewrites.- GDTNet: A Synergistic Dilated Transformer and CNN by Gate Attention for Abdominal Multi-organ Segmentation.- Fine-Grained Multi-Modal Fundus Image Generation Based on
Diffusion Models for Glaucoma Classification.- Adapting Pretrained Large-Scale Vision Models for Face Forgery Detection.- ICDAR: Special Session on Intelligent Cross-Data Analysis and Retrieval.- Towards Cross-modal Point Cloud Retrieval for Indoor Scenes.- Correlation visualization under missing values: a comparison between
imputation and direct parameter estimation methods.- IFI: Interpreting for Improving: a Multimodal Transformer with an Interpretability Technique for Recognition of Risk Events.- OOKPIK - A Collection of Out-of-Context Image-Caption Pairs.- LUMOS-DM: Landscape-based Multimodal Scene Retrieval Enhanced by Diffusion Model.- XR-MACCI: Special Session on eXtended Reality and Multimedia - Advancing Content Creation and Interaction.- Mining Landmark Images for Scene Reconstruction from Weakly
Annotated Video Collections.- A framework for 3D modeling of construction sites using aerial imagery and semantic NeRFs.- Multimodal 3D Object Retrieval.- An Integrated System for Spatio-Temporal Summarization of 360-degrees Videos.- Brave New Ideas.- Mutant Texts: A Technique for Uncovering Unexpected Inconsistencies in Large-Scale Vision-language Models.- Exploring Artificial Intelligence for Advancing Performance Processes and Events in Io3MT.- Demonstrations.- Implementation of Melody Slot Machines.- E2Evideo: End to End Video and Image Pre-processing and Analysis Tool.- Augmented Reality Photo Presentation and Content-based Image
Retrieval on Mobile Devices with AR-Explorer.- AI-Based Cropping of Soccer Videos for Different Social Media Representations.- Few-shot Object Detection as a Service: Facilitating Training and Deployment for Domain Experts.- DatAR: Supporting Neuroscience Literature Exploration by Finding
Relations between Topics in Augmented Reality.- EmoAda: A Multimodal Emotion Interaction and Psychological Adaptation System.- Video Browser Showdown.- Waseda Meisei SoftBank at Video Browser Showdown 2024.- Exploring Multimedia Vector Spaces with vitrivr-VR.- A new Retrieval Engine for vitrivr.- VISIONE 5.0: Enhanced User Interface and AI Models for VBS2024.- PraK Tool: An Interactive Search Tool Based on Video Data Services.- Exquisitor at the Video Browser Showdown 2024: Relevance Feedback Meets Conversational Search.- VERGE in VBS 2024.- Optimizing the Interactive Video Retrieval Tool Vibro for the Video Browser Showdown 2024.- diveXplore at the Video Browser Showdown 2024.- Leveraging LLMs and Generative Models for Interactive Known-Item Video Search.- TalkSee: Interactive Video Retrieval Engine Using Large Language Model.- VideoCLIP 2: An Interactive CLIP-based Video Retrieval System for Novice Users at VBS2024.- ViewsInsight: Enhancing Video Retrieval for VBS 2024 with a User-Friendly Interaction Mechanism.