MULA21 * *Multimodal Learning and Applications Workshop
* 3D Hand Pose Estimation via Aligned Latent Space Injection and Kinematic Losses
* Adaptive Intermediate Representations for Video Understanding
* APES: Audiovisual Person Search in Untrimmed Video
* Beyond VQA: Generating Multi-word Answers and Rationales to Visual Questions
* Cross-modal Speaker Verification and Recognition: A Multilingual Perspective
* Dealing with Missing Modalities in the Visual Question Answer-Difference Prediction Task through Knowledge Distillation
* Editing like Humans: A Contextual, Multimodal Framework for Automated Video Editing
* Exploring the Limits of Zero-Shot Learning: How Low Can You Go?
* Improved Attention for Visual Question Answering, An
* Practical Cross-modal Manifold Alignment for Robotic Grounded Language Learning
* Private-Shared Disentangled Multimodal VAE for Learning of Latent Representations
* Progressive Knowledge-Embedded Unified Perceptual Parsing for Scene Understanding
* Radar Camera Fusion via Representation Learning in Autonomous Driving
* Self-supervised Feature Learning by Cross-modality and Cross-view Correspondences
* Target-Tailored Source-Transformation for Scene Graph Generation
* Using Text to Teach Image Retrieval
