Journals starting with MULA

MULA21 * *Multimodal Learning and Applications Workshop
* 3D Hand Pose Estimation via Aligned Latent Space Injection and Kinematic Losses
* Adaptive Intermediate Representations for Video Understanding
* APES: Audiovisual Person Search in Untrimmed Video
* Beyond VQA: Generating Multi-word Answers and Rationales to Visual Questions
* Cross-modal Speaker Verification and Recognition: A Multilingual Perspective
* Dealing with Missing Modalities in the Visual Question Answer-Difference Prediction Task through Knowledge Distillation
* Editing like Humans: A Contextual, Multimodal Framework for Automated Video Editing
* Exploring the Limits of Zero-Shot Learning: How Low Can You Go?
* Improved Attention for Visual Question Answering, An
* Practical Cross-modal Manifold Alignment for Robotic Grounded Language Learning
* Private-Shared Disentangled Multimodal VAE for Learning of Latent Representations
* Progressive Knowledge-Embedded Unified Perceptual Parsing for Scene Understanding
* Radar Camera Fusion via Representation Learning in Autonomous Driving
* Self-supervised Feature Learning by Cross-modality and Cross-view Correspondences
* Target-Tailored Source-Transformation for Scene Graph Generation
* Using Text to Teach Image Retrieval
17 for MULA21

MULA22 * *Multimodal Learning and Applications Workshop
* Cascaded Siamese Self-supervised Audio to Video GAN
* Coarse-to-Fine Reasoning for Visual Question Answering
* Coupling Vision and Proprioception for Navigation of Legged Robots
* Doubling Down: Sparse Grounding with an Additional, Almost-Matching Caption for Detection-Oriented Multimodal Pretraining
* Emphasizing Complementary Samples for Non-literal Cross-modal Retrieval
* Guiding Attention Using Partial-Order Relationships for Image Captioning
* Improving Multimodal Speech Recognition by Data Augmentation and Speech Representations
* Learning to Ask Informative Sub-Questions for Visual Question Answering
* M2FNet: Multi-modal Fusion Network for Emotion Recognition in Conversation
* Modulating Bottom-Up and Top-Down Visual Processing via Language-Conditional Filters
* Probabilistic Compositional Embeddings for Multimodal Image Retrieval
* Reasoning with Multi-Structure Commonsense Knowledge in Visual Dialog
* Semantically Grounded Visual Embeddings for Zero-Shot Learning
* Transformer Decoders with Multi-Modal Regularization for Cross-Modal Food Retrieval
* Unreasonable Effectiveness of CLIP Features for Image Captioning: An Experimental Analysis, The
16 for MULA22

MULA23 * *Multimodal Learning and Applications Workshop
* Adapting Grounded Visual Question Answering Models to Low Resource Languages
* Dynamic Multimodal Fusion
* Exposing and Mitigating Spurious Correlations for Cross-Modal Retrieval
* MONET Dataset: Multimodal Drone Thermal Dataset Recorded in Rural Scenarios, The
* Multi-Event Localization by Audio-Visual Fusion with Omnidirectional Camera and Microphone Array
* Robust Multiview Multimodal Driver Monitoring System Using Masked Multi-Head Self-Attention
* SEM-POS: Grammatically and Semantically Correct Video Captioning
* SSGVS: Semantic Scene Graph-to-Video Synthesis
* TFRGAN: Leveraging Text Information for Blind Face Restoration with Extreme Degradation
10 for MULA23
