Reasoning24
* Multimodal Algorithmic Reasoning Workshop
* Multi-Explainable TemporalNet: An Interpretable Multimodal Approach using Temporal Convolutional Network for User-level Depression Detection
* Task Navigator: Decomposing Complex Tasks for Multimodal Large Language Models
* Using Language-Aligned Gesture Embeddings for Understanding Gestures Accompanying Math Terms
* ViTA: An Efficient Video-to-Text Algorithm using VLM for RAG-based Video Analysis System
* What does CLIP know about peeling a banana?
Reasoning25
* Multimodal Algorithmic Reasoning Workshop
* Autonomous Multimodal Reasoning via Implicit Chain-of-Vision
* Comparison Visual Instruction Tuning
* Exemplar Masking for Multimodal Incremental Learning
* Illusory VQA: Benchmarking and Enhancing Multimodal Models on Visual Illusions
* SilVar-Med: A Speech-Driven Visual Language Model for Explainable Abnormality Detection in Medical Imaging