Reasoning24
* Multimodal Algorithmic Reasoning Workshop
* Multi-Explainable TemporalNet: An Interpretable Multimodal Approach using Temporal Convolutional Network for User-level Depression Detection
* Task Navigator: Decomposing Complex Tasks for Multimodal Large Language Models
* Using Language-Aligned Gesture Embeddings for Understanding Gestures Accompanying Math Terms
* ViTA: An Efficient Video-to-Text Algorithm using VLM for RAG-based Video Analysis System
* What does CLIP know about peeling a banana?