NFVLR23
* Workshop and Challenges for New Frontiers in Visual Language Reasoning: Compositionality, Prompts and Causality
* Abstract Visual Reasoning Enabled by Language
* Causalainer: Causal Explainer for Automatic Video Summarization
* Is Multimodal Vision Supervision Beneficial to Language?
* Learning CLIP Guided Visual-Text Fusion Transformer for Video-based Pedestrian Attribute Recognition