ODRUM22: Open-Domain Retrieval Under Multi-Modal Settings
* Conditioned and Composed Image Retrieval Combining and Partially Fine-Tuning CLIP-Based Features
* Cross-modal Target Retrieval for Tracking by Natural Language
* Deep Image Retrieval is not Robust to Label Noise
* Deep Normalized Cross-Modal Hashing with Bi-Direction Relation Reasoning
* Embedding Arithmetic of Multimodal Queries for Image Retrieval
* Good, Better, Best: Textual Distractors Generation for Multiple-Choice Visual Question Answering via Reinforcement Learning
* Object Prior Embedded Network for Query-Agnostic Image Retrieval
