Visual7W visual question answering,
Large-scale visual question answering (QA) dataset, with object-level
groundings and multimodal answers.
WWW Link.
Dataset, Visual Question Answering.
Liang, J.W.[Jun-Wei],
Jiang, L.[Lu],
Cao, L.L.[Liang-Liang],
Kalantidis, Y.[Yannis],
Li, L.J.[Li-Jia],
Hauptmann, A.G.[Alexander G.],
Focal Visual-Text Attention for Memex Question Answering,
PAMI(41), No. 8, August 2019, pp. 1893-1908.
IEEE DOI
1907
BibRef
Earlier: A1, A2, A3, A5, A6, Only:
Focal Visual-Text Attention for Visual Question Answering,
CVPR18(6135-6143)
IEEE DOI
1812
Task analysis, Knowledge discovery, Visualization, Grounding,
Metadata, Cognition, Photo albums, question answering,
memex.
Visualization, Videos, Computational modeling, Correlation.
BibRef
Riquelme, F.[Felipe],
de Goyeneche, A.[Alfredo],
Zhang, Y.D.[Yun-Dong],
Niebles, J.C.[Juan Carlos],
Soto, A.[Alvaro],
Explaining VQA predictions using visual grounding and a knowledge
base,
IVC(101), 2020, pp. 103968.
Elsevier DOI
2009
Deep Learning, Attention, Supervision, Knowledge Base,
Interpretability, Explainability
BibRef
Yang, Z.Y.[Zheng-Yuan],
Kumar, T.[Tushar],
Chen, T.L.[Tian-Lang],
Su, J.S.[Jing-Song],
Luo, J.B.[Jie-Bo],
Grounding-Tracking-Integration,
CirSysVideo(31), No. 9, September 2021, pp. 3433-3443.
IEEE DOI
2109
Grounding, Target tracking, Visualization, History, Task analysis,
Object tracking, Annotations, Tracking by language
BibRef
Zhang, W.X.[Wei-Xia],
Ma, C.[Chao],
Wu, Q.[Qi],
Yang, X.K.[Xiao-Kang],
Language-Guided Navigation via Cross-Modal Grounding and Alternate
Adversarial Learning,
CirSysVideo(31), No. 9, September 2021, pp. 3469-3481.
IEEE DOI
2109
Navigation, Training, Trajectory, Visualization, Task analysis,
Grounding, Generators, Vision-and-language, embodied navigation,
adversarial learning
BibRef
Zhai, S.L.[Song-Lin],
Guo, G.B.[Gui-Bing],
Yuan, F.J.[Fa-Jie],
Liu, Y.[Yuan],
Wang, X.W.[Xing-Wei],
VSE-fs: Fast Full-Sample Visual Semantic Embedding,
IEEE_Int_Sys(36), No. 4, July 2021, pp. 3-12.
IEEE DOI
2109
Construct a joint embedding space between visual features and semantic
information.
Computational modeling, Training, Integrated circuits,
Time complexity, Semantics, Visualization, Intelligent systems,
Negative Sampling
BibRef
Bargal, S.A.[Sarah Adel],
Zunino, A.[Andrea],
Petsiuk, V.[Vitali],
Zhang, J.M.[Jian-Ming],
Saenko, K.[Kate],
Murino, V.[Vittorio],
Sclaroff, S.[Stan],
Guided Zoom: Zooming into Network Evidence to Refine Fine-Grained
Model Decisions,
PAMI(43), No. 11, November 2021, pp. 4196-4202.
IEEE DOI
2110
Grounding, Training, Predictive models, Annotations,
Location awareness, Correlation, Visualization, Explainable AI,
convolutional neural networks
BibRef
Yang, W.F.[Wen-Fei],
Zhang, T.Z.[Tian-Zhu],
Zhang, Y.D.[Yong-Dong],
Wu, F.[Feng],
Local Correspondence Network for Weakly Supervised Temporal Sentence
Grounding,
IP(30), 2021, pp. 3252-3262.
IEEE DOI
2103
Grounding, Annotations, Training,
Feature extraction, Computational modeling, Task analysis,
temporal sentence grounding
BibRef
Luo, W.[Wang],
Zhang, T.Z.[Tian-Zhu],
Yang, W.F.[Wen-Fei],
Liu, J.G.[Jin-Gen],
Mei, T.[Tao],
Wu, F.[Feng],
Zhang, Y.D.[Yong-Dong],
Action Unit Memory Network for Weakly Supervised Temporal Action
Localization,
CVPR21(9964-9974)
IEEE DOI
2111
Location awareness, Training, Knowledge engineering,
Motion segmentation, Refining, Interference, Benchmark testing
BibRef
Hong, R.C.[Ri-Chang],
Liu, D.[Daqing],
Mo, X.Y.[Xiao-Yu],
He, X.N.[Xiang-Nan],
Zhang, H.W.[Han-Wang],
Learning to Compose and Reason with Language Tree Structures for
Visual Grounding,
PAMI(44), No. 2, February 2022, pp. 684-696.
IEEE DOI
2201
Grounding, Visualization, Dogs, Natural languages, Cognition,
Computational modeling, Semantics, Fine-grained detection, visual reasoning
BibRef
Tang, K.H.[Kai-Hua],
Zhang, H.W.[Han-Wang],
Wu, B.Y.[Bao-Yuan],
Luo, W.H.[Wen-Han],
Liu, W.[Wei],
Learning to Compose Dynamic Tree Structures for Visual Contexts,
CVPR19(6612-6621)
IEEE DOI
2002
BibRef
Bin, Y.[Yi],
Ding, Y.J.[Yu-Juan],
Peng, B.[Bo],
Peng, L.[Liang],
Yang, Y.[Yang],
Chua, T.S.[Tat-Seng],
Entity Slot Filling for Visual Captioning,
CirSysVideo(32), No. 1, January 2022, pp. 52-62.
IEEE DOI
2201
Task analysis, Visualization, Neural networks, Adaptation models,
Filling, Grounding, Training, Image captioning,
dataset
BibRef
Chu, C.[Chenhui],
Oliveira, V.[Vinicius],
Virgo, F.G.[Felix Giovanni],
Otani, M.[Mayu],
Garcia, N.[Noa],
Nakashima, Y.[Yuta],
The semantic typology of visually grounded paraphrases,
CVIU(215), 2022, pp. 103333.
Elsevier DOI
2201
Vision and language, Image interpretation,
Visual grounded paraphrases, Semantic typology, Dataset
BibRef
Deng, C.R.[Chao-Rui],
Wu, Q.[Qi],
Wu, Q.Y.[Qing-Yao],
Hu, F.Y.[Fu-Yuan],
Lyu, F.[Fan],
Tan, M.K.[Ming-Kui],
Visual Grounding Via Accumulated Attention,
PAMI(44), No. 3, March 2022, pp. 1670-1684.
IEEE DOI
2202
BibRef
Earlier:
CVPR18(7746-7755)
IEEE DOI
1812
Task analysis, Grounding, Cognition, Visual grounding,
bounding box regression.
Visualization, Feature extraction, Grounding, Natural languages,
Redundancy, Task analysis, Computational modeling
BibRef
Plummer, B.A.[Bryan A.],
Shih, K.J.[Kevin J.],
Li, Y.C.[Yi-Chen],
Xu, K.[Ke],
Lazebnik, S.[Svetlana],
Sclaroff, S.[Stan],
Saenko, K.[Kate],
Revisiting Image-Language Networks for Open-Ended Phrase Detection,
PAMI(44), No. 4, April 2022, pp. 2155-2167.
IEEE DOI
2203
Task analysis, Grounding, Visualization, Feature extraction,
Benchmark testing, Detectors, Vocabulary, Vision and language,
representation learning
BibRef
Burns, A.[Andrea],
Tan, R.[Reuben],
Saenko, K.[Kate],
Sclaroff, S.[Stan],
Plummer, B.A.[Bryan A.],
Language Features Matter: Effective Language Representations for
Vision-Language Tasks,
ICCV19(7473-7482)
IEEE DOI
2004
Code, Visualization.
WWW Link. data visualisation, graph theory, image representation,
learning (artificial intelligence), Grounding
BibRef
Arbelle, A.[Assaf],
Doveh, S.[Sivan],
Alfassy, A.[Amit],
Shtok, J.[Joseph],
Lev, G.[Guy],
Schwartz, E.[Eli],
Kuehne, H.[Hilde],
Levi, H.B.[Hila Barak],
Sattigeri, P.[Prasanna],
Panda, R.[Rameswar],
Chen, C.F.[Chun-Fu],
Bronstein, A.M.[Alex M.],
Saenko, K.[Kate],
Ullman, S.[Shimon],
Giryes, R.[Raja],
Feris, R.S.[Rogerio S.],
Karlinsky, L.[Leonid],
Detector-Free Weakly Supervised Grounding by Separation,
ICCV21(1781-1792)
IEEE DOI
2203
Training, Location awareness, Visualization, Image segmentation,
Grounding, Genomics, Detectors, Vision + language,
BibRef
Whitehead, S.[Spencer],
Wu, H.[Hui],
Ji, H.[Heng],
Feris, R.S.[Rogerio S.],
Saenko, K.[Kate],
Separating Skills and Concepts for Novel Visual Question Answering,
CVPR21(5628-5637)
IEEE DOI
2111
Training, Visualization, Grounding, Annotations,
Knowledge discovery, Encoding
BibRef
Yu, X.T.[Xin-Tong],
Zhang, H.M.[Hong-Ming],
Hong, R.X.[Rui-Xin],
Song, Y.Q.[Yang-Qiu],
Zhang, C.S.[Chang-Shui],
VD-PCR: Improving visual dialog with pronoun coreference resolution,
PR(125), 2022, pp. 108540.
Elsevier DOI
2203
Vision and language, Visual dialog, Pronoun coreference resolution
BibRef
Yuan, Y.T.[Yi-Tian],
Ma, L.[Lin],
Wang, J.W.[Jing-Wen],
Liu, W.[Wei],
Zhu, W.W.[Wen-Wu],
Semantic Conditioned Dynamic Modulation for Temporal Sentence
Grounding in Videos,
PAMI(44), No. 5, May 2022, pp. 2725-2741.
IEEE DOI
2204
Videos, Grounding, Semantics, Proposals, Task analysis, Convolution,
Visualization, Temporal sentence grounding in videos (TSG),
temporal convolution
BibRef
He, S.[Su],
Yang, X.F.[Xiao-Feng],
Lin, G.S.[Guo-Sheng],
Learning language to symbol and language to vision mapping for visual
grounding,
IVC(122), 2022, pp. 104451.
Elsevier DOI
2205
Cross modality, Visual grounding, Neural symbolic reasoning
BibRef
Jiang, W.H.[Wen-Hui],
Zhu, M.[Minwei],
Fang, Y.M.[Yu-Ming],
Shi, G.M.[Guang-Ming],
Zhao, X.W.[Xiao-Wei],
Liu, Y.[Yang],
Visual Cluster Grounding for Image Captioning,
IP(31), 2022, pp. 3920-3934.
IEEE DOI
2206
Grounding, Visualization, Proposals, Annotations, Transformers,
Task analysis, Decoding, Image captioning, attention evaluation,
grounding supervision
BibRef
Liao, Y.[Yue],
Zhang, A.[Aixi],
Chen, Z.Y.[Zhi-Yuan],
Hui, T.R.[Tian-Rui],
Liu, S.[Si],
Progressive Language-Customized Visual Feature Learning for One-Stage
Visual Grounding,
IP(31), 2022, pp. 4266-4277.
IEEE DOI
2207
Visualization, Feature extraction, Grounding, Linguistics,
Task analysis, Detectors, Representation learning,
cross-modal fusion
BibRef
Ding, X.P.[Xin-Peng],
Wang, N.N.[Nan-Nan],
Zhang, S.W.[Shi-Wei],
Huang, Z.Y.[Zi-Yuan],
Li, X.M.[Xiao-Meng],
Tang, M.Q.[Ming-Qian],
Liu, T.L.[Tong-Liang],
Gao, X.B.[Xin-Bo],
Exploring Language Hierarchy for Video Grounding,
IP(31), 2022, pp. 4693-4706.
IEEE DOI
2207
Proposals, Grounding, Training, Location awareness, Task analysis,
Semantics, Feature extraction, Video and language, language hierarchy
BibRef
Wang, Y.[Yuechen],
Deng, J.J.[Jia-Jun],
Zhou, W.G.[Wen-Gang],
Li, H.Q.[Hou-Qiang],
Weakly Supervised Temporal Adjacent Network for Language Grounding,
MultMed(24), 2022, pp. 3276-3286.
IEEE DOI
2207
Grounding, Semantics, Feature extraction, Visualization,
Task analysis, Annotations, Training, Temporal language grounding,
multiple instance learning
BibRef
Xu, Z.[Zhe],
Chen, D.[Da],
Wei, K.[Kun],
Deng, C.[Cheng],
Xue, H.[Hui],
HiSA: Hierarchically Semantic Associating for Video Temporal
Grounding,
IP(31), 2022, pp. 5178-5188.
IEEE DOI
2208
Grounding, Feature extraction, Proposals, Task analysis, Semantics,
Representation learning, Image segmentation,
cross-guided contrast
BibRef
Gao, J.L.[Jia-Lin],
Sun, X.[Xin],
Ghanem, B.[Bernard],
Zhou, X.[Xi],
Ge, S.M.[Shi-Ming],
Efficient Video Grounding With Which-Where Reading Comprehension,
CirSysVideo(32), No. 10, October 2022, pp. 6900-6913.
IEEE DOI
2210
Grounding, Proposals, Visualization, Location awareness,
Task analysis, Reinforcement learning, Germanium, deep learning
BibRef
Zhou, H.[Hao],
Zhang, C.Y.[Chong-Yang],
Luo, Y.[Yan],
Hu, C.P.[Chuan-Ping],
Zhang, W.J.[Wen-Jun],
Thinking Inside Uncertainty: Interest Moment Perception for Diverse
Temporal Grounding,
CirSysVideo(32), No. 10, October 2022, pp. 7190-7203.
IEEE DOI
2210
Annotations, Grounding, Task analysis, Uncertainty, Measurement,
Predictive models, Optimization, Temporal grounding, label uncertainty
BibRef
Tang, Z.H.[Zong-Heng],
Liao, Y.[Yue],
Liu, S.[Si],
Li, G.B.[Guan-Bin],
Jin, X.J.[Xiao-Jie],
Jiang, H.X.[Hong-Xu],
Yu, Q.[Qian],
Xu, D.[Dong],
Human-Centric Spatio-Temporal Video Grounding With Visual
Transformers,
CirSysVideo(32), No. 12, December 2022, pp. 8238-8249.
IEEE DOI
2212
Grounding, Visualization, Electron tubes, Location awareness,
Power transformers, Spatial temporal resolution, dataset
BibRef
Tang, H.Y.[Hao-Yu],
Zhu, J.[Jihua],
Wang, L.[Lin],
Zheng, Q.H.[Qing-Hai],
Zhang, T.W.[Tian-Wei],
Multi-Level Query Interaction for Temporal Language Grounding,
ITS(23), No. 12, December 2022, pp. 25479-25488.
IEEE DOI
2212
Semantics, Task analysis, Grounding, Proposals, Syntactics,
Location awareness, Feature extraction, Human-machine interface,
multi-level interaction
BibRef
Wang, W.[Wei],
Gao, J.Y.[Jun-Yu],
Xu, C.S.[Chang-Sheng],
Weakly-Supervised Video Object Grounding via Causal Intervention,
PAMI(45), No. 3, March 2023, pp. 3933-3948.
IEEE DOI
2302
Grounding, Visualization, Task analysis, Dairy products, Annotations,
Context modeling, Proposals, Weakly-supervised learning,
adversarial contrastive learning
BibRef
Wang, W.[Wei],
Gao, J.Y.[Jun-Yu],
Xu, C.S.[Chang-Sheng],
Weakly-Supervised Video Object Grounding via Learning Uni-Modal
Associations,
MultMed(25), 2023, pp. 6329-6340.
IEEE DOI
2311
BibRef
Nayyeri, M.[Mojtaba],
Xu, C.J.[Cheng-Jin],
Alam, M.M.[Mirza Mohtashim],
Lehmann, J.[Jens],
Yazdi, H.S.[Hamed Shariat],
LogicENN: A Neural Based Knowledge Graphs Embedding Model With
Logical Rules,
PAMI(45), No. 6, June 2023, pp. 7050-7062.
IEEE DOI
2305
Encoding, Grounding, Computational modeling, Analytical models,
Task analysis, Optimization, Predictive models, Knowledge graph,
representation learning
BibRef
Zhao, L.C.[Li-Chen],
Cai, D.G.[Dai-Gang],
Zhang, J.[Jing],
Sheng, L.[Lu],
Xu, D.[Dong],
Zheng, R.[Rui],
Zhao, Y.J.[Yin-Jie],
Wang, L.P.[Li-Peng],
Fan, X.[Xibo],
Toward Explainable 3D Grounded Visual Question Answering: A New
Benchmark and Strong Baseline,
CirSysVideo(33), No. 6, June 2023, pp. 2935-2949.
IEEE DOI
2306
Task analysis, Visualization, Annotations, Point cloud compression,
Solid modeling, Question answering (information retrieval),
vision and language on 3D scenes
BibRef
Zhu, L.J.[Liang-Jun],
Peng, L.[Li],
Zhou, W.N.[Wei-Nan],
Yang, J.[Jielong],
Dual-decoder transformer network for answer grounding in visual
question answering,
PRL(171), 2023, pp. 53-60.
Elsevier DOI
2306
Visual question answering, Answer grounding, Dual-decoder transformer
BibRef
Chen, T.[Tongbao],
Wang, W.[Wenmin],
Han, K.[Kangrui],
Xu, H.J.[Hui-Juan],
SaGCN: Semantic-Aware Graph Calibration Network for Temporal Sentence
Grounding,
CirSysVideo(33), No. 6, June 2023, pp. 3003-3016.
IEEE DOI
2306
Semantics, Grounding, Task analysis, Database languages, Calibration,
TV, Visualization, Temporal sentence grounding,
cross modal
BibRef
Zhang, H.[Hao],
Sun, A.[Aixin],
Jing, W.[Wei],
Zhou, J.T.Y.[Joey Tian-Yi],
Temporal Sentence Grounding in Videos: A Survey and Future Directions,
PAMI(45), No. 8, August 2023, pp. 10443-10465.
IEEE DOI
2307
Videos, Feature extraction, Proposals, Grounding, Task analysis,
Taxonomy, Location awareness, Cross-modal video retrieval,
vision and language
BibRef
Deng, J.J.[Jia-Jun],
Yang, Z.Y.[Zheng-Yuan],
Liu, D.[Daqing],
Chen, T.L.[Tian-Lang],
Zhou, W.G.[Wen-Gang],
Zhang, Y.[Yanyong],
Li, H.Q.[Hou-Qiang],
Ouyang, W.L.[Wan-Li],
TransVG++: End-to-End Visual Grounding With Language Conditioned
Vision Transformer,
PAMI(45), No. 11, November 2023, pp. 13636-13652.
IEEE DOI
2310
BibRef
Earlier: A1, A2, A4, A5, A7, Only:
TransVG: End-to-End Visual Grounding with Transformers,
ICCV21(1749-1759)
IEEE DOI
2203
Visualization, Codes, Grounding, Manuals, Transformers, Cognition,
Vision + language, Vision + other modalities
BibRef
Li, J.C.[Jun-Cheng],
Tang, S.L.[Si-Liang],
Zhu, L.C.[Lin-Chao],
Zhang, W.Q.[Wen-Qiao],
Yang, Y.[Yi],
Chua, T.S.[Tat-Seng],
Wu, F.[Fei],
Zhuang, Y.T.[Yue-Ting],
Variational Cross-Graph Reasoning and Adaptive Structured Semantics
Learning for Compositional Temporal Grounding,
PAMI(45), No. 10, October 2023, pp. 12601-12617.
IEEE DOI
2310
BibRef
Li, J.C.[Jun-Cheng],
Xie, J.L.[Jun-Lin],
Qian, L.[Long],
Zhu, L.C.[Lin-Chao],
Tang, S.L.[Si-Liang],
Wu, F.[Fei],
Yang, Y.[Yi],
Zhuang, Y.T.[Yue-Ting],
Wang, X.E.[Xin Eric],
Compositional Temporal Grounding with Structured Variational
Cross-Graph Correspondence Learning,
CVPR22(3022-3031)
IEEE DOI
2210
Grounding, Current measurement, Computational modeling, Semantics,
Diversity reception, Linguistics,
Vision + language
BibRef
González, C.[Cristina],
Ayobi, N.[Nicolás],
Hernández, I.[Isabela],
Pont-Tuset, J.[Jordi],
Arbeláez, P.[Pablo],
PiGLET:
Pixel-Level Grounding of Language Expressions With Transformers,
PAMI(45), No. 10, October 2023, pp. 12206-12221.
IEEE DOI
2310
BibRef
Zhang, R.S.[Rui-Song],
Wang, C.[Chuang],
Liu, C.L.[Cheng-Lin],
Cycle-Consistent Weakly Supervised Visual Grounding With Individual
and Contextual Representations,
IP(32), 2023, pp. 5167-5180.
IEEE DOI Code:
WWW Link.
2310
BibRef
Wang, Y.[Yan],
Su, Y.T.[Yu-Ting],
Li, W.H.[Wen-Hui],
Xiao, J.[Jun],
Li, X.Y.[Xuan-Ya],
Liu, A.A.[An-An],
Dual-Path Rare Content Enhancement Network for Image and Text
Matching,
CirSysVideo(33), No. 10, October 2023, pp. 6144-6158.
IEEE DOI
2310
BibRef
Xu, Z.[Zhe],
Wei, K.[Kun],
Yang, X.[Xu],
Deng, C.[Cheng],
Point-Supervised Video Temporal Grounding,
MultMed(25), 2023, pp. 6121-6131.
IEEE DOI
2311
BibRef
Luo, F.[Fan],
Chen, S.X.[Shao-Xiang],
Chen, J.J.[Jing-Jing],
Wu, Z.[Zuxuan],
Jiang, Y.G.[Yu-Gang],
Self-Supervised Learning for Semi-Supervised Temporal Language
Grounding,
MultMed(25), 2023, pp. 7747-7757.
IEEE DOI
2312
BibRef
Liu, D.Z.[Dai-Zong],
Fang, X.[Xiang],
Hu, W.[Wei],
Zhou, P.[Pan],
Exploring Optical-Flow-Guided Motion and Detection-Based Appearance
for Temporal Sentence Grounding,
MultMed(25), 2023, pp. 8539-8553.
IEEE DOI
2312
BibRef
Yang, X.F.[Xiao-Feng],
Liu, F.[Fayao],
Lin, G.S.[Guo-Sheng],
Effective End-to-End Vision Language Pretraining With Semantic Visual
Loss,
MultMed(25), 2023, pp. 8408-8417.
IEEE DOI
2312
BibRef
Ma, G.Q.[Guo-Qing],
Bai, Y.[Yalong],
Zhang, W.[Wei],
Yao, T.[Ting],
Shihada, B.[Basem],
Mei, T.[Tao],
Boosting Generic Visual-Linguistic Representation With Dynamic
Contexts,
MultMed(25), 2023, pp. 8445-8457.
IEEE DOI
2312
BibRef
Su, C.[Chao],
Li, Z.[Zhi],
Lei, T.Y.[Tian-Yi],
Peng, D.Z.[De-Zhong],
Wang, X.[Xu],
MetaVG: A Meta-Learning Framework for Visual Grounding,
SPLetters(31), 2024, pp. 236-240.
IEEE DOI
2401
BibRef
Zeng, Y.W.[Ya-Wen],
Han, N.[Ning],
Pan, K.Y.[Ke-Yu],
Jin, Q.[Qin],
Temporally Language Grounding With Multi-Modal Multi-Prompt Tuning,
MultMed(26), 2024, pp. 3366-3377.
IEEE DOI
2402
Task analysis, Grounding, Transformers, Tuning, Visualization,
Semantics, Robustness, Temporally language grounding,
multi-modal understanding
BibRef
Fang, X.[Xiang],
Liu, D.Z.[Dai-Zong],
Zhou, P.[Pan],
Xu, Z.[Zichuan],
Li, R.X.[Rui-Xuan],
Hierarchical Local-Global Transformer for Temporal Sentence Grounding,
MultMed(26), 2024, pp. 3263-3277.
IEEE DOI
2402
Transformers, Semantics, Visualization, Grounding, Task analysis,
Feature extraction, Decoding, Multi-modal representations,
temporal transformer
BibRef
Wang, Z.Y.[Zhi-Yu],
Yang, C.[Chao],
Jiang, B.[Bin],
Yuan, J.S.[Jun-Song],
A Dual Reinforcement Learning Framework for Weakly Supervised Phrase
Grounding,
MultMed(26), 2024, pp. 394-405.
IEEE DOI
2402
Grounding, Task analysis, Training, Reinforcement learning,
Optimization, Image reconstruction, Proposals,
reinforcement learning
BibRef
Lu, Y.[Yu],
Quan, R.J.[Rui-Jie],
Zhu, L.C.[Lin-Chao],
Yang, Y.[Yi],
Zero-Shot Video Grounding With Pseudo Query Lookup and Verification,
IP(33), 2024, pp. 1643-1654.
IEEE DOI
2403
Grounding, Detectors, Proposals, Training, Task analysis, Visualization,
Semantics, Video grounding, zero-shot learning, vision and language
BibRef
Wang, W.K.[Wei-Kang],
Su, Y.T.[Yu-Ting],
Liu, J.[Jing],
Jing, P.G.[Pei-Guang],
Adaptive proposal network based on generative adversarial learning
for weakly supervised temporal sentence grounding,
PRL(179), 2024, pp. 9-16.
Elsevier DOI
2403
Weakly supervised learning, Temporal sentence grounding,
Contrastive generative adversarial learning
BibRef
Liu, M.[Meng],
Zhou, D.[Di],
Guo, J.[Jie],
Luo, X.[Xin],
Gao, Z.[Zan],
Nie, L.Q.[Li-Qiang],
Semantic-Aware Contrastive Learning With Proposal Suppression for
Video Semantic Role Grounding,
CirSysVideo(34), No. 4, April 2024, pp. 3003-3016.
IEEE DOI
2404
Semantics, Grounding, Proposals, Task analysis, Encoding,
Location awareness, Visualization, Video semantic role grounding,
proposal contrastive learning
BibRef
Tang, W.[Wei],
Li, L.[Liang],
Liu, X.J.[Xue-Jing],
Jin, L.[Lu],
Tang, J.H.[Jin-Hui],
Li, Z.C.[Ze-Chao],
Context Disentangling and Prototype Inheriting for Robust Visual
Grounding,
PAMI(46), No. 5, May 2024, pp. 3213-3229.
IEEE DOI
2404
Visualization, Grounding, Prototypes, Transformers, Task analysis,
Linguistics, Feature extraction, Context disentangling,
visual grounding (VG)
BibRef
Shi, F.Y.[Feng-Yuan],
Huang, W.L.[Wei-Lin],
Wang, L.M.[Li-Min],
End-to-end dense video grounding via parallel regression,
CVIU(242), 2024, pp. 103980.
Elsevier DOI
2404
Visual grounding, Dense grounding, Query based detection
BibRef
Shao, R.[Rui],
Wu, T.X.[Tian-Xing],
Wu, J.L.[Jian-Long],
Nie, L.Q.[Li-Qiang],
Liu, Z.W.[Zi-Wei],
Detecting and Grounding Multi-Modal Media Manipulation and Beyond,
PAMI(46), No. 8, August 2024, pp. 5556-5574.
IEEE DOI
2407
BibRef
Earlier: A1, A2, A5, Only:
Detecting and Grounding Multi-Modal Media Manipulation,
CVPR23(6904-6913)
IEEE DOI
2309
Deepfakes, Grounding, Forgery, Cognition, Faces, Semantics,
Visualization, Media manipulation detection, DeepFake detection,
multi-modal learning
BibRef
Chen, L.[Lei],
Deng, Z.[Zhen],
Liu, L.[Libo],
Yin, S.[Shibai],
Multilevel Semantic Interaction Alignment for Video-Text Cross-Modal
Retrieval,
CirSysVideo(34), No. 7, July 2024, pp. 6559-6575.
IEEE DOI Code:
WWW Link.
2407
Semantics, Feature extraction, Video recording, Correlation,
Task analysis, Object detection, Noise measurement,
attention mechanism
BibRef
Zhang, T.[Tong],
Lu, X.K.[Xian-Kai],
Zhang, H.[Hao],
Nie, X.S.[Xiu-Shan],
Yin, Y.L.[Yi-Long],
Shen, J.B.[Jian-Bing],
Relational Network via Cascade CRF for Video Language Grounding,
MultMed(26), 2024, pp. 8297-8311.
IEEE DOI
2408
Proposals, Task analysis, Grounding, Semantics,
Conditional random fields, Location awareness, Indexes,
proposal free
BibRef
Wu, Q.Q.[Qing-Qing],
Guo, L.J.[Li-Jun],
Zhang, R.[Rong],
Qian, J.B.[Jiang-Bo],
Gao, S.[Shangce],
QSMT-net: A query-sensitive proposal and multi-temporal-span matching
network for video grounding,
IVC(149), 2024, pp. 105188.
Elsevier DOI
2408
Video grounding, Multi-modal feature fusion, Cross-attention modeling
BibRef
Yao, H.B.[Hai-Bo],
Wang, L.P.[Li-Ping],
Cai, C.T.[Cheng-Tao],
Wang, W.[Wei],
Zhang, Z.[Zhi],
Shang, X.B.[Xia-Bing],
Language conditioned multi-scale visual attention networks for visual
grounding,
IVC(150), 2024, pp. 105242.
Elsevier DOI
2409
Deep learning, Visual grounding, Referring expression,
Vision and language, Multi-scale visual attention, Transformer network
BibRef
Wu, W.[Wansen],
Cao, M.[Meng],
Hu, Y.[Yue],
Peng, Y.[Yong],
Qin, L.[Long],
Yin, Q.[Quanjun],
Visual Grounding With Dual Knowledge Distillation,
CirSysVideo(34), No. 10, October 2024, pp. 10399-10410.
IEEE DOI
2411
Visualization, Task analysis, Semantics, Grounding,
Feature extraction, Location awareness, Proposals,
knowledge distillation
BibRef
Li, S.T.[Shu-Tao],
Li, B.[Bin],
Sun, B.[Bin],
Weng, Y.X.[Yi-Xuan],
Towards Visual-Prompt Temporal Answer Grounding in Instructional
Video,
PAMI(46), No. 12, December 2024, pp. 8836-8853.
IEEE DOI
2411
Visualization, Task analysis, Thyroid, Feature extraction, Semantics,
Grounding, Location awareness, Instructional video,
visual prompt
BibRef
Fang, X.[Xiang],
Xiong, Z.[Zeyu],
Fang, W.L.[Wan-Long],
Qu, X.Y.[Xiao-Ye],
Chen, C.[Chen],
Dong, J.F.[Jian-Feng],
Tang, K.[Keke],
Zhou, P.[Pan],
Cheng, Y.[Yu],
Liu, D.Z.[Dai-Zong],
Rethinking Weakly-supervised Video Temporal Grounding From a Game
Perspective,
ECCV24(XLV: 290-311).
Springer DOI
2412
BibRef
Xiong, Z.[Zeyu],
Liu, D.Z.[Dai-Zong],
Fang, X.[Xiang],
Qu, X.Y.[Xiao-Ye],
Dong, J.F.[Jian-Feng],
Zhu, J.H.[Jia-Hao],
Tang, K.[Keke],
Zhou, P.[Pan],
Rethinking Video Sentence Grounding from a Tracking Perspective With
Memory Network and Masked Attention,
MultMed(26), 2024, pp. 11204-11218.
IEEE DOI
2412
Target tracking, Semantics, Task analysis, Object tracking,
Grounding, Feature extraction, Visualization, Cross-modal, VSG
BibRef
Qi, Z.B.[Zhao-Bo],
Yuan, Y.[Yibo],
Ruan, X.W.[Xiao-Wen],
Wang, S.H.[Shu-Hui],
Zhang, W.G.[Wei-Gang],
Huang, Q.M.[Qing-Ming],
Collaborative Debias Strategy for Temporal Sentence Grounding in
Video,
CirSysVideo(34), No. 11, November 2024, pp. 10972-10986.
IEEE DOI Code:
WWW Link.
2412
Grounding, Visualization, Task analysis, Predictive models,
Proposals, Data models, Training,
combinatorial bias
BibRef
Zhou, S.[Siyu],
Zhang, F.[Fuwei],
Wang, R.M.[Ruo-Mei],
Zhou, F.[Fan],
Su, Z.[Zhuo],
Subtask Prior-Driven Optimized Mechanism on Joint Video Moment
Retrieval and Highlight Detection,
CirSysVideo(34), No. 11, November 2024, pp. 11271-11285.
IEEE DOI
2412
Task analysis, Interference, 3G mobile communication,
Adaptation models, Correlation, Training,
cross-modal interaction
BibRef
Ji, Z.[Zhong],
Wu, J.[Jiahe],
Wang, Y.[Yaodong],
Yang, A.[Aiping],
Han, J.G.[Jun-Gong],
Progressive Semantic Reconstruction Network for Weakly Supervised
Referring Expression Grounding,
CirSysVideo(34), No. 12, December 2024, pp. 13058-13070.
IEEE DOI Code:
WWW Link.
2501
Image reconstruction, Semantics, Training, Grounding, Proposals,
Detectors, Visualization, Referring expression grounding,
progressive semantic reconstruction
BibRef
Dong, J.X.[Jian-Xiang],
Yin, Z.Z.[Zhao-Zheng],
Graph-based Dense Event Grounding with relative positional encoding,
CVIU(251), 2025, pp. 104257.
Elsevier DOI
2501
Dense Event Grounding, Temporal sentence grounding,
Video grounding, Relative positional encoding
BibRef
Tang, K.F.[Ke-Fan],
He, L.H.[Li-Huo],
Wang, N.N.[Nan-Nan],
Gao, X.B.[Xin-Bo],
Dual Semantic Reconstruction Network for Weakly Supervised Temporal
Sentence Grounding,
MultMed(27), 2025, pp. 95-107.
IEEE DOI
2501
Proposals, Grounding, Feature extraction, Image reconstruction,
Annotations, Semantics, Training, Information processing, Decoding,
consistency constraint
BibRef
Li, X.[Xiang],
Qiu, K.[Kai],
Wang, J.L.[Jing-Lu],
Xu, X.H.[Xiao-Hao],
Singh, R.[Rita],
Yamazaki, K.[Kashu],
Chen, H.[Hao],
Huang, X.N.[Xiao-Nan],
Raj, B.[Bhiksha],
R^2-Bench: Benchmarking the Robustness of Referring Perception Models
Under Perturbations,
ECCV24(IX: 211-230).
Springer DOI
2412
BibRef
Lee, P.[Pilhyeon],
Byun, H.R.[Hye-Ran],
BAM-DETR: Boundary-aligned Moment Detection Transformer for Temporal
Sentence Grounding in Videos,
ECCV24(II: 220-238).
Springer DOI
2412
BibRef
Ma, C.[Chuofan],
Jiang, Y.[Yi],
Wu, J.N.[Jian-Nan],
Yuan, Z.H.[Ze-Huan],
Qi, X.J.[Xiao-Juan],
GROMA: Localized Visual Tokenization for Grounding Multimodal Large
Language Models,
ECCV24(VI: 417-435).
Springer DOI
2412
BibRef
Huang, Z.[Ziling],
Satoh, S.[Shin'ichi],
LOA-TRANS: Enhancing Visual Grounding by Location-aware Transformers,
ECCV24(VII: 405-421).
Springer DOI
2412
BibRef
Zhu, C.[Chenming],
Wang, T.[Tai],
Zhang, W.W.[Wen-Wei],
Chen, K.[Kai],
Liu, X.H.[Xi-Hui],
SCANREASON: Empowering 3d Visual Grounding with Reasoning Capabilities,
ECCV24(VIII: 151-168).
Springer DOI
2412
BibRef
Xiao, Z.[Zilin],
Gong, M.[Ming],
Cascante-Bonilla, P.[Paola],
Zhang, X.Y.[Xing-Yao],
Wu, J.[Jie],
Ordonez, V.[Vicente],
Grounding Language Models for Visual Entity Recognition,
ECCV24(XI: 393-411).
Springer DOI
2412
BibRef
Cheng, Z.X.[Zi-Xu],
Pu, Y.J.[Yu-Jiang],
Gong, S.G.[Shao-Gang],
Kordjamshidi, P.[Parisa],
Kong, Y.[Yu],
Shine: Saliency-aware Hierarchical Negative Ranking for Compositional
Temporal Grounding,
ECCV24(XIX: 398-416).
Springer DOI
2412
BibRef
Lee, P.Y.[Phillip Y.],
Sung, M.[Minhyuk],
Reground: Improving Textual and Spatial Grounding at No Cost,
ECCV24(XXIII: 275-292).
Springer DOI
2412
BibRef
Jiang, H.B.[Hao-Bin],
Lu, Z.Q.[Zong-Qing],
Visual Grounding for Object-level Generalization in Reinforcement
Learning,
ECCV24(XXX: 55-72).
Springer DOI
2412
BibRef
Sun, P.L.[Peng-Lei],
Song, Y.X.[Yao-Xian],
Pan, X.[Xinglin],
Kang, W.T.[Wei-Tai],
Liu, G.[Gaowen],
Shah, M.[Mubarak],
Yan, Y.[Yan],
SEGVG: Transferring Object Bounding Box to Segmentation for Visual
Grounding,
ECCV24(XXXVIII: 57-75).
Springer DOI
2412
BibRef
Kang, D.[Dahyun],
Cho, M.[Minsu],
In Defense of Lazy Visual Grounding for Open-vocabulary Semantic
Segmentation,
ECCV24(XLI: 143-164).
Springer DOI
2412
BibRef
Liu, Y.[Ye],
He, J.[Jixuan],
Li, W.[Wanhua],
Kim, J.[Junsik],
Wei, D.L.[Dong-Lai],
Pfister, H.[Hanspeter],
Chen, C.W.[Chang Wen],
R^2-tuning: Efficient Image-to-video Transfer Learning for Video
Temporal Grounding,
ECCV24(XLI: 421-438).
Springer DOI
2412
BibRef
Zhang, H.[Hao],
Li, H.Y.[Hong-Yang],
Li, F.[Feng],
Ren, T.[Tianhe],
Zou, X.[Xueyan],
Liu, S.[Shilong],
Huang, S.J.[Shi-Jia],
Gao, J.F.[Jian-Feng],
Zhang, L.[Lei],
Li, C.Y.[Chun-Yuan],
Yang, J.W.[Jian-Wei],
LLAVA-Grounding: Grounded Visual Chat with Large Multimodal Models,
ECCV24(XLIII: 19-35).
Springer DOI
2412
BibRef
Yang, J.[Jihan],
Ding, R.[Runyu],
Brown, E.[Ellis],
Qi, X.J.[Xiao-Juan],
Xie, S.[Saining],
V-IRL: Grounding Virtual Intelligence in Real Life,
ECCV24(XLV: 36-55).
Springer DOI
2412
BibRef
Chen, W.[Wei],
Chen, L.[Long],
Wu, Y.[Yu],
An Efficient and Effective Transformer Decoder-based Framework for
Multi-task Visual Grounding,
ECCV24(XLV: 125-141).
Springer DOI
2412
BibRef
Qian, Z.P.[Zhi-Peng],
Ma, Y.W.[Yi-Wei],
Lin, Z.K.[Zhe-Kai],
Ji, J.Y.[Jia-Yi],
Zheng, X.[Xiawu],
Sun, X.S.[Xiao-Shuai],
Ji, R.R.[Rong-Rong],
Multi-branch Collaborative Learning Network for 3d Visual Grounding,
ECCV24(XLVI: 381-398).
Springer DOI
2412
BibRef
Liu, S.[Shilong],
Zeng, Z.Y.[Zhao-Yang],
Ren, T.[Tianhe],
Li, F.[Feng],
Zhang, H.[Hao],
Yang, J.[Jie],
Jiang, Q.[Qing],
Li, C.Y.[Chun-Yuan],
Yang, J.W.[Jian-Wei],
Su, H.[Hang],
Zhu, J.[Jun],
Zhang, L.[Lei],
Grounding DINO: Marrying DINO with Grounded Pre-training for Open-set
Object Detection,
ECCV24(XLVII: 38-55).
Springer DOI
2412
BibRef
Jin, Y.[Yang],
Mu, Y.D.[Ya-Dong],
Weakly-supervised Spatio-temporal Video Grounding with Variational
Cross-modal Alignment,
ECCV24(XLVIII: 412-429).
Springer DOI
2412
BibRef
Fujiwara, K.[Kent],
Tanaka, M.[Mikihiro],
Yu, Q.[Qing],
Chronologically Accurate Retrieval for Temporal Grounding of
Motion-language Models,
ECCV24(LVIII: 323-339).
Springer DOI
2412
BibRef
Yan, S.[Siming],
Bai, M.[Min],
Chen, W.F.[Wei-Feng],
Zhou, X.[Xiong],
Huang, Q.X.[Qi-Xing],
Li, L.E.[Li Erran],
Vigor: Improving Visual Grounding of Large Vision Language Models with
Fine-grained Reward Modeling,
ECCV24(LXI: 37-53).
Springer DOI
2412
BibRef
Chowdhury, S.[Sanjoy],
Nag, S.[Sayan],
Dasgupta, S.[Subhrajyoti],
Chen, J.[Jun],
Elhoseiny, M.[Mohamed],
Gao, R.H.[Ruo-Han],
Manocha, D.[Dinesh],
Meerkat: Audio-visual Large Language Model for Grounding in Space and
Time,
ECCV24(LXIV: 52-70).
Springer DOI
2412
BibRef
Leroy, V.[Vincent],
Cabon, Y.[Yohann],
Revaud, J.[Jerome],
Grounding Image Matching in 3d with Mast3r,
ECCV24(LXXII: 71-91).
Springer DOI
2412
BibRef
Unal, O.[Ozan],
Sakaridis, C.[Christos],
Saha, S.[Suman],
Van Gool, L.J.[Luc J.],
Four Ways to Improve Verbo-visual Fusion for Dense 3d Visual Grounding,
ECCV24(LXXVI: 196-213).
Springer DOI
2412
BibRef
Wan, D.[David],
Cho, J.[Jaemin],
Stengel-Eskin, E.[Elias],
Bansal, M.[Mohit],
Contrastive Region Guidance: Improving Grounding in Vision-language
Models Without Training,
ECCV24(LXXIX: 198-215).
Springer DOI
2412
BibRef
Zheng, M.H.[Ming-Hang],
Cai, X.H.[Xin-Hao],
Chen, Q.C.[Qing-Chao],
Peng, Y.X.[Yu-Xin],
Liu, Y.[Yang],
Training-Free Video Temporal Grounding Using Large-Scale Pre-Trained
Models,
ECCV24(LXXXII: 20-37).
Springer DOI
2412
BibRef
Bao, P.J.[Pei-Jun],
Shao, Z.[Zihao],
Yang, W.H.[Wen-Han],
Ng, B.P.[Boon Poh],
Kot, A.C.[Alex C.],
E3m: Zero-shot Spatio-temporal Video Grounding with
Expectation-maximization Multimodal Modulation,
ECCV24(LXXXIII: 227-243).
Springer DOI
2412
BibRef
Dong, P.J.[Pei-Jie],
Yang, X.F.[Xiao-Fei],
Wang, Q.[Qiang],
Li, Z.X.[Zhi-Xu],
Li, T.[Tiefeng],
Chu, X.W.[Xiao-Wen],
Multi-task Domain Adaptation for Language Grounding with 3d Objects,
ECCV24(XXXIV: 387-404).
Springer DOI
2412
BibRef
Hannan, T.[Tanveer],
Islam, M.M.[Md Mohaiminul],
Seidl, T.[Thomas],
Bertasius, G.[Gedas],
RGNET: A Unified Clip Retrieval and Grounding Network for Long Videos,
ECCV24(XXI: 352-369).
Springer DOI
2412
BibRef
Khoshsirat, S.[Seyedalireza],
Kambhamettu, C.[Chandra],
Embedding Attention Blocks for Answer Grounding,
ICIP24(521-527)
IEEE DOI
2411
Visualization, Limiting, Filters, Grounding, Semantics, Proposals,
Answer Grounding, Deep Learning
BibRef
Hamilton, M.[Mark],
Zisserman, A.[Andrew],
Hershey, J.R.[John R.],
Freeman, W.T.[William T.],
Separating the 'Chirp' from the 'Chat':
Self-supervised Visual Grounding of Sound and Language,
CVPR24(13117-13127)
IEEE DOI Code:
WWW Link.
2410
Location awareness, Visualization, Grounding,
Semantic segmentation, Computational modeling, object discovery
BibRef
Shen, Y.H.[Yun-Hang],
Fu, C.Y.[Chao-You],
Chen, P.X.[Pei-Xian],
Zhang, M.[Mengdan],
Li, K.[Ke],
Sun, X.[Xing],
Wu, Y.S.[Yun-Sheng],
Lin, S.H.[Shao-Hui],
Ji, R.R.[Rong-Rong],
Aligning and Prompting Everything All at Once for Universal Visual
Perception,
CVPR24(13193-13203)
IEEE DOI Code:
WWW Link.
2410
Image segmentation, Vocabulary, Visualization, Grounding,
Annotations, Machine vision, Object detection
BibRef
Wu, T.H.[Tsung-Han],
Biamby, G.[Giscard],
Chan, D.[David],
Dunlap, L.[Lisa],
Gupta, R.[Ritwik],
Wang, X.D.[Xu-Dong],
Gonzalez, J.E.[Joseph E.],
Darrell, T.J.[Trevor J.],
See, Say, and Segment: Teaching LMMs to Overcome False Premises,
CVPR24(13459-13469)
IEEE DOI
2410
Training, Image segmentation, Grounding, Semantics,
Natural languages, Benchmark testing
BibRef
Wang, Y.[Yuan],
Li, Y.[Yali],
Wang, S.J.[Sheng-Jin],
G3-LQ: Marrying Hyperbolic Alignment with Explicit Semantic-Geometric
Modeling for 3D Visual Grounding,
CVPR24(13917-13926)
IEEE DOI
2410
Visualization, Solid modeling, Adaptation models, Grounding,
Semantics, Syntactics
BibRef
Rizve, M.N.[Mamshad Nayeem],
Fei, F.[Fan],
Unnikrishnan, J.[Jayakrishnan],
Tran, S.[Son],
Yao, B.Z.[Benjamin Z.],
Zeng, B.[Belinda],
Shah, M.[Mubarak],
Chilimbi, T.[Trishul],
VidLA: Video-Language Alignment at Scale,
CVPR24(14043-14055)
IEEE DOI
2410
Visualization, Image resolution, Grounding, Semantics, Training data,
Computer architecture, Benchmark testing
BibRef
Shi, X.X.[Xiang-Xi],
Wu, Z.H.[Zhong-Hua],
Lee, S.[Stefan],
Viewpoint-Aware Visual Grounding in 3D Scenes,
CVPR24(14056-14065)
IEEE DOI
2410
Visualization, Solid modeling, Grounding, Annotations,
Predictive models, Data augmentation, 3D point cloud, visual grounding
BibRef
Feng, C.J.[Cheng-Jian],
Zhong, Y.J.[Yu-Jie],
Jie, Z.Q.[Ze-Qun],
Xie, W.[Weidi],
Ma, L.[Lin],
InstaGen: Enhancing Object Detection by Training on Synthetic Dataset,
CVPR24(14121-14130)
IEEE DOI
2410
Training, Visualization, Head, Grounding, Pipelines, Detectors, Object detection
BibRef
Chang, C.P.[Chun-Peng],
Wang, S.X.[Shao-Xiang],
Pagani, A.[Alain],
Stricker, D.[Didier],
MiKASA: Multi-Key-Anchor & Scene-Aware Transformer for 3D Visual
Grounding,
CVPR24(14131-14140)
IEEE DOI Code:
WWW Link.
2410
Visualization, Solid modeling, Accuracy, Grounding,
Computational modeling, Source coding, 3D Visual Grounding,
Cross-modal Understanding
BibRef
Wang, S.[Sai],
Lin, Y.T.[Yu-Tian],
Wu, Y.[Yu],
Omni-Q: Omni-Directional Scene Understanding for Unsupervised Visual
Grounding,
CVPR24(14261-14270)
IEEE DOI
2410
Visualization, Accuracy, Grounding, Annotations, Manuals, visual grounding
BibRef
Favero, A.[Alessandro],
Zancato, L.[Luca],
Trager, M.[Matthew],
Choudhary, S.[Siddharth],
Perera, P.[Pramuditha],
Achille, A.[Alessandro],
Swaminathan, A.[Ashwin],
Soatto, S.[Stefano],
Multi-Modal Hallucination Control by Visual Information Grounding,
CVPR24(14303-14312)
IEEE DOI
2410
Training, Visualization, Grounding, Linguistics, Sampling methods,
Inference algorithms, Vision, language, reasoning
BibRef
Shen, Y.H.[Yu-Han],
Wang, H.Y.[Hui-Yu],
Yang, X.T.[Xi-Tong],
Feiszli, M.[Matt],
Elhamifar, E.[Ehsan],
Torresani, L.[Lorenzo],
Mavroudi, E.[Effrosyni],
Learning to Segment Referred Objects from Narrated Egocentric Videos,
CVPR24(14510-14520)
IEEE DOI
2410
Training, Image segmentation, Grounding, Annotations,
Object segmentation, Benchmark testing
BibRef
Xu, C.[Can],
Han, Y.H.[Yue-Hui],
Xu, R.[Rui],
Hui, L.[Le],
Xie, J.[Jin],
Yang, J.[Jian],
Multi-Attribute Interactions Matter for 3D Visual Grounding,
CVPR24(17253-17262)
IEEE DOI Code:
WWW Link.
2410
Visualization, Codes, Grounding, Computational modeling, Feature extraction
BibRef
Gu, X.[Xin],
Fan, H.[Heng],
Huang, Y.[Yan],
Luo, T.J.[Tie-Jian],
Zhang, L.[Libo],
Context-Guided Spatio-Temporal Video Grounding,
CVPR24(18330-18339)
IEEE DOI Code:
WWW Link.
2410
Location awareness, Degradation, Visualization, Codes, Grounding,
Computer architecture, spatio-temporal video grounding,
instance context learning
BibRef
Chen, B.[Brian],
Shvetsova, N.[Nina],
Rouditchenko, A.[Andrew],
Kondermann, D.[Daniel],
Thomas, S.[Samuel],
Chang, S.F.[Shih-Fu],
Feris, R.[Rogerio],
Glass, J.[James],
Kuehne, H.[Hilde],
What, When, and Where? Self-Supervised Spatio-Temporal Grounding in
Untrimmed Multi-Action Videos from Narrated Instructions,
CVPR24(18419-18429)
IEEE DOI
2410
Representation learning, Grounding, Annotations, Benchmark testing,
Encoding, Self-supervised learning
BibRef
Xiao, Y.C.[Yi-Cheng],
Luo, Z.[Zhuoyan],
Liu, Y.[Yong],
Ma, Y.[Yue],
Bian, H.[Hengwei],
Ji, Y.[Yatai],
Yang, Y.[Yujiu],
Li, X.[Xiu],
Bridging the Gap: A Unified Video Comprehension Framework for Moment
Retrieval and Highlight Detection,
CVPR24(18709-18719)
IEEE DOI Code:
WWW Link.
2410
Video on demand, Codes, Grounding, Computational modeling,
Contrastive learning, Transformers, Video Moment Retrieval, Highlight Detection
BibRef
Wasim, S.T.[Syed Talal],
Naseer, M.[Muzammal],
Khan, S.[Salman],
Yang, M.H.[Ming-Hsuan],
Khan, F.S.[Fahad Shahbaz],
VideoGrounding-DINO: Towards Open-Vocabulary Spatio-Temporal Video
Grounding,
CVPR24(18909-18918)
IEEE DOI
2410
Visualization, Adaptation models, Vocabulary, Grounding, Semantics,
Natural languages, Training data, Video Grounding, Open Vocabulary,
MultiModal
BibRef
Shao, Y.Y.[Yan-Yan],
He, S.T.[Shu-Ting],
Ye, Q.[Qi],
Feng, Y.C.[Yu-Chao],
Luo, W.H.[Wen-Han],
Chen, J.M.[Ji-Ming],
Context-Aware Integration of Language and Visual References for
Natural Language Tracking,
CVPR24(19208-19217)
IEEE DOI Code:
WWW Link.
2410
Visualization, Target tracking, Grounding, Video sequences, Merging,
Modulation, Linguistics, natural language tracking,
visual object tracking
BibRef
Tao, M.,
Bai, B.[Bing],
Lin, H.Z.[Hao-Zhe],
Wang, H.[Heyuan],
Wang, Y.[Yu],
Luo, L.[Lin],
Fang, L.[Lu],
When Visual Grounding Meets Gigapixel-Level Large-Scale Scenes:
Benchmark and Approach,
CVPR24(22119-22128)
IEEE DOI
2410
Visualization, Grounding, Computational modeling,
Natural languages, Imaging, Benchmark testing
BibRef
Chng, Y.X.[Yong Xien],
Zheng, H.[Henry],
Han, Y.Z.[Yi-Zeng],
Qiu, X.[Xuchong],
Huang, G.[Gao],
Mask Grounding for Referring Image Segmentation,
CVPR24(26563-26573)
IEEE DOI
2410
Training, Visualization, Image segmentation, Grounding,
Magnetic resonance imaging, Benchmark testing
BibRef
Chen, K.[Kang],
Wu, X.Q.[Xiang-Qian],
VTQA: Visual Text Question Answering via Entity Alignment and
Cross-Media Reasoning,
CVPR24(27208-27217)
IEEE DOI Code:
WWW Link.
2410
Measurement, Visualization, Grounding, Computational modeling,
Natural languages, Fitting, Object detection,
dataset
BibRef
Kuckreja, K.[Kartik],
Danish, M.S.[Muhammad Sohail],
Naseer, M.[Muzammal],
Das, A.[Abhijit],
Khan, S.[Salman],
Khan, F.S.[Fahad Shahbaz],
GeoChat: Grounded Large Vision-Language Model for Remote Sensing,
CVPR24(27831-27840)
IEEE DOI
2410
Visualization, Scene classification, Grounding, Oral communication,
Object detection, Benchmark testing, Data models
BibRef
Shah, N.A.[Nisarg A.],
VS, V.[Vibashan],
Patel, V.M.[Vishal M.],
LQMFormer: Language-Aware Query Mask Transformer for Referring Image
Segmentation,
CVPR24(12903-12913)
IEEE DOI
2410
Image segmentation, Visualization, Grounding,
Computational modeling, Benchmark testing, Transformers, Multimodal
BibRef
Wang, W.X.[Wen-Xuan],
Yue, T.T.[Tong-Tian],
Zhang, Y.[Yisi],
Guo, L.T.[Long-Teng],
He, X.J.[Xing-Jian],
Wang, X.L.[Xin-Long],
Liu, J.[Jing],
Unveiling Parts Beyond Objects: Towards Finer-Granularity Referring
Expression Segmentation,
CVPR24(12998-13008)
IEEE DOI Code:
WWW Link.
2410
Visualization, Image segmentation, Grounding, Natural languages,
Manuals, Benchmark testing
BibRef
Rasheed, H.[Hanoona],
Maaz, M.[Muhammad],
Shaji, S.[Sahal],
Shaker, A.[Abdelrahman],
Khan, S.[Salman],
Cholakkal, H.[Hisham],
Anwer, R.M.[Rao M.],
Xing, E.[Eric],
Yang, M.H.[Ming-Hsuan],
Khan, F.S.[Fahad S.],
GLaMM: Pixel Grounding Large Multimodal Model,
CVPR24(13009-13018)
IEEE DOI
2410
Image segmentation, Visualization, Protocols, Grounding,
Computational modeling, Pipelines, Natural languages,
VLM
BibRef
Zhang, Y.Q.[Yu-Qi],
Luo, H.[Han],
Lei, Y.J.[Yin-Jie],
Towards CLIP-Driven Language-Free 3D Visual Grounding via 2D-3D
Relational Enhancement and Consistency,
CVPR24(13063-13072)
IEEE DOI Code:
WWW Link.
2410
Training, Bridges, Visualization, Solid modeling, Grounding,
Annotations, Language-free training, 3D visual grounding, CLIP
BibRef
Zhang, C.[Chao],
Li, M.[Mohan],
Budvytis, I.[Ignas],
Liwicki, S.[Stephan],
DiaLoc: An Iterative Approach to Embodied Dialog Localization,
CVPR24(12585-12593)
IEEE DOI
2410
Location awareness, Visualization, Navigation, Fuses, Collaboration,
Data visualization, multimodal learning, embodied AI, localization,
dialog grounding
BibRef
Xiao, B.[Bin],
Wu, H.P.[Hai-Ping],
Xu, W.J.[Wei-Jian],
Dai, X.Y.[Xi-Yang],
Hu, H.D.[Hou-Dong],
Lu, Y.[Yumao],
Zeng, M.[Michael],
Liu, C.[Ce],
Yuan, L.[Lu],
Florence-2: Advancing a Unified Representation for a Variety of
Vision Tasks,
CVPR24(4818-4829)
IEEE DOI
2410
Visualization, Grounding, Annotations, Semantics, Transfer learning,
Object detection
BibRef
Qian, S.Y.[Sheng-Yi],
Chen, W.F.[Wei-Feng],
Bai, M.[Min],
Zhou, X.[Xiong],
Tu, Z.W.[Zhuo-Wen],
Li, L.E.[Li Erran],
AffordanceLLM: Grounding Affordance from Vision Language Models,
OpenSUN3D24(7587-7597)
IEEE DOI
2410
Training, Location awareness, Grounding, Shape, Affordances, Performance gain
BibRef
Di, S.Z.[Shang-Zhe],
Xie, W.[Weidi],
Grounded Question-Answering in Long Egocentric Videos,
CVPR24(12934-12943)
IEEE DOI
2410
Visualization, Grounding, Large language models, Pipelines,
Training data, Benchmark testing, Data models, egocentric vision,
video grounding
BibRef
Miyanishi, T.[Taiki],
Azuma, D.[Daichi],
Kurita, S.[Shuhei],
Kawanabe, M.[Motoaki],
Cross3DVG: Cross-Dataset 3D Visual Grounding on Different RGB-D Scans,
3DV24(717-727)
IEEE DOI Code:
WWW Link.
2408
Training, Visualization, Solid modeling, Grounding, Semantics,
Benchmark testing, 3D Visual Grounding
BibRef
Gong, R.[Ran],
Huang, J.Y.[Jiang-Yong],
Zhao, Y.Z.[Yi-Zhou],
Geng, H.R.[Hao-Ran],
Gao, X.F.[Xiao-Feng],
Wu, Q.Y.[Qing-Yang],
Ai, W.[Wensi],
Zhou, Z.H.[Zi-Heng],
Terzopoulos, D.[Demetri],
Zhu, S.C.[Song-Chun],
Jia, B.X.[Bao-Xiong],
Huang, S.Y.[Si-Yuan],
ARNOLD: A Benchmark for Language-Grounded Task Learning With
Continuous States in Realistic 3D Scenes,
ICCV23(20426-20438)
IEEE DOI Code:
WWW Link.
2401
BibRef
Wu, Y.[Yu],
Wei, Y.[Yana],
Wang, H.Z.[Hao-Zhe],
Liu, Y.F.[Yong-Fei],
Yang, S.[Sibei],
He, X.M.[Xu-Ming],
Grounded Image Text Matching with Mismatched Relation Reasoning,
ICCV23(2964-2975)
IEEE DOI Code:
WWW Link.
2401
BibRef
Song, C.H.[Chan Hee],
Sadler, B.M.[Brian M.],
Wu, J.[Jiaman],
Chao, W.L.[Wei-Lun],
Washington, C.[Clayton],
Su, Y.[Yu],
LLM-Planner: Few-Shot Grounded Planning for Embodied Agents with
Large Language Models,
ICCV23(2986-2997)
IEEE DOI
2401
BibRef
Lee, C.[Clarence],
Kumar, M.G.[M Ganesh],
Tan, C.[Cheston],
DetermiNet: A Large-Scale Diagnostic Dataset for Complex
Visually-Grounded Referencing using Determiners,
ICCV23(19962-19971)
IEEE DOI
2401
BibRef
Lin, K.Q.[Kevin Qinghong],
Zhang, P.[Pengchuan],
Chen, J.[Joya],
Pramanick, S.[Shraman],
Gao, D.F.[Di-Fei],
Wang, A.J.P.[Alex Jin-Peng],
Yan, R.[Rui],
Shou, M.Z.[Mike Zheng],
UniVTG: Towards Unified Video-Language Temporal Grounding,
ICCV23(2782-2792)
IEEE DOI Code:
WWW Link.
2401
BibRef
Liu, Y.[Yang],
Zhang, J.H.[Jia-Hua],
Chen, Q.C.[Qing-Chao],
Peng, Y.X.[Yu-Xin],
Confidence-aware Pseudo-label Learning for Weakly Supervised Visual
Grounding,
ICCV23(2816-2826)
IEEE DOI Code:
WWW Link.
2401
BibRef
Khoshsirat, S.[Seyedalireza],
Kambhamettu, C.[Chandra],
Sentence Attention Blocks for Answer Grounding,
ICCV23(6057-6067)
IEEE DOI
2401
BibRef
Li, H.X.[Hong-Xiang],
Cao, M.[Meng],
Cheng, X.[Xuxin],
Li, Y.[Yaowei],
Zhu, Z.H.[Zhi-Hong],
Zou, Y.X.[Yue-Xian],
G2L: Semantically Aligned and Uniform Video Grounding via Geodesic
and Game Theory,
ICCV23(11998-12008)
IEEE DOI
2401
BibRef
Li, H.[Hanjun],
Shu, X.J.[Xiu-Jun],
He, S.[Sunan],
Qiao, R.Z.[Rui-Zhi],
Wen, W.[Wei],
Guo, T.[Taian],
Gan, B.[Bei],
Sun, X.[Xing],
D3G: Exploring Gaussian Prior for Temporal Sentence Grounding with
Glance Annotation,
ICCV23(13688-13700)
IEEE DOI Code:
WWW Link.
2401
BibRef
Pan, Y.L.[Yu-Lin],
He, X.T.[Xiang-Teng],
Gong, B.[Biao],
Lv, Y.L.[Yi-Liang],
Shen, Y.J.[Yu-Jun],
Peng, Y.X.[Yu-Xin],
Zhao, D.L.[De-Li],
Scanning Only Once: An End-to-end Framework for Fast Temporal
Grounding in Long Videos,
ICCV23(13721-13731)
IEEE DOI Code:
WWW Link.
2401
BibRef
Jang, J.[Jinhyun],
Park, J.[Jungin],
Kim, J.[Jin],
Kwon, H.[Hyeongjun],
Sohn, K.H.[Kwang-Hoon],
Knowing Where to Focus: Event-aware Transformer for Video Grounding,
ICCV23(13800-13810)
IEEE DOI Code:
WWW Link.
2401
BibRef
Zhang, Y.M.[Yi-Ming],
Gong, Z.[ZeMing],
Chang, A.X.[Angel X.],
Multi3DRefer: Grounding Text Description to Multiple 3D Objects,
ICCV23(15179-15179)
IEEE DOI
2401
BibRef
Chen, C.[Chongyan],
Anjum, S.[Samreen],
Gurari, D.[Danna],
VQA Therapy: Exploring Answer Differences by Visually Grounding
Answers,
ICCV23(15269-15279)
IEEE DOI Code:
WWW Link.
2401
BibRef
Li, H.[Huan],
Wei, P.[Ping],
Ma, Z.[Zeyu],
Zheng, N.N.[Nan-Ning],
Inverse Compositional Learning for Weakly-supervised Relation
Grounding,
ICCV23(15431-15441)
IEEE DOI
2401
BibRef
Chen, D.Z.Y.[Dave Zhen-Yu],
Hu, R.H.[Rong-Hang],
Chen, X.L.[Xin-Lei],
Nießner, M.[Matthias],
Chang, A.X.[Angel X.],
UniT3D: A Unified Transformer for 3D Dense Captioning and Visual
Grounding,
ICCV23(18063-18073)
IEEE DOI
2401
BibRef
de la Jara, I.M.[Ignacio M.],
Rodriguez-Opazo, C.[Cristian],
Marrese-Taylor, E.[Edison],
Bravo-Marquez, F.[Felipe],
An empirical study of the effect of video encoders on Temporal Video
Grounding,
CLVL23(2842-2847)
IEEE DOI
2401
BibRef
Wang, Z.[Zehan],
Huang, H.F.[Hai-Feng],
Zhao, Y.[Yang],
Li, L.J.[Lin-Jun],
Cheng, X.[Xize],
Zhu, Y.C.[Yi-Chen],
Yin, A.[Aoxiong],
Zhao, Z.[Zhou],
Distilling Coarse-to-Fine Semantic Matching Knowledge for Weakly
Supervised 3D Visual Grounding,
ICCV23(2662-2671)
IEEE DOI
2401
BibRef
Guo, Z.[Zoey],
Tang, Y.W.[Yi-Wen],
Zhang, R.[Ray],
Wang, D.[Dong],
Wang, Z.G.[Zhi-Gang],
Zhao, B.[Bin],
Li, X.L.[Xue-Long],
ViewRefer: Grasp the Multi-view Knowledge for 3D Visual Grounding,
ICCV23(15326-15337)
IEEE DOI
2401
BibRef
Li, M.[Menghao],
Wang, C.L.[Chun-Lei],
Feng, W.[Wenquan],
Lyu, S.C.[Shu-Chang],
Cheng, G.L.[Guang-Liang],
Li, X.T.[Xiang-Tai],
Liu, B.[Binghao],
Zhao, Q.[Qi],
Iterative Robust Visual Grounding with Masked Reference based
Centerpoint Supervision,
VLAR23(4653-4658)
IEEE DOI Code:
WWW Link.
2401
BibRef
Hsu, J.[Joy],
Mao, J.Y.[Jia-Yuan],
Wu, J.J.[Jia-Jun],
NS3D: Neuro-Symbolic Grounding of 3D Objects and Relations,
CVPR23(2614-2623)
IEEE DOI
2309
BibRef
Yi, J.[Jingru],
Uzkent, B.[Burak],
Ignat, O.[Oana],
Li, Z.L.[Zi-Li],
Garg, A.[Amanmeet],
Yu, X.[Xiang],
Liu, L.[Linda],
Augment the Pairs: Semantics-Preserving Image-Caption Pair
Augmentation for Grounding-Based Vision and Language Models,
WACV24(5508-5518)
IEEE DOI Code:
WWW Link.
2404
Training, Representation learning, Measurement, Grounding,
Image color analysis, Semantics, Data augmentation, Algorithms, Robotics
BibRef
Uzkent, B.[Burak],
Garg, A.[Amanmeet],
Zhu, W.T.[Wen-Tao],
Doshi, K.[Keval],
Yi, J.[Jingru],
Wang, X.L.[Xiao-Long],
Omar, M.[Mohamed],
Dynamic Inference with Grounding Based Vision and Language Models,
CVPR23(2624-2633)
IEEE DOI
2309
BibRef
Cao, M.[Meng],
Wei, F.Y.[Fang-Yun],
Xu, C.[Can],
Geng, X.[Xiubo],
Chen, L.[Long],
Zhang, C.[Can],
Zou, Y.X.[Yue-Xian],
Shen, T.[Tao],
Jiang, D.X.[Da-Xin],
Iterative Proposal Refinement for Weakly-Supervised Video Grounding,
CVPR23(6524-6534)
IEEE DOI
2309
BibRef
Wang, L.[Lan],
Mittal, G.[Gaurav],
Sajeev, S.[Sandra],
Yu, Y.[Ye],
Hall, M.[Matthew],
Boddeti, V.N.[Vishnu Naresh],
Chen, M.[Mei],
ProTéGé: Untrimmed Pretraining for Video Temporal Grounding by Video
Temporal Grounding,
CVPR23(6575-6585)
IEEE DOI
2309
BibRef
Hwang, M.Y.[Min-Young],
Jeong, J.Y.[Jae-Yeon],
Kim, M.S.[Min-Soo],
Oh, Y.[Yoonseon],
Oh, S.H.[Song-Hwai],
Meta-Explore: Exploratory Hierarchical Vision-and-Language Navigation
Using Scene Object Spectrum Grounding,
CVPR23(6683-6693)
IEEE DOI
2309
BibRef
Chen, J.[Joya],
Gao, D.F.[Di-Fei],
Lin, K.Q.[Kevin Qinghong],
Shou, M.Z.[Mike Zheng],
Affordance Grounding from Demonstration Video to Target Image,
CVPR23(6799-6808)
IEEE DOI
2309
BibRef
Shaharabany, T.[Tal],
Wolf, L.B.[Lior B.],
Similarity Maps for Self-Training Weakly-Supervised Phrase Grounding,
CVPR23(6925-6934)
IEEE DOI
2309
BibRef
Su, W.[Wei],
Miao, P.[Peihan],
Dou, H.Z.[Huan-Zhang],
Wang, G.[Gaoang],
Qiao, L.[Liang],
Li, Z.[Zheyang],
Li, X.[Xi],
Language Adaptive Weight Generation for Multi-Task Visual Grounding,
CVPR23(10857-10866)
IEEE DOI
2309
BibRef
Li, G.[Gen],
Jampani, V.[Varun],
Sun, D.Q.[De-Qing],
Sevilla-Lara, L.[Laura],
LOCATE: Localize and Transfer Object Parts for Weakly Supervised
Affordance Grounding,
CVPR23(10922-10931)
IEEE DOI
2309
BibRef
Kim, S.[Siwon],
Oh, J.[Jinoh],
Lee, S.[Sungjin],
Yu, S.[Seunghak],
Do, J.[Jaeyoung],
Taghavi, T.[Tara],
Grounding Counterfactual Explanation of Image Classifiers to Textual
Concept Space,
CVPR23(10942-10950)
IEEE DOI
2309
BibRef
Zhang, Y.M.[Yi-Meng],
Chen, X.[Xin],
Jia, J.H.[Jing-Han],
Liu, S.[Sijia],
Ding, K.[Ke],
Text-Visual Prompting for Efficient 2D Temporal Video Grounding,
CVPR23(14794-14804)
IEEE DOI
2309
BibRef
Chen, Z.H.[Zhi-Hong],
Zhang, R.[Ruifei],
Song, Y.B.[Yi-Bing],
Wan, X.[Xiang],
Li, G.B.[Guan-Bin],
Advancing Visual Grounding with Scene Knowledge: Benchmark and Method,
CVPR23(15039-15049)
IEEE DOI
2309
BibRef
Huang, Y.F.[Yi-Fei],
Yang, L.[Lijin],
Sato, Y.[Yoichi],
Weakly Supervised Temporal Sentence Grounding with Uncertainty-Guided
Self-training,
CVPR23(18908-18918)
IEEE DOI
2309
BibRef
Tan, C.[Chaolei],
Lin, Z.H.[Zi-Hang],
Hu, J.F.[Jian-Fang],
Zheng, W.S.[Wei-Shi],
Lai, J.H.[Jian-Huang],
Hierarchical Semantic Correspondence Networks for Video Paragraph
Grounding,
CVPR23(18973-18982)
IEEE DOI
2309
BibRef
Yang, Z.Y.[Zi-Yan],
Kafle, K.[Kushal],
Dernoncourt, F.[Franck],
Ordonez, V.[Vicente],
Improving Visual Grounding by Encouraging Consistent Gradient-Based
Explanations,
CVPR23(19165-19174)
IEEE DOI
2309
BibRef
Wu, Y.M.[Yan-Min],
Cheng, X.H.[Xin-Hua],
Zhang, R.R.[Ren-Rui],
Cheng, Z.[Zesen],
Zhang, J.[Jian],
EDA: Explicit Text-Decoupling and Dense Alignment for 3D Visual
Grounding,
CVPR23(19231-19242)
IEEE DOI
2309
BibRef
Li, M.Z.[Meng-Ze],
Wang, H.[Han],
Zhang, W.Q.[Wen-Qiao],
Miao, J.X.[Jia-Xu],
Zhao, Z.[Zhou],
Zhang, S.Y.[Sheng-Yu],
Ji, W.[Wei],
Wu, F.[Fei],
WINNER: Weakly-supervised hIerarchical decompositioN and aligNment
for spatio-tEmporal video gRounding,
CVPR23(23090-23099)
IEEE DOI
2309
BibRef
Lin, Z.H.[Zi-Hang],
Tan, C.L.[Chao-Lei],
Hu, J.F.[Jian-Fang],
Jin, Z.[Zhi],
Ye, T.[Tiancai],
Zheng, W.S.[Wei-Shi],
Collaborative Static and Dynamic Vision-Language Streams for
Spatio-Temporal Video Grounding,
CVPR23(23100-23109)
IEEE DOI
2309
BibRef
Yang, L.[Lijin],
Kong, Q.[Quan],
Yang, H.K.[Hsuan-Kung],
Kehl, W.[Wadim],
Sato, Y.[Yoichi],
Kobori, N.[Norimasa],
DeCo: Decomposition and Reconstruction for Compositional Temporal
Grounding via Coarse-to-Fine Contrastive Ranking,
CVPR23(23130-23140)
IEEE DOI
2309
BibRef
Zhou, L.[Li],
Zhou, Z.[Zikun],
Mao, K.[Kaige],
He, Z.Y.[Zhen-Yu],
Joint Visual Grounding and Tracking with Natural Language
Specification,
CVPR23(23151-23160)
IEEE DOI
2309
BibRef
Devaraj, C.[Chinmaya],
Fermüller, C.[Cornelia],
Aloimonos, Y.[Yiannis],
Incorporating Visual Grounding In GCN For Zero-shot Learning Of Human
Object Interaction Actions,
L3D-IVU23(5008-5017)
IEEE DOI
2309
BibRef
Fang, X.[Xiang],
Liu, D.Z.[Dai-Zong],
Zhou, P.[Pan],
Nan, G.S.[Guo-Shun],
You Can Ground Earlier than See: An Effective and Efficient Pipeline
for Temporal Sentence Grounding in Compressed Videos,
CVPR23(2448-2460)
IEEE DOI
2309
BibRef
Fu, T.J.[Tsu-Jui],
Li, L.J.[Lin-Jie],
Gan, Z.[Zhe],
Lin, K.[Kevin],
Wang, W.Y.[William Yang],
Wang, L.J.[Li-Juan],
Liu, Z.C.[Zi-Cheng],
An Empirical Study of End-to-End Video-Language Transformers with
Masked Visual Modeling,
CVPR23(22898-22909)
IEEE DOI
2309
BibRef
Li, L.J.[Lin-Jie],
Gan, Z.[Zhe],
Lin, K.[Kevin],
Lin, C.C.[Chung-Ching],
Liu, Z.C.[Zi-Cheng],
Liu, C.[Ce],
Wang, L.J.[Li-Juan],
LAVENDER: Unifying Video-Language Understanding as Masked Language
Modeling,
CVPR23(23119-23129)
IEEE DOI
2309
BibRef
Dong, J.X.[Jian-Xiang],
Yin, Z.Z.[Zhao-Zheng],
Boundary-aware Temporal Sentence Grounding with Adaptive Proposal
Refinement,
ACCV22(IV:641-657).
Springer DOI
2307
BibRef
Gao, Y.Z.[Yi-Zhao],
Lu, Z.W.[Zhi-Wu],
SST-VLM: Sparse Sampling-twice Inspired Video-language Model,
ACCV22(IV:537-553).
Springer DOI
2307
BibRef
Pacheco-Ortega, A.[Abel],
Mayol-Cuervas, W.[Walterio],
One-shot Learning for Human Affordance Detection,
CVMeta22(758-766).
Springer DOI
2304
BibRef
Ho, C.H.[Chih-Hui],
Appalaraju, S.[Srikar],
Jasani, B.[Bhavan],
Manmatha, R.,
Vasconcelos, N.M.[Nuno M.],
YORO - Lightweight End to End Visual Grounding,
CMMP22(3-23).
Springer DOI
2304
BibRef
Kim, D.[Dahye],
Park, J.[Jungin],
Lee, J.Y.[Ji-Young],
Park, S.[Seongheon],
Sohn, K.H.[Kwang-Hoon],
Language-free Training for Zero-shot Video Grounding,
WACV23(2538-2547)
IEEE DOI
2302
Training, Visualization, Grounding, Annotations, Natural languages, Standards
BibRef
Le, T.M.[Thao Minh],
Le, V.[Vuong],
Gupta, S.I.[Sun-Il],
Venkatesh, S.[Svetha],
Tran, T.[Truyen],
Guiding Visual Question Answering with Attention Priors,
WACV23(4370-4379)
IEEE DOI
2302
Training, Visualization, Systematics, Grounding, Semantics,
Linguistics, Cognition, visual reasoning
BibRef
Chou, S.H.[Shih-Han],
Fan, Z.C.[Zi-Cong],
Little, J.J.[James J.],
Sigal, L.[Leonid],
Semi-Supervised Grounding Alignment for Multi-Modal Feature Learning,
CRV22(48-57)
IEEE DOI
2301
Representation learning, Training, Visualization, Grounding,
Annotations, Benchmark testing, grounding, VCR
BibRef
Gupta, K.[Kshitij],
Gautam, D.[Devansh],
Mamidi, R.[Radhika],
cViL: Cross-Lingual Training of Vision-Language Models using
Knowledge Distillation,
ICPR22(1734-1741)
IEEE DOI
2212
Training, Visualization, Analytical models, Pipelines, Transformers,
Question answering (information retrieval), Data models
BibRef
Chen, D.Z.Y.[Dave Zhen-Yu],
Wu, Q.R.[Qi-Rui],
Nießner, M.[Matthias],
Chang, A.X.[Angel X.],
D3Net: A Unified Speaker-Listener Architecture for
3D Dense Captioning and Visual Grounding,
ECCV22(XXXII:487-505).
Springer DOI
2211
BibRef
Parcalabescu, L.,
Frank, A.,
Exploring Phrase Grounding without Training: Contextualisation and
Extension to Text-Based Image Retrieval,
MULWS20(4137-4146)
IEEE DOI
2008
Grounding, Visualization, Detectors, Task analysis, Linguistics,
Proposals, Training
BibRef
Tung, H.,
Harley, A.W.,
Huang, L.,
Fragkiadaki, K.,
Reward Learning from Narrated Demonstrations,
CVPR18(7004-7013)
IEEE DOI
1812
Visualization, Natural languages, Detectors, Grounding,
Speech recognition, Microphones
BibRef
Cohen, N.[Niv],
Gal, R.[Rinon],
Meirom, E.A.[Eli A.],
Chechik, G.[Gal],
Atzmon, Y.[Yuval],
'This Is My Unicorn, Fluffy':
Personalizing Frozen Vision-Language Representations,
ECCV22(XX:558-577).
Springer DOI
2211
BibRef
Lee, J.H.[Ju-Hee],
Kang, J.W.[Je-Won],
Relation Enhanced Vision Language Pre-Training,
ICIP22(2286-2290)
IEEE DOI
2211
Visualization, Semantics, Force, Transformers, Task analysis,
vision-language pre-training
BibRef
Khan, Z.[Zaid],
Kumar, B.G.V.[B. G. Vijay],
Yu, X.[Xiang],
Schulter, S.[Samuel],
Chandraker, M.[Manmohan],
Fu, Y.[Yun],
Single-Stream Multi-level Alignment for Vision-Language Pretraining,
ECCV22(XXXVI:735-751).
Springer DOI
2211
BibRef
Wang, R.[Renhao],
Zhao, H.[Hang],
Gao, Y.[Yang],
CYBORGS: Contrastively Bootstrapping Object Representations by
Grounding in Segmentation,
ECCV22(XXXI:260-277).
Springer DOI
2211
BibRef
Yang, Z.Y.[Zheng-Yuan],
Gan, Z.[Zhe],
Wang, J.F.[Jian-Feng],
Hu, X.W.[Xiao-Wei],
Ahmed, F.[Faisal],
Liu, Z.C.[Zi-Cheng],
Lu, Y.[Yumao],
Wang, L.J.[Li-Juan],
UniTAB: Unifying Text and Box Outputs for Grounded Vision-Language
Modeling,
ECCV22(XXXVI:521-539).
Springer DOI
2211
BibRef
Li, H.[Huan],
Wei, P.[Ping],
Li, J.P.[Jia-Peng],
Ma, Z.[Zeyu],
Shang, J.[Jiahui],
Zheng, N.N.[Nan-Ning],
Asymmetric Relation Consistency Reasoning for Video Relation Grounding,
ECCV22(XXXV:125-141).
Springer DOI
2211
BibRef
Dvornik, N.[Nikita],
Hadji, I.[Isma],
Pham, H.[Hai],
Bhatt, D.[Dhaivat],
Martinez, B.[Brais],
Fazly, A.[Afsaneh],
Jepson, A.D.[Allan D.],
Flow Graph to Video Grounding for Weakly-Supervised Multi-step
Localization,
ECCV22(XXXV:319-335).
Springer DOI
2211
BibRef
Qu, M.X.[Meng-Xue],
Wu, Y.[Yu],
Liu, W.[Wu],
Gong, Q.Q.[Qi-Qi],
Liang, X.D.[Xiao-Dan],
Russakovsky, O.[Olga],
Zhao, Y.[Yao],
Wei, Y.C.[Yun-Chao],
SiRi: A Simple Selective Retraining Mechanism for Transformer-Based
Visual Grounding,
ECCV22(XXXV:546-562).
Springer DOI
2211
BibRef
Zhu, C.Y.[Chao-Yang],
Zhou, Y.[Yiyi],
Shen, Y.H.[Yun-Hang],
Luo, G.[Gen],
Pan, X.J.[Xing-Jia],
Lin, M.B.[Ming-Bao],
Chen, C.[Chao],
Cao, L.J.[Liu-Juan],
Sun, X.S.[Xiao-Shuai],
Ji, R.R.[Rong-Rong],
SeqTR: A Simple Yet Universal Network for Visual Grounding,
ECCV22(XXXV:598-615).
Springer DOI
2211
BibRef
Khan, A.U.[Aisha Urooj],
Kuehne, H.[Hilde],
Gan, C.[Chuang],
da Vitoria Lobo, N.[Niels],
Shah, M.[Mubarak],
Weakly Supervised Grounding for VQA in Vision-Language Transformers,
ECCV22(XXXV:652-670).
Springer DOI
2211
BibRef
Hao, J.[Jiachang],
Sun, H.F.[Hai-Feng],
Ren, P.F.[Peng-Fei],
Wang, J.Y.[Jing-Yu],
Qi, Q.[Qi],
Liao, J.X.[Jian-Xin],
Can Shuffling Video Benefit Temporal Bias Problem: A Novel Training
Framework for Temporal Grounding,
ECCV22(XXXVI:130-147).
Springer DOI
2211
BibRef
Jain, A.[Ayush],
Gkanatsios, N.[Nikolaos],
Mediratta, I.[Ishita],
Fragkiadaki, K.[Katerina],
Bottom Up Top Down Detection Transformers for Language Grounding in
Images and Point Clouds,
ECCV22(XXXVI:417-433).
Springer DOI
2211
BibRef
Heisler, M.[Morgan],
Banitalebi-Dehkordi, A.[Amin],
Zhang, Y.[Yong],
SemAug: Semantically Meaningful Image Augmentations for Object
Detection Through Language Grounding,
ECCV22(XXXVI:610-626).
Springer DOI
2211
BibRef
Min, S.[Seonwoo],
Park, N.[Nokyung],
Kim, S.[Siwon],
Park, S.H.[Seung-Hyun],
Kim, J.[Jinkyu],
Grounding Visual Representations with Texts for Domain Generalization,
ECCV22(XXXVII:37-53).
Springer DOI
2211
BibRef
Wang, J.[Jia],
Wu, H.Y.[Hung-Yi],
Chen, J.C.[Jun-Cheng],
Shuai, H.H.[Hong-Han],
Cheng, W.H.[Wen-Huang],
Residual Graph Attention Network and Expression-Respect Data
Augmentation Aided Visual Grounding,
ICIP22(326-330)
IEEE DOI
2211
Visualization, Grounding, Training data, Cognition, Data models,
Complexity theory, Residual graph attention network, Visual grounding
BibRef
Xiong, Z.[Zeyu],
Liu, D.[Daizong],
Zhou, P.[Pan],
Gaussian Kernel-Based Cross Modal Network for Spatio-Temporal Video
Grounding,
ICIP22(2481-2485)
IEEE DOI
2211
Heating systems, Grounding, Natural languages, Electron tubes,
Task analysis, anchor-free, Gaussian kernel, spatial-temporal video grounding
BibRef
Alaniz, S.[Stephan],
Federici, M.[Marco],
Akata, Z.[Zeynep],
Compositional Mixture Representations for Vision and Text,
L3D-IVU22(4201-4210)
IEEE DOI
2210
Representation learning, Visualization, Computational modeling,
Semantics, Image retrieval, Employment, Object detection
BibRef
Cho, J.[Junhyeong],
Yoon, Y.[Youngseok],
Kwak, S.[Suha],
Collaborative Transformers for Grounded Situation Recognition,
CVPR22(19627-19636)
IEEE DOI
2210
Measurement, Training, Visualization, Computational modeling,
Estimation, Collaboration, Predictive models,
Visual reasoning
BibRef
Singh, A.[Amanpreet],
Hu, R.H.[Rong-Hang],
Goswami, V.[Vedanuj],
Couairon, G.[Guillaume],
Galuba, W.[Wojciech],
Rohrbach, M.[Marcus],
Kiela, D.[Douwe],
FLAVA: A Foundational Language And Vision Alignment Model,
CVPR22(15617-15629)
IEEE DOI
2210
Analytical models, Computational modeling,
Task analysis, Vision+language
BibRef
Saini, N.[Nirat],
Pham, K.[Khoi],
Shrivastava, A.[Abhinav],
Disentangling Visual Embeddings for Attributes and Objects,
CVPR22(13648-13657)
IEEE DOI
2210
WWW Link.
Visualization, Codes, Benchmark testing,
Linguistics, Feature extraction, Recognition: detection, Visual reasoning
BibRef
Ge, Y.Y.[Yu-Ying],
Ge, Y.X.[Yi-Xiao],
Liu, X.H.[Xi-Hui],
Wang, J.P.[Jin-Peng],
Wu, J.P.[Jian-Ping],
Shan, Y.[Ying],
Qie, X.[Xiaohu],
Luo, P.[Ping],
MILES: Visual BERT Pre-training with Injected Language Semantics for
Video-Text Retrieval,
ECCV22(XXXV:691-708).
Springer DOI
2211
BibRef
Wang, A.J.P.[Alex Jin-Peng],
Ge, Y.X.[Yi-Xiao],
Cai, G.[Guanyu],
Yan, R.[Rui],
Lin, X.D.[Xu-Dong],
Shan, Y.[Ying],
Qie, X.[Xiaohu],
Shou, M.Z.[Mike Zheng],
Object-aware Video-language Pre-training for Retrieval,
CVPR22(3303-3312)
IEEE DOI
2210
Training, Visualization, Machine vision, Semantics,
Detectors, Transformers, retrieval
BibRef
Li, D.X.[Dong-Xu],
Li, J.N.[Jun-Nan],
Li, H.D.[Hong-Dong],
Niebles, J.C.[Juan Carlos],
Hoi, S.C.H.[Steven C.H.],
Align and Prompt: Video-and-Language Pre-training with Entity Prompts,
CVPR22(4943-4953)
IEEE DOI
2210
Representation learning, Vocabulary, Visualization, Semantics,
Detectors, Transformers, Vision + language,
Video analysis and understanding
BibRef
Xue, H.W.[Hong-Wei],
Hang, T.[Tiankai],
Zeng, Y.H.[Yan-Hong],
Sun, Y.C.[Yu-Chong],
Liu, B.[Bei],
Yang, H.[Huan],
Fu, J.L.[Jian-Long],
Guo, B.N.[Bai-Ning],
Advancing High-Resolution Video-Language Representation with
Large-Scale Video Transcriptions,
CVPR22(5026-5035)
IEEE DOI
2210
Visualization, Video on demand, Computational modeling,
Superresolution, Semantics, Transformers, Feature extraction,
Self- semi- meta- unsupervised learning
BibRef
Sammani, F.[Fawaz],
Mukherjee, T.[Tanmoy],
Deligiannis, N.[Nikos],
NLX-GPT: A Model for Natural Language Explanations in Vision and
Vision-Language Tasks,
CVPR22(8312-8322)
IEEE DOI
2210
Current measurement, Computational modeling, Natural languages,
Decision making, Memory management, Predictive models,
Vision + language
BibRef
Lin, B.Q.[Bing-Qian],
Zhu, Y.[Yi],
Chen, Z.C.[Zi-Cong],
Liang, X.[Xiwen],
Liu, J.Z.[Jian-Zhuang],
Liang, X.D.[Xiao-Dan],
ADAPT: Vision-Language Navigation with Modality-Aligned Action
Prompts,
CVPR22(15375-15385)
IEEE DOI
2210
Visualization, Adaptation models, Navigation, Transformers,
Nonhomogeneous media, Vision + language
BibRef
Dou, Z.Y.[Zi-Yi],
Xu, Y.C.[Yi-Chong],
Gan, Z.[Zhe],
Wang, J.F.[Jian-Feng],
Wang, S.H.[Shuo-Hang],
Wang, L.J.[Li-Juan],
Zhu, C.G.[Chen-Guang],
Zhang, P.C.[Peng-Chuan],
Yuan, L.[Lu],
Peng, N.[Nanyun],
Liu, Z.C.[Zi-Cheng],
Zeng, M.[Michael],
An Empirical Study of Training End-to-End Vision-and-Language
Transformers,
CVPR22(18145-18155)
IEEE DOI
2210
Meters, Training, Codes, Computational modeling, Transformers,
Vision + language, Machine learning
BibRef
Xu, Z.P.[Zi-Peng],
Lin, T.W.[Tian-Wei],
Tang, H.[Hao],
Li, F.[Fu],
He, D.L.[Dong-Liang],
Sebe, N.[Nicu],
Timofte, R.[Radu],
Van Gool, L.J.[Luc J.],
Ding, E.[Errui],
Predict, Prevent, and Evaluate: Disentangled Text-Driven Image
Manipulation Empowered by Pre-Trained Vision-Language Model,
CVPR22(18208-18217)
IEEE DOI
2210
Personal protective equipment, Measurement, Training, Annotations,
Face recognition, Computational modeling, Face and gestures
BibRef
Du, Y.[Yu],
Wei, F.Y.[Fang-Yun],
Zhang, Z.H.[Zi-He],
Shi, M.J.[Miao-Jing],
Gao, Y.[Yue],
Li, G.Q.[Guo-Qi],
Learning to Prompt for Open-Vocabulary Object Detection with
Vision-Language Model,
CVPR22(14064-14073)
IEEE DOI
2210
Training, Representation learning, Visualization,
Transfer learning, Object detection, Detectors,
Self- semi- meta- unsupervised learning
BibRef
Chang, Y.S.[Ying-Shan],
Cao, G.H.[Gui-Hong],
Narang, M.[Mridu],
Gao, J.F.[Jian-Feng],
Suzuki, H.[Hisami],
Bisk, Y.[Yonatan],
WebQA: Multihop and Multimodal QA,
CVPR22(16474-16483)
IEEE DOI
2210
Knowledge engineering, Representation learning, Visualization,
Transformers, Cognition,
Visual reasoning
BibRef
Zellers, R.[Rowan],
Lu, J.[Jiasen],
Lu, X.[Ximing],
Yu, Y.[Youngjae],
Zhao, Y.P.[Yan-Peng],
Salehi, M.[Mohammadreza],
Kusupati, A.[Aditya],
Hessel, J.[Jack],
Farhadi, A.[Ali],
Choi, Y.[Yejin],
MERLOT RESERVE:
Neural Script Knowledge through Vision and Language and Sound,
CVPR22(16354-16366)
IEEE DOI
2210
Training, Representation learning, Visualization, Ethics,
Video on demand, Navigation, Stars, Vision + language, Visual reasoning
BibRef
Gupta, T.[Tanmay],
Kamath, A.[Amita],
Kembhavi, A.[Aniruddha],
Hoiem, D.[Derek],
Towards General Purpose Vision Systems:
An End-to-End Task-Agnostic Vision-Language Architecture,
CVPR22(16378-16388)
IEEE DOI
2210
Training, Visualization, Machine vision,
Object detection, Network architecture, Vision + language
BibRef
Surís, D.[Dídac],
Epstein, D.[Dave],
Vondrick, C.[Carl],
Globetrotter: Connecting Languages by Connecting Images,
CVPR22(16453-16463)
IEEE DOI
2210
Training, Deep learning, Visualization, Image segmentation, Codes,
Computational modeling, Vision + language
BibRef
Zhu, H.D.[Hai-Dong],
Sadhu, A.[Arka],
Zheng, Z.H.[Zhao-Heng],
Nevatia, R.[Ram],
Utilizing Every Image Object for Semi-supervised Phrase Grounding,
WACV21(2209-2218)
IEEE DOI
2106
Localize an object in the image given a referring expression.
Training, Grounding, Annotations,
Detectors, Task analysis
BibRef
Sung, Y.L.[Yi-Lin],
Cho, J.[Jaemin],
Bansal, M.[Mohit],
VL-ADAPTER: Parameter-Efficient Transfer Learning for
Vision-and-Language Tasks,
CVPR22(5217-5227)
IEEE DOI
2210
Training, Adaptation models, Computational modeling,
Transfer learning, Benchmark testing, Multitasking, Vision + language
BibRef
Wu, D.M.[Dong-Ming],
Dong, X.P.[Xing-Ping],
Shao, L.[Ling],
Shen, J.B.[Jian-Bing],
Multi-Level Representation Learning with Semantic Alignment for
Referring Video Object Segmentation,
CVPR22(4986-4995)
IEEE DOI
2210
Representation learning, Visualization, Adaptation models, Shape,
Grounding, Semantics, Vision + language, Segmentation,
grouping and shape analysis
BibRef
Gao, K.[Kaifeng],
Chen, L.[Long],
Niu, Y.[Yulei],
Shao, J.[Jian],
Xiao, J.[Jun],
Classification-Then-Grounding: Reformulating Video Scene Graphs as
Temporal Bipartite Graphs,
CVPR22(19475-19484)
IEEE DOI
2210
Image analysis, Codes, Grounding, Semantics, Bipartite graph,
Scene analysis and understanding,
Vision + language
BibRef
Kesen, I.[Ilker],
Can, O.A.[Ozan Arkan],
Erdem, E.[Erkut],
Erdem, A.[Aykut],
Yüret, D.[Deniz],
Modulating Bottom-Up and Top-Down Visual Processing via
Language-Conditional Filters,
MULA22(4609-4619)
IEEE DOI
2210
Visualization, Image segmentation, Image color analysis, Grounding,
Computational modeling, Process control, Predictive models
BibRef
Nebbia, G.[Giacomo],
Kovashka, A.[Adriana],
Doubling down: sparse grounding with an additional, almost-matching
caption for detection-oriented multimodal pretraining,
MULA22(4641-4650)
IEEE DOI
2210
Deep learning, Visualization, Grounding,
Computational modeling, Data models
BibRef
Ye, J.[Jiabo],
Tian, J.F.[Jun-Feng],
Yan, M.[Ming],
Yang, X.S.[Xiao-Shan],
Wang, X.[Xuwu],
Zhang, J.[Ji],
He, L.[Liang],
Lin, X.[Xin],
Shifting More Attention to Visual Backbone: Query-modulated
Refinement Networks for End-to-End Visual Grounding,
CVPR22(15481-15491)
IEEE DOI
2210
Training, Visualization, Grounding, Refining, Natural languages,
Feature extraction, Vision + language, Visual reasoning
BibRef
Jiang, H.J.[Hao-Jun],
Lin, Y.Z.[Yuan-Ze],
Han, D.C.[Dong-Chen],
Song, S.[Shiji],
Huang, G.[Gao],
Pseudo-Q: Generating Pseudo Language Queries for Visual Grounding,
CVPR22(15492-15502)
IEEE DOI
2210
Training, Visualization, Costs, Grounding, Annotations,
Computational modeling, Natural languages, Vision + language, Visual reasoning
BibRef
Huang, S.[Shijia],
Chen, Y.L.[Yi-Lun],
Jia, J.Y.[Jia-Ya],
Wang, L.W.[Li-Wei],
Multi-View Transformer for 3D Visual Grounding,
CVPR22(15503-15512)
IEEE DOI
2210
Point cloud compression, Visualization, Solid modeling, Grounding,
Natural languages, Vision + language
BibRef
Chen, S.[Sijia],
Li, B.[Baochun],
Multi-Modal Dynamic Graph Transformer for Visual Grounding,
CVPR22(15513-15522)
IEEE DOI
2210
Visualization, Image analysis, Grounding, Computational modeling,
Semantics, Natural languages, Vision + language,
Scene analysis and understanding
BibRef
Mavroudi, E.[Effrosyni],
Vidal, R.[René],
Weakly-Supervised Generation and Grounding of Visual Descriptions
with Conditional Generative Models,
CVPR22(15523-15533)
IEEE DOI
2210
Visualization, Grounding, Video description,
Computational modeling, Random variables,
Video analysis and understanding
BibRef
Chen, S.[Shi],
Zhao, Q.[Qi],
REX: Reasoning-aware and Grounded Explanation,
CVPR22(15565-15574)
IEEE DOI
2210
Visualization, Codes, Grounding, Transfer learning, Decision making,
Multitasking, Vision + language, Visual reasoning
BibRef
Lou, C.[Chao],
Han, W.J.[Wen-Juan],
Lin, Y.[Yuhuan],
Zheng, Z.L.[Zi-Long],
Unsupervised Vision-Language Parsing: Seamlessly Bridging Visual
Scene Graphs with Language Structures via Dependency Relationships,
CVPR22(15586-15595)
IEEE DOI
2210
Visualization, Grounding, Buildings, Benchmark testing, Linguistics,
Vision + language, Explainable computer vision
BibRef
Luo, J.Y.[Jun-Yu],
Fu, J.[Jiahui],
Kong, X.[Xianghao],
Gao, C.[Chen],
Ren, H.B.[Hai-Bing],
Shen, H.[Hao],
Xia, H.X.[Hua-Xia],
Liu, S.[Si],
3D-SPS: Single-Stage 3D Visual Grounding via Referred Point
Progressive Selection,
CVPR22(16433-16442)
IEEE DOI
2210
Point cloud compression, Visualization, Solid modeling, Grounding,
Detectors, Vision + language
BibRef
Cai, D.[Daigang],
Zhao, L.C.[Li-Chen],
Zhang, J.[Jing],
Sheng, L.[Lu],
Xu, D.[Dong],
3DJCG: A Unified Framework for Joint Dense Captioning and Visual
Grounding on 3D Point Clouds,
CVPR22(16443-16452)
IEEE DOI
2210
Training, Point cloud compression, Visualization, Grounding,
Performance gain, retrieval, categorization, Vision + language,
Recognition: detection
BibRef
Luo, H.C.[Hong-Chen],
Zhai, W.[Wei],
Zhang, J.[Jing],
Cao, Y.[Yang],
Tao, D.C.[Da-Cheng],
Learning Affordance Grounding from Exocentric Images,
CVPR22(2242-2251)
IEEE DOI
2210
Analytical models, Visualization, Grounding, Affordances,
Computational modeling, Transforms, Feature extraction,
Scene analysis and understanding
BibRef
Jiang, X.[Xun],
Xu, X.[Xing],
Zhang, J.[Jingran],
Shen, F.M.[Fu-Min],
Cao, Z.[Zuo],
Shen, H.T.[Heng Tao],
Semi-supervised Video Paragraph Grounding with Contrastive Encoder,
CVPR22(2456-2465)
IEEE DOI
2210
Training, Grounding, Annotations, Training data,
Semisupervised learning, Transformers, Data models,
Vision + language
BibRef
Yu, W.[Wei],
Chen, W.X.[Wen-Xin],
Yin, S.[Songheng],
Easterbrook, S.[Steve],
Garg, A.[Animesh],
Modular Action Concept Grounding in Semantic Video Prediction,
CVPR22(3595-3604)
IEEE DOI
2210
Adaptation models, Visualization, Inverse problems, Grounding,
Semantics, Object detection, Predictive models,
Vision + language
BibRef
Soldan, M.[Mattia],
Pardo, A.[Alejandro],
Alcázar, J.L.[Juan León],
Heilbron, F.C.[Fabian Caba],
Zhao, C.[Chen],
Giancola, S.[Silvio],
Ghanem, B.[Bernard],
MAD: A Scalable Dataset for Language Grounding in Videos from Movie
Audio Descriptions,
CVPR22(5016-5025)
IEEE DOI
2210
Grounding, Annotations, Pipelines, Natural languages,
Machine learning, Benchmark testing, Vision + language,
Video analysis and understanding
BibRef
Yang, L.[Li],
Xu, Y.[Yan],
Yuan, C.F.[Chun-Feng],
Liu, W.[Wei],
Li, B.[Bing],
Hu, W.M.[Wei-Ming],
Improving Visual Grounding with Visual-Linguistic Verification and
Iterative Reasoning,
CVPR22(9489-9498)
IEEE DOI
2210
Location awareness, Visualization, Grounding, Natural languages,
Object detection, Transformers, Cognition, Recognition: detection, retrieval
BibRef
Li, L.H.[Liunian Harold],
Zhang, P.C.[Peng-Chuan],
Zhang, H.T.[Hao-Tian],
Yang, J.W.[Jian-Wei],
Li, C.Y.[Chun-Yuan],
Zhong, Y.[Yiwu],
Wang, L.J.[Li-Juan],
Yuan, L.[Lu],
Zhang, L.[Lei],
Hwang, J.N.[Jenq-Neng],
Chang, K.W.[Kai-Wei],
Gao, J.F.[Jian-Feng],
Grounded Language-Image Pre-training,
CVPR22(10955-10965)
IEEE DOI
2210
Visualization, Image recognition, Head, Grounding, Object detection,
Data models, Deep learning architectures and techniques,
Vision + language
BibRef
Li, Y.C.[Yi-Cong],
Wang, X.[Xiang],
Xiao, J.B.[Jun-Bin],
Ji, W.[Wei],
Chua, T.S.[Tat-Seng],
Invariant Grounding for Video Question Answering,
CVPR22(2918-2927)
IEEE DOI
2210
Visualization, Correlation, Grounding, Semantics, Predictive models,
Linguistics, Question answering (information retrieval),
Vision + language
BibRef
Yang, Z.Y.[Zheng-Yuan],
Zhang, S.Y.[Song-Yang],
Wang, L.W.[Li-Wei],
Luo, J.B.[Jie-Bo],
SAT: 2D Semantics Assisted Training for 3D Visual Grounding,
ICCV21(1836-1846)
IEEE DOI
2203
Training, Point cloud compression, Representation learning,
Visualization, Grounding, Semantics, Vision + language,
BibRef
Chen, J.W.[Jun-Wen],
Kong, Y.[Yu],
Explainable Video Entailment with Grounded Visual Evidence,
ICCV21(2001-2010)
IEEE DOI
2203
Training, Visualization, Grounding, Computational modeling,
Decision making, Focusing, Vision + language, Video analysis and understanding
BibRef
Zhao, L.C.[Li-Chen],
Cai, D.[Daigang],
Sheng, L.[Lu],
Xu, D.[Dong],
3DVG-Transformer: Relation Modeling for Visual Grounding on Point
Clouds,
ICCV21(2908-2917)
IEEE DOI
2203
Point cloud compression, Multiplexing, Visualization,
Solid modeling, Grounding, Transformers,
Vision + language
BibRef
Feng, M.[Mingtao],
Li, Z.[Zhen],
Li, Q.[Qi],
Zhang, L.[Liang],
Zhang, X.[XiangDong],
Zhu, G.M.[Guang-Ming],
Zhang, H.[Hui],
Wang, Y.[Yaonan],
Mian, A.[Ajmal],
Free-form Description Guided 3D Visual Graph Network for Object
Grounding in Point Cloud,
ICCV21(3702-3711)
IEEE DOI
2203
Point cloud compression, Visualization, Correlation, Grounding,
Natural languages, Detection and localization in 2D and 3D,
Visual reasoning and logical representation
BibRef
Ding, X.P.[Xin-Peng],
Wang, N.N.[Nan-Nan],
Zhang, S.W.[Shi-Wei],
Cheng, D.[De],
Li, X.M.[Xiao-Meng],
Huang, Z.Y.[Zi-Yuan],
Tang, M.Q.[Ming-Qian],
Gao, X.B.[Xin-Bo],
Support-Set Based Cross-Supervision for Video Grounding,
ICCV21(11553-11562)
IEEE DOI
2203
Training, Visualization, Costs, Correlation, Grounding, Semantics,
Image and video retrieval, Vision + language
BibRef
Khandelwal, S.[Siddhesh],
Suhail, M.[Mohammed],
Sigal, L.[Leonid],
Segmentation-grounded Scene Graph Generation,
ICCV21(15859-15869)
IEEE DOI
2203
Image segmentation, Visualization, Grounding, Annotations, Genomics,
Scene analysis and understanding,
Transfer/Low-shot/Semi/Unsupervised Learning
BibRef
Patel, S.[Shivansh],
Wani, S.[Saim],
Jain, U.[Unnat],
Schwing, A.[Alexander],
Lazebnik, S.[Svetlana],
Savva, M.[Manolis],
Chang, A.X.[Angel X.],
Interpretation of Emergent Communication in Heterogeneous
Collaborative Embodied Agents,
ICCV21(15993-15943)
IEEE DOI
2203
Systematics, Navigation, Grounding, Collaboration, Task analysis,
Vision for robotics and autonomous vehicles, Explainable AI,
Visual reasoning and logical representation
BibRef
Shi, J.[Jing],
Zhong, Y.[Yiwu],
Xu, N.[Ning],
Li, Y.[Yin],
Xu, C.L.[Chen-Liang],
A Simple Baseline for Weakly-Supervised Scene Graph Generation,
ICCV21(16373-16382)
IEEE DOI
2203
Visualization, Grounding, Computational modeling, Pipelines,
Genomics, Complexity theory, Scene analysis and understanding, Vision + language
BibRef
Su, R.[Rui],
Yu, Q.[Qian],
Xu, D.[Dong],
STVGBert: A Visual-linguistic Transformer based Framework for
Spatio-temporal Video Grounding,
ICCV21(1513-1522)
IEEE DOI
2203
Representation learning, Visualization, Grounding, Detectors,
Benchmark testing, Transformers, Electron tubes,
Vision + language, Video analysis and understanding
BibRef
Cui, C.Y.Q.[Claire Yu-Qing],
Khandelwal, A.[Apoorv],
Artzi, Y.[Yoav],
Snavely, N.[Noah],
Averbuch-Elor, H.[Hadar],
Who's Waldo? Linking People Across Text and Images,
ICCV21(1354-1364)
IEEE DOI
2203
Visualization, Codes, Grounding, Force, Benchmark testing,
Transformers, Vision + language, Datasets and evaluation
BibRef
González, C.[Cristina],
Ayobi, N.[Nicolás],
Hernández, I.[Isabela],
Hernández, J.[José],
Pont-Tuset, J.[Jordi],
Arbeláez, P.[Pablo],
Panoptic Narrative Grounding,
ICCV21(1344-1353)
IEEE DOI
2203
Measurement, Visualization, Image segmentation, Grounding,
Annotations, Semantics, Vision + language, grouping and shape
BibRef
Hong, Y.[Yining],
Li, Q.[Qing],
Zhu, S.C.[Song-Chun],
Huang, S.Y.[Si-Yuan],
VLGrammar: Grounded Grammar Induction of Vision and Language,
ICCV21(1645-1654)
IEEE DOI
2203
Visualization, Semantics, Natural languages, Image retrieval,
Probabilistic logic, Vision + language,
BibRef
Yuan, Z.H.[Zhi-Hao],
Yan, X.[Xu],
Liao, Y.H.[Ying-Hong],
Zhang, R.M.[Rui-Mao],
Wang, S.[Sheng],
Li, Z.[Zhen],
Cui, S.G.[Shu-Guang],
InstanceRefer: Cooperative Holistic Understanding for Visual
Grounding on Point Clouds through Instance Multi-level Contextual
Referring,
ICCV21(1771-1780)
IEEE DOI
2203
Location awareness, Point cloud compression, Visualization,
Solid modeling, Grounding, Predictive models, Vision + language,
Visual reasoning and logical representation
BibRef
Soldan, M.[Mattia],
Xu, M.M.[Meng-Meng],
Qu, S.[Sisi],
Tegner, J.[Jesper],
Ghanem, B.[Bernard],
VLG-Net: Video-Language Graph Matching Network for Video Grounding,
CVEU21(3217-3227)
IEEE DOI
2112
Location awareness, Grounding,
Semantics, Syntactics, Graph neural networks
BibRef
Lu, X.P.[Xiao-Peng],
Fan, Z.[Zhen],
Wang, Y.[Yansen],
Oh, J.[Jean],
Rosé, C.P.[Carolyn P.],
Localize, Group, and Select: Boosting Text-VQA by Scene Text Modeling,
XSAnim21(2631-2639)
IEEE DOI
2112
Integrated optics, Visualization, Grounding,
Computational modeling, Knowledge discovery
BibRef
Tian, Y.P.[Ya-Peng],
Hu, D.[Di],
Xu, C.L.[Chen-Liang],
Cyclic Co-Learning of Sounding Object Visual Grounding and Sound
Separation,
CVPR21(2744-2753)
IEEE DOI
2111
Training, Visualization, Codes, Grounding,
Computational modeling
BibRef
Nan, G.S.[Guo-Shun],
Qiao, R.[Rui],
Xiao, Y.[Yao],
Liu, J.[Jun],
Leng, S.C.[Si-Cong],
Zhang, H.[Hao],
Lu, W.[Wei],
Interventional Video Grounding with Dual Contrastive Learning,
CVPR21(2764-2774)
IEEE DOI
2111
Visualization, Correlation, Grounding, Benchmark testing,
Knowledge discovery, Data models
BibRef
Zhao, Y.[Yang],
Zhao, Z.[Zhou],
Zhang, Z.[Zhu],
Lin, Z.J.[Zhi-Jie],
Cascaded Prediction Network via Segment Tree for Temporal Video
Grounding,
CVPR21(4195-4204)
IEEE DOI
2111
Costs, Grounding, Navigation, Fuses,
Benchmark testing
BibRef
Liu, Y.F.[Yong-Fei],
Wan, B.[Bo],
Ma, L.[Lin],
He, X.M.[Xu-Ming],
Relation-aware Instance Refinement for Weakly Supervised Visual
Grounding,
CVPR21(5608-5617)
IEEE DOI
2111
Location awareness, Learning systems, Visualization, Grounding,
Semantics, Noise reduction, Benchmark testing
BibRef
Liu, H.L.[Hao-Lin],
Lin, A.[Anran],
Han, X.G.[Xiao-Guang],
Yang, L.[Lei],
Yu, Y.Z.[Yi-Zhou],
Cui, S.G.[Shu-Guang],
Refer-it-in-RGBD: A Bottom-up Approach for 3D Visual Grounding in
RGBD Images,
CVPR21(6028-6037)
IEEE DOI
2111
Heating systems, Geometry, Visualization,
Grounding, Fuses, Feature extraction
BibRef
Lin, X.R.[Xiang-Ru],
Li, G.B.[Guan-Bin],
Yu, Y.Z.[Yi-Zhou],
Scene-Intuitive Agent for Remote Embodied Visual Grounding,
CVPR21(7032-7041)
IEEE DOI
2111
Training, Visualization, Grounding, Navigation, Fuses, Semantics, Pipelines
BibRef
Liu, D.Z.[Dai-Zong],
Qu, X.Y.[Xiao-Ye],
Dong, J.F.[Jian-Feng],
Zhou, P.[Pan],
Cheng, Y.[Yu],
Wei, W.[Wei],
Xu, Z.[Zichuan],
Xie, Y.[Yulai],
Context-aware Biaffine Localizing Network for Temporal Sentence
Grounding,
CVPR21(11230-11239)
IEEE DOI
2111
Location awareness, Codes, Grounding, Cognition,
Task analysis
BibRef
Meng, Z.H.[Zi-Hang],
Yu, L.C.[Li-Cheng],
Zhang, N.[Ning],
Berg, T.[Tamara],
Damavandi, B.[Babak],
Singh, V.[Vikas],
Bearman, A.[Amy],
Connecting What to Say With Where to Look by Modeling Human Attention
Traces,
CVPR21(12674-12683)
IEEE DOI
2111
Measurement, Visualization, Grounding, Unified modeling language,
Training data, Transformers
BibRef
Wang, L.W.[Li-Wei],
Huang, J.[Jing],
Li, Y.[Yin],
Xu, K.[Kun],
Yang, Z.Y.[Zheng-Yuan],
Yu, D.[Dong],
Improving Weakly Supervised Visual Grounding by Contrastive Knowledge
Distillation,
CVPR21(14085-14095)
IEEE DOI
2111
Training, Visualization, Technological innovation,
Costs, Grounding, Detectors
BibRef
Huang, B.B.[Bin-Bin],
Lian, D.Z.[Dong-Ze],
Luo, W.X.[Wei-Xin],
Gao, S.H.[Sheng-Hua],
Look Before You Leap:
Learning Landmark Features for One-Stage Visual Grounding,
CVPR21(16883-16892)
IEEE DOI
2111
Visualization, Grounding, Convolution,
Heuristic algorithms, Computational modeling, Linguistics
BibRef
Zhou, H.[Hao],
Zhang, C.Y.[Chong-Yang],
Luo, Y.[Yan],
Chen, Y.J.[Yan-Jun],
Hu, C.P.[Chuan-Ping],
Embracing Uncertainty: Decoupling and De-bias for Robust Temporal
Grounding,
CVPR21(8441-8450)
IEEE DOI
2111
Performance evaluation, Uncertainty, Grounding,
Annotations, Feature extraction, Robustness
BibRef
Khan, A.U.[Aisha Urooj],
Kuehne, H.[Hilde],
Duarte, K.[Kevin],
Gan, C.[Chuang],
Lobo, N.[Niels],
Shah, M.[Mubarak],
Found a Reason for me? Weakly-supervised Grounded Visual Question
Answering using Capsules,
CVPR21(8461-8470)
IEEE DOI
2111
Training, Visualization, Vocabulary, Grounding, Focusing, Detectors,
Knowledge discovery
BibRef
Zhang, S.Y.[Sheng-Yu],
Jiang, T.[Tan],
Wang, T.[Tan],
Kuang, K.[Kun],
Zhao, Z.[Zhou],
Zhu, J.[Jianke],
Yu, J.[Jin],
Yang, H.X.[Hong-Xia],
Wu, F.[Fei],
DeVLBert: Out-of-distribution Visio-Linguistic Pretraining with
Causality,
CiV21(1744-1747)
IEEE DOI
2109
Visualization, Correlation,
Image retrieval, Knowledge discovery
BibRef
Nguyen, A.T.[Andre T.],
Richards, L.E.[Luke E.],
Kebe, G.Y.[Gaoussou Youssouf],
Raff, E.[Edward],
Darvish, K.[Kasra],
Ferraro, F.[Frank],
Matuszek, C.[Cynthia],
Practical Cross-modal Manifold Alignment for Robotic Grounded
Language Learning,
MULA21(1613-1622)
IEEE DOI
2109
Manifolds, Measurement, Learning systems,
Natural languages, Robot sensing systems
BibRef
Shrestha, A.[Amar],
Pugdeethosapol, K.[Krittaphat],
Fang, H.[Haowen],
Qiu, Q.[Qinru],
MAGNet: Multi-Region Attention-Assisted Grounding of Natural Language
Queries at Phrase Level,
ICPR21(8275-8282)
IEEE DOI
2105
Visualization, Grounding, Fuses, Magnetic resonance imaging,
Natural languages, Games
BibRef
Zhang, Z.,
Zhao, Z.,
Zhao, Y.,
Wang, Q.,
Liu, H.,
Gao, L.,
Where Does It Exist: Spatio-Temporal Video Grounding for Multi-Form
Sentences,
CVPR20(10665-10674)
IEEE DOI
2008
Grounding, Task analysis, Visualization, Cognition,
Feature extraction, Natural languages
BibRef
Sadhu, A.[Arka],
Chen, K.[Kan],
Nevatia, R.[Ram],
Video Object Grounding Using Semantic Roles in Language Description,
CVPR20(10414-10424)
IEEE DOI
2008
Grounds objects in videos referred to in natural language descriptions.
Semantics, Encoding, Proposals, Grounding, Visualization,
Task analysis, Feature extraction
BibRef
Ma, C.Y.[Chih-Yao],
Kalantidis, Y.[Yannis],
AlRegib, G.[Ghassan],
Vajda, P.[Peter],
Rohrbach, M.[Marcus],
Kira, Z.[Zsolt],
Learning to Generate Grounded Visual Captions Without Localization
Supervision,
ECCV20(XVIII:353-370).
Springer DOI
2012
BibRef
Gouthaman, K.V.,
Mittal, A.[Anurag],
Reducing Language Biases in Visual Question Answering with
Visually-grounded Question Encoder,
ECCV20(XIII:18-34).
Springer DOI
2011
BibRef
Zeng, R.H.[Run-Hao],
Xu, H.M.[Hao-Ming],
Huang, W.B.[Wen-Bing],
Chen, P.H.[Pei-Hao],
Tan, M.K.[Ming-Kui],
Gan, C.[Chuang],
Dense Regression Network for Video Grounding,
CVPR20(10284-10293)
IEEE DOI
2008
Grounding, Training, Task analysis, Proposals, Semantics,
Magnetic heads, Feature extraction
BibRef
Gupta, T.[Tanmay],
Vahdat, A.[Arash],
Chechik, G.[Gal],
Yang, X.D.[Xiao-Dong],
Kautz, J.[Jan],
Hoiem, D.[Derek],
Contrastive Learning for Weakly Supervised Phrase Grounding,
ECCV20(III:752-768).
Springer DOI
2012
BibRef
Tan, H.L.,
Leong, M.C.,
Xu, Q.,
Li, L.,
Fang, F.,
Cheng, Y.,
Gauthier, N.,
Sun, Y.,
Lim, J.H.,
Task-Oriented Multi-Modal Question Answering For Collaborative
Applications,
ICIP20(1426-1430)
IEEE DOI
2011
Task analysis, Collaboration, Grounding, Visualization, Cognition,
Training, Machine learning, question answering,
corpora
BibRef
Yang, S.[Sibei],
Li, G.B.[Guan-Bin],
Yu, Y.Z.[Yi-Zhou],
Propagating Over Phrase Relations for One-stage Visual Grounding,
ECCV20(XIX:589-605).
Springer DOI
2011
BibRef
Xiao, J.B.[Jun-Bin],
Shang, X.[Xindi],
Yang, X.[Xun],
Tang, S.[Sheng],
Chua, T.S.[Tat-Seng],
Visual Relation Grounding in Videos,
ECCV20(VI:447-464).
Springer DOI
2011
Code, Relations.
WWW Link.
BibRef
Mun, J.,
Cho, M.,
Han, B.,
Local-Global Video-Text Interactions for Temporal Grounding,
CVPR20(10807-10816)
IEEE DOI
2008
Semantics, Feature extraction, Grounding, Visualization, Proposals,
Task analysis, Context modeling
BibRef
Wu, C.,
Lin, Z.,
Cohen, S.,
Bui, T.,
Maji, S.,
PhraseCut: Language-Based Image Segmentation in the Wild,
CVPR20(10213-10222)
IEEE DOI
2008
Visualization, Grounding, Image segmentation, Task analysis,
Genomics, Bioinformatics, Natural languages
BibRef
Selvaraju, R.R.,
Tendulkar, P.,
Parikh, D.,
Horvitz, E.,
Tulio Ribeiro, M.,
Nushi, B.,
Kamar, E.,
SQuINTing at VQA Models: Introspecting VQA Models With Sub-Questions,
CVPR20(10000-10008)
IEEE DOI
2008
Cognition, Task analysis, Visualization, Image color analysis,
Grounding, Text recognition, Computational modeling
BibRef
Chen, L.[Lei],
Zhai, M.Y.[Meng-Yao],
He, J.W.[Jia-Wei],
Mori, G.[Greg],
Object Grounding via Iterative Context Reasoning,
MDALC19(1407-1415)
IEEE DOI
2004
Localize a set of queries in the image.
image classification, image representation, image segmentation,
inference mechanisms, iterative methods, query processing,
weakly supervised learning
BibRef
Sinha, A.[Abhishek],
Akilesh, B.,
Sarkar, M.[Mausoom],
Krishnamurthy, B.[Balaji],
Attention Based Natural Language Grounding by Navigating Virtual
Environment,
WACV19(236-244)
IEEE DOI
1904
learning (artificial intelligence),
natural language processing, virtual reality,
Grounding
BibRef
Selvaraju, R.R.,
Lee, S.,
Shen, Y.,
Jin, H.,
Ghosh, S.,
Heck, L.,
Batra, D.,
Parikh, D.,
Taking a HINT: Leveraging Explanations to Make Vision and Language
Models More Grounded,
ICCV19(2591-2600)
IEEE DOI
2004
gradient methods, image retrieval, natural language processing,
neural nets, question answering (information retrieval), HINT,
Correlation
BibRef
Zhang, Y.,
Niebles, J.C.,
Soto, A.,
Interpretable Visual Question Answering by Visual Grounding From
Attention Supervision Mining,
WACV19(349-357)
IEEE DOI
1904
data mining, data visualisation, image representation,
learning (artificial intelligence),
Computer architecture
BibRef
Shi, J.[Jing],
Xu, J.[Jia],
Gong, B.Q.[Bo-Qing],
Xu, C.L.[Chen-Liang],
Not All Frames Are Equal: Weakly-Supervised Video Grounding With
Contextual Similarity and Visual Clustering Losses,
CVPR19(10436-10444).
IEEE DOI
2002
BibRef
Datta, S.[Samyak],
Sikka, K.[Karan],
Roy, A.[Anirban],
Ahuja, K.[Karuna],
Parikh, D.[Devi],
Divakaran, A.[Ajay],
Align2Ground: Weakly Supervised Phrase Grounding Guided by
Image-Caption Alignment,
ICCV19(2601-2610)
IEEE DOI
2004
image representation, image retrieval,
learning (artificial intelligence), Image coding
BibRef
Fang, Z.Y.[Zhi-Yuan],
Kong, S.[Shu],
Fowlkes, C.C.[Charless C.],
Yang, Y.Z.[Ye-Zhou],
Modularized Textual Grounding for Counterfactual Resilience,
CVPR19(6371-6381).
IEEE DOI
2002
BibRef
Zhuang, B.,
Wu, Q.,
Shen, C.,
Reid, I.D.,
van den Hengel, A.J.[Anton J.],
Parallel Attention: A Unified Framework for Visual Object Discovery
Through Dialogs and Queries,
CVPR18(4252-4261)
IEEE DOI
1812
Visualization, Task analysis, Cognition, Proposals, Grounding,
Correlation
BibRef
Yang, Z.Y.[Zheng-Yuan],
Chen, T.L.[Tian-Lang],
Wang, L.W.[Li-Wei],
Luo, J.B.[Jie-Bo],
Improving One-Stage Visual Grounding by Recursive Sub-query
Construction,
ECCV20(XIV:387-404).
Springer DOI
2011
Code, Query.
WWW Link.
BibRef
Liu, D.Q.[Da-Qing],
Zhang, H.W.[Han-Wang],
Zha, Z.J.[Zheng-Jun],
Wu, F.[Feng],
Learning to Assemble Neural Module Tree Networks for Visual Grounding,
ICCV19(4672-4681)
IEEE DOI
2004
approximation theory, data visualisation, grammars,
learning (artificial intelligence), Training
BibRef
Sadhu, A.,
Chen, K.,
Nevatia, R.,
Zero-Shot Grounding of Objects From Natural Language Queries,
ICCV19(4693-4702)
IEEE DOI
2004
image classification, learning (artificial intelligence), Visualization,
natural language processing, object detection, query processing.
BibRef
Yang, Z.Y.[Zheng-Yuan],
Gong, B.Q.[Bo-Qing],
Wang, L.W.[Li-Wei],
Huang, W.B.[Wen-Bing],
Yu, D.[Dong],
Luo, J.B.[Jie-Bo],
A Fast and Accurate One-Stage Approach to Visual Grounding,
ICCV19(4682-4692)
IEEE DOI
2004
document image processing, feature extraction, image fusion,
image segmentation, natural language processing, Encoding
BibRef
Rohrbach, A.[Anna],
Rohrbach, M.[Marcus],
Tang, S.[Siyu],
Oh, S.J.[Seong Joon],
Schiele, B.[Bernt],
Generating Descriptions with Grounded and Co-referenced People,
CVPR17(4196-4206)
IEEE DOI
1711
Movie description.
Grounding, Head, Joining processes, Motion pictures, Videos, Visualization
BibRef
Zhu, Y.,
Kiros, R.,
Zemel, R.,
Salakhutdinov, R.,
Urtasun, R.,
Torralba, A.B.,
Fidler, S.,
Aligning Books and Movies: Towards Story-Like Visual Explanations by
Watching Movies and Reading Books,
ICCV15(19-27)
IEEE DOI
1602
Grounding
BibRef
Chapter on Implementations and Applications, Databases, QBIC, Video Analysis, Hardware and Software, Inspection continues in
Referring Expression Comprehension .