19.4.3.3.4 Visual Grounding, Grounding Expressions

Question Answering. Grounding. Visual Grounding. Visual Dialog. Mostly a subset of the related topic:
See also Visual Question Answering, Query, VQA.

Visual7W visual question answering,
Large-scale visual question answering (QA) dataset, with object-level groundings and multimodal answers. WWW Link.
Dataset, Visual Question Answering.

Liang, J.W.[Jun-Wei], Jiang, L.[Lu], Cao, L.L.[Liang-Liang], Kalantidis, Y.[Yannis], Li, L.J.[Li-Jia], Hauptmann, A.G.[Alexander G.],
Focal Visual-Text Attention for Memex Question Answering,
PAMI(41), No. 8, August 2019, pp. 1893-1908.
IEEE DOI 1907
BibRef
Earlier: A1, A2, A3, A5, A6, Only:
Focal Visual-Text Attention for Visual Question Answering,
CVPR18(6135-6143)
IEEE DOI 1812
Task analysis, Knowledge discovery, Visualization, Grounding, Metadata, Cognition, Photo albums, question answering, memex. Visualization, Videos, Computational modeling, Correlation. BibRef

Riquelme, F.[Felipe], de Goyeneche, A.[Alfredo], Zhang, Y.D.[Yun-Dong], Niebles, J.C.[Juan Carlos], Soto, A.[Alvaro],
Explaining VQA predictions using visual grounding and a knowledge base,
IVC(101), 2020, pp. 103968.
Elsevier DOI 2009
Deep Learning, Attention, Supervision, Knowledge Base, Interpretability, Explainability BibRef

Niu, Y.L.[Yu-Lei], Zhang, H.W.[Han-Wang], Lu, Z.W.[Zhi-Wu], Chang, S.F.[Shih-Fu],
Variational Context: Exploiting Visual and Textual Context for Grounding Referring Expressions,
PAMI(43), No. 1, January 2021, pp. 347-359.
IEEE DOI 2012
Grounding, Context modeling, Visualization, Task analysis, Pediatrics, Bayes methods, Annotations, referring expression generation BibRef

Yang, S.[Sibei], Li, G.[Guanbin], Yu, Y.Z.[Yi-Zhou],
Relationship-Embedded Representation Learning for Grounding Referring Expressions,
PAMI(43), No. 8, August 2021, pp. 2765-2779.
IEEE DOI 2107
BibRef
Earlier:
Cross-Modal Relationship Inference for Grounding Referring Expressions,
CVPR19(4140-4149).
IEEE DOI 2002
Locate the object instance in an image described by a referring expression. Visualization, Semantics, Grounding, Proposals, Data mining, Logic gates, Feature extraction, Referring expressions, gated graph convolutional network. Locate target object based on natural language descriptions. BibRef

Yang, Z.Y.[Zheng-Yuan], Kumar, T.[Tushar], Chen, T.L.[Tian-Lang], Su, J.S.[Jing-Song], Luo, J.B.[Jie-Bo],
Grounding-Tracking-Integration,
CirSysVideo(31), No. 9, September 2021, pp. 3433-3443.
IEEE DOI 2109
Grounding, Target tracking, Visualization, History, Task analysis, Object tracking, Annotations, Tracking by language BibRef

Zhang, W.X.[Wei-Xia], Ma, C.[Chao], Wu, Q.[Qi], Yang, X.K.[Xiao-Kang],
Language-Guided Navigation via Cross-Modal Grounding and Alternate Adversarial Learning,
CirSysVideo(31), No. 9, September 2021, pp. 3469-3481.
IEEE DOI 2109
Navigation, Training, Trajectory, Visualization, Task analysis, Grounding, Generators, Vision-and-language, embodied navigation, adversarial learning BibRef

Zhai, S.L.[Song-Lin], Guo, G.B.[Gui-Bing], Yuan, F.J.[Fa-Jie], Liu, Y.[Yuan], Wang, X.W.[Xing-Wei],
VSE-fs: Fast Full-Sample Visual Semantic Embedding,
IEEE_Int_Sys(36), No. 4, July 2021, pp. 3-12.
IEEE DOI 2109
Construct a joint embedding space between visual features and semantic information. Computational modeling, Training, Integrated circuits, Time complexity, Semantics, Visualization, Intelligent systems, Negative Sampling BibRef

Sun, M.J.[Ming-Jie], Xiao, J.[Jimin], Lim, E.G.[Eng Gee], Liu, S.[Si], Goulermas, J.Y.[John Y.],
Discriminative Triad Matching and Reconstruction for Weakly Referring Expression Grounding,
PAMI(43), No. 11, November 2021, pp. 4189-4195.
IEEE DOI 2110
Image reconstruction, Training, Proposals, Visualization, Task analysis, Linguistics, Grounding, discriminative triad matching BibRef

Bargal, S.A.[Sarah Adel], Zunino, A.[Andrea], Petsiuk, V.[Vitali], Zhang, J.M.[Jian-Ming], Saenko, K.[Kate], Murino, V.[Vittorio], Sclaroff, S.[Stan],
Guided Zoom: Zooming into Network Evidence to Refine Fine-Grained Model Decisions,
PAMI(43), No. 11, November 2021, pp. 4196-4202.
IEEE DOI 2110
Grounding, Training, Predictive models, Annotations, Location awareness, Correlation, Visualization, Explainable AI, convolutional neural networks BibRef

Yang, W.F.[Wen-Fei], Zhang, T.Z.[Tian-Zhu], Zhang, Y.D.[Yong-Dong], Wu, F.[Feng],
Local Correspondence Network for Weakly Supervised Temporal Sentence Grounding,
IP(30), 2021, pp. 3252-3262.
IEEE DOI 2103
Grounding, Annotations, Training, Feature extraction, Computational modeling, Task analysis, temporal sentence grounding BibRef

Luo, W.[Wang], Zhang, T.Z.[Tian-Zhu], Yang, W.[Wenfei], Liu, J.G.[Jin-Gen], Mei, T.[Tao], Wu, F.[Feng], Zhang, Y.D.[Yong-Dong],
Action Unit Memory Network for Weakly Supervised Temporal Action Localization,
CVPR21(9964-9974)
IEEE DOI 2111
Location awareness, Training, Knowledge engineering, Motion segmentation, Refining, Interference, Benchmark testing BibRef

Hong, R.[Richang], Liu, D.[Daqing], Mo, X.Y.[Xiao-Yu], He, X.N.[Xiang-Nan], Zhang, H.[Hanwang],
Learning to Compose and Reason with Language Tree Structures for Visual Grounding,
PAMI(44), No. 2, February 2022, pp. 684-696.
IEEE DOI 2201
Grounding, Visualization, Dogs, Natural languages, Cognition, Computational modeling, Semantics, Fine-grained detection, visual reasoning BibRef

Bin, Y.[Yi], Ding, Y.J.[Yu-Juan], Peng, B.[Bo], Peng, L.[Liang], Yang, Y.[Yang], Chua, T.S.[Tat-Seng],
Entity Slot Filling for Visual Captioning,
CirSysVideo(32), No. 1, January 2022, pp. 52-62.
IEEE DOI 2201
Task analysis, Visualization, Neural networks, Adaptation models, Filling, Grounding, Training, Image captioning, dataset BibRef

Chu, C.[Chenhui], Oliveira, V.[Vinicius], Virgo, F.G.[Felix Giovanni], Otani, M.[Mayu], Garcia, N.[Noa], Nakashima, Y.[Yuta],
The semantic typology of visually grounded paraphrases,
CVIU(215), 2022, pp. 103333.
Elsevier DOI 2201
Vision and language, Image interpretation, Visual grounded paraphrases, Semantic typology, Dataset BibRef

Deng, C.R.[Chao-Rui], Wu, Q.[Qi], Wu, Q.Y.[Qing-Yao], Hu, F.Y.[Fu-Yuan], Lyu, F.[Fan], Tan, M.K.[Ming-Kui],
Visual Grounding Via Accumulated Attention,
PAMI(44), No. 3, March 2022, pp. 1670-1684.
IEEE DOI 2202
BibRef
Earlier: CVPR18(7746-7755)
IEEE DOI 1812
Task analysis, Grounding, Cognition, Visual grounding, bounding box regression. Visualization, Feature extraction, Grounding, Natural languages, Redundancy, Task analysis, Computational modeling BibRef

Plummer, B.A.[Bryan A.], Shih, K.J.[Kevin J.], Li, Y.C.[Yi-Chen], Xu, K.[Ke], Lazebnik, S.[Svetlana], Sclaroff, S.[Stan], Saenko, K.[Kate],
Revisiting Image-Language Networks for Open-Ended Phrase Detection,
PAMI(44), No. 4, April 2022, pp. 2155-2167.
IEEE DOI 2203
Task analysis, Grounding, Visualization, Feature extraction, Benchmark testing, Detectors, Vocabulary, Vision and language, representation learning BibRef

Burns, A.[Andrea], Tan, R.[Reuben], Saenko, K.[Kate], Sclaroff, S.[Stan], Plummer, B.A.[Bryan A.],
Language Features Matter: Effective Language Representations for Vision-Language Tasks,
ICCV19(7473-7482)
IEEE DOI 2004
Code, Visualization.
WWW Link. data visualisation, graph theory, image representation, learning (artificial intelligence), Grounding BibRef

Arbelle, A.[Assaf], Doveh, S.[Sivan], Alfassy, A.[Amit], Shtok, J.[Joseph], Lev, G.[Guy], Schwartz, E.[Eli], Kuehne, H.[Hilde], Levi, H.B.[Hila Barak], Sattigeri, P.[Prasanna], Panda, R.[Rameswar], Chen, C.F.[Chun-Fu], Bronstein, A.M.[Alex M.], Saenko, K.[Kate], Ullman, S.[Shimon], Giryes, R.[Raja], Feris, R.[Rogerio], Karlinsky, L.[Leonid],
Detector-Free Weakly Supervised Grounding by Separation,
ICCV21(1781-1792)
IEEE DOI 2203
Training, Location awareness, Visualization, Image segmentation, Grounding, Genomics, Detectors, Vision + language BibRef

Whitehead, S.[Spencer], Wu, H.[Hui], Ji, H.[Heng], Feris, R.[Rogerio], Saenko, K.[Kate],
Separating Skills and Concepts for Novel Visual Question Answering,
CVPR21(5628-5637)
IEEE DOI 2111
Training, Visualization, Grounding, Annotations, Knowledge discovery, Encoding BibRef

Yu, X.T.[Xin-Tong], Zhang, H.M.[Hong-Ming], Hong, R.X.[Rui-Xin], Song, Y.Q.[Yang-Qiu], Zhang, C.S.[Chang-Shui],
VD-PCR: Improving visual dialog with pronoun coreference resolution,
PR(125), 2022, pp. 108540.
Elsevier DOI 2203
Vision and language, Visual dialog, Pronoun coreference resolution BibRef

Yuan, Y.T.[Yi-Tian], Ma, L.[Lin], Wang, J.W.[Jing-Wen], Liu, W.[Wei], Zhu, W.[Wenwu],
Semantic Conditioned Dynamic Modulation for Temporal Sentence Grounding in Videos,
PAMI(44), No. 5, May 2022, pp. 2725-2741.
IEEE DOI 2204
Videos, Grounding, Semantics, Proposals, Task analysis, Convolution, Visualization, Temporal sentence grounding in videos (TSG), temporal convolution BibRef

Lin, L.[Liang], Yan, P.X.[Peng-Xiang], Xu, X.Q.[Xiao-Qian], Yang, S.[Sibei], Zeng, K.[Kun], Li, G.[Guanbin],
Structured Attention Network for Referring Image Segmentation,
MultMed(24), 2022, pp. 1922-1932.
IEEE DOI 2204
Visualization, Linguistics, Image segmentation, Cognition, Feature extraction, Semantics, Task analysis, cross-modal reasoning BibRef

Yang, X.[Xu], Wang, H.[Hao], Xie, D.[De], Deng, C.[Cheng], Tao, D.C.[Da-Cheng],
Object-Agnostic Transformers for Video Referring Segmentation,
IP(31), 2022, pp. 2839-2849.
IEEE DOI 2204
Task analysis, Visualization, Transformers, Feature extraction, Object detection, Image segmentation, Context modeling, video grounding BibRef

He, S.[Su], Yang, X.F.[Xiao-Feng], Lin, G.S.[Guo-Sheng],
Learning language to symbol and language to vision mapping for visual grounding,
IVC(122), 2022, pp. 104451.
Elsevier DOI 2205
Cross modality, Visual grounding, Neural symbolic reasoning BibRef

Jiang, W.H.[Wen-Hui], Zhu, M.[Minwei], Fang, Y.M.[Yu-Ming], Shi, G.M.[Guang-Ming], Zhao, X.W.[Xiao-Wei], Liu, Y.[Yang],
Visual Cluster Grounding for Image Captioning,
IP(31), 2022, pp. 3920-3934.
IEEE DOI 2206
Grounding, Visualization, Proposals, Annotations, Transformers, Task analysis, Decoding, Image captioning, attention evaluation, grounding supervision BibRef

Liao, Y.[Yue], Zhang, A.[Aixi], Chen, Z.Y.[Zhi-Yuan], Hui, T.R.[Tian-Rui], Liu, S.[Si],
Progressive Language-Customized Visual Feature Learning for One-Stage Visual Grounding,
IP(31), 2022, pp. 4266-4277.
IEEE DOI 2207
Visualization, Feature extraction, Grounding, Linguistics, Task analysis, Detectors, Representation learning, cross-modal fusion BibRef

Ding, X.P.[Xin-Peng], Wang, N.N.[Nan-Nan], Zhang, S.W.[Shi-Wei], Huang, Z.Y.[Zi-Yuan], Li, X.M.[Xiao-Meng], Tang, M.Q.[Ming-Qian], Liu, T.L.[Tong-Liang], Gao, X.B.[Xin-Bo],
Exploring Language Hierarchy for Video Grounding,
IP(31), 2022, pp. 4693-4706.
IEEE DOI 2207
Proposals, Grounding, Training, Location awareness, Task analysis, Semantics, Feature extraction, Video and language, language hierarchy BibRef

Wang, Y.[Yuechen], Deng, J.J.[Jia-Jun], Zhou, W.G.[Wen-Gang], Li, H.Q.[Hou-Qiang],
Weakly Supervised Temporal Adjacent Network for Language Grounding,
MultMed(24), 2022, pp. 3276-3286.
IEEE DOI 2207
Grounding, Semantics, Feature extraction, Visualization, Task analysis, Annotations, Training, Temporal language grounding, multiple instance learning BibRef

Xu, Z.[Zhe], Chen, D.[Da], Wei, K.[Kun], Deng, C.[Cheng], Xue, H.[Hui],
HiSA: Hierarchically Semantic Associating for Video Temporal Grounding,
IP(31), 2022, pp. 5178-5188.
IEEE DOI 2208
Grounding, Feature extraction, Proposals, Task analysis, Semantics, Representation learning, Image segmentation, cross-guided contrast BibRef

Wang, X.[Xing], Xie, D.[De], Zheng, Y.[Yuanshi],
Referring expression grounding by multi-context reasoning,
PRL(160), 2022, pp. 66-72.
Elsevier DOI 2208
Referring expression grounding, Reasoning, Graph networks BibRef

Gao, J.L.[Jia-Lin], Sun, X.[Xin], Ghanem, B.[Bernard], Zhou, X.[Xi], Ge, S.M.[Shi-Ming],
Efficient Video Grounding With Which-Where Reading Comprehension,
CirSysVideo(32), No. 10, October 2022, pp. 6900-6913.
IEEE DOI 2210
Grounding, Proposals, Visualization, Location awareness, Task analysis, Reinforcement learning, Germanium, deep learning BibRef

Zhou, H.[Hao], Zhang, C.Y.[Chong-Yang], Luo, Y.[Yan], Hu, C.P.[Chuan-Ping], Zhang, W.J.[Wen-Jun],
Thinking Inside Uncertainty: Interest Moment Perception for Diverse Temporal Grounding,
CirSysVideo(32), No. 10, October 2022, pp. 7190-7203.
IEEE DOI 2210
Annotations, Grounding, Task analysis, Uncertainty, Measurement, Predictive models, Optimization, Temporal grounding, label uncertainty BibRef

Shen, H.T.[Heng Tao], Chen, C.[Cheng], Wang, P.[Peng], Gao, L.L.[Lian-Li], Wang, M.[Meng], Song, J.K.[Jing-Kuan],
Continual Referring Expression Comprehension via Dual Modular Memorization,
IP(31), 2022, pp. 6694-6706.
IEEE DOI 2211
Task analysis, Training, Benchmark testing, Training data, Grounding, Data models, Visualization, Continual learning, lifelong learning, visual grounding BibRef

Tang, Z.H.[Zong-Heng], Liao, Y.[Yue], Liu, S.[Si], Li, G.B.[Guan-Bin], Jin, X.J.[Xiao-Jie], Jiang, H.X.[Hong-Xu], Yu, Q.[Qian], Xu, D.[Dong],
Human-Centric Spatio-Temporal Video Grounding With Visual Transformers,
CirSysVideo(32), No. 12, December 2022, pp. 8238-8249.
IEEE DOI 2212
Grounding, Visualization, Electron tubes, Location awareness, Power transformers, Spatial temporal resolution, dataset BibRef

Tang, H.Y.[Hao-Yu], Zhu, J.[Jihua], Wang, L.[Lin], Zheng, Q.H.[Qing-Hai], Zhang, T.W.[Tian-Wei],
Multi-Level Query Interaction for Temporal Language Grounding,
ITS(23), No. 12, December 2022, pp. 25479-25488.
IEEE DOI 2212
Semantics, Task analysis, Grounding, Proposals, Syntactics, Location awareness, Feature extraction, Human-machine interface, multi-level interaction BibRef

Suo, W.[Wei], Sun, M.Y.[Meng-Yang], Wang, P.[Peng], Zhang, Y.N.[Yan-Ning], Wu, Q.[Qi],
Rethinking and Improving Feature Pyramids for One-Stage Referring Expression Comprehension,
IP(32), 2023, pp. 854-864.
IEEE DOI 2301
Task analysis, Visualization, Head, Semantics, Object detection, Neck, Computational modeling, Referring expression comprehension, feature pyramids network BibRef

Chou, S.H.[Shih-Han], Fan, Z.C.[Zi-Cong], Little, J.J.[James J.], Sigal, L.[Leonid],
Semi-Supervised Grounding Alignment for Multi-Modal Feature Learning,
CRV22(48-57)
IEEE DOI 2301
Representation learning, Training, Visualization, Grounding, Annotations, Benchmark testing, grounding, VCR BibRef

Gupta, K.[Kshitij], Gautam, D.[Devansh], Mamidi, R.[Radhika],
cViL: Cross-Lingual Training of Vision-Language Models using Knowledge Distillation,
ICPR22(1734-1741)
IEEE DOI 2212
Training, Visualization, Analytical models, Pipelines, Transformers, Question answering (information retrieval), Data models BibRef

Chen, D.Z.Y.[Dave Zhen-Yu], Wu, Q.R.[Qi-Rui], Nießner, M.[Matthias], Chang, A.X.[Angel X.],
D3Net: A Unified Speaker-Listener Architecture for 3D Dense Captioning and Visual Grounding,
ECCV22(XXXII:487-505).
Springer DOI 2211
BibRef

Parcalabescu, L., Frank, A.,
Exploring Phrase Grounding without Training: Contextualisation and Extension to Text-Based Image Retrieval,
MULWS20(4137-4146)
IEEE DOI 2008
Grounding, Visualization, Detectors, Task analysis, Linguistics, Proposals, Training BibRef

Tung, H., Harley, A.W., Huang, L., Fragkiadaki, K.,
Reward Learning from Narrated Demonstrations,
CVPR18(7004-7013)
IEEE DOI 1812
Visualization, Natural languages, Detectors, Grounding, Speech recognition, Microphones BibRef

Cohen, N.[Niv], Gal, R.[Rinon], Meirom, E.A.[Eli A.], Chechik, G.[Gal], Atzmon, Y.[Yuval],
'This Is My Unicorn, Fluffy': Personalizing Frozen Vision-Language Representations,
ECCV22(XX:558-577).
Springer DOI 2211
BibRef

Lee, J.H.[Ju-Hee], Kang, J.W.[Je-Won],
Relation Enhanced Vision Language Pre-Training,
ICIP22(2286-2290)
IEEE DOI 2211
Visualization, Semantics, Force, Transformers, Task analysis, vision-language pre-training BibRef

Khan, Z.[Zaid], Kumar, B.G.V.[B. G. Vijay], Yu, X.[Xiang], Schulter, S.[Samuel], Chandraker, M.[Manmohan], Fu, Y.[Yun],
Single-Stream Multi-level Alignment for Vision-Language Pretraining,
ECCV22(XXXVI:735-751).
Springer DOI 2211
BibRef

Wang, R.[Renhao], Zhao, H.[Hang], Gao, Y.[Yang],
CYBORGS: Contrastively Bootstrapping Object Representations by Grounding in Segmentation,
ECCV22(XXXI:260-277).
Springer DOI 2211
BibRef

Yang, Z.Y.[Zheng-Yuan], Gan, Z.[Zhe], Wang, J.F.[Jian-Feng], Hu, X.W.[Xiao-Wei], Ahmed, F.[Faisal], Liu, Z.C.[Zi-Cheng], Lu, Y.[Yumao], Wang, L.J.[Li-Juan],
UniTAB: Unifying Text and Box Outputs for Grounded Vision-Language Modeling,
ECCV22(XXXVI:521-539).
Springer DOI 2211
BibRef

Li, H.[Huan], Wei, P.[Ping], Li, J.P.[Jia-Peng], Ma, Z.[Zeyu], Shang, J.[Jiahui], Zheng, N.N.[Nan-Ning],
Asymmetric Relation Consistency Reasoning for Video Relation Grounding,
ECCV22(XXXV:125-141).
Springer DOI 2211
BibRef

Dvornik, N.[Nikita], Hadji, I.[Isma], Pham, H.[Hai], Bhatt, D.[Dhaivat], Martinez, B.[Brais], Fazly, A.[Afsaneh], Jepson, A.D.[Allan D.],
Flow Graph to Video Grounding for Weakly-Supervised Multi-step Localization,
ECCV22(XXXV:319-335).
Springer DOI 2211
BibRef

Qu, M.X.[Meng-Xue], Wu, Y.[Yu], Liu, W.[Wu], Gong, Q.Q.[Qi-Qi], Liang, X.D.[Xiao-Dan], Russakovsky, O.[Olga], Zhao, Y.[Yao], Wei, Y.C.[Yun-Chao],
SiRi: A Simple Selective Retraining Mechanism for Transformer-Based Visual Grounding,
ECCV22(XXXV:546-562).
Springer DOI 2211
BibRef

Zhu, C.Y.[Chao-Yang], Zhou, Y.[Yiyi], Shen, Y.[Yunhang], Luo, G.[Gen], Pan, X.[Xingjia], Lin, M.[Mingbao], Chen, C.[Chao], Cao, L.J.[Liu-Juan], Sun, X.S.[Xiao-Shuai], Ji, R.R.[Rong-Rong],
SeqTR: A Simple Yet Universal Network for Visual Grounding,
ECCV22(XXXV:598-615).
Springer DOI 2211
BibRef

Khan, A.U.[Aisha Urooj], Kuehne, H.[Hilde], Gan, C.[Chuang], da Vitoria Lobo, N.[Niels], Shah, M.[Mubarak],
Weakly Supervised Grounding for VQA in Vision-Language Transformers,
ECCV22(XXXV:652-670).
Springer DOI 2211
BibRef

Hao, J.[Jiachang], Sun, H.F.[Hai-Feng], Ren, P.F.[Peng-Fei], Wang, J.Y.[Jing-Yu], Qi, Q.[Qi], Liao, J.X.[Jian-Xin],
Can Shuffling Video Benefit Temporal Bias Problem: A Novel Training Framework for Temporal Grounding,
ECCV22(XXXVI:130-147).
Springer DOI 2211
BibRef

Jain, A.[Ayush], Gkanatsios, N.[Nikolaos], Mediratta, I.[Ishita], Fragkiadaki, K.[Katerina],
Bottom Up Top Down Detection Transformers for Language Grounding in Images and Point Clouds,
ECCV22(XXXVI:417-433).
Springer DOI 2211
BibRef

Heisler, M.[Morgan], Banitalebi-Dehkordi, A.[Amin], Zhang, Y.[Yong],
SemAug: Semantically Meaningful Image Augmentations for Object Detection Through Language Grounding,
ECCV22(XXXVI:610-626).
Springer DOI 2211
BibRef

Min, S.[Seonwoo], Park, N.[Nokyung], Kim, S.[Siwon], Park, S.H.[Seung-Hyun], Kim, J.[Jinkyu],
Grounding Visual Representations with Texts for Domain Generalization,
ECCV22(XXXVII:37-53).
Springer DOI 2211
BibRef

Wang, J.[Jia], Wu, H.Y.[Hung-Yi], Chen, J.C.[Jun-Cheng], Shuai, H.H.[Hong-Han], Cheng, W.H.[Wen-Huang],
Residual Graph Attention Network and Expression-Respect Data Augmentation Aided Visual Grounding,
ICIP22(326-330)
IEEE DOI 2211
Visualization, Grounding, Training data, Cognition, Data models, Complexity theory, Residual graph attention network, Visual grounding BibRef

Xiong, Z.[Zeyu], Liu, D.[Daizong], Zhou, P.[Pan],
Gaussian Kernel-Based Cross Modal Network for Spatio-Temporal Video Grounding,
ICIP22(2481-2485)
IEEE DOI 2211
Heating systems, Grounding, Natural languages, Electron tubes, Task analysis, anchor-free, Gaussian kernel, spatial-temporal video grounding BibRef

Alaniz, S.[Stephan], Federici, M.[Marco], Akata, Z.[Zeynep],
Compositional Mixture Representations for Vision and Text,
L3D-IVU22(4201-4210)
IEEE DOI 2210
Representation learning, Visualization, Computational modeling, Semantics, Image retrieval, Employment, Object detection BibRef

Cho, J.[Junhyeong], Yoon, Y.[Youngseok], Kwak, S.[Suha],
Collaborative Transformers for Grounded Situation Recognition,
CVPR22(19627-19636)
IEEE DOI 2210
Measurement, Training, Visualization, Computational modeling, Estimation, Collaboration, Predictive models, Visual reasoning BibRef

Singh, A.[Amanpreet], Hu, R.[Ronghang], Goswami, V.[Vedanuj], Couairon, G.[Guillaume], Galuba, W.[Wojciech], Rohrbach, M.[Marcus], Kiela, D.[Douwe],
FLAVA: A Foundational Language And Vision Alignment Model,
CVPR22(15617-15629)
IEEE DOI 2210
Analytical models, Computational modeling, Pattern recognition, Task analysis, Vision+language BibRef

Zhang, K.[Kun], Mao, Z.D.[Zhen-Dong], Wang, Q.[Quan], Zhang, Y.D.[Yong-Dong],
Negative-Aware Attention Framework for Image-Text Matching,
CVPR22(15640-15649)
IEEE DOI 2210
Force measurement, Codes, Machine vision, Optimization methods, Benchmark testing, Pattern recognition, Vision+language, Vision applications and systems BibRef

Saini, N.[Nirat], Pham, K.[Khoi], Shrivastava, A.[Abhinav],
Disentangling Visual Embeddings for Attributes and Objects,
CVPR22(13648-13657)
IEEE DOI 2210
WWW Link. Visualization, Codes, Benchmark testing, Linguistics, Feature extraction, Recognition: detection, Visual reasoning BibRef

Ge, Y.Y.[Yu-Ying], Ge, Y.X.[Yi-Xiao], Liu, X.H.[Xi-Hui], Wang, J.P.[Jin-Peng], Wu, J.P.[Jian-Ping], Shan, Y.[Ying], Qie, X.[Xiaohu], Luo, P.[Ping],
MILES: Visual BERT Pre-training with Injected Language Semantics for Video-Text Retrieval,
ECCV22(XXXV:691-708).
Springer DOI 2211
BibRef

Wang, A.J.P.[Alex Jin-Peng], Ge, Y.X.[Yi-Xiao], Cai, G.[Guanyu], Yan, R.[Rui], Lin, X.D.[Xu-Dong], Shan, Y.[Ying], Qie, X.[Xiaohu], Shou, M.Z.[Mike Zheng],
Object-aware Video-language Pre-training for Retrieval,
CVPR22(3303-3312)
IEEE DOI 2210
Training, Visualization, Machine vision, Semantics, Detectors, Transformers, retrieval BibRef

Li, D.X.[Dong-Xu], Li, J.N.[Jun-Nan], Li, H.D.[Hong-Dong], Niebles, J.C.[Juan Carlos], Hoi, S.C.H.[Steven C.H.],
Align and Prompt: Video-and-Language Pre-training with Entity Prompts,
CVPR22(4943-4953)
IEEE DOI 2210
Representation learning, Vocabulary, Visualization, Semantics, Detectors, Transformers, Pattern recognition, Vision + language, Video analysis and understanding BibRef

Xue, H.W.[Hong-Wei], Hang, T.[Tiankai], Zeng, Y.H.[Yan-Hong], Sun, Y.C.[Yu-Chong], Liu, B.[Bei], Yang, H.[Huan], Fu, J.L.[Jian-Long], Guo, B.N.[Bai-Ning],
Advancing High-Resolution Video-Language Representation with Large-Scale Video Transcriptions,
CVPR22(5026-5035)
IEEE DOI 2210
Visualization, Video on demand, Computational modeling, Superresolution, Semantics, Transformers, Feature extraction, Self- semi- meta- unsupervised learning BibRef

Sammani, F.[Fawaz], Mukherjee, T.[Tanmoy], Deligiannis, N.[Nikos],
NLX-GPT: A Model for Natural Language Explanations in Vision and Vision-Language Tasks,
CVPR22(8312-8322)
IEEE DOI 2210
Current measurement, Computational modeling, Natural languages, Decision making, Memory management, Predictive models, Vision + language BibRef

Lin, B.Q.[Bing-Qian], Zhu, Y.[Yi], Chen, Z.C.[Zi-Cong], Liang, X.[Xiwen], Liu, J.Z.[Jian-Zhuang], Liang, X.D.[Xiao-Dan],
ADAPT: Vision-Language Navigation with Modality-Aligned Action Prompts,
CVPR22(15375-15385)
IEEE DOI 2210
Visualization, Adaptation models, Navigation, Transformers, Nonhomogeneous media, Pattern recognition, Vision + language BibRef

Dou, Z.Y.[Zi-Yi], Xu, Y.C.[Yi-Chong], Gan, Z.[Zhe], Wang, J.F.[Jian-Feng], Wang, S.H.[Shuo-Hang], Wang, L.J.[Li-Juan], Zhu, C.G.[Chen-Guang], Zhang, P.C.[Peng-Chuan], Yuan, L.[Lu], Peng, N.[Nanyun], Liu, Z.C.[Zi-Cheng], Zeng, M.[Michael],
An Empirical Study of Training End-to-End Vision-and-Language Transformers,
CVPR22(18145-18155)
IEEE DOI 2210
Meters, Training, Codes, Computational modeling, Transformers, Pattern recognition, Vision + language, Machine learning BibRef

Xu, Z.P.[Zi-Peng], Lin, T.W.[Tian-Wei], Tang, H.[Hao], Li, F.[Fu], He, D.L.[Dong-Liang], Sebe, N.[Nicu], Timofte, R.[Radu], Van Gool, L.J.[Luc J.], Ding, E.[Errui],
Predict, Prevent, and Evaluate: Disentangled Text-Driven Image Manipulation Empowered by Pre-Trained Vision-Language Model,
CVPR22(18208-18217)
IEEE DOI 2210
Personal protective equipment, Measurement, Training, Annotations, Face recognition, Computational modeling, Face and gestures BibRef

Du, Y.[Yu], Wei, F.Y.[Fang-Yun], Zhang, Z.[Zihe], Shi, M.J.[Miao-Jing], Gao, Y.[Yue], Li, G.Q.[Guo-Qi],
Learning to Prompt for Open-Vocabulary Object Detection with Vision-Language Model,
CVPR22(14064-14073)
IEEE DOI 2210
Training, Representation learning, Visualization, Transfer learning, Object detection, Detectors, Self- semi- meta- unsupervised learning BibRef

Chang, Y.S.[Ying-Shan], Cao, G.H.[Gui-Hong], Narang, M.[Mridu], Gao, J.F.[Jian-Feng], Suzuki, H.[Hisami], Bisk, Y.[Yonatan],
WebQA: Multihop and Multimodal QA,
CVPR22(16474-16483)
IEEE DOI 2210
Knowledge engineering, Representation learning, Visualization, Transformers, Cognition, Visual reasoning BibRef

Zellers, R.[Rowan], Lu, J.[Jiasen], Lu, X.[Ximing], Yu, Y.[Youngjae], Zhao, Y.P.[Yan-Peng], Salehi, M.[Mohammadreza], Kusupati, A.[Aditya], Hessel, J.[Jack], Farhadi, A.[Ali], Choi, Y.[Yejin],
MERLOT RESERVE: Neural Script Knowledge through Vision and Language and Sound,
CVPR22(16354-16366)
IEEE DOI 2210
Training, Representation learning, Visualization, Ethics, Video on demand, Navigation, Stars, Vision + language, Visual reasoning BibRef

Gupta, T.[Tanmay], Kamath, A.[Amita], Kembhavi, A.[Aniruddha], Hoiem, D.[Derek],
Towards General Purpose Vision Systems: An End-to-End Task-Agnostic Vision-Language Architecture,
CVPR22(16378-16388)
IEEE DOI 2210
Training, Visualization, Machine vision, Object detection, Network architecture, Vision + language BibRef

Materzynska, J.[Joanna], Torralba, A.[Antonio], Bau, D.[David],
Disentangling visual and written concepts in CLIP,
CVPR22(16389-16398)
IEEE DOI 2210
Visualization, Image coding, Benchmark testing, Cognition, Pattern recognition, Task analysis, Vision + language, Visual reasoning BibRef

Li, M.[Manling], Xu, R.[Ruochen], Wang, S.[Shuohang], Zhou, L.[Luowei], Lin, X.D.[Xu-Dong], Zhu, C.G.[Chen-Guang], Zeng, M.[Michael], Ji, H.[Heng], Chang, S.F.[Shih-Fu],
CLIP-Event: Connecting Text and Images with Event Structures,
CVPR22(16399-16408)
IEEE DOI 2210
Codes, Computational modeling, Image retrieval, Benchmark testing, Information retrieval, Pattern recognition, Vision + language BibRef

Surís, D.[Dídac], Epstein, D.[Dave], Vondrick, C.[Carl],
Globetrotter: Connecting Languages by Connecting Images,
CVPR22(16453-16463)
IEEE DOI 2210
Training, Deep learning, Visualization, Image segmentation, Codes, Computational modeling, Vision + language BibRef

Zhu, H.D.[Hai-Dong], Sadhu, A.[Arka], Zheng, Z.H.[Zhao-Heng], Nevatia, R.[Ram],
Utilizing Every Image Object for Semi-supervised Phrase Grounding,
WACV21(2209-2218)
IEEE DOI 2106
Localize an object in the image given a referring expression. Training, Grounding, Annotations, Detectors, Task analysis BibRef

Zhong, Y.[Yiwu], Yang, J.W.[Jian-Wei], Zhang, P.[Pengchuan], Li, C.Y.[Chun-Yuan], Codella, N.[Noel], Li, L.H.[Liunian Harold], Zhou, L.[Luowei], Dai, X.[Xiyang], Yuan, L.[Lu], Li, Y.[Yin], Gao, J.F.[Jian-Feng],
RegionCLIP: Region-based Language-Image Pretraining,
CVPR22(16772-16782)
IEEE DOI 2210
Representation learning, Visualization, Technological innovation, Image recognition, Text recognition, Transfer learning, Vision + language BibRef

Sung, Y.L.[Yi-Lin], Cho, J.[Jaemin], Bansal, M.[Mohit],
VL-ADAPTER: Parameter-Efficient Transfer Learning for Vision-and-Language Tasks,
CVPR22(5217-5227)
IEEE DOI 2210
Training, Adaptation models, Computational modeling, Transfer learning, Benchmark testing, Multitasking, Vision + language BibRef

Wu, D.M.[Dong-Ming], Dong, X.P.[Xing-Ping], Shao, L.[Ling], Shen, J.B.[Jian-Bing],
Multi-Level Representation Learning with Semantic Alignment for Referring Video Object Segmentation,
CVPR22(4986-4995)
IEEE DOI 2210
Representation learning, Visualization, Adaptation models, Shape, Grounding, Semantics, Vision + language, Segmentation, grouping and shape analysis BibRef

Gao, K.[Kaifeng], Chen, L.[Long], Niu, Y.[Yulei], Shao, J.[Jian], Xiao, J.[Jun],
Classification-Then-Grounding: Reformulating Video Scene Graphs as Temporal Bipartite Graphs,
CVPR22(19475-19484)
IEEE DOI 2210
Image analysis, Codes, Grounding, Semantics, Bipartite graph, Pattern recognition, Scene analysis and understanding, Vision + language BibRef

Kesen, I.[Ilker], Can, O.A.[Ozan Arkan], Erdem, E.[Erkut], Erdem, A.[Aykut], Yüret, D.[Deniz],
Modulating Bottom-Up and Top-Down Visual Processing via Language-Conditional Filters,
MULA22(4609-4619)
IEEE DOI 2210
Visualization, Image segmentation, Image color analysis, Grounding, Computational modeling, Process control, Predictive models BibRef

Nebbia, G.[Giacomo], Kovashka, A.[Adriana],
Doubling down: sparse grounding with an additional, almost-matching caption for detection-oriented multimodal pretraining,
MULA22(4641-4650)
IEEE DOI 2210
Deep learning, Visualization, Grounding, Computational modeling, Data models BibRef

Ye, J.[Jiabo], Tian, J.F.[Jun-Feng], Yan, M.[Ming], Yang, X.S.[Xiao-Shan], Wang, X.[Xuwu], Zhang, J.[Ji], He, L.[Liang], Lin, X.[Xin],
Shifting More Attention to Visual Backbone: Query-modulated Refinement Networks for End-to-End Visual Grounding,
CVPR22(15481-15491)
IEEE DOI 2210
Training, Visualization, Grounding, Refining, Natural languages, Feature extraction, Vision + language, Visual reasoning BibRef

Jiang, H.[Haojun], Lin, Y.Z.[Yuan-Ze], Han, D.[Dongchen], Song, S.[Shiji], Huang, G.[Gao],
Pseudo-Q: Generating Pseudo Language Queries for Visual Grounding,
CVPR22(15492-15502)
IEEE DOI 2210
Training, Visualization, Costs, Grounding, Annotations, Computational modeling, Natural languages, Vision + language, Visual reasoning BibRef

Huang, S.[Shijia], Chen, Y.L.[Yi-Lun], Jia, J.Y.[Jia-Ya], Wang, L.W.[Li-Wei],
Multi-View Transformer for 3D Visual Grounding,
CVPR22(15503-15512)
IEEE DOI 2210
Point cloud compression, Visualization, Solid modeling, Grounding, Natural languages, Vision + language BibRef

Chen, S.[Sijia], Li, B.[Baochun],
Multi-Modal Dynamic Graph Transformer for Visual Grounding,
CVPR22(15513-15522)
IEEE DOI 2210
Visualization, Image analysis, Grounding, Computational modeling, Semantics, Natural languages, Vision + language, Scene analysis and understanding BibRef

Mavroudi, E.[Effrosyni], Vidal, R.[René],
Weakly-Supervised Generation and Grounding of Visual Descriptions with Conditional Generative Models,
CVPR22(15523-15533)
IEEE DOI 2210
Visualization, Grounding, Video description, Computational modeling, Random variables, Pattern recognition, Video analysis and understanding BibRef

Chen, S.[Shi], Zhao, Q.[Qi],
REX: Reasoning-aware and Grounded Explanation,
CVPR22(15565-15574)
IEEE DOI 2210
Visualization, Codes, Grounding, Transfer learning, Decision making, Multitasking, Vision + language, Visual reasoning BibRef

Lou, C.[Chao], Han, W.J.[Wen-Juan], Lin, Y.[Yuhuan], Zheng, Z.L.[Zi-Long],
Unsupervised Vision-Language Parsing: Seamlessly Bridging Visual Scene Graphs with Language Structures via Dependency Relationships,
CVPR22(15586-15595)
IEEE DOI 2210
Visualization, Grounding, Buildings, Benchmark testing, Linguistics, Pattern recognition, Vision + language, Explainable computer vision BibRef

Yang, A.[Antoine], Miech, A.[Antoine], Sivic, J.[Josef], Laptev, I.[Ivan], Schmid, C.[Cordelia],
TubeDETR: Spatio-Temporal Video Grounding with Transformers,
CVPR22(16421-16432)
IEEE DOI 2210
Location awareness, Grounding, Natural languages, Object detection, Benchmark testing, Vision + language BibRef

Luo, J.Y.[Jun-Yu], Fu, J.[Jiahui], Kong, X.[Xianghao], Gao, C.[Chen], Ren, H.B.[Hai-Bing], Shen, H.[Hao], Xia, H.X.[Hua-Xia], Liu, S.[Si],
3D-SPS: Single-Stage 3D Visual Grounding via Referred Point Progressive Selection,
CVPR22(16433-16442)
IEEE DOI 2210
Point cloud compression, Visualization, Solid modeling, Grounding, Detectors, Vision + language BibRef

Cai, D.[Daigang], Zhao, L.C.[Li-Chen], Zhang, J.[Jing], Sheng, L.[Lu], Xu, D.[Dong],
3DJCG: A Unified Framework for Joint Dense Captioning and Visual Grounding on 3D Point Clouds,
CVPR22(16443-16452)
IEEE DOI 2210
Training, Point cloud compression, Visualization, Grounding, Performance gain, Vision + language, Recognition: detection, categorization, retrieval BibRef

Luo, H.C.[Hong-Chen], Zhai, W.[Wei], Zhang, J.[Jing], Cao, Y.[Yang], Tao, D.C.[Da-Cheng],
Learning Affordance Grounding from Exocentric Images,
CVPR22(2242-2251)
IEEE DOI 2210
Analytical models, Visualization, Grounding, Affordances, Computational modeling, Transforms, Feature extraction, Scene analysis and understanding BibRef

Jiang, X.[Xun], Xu, X.[Xing], Zhang, J.[Jingran], Shen, F.M.[Fu-Min], Cao, Z.[Zuo], Shen, H.T.[Heng Tao],
Semi-supervised Video Paragraph Grounding with Contrastive Encoder,
CVPR22(2456-2465)
IEEE DOI 2210
Training, Grounding, Annotations, Training data, Semisupervised learning, Transformers, Data models, Vision + language BibRef

Li, J.C.[Jun-Cheng], Xie, J.L.[Jun-Lin], Qian, L.[Long], Zhu, L.C.[Lin-Chao], Tang, S.L.[Si-Liang], Wu, F.[Fei], Yang, Y.[Yi], Zhuang, Y.T.[Yue-Ting], Wang, X.E.[Xin Eric],
Compositional Temporal Grounding with Structured Variational Cross-Graph Correspondence Learning,
CVPR22(3022-3031)
IEEE DOI 2210
Grounding, Current measurement, Computational modeling, Semantics, Diversity reception, Linguistics, Vision + language BibRef

Yu, W.[Wei], Chen, W.X.[Wen-Xin], Yin, S.[Songheng], Easterbrook, S.[Steve], Garg, A.[Animesh],
Modular Action Concept Grounding in Semantic Video Prediction,
CVPR22(3595-3604)
IEEE DOI 2210
Adaptation models, Visualization, Inverse problems, Grounding, Semantics, Object detection, Predictive models, Vision + language BibRef

Soldan, M.[Mattia], Pardo, A.[Alejandro], Alcázar, J.L.[Juan León], Heilbron, F.C.[Fabian Caba], Zhao, C.[Chen], Giancola, S.[Silvio], Ghanem, B.[Bernard],
MAD: A Scalable Dataset for Language Grounding in Videos from Movie Audio Descriptions,
CVPR22(5016-5025)
IEEE DOI 2210
Grounding, Annotations, Pipelines, Natural languages, Machine learning, Benchmark testing, Vision + language, Video analysis and understanding BibRef

Yang, L.[Li], Xu, Y.[Yan], Yuan, C.F.[Chun-Feng], Liu, W.[Wei], Li, B.[Bing], Hu, W.M.[Wei-Ming],
Improving Visual Grounding with Visual-Linguistic Verification and Iterative Reasoning,
CVPR22(9489-9498)
IEEE DOI 2210
Location awareness, Visualization, Grounding, Natural languages, Object detection, Transformers, Cognition, Recognition: detection, retrieval BibRef

Li, L.H.[Liunian Harold], Zhang, P.C.[Peng-Chuan], Zhang, H.[Haotian], Yang, J.W.[Jian-Wei], Li, C.Y.[Chun-Yuan], Zhong, Y.[Yiwu], Wang, L.J.[Li-Juan], Yuan, L.[Lu], Zhang, L.[Lei], Hwang, J.N.[Jenq-Neng], Chang, K.W.[Kai-Wei], Gao, J.F.[Jian-Feng],
Grounded Language-Image Pre-training,
CVPR22(10955-10965)
IEEE DOI 2210
Visualization, Image recognition, Head, Grounding, Object detection, Data models, Deep learning architectures and techniques, Vision + language BibRef

Li, Y.C.[Yi-Cong], Wang, X.[Xiang], Xiao, J.B.[Jun-Bin], Ji, W.[Wei], Chua, T.S.[Tat-Seng],
Invariant Grounding for Video Question Answering,
CVPR22(2918-2927)
IEEE DOI 2210
Visualization, Correlation, Grounding, Semantics, Predictive models, Linguistics, Question answering (information retrieval), Vision + language BibRef

Yang, Z.Y.[Zheng-Yuan], Zhang, S.Y.[Song-Yang], Wang, L.W.[Li-Wei], Luo, J.B.[Jie-Bo],
SAT: 2D Semantics Assisted Training for 3D Visual Grounding,
ICCV21(1836-1846)
IEEE DOI 2203
Training, Point cloud compression, Representation learning, Visualization, Grounding, Semantics, Vision + language BibRef

Chen, J.W.[Jun-Wen], Kong, Y.[Yu],
Explainable Video Entailment with Grounded Visual Evidence,
ICCV21(2001-2010)
IEEE DOI 2203
Training, Visualization, Grounding, Computational modeling, Decision making, Focusing, Vision + language, Video analysis and understanding BibRef

Zhao, L.C.[Li-Chen], Cai, D.[Daigang], Sheng, L.[Lu], Xu, D.[Dong],
3DVG-Transformer: Relation Modeling for Visual Grounding on Point Clouds,
ICCV21(2908-2917)
IEEE DOI 2203
Point cloud compression, Multiplexing, Visualization, Solid modeling, Grounding, Transformers, Vision + language BibRef

Feng, M.[Mingtao], Li, Z.[Zhen], Li, Q.[Qi], Zhang, L.[Liang], Zhang, X.[XiangDong], Zhu, G.M.[Guang-Ming], Zhang, H.[Hui], Wang, Y.[Yaonan], Mian, A.[Ajmal],
Free-form Description Guided 3D Visual Graph Network for Object Grounding in Point Cloud,
ICCV21(3702-3711)
IEEE DOI 2203
Point cloud compression, Visualization, Correlation, Grounding, Natural languages, Detection and localization in 2D and 3D, Visual reasoning and logical representation BibRef

Ding, X.P.[Xin-Peng], Wang, N.N.[Nan-Nan], Zhang, S.[Shiwei], Cheng, D.[De], Li, X.M.[Xiao-Meng], Huang, Z.Y.[Zi-Yuan], Tang, M.Q.[Ming-Qian], Gao, X.B.[Xin-Bo],
Support-Set Based Cross-Supervision for Video Grounding,
ICCV21(11553-11562)
IEEE DOI 2203
Training, Visualization, Costs, Correlation, Grounding, Semantics, Image and video retrieval, Vision + language BibRef

Khandelwal, S.[Siddhesh], Suhail, M.[Mohammed], Sigal, L.[Leonid],
Segmentation-grounded Scene Graph Generation,
ICCV21(15859-15869)
IEEE DOI 2203
Image segmentation, Visualization, Grounding, Annotations, Genomics, Scene analysis and understanding, Transfer/Low-shot/Semi/Unsupervised Learning BibRef

Patel, S.[Shivansh], Wani, S.[Saim], Jain, U.[Unnat], Schwing, A.[Alexander], Lazebnik, S.[Svetlana], Savva, M.[Manolis], Chang, A.X.[Angel X.],
Interpretation of Emergent Communication in Heterogeneous Collaborative Embodied Agents,
ICCV21(15993-15943)
IEEE DOI 2203
Systematics, Navigation, Grounding, Collaboration, Task analysis, Vision for robotics and autonomous vehicles, Explainable AI, Visual reasoning and logical representation BibRef

Shi, J.[Jing], Zhong, Y.[Yiwu], Xu, N.[Ning], Li, Y.[Yin], Xu, C.L.[Chen-Liang],
A Simple Baseline for Weakly-Supervised Scene Graph Generation,
ICCV21(16373-16382)
IEEE DOI 2203
Visualization, Grounding, Computational modeling, Pipelines, Genomics, Complexity theory, Scene analysis and understanding, Vision + language BibRef

Su, R.[Rui], Yu, Q.[Qian], Xu, D.[Dong],
STVGBert: A Visual-linguistic Transformer based Framework for Spatio-temporal Video Grounding,
ICCV21(1513-1522)
IEEE DOI 2203
Representation learning, Visualization, Grounding, Detectors, Benchmark testing, Transformers, Electron tubes, Vision + language, Video analysis and understanding BibRef

Cui, C.Y.Q.[Claire Yu-Qing], Khandelwal, A.[Apoorv], Artzi, Y.[Yoav], Snavely, N.[Noah], Averbuch-Elor, H.[Hadar],
Who's Waldo? Linking People Across Text and Images,
ICCV21(1354-1364)
IEEE DOI 2203
Visualization, Codes, Grounding, Force, Benchmark testing, Transformers, Vision + language, Datasets and evaluation BibRef

González, C.[Cristina], Ayobi, N.[Nicolás], Hernández, I.[Isabela], Hernández, J.[José], Pont-Tuset, J.[Jordi], Arbeláez, P.[Pablo],
Panoptic Narrative Grounding,
ICCV21(1344-1353)
IEEE DOI 2203
Measurement, Visualization, Image segmentation, Grounding, Annotations, Semantics, Vision + language, grouping and shape BibRef

Hong, Y.[Yining], Li, Q.[Qing], Zhu, S.C.[Song-Chun], Huang, S.Y.[Si-Yuan],
VLGrammar: Grounded Grammar Induction of Vision and Language,
ICCV21(1645-1654)
IEEE DOI 2203
Visualization, Semantics, Natural languages, Image retrieval, Probabilistic logic, Vision + language BibRef

Kamath, A.[Aishwarya], Singh, M.[Mannat], Le Cun, Y.[Yann], Synnaeve, G.[Gabriel], Misra, I.[Ishan], Carion, N.[Nicolas],
MDETR: Modulated Detection for End-to-End Multi-Modal Understanding,
ICCV21(1760-1770)
IEEE DOI 2203
Visualization, Vocabulary, Image segmentation, Grounding, Detectors, Vision + language, Visual reasoning and logical representation BibRef

Yuan, Z.H.[Zhi-Hao], Yan, X.[Xu], Liao, Y.H.[Ying-Hong], Zhang, R.[Ruimao], Wang, S.[Sheng], Li, Z.[Zhen], Cui, S.G.[Shu-Guang],
InstanceRefer: Cooperative Holistic Understanding for Visual Grounding on Point Clouds through Instance Multi-level Contextual Referring,
ICCV21(1771-1780)
IEEE DOI 2203
Location awareness, Point cloud compression, Visualization, Solid modeling, Grounding, Predictive models, Vision + language, Visual reasoning and logical representation BibRef

Deng, J.J.[Jia-Jun], Yang, Z.Y.[Zheng-Yuan], Chen, T.L.[Tian-Lang], Zhou, W.G.[Wen-Gang], Li, H.Q.[Hou-Qiang],
TransVG: End-to-End Visual Grounding with Transformers,
ICCV21(1749-1759)
IEEE DOI 2203
Visualization, Codes, Grounding, Manuals, Transformers, Cognition, Vision + language, Vision + other modalities BibRef

Soldan, M.[Mattia], Xu, M.M.[Meng-Meng], Qu, S.[Sisi], Tegner, J.[Jesper], Ghanem, B.[Bernard],
VLG-Net: Video-Language Graph Matching Network for Video Grounding,
CVEU21(3217-3227)
IEEE DOI 2112
Location awareness, Grounding, Semantics, Syntactics, Graph neural networks BibRef

Lu, X.P.[Xiao-Peng], Fan, Z.[Zhen], Wang, Y.[Yansen], Oh, J.[Jean], Rosé, C.P.[Carolyn P.],
Localize, Group, and Select: Boosting Text-VQA by Scene Text Modeling,
XSAnim21(2631-2639)
IEEE DOI 2112
Integrated optics, Visualization, Grounding, Computational modeling, Knowledge discovery BibRef

Song, S.[Sijie], Lin, X.D.[Xu-Dong], Liu, J.Y.[Jia-Ying], Guo, Z.M.[Zong-Ming], Chang, S.F.[Shih-Fu],
Co-Grounding Networks with Semantic Attention for Referring Expression Comprehension in Videos,
CVPR21(1346-1355)
IEEE DOI 2111
Visualization, Correlation, Grounding, Computational modeling, Semantics, Benchmark testing BibRef

Tian, Y.P.[Ya-Peng], Hu, D.[Di], Xu, C.L.[Chen-Liang],
Cyclic Co-Learning of Sounding Object Visual Grounding and Sound Separation,
CVPR21(2744-2753)
IEEE DOI 2111
Training, Visualization, Codes, Grounding, Computational modeling, Pattern recognition BibRef

Nan, G.S.[Guo-Shun], Qiao, R.[Rui], Xiao, Y.[Yao], Liu, J.[Jun], Leng, S.C.[Si-Cong], Zhang, H.[Hao], Lu, W.[Wei],
Interventional Video Grounding with Dual Contrastive Learning,
CVPR21(2764-2774)
IEEE DOI 2111
Visualization, Correlation, Grounding, Benchmark testing, Knowledge discovery, Data models, Pattern recognition BibRef

Zhao, Y.[Yang], Zhao, Z.[Zhou], Zhang, Z.[Zhu], Lin, Z.J.[Zhi-Jie],
Cascaded Prediction Network via Segment Tree for Temporal Video Grounding,
CVPR21(4195-4204)
IEEE DOI 2111
Costs, Grounding, Navigation, Fuses, Benchmark testing, Pattern recognition BibRef

Liu, Y.F.[Yong-Fei], Wan, B.[Bo], Ma, L.[Lin], He, X.M.[Xu-Ming],
Relation-aware Instance Refinement for Weakly Supervised Visual Grounding,
CVPR21(5608-5617)
IEEE DOI 2111
Location awareness, Learning systems, Visualization, Grounding, Semantics, Noise reduction, Benchmark testing BibRef

Liu, H.L.[Hao-Lin], Lin, A.[Anran], Han, X.G.[Xiao-Guang], Yang, L.[Lei], Yu, Y.Z.[Yi-Zhou], Cui, S.G.[Shu-Guang],
Refer-it-in-RGBD: A Bottom-up Approach for 3D Visual Grounding in RGBD Images,
CVPR21(6028-6037)
IEEE DOI 2111
Heating systems, Geometry, Visualization, Grounding, Fuses, Feature extraction BibRef

Lin, X.R.[Xiang-Ru], Li, G.[Guanbin], Yu, Y.Z.[Yi-Zhou],
Scene-Intuitive Agent for Remote Embodied Visual Grounding,
CVPR21(7032-7041)
IEEE DOI 2111
Training, Visualization, Grounding, Navigation, Fuses, Semantics, Pipelines BibRef

Liu, D.Z.[Dai-Zong], Qu, X.Y.[Xiao-Ye], Dong, J.F.[Jian-Feng], Zhou, P.[Pan], Cheng, Y.[Yu], Wei, W.[Wei], Xu, Z.[Zichuan], Xie, Y.[Yulai],
Context-aware Biaffine Localizing Network for Temporal Sentence Grounding,
CVPR21(11230-11239)
IEEE DOI 2111
Location awareness, Codes, Grounding, Cognition, Pattern recognition, Task analysis BibRef

Meng, Z.[Zihang], Yu, L.C.[Li-Cheng], Zhang, N.[Ning], Berg, T.[Tamara], Damavandi, B.[Babak], Singh, V.[Vikas], Bearman, A.[Amy],
Connecting What to Say With Where to Look by Modeling Human Attention Traces,
CVPR21(12674-12683)
IEEE DOI 2111
Measurement, Visualization, Grounding, Unified modeling language, Training data, Transformers BibRef

Sun, M.J.[Ming-Jie], Xiao, J.[Jimin], Lim, E.G.[Eng Gee],
Iterative Shrinking for Referring Expression Grounding Using Deep Reinforcement Learning,
CVPR21(14055-14064)
IEEE DOI 2111
Art, Grounding, Reinforcement learning, Cognition, Pattern recognition, Proposals BibRef

Wang, L.W.[Li-Wei], Huang, J.[Jing], Li, Y.[Yin], Xu, K.[Kun], Yang, Z.Y.[Zheng-Yuan], Yu, D.[Dong],
Improving Weakly Supervised Visual Grounding by Contrastive Knowledge Distillation,
CVPR21(14085-14095)
IEEE DOI 2111
Training, Visualization, Technological innovation, Costs, Grounding, Detectors BibRef

Feng, G.[Guang], Hu, Z.W.[Zhi-Wei], Zhang, L.[Lihe], Lu, H.C.[Hu-Chuan],
Encoder Fusion Network with Co-Attention Embedding for Referring Image Segmentation,
CVPR21(15501-15510)
IEEE DOI 2111
Measurement, Visualization, Image segmentation, Grounding, Semantics, Transforms, Information representation BibRef

Huang, B.B.[Bin-Bin], Lian, D.Z.[Dong-Ze], Luo, W.X.[Wei-Xin], Gao, S.H.[Sheng-Hua],
Look Before You Leap: Learning Landmark Features for One-Stage Visual Grounding,
CVPR21(16883-16892)
IEEE DOI 2111
Visualization, Grounding, Convolution, Heuristic algorithms, Computational modeling, Linguistics BibRef

Zhou, H.[Hao], Zhang, C.Y.[Chong-Yang], Luo, Y.[Yan], Chen, Y.J.[Yan-Jun], Hu, C.P.[Chuan-Ping],
Embracing Uncertainty: Decoupling and De-bias for Robust Temporal Grounding,
CVPR21(8441-8450)
IEEE DOI 2111
Performance evaluation, Uncertainty, Grounding, Annotations, Feature extraction, Robustness BibRef

Khan, A.U.[Aisha Urooj], Kuehne, H.[Hilde], Duarte, K.[Kevin], Gan, C.[Chuang], Lobo, N.[Niels], Shah, M.[Mubarak],
Found a Reason for me? Weakly-supervised Grounded Visual Question Answering using Capsules,
CVPR21(8461-8470)
IEEE DOI 2111
Training, Visualization, Vocabulary, Grounding, Focusing, Detectors, Knowledge discovery BibRef

Zhang, S.Y.[Sheng-Yu], Jiang, T.[Tan], Wang, T.[Tan], Kuang, K.[Kun], Zhao, Z.[Zhou], Zhu, J.[Jianke], Yu, J.[Jin], Yang, H.X.[Hong-Xia], Wu, F.[Fei],
DeVLBert: Out-of-distribution Visio-Linguistic Pretraining with Causality,
CiV21(1744-1747)
IEEE DOI 2109
Visualization, Correlation, Image retrieval, Knowledge discovery BibRef

Nguyen, A.T.[Andre T.], Richards, L.E.[Luke E.], Kebe, G.Y.[Gaoussou Youssouf], Raff, E.[Edward], Darvish, K.[Kasra], Ferraro, F.[Frank], Matuszek, C.[Cynthia],
Practical Cross-modal Manifold Alignment for Robotic Grounded Language Learning,
MULA21(1613-1622)
IEEE DOI 2109
Manifolds, Measurement, Learning systems, Natural languages, Robot sensing systems BibRef

Shrestha, A.[Amar], Pugdeethosapol, K.[Krittaphat], Fang, H.[Haowen], Qiu, Q.[Qinru],
MAGNet: Multi-Region Attention-Assisted Grounding of Natural Language Queries at Phrase Level,
ICPR21(8275-8282)
IEEE DOI 2105
Visualization, Grounding, Fuses, Magnetic resonance imaging, Natural languages, Games, Pattern recognition BibRef

Zhang, Z., Zhao, Z., Zhao, Y., Wang, Q., Liu, H., Gao, L.,
Where Does It Exist: Spatio-Temporal Video Grounding for Multi-Form Sentences,
CVPR20(10665-10674)
IEEE DOI 2008
Grounding, Task analysis, Visualization, Cognition, Feature extraction, Natural languages BibRef

Sadhu, A.[Arka], Chen, K.[Kan], Nevatia, R.[Ram],
Video Object Grounding Using Semantic Roles in Language Description,
CVPR20(10414-10424)
IEEE DOI 2008
Grounds objects in videos referred to in natural language descriptions. Semantics, Encoding, Proposals, Grounding, Visualization, Task analysis, Feature extraction BibRef

Ma, C.Y.[Chih-Yao], Kalantidis, Y.[Yannis], AlRegib, G.[Ghassan], Vajda, P.[Peter], Rohrbach, M.[Marcus], Kira, Z.[Zsolt],
Learning to Generate Grounded Visual Captions Without Localization Supervision,
ECCV20(XVIII:353-370).
Springer DOI 2012
BibRef

Gouthaman, K.V., Mittal, A.[Anurag],
Reducing Language Biases in Visual Question Answering with Visually-grounded Question Encoder,
ECCV20(XIII:18-34).
Springer DOI 2011
BibRef

Zeng, R.H.[Run-Hao], Xu, H.M.[Hao-Ming], Huang, W.B.[Wen-Bing], Chen, P.H.[Pei-Hao], Tan, M.K.[Ming-Kui], Gan, C.[Chuang],
Dense Regression Network for Video Grounding,
CVPR20(10284-10293)
IEEE DOI 2008
Grounding, Training, Task analysis, Proposals, Semantics, Magnetic heads, Feature extraction BibRef

Gupta, T.[Tanmay], Vahdat, A.[Arash], Chechik, G.[Gal], Yang, X.D.[Xiao-Dong], Kautz, J.[Jan], Hoiem, D.[Derek],
Contrastive Learning for Weakly Supervised Phrase Grounding,
ECCV20(III:752-768).
Springer DOI 2012
BibRef

Tan, H.L., Leong, M.C., Xu, Q., Li, L., Fang, F., Cheng, Y., Gauthier, N., Sun, Y., Lim, J.H.,
Task-Oriented Multi-Modal Question Answering For Collaborative Applications,
ICIP20(1426-1430)
IEEE DOI 2011
Task analysis, Collaboration, Grounding, Visualization, Cognition, Training, Machine learning, question answering, corpora BibRef

Yang, S.[Sibei], Li, G.B.[Guan-Bin], Yu, Y.Z.[Yi-Zhou],
Propagating Over Phrase Relations for One-stage Visual Grounding,
ECCV20(XIX:589-605).
Springer DOI 2011
BibRef

Xiao, J.B.[Jun-Bin], Shang, X.[Xindi], Yang, X.[Xun], Tang, S.[Sheng], Chua, T.S.[Tat-Seng],
Visual Relation Grounding in Videos,
ECCV20(VI:447-464).
Springer DOI 2011
Code, Relations.
WWW Link. BibRef

Mun, J., Cho, M., Han, B.,
Local-Global Video-Text Interactions for Temporal Grounding,
CVPR20(10807-10816)
IEEE DOI 2008
Semantics, Feature extraction, Grounding, Visualization, Proposals, Task analysis, Context modeling BibRef

Wu, C., Lin, Z., Cohen, S., Bui, T., Maji, S.,
PhraseCut: Language-Based Image Segmentation in the Wild,
CVPR20(10213-10222)
IEEE DOI 2008
Visualization, Grounding, Image segmentation, Task analysis, Genomics, Bioinformatics, Natural languages BibRef

Selvaraju, R.R., Tendulkar, P., Parikh, D., Horvitz, E., Ribeiro, M.T., Nushi, B., Kamar, E.,
SQuINTing at VQA Models: Introspecting VQA Models With Sub-Questions,
CVPR20(10000-10008)
IEEE DOI 2008
Cognition, Task analysis, Visualization, Image color analysis, Grounding, Text recognition, Computational modeling BibRef

Chen, L.[Lei], Zhai, M.Y.[Meng-Yao], He, J.W.[Jia-Wei], Mori, G.[Greg],
Object Grounding via Iterative Context Reasoning,
MDALC19(1407-1415)
IEEE DOI 2004
Localize a set of queries in the image. image classification, image representation, image segmentation, inference mechanisms, iterative methods, query processing, weakly supervised learning BibRef

Sinha, A.[Abhishek], Akilesh, B., Sarkar, M.[Mausoom], Krishnamurthy, B.[Balaji],
Attention Based Natural Language Grounding by Navigating Virtual Environment,
WACV19(236-244)
IEEE DOI 1904
learning (artificial intelligence), natural language processing, virtual reality, Grounding BibRef

Selvaraju, R.R., Lee, S., Shen, Y., Jin, H., Ghosh, S., Heck, L., Batra, D., Parikh, D.,
Taking a HINT: Leveraging Explanations to Make Vision and Language Models More Grounded,
ICCV19(2591-2600)
IEEE DOI 2004
gradient methods, image retrieval, natural language processing, neural nets, question answering (information retrieval), HINT, Correlation BibRef

Zhang, Y., Niebles, J.C., Soto, A.,
Interpretable Visual Question Answering by Visual Grounding From Attention Supervision Mining,
WACV19(349-357)
IEEE DOI 1904
data mining, data visualisation, image representation, learning (artificial intelligence), Computer architecture BibRef

Shi, J.[Jing], Xu, J.[Jia], Gong, B.Q.[Bo-Qing], Xu, C.L.[Chen-Liang],
Not All Frames Are Equal: Weakly-Supervised Video Grounding With Contextual Similarity and Visual Clustering Losses,
CVPR19(10436-10444).
IEEE DOI 2002
BibRef

Datta, S.[Samyak], Sikka, K.[Karan], Roy, A.[Anirban], Ahuja, K.[Karuna], Parikh, D.[Devi], Divakaran, A.[Ajay],
Align2Ground: Weakly Supervised Phrase Grounding Guided by Image-Caption Alignment,
ICCV19(2601-2610)
IEEE DOI 2004
image representation, image retrieval, learning (artificial intelligence), Image coding BibRef

Fang, Z.Y.[Zhi-Yuan], Kong, S.[Shu], Fowlkes, C.C.[Charless C.], Yang, Y.Z.[Ye-Zhou],
Modularized Textual Grounding for Counterfactual Resilience,
CVPR19(6371-6381).
IEEE DOI 2002
BibRef

Liu, X.J.[Xue-Jing], Li, L.[Liang], Wang, S.H.[Shu-Hui], Zha, Z.J.[Zheng-Jun], Meng, D.C.[De-Chao], Huang, Q.M.[Qing-Ming],
Adaptive Reconstruction Network for Weakly Supervised Referring Expression Grounding,
ICCV19(2611-2620)
IEEE DOI 2004
Localize the object in the image from a query. feature extraction, image classification, image reconstruction, image retrieval, Adaptive systems BibRef

Zhuang, B., Wu, Q., Shen, C., Reid, I.D., van den Hengel, A.J.[Anton J.],
Parallel Attention: A Unified Framework for Visual Object Discovery Through Dialogs and Queries,
CVPR18(4252-4261)
IEEE DOI 1812
Visualization, Task analysis, Cognition, Proposals, Grounding, Correlation BibRef

Yang, Z.Y.[Zheng-Yuan], Chen, T.L.[Tian-Lang], Wang, L.W.[Li-Wei], Luo, J.B.[Jie-Bo],
Improving One-Stage Visual Grounding by Recursive Sub-query Construction,
ECCV20(XIV:387-404).
Springer DOI 2011
Code, Query.
WWW Link. BibRef

Zhang, H.W.[Han-Wang], Niu, Y.L.[Yu-Lei], Chang, S.F.[Shih-Fu],
Grounding Referring Expressions in Images by Variational Context,
CVPR18(4158-4166)
IEEE DOI 1812
Grounding, Context modeling, Task analysis, Visualization, Pediatrics, Bayes methods, Natural languages BibRef

Yu, L.C.[Li-Cheng], Lin, Z.[Zhe], Shen, X.H.[Xiao-Hui], Yang, J.M.[Ji-Mei], Lu, X.[Xin], Bansal, M.[Mohit], Berg, T.L.[Tamara L.],
MAttNet: Modular Attention Network for Referring Expression Comprehension,
CVPR18(1307-1315)
IEEE DOI 1812
Localize image region described by natural language expression. Visualization, Computational modeling, Task analysis, Cats, Adaptation models, Feature extraction, Knowledge discovery BibRef

Liu, D.Q.[Da-Qing], Zhang, H.W.[Han-Wang], Zha, Z.J.[Zheng-Jun], Wu, F.[Feng],
Learning to Assemble Neural Module Tree Networks for Visual Grounding,
ICCV19(4672-4681)
IEEE DOI 2004
approximation theory, data visualisation, grammars, learning (artificial intelligence), Training BibRef

Sadhu, A., Chen, K., Nevatia, R.,
Zero-Shot Grounding of Objects From Natural Language Queries,
ICCV19(4693-4702)
IEEE DOI 2004
image classification, learning (artificial intelligence), Visualization, natural language processing, object detection, query processing BibRef

Yang, Z.Y.[Zheng-Yuan], Gong, B.Q.[Bo-Qing], Wang, L.W.[Li-Wei], Huang, W.B.[Wen-Bing], Yu, D.[Dong], Luo, J.B.[Jie-Bo],
A Fast and Accurate One-Stage Approach to Visual Grounding,
ICCV19(4682-4692)
IEEE DOI 2004
document image processing, feature extraction, image fusion, image segmentation, natural language processing, Encoding BibRef

Rohrbach, A.[Anna], Rohrbach, M.[Marcus], Tang, S.[Siyu], Oh, S.J.[Seong Joon], Schiele, B.[Bernt],
Generating Descriptions with Grounded and Co-referenced People,
CVPR17(4196-4206)
IEEE DOI 1711
Movie description. Grounding, Head, Joining processes, Motion pictures, Videos, Visualization BibRef

Zhu, Y., Kiros, R., Zemel, R., Salakhutdinov, R., Urtasun, R., Torralba, A.B., Fidler, S.,
Aligning Books and Movies: Towards Story-Like Visual Explanations by Watching Movies and Reading Books,
ICCV15(19-27)
IEEE DOI 1602
Grounding BibRef

Chapter on Implementations and Applications, Databases, QBIC, Video Analysis, Hardware and Software, Inspection continues in
Internet Label Information.


Last update: Jan 29, 2023 at 20:54:24