19.4.3.2.2 Visual Grounding, Grounding Expressions

Question Answer. Grounding. Visual Grounding. Visual Dialog. Mostly a subset of the related section:
See also Visual Question Answering, Query, VQA, Visual Dialog.

Visual7W visual question answering,
Large-scale visual question answering (QA) dataset, with object-level groundings and multimodal answers. WWW Link.
Dataset, Visual Question Answering.

Liang, J.W.[Jun-Wei], Jiang, L.[Lu], Cao, L.L.[Liang-Liang], Kalantidis, Y.[Yannis], Li, L.J.[Li-Jia], Hauptmann, A.G.[Alexander G.],
Focal Visual-Text Attention for Memex Question Answering,
PAMI(41), No. 8, August 2019, pp. 1893-1908.
IEEE DOI 1907
BibRef
Earlier: A1, A2, A3, A5, A6, Only:
Focal Visual-Text Attention for Visual Question Answering,
CVPR18(6135-6143)
IEEE DOI 1812
Task analysis, Knowledge discovery, Visualization, Grounding, Metadata, Cognition, Photo albums, question answering, memex. Visualization, Videos, Computational modeling, Correlation. BibRef

Riquelme, F.[Felipe], de Goyeneche, A.[Alfredo], Zhang, Y.D.[Yun-Dong], Niebles, J.C.[Juan Carlos], Soto, A.[Alvaro],
Explaining VQA predictions using visual grounding and a knowledge base,
IVC(101), 2020, pp. 103968.
Elsevier DOI 2009
Deep Learning, Attention, Supervision, Knowledge Base, Interpretability, Explainability BibRef

Niu, Y.L.[Yu-Lei], Zhang, H.W.[Han-Wang], Lu, Z.W.[Zhi-Wu], Chang, S.F.[Shih-Fu],
Variational Context: Exploiting Visual and Textual Context for Grounding Referring Expressions,
PAMI(43), No. 1, January 2021, pp. 347-359.
IEEE DOI 2012
Grounding, Context modeling, Visualization, Task analysis, Pediatrics, Bayes methods, Annotations, referring expression generation BibRef

Yang, S.[Sibei], Li, G.[Guanbin], Yu, Y.Z.[Yi-Zhou],
Relationship-Embedded Representation Learning for Grounding Referring Expressions,
PAMI(43), No. 8, August 2021, pp. 2765-2779.
IEEE DOI 2107
BibRef
Earlier:
Cross-Modal Relationship Inference for Grounding Referring Expressions,
CVPR19(4140-4149).
IEEE DOI 2002
Locate the object instance in an image described by a referring expression. Visualization, Semantics, Grounding, Proposals, Data mining, Logic gates, Feature extraction, Referring expressions, gated graph convolutional network. Locate the target object based on natural language descriptions. BibRef

Yang, Z.Y.[Zheng-Yuan], Kumar, T.[Tushar], Chen, T.L.[Tian-Lang], Su, J.S.[Jing-Song], Luo, J.B.[Jie-Bo],
Grounding-Tracking-Integration,
CirSysVideo(31), No. 9, September 2021, pp. 3433-3443.
IEEE DOI 2109
Grounding, Target tracking, Visualization, History, Task analysis, Object tracking, Annotations, Tracking by language BibRef

Zhang, W.X.[Wei-Xia], Ma, C.[Chao], Wu, Q.[Qi], Yang, X.K.[Xiao-Kang],
Language-Guided Navigation via Cross-Modal Grounding and Alternate Adversarial Learning,
CirSysVideo(31), No. 9, September 2021, pp. 3469-3481.
IEEE DOI 2109
Navigation, Training, Trajectory, Visualization, Task analysis, Grounding, Generators, Vision-and-language, embodied navigation, adversarial learning BibRef

Zhai, S.L.[Song-Lin], Guo, G.B.[Gui-Bing], Yuan, F.J.[Fa-Jie], Liu, Y.[Yuan], Wang, X.W.[Xing-Wei],
VSE-fs: Fast Full-Sample Visual Semantic Embedding,
IEEE_Int_Sys(36), No. 4, July 2021, pp. 3-12.
IEEE DOI 2109
Construct a joint embedding space between visual features and semantic information. Computational modeling, Training, Integrated circuits, Time complexity, Semantics, Visualization, Intelligent systems, Negative Sampling BibRef

Sun, M.J.[Ming-Jie], Xiao, J.[Jimin], Lim, E.G.[Eng Gee], Liu, S.[Si], Goulermas, J.Y.[John Y.],
Discriminative Triad Matching and Reconstruction for Weakly Referring Expression Grounding,
PAMI(43), No. 11, November 2021, pp. 4189-4195.
IEEE DOI 2110
Image reconstruction, Training, Proposals, Visualization, Task analysis, Linguistics, Grounding, discriminative triad matching BibRef

Bargal, S.A.[Sarah Adel], Zunino, A.[Andrea], Petsiuk, V.[Vitali], Zhang, J.M.[Jian-Ming], Saenko, K.[Kate], Murino, V.[Vittorio], Sclaroff, S.[Stan],
Guided Zoom: Zooming into Network Evidence to Refine Fine-Grained Model Decisions,
PAMI(43), No. 11, November 2021, pp. 4196-4202.
IEEE DOI 2110
Grounding, Training, Predictive models, Annotations, Location awareness, Correlation, Visualization, Explainable AI, convolutional neural networks BibRef


Zhang, S.Y.[Sheng-Yu], Jiang, T.[Tan], Wang, T.[Tan], Kuang, K.[Kun], Zhao, Z.[Zhou], Zhu, J.[Jianke], Yu, J.[Jin], Yang, H.X.[Hong-Xia], Wu, F.[Fei],
DeVLBert: Out-of-distribution Visio-Linguistic Pretraining with Causality,
CiV21(1744-1747)
IEEE DOI 2109
Visualization, Correlation, Image retrieval, Computer architecture, Knowledge discovery BibRef

Nguyen, A.T.[Andre T.], Richards, L.E.[Luke E.], Kebe, G.Y.[Gaoussou Youssouf], Raff, E.[Edward], Darvish, K.[Kasra], Ferraro, F.[Frank], Matuszek, C.[Cynthia],
Practical Cross-modal Manifold Alignment for Robotic Grounded Language Learning,
MULA21(1613-1622)
IEEE DOI 2109
Manifolds, Measurement, Learning systems, Natural languages, Robot sensing systems BibRef

Shrestha, A.[Amar], Pugdeethosapol, K.[Krittaphat], Fang, H.[Haowen], Qiu, Q.[Qinru],
MAGNet: Multi-Region Attention-Assisted Grounding of Natural Language Queries at Phrase Level,
ICPR21(8275-8282)
IEEE DOI 2105
Visualization, Grounding, Fuses, Magnetic resonance imaging, Natural languages, Games, Pattern recognition BibRef

Zhang, Z., Zhao, Z., Zhao, Y., Wang, Q., Liu, H., Gao, L.,
Where Does It Exist: Spatio-Temporal Video Grounding for Multi-Form Sentences,
CVPR20(10665-10674)
IEEE DOI 2008
Grounding, Task analysis, Visualization, Cognition, Feature extraction, Natural languages BibRef

Burns, A.[Andrea], Tan, R.[Reuben], Saenko, K.[Kate], Sclaroff, S.[Stan], Plummer, B.[Bryan],
Language Features Matter: Effective Language Representations for Vision-Language Tasks,
ICCV19(7473-7482)
IEEE DOI 2004
Code, Visualization.
WWW Link. data visualisation, graph theory, image representation, learning (artificial intelligence), Grounding BibRef

Sadhu, A.[Arka], Chen, K.[Kan], Nevatia, R.[Ram],
Video Object Grounding Using Semantic Roles in Language Description,
CVPR20(10414-10424)
IEEE DOI 2008
Grounds objects in videos referred to in natural language descriptions. Semantics, Encoding, Proposals, Grounding, Visualization, Task analysis, Feature extraction BibRef

Ma, C.Y.[Chih-Yao], Kalantidis, Y.[Yannis], AlRegib, G.[Ghassan], Vajda, P.[Peter], Rohrbach, M.[Marcus], Kira, Z.[Zsolt],
Learning to Generate Grounded Visual Captions Without Localization Supervision,
ECCV20(XVIII:353-370).
Springer DOI 2012
BibRef

Zeng, R.H.[Run-Hao], Xu, H.M.[Hao-Ming], Huang, W.B.[Wen-Bing], Chen, P.H.[Pei-Hao], Tan, M.K.[Ming-Kui], Gan, C.[Chuang],
Dense Regression Network for Video Grounding,
CVPR20(10284-10293)
IEEE DOI 2008
Grounding, Training, Task analysis, Proposals, Semantics, Magnetic heads, Feature extraction BibRef

Gupta, T.[Tanmay], Vahdat, A.[Arash], Chechik, G.[Gal], Yang, X.D.[Xiao-Dong], Kautz, J.[Jan], Hoiem, D.[Derek],
Contrastive Learning for Weakly Supervised Phrase Grounding,
ECCV20(III:752-768).
Springer DOI 2012
BibRef

Tan, H.L., Leong, M.C., Xu, Q., Li, L., Fang, F., Cheng, Y., Gauthier, N., Sun, Y., Lim, J.H.,
Task-Oriented Multi-Modal Question Answering For Collaborative Applications,
ICIP20(1426-1430)
IEEE DOI 2011
Task analysis, Collaboration, Grounding, Visualization, Cognition, Training, Machine learning, question answering, corpora BibRef

Yang, S.[Sibei], Li, G.B.[Guan-Bin], Yu, Y.Z.[Yi-Zhou],
Propagating Over Phrase Relations for One-stage Visual Grounding,
ECCV20(XIX:589-605).
Springer DOI 2011
BibRef

Xiao, J.B.[Jun-Bin], Shang, X.[Xindi], Yang, X.[Xun], Tang, S.[Sheng], Chua, T.S.[Tat-Seng],
Visual Relation Grounding in Videos,
ECCV20(VI:447-464).
Springer DOI 2011
Code, Relations.
WWW Link. BibRef

Mun, J., Cho, M., Han, B.,
Local-Global Video-Text Interactions for Temporal Grounding,
CVPR20(10807-10816)
IEEE DOI 2008
Semantics, Feature extraction, Grounding, Visualization, Proposals, Task analysis, Context modeling BibRef

Wu, C., Lin, Z., Cohen, S., Bui, T., Maji, S.,
PhraseCut: Language-Based Image Segmentation in the Wild,
CVPR20(10213-10222)
IEEE DOI 2008
Visualization, Grounding, Image segmentation, Task analysis, Genomics, Bioinformatics, Natural languages BibRef

Selvaraju, R.R., Tendulkar, P., Parikh, D., Horvitz, E., Tulio Ribeiro, M., Nushi, B., Kamar, E.,
SQuINTing at VQA Models: Introspecting VQA Models With Sub-Questions,
CVPR20(10000-10008)
IEEE DOI 2008
Cognition, Task analysis, Visualization, Image color analysis, Grounding, Text recognition, Computational modeling BibRef

Chen, L.[Lei], Zhai, M.Y.[Meng-Yao], He, J.W.[Jia-Wei], Mori, G.[Greg],
Object Grounding via Iterative Context Reasoning,
MDALC19(1407-1415)
IEEE DOI 2004
Localize a set of queries in the image. image classification, image representation, image segmentation, inference mechanisms, iterative methods, query processing, weakly supervised learning BibRef

Zhang, Y., Niebles, J.C., Soto, A.,
Interpretable Visual Question Answering by Visual Grounding From Attention Supervision Mining,
WACV19(349-357)
IEEE DOI 1904
data mining, data visualisation, image representation, learning (artificial intelligence), Computer architecture BibRef

Shi, J.[Jing], Xu, J.[Jia], Gong, B.[Boqing], Xu, C.L.[Chen-Liang],
Not All Frames Are Equal: Weakly-Supervised Video Grounding With Contextual Similarity and Visual Clustering Losses,
CVPR19(10436-10444).
IEEE DOI 2002
BibRef

Datta, S.[Samyak], Sikka, K.[Karan], Roy, A.[Anirban], Ahuja, K.[Karuna], Parikh, D.[Devi], Divakaran, A.[Ajay],
Align2Ground: Weakly Supervised Phrase Grounding Guided by Image-Caption Alignment,
ICCV19(2601-2610)
IEEE DOI 2004
image representation, image retrieval, learning (artificial intelligence), Image coding BibRef

Fang, Z.Y.[Zhi-Yuan], Kong, S.[Shu], Fowlkes, C.C.[Charless C.], Yang, Y.Z.[Ye-Zhou],
Modularized Textual Grounding for Counterfactual Resilience,
CVPR19(6371-6381).
IEEE DOI 2002
BibRef

Liu, X.J.[Xue-Jing], Li, L.[Liang], Wang, S.H.[Shu-Hui], Zha, Z.J.[Zheng-Jun], Meng, D.C.[De-Chao], Huang, Q.M.[Qing-Ming],
Adaptive Reconstruction Network for Weakly Supervised Referring Expression Grounding,
ICCV19(2611-2620)
IEEE DOI 2004
Localize the object in the image from a query. feature extraction, image classification, image reconstruction, image retrieval, Adaptive systems BibRef

Zhuang, B., Wu, Q., Shen, C., Reid, I.D., van den Hengel, A.J.[Anton J.],
Parallel Attention: A Unified Framework for Visual Object Discovery Through Dialogs and Queries,
CVPR18(4252-4261)
IEEE DOI 1812
Visualization, Task analysis, Cognition, Proposals, Grounding, Computer vision, Correlation BibRef

Yang, Z.Y.[Zheng-Yuan], Chen, T.L.[Tian-Lang], Wang, L.[Liwei], Luo, J.B.[Jie-Bo],
Improving One-Stage Visual Grounding by Recursive Sub-query Construction,
ECCV20(XIV:387-404).
Springer DOI 2011
Code, Query.
WWW Link. BibRef

Zhang, H.W.[Han-Wang], Niu, Y.L.[Yu-Lei], Chang, S.F.[Shih-Fu],
Grounding Referring Expressions in Images by Variational Context,
CVPR18(4158-4166)
IEEE DOI 1812
Grounding, Context modeling, Task analysis, Visualization, Pediatrics, Bayes methods, Natural languages BibRef

Yu, L.C.[Li-Cheng], Lin, Z.[Zhe], Shen, X.H.[Xiao-Hui], Yang, J.M.[Ji-Mei], Lu, X.[Xin], Bansal, M.[Mohit], Berg, T.L.[Tamara L.],
MAttNet: Modular Attention Network for Referring Expression Comprehension,
CVPR18(1307-1315)
IEEE DOI 1812
Localize image region described by natural language expression. Visualization, Computational modeling, Task analysis, Cats, Adaptation models, Feature extraction, Knowledge discovery BibRef

Liu, D.Q.[Da-Qing], Zhang, H.W.[Han-Wang], Zha, Z.J.[Zheng-Jun], Wu, F.[Feng],
Learning to Assemble Neural Module Tree Networks for Visual Grounding,
ICCV19(4672-4681)
IEEE DOI 2004
approximation theory, data visualisation, grammars, learning (artificial intelligence), Training BibRef

Sadhu, A., Chen, K., Nevatia, R.,
Zero-Shot Grounding of Objects From Natural Language Queries,
ICCV19(4693-4702)
IEEE DOI 2004
image classification, learning (artificial intelligence), Visualization, natural language processing, object detection, query processing BibRef

Yang, Z.Y.[Zheng-Yuan], Gong, B.Q.[Bo-Qing], Wang, L.W.[Li-Wei], Huang, W.B.[Wen-Bing], Yu, D.[Dong], Luo, J.B.[Jie-Bo],
A Fast and Accurate One-Stage Approach to Visual Grounding,
ICCV19(4682-4692)
IEEE DOI 2004
document image processing, feature extraction, image fusion, image segmentation, natural language processing, Encoding BibRef

Rohrbach, A.[Anna], Rohrbach, M.[Marcus], Tang, S.[Siyu], Oh, S.J.[Seong Joon], Schiele, B.[Bernt],
Generating Descriptions with Grounded and Co-referenced People,
CVPR17(4196-4206)
IEEE DOI 1711
Movie description. Grounding, Head, Joining processes, Motion pictures, Videos, Visualization BibRef

Zhu, Y., Kiros, R., Zemel, R., Salakhutdinov, R., Urtasun, R., Torralba, A.B., Fidler, S.,
Aligning Books and Movies: Towards Story-Like Visual Explanations by Watching Movies and Reading Books,
ICCV15(19-27)
IEEE DOI 1602
Grounding BibRef



Last update: Oct 11, 2021 at 11:04:06