Chen, Z.X.[Zhi-Xuan],
Bie, Y.[Yequan],
Jin, H.B.[Hai-Bo],
Chen, H.[Hao],
Large Language Model With Region-Guided Referring and Grounding for
CT Report Generation,
MedImg(44), No. 8, August 2025, pp. 3139-3150.
IEEE DOI Code:
WWW Link.
2508
Computed tomography, Grounding, Feature extraction, Training,
Medical diagnostic imaging, Accuracy, Geometry, Lungs, Visualization,
large language model
BibRef
Liu, Y.[Yi],
Hou, H.W.[Hao-Wen],
Ma, F.[Fei],
Ni, S.G.[Shi-Guang],
Yu, F.R.[Fei Richard],
MLLM-TA: Leveraging Multimodal Large Language Models for Precise
Temporal Video Grounding,
SPLetters(32), 2025, pp. 281-285.
IEEE DOI
2501
Visualization, Grounding, Large language models, Feature extraction,
Benchmark testing, Vectors, Training, video grounding
BibRef
Li, G.Z.[Guo-Zhang],
Ding, X.P.[Xin-Peng],
Cheng, D.[De],
Li, J.[Jie],
Wang, N.N.[Nan-Nan],
Gao, X.B.[Xin-Bo],
ETC: Temporal Boundary Expand Then Clarify for Weakly Supervised
Video Grounding With Multimodal Large Language Model,
MultMed(27), 2025, pp. 1772-1782.
IEEE DOI
2504
Proposals, Grounding, Visualization, Annotations, Noise measurement,
Location awareness, Large language models, Data augmentation,
video grounding
BibRef
Yu, C.L.[Chun-Lin],
Wang, H.Q.[Han-Qing],
Shi, Y.[Ye],
Luo, H.Y.[Hao-Yang],
Yang, S.[Sibei],
Yu, J.Y.[Jing-Yi],
Wang, J.Y.[Jing-Ya],
SeqAfford: Sequential 3D Affordance Reasoning via Multimodal Large
Language Model,
CVPR25(1691-1701)
IEEE DOI Code:
WWW Link.
2508
Solid modeling, Grounding, Affordances, Large language models,
Benchmark testing, Cognition, Intelligent agents, Context modeling,
multi-modal large language model
BibRef
Huang, Y.[Yangyu],
Gao, T.Y.[Tian-Yi],
Xu, H.R.[Hao-Ran],
Zhao, Q.H.[Qi-Hao],
Song, Y.[Yang],
Gui, Z.P.[Zhi-Peng],
Lv, T.C.[Teng-Chao],
Chen, H.[Hao],
Cui, L.[Lei],
Li, S.[Scarlett],
Wei, F.[Furu],
PEACE: Empowering Geologic Map Holistic Understanding with MLLMs,
CVPR25(3899-3908)
IEEE DOI Code:
WWW Link.
2508
Hands, Grounding, Geology, Large language models, Earthquakes,
Feature extraction, Information retrieval, benchmark
BibRef
Chen, W.B.[Wen-Bo],
Xu, Z.[Zhen],
Xu, R.[Ruotao],
Wu, S.[Si],
Wong, H.S.[Hau-San],
Task-aware Cross-modal Feature Refinement Transformer with Large
Language Models for Visual Grounding,
CVPR25(3931-3941)
IEEE DOI
2508
Bridges, Visualization, Grounding, Large language models, Semantics,
Transformers, Feature extraction, Feeds, visual grounding, multimodal
BibRef
Wu, S.[Size],
Jin, S.[Sheng],
Zhang, W.W.[Wen-Wei],
Xu, L.[Lumin],
Liu, W.T.[Wen-Tao],
Li, W.[Wei],
Loy, C.C.[Chen Change],
F-LMM: Grounding Frozen Large Multimodal Models,
CVPR25(24710-24721)
IEEE DOI Code:
WWW Link.
2508
Visualization, Codes, Attention mechanisms, Grounding,
Oral communication, Benchmark testing, Cognition, Decoding,
visual segmentation
BibRef
Qian, R.[Rui],
Yin, X.[Xin],
Dou, D.[Dejing],
Reasoning to Attend: Try to Understand How <SEG> Token Works,
CVPR25(24722-24731)
IEEE DOI Code:
WWW Link.
2508
Visualization, Vocabulary, Grounding, Computational modeling,
Large language models, Semantics, Cognition, Decoding,
large multimodal models
BibRef
Chen, Y.[Yanyuan],
Xu, D.[Dexuan],
Huang, Y.[Yu],
Zhan, S.K.[Song-Kun],
Wang, H.[Hanpin],
Chen, D.X.[Dong-Xue],
Wang, X.P.[Xue-Ping],
Qiu, M.[Meikang],
Li, H.[Hang],
MIMO: A medical vision language model with visual referring
multimodal input and pixel grounding multimodal output,
CVPR25(24732-24741)
IEEE DOI Code:
WWW Link.
2508
Visualization, Grounding, Terminology, Large language models,
Computational modeling, Semantics,
medical visual question answering
BibRef
Huang, H.F.[Hai-Feng],
Chen, X.[Xinyi],
Chen, Y.L.[Yi-Lun],
Li, H.[Hao],
Han, X.[Xiaoshen],
Wang, Z.[Zehan],
Wang, T.[Tai],
Pang, J.M.[Jiang-Miao],
Zhao, Z.[Zhou],
RoboGround: Robotic Manipulation with Grounded Vision-Language Priors,
CVPR25(22540-22550)
IEEE DOI
2508
Codes, Grounding, Shape, Pipelines, robotic manipulation,
grounded large vision-language models
BibRef
Man, Y.Z.[Yun-Ze],
Huang, D.A.[De-An],
Liu, G.L.[Gui-Lin],
Sheng, S.W.[Shi-Wei],
Liu, S.L.[Shi-Long],
Gui, L.Y.[Liang-Yan],
Kautz, J.[Jan],
Wang, Y.X.[Yu-Xiong],
Yu, Z.[Zhiding],
Argus: Vision-Centric Reasoning with Grounded Chain-of-Thought,
CVPR25(14268-14280)
IEEE DOI
2508
Visualization, Accuracy, Grounding, Large language models,
Benchmark testing, Cognition
BibRef
Yin, H.[Heng],
Ren, Y.Q.[Yu-Qiang],
Yan, K.[Ke],
Ding, S.H.[Shou-Hong],
Hao, Y.T.[Yong-Tao],
ROD-MLLM: Towards More Reliable Object Detection in Multimodal Large
Language Models,
CVPR25(14358-14368)
IEEE DOI
2508
Location awareness, Visualization, Grounding, Annotations,
Large language models, Pipelines, Training data, Object detection,
visual grounding
BibRef
Liao, Y.H.[Yuan-Hong],
Mahmood, R.[Rafid],
Fidler, S.[Sanja],
Acuna, D.[David],
Can Large Vision-Language Models Correct Semantic Grounding Errors By
Themselves?,
CVPR25(14667-14678)
IEEE DOI
2508
Training, Vocabulary, Accuracy, Grounding, Statistical analysis,
Semantics, Refining, Training data, Data models, Iterative methods,
self-correction
BibRef
Yuan, Z.H.[Zhi-Hao],
Peng, Y.[Yibo],
Ren, J.[Jinke],
Liao, Y.H.[Ying-Hong],
Han, Y.[Yatong],
Feng, C.M.[Chun-Mei],
Zhao, H.S.[Heng-Shuang],
Li, G.B.[Guan-Bin],
Cui, S.G.[Shu-Guang],
Li, Z.[Zhen],
Empowering Large Language Models with 3D Situation Awareness,
CVPR25(19435-19445)
IEEE DOI
2508
Grounding, Large language models, Manuals, Observers,
Data models, Trajectory, Videos, point cloud, vision and language
BibRef
Kang, S.[Seil],
Kim, J.[Jinyeong],
Kim, J.[Junhyeok],
Hwang, S.J.[Seong Jae],
Your Large Vision-Language Model Only Needs A Few Attention Heads For
Visual Grounding,
CVPR25(9339-9350)
IEEE DOI
2508
Location awareness, Visualization, Image segmentation, Head,
Attention mechanisms, Grounding, Semantics, Text to image,
large vision-language model
BibRef
Liu, Q.Y.[Qian-Yi],
Zhang, S.Q.[Si-Qi],
Qiao, Y.[Yanyuan],
Zhu, J.[Junyou],
Li, X.[Xiang],
Guo, L.[Longteng],
Wang, Q.[Qunbo],
He, X.J.[Xing-Jian],
Wu, Q.[Qi],
Liu, J.[Jing],
GroundingMate: Aiding Object Grounding for Goal-Oriented
Vision-and-Language Navigation,
WACV25(1775-1784)
IEEE DOI
2505
Bridges, Navigation, Grounding, Large language models, Computational modeling,
Natural languages, Cognition, Data mining, Object recognition
BibRef
Yan, S.[Siming],
Bai, M.[Min],
Chen, W.F.[Wei-Feng],
Zhou, X.[Xiong],
Huang, Q.X.[Qi-Xing],
Li, L.E.[Li Erran],
Vigor: Improving Visual Grounding of Large Vision Language Models with
Fine-grained Reward Modeling,
ECCV24(LXI: 37-53).
Springer DOI
2412
BibRef
Chowdhury, S.[Sanjoy],
Nag, S.[Sayan],
Dasgupta, S.[Subhrajyoti],
Chen, J.[Jun],
Elhoseiny, M.[Mohamed],
Gao, R.H.[Ruo-Han],
Manocha, D.[Dinesh],
Meerkat: Audio-visual Large Language Model for Grounding in Space and
Time,
ECCV24(LXIV: 52-70).
Springer DOI
2412
BibRef
Kuckreja, K.[Kartik],
Danish, M.S.[Muhammad Sohail],
Naseer, M.[Muzammal],
Das, A.[Abhijit],
Khan, S.[Salman],
Khan, F.S.[Fahad Shahbaz],
GeoChat: Grounded Large Vision-Language Model for Remote Sensing,
CVPR24(27831-27840)
IEEE DOI
2410
Visualization, Scene classification, Grounding, Oral communication,
Object detection, Benchmark testing, Data models
BibRef
Song, C.H.[Chan Hee],
Sadler, B.M.[Brian M.],
Wu, J.[Jiaman],
Chao, W.L.[Wei-Lun],
Washington, C.[Clayton],
Su, Y.[Yu],
LLM-Planner: Few-Shot Grounded Planning for Embodied Agents with
Large Language Models,
ICCV23(2986-2997)
IEEE DOI
2401
BibRef
You, K.[Keen],
Zhang, H.T.[Hao-Tian],
Schoop, E.[Eldon],
Weers, F.[Floris],
Swearngin, A.[Amanda],
Nichols, J.[Jeffrey],
Yang, Y.F.[Yin-Fei],
Gan, Z.[Zhe],
FERRET-UI: Grounded Mobile UI Understanding with Multimodal LLMs,
ECCV24(LXIV: 240-255).
Springer DOI
2412
BibRef
Tong, S.B.[Sheng-Bang],
Liu, Z.[Zhuang],
Zhai, Y.X.[Yue-Xiang],
Ma, Y.[Yi],
LeCun, Y.[Yann],
Xie, S.[Saining],
Eyes Wide Shut? Exploring the Visual Shortcomings of Multimodal LLMs,
CVPR24(9568-9578)
IEEE DOI
2410
Representation learning, Visualization, Systematics, Correlation,
Grounding, Large language models, Multimodal LLMs, Vision Language Model
BibRef
Xu, J.R.[Jia-Rui],
Zhou, X.Y.[Xing-Yi],
Yan, S.[Shen],
Gu, X.[Xiuye],
Arnab, A.[Anurag],
Sun, C.[Chen],
Wang, X.L.[Xiao-Long],
Schmid, C.[Cordelia],
Pixel Aligned Language Models,
CVPR24(13030-13039)
IEEE DOI
2410
Location awareness, Visualization, Grounding,
Large language models, Machine vision, Computational modeling
BibRef
Wu, P.H.[Peng-Hao],
Xie, S.[Saining],
V*: Guided Visual Search as a Core Mechanism in Multimodal LLMs,
CVPR24(13084-13094)
IEEE DOI
2410
Training, Visualization, Grounding, Computational modeling, Seals,
Benchmark testing, multimodal large language model,
visual search
BibRef
He, R.[Ruozhen],
Cascante-Bonilla, P.[Paola],
Yang, Z.Y.[Zi-Yan],
Berg, A.C.[Alexander C.],
Ordonez, V.[Vicente],
Improved Visual Grounding through Self-Consistent Explanations,
CVPR24(13095-13105)
IEEE DOI
2410
Location awareness, Visualization, Vocabulary, Grounding,
Large language models, Data augmentation, Data models,
visual grounding
BibRef
Feng, C.[Chun],
Hsu, J.[Joy],
Liu, W.Y.[Wei-Yu],
Wu, J.J.[Jia-Jun],
Naturally Supervised 3D Visual Grounding with Language-Regularized
Concept Learners,
CVPR24(13269-13278)
IEEE DOI
2410
Visualization, Solid modeling, Accuracy, Grounding,
Large language models, 3D visual grounding, Language constraints
BibRef
He, J.W.[Jun-Wen],
Wang, Y.F.[Yi-Fan],
Wang, L.J.[Li-Jun],
Lu, H.C.[Hu-Chuan],
He, J.Y.[Jun-Yan],
Lan, J.P.[Jin-Peng],
Luo, B.[Bin],
Xie, X.[Xuansong],
Multi-Modal Instruction Tuned LLMs with Fine-Grained Visual
Perception,
CVPR24(13980-13990)
IEEE DOI Code:
WWW Link.
2410
Image segmentation, Visualization, Technological innovation,
Grounding, Computational modeling, Large language models, Natural languages
BibRef
Huang, B.[Bin],
Wang, X.[Xin],
Chen, H.[Hong],
Song, Z.[Zihan],
Zhu, W.W.[Wen-Wu],
VTimeLLM: Empower LLM to Grasp Video Moments,
CVPR24(14271-14280)
IEEE DOI Code:
WWW Link.
2410
Training, Visualization, Grounding, Large language models,
Benchmark testing, Cognition
BibRef
Yuan, Z.H.[Zhi-Hao],
Ren, J.[Jinke],
Feng, C.M.[Chun-Mei],
Zhao, H.S.[Heng-Shuang],
Cui, S.G.[Shu-Guang],
Li, Z.[Zhen],
Visual Programming for Zero-Shot Open-Vocabulary 3D Visual Grounding,
CVPR24(20623-20633)
IEEE DOI Code:
WWW Link.
2410
Visualization, Vocabulary, Grounding, Annotations, Navigation,
Large language models, Visual Grounding, Point Cloud, Vision and Language
BibRef
Chen, G.[Gongwei],
Shen, L.[Leyang],
Shao, R.[Rui],
Deng, X.[Xiang],
Nie, L.Q.[Li-Qiang],
LION: Empowering Multimodal Large Language Model with Dual-Level
Visual Knowledge,
CVPR24(26530-26540)
IEEE DOI
2410
Visualization, Accuracy, Grounding, Large language models, Semantics,
Benchmark testing
BibRef
Qu, M.X.[Meng-Xue],
Chen, X.D.[Xiao-Dong],
Liu, W.[Wu],
Li, A.[Alicia],
Zhao, Y.[Yao],
ChatVTG: Video Temporal Grounding via Chat with Video Dialogue Large
Language Models,
PVUW24(1847-1856)
IEEE DOI
2410
Grounding, Annotations, Large language models, Supervised learning,
Natural languages
BibRef
Zhang, Y.[Yichi],
Ma, Z.Q.[Zi-Qiao],
Gao, X.F.[Xiao-Feng],
Shakiah, S.[Suhaila],
Gao, Q.[Qiaozi],
Chai, J.[Joyce],
Groundhog: Grounding Large Language Models to Holistic Segmentation,
CVPR24(14227-14238)
IEEE DOI
2410
Training, Visualization, Grounding, Shape, Large language models,
Semantics, Feature extraction, Multi-Modal, Language Grounding,
Vision-Language Model
BibRef
Kim, K.[Kibum],
Yoon, K.[Kanghoon],
Jeon, J.[Jaehyeong],
In, Y.[Yeonjun],
Moon, J.[Jinyoung],
Kim, D.H.[Dong-Hyun],
Park, C.[Chanyoung],
LLM4SGG: Large Language Models for Weakly Supervised Scene Graph
Generation,
CVPR24(28306-28316)
IEEE DOI Code:
WWW Link.
2410
Training, Visualization, Grounding, Large language models, Semantics,
Genomics, Focusing, Scene Understanding, Large Language Model,
Long-Tail Problem
BibRef
Chapter on Implementations and Applications, Databases, QBIC, Video Analysis, Hardware and Software, Inspection continues in
Visual Grounding in Visual Question Answering.