Zhao, Z.[Zihao],
Wang, S.[Sheng],
Gu, J.[Jinchen],
Zhu, Y.[Yitao],
Mei, L.[Lanzhuju],
Zhuang, Z.X.[Zi-Xu],
Cui, Z.M.[Zhi-Ming],
Wang, Q.[Qian],
Shen, D.G.[Ding-Gang],
ChatCAD+: Toward a Universal and Reliable Interactive CAD Using LLMs,
MedImg(43), No. 11, November 2024, pp. 3755-3766.
IEEE DOI
2411
Solid modeling, Reliability, Medical diagnostic imaging, Chatbots,
Visualization, Brain modeling, Databases, Large language models,
computer-assisted diagnosis
BibRef
Luo, H.[Haonan],
Zeng, Y.J.[Yi-Jie],
Yang, L.[Li],
Chen, K.[Kexun],
Shen, Z.X.[Zhi-Xuan],
Lv, F.[Fengmao],
VLAI: Exploration and Exploitation based on Visual-Language Aligned
Information for Robotic Object Goal Navigation,
IVC(151), 2024, pp. 105259.
Elsevier DOI Code:
WWW Link.
2411
Object goal navigation, Visual-to-language,
Embodied artificial intelligence, Large language model
BibRef
Mansourian, A.[Ali],
Oucheikh, R.[Rachid],
ChatGeoAI: Enabling Geospatial Analysis for Public through Natural
Language, with Large Language Models,
IJGI(13), No. 10, 2024, pp. 348.
DOI Link
2411
BibRef
Li, D.[Diya],
Zhao, Y.[Yue],
Wang, Z.F.[Zhi-Fang],
Jung, C.[Calvin],
Zhang, Z.[Zhe],
Large Language Model-Driven Structured Output: A Comprehensive
Benchmark and Spatial Data Generation Framework,
IJGI(13), No. 11, 2024, pp. 405.
DOI Link
2412
BibRef
Li, Y.X.[Yun-Xin],
Hu, B.T.[Bao-Tian],
Chen, X.Y.[Xin-Yu],
Ma, L.[Lin],
Xu, Y.[Yong],
Zhang, M.[Min],
LMEye: An Interactive Perception Network for Large Language Models,
MultMed(26), 2024, pp. 10952-10964.
IEEE DOI
2412
Visualization, Task analysis, Data models, Tuning, Large language models,
Training, Cognition, interactive perception network
BibRef
Shao, R.[Run],
Zhang, Z.Y.[Zhao-Yang],
Tao, C.[Chao],
Zhang, Y.S.[Yun-Sheng],
Peng, C.L.[Chang-Le],
Li, H.F.[Hai-Feng],
Homogeneous tokenizer matters: Homogeneous visual tokenizer for
remote sensing image understanding,
PandRS(218), 2024, pp. 294-310.
Elsevier DOI Code:
WWW Link.
2412
Remote sensing image understanding, Visual tokenizer,
Homogeneous, Semantically independent region, Visual transformer model
BibRef
Wang, Z.H.[Zhe-Hui],
Luo, T.[Tao],
Liu, C.[Cheng],
Liu, W.C.[Wei-Chen],
Goh, R.S.M.[Rick Siow Mong],
Wong, W.F.[Weng-Fai],
Enabling Energy-Efficient Deployment of Large Language Models on
Memristor Crossbar: A Synergy of Large and Small,
PAMI(47), No. 2, February 2025, pp. 916-933.
IEEE DOI
2501
Memristors, Random access memory,
Nonvolatile memory, Computational modeling, Neural networks
BibRef
Zhan, Y.[Yang],
Xiong, Z.[Zhitong],
Yuan, Y.[Yuan],
SkyEyeGPT: Unifying remote sensing vision-language tasks via
instruction tuning with large language model,
PandRS(221), 2025, pp. 64-77.
Elsevier DOI
2503
Remote sensing vision-language, Large language model,
Multi-modal, Instruction tuning
BibRef
Zhu, Y.[Yong],
Wen, Z.Y.[Zhen-Yu],
Li, X.[Xiong],
Shi, X.F.[Xiu-Fang],
Wu, X.[Xiang],
Dong, H.[Hui],
Chen, J.M.[Ji-Ming],
ChatNav: Leveraging LLM to Zero-Shot Semantic Reasoning in Object
Navigation,
CirSysVideo(35), No. 3, March 2025, pp. 2369-2381.
IEEE DOI
2503
Semantics, Navigation, Robots, Cognition, TV, Accuracy, Chatbots,
Large language models, Decision making, Pipelines,
gravity-repulsion model
BibRef
Marasco, E.[Emanuela],
Bourlai, T.[Thirimachos],
Enhancing trust in Large Language Models for streamlined
decision-making in military operations,
IVC(158), 2025, pp. 105489.
Elsevier DOI
2505
Machine unlearning, Military, Trustworthy AI, Large Language Models
BibRef
Qiao, D.[Dewen],
Ao, X.[Xiang],
Liu, Y.[Yu],
Chen, X.T.[Xue-Tao],
Song, F.Y.[Fu-Yuan],
Qin, Z.[Zheng],
Jin, W.Q.[Wen-Qiang],
Tri-AFLLM: Resource-Efficient Adaptive Asynchronous Accelerated
Federated LLMs,
CirSysVideo(35), No. 5, May 2025, pp. 4198-4211.
IEEE DOI
2505
Training, Computational modeling, Adaptation models, Data models,
Accuracy, Optimization, Servers, Data privacy, Prompt engineering,
momentum gradient descent
BibRef
Zhang, Y.X.[Yi-Xuan],
Liu, C.B.[Chuan-Bin],
Liu, Y.Z.[Yi-Zhi],
Gao, Y.F.[Yi-Fan],
Lu, Z.Y.[Zhi-Ying],
Xie, H.T.[Hong-Tao],
Zhang, Y.D.[Yong-Dong],
Leveraging Concise Concepts With Probabilistic Modeling for
Interpretable Visual Recognition,
MultMed(27), 2025, pp. 3117-3131.
IEEE DOI
2506
Visualization, Probabilistic logic, Semantics, Training, Redundancy,
Predictive models, Large language models, Adaptation models,
probabilistic modeling
BibRef
Chen, L.F.[Ling-Feng],
Hu, P.[Panhe],
Pan, Z.L.[Zhi-Liang],
Liu, Q.[Qi],
Zhang, S.H.[Shuang-Hui],
Liu, Z.[Zhen],
Large Language Models Can Achieve Explainable and Training-Free
One-Shot HRRP ATR,
SPLetters(32), 2025, pp. 3395-3399.
IEEE DOI
2509
Indexes, Target recognition, Scattering, Radar, Training,
Large language models, Frequency-domain analysis, Data mining,
in-context learning
BibRef
Yang, S.Y.[Song-Yuan],
Yu, W.J.[Wei-Jiang],
Yang, W.J.[Wen-Jing],
Liu, X.W.[Xin-Wang],
Tan, H.B.[Hui-Bin],
Lan, L.[Long],
Xiao, N.[Nong],
WildVideo: Benchmarking LMMs for Understanding Video-Language
Interaction,
PAMI(47), No. 10, October 2025, pp. 9330-9344.
IEEE DOI
2510
Videos, Benchmark testing, Visualization, Cognition, Training,
Oral communication, Data mining,
video question answering
BibRef
Han, Y.D.[Yu-Dong],
Guo, Q.[Qingpei],
Pan, L.Y.[Li-Yuan],
Liu, L.[Liu],
Guan, Y.[Yu],
Yang, M.[Ming],
DynFocus: Dynamic Cooperative Network Empowers LLMs with Video
Understanding,
CVPR25(8512-8522)
IEEE DOI Code:
WWW Link.
2508
Visualization, Computational modeling, Redundancy,
Statistical learning, Semantics, Cooperative systems, dynamic network
BibRef
Liu, Y.[Yexin],
Liang, Z.Y.[Zheng-Yang],
Wang, Y.Z.[Yue-Ze],
Wu, X.F.[Xian-Feng],
Tang, F.L.[Fei-Long],
He, M.[Muyang],
Li, J.[Jian],
Liu, Z.[Zheng],
Yang, H.[Harry],
Lim, S.[Sernam],
Zhao, B.[Bo],
Unveiling the Ignorance of MLLMs: Seeing Clearly, Answering
Incorrectly,
CVPR25(9087-9097)
IEEE DOI
2508
Training, Measurement, Visualization, Pipelines, Refining,
Benchmark testing, Robustness, Decoding, Tuning, MLLM, benchmark,
visual understanding
BibRef
Wang, Z.T.[Zhen-Ting],
Hu, S.M.[Shu-Ming],
Zhao, S.Y.[Shi-Yu],
Lin, X.W.[Xiao-Wen],
Juefei-Xu, F.[Felix],
Li, Z.[Zhuowei],
Han, L.[Ligong],
Subramanyam, H.[Harihar],
Chen, L.[Li],
Chen, J.[Jianfa],
Jiang, N.[Nan],
Lyu, L.[Lingjuan],
Ma, S.Q.[Shi-Qing],
Metaxas, D.N.[Dimitris N.],
Jain, A.[Ankit],
MLLM-as-a-Judge for Image Safety without Human Labeling,
CVPR25(14657-14666)
IEEE DOI
2508
Visualization, Image synthesis, Large language models, Media,
Cognition, Safety, Labeling
BibRef
Zhu, M.[Muzhi],
Tian, Y.Z.[Yu-Zhuo],
Chen, H.[Hao],
Zhou, C.[Chunluan],
Guo, Q.[Qingpei],
Liu, Y.[Yang],
Yang, M.[Ming],
Shen, C.H.[Chun-Hua],
SegAgent: Exploring Pixel Understanding Capabilities in MLLMs by
Imitating Human Annotator Trajectories,
CVPR25(3686-3696)
IEEE DOI Code:
WWW Link.
2508
Visualization, Protocols, Annotations, Filtering, Decision making,
Stars, Robustness, Trajectory, Visual perception, mllm, VLM, agent
BibRef
Zhu, L.[Lanyun],
Chen, T.R.[Tian-Run],
Xu, Q.X.[Qian-Xiong],
Liu, X.[Xuanyi],
Ji, D.[Deyi],
Wu, H.Y.[Hai-Yang],
Soh, D.W.[De Wen],
Liu, J.[Jun],
POPEN: Preference-Based Optimization and Ensemble for LVLM-Based
Reasoning Segmentation,
CVPR25(30231-30240)
IEEE DOI
2508
Learning systems, Attention mechanisms, Accuracy, Design methodology,
Computational modeling, Optimization methods, Ensemble learning
BibRef
Niu, J.[Junbo],
Li, Y.F.[Yi-Fei],
Miao, Z.Y.[Zi-Yang],
Ge, C.J.[Chun-Jiang],
Zhou, Y.H.[Yuan-Hang],
He, Q.H.[Qi-Hao],
Dong, X.Y.[Xiao-Yi],
Duan, H.D.[Hao-Dong],
Ding, S.[Shuangrui],
Qian, R.[Rui],
Zhang, P.[Pan],
Zang, Y.H.[Yu-Hang],
Cao, Y.H.[Yu-Hang],
He, C.H.[Cong-Hui],
Wang, J.Q.[Jia-Qi],
OVO-Bench: How Far is Your Video-LLMs from Real-World Online Video
Understanding?,
CVPR25(18902-18913)
IEEE DOI Code:
WWW Link.
2508
Analytical models, Adaptation models, Pipelines, Benchmark testing,
Real-time systems, Cognition, Delays, Videos
BibRef
Xue, X.Y.[Xiang-Yuan],
Lu, Z.[Zeyu],
Huang, D.[Di],
Wang, Z.D.[Zi-Dong],
Ouyang, W.L.[Wan-Li],
Bai, L.[Lei],
ComfyBench: Benchmarking LLM-based Agents in ComfyUI for Autonomously
Designing Collaborative AI Systems,
CVPR25(24614-24624)
IEEE DOI Code:
WWW Link.
2508
Codes, Semantics, Collaboration, Benchmark testing, Closed loop systems,
Artificial intelligence, Complex systems, Multi-agent systems
BibRef
Zhao, Z.[Zijia],
Huo, Y.Q.[Yu-Qi],
Yue, T.T.[Tong-Tian],
Guo, L.T.[Long-Teng],
Lu, H.Y.[Hao-Yu],
Wang, B.N.[Bing-Ning],
Chen, W.P.[Wei-Peng],
Liu, J.[Jing],
Efficient Motion-Aware Video MLLM,
CVPR25(24159-24168)
IEEE DOI
2508
Analytical models, Visualization, Costs, Fuses, Scalability,
Redundancy, Semantics, Benchmark testing, Vectors, Videos
BibRef
Wu, R.H.[Rong-Huan],
Su, W.[Wanchao],
Liao, J.[Jing],
Chat2SVG: Vector Graphics Generation with Large Language Models and
Image Diffusion Models,
CVPR25(23690-23700)
IEEE DOI Code:
WWW Link.
2508
Training, Visualization, Shape, Large language models, Layout,
Pipelines, Diffusion models, Vectors, Complexity theory,
image diffusion model
BibRef
Yang, S.[Senqiao],
Chen, Y.[Yukang],
Tian, Z.[Zhuotao],
Wang, C.Y.[Cheng-Yao],
Li, J.Y.[Jing-Yao],
Yu, B.[Bei],
Jia, J.Y.[Jia-Ya],
VisionZip: Longer is Better but Not Necessary in Vision Language
Models,
CVPR25(19792-19802)
IEEE DOI Code:
WWW Link.
2508
Visualization, Analytical models, Computational modeling,
Redundancy, Video sequences, Performance gain, Feature extraction,
vision language model
BibRef
Xie, J.Y.[Jing-Yi],
Yang, J.T.[Jin-Tao],
Luo, Z.[Zhunchen],
Cao, Y.[Yunbo],
Gao, Q.[Qiang],
Zhang, M.Y.[Meng-Yuan],
Hu, W.P.[Wen-Peng],
AdaDARE-y: Balancing Stability and Plasticity in Multi-modal LLMs
through Efficient Adaptation,
CVPR25(19758-19768)
IEEE DOI
2508
Adaptation models, Visualization, Technological innovation,
Large language models, Computational modeling
BibRef
Tao, K.[Keda],
Qin, C.[Can],
You, H.X.[Hao-Xuan],
Sui, Y.[Yang],
Wang, H.[Huan],
DyCoke: Dynamic Compression of Tokens for Fast Video Large Language
Models,
CVPR25(18992-19001)
IEEE DOI
2508
Training, Visualization, Image coding, Large language models,
Redundancy, Merging, Decoding, Iterative decoding, Videos, token compression
BibRef
Tao, C.X.[Chen-Xin],
Su, S.Q.[Shi-Qian],
Zhu, X.Z.[Xi-Zhou],
Zhang, C.Y.[Chen-Yu],
Chen, Z.[Zhe],
Liu, J.[Jiawen],
Wang, W.H.[Wen-Hai],
Lu, L.W.[Le-Wei],
Huang, G.[Gao],
Qiao, Y.[Yu],
Dai, J.F.[Ji-Feng],
HoVLE: Unleashing the Power of Monolithic Vision-Language Models with
Holistic Vision-Language Embedding,
CVPR25(14559-14569)
IEEE DOI
2508
Training, Visualization, Large language models, Predictive models,
Encoding, Data models, Tuning, Faces
BibRef
Yin, H.[Hao],
Si, G.Z.[Guang-Zong],
Wang, Z.[Zilei],
Lifting the Veil on Visual Information Flow in MLLMs: Unlocking
Pathways to Faster Inference,
CVPR25(9382-9391)
IEEE DOI Code:
WWW Link.
2508
Visualization, Codes, Large language models, Perturbation methods,
Computational modeling, Semantics, Information processing,
attention mechanism
BibRef
Yang, L.R.[Long-Rong],
Shen, D.[Dong],
Cai, C.X.[Chao-Xiang],
Chen, K.B.[Kai-Bing],
Yang, F.[Fan],
Gao, T.T.[Ting-Ting],
Zhang, D.[Di],
Li, X.[Xi],
Libra-Merging: Importance-Redundancy and Pruning-Merging Trade-Off
for Acceleration Plug-In in Large Vision-Language Model,
CVPR25(9402-9412)
IEEE DOI Code:
WWW Link.
2508
Visualization, Costs, Codes, Merging, Faces
BibRef
Liang, Y.[Yinan],
Wang, Z.W.[Zi-Wei],
Xu, X.W.[Xiu-Wei],
Zhou, J.[Jie],
Lu, J.W.[Ji-Wen],
EfficientLLaVA: Generalizable Auto-Pruning for Large Vision-language
Models,
CVPR25(9445-9454)
IEEE DOI Code:
WWW Link.
2508
Training, Visualization, Accuracy, Upper bound, Risk minimization,
Costs, Refining, Training data, Data models, Cognition
BibRef
Heo, M.[Miran],
Chen, M.H.[Min-Hung],
Huang, D.A.[De-An],
Liu, S.[Sifei],
Radhakrishnan, S.[Subhashree],
Kim, S.J.[Seon Joo],
Wang, Y.C.A.F.[Yu-Chi-Ang Frank],
Hachiuma, R.[Ryo],
Omni-RGPT: Unifying Image and Video Region-level Understanding via
Token Marks,
CVPR25(3919-3930)
IEEE DOI
2508
Bridges, Visualization, Target tracking, Large language models,
Benchmark testing, Commonsense reasoning, Videos
BibRef
Ouali, Y.[Yassine],
Bulat, A.[Adrian],
Xenos, A.[Alexandros],
Zaganidis, A.[Anestis],
Metaxas, I.M.[Ioannis Maniadis],
Martinez, B.[Brais],
Tzimiropoulos, G.[Georgios],
VladVA: Discriminative Fine-tuning of LVLMs,
CVPR25(4101-4111)
IEEE DOI
2508
Training, Representation learning, Adaptation models,
Computational modeling, Benchmark testing, Predictive models, Standards
BibRef
Schnaus, D.[Dominik],
Araslanov, N.[Nikita],
Cremers, D.[Daniel],
It's a (Blind) Match! Towards Vision-Language Correspondence without
Parallel Data,
CVPR25(24983-24992)
IEEE DOI
2508
Accuracy, Foundation models, Annotations, Computational modeling,
Semantics, Optimal matching, vision-language models,
representation learning
BibRef
Zhao, Y.Q.[Ya-Qi],
Yin, Y.Y.[Yuan-Yang],
Li, L.[Lin],
Lin, M.[Mingan],
Huang, V.S.J.[Victor Shea-Jay],
Chen, S.W.[Si-Wei],
Chen, W.P.[Wei-Peng],
Yin, B.[Baoqun],
Zhou, Z.[Zenan],
Zhang, W.T.[Wen-Tao],
Beyond Sight: Towards Cognitive Alignment in LVLM via Enriched Visual
Knowledge,
CVPR25(24950-24959)
IEEE DOI
2508
Visualization, Accuracy, Large language models, Buildings, Faces
BibRef
Ye, X.[Xubing],
Gan, Y.[Yukang],
Huang, X.[Xiaoke],
Ge, Y.X.[Yi-Xiao],
Tang, Y.S.[Yan-Song],
VoCo-LLaMA: Towards Vision Compression with Large Language Models,
CVPR25(29836-29846)
IEEE DOI Code:
WWW Link.
2508
Training, Visualization, Image coding, Correlation,
Large language models, Force, Computational efficiency, Tuning,
multimodal learning
BibRef
Hu, Y.[Yangliu],
Song, Z.K.[Zi-Kai],
Feng, N.[Na],
Luo, Y.[Yawei],
Yu, J.Q.[Jun-Qing],
Chen, Y.P.P.[Yi-Ping Phoebe],
Yang, W.[Wei],
SF2T: Self-supervised Fragment Finetuning of Video-LLMs for
Fine-Grained Understanding,
CVPR25(29108-29117)
IEEE DOI
2508
Training, Visualization, Annotations, Large language models,
Natural languages, Benchmark testing, Propulsion, Videos
BibRef
Chen, J.[Joya],
Zeng, Z.Y.[Zi-Yun],
Lin, Y.Q.[Yi-Qi],
Li, W.[Wei],
Ma, Z.[Zejun],
Shou, M.Z.[Mike Zheng],
Live: Learning Video LLM with Streaming Speech Transcription at Scale,
CVPR25(29083-29095)
IEEE DOI
2508
Training, Video on demand, Computational modeling, Training data,
Benchmark testing, Real-time systems,
Videos
BibRef
Wang, Z.W.[Zi-Wei],
Chen, W.Z.[Wei-Zhi],
Yang, L.[Leyang],
Zhou, S.[Sheng],
Zhao, S.[Shengchu],
Zhan, H.[Hanbei],
Jin, J.C.[Jiong-Chao],
Li, L.C.[Liang-Cheng],
Shao, Z.[Zirui],
Bu, J.J.[Jia-Jun],
MP-GUI: Modality Perception with MLLMs for GUI Understanding,
CVPR25(29711-29721)
IEEE DOI Code:
WWW Link.
2508
Training, Visualization, Semantics, Pipelines, Training data,
Feature extraction, Spatial databases, Graphical user interfaces,
Synthetic data
BibRef
Vayani, A.[Ashmal],
Dissanayake, D.[Dinura],
Watawana, H.[Hasindri],
Ahsan, N.[Noor],
Sasikumar, N.[Nevasini],
Thawakar, O.[Omkar],
Ademtew, H.B.[Henok Biadglign],
Hmaiti, Y.[Yahya],
Kumar, A.[Amandeep],
Kuckreja, K.[Kartik],
Maslych, M.[Mykola],
Ghallabi, W.A.[Wafa Al],
Mihaylov, M.[Mihail],
Qin, C.[Chao],
Shaker, A.M.[Abdelrahman M.],
Zhang, M.[Mike],
Ihsani, M.K.[Mahardika Krisna],
Esplana, A.[Amiel],
Gokani, M.[Monil],
Mirkin, S.[Shachar],
Singh, H.[Harsh],
Srivastava, A.[Ashay],
Hamerlik, E.[Endre],
Izzati, F.A.[Fathinah Asma],
Maani, F.A.[Fadillah Adamsyah],
Cavada, S.[Sebastian],
Chim, J.[Jenny],
Gupta, R.[Rohit],
Manjunath, S.[Sanjay],
Zhumakhanova, K.[Kamila],
Rabevohitra, F.H.[Feno Heriniaina],
Amirudin, A.[Azril],
Ridzuan, M.[Muhammad],
Kareem, D.[Daniya],
More, K.[Ketan],
Li, K.[Kunyang],
Shakya, P.[Pramesh],
Saad, M.[Muhammad],
Ghasemaghaei, A.[Amirpouya],
Djanibekov, A.[Amirbek],
Azizov, D.[Dilshod],
Jankovic, B.[Branislava],
Bhatia, N.[Naman],
Cabrera, A.[Alvaro],
Obando-Ceron, J.[Johan],
Otieno, O.[Olympiah],
Farestam, F.[Fabian],
Rabbani, M.[Muztoba],
Baliah, S.[Sanoojan],
Sanjeev, S.[Santosh],
Shtanchaev, A.[Abduragim],
Fatima, M.[Maheen],
Nguyen, T.[Thao],
Kareem, A.[Amrin],
Aremu, T.[Toluwani],
Xavier, N.[Nathan],
Bhatkal, A.[Amit],
Toyin, H.[Hawau],
Chadha, A.[Aman],
Cholakkal, H.[Hisham],
Anwer, R.M.[Rao Muhammad],
Felsberg, M.[Michael],
Laaksonen, J.[Jorma],
Solorio, T.[Thamar],
Choudhury, M.[Monojit],
Laptev, I.[Ivan],
Shah, M.[Mubarak],
Khan, S.[Salman],
Khan, F.S.[Fahad Shahbaz],
All Languages Matter: Evaluating LMMs on Culturally Diverse 100
Languages,
CVPR25(19565-19575)
IEEE DOI Code:
WWW Link.
2508
Visualization, Sensitivity, Benchmark testing, Germanium,
Distance measurement, Cognition, Multilingual,
multilingual multimodal benchmark
BibRef
Cao, A.[Anjia],
Wei, X.[Xing],
Ma, Z.H.[Zhi-Heng],
FLAME: Frozen Large Language Models Enable Data-Efficient
Language-Image Pre-training,
CVPR25(4080-4090)
IEEE DOI Code:
WWW Link.
2508
Large language models, Semantics, Fires, Text to image,
Data augmentation, Multilingual, Faces
BibRef
Bi, J.[Jing],
Guo, J.J.[Jun-Jia],
Tang, Y.L.[Yun-Long],
Wen, L.G.B.[Liang-Gong Bruce],
Liu, Z.[Zhang],
Wang, B.J.[Bing-Jie],
Xu, C.L.[Chen-Liang],
Unveiling Visual Perception in Language Models: An Attention Head
Analysis Approach,
CVPR25(4135-4144)
IEEE DOI
2508
Visualization, Adaptation models, Systematics, Correlation,
Large language models, Linguistics, Data models, Visual perception,
llm
BibRef
Li, S.[Shiyao],
Hu, Y.C.[Ying-Chun],
Ning, X.F.[Xue-Fei],
Liu, X.H.[Xi-Hui],
Hong, K.[Ke],
Jia, X.T.[Xiao-Tao],
Li, X.[Xiuhong],
Yan, Y.Q.[Ya-Qi],
Ran, P.[Pei],
Dai, G.H.[Guo-Hao],
Yan, S.[Shengen],
Yang, H.Z.[Hua-Zhong],
Wang, Y.[Yu],
MBQ: Modality-Balanced Quantization for Large Vision-Language Models,
CVPR25(4167-4177)
IEEE DOI Code:
WWW Link.
2508
Quantization (signal), Sensitivity, Accuracy, Fuses,
Large language models, Graphics processing units, Calibration,
Kernel
BibRef
Liu, Z.[Zhuoming],
Li, Y.Q.[Yi-Quan],
Nguyen, K.D.[Khoi Duc],
Zhong, Y.[Yiwu],
Li, Y.[Yin],
PAVE: Patching and Adapting Video Large Language Models,
CVPR25(3306-3317)
IEEE DOI Code:
WWW Link.
2508
Adaptation models, Solid modeling, Large language models,
Computational modeling, Cognition, multi-modality
BibRef
Malakouti, S.[Sina],
Aghazadeh, A.[Aysan],
Khandelwal, A.[Ashmit],
Kovashka, A.[Adriana],
Benchmarking VLMs' Reasoning About Persuasive Atypical Images,
WACV25(4788-4798)
IEEE DOI
2505
Visualization, Codes, Large language models, Focusing, Media,
Benchmark testing, Cognition, Data mining, Object recognition
BibRef
Lee, H.[Hankyeol],
Seo, G.[Gawon],
Choi, W.[Wonseok],
Jung, G.[Geunyoung],
Song, K.[Kyungwoo],
Jung, J.Y.[Ji-Young],
Enhancing Visual Classification Using Comparative Descriptors,
WACV25(5274-5283)
IEEE DOI Code:
WWW Link.
2505
Measurement, Visualization, Accuracy, Filtering,
Computational modeling, Large language models, Semantics,
Image classification
BibRef
Ee, Y.K.[Yeo Keat],
Zhang, H.[Hao],
Matyasko, A.[Alexander],
Fernando, B.[Basura],
Deduce and Select Evidences with Language Models for Training-Free
Video Goal Inference,
WACV25(5937-5947)
IEEE DOI
2505
Visualization, Accuracy, Filtering, Large language models,
Computational modeling, Robustness, Cognition, training-free
BibRef
Fu, R.[Rao],
Liu, J.Y.[Jing-Yu],
Chen, X.[Xilun],
Nie, Y.X.[Yi-Xin],
Xiong, W.H.[Wen-Han],
Scene-LLM: Extending Language Model for 3D Visual Reasoning,
WACV25(2195-2206)
IEEE DOI
2505
Location awareness, Solid modeling, Visualization,
Large language models, Cognition, 3d understanding
BibRef
Awais, M.[Muhammad],
Alharthi, A.H.S.A.[Ali Husain Salem Abdulla],
Kumar, A.[Amandeep],
Cholakkal, H.[Hisham],
Anwer, R.M.[Rao Muhammad],
AgroGPT: Efficient Agricultural Vision-Language Model with Expert
Tuning,
WACV25(5687-5696)
IEEE DOI Code:
WWW Link.
2505
Codes, Computational modeling, Large language models, Pipelines,
Oral communication, Agriculture, Data models, Tuning
BibRef
Kruzhkov, E.[Evgenii],
Behnke, S.[Sven],
LiLMaps: Learnable Implicit Language Maps,
WACV25(7711-7720)
IEEE DOI
2505
Visualization, Large language models, Human-robot interaction,
Object detection, Solids, Market research, Decoding, Optimization,
incremental implicit mapping
BibRef
Singh, C.K.[Chandan Kumar],
Kumar, D.[Devesh],
Sanap, V.[Vipul],
Sinha, R.[Rajesh],
LLM-RSPF: Large Language Model-Based Robotic System Planning
Framework for Domain Specific Use-cases,
WACV25(7277-7286)
IEEE DOI
2505
Solid modeling, Accuracy, Systematics, Service robots, Ontologies,
Throughput, Robustness, Planning, Robots, coht, task planning
BibRef
Sun, L.[Li],
Ahuja, C.[Chaitanya],
Chen, P.[Peng],
D'Zmura, M.[Matt],
Batmanghelich, K.[Kayhan],
Bontrager, P.[Philip],
Multi-Modal Large Language Models are Effective Vision Learners,
WACV25(8617-8626)
IEEE DOI
2505
Representation learning, Resistance, Visualization, Large language models,
Feature extraction, Robustness, Data models, multi-modal
BibRef
Tateno, M.[Masatoshi],
Yagi, T.[Takuma],
Furuta, R.[Ryosuke],
Sato, Y.[Yoichi],
Learning Multiple Object States from Actions via Large Language
Models,
WACV25(9555-9565)
IEEE DOI
2505
Analytical models, Accuracy, Annotations, Computational modeling,
Large language models, Catalysts, Multi label classification
BibRef
Bahadir, C.D.[Cagla Deniz],
Akar, G.B.[Gozde B.],
Sabuncu, M.R.[Mert R.],
LLM-Generated Rewrite and Context Modulation for Enhanced Vision
Language Models in Digital Pathology,
WACV25(327-336)
IEEE DOI
2505
Training, Pathology, Sensitivity, Computational modeling, Modulation,
Text to image, Standards, Context modeling, Biomedical imaging,
large language models
BibRef
Chu, X.X.[Xiang-Xiang],
Su, J.L.[Jian-Lin],
Zhang, B.[Bo],
Shen, C.H.[Chun-Hua],
VisionLLaMA: A Unified LLaMA Backbone for Vision Tasks,
ECCV24(LXVI: 1-18).
Springer DOI
2412
Code:
WWW Link.
BibRef
Long, F.C.[Fu-Chen],
Qiu, Z.F.[Zhao-Fan],
Yao, T.[Ting],
Mei, T.[Tao],
VideoStudio: Generating Consistent-content and Multi-scene Videos,
ECCV24(LX: 468-485).
Springer DOI
2412
Code:
WWW Link.
BibRef
Kong, X.H.[Xiang-Hao],
Chen, J.[Jinyu],
Wang, W.G.[Wen-Guan],
Su, H.[Hang],
Hu, X.L.[Xiao-Lin],
Yang, Y.[Yi],
Liu, S.[Si],
Controllable Navigation Instruction Generation with Chain of Thought
Prompting,
ECCV24(XXIX: 37-54).
Springer DOI
2412
Instruction generation.
BibRef
Zhu, W.Y.C.[William Yi-Cheng],
Ye, K.[Keren],
Ke, J.J.[Jun-Jie],
Yu, J.H.[Jia-Hui],
Guibas, L.J.[Leonidas J.],
Milanfar, P.[Peyman],
Yang, F.[Feng],
ARTVLM: Attribute Recognition Through Vision-based Prefix Language
Modeling,
ECCV24(XXVII: 127-145).
Springer DOI
2412
Code:
WWW Link.
BibRef
Kim, D.[Donggyun],
Cho, S.[Seongwoong],
Kim, S.[Semin],
Luo, C.[Chong],
Hong, S.[Seunghoon],
Chameleon: A Data-efficient Generalist for Dense Visual Prediction in
the Wild,
ECCV24(XXIII: 422-441).
Springer DOI
2412
Code:
WWW Link.
BibRef
Ke, F.[Fucai],
Cai, Z.X.[Zhi-Xi],
Jahangard, S.[Simindokht],
Wang, W.Q.[Wei-Qing],
Haghighi, P.D.[Pari Delir],
Rezatofighi, H.[Hamid],
Hydra: A Hyper Agent for Dynamic Compositional Visual Reasoning,
ECCV24(XX: 132-149).
Springer DOI
2412
BibRef
Bao, X.Y.[Xiao-Yi],
Sun, S.Y.[Si-Yang],
Ma, S.L.[Shuai-Lei],
Zheng, K.C.[Ke-Cheng],
Guo, Y.X.[Yu-Xin],
Zhao, G.S.[Guo-Sheng],
Zheng, Y.[Yun],
Wang, X.G.[Xin-Gang],
CoReS: Orchestrating the Dance of Reasoning and Segmentation,
ECCV24(XVIII: 187-204).
Springer DOI
2412
BibRef
Liu, Z.[Zuyan],
Liu, B.[Benlin],
Wang, J.H.[Jia-Hui],
Dong, Y.H.[Yu-Hao],
Chen, G.Y.[Guang-Yi],
Rao, Y.M.[Yong-Ming],
Krishna, R.[Ranjay],
Lu, J.W.[Ji-Wen],
Efficient Inference of Vision Instruction-following Models with Elastic
Cache,
ECCV24(XVII: 54-69).
Springer DOI
2412
Code:
WWW Link.
BibRef
Alaluf, Y.[Yuval],
Richardson, E.[Elad],
Tulyakov, S.[Sergey],
Aberman, K.[Kfir],
Cohen-Or, D.[Daniel],
MYVLM: Personalizing VLMS for User-specific Queries,
ECCV24(XIII: 73-91).
Springer DOI
2412
BibRef
Ma, Z.X.[Zi-Xian],
Huang, W.[Weikai],
Zhang, J.[Jieyu],
Gupta, T.[Tanmay],
Krishna, R.[Ranjay],
m&m's: A Benchmark to Evaluate Tool-use for multi-step multi-modal
Tasks,
ECCV24(X: 18-34).
Springer DOI
2412
WWW Link. and
WWW Link.
BibRef
Miao, Y.[Yang],
Engelmann, F.[Francis],
Vysotska, O.[Olga],
Zhao, Z.H.[Zhong-Han],
Chai, W.H.[Wen-Hao],
Wang, X.[Xuan],
Li, B.[Boyi],
Hao, S.Y.[Sheng-Yu],
Cao, S.D.[Shi-Dong],
Ye, T.[Tian],
Wang, G.A.[Gao-Ang],
See and Think: Embodied Agent in Virtual Environment,
ECCV24(VIII: 187-204).
Springer DOI
2412
BibRef
Liu, Y.[Yuan],
Duan, H.D.[Hao-Dong],
Zhang, Y.H.[Yuan-Han],
Li, B.[Bo],
Zhang, S.Y.[Song-Yang],
Zhao, W.[Wangbo],
Yuan, Y.[Yike],
Wang, J.Q.[Jia-Qi],
He, C.H.[Cong-Hui],
Liu, Z.W.[Zi-Wei],
Chen, K.[Kai],
Lin, D.[Dahua],
MMBENCH: Is Your Multi-Modal Model an All-Around Player?,
ECCV24(VI: 216-233).
Springer DOI
2412
BibRef
Liu, Y.[Yang],
Ding, P.X.[Peng-Xiang],
Huang, S.[Siteng],
Zhang, M.[Min],
Zhao, H.[Han],
Wang, D.L.[Dong-Lin],
PITE: Pixel-Temporal Alignment for Large Video-Language Model,
ECCV24(V: 160-176).
Springer DOI
2412
BibRef
Panagopoulou, A.[Artemis],
Xue, L.[Le],
Yu, N.[Ning],
Li, J.[Junnan],
Li, D.X.[Dong-Xu],
Joty, S.[Shafiq],
Xu, R.[Ran],
Savarese, S.[Silvio],
Xiong, C.M.[Cai-Ming],
Niebles, J.C.[Juan Carlos],
X-instructblip: A Framework for Aligning Image, 3d, Audio, Video to
LLMs and its Emergent Cross-modal Reasoning,
ECCV24(XLV: 177-197).
Springer DOI
2412
BibRef
Mirza, M.J.[M. Jehanzeb],
Karlinsky, L.[Leonid],
Lin, W.[Wei],
Doveh, S.[Sivan],
Micorek, J.[Jakub],
Kozinski, M.[Mateusz],
Kuehne, H.[Hilde],
Possegger, H.[Horst],
Meta-prompting for Automating Zero-shot Visual Recognition with LLMs,
ECCV24(II: 370-387).
Springer DOI
2412
BibRef
Liu, Z.Y.[Zhao-Yang],
Lai, Z.Q.[Ze-Qiang],
Gao, Z.W.[Zhang-Wei],
Cui, E.[Erfei],
Li, Z.H.[Zi-Heng],
Zhu, X.Z.[Xi-Zhou],
Lu, L.W.[Le-Wei],
Chen, Q.F.[Qi-Feng],
Qiao, Y.[Yu],
Dai, J.F.[Ji-Feng],
Wang, W.H.[Wen-Hai],
ControlLLM: Augment Language Models with Tools by Searching on Graphs,
ECCV24(XII: 89-105).
Springer DOI
2412
BibRef
Yao, Y.[Yi],
Hsu, C.F.[Chan-Feng],
Lin, J.H.[Jhe-Hao],
Xie, H.X.[Hong-Xia],
Lin, T.[Terence],
Huang, Y.N.[Yi-Ning],
Shuai, H.H.[Hong-Han],
Cheng, W.H.[Wen-Huang],
The Fabrication of Reality and Fantasy: Scene Generation with
LLM-assisted Prompt Interpretation,
ECCV24(XXII: 422-438).
Springer DOI
2412
BibRef
Wu, Y.X.[Yi-Xuan],
Wang, Y.Z.[Yi-Zhou],
Tang, S.X.[Shi-Xiang],
Wu, W.H.[Wen-Hao],
He, T.[Tong],
Ouyang, W.L.[Wan-Li],
Torr, P.H.S.[Philip H.S.],
Wu, J.[Jian],
Dettoolchain: A New Prompting Paradigm to Unleash Detection Ability of
MLLM,
ECCV24(XXXII: 164-182).
Springer DOI
2412
BibRef
Wang, H.[Han],
Ye, Y.J.[Yong-Jie],
Wang, Y.J.[Yan-Jie],
Nie, Y.X.[Yu-Xiang],
Huang, C.[Can],
Elysium: Exploring Object-level Perception in Videos via MLLM,
ECCV24(XXII: 166-185).
Springer DOI
2412
BibRef
Guo, Z.H.[Zong-Hao],
Xu, R.[Ruyi],
Yao, Y.[Yuan],
Cui, J.[Junbo],
Ni, Z.[Zanlin],
Ge, C.J.[Chun-Jiang],
Chua, T.S.[Tat-Seng],
Liu, Z.Y.[Zhi-Yuan],
Huang, G.[Gao],
LLAVA-UHD: An LMM Perceiving Any Aspect Ratio and High-resolution
Images,
ECCV24(LXXXIII: 390-406).
Springer DOI
2412
BibRef
Zhou, G.Z.[Geng-Ze],
Hong, Y.C.[Yi-Cong],
Wang, Z.[Zun],
Wang, X.E.[Xin Eric],
Wu, Q.[Qi],
NAVGPT-2: Unleashing Navigational Reasoning Capability for Large
Vision-language Models,
ECCV24(VII: 260-278).
Springer DOI
2412
BibRef
Wei, H.R.[Hao-Ran],
Kong, L.Y.[Ling-Yu],
Chen, J.Y.[Jin-Yue],
Zhao, L.[Liang],
Ge, Z.[Zheng],
Yang, J.R.[Jin-Rong],
Wang, T.[Tiancai],
Zhang, X.Y.[Xiang-Yu],
Tao, W.B.[Wen-Bing],
Vary: Scaling up the Vision Vocabulary for Large Vision-language Model,
ECCV24(IV: 408-424).
Springer DOI
2412
BibRef
He, S.T.[Shu-Ting],
Ding, H.H.[Heng-Hui],
Jiang, X.D.[Xu-Dong],
Wen, B.[Bihan],
Segpoint: Segment Any Point Cloud via Large Language Model,
ECCV24(XXII: 349-367).
Springer DOI
2412
BibRef
Murugesan, B.[Balamurali],
Silva-Rodríguez, J.[Julio],
Ben Ayed, I.[Ismail],
Dolz, J.[Jose],
Robust Calibration of Large Vision-language Adapters,
ECCV24(XXIV: 147-165).
Springer DOI
2412
BibRef
Xu, R.[Runsen],
Wang, X.L.[Xiao-Long],
Wang, T.[Tai],
Chen, Y.L.[Yi-Lun],
Pang, J.M.[Jiang-Miao],
Lin, D.[Dahua],
Pointllm: Empowering Large Language Models to Understand Point Clouds,
ECCV24(XXV: 131-147).
Springer DOI
2412
BibRef
Cai, K.W.[Kai-Wen],
Duan, Z.K.[Zhe-Kai],
Liu, G.[Gaowen],
Fleming, C.[Charles],
Lu, C.X.X.[Chris Xiao-Xuan],
Self-adapting Large Visual-language Models to Edge Devices Across
Visual Modalities,
ECCV24(XXVIII: 301-318).
Springer DOI
2412
BibRef
Yu, R.[Runpeng],
Yu, W.H.[Wei-Hao],
Wang, X.C.[Xin-Chao],
Attention Prompting on Image for Large Vision-language Models,
ECCV24(XXX: 251-268).
Springer DOI
2412
BibRef
Luo, Y.L.[Yu-Lin],
An, R.[Ruichuan],
Zou, B.[Bocheng],
Tang, Y.M.[Yi-Ming],
Liu, J.M.[Jia-Ming],
Zhang, S.H.[Shang-Hang],
Llm as Dataset Analyst: Subpopulation Structure Discovery with Large
Language Model,
ECCV24(XXXIII: 235-252).
Springer DOI
2412
BibRef
Huang, Z.J.[Zhi-Jian],
Tang, T.[Tao],
Chen, S.X.[Shao-Xiang],
Lin, S.[Sihao],
Jie, Z.Q.[Ze-Qun],
Ma, L.[Lin],
Wang, G.[Guangrun],
Liang, X.D.[Xiao-Dan],
Making Large Language Models Better Planners with Reasoning-decision
Alignment,
ECCV24(XXXVI: 73-90).
Springer DOI
2412
BibRef
Zhan, Y.F.[Yu-Fei],
Zhu, Y.[Yousong],
Chen, Z.Y.[Zhi-Yang],
Yang, F.[Fan],
Tang, M.[Ming],
Wang, J.Q.[Jin-Qiao],
Griffon: Spelling Out All Object Locations at Any Granularity with
Large Language Models,
ECCV24(XLII: 405-422).
Springer DOI
2412
BibRef
Li, Y.W.[Yan-Wei],
Wang, C.Y.[Cheng-Yao],
Jia, J.Y.[Jia-Ya],
Llama-vid: An Image is Worth 2 Tokens in Large Language Models,
ECCV24(XLVI: 323-340).
Springer DOI
2412
BibRef
Ju, C.[Chen],
Wang, H.[Haicheng],
Cheng, H.Z.[Hao-Zhe],
Chen, X.[Xu],
Zhai, Z.H.[Zhong-Hua],
Huang, W.L.[Wei-Lin],
Lan, J.S.[Jin-Song],
Xiao, S.[Shuai],
Zheng, B.[Bo],
Turbo: Informativity-driven Acceleration Plug-in for Vision-language
Large Models,
ECCV24(XLVI: 436-455).
Springer DOI
2412
BibRef
Zhao, Q.[Qinyu],
Xu, M.[Ming],
Gupta, K.[Kartik],
Asthana, A.[Akshay],
Zheng, L.[Liang],
Gould, S.[Stephen],
The First to Know: How Token Distributions Reveal Hidden Knowledge in
Large Vision-language Models?,
ECCV24(XLVIII: 127-142).
Springer DOI
2412
BibRef
Lee, B.K.[Byung-Kwan],
Park, B.[Beomchan],
Kim, C.W.[Chae Won],
Ro, Y.M.[Yong Man],
Moai: Mixture of All Intelligence for Large Language and Vision Models,
ECCV24(XLIX: 273-302).
Springer DOI
2412
BibRef
Liu, R.[Ruyang],
Li, C.[Chen],
Tang, H.R.[Hao-Ran],
Ge, Y.X.[Yi-Xiao],
Shan, Y.[Ying],
Li, G.[Ge],
ST-LLM: Large Language Models Are Effective Temporal Learners,
ECCV24(LVII: 1-18).
Springer DOI
2412
BibRef
Cheng, H.[Hao],
Xiao, E.[Erjia],
Gu, J.D.[Jin-Dong],
Yang, L.[Le],
Duan, J.[Jinhao],
Zhang, J.[Jize],
Cao, J.H.[Jia-Hang],
Xu, K.D.[Kai-Di],
Xu, R.[Renjing],
Unveiling Typographic Deceptions: Insights of the Typographic
Vulnerability in Large Vision-language Models,
ECCV24(LIX: 179-196).
Springer DOI
2412
BibRef
Lin, Z.[Ziyi],
Liu, D.Y.[Dong-Yang],
Zhang, R.R.[Ren-Rui],
Gao, P.[Peng],
Qiu, L.T.[Long-Tian],
Xiao, H.[Han],
Qiu, H.[Han],
Shao, W.Q.[Wen-Qi],
Chen, K.Q.[Ke-Qin],
Han, J.M.[Jia-Ming],
Huang, S.Y.[Si-Yuan],
Zhang, Y.[Yichi],
He, X.M.[Xu-Ming],
Qiao, Y.[Yu],
Li, H.S.[Hong-Sheng],
Sphinx: A Mixer of Weights, Visual Embeddings and Image Scales for
Multi-modal Large Language Models,
ECCV24(LXII: 36-55).
Springer DOI
2412
BibRef
Chiquier, M.[Mia],
Mall, U.[Utkarsh],
Vondrick, C.[Carl],
Evolving Interpretable Visual Classifiers with Large Language Models,
ECCV24(LXIV: 183-201).
Springer DOI
2412
BibRef
Chen, L.[Liang],
Zhao, H.Z.[Hao-Zhe],
Liu, T.Y.[Tian-Yu],
Bai, S.[Shuai],
Lin, J.Y.[Jun-Yang],
Zhou, C.[Chang],
Chang, B.[Baobao],
An Image is Worth 1/2 Tokens After Layer 2: Plug-and-play Inference
Acceleration for Large Vision-language Models,
ECCV24(LXXXI: 19-35).
Springer DOI
2412
BibRef
Shi, B.F.[Bai-Feng],
Wu, Z.Y.[Zi-Yang],
Mao, M.L.[Mao-Lin],
Wang, X.[Xin],
Darrell, T.J.[Trevor J.],
When Do We Not Need Larger Vision Models?,
ECCV24(VIII: 444-462).
Springer DOI
2412
BibRef
Yu, Q.H.[Qi-Hang],
Shen, X.H.[Xiao-Hui],
Chen, L.C.[Liang-Chieh],
Towards Open-ended Visual Recognition with Large Language Models,
ECCV24(XIV: 359-376).
Springer DOI
2412
BibRef
Huang, K.[Kai],
Zou, H.[Hao],
Xi, Y.[Ye],
Wang, B.C.[Bo-Chen],
Xie, Z.[Zhen],
Yu, L.[Liang],
IVTP: Instruction-guided Visual Token Pruning for Large Vision-language
Models,
ECCV24(XVII: 214-230).
Springer DOI
2412
BibRef
Liu, H.T.[Hao-Tian],
Li, C.Y.[Chun-Yuan],
Li, Y.H.[Yu-Heng],
Lee, Y.J.[Yong Jae],
Improved Baselines with Visual Instruction Tuning,
CVPR24(26286-26296)
IEEE DOI
2410
Training, Connectors, Visualization, Systematics, Codes, Computational modeling
BibRef
Schiappa, M.[Madeline],
Abdullah, R.[Raiyaan],
Azad, S.[Shehreen],
Claypoole, J.[Jared],
Cogswell, M.[Michael],
Divakaran, A.[Ajay],
Rawat, Y.[Yogesh],
Probing Conceptual Understanding of Large Visual-Language Models,
WhatNext24(1797-1807)
IEEE DOI Code:
WWW Link.
2410
Training, Visualization, Shape, Snow, Color, Benchmark testing,
Transformers, Robustness, Conceptual understanding
BibRef
Yue, T.T.[Tong-Tian],
Cheng, J.[Jie],
Guo, L.T.[Long-Teng],
Dai, X.Y.[Xing-Yuan],
Zhao, Z.[Zijia],
He, X.J.[Xing-Jian],
Xiong, G.[Gang],
Lv, Y.S.[Yi-Sheng],
Liu, J.[Jing],
SC-Tune: Unleashing Self-Consistent Referential Comprehension in
Large Vision Language Models,
CVPR24(13073-13083)
IEEE DOI Code:
WWW Link.
2410
Training, Codes, Computational modeling, Focusing, Benchmark testing
BibRef
Wu, T.H.[Tsung-Han],
Lian, L.[Long],
Gonzalez, J.E.[Joseph E.],
Li, B.[Boyi],
Darrell, T.J.[Trevor J.],
Self-Correcting LLM-Controlled Diffusion Models,
CVPR24(6327-6336)
IEEE DOI Code:
WWW Link.
2410
Image synthesis, Pipelines, Text to image, Process control,
Detectors, Superluminescent diodes, Diffusion models
BibRef
Zheng, D.[Duo],
Huang, S.[Shijia],
Zhao, L.[Lin],
Zhong, Y.[Yiwu],
Wang, L.W.[Li-Wei],
Towards Learning a Generalist Model for Embodied Navigation,
CVPR24(13624-13634)
IEEE DOI Code:
WWW Link.
2410
Training, Adaptation models, Solid modeling, Navigation,
Soft sensors, Computational modeling, Visual-Language Navigation,
LLM
BibRef
Singh, S.[Simranjit],
Fore, M.[Michael],
Stamoulis, D.[Dimitrios],
GeoLLM-Engine: A Realistic Environment for Building Geospatial
Copilots,
EarthVision24(585-594)
IEEE DOI
2410
Earth, Geology, Natural languages, Benchmark testing,
Parallel processing, Geospatial analysis, Satellite images,
Benchmark
BibRef
Zhang, Y.C.[Yue-Chen],
Qian, S.J.[Sheng-Ju],
Peng, B.[Bohao],
Liu, S.[Shu],
Jia, J.Y.[Jia-Ya],
Prompt Highlighter: Interactive Control for Multi-Modal LLMs,
CVPR24(13215-13224)
IEEE DOI
2410
Training, Semantics, Process control, Focusing,
Reliability, Usability, VLM, LLM, Interactive Control, Image Caption,
Training-Free
BibRef
Wang, D.K.[Dong-Kai],
Xuan, S.Y.[Shi-Yu],
Zhang, S.L.[Shi-Liang],
LocLLM: Exploiting Generalizable Human Keypoint Localization via
Large Language Model,
CVPR24(614-623)
IEEE DOI Code:
WWW Link.
2410
Location awareness, Training, Large language models, Pipelines,
Training data, Cognition, Keypoint Localization,
Large Language Model
BibRef
Liu, H.C.[Han-Chao],
Zhan, X.H.[Xiao-Hang],
Huang, S.L.[Shao-Li],
Mu, T.J.[Tai-Jiang],
Shan, Y.[Ying],
Programmable Motion Generation for Open-Set Motion Control Tasks,
CVPR24(1399-1408)
IEEE DOI
2410
Motion planning, Large language models, Computational modeling,
Semantics, Dynamics, Training data
BibRef
Zhao, L.[Lirui],
Yang, Y.[Yue],
Zhang, K.[Kaipeng],
Shao, W.Q.[Wen-Qi],
Zhang, Y.X.[Yu-Xin],
Qiao, Y.[Yu],
Luo, P.[Ping],
Ji, R.R.[Rong-Rong],
DiffAgent: Fast and Accurate Text-to-Image API Selection with Large
Language Model,
CVPR24(6390-6399)
IEEE DOI Code:
WWW Link.
2410
Training, Technological innovation, Accuracy, Codes,
Large language models, Computational modeling, LLM Agent, LLM Tool Usage
BibRef
Yao, J.[Junyi],
Liu, Y.J.[Yi-Jiang],
Dong, Z.[Zhen],
Guo, M.F.[Ming-Fei],
Hu, H.[Helan],
Keutzer, K.[Kurt],
Du, L.[Li],
Zhou, D.[Daquan],
Zhang, S.H.[Shang-Hang],
PromptCoT: Align Prompt Distribution via Adapted Chain-of-Thought,
CVPR24(7027-7037)
IEEE DOI
2410
Training, Adaptation models, Visualization, Computational modeling,
Large language models, Semantics, Text to image
BibRef
Cai, Z.P.[Zhi-Peng],
Mueller, M.[Matthias],
Birkl, R.[Reiner],
Wofk, D.[Diana],
Tseng, S.Y.[Shao-Yen],
Cheng, J.[Junda],
Stan, G.B.M.[Gabriela Ben-Melech],
Lai, V.[Vasudev],
Paulitsch, M.[Michael],
L-MAGIC: Language Model Assisted Generation of Images with Coherence,
CVPR24(7049-7058)
IEEE DOI Code:
WWW Link.
2410
Point cloud compression, Solid modeling, Layout, Superresolution,
Estimation, Diffusion models, Image generation, large language models
BibRef
Li, Y.[Yanyu],
Liu, X.[Xian],
Kag, A.[Anil],
Hu, J.[Ju],
Idelbayev, Y.[Yerlan],
Sagar, D.[Dhritiman],
Wang, Y.Z.[Yan-Zhi],
Tulyakov, S.[Sergey],
Ren, J.[Jian],
TextCraftor: Your Text Encoder can be Image Quality Controller,
CVPR24(7985-7995)
IEEE DOI
2410
Training, Measurement, Interpolation, Image synthesis,
Large language models, Pipelines, Text to image, Stable Diffusion,
Image and video synthesis and generation
BibRef
Argaw, D.M.[Dawit Mureja],
Yoon, S.H.[Seung-Hyun],
Heilbron, F.C.[Fabian Caba],
Deilamsalehy, H.[Hanieh],
Bui, T.[Trung],
Wang, Z.W.[Zhao-Wen],
Dernoncourt, F.[Franck],
Chung, J.S.[Joon Son],
Scaling Up Video Summarization Pretraining with Large Language Models,
CVPR24(8332-8341)
IEEE DOI
2410
Analytical models, Large language models, Computational modeling,
Pipelines, Benchmark testing
BibRef
Lai, X.[Xin],
Tian, Z.[Zhuotao],
Chen, Y.K.[Yu-Kang],
Li, Y.W.[Yan-Wei],
Yuan, Y.H.[Yu-Hui],
Liu, S.[Shu],
Jia, J.Y.[Jia-Ya],
LISA: Reasoning Segmentation via Large Language Model,
CVPR24(9579-9589)
IEEE DOI
2410
Image segmentation, Vocabulary, Visualization, Target recognition,
Large language models, Benchmark testing
BibRef
Shang, C.M.[Chen-Ming],
Zhou, S.[Shiji],
Zhang, H.Y.[Heng-Yuan],
Ni, X.Z.[Xin-Zhe],
Yang, Y.[Yujiu],
Wang, Y.W.[Yu-Wang],
Incremental Residual Concept Bottleneck Models,
CVPR24(11030-11040)
IEEE DOI
2410
Measurement, Visualization, Accuracy, Large language models,
Current measurement, Decision making, Closed box
BibRef
Xie, Y.T.[Yu-Tong],
Chen, Q.[Qi],
Wang, S.[Sinuo],
To, M.S.[Minh-Son],
Lee, I.[Iris],
Khoo, E.W.[Ee Win],
Hendy, K.[Kerolos],
Koh, D.[Daniel],
Xia, Y.[Yong],
Wu, Q.[Qi],
PairAug: What Can Augmented Image-Text Pairs Do for Radiology?,
CVPR24(11652-11661)
IEEE DOI Code:
WWW Link.
2410
Data privacy, Medical conditions, Large language models, Radiology,
Data augmentation
BibRef
Dong, Z.K.[Zhi-Kang],
Liu, X.L.[Xiu-Long],
Chen, B.[Bin],
Polak, P.[Pawel],
Zhang, P.[Peng],
MuseChat: A Conversational Music Recommendation System for Videos,
CVPR24(12775-12785)
IEEE DOI Code:
WWW Link.
2410
Accuracy, Large language models, Natural languages, Cognition,
Recommender systems, Multimodal Learning,
Music Information Retrieval
BibRef
Li, F.[Feng],
Jiang, Q.[Qing],
Zhang, H.[Hao],
Ren, T.[Tianhe],
Liu, S.L.[Shi-Long],
Zou, X.[Xueyan],
Xu, H.Z.[Huai-Zhe],
Li, H.Y.[Hong-Yang],
Yang, J.W.[Jian-Wei],
Li, C.Y.[Chun-Yuan],
Zhang, L.[Lei],
Gao, J.F.[Jian-Feng],
Visual in-Context Prompting,
CVPR24(12861-12871)
IEEE DOI Code:
WWW Link.
2410
Training, Visualization, Image segmentation, Codes,
Large language models, Computer architecture
BibRef
Sachdeva, R.[Ragav],
Zisserman, A.[Andrew],
The Manga Whisperer: Automatically Generating Transcriptions for
Comics,
CVPR24(12967-12976)
IEEE DOI Code:
WWW Link.
2410
Visualization, Codes, Large language models, Visual impairment,
Oral communication, Linguistics
BibRef
Zhong, S.S.[Shan-Shan],
Huang, Z.Z.[Zhong-Zhan],
Gao, S.[Shanghua],
Wen, W.[Wushao],
Lin, L.[Liang],
Zitnik, M.[Marinka],
Zhou, P.[Pan],
Let's Think Outside the Box: Exploring Leap-of-Thought in Large
Language Models with Creative Humor Generation,
CVPR24(13246-13257)
IEEE DOI Code:
WWW Link.
2410
Technological innovation, Codes, Large language models, Games,
Cognition
BibRef
Gao, Z.[Zhi],
Du, Y.T.[Yun-Tao],
Zhang, X.T.[Xin-Tong],
Ma, X.J.[Xiao-Jian],
Han, W.J.[Wen-Juan],
Zhu, S.C.[Song-Chun],
Li, Q.[Qing],
CLOVA: A Closed-LOop Visual Assistant with Tool Usage and Update,
CVPR24(13258-13268)
IEEE DOI
2410
Visualization, Limiting,
Large language models, Training data, Tagging, Reflection,
Compositional Reasoning
BibRef
Buettner, K.[Kyle],
Malakouti, S.[Sina],
Li, X.L.[Xiang Lorraine],
Kovashka, A.[Adriana],
Incorporating Geo-Diverse Knowledge into Prompting for Increased
Geographical Robustness in Object Recognition,
CVPR24(13515-13524)
IEEE DOI
2410
Geography, Training, Large language models, Training data, Europe, Robustness
BibRef
Liu, R.[Ruyang],
Li, C.[Chen],
Ge, Y.X.[Yi-Xiao],
Li, T.H.[Thomas H.],
Shan, Y.[Ying],
Li, G.[Ge],
BT-Adapter: Video Conversation is Feasible Without Video Instruction
Tuning,
CVPR24(13658-13667)
IEEE DOI Code:
WWW Link.
2410
Training, Adaptation models, Visualization, Costs,
Computational modeling, Graphics processing units,
Video Large Language Models
BibRef
Li, J.X.[Jia-Xuan],
Vo, D.M.[Duc Minh],
Sugimoto, A.[Akihiro],
Nakayama, H.[Hideki],
Evcap: Retrieval-Augmented Image Captioning with External Visual-Name
Memory for Open-World Comprehension,
CVPR24(13733-13742)
IEEE DOI
2410
Training, Visualization, Adaptation models, Costs,
Large language models, Memory management, Image Captioning,
External Memory
BibRef
Song, L.[Lin],
Chen, Y.K.[Yu-Kang],
Yang, S.[Shuai],
Ding, X.H.[Xiao-Han],
Ge, Y.X.[Yi-Xiao],
Chen, Y.C.[Ying-Cong],
Shan, Y.[Ying],
Low-Rank Approximation for Sparse Attention in Multi-Modal LLMs,
CVPR24(13763-13773)
IEEE DOI
2410
Training, Attention mechanisms, Computational modeling,
Large language models, Benchmark testing, Natural language processing
BibRef
Guo, Q.[Qiushan],
de Mello, S.[Shalini],
Yin, H.X.[Hong-Xu],
Byeon, W.[Wonmin],
Cheung, K.C.[Ka Chun],
Yu, Y.Z.[Yi-Zhou],
Luo, P.[Ping],
Liu, S.[Sifei],
RegionGPT: Towards Region Understanding Vision Language Model,
CVPR24(13796-13806)
IEEE DOI
2410
Training, Visualization, Large language models, Pipelines,
Training data, Object detection, Cognition
BibRef
Yu, T.Y.[Tian-Yu],
Yao, Y.[Yuan],
Zhang, H.Y.[Hao-Ye],
He, T.[Taiwen],
Han, Y.F.[Yi-Feng],
Cui, G.[Ganqu],
Hu, J.Y.[Jin-Yi],
Liu, Z.Y.[Zhi-Yuan],
Zheng, H.T.[Hai-Tao],
Sun, M.[Maosong],
RLHF-V: Towards Trustworthy MLLMs via Behavior Alignment from
Fine-Grained Correctional Human Feedback,
CVPR24(13807-13816)
IEEE DOI
2410
Image segmentation, Accuracy, Large language models,
Computational modeling, Benchmark testing, Cognition, vision,
hallucination
BibRef
Xuan, S.Y.[Shi-Yu],
Guo, Q.[Qingpei],
Yang, M.[Ming],
Zhang, S.L.[Shi-Liang],
Pink: Unveiling the Power of Referential Comprehension for
Multi-modal LLMs,
CVPR24(13838-13848)
IEEE DOI Code:
WWW Link.
2410
Training, Visualization, Costs, Accuracy, Annotations, Large language models
BibRef
Yu, Q.[Qiying],
Sun, Q.[Quan],
Zhang, X.S.[Xiao-Song],
Cui, Y.F.[Yu-Feng],
Zhang, F.[Fan],
Cao, Y.[Yue],
Wang, X.L.[Xin-Long],
Liu, J.J.[Jing-Jing],
CapsFusion: Rethinking Image-Text Data at Scale,
CVPR24(14022-14032)
IEEE DOI
2410
Training, Knowledge engineering, Scalability,
Large language models, Computational modeling, Noise
BibRef
Yao, J.W.[Jia-Wei],
Qian, Q.[Qi],
Hu, J.[Juhua],
Multi-Modal Proxy Learning Towards Personalized Visual Multiple
Clustering,
CVPR24(14066-14075)
IEEE DOI Code:
WWW Link.
2410
Deep learning, Bridges, Visualization, Codes, Large language models,
Face recognition
BibRef
Zou, B.[Bo],
Yang, C.[Chao],
Qiao, Y.[Yu],
Quan, C.B.[Cheng-Bin],
Zhao, Y.J.[You-Jian],
LLaMA-Excitor: General Instruction Tuning via Indirect Feature
Interaction,
CVPR24(14089-14099)
IEEE DOI Code:
WWW Link.
2410
Visualization, Adaptation models, Codes, Computational modeling,
Benchmark testing, Instruction Tuning, PEFT,
Large Language Model
BibRef
Hong, W.[Wenyi],
Wang, W.H.[Wei-Han],
Lv, Q.S.[Qing-Song],
Xu, J.Z.[Jia-Zheng],
Yu, W.[Wenmeng],
Ji, J.H.[Jun-Hui],
Wang, Y.[Yan],
Wang, Z.[Zihan],
Dong, Y.X.[Yu-Xiao],
Ding, M.[Ming],
Tang, J.[Jie],
CogAgent: A Visual Language Model for GUI Agents,
CVPR24(14281-14290)
IEEE DOI Code:
WWW Link.
2410
Visualization, Limiting, Image resolution, Image recognition,
Navigation, Large language models, Benchmark testing
BibRef
Luo, C.[Chuwei],
Shen, Y.F.[Yu-Fan],
Zhu, Z.Q.[Zhao-Qing],
Zheng, Q.[Qi],
Yu, Z.[Zhi],
Yao, C.[Cong],
LayoutLLM: Layout Instruction Tuning with Large Language Models for
Document Understanding,
CVPR24(15630-15640)
IEEE DOI
2410
Large language models, Layout, Manuals, Inspection, Benchmark testing,
Boosting, Document Understanding, Layout, Large Language Models
BibRef
Yang, Y.[Yue],
Sun, F.Y.[Fan-Yun],
Weihs, L.[Luca],
Vanderbilt, E.[Eli],
Herrasti, A.[Alvaro],
Han, W.[Winson],
Wu, J.J.[Jia-Jun],
Haber, N.[Nick],
Krishna, R.[Ranjay],
Liu, L.J.[Ling-Jie],
Callison-Burch, C.[Chris],
Yatskar, M.[Mark],
Kembhavi, A.[Aniruddha],
Clark, C.[Christopher],
Holodeck: Language Guided Generation of 3D Embodied AI Environments,
CVPR24(16277-16287)
IEEE DOI
2410
Training, Navigation, Large language models, Semantics, Layout, Stars,
Embodied AI, 3D Scene Generation, Language-guided Generation
BibRef
Qin, Y.R.[Yi-Ran],
Zhou, E.[Enshen],
Liu, Q.[Qichang],
Yin, Z.F.[Zhen-Fei],
Sheng, L.[Lu],
Zhang, R.M.[Rui-Mao],
Qiao, Y.[Yu],
Shao, J.[Jing],
MP5: A Multi-modal Open-ended Embodied System in Minecraft via Active
Perception,
CVPR24(16307-16316)
IEEE DOI Code:
WWW Link.
2410
Visualization, Large language models, Active perception, Planning,
Compounds
BibRef
Zhang, S.[Sixian],
Yu, X.Y.[Xin-Yao],
Song, X.H.[Xin-Hang],
Wang, X.H.[Xiao-Han],
Jiang, S.Q.[Shu-Qiang],
Imagine Before Go: Self-Supervised Generative Map for Object Goal
Navigation,
CVPR24(16414-16425)
IEEE DOI Code:
WWW Link.
2410
Training, Geometry, Navigation, Large language models, Semantics,
Layout, Self-supervised learning, Embodied AI, Object Goal Navigation
BibRef
Li, H.[Hao],
Yang, X.[Xue],
Wang, Z.K.[Zhao-Kai],
Zhu, X.Z.[Xi-Zhou],
Zhou, J.[Jie],
Qiao, Y.[Yu],
Wang, X.G.[Xiao-Gang],
Li, H.S.[Hong-Sheng],
Lu, L.W.[Le-Wei],
Dai, J.F.[Ji-Feng],
Auto MC-Reward: Automated Dense Reward Design with Large Language
Models for Minecraft,
CVPR24(16426-16435)
IEEE DOI
2410
Learning systems, Codes, Large language models, Lava, Semantics,
Reinforcement learning, Syntactics, Large Language Model, Reward Shaping
BibRef
Liu, M.X.[Ming-Xuan],
Hayes, T.L.[Tyler L.],
Ricci, E.[Elisa],
Csurka, G.[Gabriela],
Volpi, R.[Riccardo],
SHiNe: Semantic Hierarchy Nexus for Open-Vocabulary Object Detection,
CVPR24(16634-16644)
IEEE DOI
2410
Vocabulary, Fuses, Large language models, Semantics, Detectors,
Object detection, Open-vocabulary, Object Detection, Vision-Language
BibRef
Kim, J.[Jooyeon],
Cho, E.[Eulrang],
Kim, S.[Sehyung],
Kim, H.W.J.[Hyun-Woo J.],
Retrieval-Augmented Open-Vocabulary Object Detection,
CVPR24(17427-17436)
IEEE DOI Code:
WWW Link.
2410
Portable media players, Visualization, Vocabulary,
Large language models, Semantics, Detectors, Object detection,
Retrieval-Augmentation
BibRef
Saha, O.[Oindrila],
van Horn, G.[Grant],
Maji, S.[Subhransu],
Improved Zero-Shot Classification by Adapting VLMs with Text
Descriptions,
CVPR24(17542-17552)
IEEE DOI Code:
WWW Link.
2410
Training, Visualization, Large language models, Habitats,
Benchmark testing, Birds, Zero Shot Learning,
Fine-grained Classification
BibRef
Toubal, I.E.[Imad Eddine],
Avinash, A.[Aditya],
Alldrin, N.G.[Neil Gordon],
Dlabal, J.[Jan],
Zhou, W.[Wenlei],
Luo, E.[Enming],
Stretcu, O.[Otilia],
Xiong, H.[Hao],
Lu, C.T.[Chun-Ta],
Zhou, H.[Howard],
Krishna, R.[Ranjay],
Fuxman, A.[Ariel],
Duerig, T.[Tom],
Modeling Collaborator: Enabling Subjective Vision Classification with
Minimal Human Effort via LLM Tool-Use,
CVPR24(17553-17563)
IEEE DOI
2410
Visualization, Computational modeling, Large language models,
Natural languages, Wildlife, Training data, Manuals, tool-use
BibRef
Han, T.[Tengda],
Bain, M.[Max],
Nagrani, A.[Arsha],
Varol, G.[Gül],
Xie, W.[Weidi],
Zisserman, A.[Andrew],
AutoAD III: The Prequel: Back to the Pixels,
CVPR24(18164-18174)
IEEE DOI
2410
Training, Measurement, Visualization, Large language models,
Current measurement, Training data, Computer architecture
BibRef
Qu, H.X.[Hao-Xuan],
Cai, Y.J.[Yu-Jun],
Liu, J.[Jun],
LLMs are Good Action Recognizers,
CVPR24(18395-18406)
IEEE DOI
2410
Accuracy, Large language models,
Linguistics, Benchmark testing, Skeleton
BibRef
Chen, J.[Joya],
Lv, Z.Y.[Zhao-Yang],
Wu, S.W.[Shi-Wei],
Lin, K.Q.[Kevin Qinghong],
Song, C.[Chenan],
Gao, D.F.[Di-Fei],
Liu, J.W.[Jia-Wei],
Gao, Z.T.[Zi-Teng],
Mao, D.X.[Dong-Xing],
Shou, M.Z.[Mike Zheng],
VideoLLM-online: Online Video Large Language Model for Streaming
Video,
CVPR24(18407-18418)
IEEE DOI
2410
Training, Large language models, Soft sensors, Pipelines,
Streaming media, Rendering (computer graphics), Data models
BibRef
Zhu, A.[Anqi],
Ke, Q.H.[Qiu-Hong],
Gong, M.M.[Ming-Ming],
Bailey, J.[James],
Part-Aware Unified Representation of Language and Skeleton for
Zero-Shot Action Recognition,
CVPR24(18761-18770)
IEEE DOI Code:
WWW Link.
2410
Visualization, Source coding, Large language models,
Natural languages, Skeleton, representation learning
BibRef
Chen, T.J.[Tong-Jia],
Yu, H.S.[Hong-Shan],
Yang, Z.G.[Zhen-Geng],
Li, Z.C.[Ze-Chuan],
Sun, W.[Wei],
Chen, C.[Chen],
OST: Refining Text Knowledge with Optimal Spatio-Temporal Descriptor
for General Video Recognition,
CVPR24(18888-18898)
IEEE DOI
2410
Training, Adaptation models, Visualization, Large language models,
Semantics, Pipelines, Refining, Video Recognition,
Multi-modality Video Understanding
BibRef
Zhao, Q.H.[Qi-Hao],
Dai, Y.[Yalun],
Li, H.[Hao],
Hu, W.[Wei],
Zhang, F.[Fan],
Liu, J.[Jun],
LTGC: Long-Tail Recognition via Leveraging LLMs-Driven Generated
Content,
CVPR24(19510-19520)
IEEE DOI
2410
Semantic segmentation, Large language models,
Computational modeling, Data visualization, Tail, Benchmark testing
BibRef
Siddiqui, Y.[Yawar],
Alliegro, A.[Antonio],
Artemov, A.[Alexey],
Tommasi, T.[Tatiana],
Sirigatti, D.[Daniele],
Rosov, V.[Vladislav],
Dai, A.[Angela],
Nießner, M.[Matthias],
MeshGPT: Generating Triangle Meshes with Decoder-Only Transformers,
CVPR24(19615-19625)
IEEE DOI
2410
Geometry, Vocabulary, Solid modeling, Shape, Large language models,
Transformers, Mesh Generation, Generative Models for 3D,
Transformers
BibRef
Li, Z.[Zhe],
Gao, Z.Y.[Zhang-Yang],
Tan, C.[Cheng],
Ren, B.[Bocheng],
Yang, L.T.[Laurence T.],
Li, S.Z.[Stan Z.],
General Point Model Pretraining with Autoencoding and Autoregressive,
CVPR24(20954-20964)
IEEE DOI Code:
WWW Link.
2410
Point cloud compression, Representation learning, Codes,
Large language models, Vector quantization, Computational modeling
BibRef
Li, K.C.[Kun-Chang],
Wang, Y.[Yali],
He, Y.[Yinan],
Li, Y.Z.[Yi-Zhuo],
Wang, Y.[Yi],
Liu, Y.[Yi],
Wang, Z.[Zun],
Xu, J.[Jilan],
Chen, G.[Guo],
Lou, P.[Ping],
Wang, L.M.[Li-Min],
Qiao, Y.[Yu],
MVBench: A Comprehensive Multi-modal Video Understanding Benchmark,
CVPR24(22195-22206)
IEEE DOI Code:
WWW Link.
2410
Training, Systematics, Large language models, Image annotation,
Manuals, Benchmark testing
BibRef
Dunlap, L.[Lisa],
Zhang, Y.H.[Yu-Hui],
Wang, X.H.[Xiao-Han],
Zhong, R.Q.[Rui-Qi],
Darrell, T.J.[Trevor J.],
Steinhardt, J.[Jacob],
Gonzalez, J.E.[Joseph E.],
Yeung-Levy, S.[Serena],
Describing Differences in Image Sets with Natural Language,
CVPR24(24199-24208)
IEEE DOI Code:
WWW Link.
2410
Analytical models, Large language models, Computational modeling,
Natural languages, Human in the loop
BibRef
Ishmam, A.M.[Alvi Md],
Thomas, C.[Christopher],
Semantic Shield: Defending Vision-Language Models Against Backdooring
and Poisoning via Fine-Grained Knowledge Alignment,
CVPR24(24820-24830)
IEEE DOI
2410
Training, Visualization, Correlation, Computational modeling,
Large language models, Semantics, Adversarial attack and defense,
Vision languge model
BibRef
Yang, Y.J.[Yi-Jun],
Zhou, T.Y.[Tian-Yi],
Li, K.[Kanxue],
Tao, D.P.[Da-Peng],
Li, L.[Lusong],
Shen, L.[Li],
He, X.D.[Xiao-Dong],
Jiang, J.[Jing],
Shi, Y.H.[Yu-Hui],
Embodied Multi-Modal Agent trained by an LLM from a Parallel
TextWorld,
CVPR24(26265-26275)
IEEE DOI
2410
Training, Visualization, Imitation learning, Large language models,
Robustness, Reflection, Embodied AI, Large Language Models, Imitation Learning
BibRef
Hong, Y.[Yining],
Zheng, Z.[Zishuo],
Chen, P.H.[Pei-Hao],
Wang, Y.F.[Yi-Fan],
Li, J.[Junyan],
Gan, C.[Chuang],
MultiPLY: A Multisensory Object-Centric Embodied Large Language Model
in 3D World,
CVPR24(26396-26406)
IEEE DOI
2410
Visualization, Correlation, Navigation, Large language models,
Computational modeling
BibRef
Han, J.M.[Jia-Ming],
Gong, K.X.[Kai-Xiong],
Zhang, Y.Y.[Yi-Yuan],
Wang, J.Q.[Jia-Qi],
Zhang, K.[Kaipeng],
Lin, D.[Dahua],
Qiao, Y.[Yu],
Gao, P.[Peng],
Yue, X.Y.[Xiang-Yu],
OneLLM: One Framework to Align All Modalities with Language,
CVPR24(26574-26585)
IEEE DOI Code:
WWW Link.
2410
Point cloud compression, Large language models, Pipelines,
Benchmark testing, Functional magnetic resonance imaging, Routing
BibRef
Xie, H.X.[Hong-Xia],
Peng, C.J.[Chu-Jun],
Tseng, Y.W.[Yu-Wen],
Chen, H.J.[Hung-Jen],
Hsu, C.F.[Chan-Feng],
Shuai, H.H.[Hong-Han],
Cheng, W.H.[Wen-Huang],
EmoVIT: Revolutionizing Emotion Insights with Visual Instruction
Tuning,
CVPR24(26586-26595)
IEEE DOI Code:
WWW Link.
2410
Visualization, Emotion recognition, Large language models,
Pipelines, Benchmark testing, Cognition
BibRef
Wang, X.Y.[Xin-Yu],
Zhuang, B.[Bohan],
Wu, Q.[Qi],
ModaVerse: Efficiently Transforming Modalities with LLMs,
CVPR24(26596-26606)
IEEE DOI Code:
WWW Link.
2410
Training, Adaptation models, Large language models,
Natural languages, Layout, Data models
BibRef
Lin, J.[Ji],
Yin, H.X.[Hong-Xu],
Ping, W.[Wei],
Molchanov, P.[Pavlo],
Shoeybi, M.[Mohammad],
Han, S.[Song],
VILA: On Pre-training for Visual Language Models,
CVPR24(26679-26689)
IEEE DOI
2410
Degradation, Visualization, Accuracy, Large language models,
Benchmark testing, Cognition
BibRef
Lyu, Y.H.[Yuan-Huiyi],
Zheng, X.[Xu],
Zhou, J.Z.[Jia-Zhou],
Wang, L.[Lin],
UniBind: LLM-Augmented Unified and Balanced Representation Space to
Bind Them All,
CVPR24(26742-26752)
IEEE DOI
2410
Point cloud compression, Visualization, Large language models,
Knowledge based systems, Infrared imaging, Contrastive learning,
Data mining
BibRef
Zhu, L.[Lei],
Wei, F.[Fangyun],
Lu, Y.[Yanye],
Beyond Text: Frozen Large Language Models in Visual Signal
Comprehension,
CVPR24(27037-27047)
IEEE DOI Code:
WWW Link.
2410
Visualization, Vocabulary, Image recognition, Large language models,
Semantics, Transforms, Feature extraction, Multi-modal learning
BibRef
Tang, Z.[Zineng],
Yang, Z.[Ziyi],
Khademi, M.[Mahmoud],
Liu, Y.[Yang],
Zhu, C.G.[Chen-Guang],
Bansal, M.[Mohit],
CoDi-2: In-Context, Interleaved, and Interactive Any-to-Any
Generation,
CVPR24(27415-27424)
IEEE DOI
2410
Image synthesis, Large language models, Oral communication,
Encoding, Cognition
BibRef
Yuan, Y.Q.[Yu-Qian],
Li, W.[Wentong],
Liu, J.[Jian],
Tang, D.Q.[Dong-Qi],
Luo, X.J.[Xin-Jie],
Qin, C.[Chi],
Zhang, L.[Lei],
Zhu, J.[Jianke],
Osprey: Pixel Understanding with Visual Instruction Tuning,
CVPR24(28202-28211)
IEEE DOI Code:
WWW Link.
2410
Convolutional codes, Visualization, Computational modeling,
Source coding, Large language models, Semantics
BibRef
Zheng, Z.H.[Zhao-Heng],
Wei, J.[Jingmin],
Hu, X.F.[Xue-Feng],
Zhu, H.D.[Hai-Dong],
Nevatia, R.[Ram],
Large Language Models are Good Prompt Learners for Low-Shot Image
Classification,
CVPR24(28453-28462)
IEEE DOI Code:
WWW Link.
2410
Learning systems, Training, Adaptation models, Codes,
Large language models, Computational modeling
BibRef
He, H.Y.[Hao-Yu],
Pan, Z.Z.[Zi-Zheng],
Liu, J.[Jing],
Cai, J.F.[Jian-Fei],
Zhuang, B.[Bohan],
Efficient Stitchable Task Adaptation,
CVPR24(28555-28565)
IEEE DOI Code:
WWW Link.
2410
Training, Deep learning, Adaptation models, Visualization,
Scalability, Pipelines, Memory management, model stitching,
large language model
BibRef
Tian, X.Y.[Xin-Yu],
Zou, S.[Shu],
Yang, Z.Y.[Zhao-Yuan],
Zhang, J.[Jing],
ArGue: Attribute-Guided Prompt Tuning for Vision-Language Models,
CVPR24(28578-28587)
IEEE DOI Code:
WWW Link.
2410
Adaptation models, Visualization, Correlation, Computational modeling,
Large language models, Semantics, few-shot adaptation
BibRef
Lv, J.X.[Jia-Xi],
Huang, Y.[Yi],
Yan, M.[Mingfu],
Huang, J.C.[Jian-Cheng],
Liu, J.Z.[Jian-Zhuang],
Liu, Y.F.[Yi-Fan],
Wen, Y.F.[Ya-Fei],
Chen, X.X.[Xiao-Xin],
Chen, S.F.[Shi-Feng],
GPT4Motion: Scripting Physical Motions in Text-to-Video Generation
via Blender-Oriented GPT Planning,
PBDL24(1430-1440)
IEEE DOI Code:
WWW Link.
2410
Image synthesis, Large language models, Text to image, Fluid flow,
Manuals, Diffusion models
BibRef
Wang, J.C.[Jun-Chi],
Ke, L.[Lei],
LLM-Seg: Bridging Image Segmentation and Large Language Model
Reasoning,
WhatNext24(1765-1774)
IEEE DOI Code:
WWW Link.
2410
Training, Image segmentation, Large language models,
Design methodology, Pipelines, Cognition
BibRef
Hakim, Z.I.A.[Zaber Ibn Abdul],
Sarker, N.H.[Najibul Haque],
Singh, R.P.[Rahul Pratap],
Paul, B.[Bishmoy],
Dabouei, A.[Ali],
Xu, M.[Min],
Leveraging Generative Language Models for Weakly Supervised Sentence
Component Analysis in Video-Language Joint Learning,
MULA24(1975-1985)
IEEE DOI
2410
Training, Adaptation models, Statistical analysis,
Large language models, Estimation, Contrastive learning, Distance measurement
BibRef
Deria, A.[Ankan],
Kumar, K.[Komal],
Chakraborty, S.[Snehashis],
Mahapatra, D.[Dwarikanath],
Roy, S.[Sudipta],
InVERGe: Intelligent Visual Encoder for Bridging Modalities in Report
Generation,
MULA24(2028-2038)
IEEE DOI Code:
WWW Link.
2410
Training, Visualization, Computational modeling, Radiology,
Transformers, Feature extraction, Decoding, Deep Learning,
Large Language Model
BibRef
Arefeen, M.A.[Md Adnan],
Debnath, B.[Biplob],
Uddin, M.Y.S.[Md Yusuf Sarwar],
Chakradhar, S.[Srimat],
ViTA: An Efficient Video-to-Text Algorithm using VLM for RAG-based
Video Analysis System,
Reasoning24(2266-2274)
IEEE DOI
2410
Accuracy, Large language models, Natural language processing,
Data models, Video Analytics,
Large Language Models (LLMs)
BibRef
Chen, Y.W.[Yu-Wei],
Chu, S.Y.[Shi-Yong],
Large Language Models in Wargaming: Methodology, Application, and
Robustness,
AML24(2894-2903)
IEEE DOI
2410
Navigation, Large language models, Decision making,
Strategic planning, Solids, Robustness, Natural language processing
BibRef
Lai, Z.X.[Zhi-Xin],
Wu, J.[Jing],
Chen, S.[Suiyao],
Zhou, Y.C.[Yu-Cheng],
Hovakimyan, N.[Naira],
Residual-based Language Models are Free Boosters for Biomedical
Imaging Tasks,
DEF-AI-MIA24(5086-5096)
IEEE DOI Code:
WWW Link.
2410
Visualization, Large language models, Fasteners, Transformers,
LLM, Biomedical Imaging
BibRef
Fang, X.[Xi],
Wang, W.G.[Wei-Gang],
Lv, X.X.[Xiao-Xin],
Yan, J.[Jun],
PCQA: A Strong Baseline for AIGC Quality Assessment Based on Prompt
Condition,
NTIRE24(6167-6176)
IEEE DOI
2410
Image quality, Databases, Large language models, Semantics,
Quality assessment, Ensemble learning, AIGC, multimodal learning
BibRef
Ye, Z.[Zilyu],
Liu, J.X.[Jin-Xiu],
Cao, J.J.[Jin-Jin],
Chen, Z.Y.[Zhi-Yang],
Xuan, Z.W.[Zi-Wei],
Zhou, M.Y.[Ming-Yuan],
Liu, Q.[Qi],
Qi, G.J.[Guo-Jun],
OpenStory: A Large-Scale Open-Domain Dataset for Subject-Driven
Visual Storytelling,
VDU24(7953-7962)
IEEE DOI
2410
Training, Visualization, Annotations, Large language models,
Pipelines, Manuals
BibRef
Chen, X.Y.[Xiang-Yu],
Liu, J.[Jing],
Wang, Y.[Ye],
Wang, P.P.[Pu Perry],
Brand, M.[Matthew],
Wang, G.H.[Guang-Hui],
Koike-Akino, T.[Toshiaki],
SuperLoRA: Parameter-Efficient Unified Adaptation for Large Vision
Models,
ECV24(8050-8055)
IEEE DOI
2410
Adaptation models, Tensors, Computational modeling,
Large language models, Transfer learning, parameter efficiency,
low-rank adaptation
BibRef
Wei, C.[Chen],
Liu, C.X.[Chen-Xi],
Qiao, S.Y.[Si-Yuan],
Zhang, Z.S.[Zhi-Shuai],
Yuille, A.L.[Alan L.],
Yu, J.H.[Jia-Hui],
De-Diffusion Makes Text a Strong Cross-Modal Interface,
CVPR24(13492-13503)
IEEE DOI
2410
Large language models, Natural languages, Text to image,
Transforms, Diffusion models, Decoding, Diffusion, Generative Model,
Vision and Language
BibRef
Chen, B.[Boyuan],
Xu, Z.[Zhuo],
Kirmani, S.[Sean],
Ichter, B.[Brian],
Sadigh, D.[Dorsa],
Guibas, L.J.[Leonidas J.],
Xia, F.[Fei],
SpatialVLM: Endowing Vision-Language Models with Spatial Reasoning
Capabilities,
CVPR24(14455-14465)
IEEE DOI Code:
WWW Link.
2410
Training, Solid modeling, Visualization, Pipelines, Training data, Cognition,
spatial reasoning, large language model, multimodal, vision language model
BibRef
Dorkenwald, M.[Michael],
Barazani, N.[Nimrod],
Snoek, C.G.M.[Cees G. M.],
Asano, Y.M.[Yuki M.],
PIN: Positional Insert Unlocks Object Localisation Abilities in VLMs,
CVPR24(13548-13558)
IEEE DOI
2410
Training, Computational modeling, Machine vision,
Large language models, Pipelines, Pins, Vision-Language Models,
Efficient Adaption of VLMs
BibRef
Sun, Z.Y.[Ze-Yi],
Fang, Y.[Ye],
Wu, T.[Tong],
Zhang, P.[Pan],
Zang, Y.H.[Yu-Hang],
Kong, S.[Shu],
Xiong, Y.J.[Yuan-Jun],
Lin, D.[Dahua],
Wang, J.Q.[Jia-Qi],
Alpha-CLIP: A CLIP Model Focusing on Wherever you Want,
CVPR24(13019-13029)
IEEE DOI Code:
WWW Link.
2410
Point cloud compression, Visualization, Image recognition, Codes,
Large language models, CLIP, Vision-language pretraining, MLLMs
BibRef
Parashar, S.[Shubham],
Lin, Z.Q.[Zhi-Qiu],
Liu, T.[Tian],
Dong, X.J.[Xiang-Jue],
Li, Y.[Yanan],
Ramanan, D.[Deva],
Caverlee, J.[James],
Kong, S.[Shu],
The Neglected Tails in Vision-Language Models,
CVPR24(12988-12997)
IEEE DOI
2410
Training, Visualization, Accuracy, Large language models,
Text to image, Tail, Flowering plants, Vision-Language Models,
Long tailed recognition
BibRef
Luo, Y.[Yan],
Shi, M.[Min],
Khan, M.O.[Muhammad Osama],
Afzal, M.M.[Muhammad Muneeb],
Huang, H.[Hao],
Yuan, S.[Shuaihang],
Tian, Y.[Yu],
Song, L.[Luo],
Kouhana, A.[Ava],
Elze, T.[Tobias],
Fang, Y.[Yi],
Wang, M.Y.[Meng-Yu],
FairCLIP: Harnessing Fairness in Vision-Language Learning,
CVPR24(12289-12301)
IEEE DOI Code:
WWW Link.
2410
Deep learning, Bridges, Analytical models, Ethics, Codes,
Computational modeling, Fairness Learning, Large Language Models
BibRef
Zara, G.[Giacomo],
Conti, A.[Alessandro],
Roy, S.[Subhankar],
Lathuilière, S.[Stéphane],
Rota, P.[Paolo],
Ricci, E.[Elisa],
The Unreasonable Effectiveness of Large Language-Vision Models for
Source-free Video Domain Adaptation,
ICCV23(10273-10283)
IEEE DOI
2401
BibRef
Zhao, H.B.[Hong-Bo],
Ni, B.L.[Bo-Lin],
Fan, J.S.[Jun-Song],
Wang, Y.X.[Yu-Xi],
Chen, Y.T.[Yun-Tao],
Meng, G.F.[Gao-Feng],
Zhang, Z.X.[Zhao-Xiang],
Continual Forgetting for Pre-Trained Vision Models,
CVPR24(28631-28642)
IEEE DOI Code:
WWW Link.
2410
Privacy, Codes, Large language models,
Face recognition, Object detection, Continual Forgetting, Machine Unlearning
BibRef
Zhan, X.Y.[Xin-Yu],
Yang, L.X.[Li-Xin],
Zhao, Y.F.[Yi-Fei],
Mao, K.[Kangrui],
Xu, H.L.[Han-Lin],
Lin, Z.[Zenan],
Li, K.L.[Kai-Lin],
Lu, C.[Cewu],
OakInk2: A Dataset of Bimanual Hands-Object Manipulation in Complex
Task Completion,
CVPR24(445-456)
IEEE DOI Code:
WWW Link.
2410
Annotations, Affordances, Computational modeling,
Large language models, Decoding
BibRef
Li, Y.C.[Yi-Cong],
Zhao, N.[Na],
Xiao, J.B.[Jun-Bin],
Feng, C.[Chun],
Wang, X.[Xiang],
Chua, T.S.[Tat-Seng],
LASO: Language-Guided Affordance Segmentation on 3D Object,
CVPR24(14251-14260)
IEEE DOI Code:
WWW Link.
2410
Visualization, Solid modeling, Shape, Affordances,
Large language models, Semantics, Multimodal, 3D-Language, Vision-Language
BibRef
Rotstein, N.[Noam],
Bensaïd, D.[David],
Brody, S.[Shaked],
Ganz, R.[Roy],
Kimmel, R.[Ron],
FuseCap: Leveraging Large Language Models for Enriched Fused Image
Captions,
WACV24(5677-5688)
IEEE DOI
2404
Training, Surveys, Visualization, Fuses,
Optical character recognition, Training data, Algorithms,
Image recognition and understanding
BibRef
Chapter on Implementations and Applications, Databases, QBIC, Video Analysis, Hardware and Software, Inspection continues in
Multi-Modal, Multimodal Large Language Models for Vision, LLM.