13.3.14 Foundation Models, Graph Foundation Models

See also Vision-Language Models, Language-Vision Models, VQA.
See also Large Language Models for Vision, LLM, LVLM.

Zhang, Z.C.[Zi-Cheng], Wu, H.N.[Hao-Ning], Zhang, E.[Erli], Zhai, G.T.[Guang-Tao], Lin, W.S.[Wei-Si],
Q-Bench+: A Benchmark for Multi-Modal Foundation Models on Low-Level Vision From Single Images to Pairs,
PAMI(46), No. 12, December 2024, pp. 10404-10418.
IEEE DOI 2411
Visualization, Benchmark testing, Task analysis, Natural languages, Visual perception, Large language models, perception BibRef

Yu, T.[Ting], Fu, K.[Kunhao], Wang, S.H.[Shu-Hui], Huang, Q.M.[Qing-Ming], Yu, J.[Jun],
Prompting Video-Language Foundation Models With Domain-Specific Fine-Grained Heuristics for Video Question Answering,
CirSysVideo(35), No. 2, February 2025, pp. 1615-1630.
IEEE DOI 2502
Cognition, Computational modeling, Visualization, Context modeling, Data models, Adaptation models, Accuracy, context-aware reasoning BibRef

Hong, D.F.[Dan-Feng], Zhang, B.[Bing], Li, X.Y.[Xu-Yang], Li, Y.X.[Yu-Xuan], Li, C.Y.[Chen-Yu], Yao, J.[Jing], Yokoya, N.[Naoto], Li, H.[Hao], Ghamisi, P.[Pedram], Jia, X.P.[Xiu-Ping], Plaza, A.[Antonio], Gamba, P.[Paolo], Benediktsson, J.A.[Jon Atli], Chanussot, J.[Jocelyn],
SpectralGPT: Spectral Remote Sensing Foundation Model,
PAMI(46), No. 8, August 2024, pp. 5227-5244.
IEEE DOI 2407
Data models, Task analysis, Computational modeling, Transformers, Image reconstruction, Visualization, Artificial intelligence, transformer BibRef

Li, X.Y.[Xu-Yang], Hong, D.F.[Dan-Feng], Chanussot, J.[Jocelyn],
S2MAE: A Spatial-Spectral Pretraining Foundation Model for Spectral Remote Sensing Data,
CVPR24(27696-27705)
IEEE DOI 2410
Training, Adaptation models, Solid modeling, Sequential analysis, Computational modeling, Remote Sensing, Foundation Models BibRef

Li, C.Y.[Chun-Yuan], Gan, Z.[Zhe], Yang, Z.Y.[Zheng-Yuan], Yang, J.W.[Jian-Wei], Li, L.J.[Lin-Jie], Wang, L.J.[Li-Juan], Gao, J.F.[Jian-Feng],
Multimodal Foundation Models: From Specialists to General-Purpose Assistants,
FTCGV(16), No. 1-2, 2024, pp. 1-214.
DOI Link 2405
Perception and user interface. BibRef

Liu, J.W.[Jia-Wei], Yang, C.[Cheng], Lu, Z.Y.[Zhi-Yuan], Chen, J.[Junze], Li, Y.[Yibo], Zhang, M.[Mengmei], Bai, T.[Ting], Fang, Y.[Yuan], Sun, L.C.[Li-Chao], Yu, P.S.[Philip S.], Shi, C.[Chuan],
Graph Foundation Models: Concepts, Opportunities and Challenges,
PAMI(47), No. 6, June 2025, pp. 5023-5044.
IEEE DOI 2505
Graph Foundation Models. Foundation models, Adaptation models, Data models, Artificial intelligence, Semantics, Transformers, Training, Surveys, large language models BibRef

Wu, J.W.[Jing-Wei], Huang, Z.W.[Zhe-Wei], Liu, C.[Chang],
Advancing video self-supervised learning via image foundation models,
PRL(192), 2025, pp. 22-28.
Elsevier DOI Code:
WWW Link. 2505
Video self-supervised learning, Video representation model, Image foundation model BibRef

Hu, M.Y.[Min-Yang], Chang, H.[Hong], Shan, S.G.[Shi-Guang], Chen, X.L.[Xi-Lin],
Inference Calibration of Vision-Language Foundation Models for Zero-Shot and Few-Shot Learning,
PRL(192), 2025, pp. 15-21.
Elsevier DOI 2505
Vision-Language Model, Contrastive Language-Image Pre-training, Zero/Few-Shot Learning BibRef

Awais, M.[Muhammad], Naseer, M.[Muzammal], Khan, S.[Salman], Anwer, R.M.[Rao Muhammad], Cholakkal, H.[Hisham], Shah, M.[Mubarak], Yang, M.H.[Ming-Hsuan], Khan, F.S.[Fahad Shahbaz],
Foundation Models Defining a New Era in Vision: A Survey and Outlook,
PAMI(47), No. 4, April 2025, pp. 2245-2264.
IEEE DOI 2503
Adaptation models, Computational modeling, Foundation models, Data models, Surveys, Visualization, Reviews, self-supervised learning BibRef

Luo, J.J.[Jian-Jie], Li, Y.[Yehao], Pan, Y.W.[Ying-Wei], Yao, T.[Ting], Feng, J.L.[Jian-Lin], Chao, H.Y.[Hong-Yang], Mei, T.[Tao],
Exploring Vision-Language Foundation Model for Novel Object Captioning,
CirSysVideo(35), No. 1, January 2025, pp. 91-102.
IEEE DOI 2502
Training, Decoding, Visualization, Transformers, Semantics, Search problems, Task analysis, Novel object captioning, CLIP BibRef

Chettaoui, T.[Tahar], Damer, N.[Naser], Boutros, F.[Fadi],
FRoundation: Are foundation models ready for face recognition?,
IVC(156), 2025, pp. 105453.
Elsevier DOI Code:
WWW Link. 2503
Face recognition, Foundation models, Computer vision, Face and gesture analyses, Human analyses, Fine-tuning BibRef

Zhang, J.X.[Jing-Xuan], Wan, G.[Genshun], Gao, J.Q.[Jian-Qing], Ling, Z.H.[Zhen-Hua],
Audio-visual representation learning via knowledge distillation from speech foundation models,
PR(162), 2025, pp. 111432.
Elsevier DOI 2503
Representation learning, Knowledge distillation, Speech foundation model, Lipreading, Audio-visual speech recognition BibRef

Tang, L.[Lv], Jiang, P.T.[Peng-Tao], Xiao, H.[Haoke], Li, B.[Bo],
Towards Training-Free Open-World Segmentation via Image Prompt Foundation Models,
IJCV(133), No. 1, January 2025, pp. 1-15.
Springer DOI 2501
BibRef

Chen, H.[Hong], Wang, X.[Xin], Zeng, G.[Guanning], Zhang, Y.P.[Yi-Peng], Zhou, Y.W.[Yu-Wei], Han, F.[Feilin], Wu, Y.F.[Yao-Fei], Zhu, W.W.[Wen-Wu],
VideoDreamer: Customized Multi-Subject Text-to-Video Generation With Disen-Mix Finetuning on Language-Video Foundation Models,
MultMed(27), 2025, pp. 2875-2885.
IEEE DOI 2506
Text to video, Dogs, Codes, Visualization, Generators, Text to image, Diffusion models, Hands, Training, Noise measurement, Text-to-video, foundation model finetuning BibRef

Wang, D.[Di], Hu, M.[Meiqi], Jin, Y.[Yao], Miao, Y.C.[Yu-Chun], Yang, J.Q.[Jia-Qi], Xu, Y.C.[Yi-Chu], Qin, X.L.[Xiao-Lei], Ma, J.Q.[Jia-Qi], Sun, L.Y.[Ling-Yu], Li, C.X.[Chen-Xing], Fu, C.[Chuan], Chen, H.[Hongruixuan], Han, C.X.[Cheng-Xi], Yokoya, N.[Naoto], Zhang, J.[Jing], Xu, M.Q.[Min-Qiang], Liu, L.[Lin], Zhang, L.[Lefei], Wu, C.[Chen], Du, B.[Bo], Tao, D.C.[Da-Cheng], Zhang, L.P.[Liang-Pei],
HyperSIGMA: Hyperspectral Intelligence Comprehension Foundation Model,
PAMI(47), No. 8, August 2025, pp. 6427-6444.
IEEE DOI 2507
Hyperspectral imaging, Foundation models, Transformers, Feature extraction, Training, Satellites, Computational modeling, large-scale dataset BibRef
Yang, J.[Jiange], Tan, W.H.[Wen-Hui], Jin, C.[Chuhao], Yao, K.[Keling], Liu, B.[Bei], Fu, J.L.[Jian-Long], Song, R.H.[Rui-Hua], Wu, G.S.[Gang-Shan], Wang, L.M.[Li-Min],
Transferring Foundation Models for Generalizable Robotic Manipulation,
WACV25(1999-2010)
IEEE DOI Code:
WWW Link. 2505
Geometry, Correlation, Limiting, Foundation models, Imitation learning, Computational modeling, Semantics. BibRef

Singh, J.[Jaisidh], Shrivastava, I.[Ishaan], Vatsa, M.[Mayank], Singh, R.[Richa], Bharati, A.[Aparna],
Learning the Power of 'No': Foundation Models with Negations,
WACV25(8002-8012)
IEEE DOI Code:
WWW Link. 2505
Training, Foundation models, Large language models, Face recognition, Semantics, Natural languages, Focusing BibRef

Rongali, S.B.[Sai Bhargav], C, M.H.N.[Mohamad Hassan N], Jha, A.[Ankit], Bhargava, N.[Neha], Prasad, S.[Saurabh], Banerjee, B.[Biplab],
Foundation Models and Adaptive Feature Selection: A Synergistic Approach to Video Question Answering,
WACV25(9269-9279)
IEEE DOI 2505
Adaptation models, Visualization, Technological innovation, Grounding, Foundation models, Computational modeling, Semantics, video question answering BibRef

Ranzinger, M.[Mike], Heinrich, G.[Greg], Kautz, J.[Jan], Molchanov, P.[Pavlo],
AM-RADIO: Agglomerative Vision Foundation Model Reduce All Domains Into One,
CVPR24(12490-12500)
IEEE DOI Code:
WWW Link. 2410
Training, Vocabulary, Visualization, Image resolution, Semantic segmentation, Pipelines, Visual Foundation Model, Zero Shot Classification BibRef

Li, S.[Shikai], Fu, J.L.[Jiang-Lin], Liu, K.Y.[Kai-Yuan], Wang, W.T.[Wen-Tao], Lin, K.Y.[Kwan-Yee], Wu, W.[Wayne],
CosmicMan: A Text-to-Image Foundation Model for Humans,
CVPR24(6955-6965)
IEEE DOI Code:
WWW Link. 2410
Training, Heart, Image resolution, Annotations, Text to image, Production, Data models, Human Generation, foundation model BibRef

Li, J.[Junyi], Wu, J.F.[Jun-Feng], Zhao, W.Z.[Wei-Zhi], Bai, S.[Song], Bai, X.[Xiang],
PartGLEE: A Foundation Model for Recognizing and Parsing Any Objects,
ECCV24(LXXV: 475-494).
Springer DOI 2412
BibRef

Guo, X.[Xin], Lao, J.W.[Jiang-Wei], Dang, B.[Bo], Zhang, Y.Y.[Ying-Ying], Yu, L.[Lei], Ru, L.X.[Li-Xiang], Zhong, L.[Liheng], Huang, Z.Y.[Zi-Yuan], Wu, K.[Kang], Hu, D.X.[Ding-Xiang], He, H.M.[Hui-Mei], Wang, J.[Jian], Chen, J.D.[Jing-Dong], Yang, M.[Ming], Zhang, Y.J.[Yong-Jun], Li, Y.S.[Yan-Sheng],
SkySense: A Multi-Modal Remote Sensing Foundation Model Towards Universal Interpretation for Earth Observation Imagery,
CVPR24(27662-27673)
IEEE DOI 2410
Earth, Location awareness, Prototypes, Contrastive learning, Spatiotemporal phenomena, Satellite images, Remote Sensing, Multi-modal Foundation Model BibRef

Wu, J.F.[Jun-Feng], Jiang, Y.[Yi], Liu, Q.H.[Qi-Hao], Yuan, Z.H.[Ze-Huan], Bai, X.[Xiang], Bai, S.[Song],
General Object Foundation Model for Images and Videos at Scale,
CVPR24(3783-3795)
IEEE DOI Code:
WWW Link. 2410
Training, Visualization, Image segmentation, Grounding, Soft sensors, Large language models BibRef

Lei, T.[Ting], Yin, S.F.[Shao-Feng], Liu, Y.[Yang],
Exploring the Potential of Large Foundation Models for Open-Vocabulary HOI Detection,
CVPR24(16657-16667)
IEEE DOI Code:
WWW Link. 2410
Vocabulary, Correlation, Large language models, Semantics, Natural languages, Detectors BibRef

Zhang, H.J.[Hao-Jie], Su, Y.Y.[Yong-Yi], Xu, X.[Xun], Jia, K.[Kui],
Improving the Generalization of Segmentation Foundation Model under Distribution Shift via Weakly Supervised Adaptation,
CVPR24(23385-23395)
IEEE DOI 2410
Image segmentation, Costs, Large language models, Robustness, Computational efficiency, Domain Adaptation, Weakly Supervised Adaptation BibRef

Wu, H.N.[Hao-Ning], Zhang, Z.C.[Zi-Cheng], Zhang, E.[Erli], Chen, C.F.[Chao-Feng], Liao, L.[Liang], Wang, A.[Annan], Xu, K.X.[Kai-Xin], Li, C.Y.[Chun-Yi], Hou, J.W.[Jing-Wen], Zhai, G.T.[Guang-Tao], Xue, G.[Geng], Sun, W.X.[Wen-Xiu], Yan, Q.[Qiong], Lin, W.S.[Wei-Si],
Q-Instruct: Improving Low-Level Visual Abilities for Multi-Modality Foundation Models,
CVPR24(25490-25500)
IEEE DOI 2410
Visualization, Accuracy, Large language models, Natural languages, Solids, Quality assessment BibRef

Han, G.X.[Guang-Xing], Lim, S.N.[Ser-Nam],
Few-Shot Object Detection with Foundation Models,
CVPR24(28608-28618)
IEEE DOI 2410
Training, Visualization, Large language models, Computational modeling, Object detection, Benchmark testing, Large Language Model BibRef

Chen, Z.[Zhe], Wu, J.N.[Jian-Nan], Wang, W.H.[Wen-Hai], Su, W.J.[Wei-Jie], Chen, G.[Guo], Xing, S.[Sen], Zhong, M.[Muyan], Zhang, Q.L.[Qing-Long], Zhu, X.Z.[Xi-Zhou], Lu, L.W.[Le-Wei], Li, B.[Bin], Luo, P.[Ping], Lu, T.[Tong], Qiao, Y.[Yu], Dai, J.F.[Ji-Feng],
InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks,
CVPR24(24185-24198)
IEEE DOI 2410
Training, Visualization, Image recognition, Large language models, Data models, Question answering (information retrieval), vision-language model BibRef

Stevens, S.[Samuel], Wu, J.[Jiaman], Thompson, M.J.[Matthew J], Campolongo, E.G.[Elizabeth G], Song, C.H.[Chan Hee], Carlyn, D.E.[David Edward], Dong, L.[Li], Dahdul, W.M.[Wasila M], Stewart, C.[Charles], Berger-Wolf, T.[Tanya], Chao, W.L.[Wei-Lun], Su, Y.[Yu],
BioCLIP: A Vision Foundation Model for the Tree of Life,
CVPR24(19412-19424)
IEEE DOI 2410
Fungi, Visualization, Biological system modeling, Plants (biology), Vegetation, Data mining, machine learning, imageomics, evolutionary biology & ecology BibRef

Li, G.[Gen], Sun, D.Q.[De-Qing], Sevilla-Lara, L.[Laura], Jampani, V.[Varun],
One-Shot Open Affordance Learning with Foundation Models,
CVPR24(3086-3096)
IEEE DOI Code:
WWW Link. 2410
Visualization, Affordances, Training data, Benchmark testing, Data models, Foundation Models, Vision-Language Models BibRef

Ma, Z.X.[Zi-Xian], Hong, J.[Jerry], Gul, M.O.[Mustafa Omer], Gandhi, M.[Mona], Gao, I.[Irena], Krishna, R.[Ranjay],
CREPE: Can Vision-Language Foundation Models Reason Compositionally?,
CVPR23(10910-10921)
IEEE DOI 2309
BibRef

Sun, X.M.[Xi-Meng], Zhang, P.C.[Peng-Chuan], Zhang, P.Z.[Pei-Zhao], Shah, H.[Hardik], Saenko, K.[Kate], Xia, X.[Xide],
DIME-FM: DIstilling Multimodal and Efficient Foundation Models,
ICCV23(15475-15487)
IEEE DOI 2401
BibRef

Majumdar, A.[Arjun], Ajay, A.[Anurag], Zhang, X.H.[Xiao-Han], Putta, P.[Pranav], Yenamandra, S.[Sriram], Henaff, M.[Mikael], Silwal, S.[Sneha], Mcvay, P.[Paul], Maksymets, O.[Oleksandr], Arnaud, S.[Sergio], Yadav, K.[Karmesh], Li, Q.[Qiyang], Newman, B.[Ben], Sharma, M.[Mohit], Berges, V.[Vincent], Zhang, S.Q.[Shi-Qi], Agrawal, P.[Pulkit], Bisk, Y.[Yonatan], Batra, D.[Dhruv], Kalakrishnan, M.[Mrinal], Meier, F.[Franziska], Paxton, C.[Chris], Sax, A.[Alexander], Rajeswaran, A.[Aravind],
OpenEQA: Embodied Question Answering in the Era of Foundation Models,
CVPR24(16488-16498)
IEEE DOI 2410
Protocols, Natural languages, Semantics, Benchmark testing, Question answering (information retrieval), Vision-Language Models BibRef

Slyman, E.[Eric], Lee, S.[Stefan], Cohen, S.[Scott], Kafle, K.[Kushal],
FairDeDup: Detecting and Mitigating Vision-Language Fairness Disparities in Semantic Dataset Deduplication,
CVPR24(13905-13916)
IEEE DOI 2410
Training, Measurement, Costs, Semantics, Skin, Data models, multimodal, fairness, vision-language, foundation models, human-centered ai, deduplication BibRef

El Banani, M.[Mohamed], Raj, A.[Amit], Maninis, K.K.[Kevis-Kokitsi], Kar, A.[Abhishek], Li, Y.Z.[Yuan-Zhen], Rubinstein, M.[Michael], Sun, D.Q.[De-Qing], Guibas, L.J.[Leonidas J.], Johnson, J.[Justin], Jampani, V.[Varun],
Probing the 3D Awareness of Visual Foundation Models,
CVPR24(21795-21806)
IEEE DOI Code:
WWW Link. 2410
Training, Visualization, Solid modeling, Image segmentation, Codes, Representation Learning, 3D Vision, Foundation Models, 3D Awareness BibRef

Taher, M.R.H.[Mohammad Reza Hosseinzadeh], Gotway, M.B.[Michael B.], Liang, J.M.[Jian-Ming],
Representing Part-Whole Hierarchies in Foundation Models by Learning Localizability, Composability, and Decomposability from Anatomy via Self-Supervision,
CVPR24(11269-11281)
IEEE DOI 2410
Deep learning, Visualization, Image coding, Semantics, Anatomical structure, Self-supervised learning BibRef

Hong, L.Y.[Ling-Yi], Yan, S.L.[Shi-Lin], Zhang, R.R.[Ren-Rui], Li, W.Y.[Wan-Yun], Zhou, X.Y.[Xin-Yu], Guo, P.[Pinxue], Jiang, K.X.[Kai-Xun], Chen, Y.T.[Yi-Ting], Li, J.L.[Jing-Lun], Chen, Z.Y.[Zhao-Yu], Zhang, W.Q.[Wen-Qiang],
OneTracker: Unifying Visual Object Tracking with Foundation Models and Efficient Tuning,
CVPR24(19079-19091)
IEEE DOI 2410
Location awareness, Visualization, Adaptation models, Target tracking, Benchmark testing, Multi-Modality BibRef

Zhong, F.W.[Fang-Wei], Wu, K.[Kui], Ci, H.[Hai], Wang, C.[Churan], Chen, H.[Hao],
Empowering Embodied Visual Tracking with Visual Foundation Models and Offline RL,
ECCV24(LXXIII: 139-155).
Springer DOI 2412
Project:
WWW Link. BibRef

Wang, Y.[Yi], Li, K.C.[Kun-Chang], Li, X.H.[Xin-Hao], Yu, J.S.[Jia-Shuo], He, Y.[Yinan], Chen, G.[Guo], Pei, B.Q.[Bao-Qi], Zheng, R.K.[Rong-Kun], Wang, Z.[Zun], Shi, Y.S.[Yan-Song], Jiang, T.X.[Tian-Xiang], Li, S.Z.[Song-Ze], Xu, J.[Jilan], Zhang, H.J.[Hong-Jie], Huang, Y.F.[Yi-Fei], Qiao, Y.[Yu], Wang, Y.[Yali], Wang, L.M.[Li-Min],
InternVideo2: Scaling Foundation Models for Multimodal Video Understanding,
ECCV24(LXXXV: 396-416).
Springer DOI 2412
BibRef

Tian, Y.[Yuan], Lu, G.[Guo], Zhai, G.T.[Guang-Tao],
Free-VSC: Free Semantics from Visual Foundation Models for Unsupervised Video Semantic Compression,
ECCV24(XLIX: 163-183).
Springer DOI 2412
BibRef

Tian, Y.[Yuan], Lu, G.[Guo], Zhai, G.T.[Guang-Tao], Gao, Z.Y.[Zhi-Yong],
Non-Semantics Suppressed Mask Learning for Unsupervised Video Semantic Compression,
ICCV23(13564-13576)
IEEE DOI 2401
BibRef

Zhang, C.[Chenhui], Wang, S.[Sherrie],
Good at captioning, bad at counting: Benchmarking GPT-4V on Earth observation data,
EarthVision24(7839-7849)
IEEE DOI 2410
Location awareness, Earth, Visualization, Satellites, Natural languages, Training data, Land surface, foundation model, benchmark BibRef

Gao, Z.T.[Zi-Teng], Tong, Z.[Zhan], Lin, K.Q.[Kevin Qinghong], Chen, J.[Joya], Shou, M.Z.[Mike Zheng],
Bootstrapping SparseFormers from Vision Foundation Models,
CVPR24(17710-17721)
IEEE DOI Code:
WWW Link. 2410
Training, Visualization, Accuracy, Semantic segmentation, Transformers BibRef

Chen, F.[Feng], Giuffrida, M.V.[Mario Valerio], Tsaftaris, S.A.[Sotirios A.],
Adapting Vision Foundation Models for Plant Phenotyping,
CVPPA23(604-613)
IEEE DOI 2401
BibRef

Wang, T.[Tan], Lin, K.[Kevin], Li, L.J.[Lin-Jie], Lin, C.C.[Chung-Ching], Yang, Z.Y.[Zheng-Yuan], Zhang, H.W.[Han-Wang], Liu, Z.C.[Zi-Cheng], Wang, L.J.[Li-Juan],
Equivariant Similarity for Vision-Language Foundation Models,
ICCV23(11964-11974)
IEEE DOI 2401
BibRef

Ge, Y.Y.[Yu-Ying], Macaluso, A.[Annabella], Li, L.E.[Li Erran], Luo, P.[Ping], Wang, X.L.[Xiao-Long],
Policy Adaptation from Foundation Model Feedback,
CVPR23(19059-19069)
IEEE DOI 2309
BibRef

Dombrowski, M.[Mischa], Reynaud, H.[Hadrien], Baugh, M.[Matthew], Kainz, B.[Bernhard],
Foreground-Background Separation through Concept Distillation from Generative Image Foundation Models,
ICCV23(988-998)
IEEE DOI Code:
WWW Link. 2401
BibRef

Salin, E.[Emmanuelle], Ayache, S.[Stéphane], Favre, B.[Benoit],
Towards an Exhaustive Evaluation of Vision-Language Foundation Models,
MMFM23(339-352)
IEEE DOI 2401
BibRef

Wang, W.H.[Wen-Hai], Dai, J.F.[Ji-Feng], Chen, Z.[Zhe], Huang, Z.H.[Zhen-Hang], Li, Z.Q.[Zhi-Qi], Zhu, X.Z.[Xi-Zhou], Hu, X.W.[Xiao-Wei], Lu, T.[Tong], Lu, L.W.[Le-Wei], Li, H.S.[Hong-Sheng], Wang, X.G.[Xiao-Gang], Qiao, Y.[Yu],
InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions,
CVPR23(14408-14419)
IEEE DOI 2309
BibRef

Shin, G.[Gyungin], Xie, W.[Weidi], Albanie, S.[Samuel],
NamedMask: Distilling Segmenters from Complementary Foundation Models,
L3D-IVU23(4961-4970)
IEEE DOI 2309
BibRef

Chapter on Matching and Recognition Using Volumes, High Level Vision Techniques, Invariants continues in
Object Recognition, General Techniques.


Last update: Jul 7, 2025 at 14:35:55