Zhang, Z.C.[Zi-Cheng],
Wu, H.N.[Hao-Ning],
Zhang, E.[Erli],
Zhai, G.T.[Guang-Tao],
Lin, W.S.[Wei-Si],
Q-Bench+: A Benchmark for Multi-Modal Foundation Models on Low-Level
Vision From Single Images to Pairs,
PAMI(46), No. 12, December 2024, pp. 10404-10418.
IEEE DOI
2411
Visualization, Benchmark testing, Task analysis, Natural languages,
Visual perception, Large language models, perception
BibRef
Yu, T.[Ting],
Fu, K.[Kunhao],
Wang, S.H.[Shu-Hui],
Huang, Q.M.[Qing-Ming],
Yu, J.[Jun],
Prompting Video-Language Foundation Models With Domain-Specific
Fine-Grained Heuristics for Video Question Answering,
CirSysVideo(35), No. 2, February 2025, pp. 1615-1630.
IEEE DOI
2502
Cognition, Computational modeling, Visualization, Context modeling,
Data models, Adaptation models, Accuracy,
context-aware reasoning
BibRef
Hong, D.F.[Dan-Feng],
Zhang, B.[Bing],
Li, X.Y.[Xu-Yang],
Li, Y.X.[Yu-Xuan],
Li, C.Y.[Chen-Yu],
Yao, J.[Jing],
Yokoya, N.[Naoto],
Li, H.[Hao],
Ghamisi, P.[Pedram],
Jia, X.P.[Xiu-Ping],
Plaza, A.[Antonio],
Gamba, P.[Paolo],
Benediktsson, J.A.[Jon Atli],
Chanussot, J.[Jocelyn],
SpectralGPT: Spectral Remote Sensing Foundation Model,
PAMI(46), No. 8, August 2024, pp. 5227-5244.
IEEE DOI
2407
Data models, Task analysis, Computational modeling, Transformers,
Image reconstruction, Visualization, Artificial intelligence, transformer
BibRef
Li, X.Y.[Xu-Yang],
Hong, D.F.[Dan-Feng],
Chanussot, J.[Jocelyn],
S2MAE: A Spatial-Spectral Pretraining Foundation Model for Spectral
Remote Sensing Data,
CVPR24(27696-27705)
IEEE DOI
2410
Training, Adaptation models, Solid modeling, Sequential analysis,
Computational modeling, Remote Sensing, Foundation Models
BibRef
Li, C.Y.[Chun-Yuan],
Gan, Z.[Zhe],
Yang, Z.Y.[Zheng-Yuan],
Yang, J.W.[Jian-Wei],
Li, L.J.[Lin-Jie],
Wang, L.J.[Li-Juan],
Gao, J.F.[Jian-Feng],
Multimodal Foundation Models:
From Specialists to General-Purpose Assistants,
FTCGV(16), No. 1-2, 2024, pp. 1-214.
DOI Link
2405
Perception and user interface.
BibRef
Liu, J.W.[Jia-Wei],
Yang, C.[Cheng],
Lu, Z.Y.[Zhi-Yuan],
Chen, J.[Junze],
Li, Y.[Yibo],
Zhang, M.[Mengmei],
Bai, T.[Ting],
Fang, Y.[Yuan],
Sun, L.C.[Li-Chao],
Yu, P.S.[Philip S.],
Shi, C.[Chuan],
Graph Foundation Models: Concepts, Opportunities and Challenges,
PAMI(47), No. 6, June 2025, pp. 5023-5044.
IEEE DOI
2505
Graph Foundation Models.
Foundation models, Adaptation models, Data models, Artificial intelligence,
Semantics, Transformers, Training, Surveys, large language models
BibRef
Wu, J.W.[Jing-Wei],
Huang, Z.W.[Zhe-Wei],
Liu, C.[Chang],
Advancing video self-supervised learning via image foundation models,
PRL(192), 2025, pp. 22-28.
Elsevier DOI Code:
WWW Link.
2505
Video self-supervised learning, Video representation model,
Image foundation model
BibRef
Hu, M.Y.[Min-Yang],
Chang, H.[Hong],
Shan, S.G.[Shi-Guang],
Chen, X.L.[Xi-Lin],
Inference Calibration of Vision-Language Foundation Models for
Zero-Shot and Few-Shot Learning,
PRL(192), 2025, pp. 15-21.
Elsevier DOI
2505
Vision-Language Model,
Contrastive Language-Image Pre-training, Zero/Few-Shot Learning
BibRef
Awais, M.[Muhammad],
Naseer, M.[Muzammal],
Khan, S.[Salman],
Anwer, R.M.[Rao Muhammad],
Cholakkal, H.[Hisham],
Shah, M.[Mubarak],
Yang, M.H.[Ming-Hsuan],
Khan, F.S.[Fahad Shahbaz],
Foundation Models Defining a New Era in Vision: A Survey and Outlook,
PAMI(47), No. 4, April 2025, pp. 2245-2264.
IEEE DOI
2503
Adaptation models, Computational modeling, Foundation models,
Data models, Surveys, Visualization, Reviews,
self-supervised learning
BibRef
Luo, J.J.[Jian-Jie],
Li, Y.[Yehao],
Pan, Y.W.[Ying-Wei],
Yao, T.[Ting],
Feng, J.L.[Jian-Lin],
Chao, H.Y.[Hong-Yang],
Mei, T.[Tao],
Exploring Vision-Language Foundation Model for Novel Object
Captioning,
CirSysVideo(35), No. 1, January 2025, pp. 91-102.
IEEE DOI
2502
Training, Decoding, Visualization, Transformers, Semantics,
Search problems, Task analysis, Novel object captioning, CLIP
BibRef
Chettaoui, T.[Tahar],
Damer, N.[Naser],
Boutros, F.[Fadi],
FRoundation: Are foundation models ready for face recognition?,
IVC(156), 2025, pp. 105453.
Elsevier DOI Code:
WWW Link.
2503
Face recognition, Foundation models, Computer vision,
Face and gesture analyses, Human analyses, Fine-tuning
BibRef
Zhang, J.X.[Jing-Xuan],
Wan, G.[Genshun],
Gao, J.Q.[Jian-Qing],
Ling, Z.H.[Zhen-Hua],
Audio-visual representation learning via knowledge distillation from
speech foundation models,
PR(162), 2025, pp. 111432.
Elsevier DOI
2503
Representation learning, Knowledge distillation,
Speech foundation model, Lipreading, Audio-visual speech recognition
BibRef
Tang, L.[Lv],
Jiang, P.T.[Peng-Tao],
Xiao, H.[Haoke],
Li, B.[Bo],
Towards Training-Free Open-World Segmentation via Image Prompt
Foundation Models,
IJCV(133), No. 1, January 2025, pp. 1-15.
Springer DOI
2501
BibRef
Chen, H.[Hong],
Wang, X.[Xin],
Zeng, G.[Guanning],
Zhang, Y.P.[Yi-Peng],
Zhou, Y.W.[Yu-Wei],
Han, F.[Feilin],
Wu, Y.F.[Yao-Fei],
Zhu, W.W.[Wen-Wu],
VideoDreamer: Customized Multi-Subject Text-to-Video Generation With
Disen-Mix Finetuning on Language-Video Foundation Models,
MultMed(27), 2025, pp. 2875-2885.
IEEE DOI
2506
Text to video, Dogs, Codes, Visualization, Generators, Text to image,
Diffusion models, Hands, Training, Noise measurement, Text-to-video,
foundation model finetuning
BibRef
Wang, D.[Di],
Hu, M.[Meiqi],
Jin, Y.[Yao],
Miao, Y.C.[Yu-Chun],
Yang, J.Q.[Jia-Qi],
Xu, Y.C.[Yi-Chu],
Qin, X.L.[Xiao-Lei],
Ma, J.Q.[Jia-Qi],
Sun, L.Y.[Ling-Yu],
Li, C.X.[Chen-Xing],
Fu, C.[Chuan],
Chen, H.[Hongruixuan],
Han, C.X.[Cheng-Xi],
Yokoya, N.[Naoto],
Zhang, J.[Jing],
Xu, M.Q.[Min-Qiang],
Liu, L.[Lin],
Zhang, L.[Lefei],
Wu, C.[Chen],
Du, B.[Bo],
Tao, D.C.[Da-Cheng],
Zhang, L.P.[Liang-Pei],
HyperSIGMA: Hyperspectral Intelligence Comprehension Foundation Model,
PAMI(47), No. 8, August 2025, pp. 6427-6444.
IEEE DOI
2507
Hyperspectral imaging, Foundation models, Transformers,
Feature extraction, Training, Satellites, Computational modeling,
large-scale dataset
BibRef
Singh, J.[Jaisidh],
Shrivastava, I.[Ishaan],
Vatsa, M.[Mayank],
Singh, R.[Richa],
Bharati, A.[Aparna],
Learning the Power of 'No': Foundation Models with Negations,
WACV25(8002-8012)
IEEE DOI Code:
WWW Link.
2505
Training, Foundation models, Large language models,
Face recognition, Semantics, Natural languages, Focusing
BibRef
Rongali, S.B.[Sai Bhargav],
C, M.H.N.[Mohamad Hassan N],
Jha, A.[Ankit],
Bhargava, N.[Neha],
Prasad, S.[Saurabh],
Banerjee, B.[Biplab],
Foundation Models and Adaptive Feature Selection:
A Synergistic Approach to Video Question Answering,
WACV25(9269-9279)
IEEE DOI
2505
Adaptation models, Visualization, Technological innovation,
Grounding, Foundation models, Computational modeling, Semantics,
video question answering
BibRef
Ranzinger, M.[Mike],
Heinrich, G.[Greg],
Kautz, J.[Jan],
Molchanov, P.[Pavlo],
AM-RADIO: Agglomerative Vision Foundation Model Reduce All Domains
Into One,
CVPR24(12490-12500)
IEEE DOI Code:
WWW Link.
2410
Training, Vocabulary, Visualization, Image resolution,
Semantic segmentation, Pipelines, Visual Foundation Model,
Zero Shot Classification
BibRef
Li, S.[Shikai],
Fu, J.L.[Jiang-Lin],
Liu, K.Y.[Kai-Yuan],
Wang, W.T.[Wen-Tao],
Lin, K.Y.[Kwan-Yee],
Wu, W.[Wayne],
CosmicMan: A Text-to-Image Foundation Model for Humans,
CVPR24(6955-6965)
IEEE DOI Code:
WWW Link.
2410
Training, Heart, Image resolution, Annotations, Text to image,
Production, Data models, Human Generation, foundation model
BibRef
Li, J.[Junyi],
Wu, J.F.[Jun-Feng],
Zhao, W.Z.[Wei-Zhi],
Bai, S.[Song],
Bai, X.[Xiang],
PartGLEE: A Foundation Model for Recognizing and Parsing Any Objects,
ECCV24(LXXV: 475-494).
Springer DOI
2412
BibRef
Guo, X.[Xin],
Lao, J.W.[Jiang-Wei],
Dang, B.[Bo],
Zhang, Y.Y.[Ying-Ying],
Yu, L.[Lei],
Ru, L.X.[Li-Xiang],
Zhong, L.[Liheng],
Huang, Z.Y.[Zi-Yuan],
Wu, K.[Kang],
Hu, D.X.[Ding-Xiang],
He, H.M.[Hui-Mei],
Wang, J.[Jian],
Chen, J.D.[Jing-Dong],
Yang, M.[Ming],
Zhang, Y.J.[Yong-Jun],
Li, Y.S.[Yan-Sheng],
SkySense: A Multi-Modal Remote Sensing Foundation Model Towards
Universal Interpretation for Earth Observation Imagery,
CVPR24(27662-27673)
IEEE DOI
2410
Earth, Location awareness, Prototypes, Contrastive learning,
Spatiotemporal phenomena, Satellite images, Remote Sensing,
Multi-modal Foundation Model
BibRef
Wu, J.F.[Jun-Feng],
Jiang, Y.[Yi],
Liu, Q.H.[Qi-Hao],
Yuan, Z.H.[Ze-Huan],
Bai, X.[Xiang],
Bai, S.[Song],
General Object Foundation Model for Images and Videos at Scale,
CVPR24(3783-3795)
IEEE DOI Code:
WWW Link.
2410
Training, Visualization, Image segmentation, Grounding, Soft sensors,
Large language models
BibRef
Lei, T.[Ting],
Yin, S.F.[Shao-Feng],
Liu, Y.[Yang],
Exploring the Potential of Large Foundation Models for
Open-Vocabulary HOI Detection,
CVPR24(16657-16667)
IEEE DOI Code:
WWW Link.
2410
Vocabulary, Correlation, Large language models, Semantics,
Natural languages, Detectors
BibRef
Zhang, H.J.[Hao-Jie],
Su, Y.Y.[Yong-Yi],
Xu, X.[Xun],
Jia, K.[Kui],
Improving the Generalization of Segmentation Foundation Model under
Distribution Shift via Weakly Supervised Adaptation,
CVPR24(23385-23395)
IEEE DOI
2410
Image segmentation, Costs, Large language models, Robustness,
Computational efficiency, Domain Adaptation,
Weakly Supervised Adaptation
BibRef
Wu, H.N.[Hao-Ning],
Zhang, Z.C.[Zi-Cheng],
Zhang, E.[Erli],
Chen, C.F.[Chao-Feng],
Liao, L.[Liang],
Wang, A.[Annan],
Xu, K.X.[Kai-Xin],
Li, C.Y.[Chun-Yi],
Hou, J.W.[Jing-Wen],
Zhai, G.T.[Guang-Tao],
Xue, G.[Geng],
Sun, W.X.[Wen-Xiu],
Yan, Q.[Qiong],
Lin, W.S.[Wei-Si],
Q-Instruct: Improving Low-Level Visual Abilities for Multi-Modality
Foundation Models,
CVPR24(25490-25500)
IEEE DOI
2410
Visualization, Accuracy, Large language models, Natural languages,
Solids, Quality assessment
BibRef
Han, G.X.[Guang-Xing],
Lim, S.N.[Ser-Nam],
Few-Shot Object Detection with Foundation Models,
CVPR24(28608-28618)
IEEE DOI
2410
Training, Visualization, Large language models,
Computational modeling, Object detection, Benchmark testing,
Large Language Model
BibRef
Chen, Z.[Zhe],
Wu, J.N.[Jian-Nan],
Wang, W.H.[Wen-Hai],
Su, W.J.[Wei-Jie],
Chen, G.[Guo],
Xing, S.[Sen],
Zhong, M.[Muyan],
Zhang, Q.L.[Qing-Long],
Zhu, X.Z.[Xi-Zhou],
Lu, L.W.[Le-Wei],
Li, B.[Bin],
Luo, P.[Ping],
Lu, T.[Tong],
Qiao, Y.[Yu],
Dai, J.F.[Ji-Feng],
InternVL: Scaling up Vision Foundation Models and Aligning for
Generic Visual-Linguistic Tasks,
CVPR24(24185-24198)
IEEE DOI
2410
Training, Visualization, Image recognition, Large language models,
Data models, Question answering (information retrieval),
vision-language model
BibRef
Stevens, S.[Samuel],
Wu, J.[Jiaman],
Thompson, M.J.[Matthew J],
Campolongo, E.G.[Elizabeth G],
Song, C.H.[Chan Hee],
Carlyn, D.E.[David Edward],
Dong, L.[Li],
Dahdul, W.M.[Wasila M],
Stewart, C.[Charles],
Berger-Wolf, T.[Tanya],
Chao, W.L.[Wei-Lun],
Su, Y.[Yu],
BioCLIP: A Vision Foundation Model for the Tree of Life,
CVPR24(19412-19424)
IEEE DOI
2410
Fungi, Visualization, Biological system modeling, Plants (biology),
Vegetation, Data mining, machine learning, imageomics,
evolutionary biology & ecology
BibRef
Li, G.[Gen],
Sun, D.Q.[De-Qing],
Sevilla-Lara, L.[Laura],
Jampani, V.[Varun],
One-Shot Open Affordance Learning with Foundation Models,
CVPR24(3086-3096)
IEEE DOI Code:
WWW Link.
2410
Visualization, Affordances, Training data, Benchmark testing,
Data models, Foundation Models, Vision-Language Models
BibRef
Ma, Z.X.[Zi-Xian],
Hong, J.[Jerry],
Gul, M.O.[Mustafa Omer],
Gandhi, M.[Mona],
Gao, I.[Irena],
Krishna, R.[Ranjay],
CREPE: Can Vision-Language Foundation Models Reason
Compositionally?,
CVPR23(10910-10921)
IEEE DOI
2309
BibRef
Sun, X.M.[Xi-Meng],
Zhang, P.C.[Peng-Chuan],
Zhang, P.Z.[Pei-Zhao],
Shah, H.[Hardik],
Saenko, K.[Kate],
Xia, X.[Xide],
DIME-FM: DIstilling Multimodal and Efficient Foundation Models,
ICCV23(15475-15487)
IEEE DOI
2401
BibRef
Majumdar, A.[Arjun],
Ajay, A.[Anurag],
Zhang, X.H.[Xiao-Han],
Putta, P.[Pranav],
Yenamandra, S.[Sriram],
Henaff, M.[Mikael],
Silwal, S.[Sneha],
Mcvay, P.[Paul],
Maksymets, O.[Oleksandr],
Arnaud, S.[Sergio],
Yadav, K.[Karmesh],
Li, Q.[Qiyang],
Newman, B.[Ben],
Sharma, M.[Mohit],
Berges, V.[Vincent],
Zhang, S.Q.[Shi-Qi],
Agrawal, P.[Pulkit],
Bisk, Y.[Yonatan],
Batra, D.[Dhruv],
Kalakrishnan, M.[Mrinal],
Meier, F.[Franziska],
Paxton, C.[Chris],
Sax, A.[Alexander],
Rajeswaran, A.[Aravind],
OpenEQA: Embodied Question Answering in the Era of Foundation Models,
CVPR24(16488-16498)
IEEE DOI
2410
Protocols, Natural languages, Semantics, Benchmark testing,
Question answering (information retrieval),
Vision-Language Models
BibRef
Slyman, E.[Eric],
Lee, S.[Stefan],
Cohen, S.[Scott],
Kafle, K.[Kushal],
FairDeDup: Detecting and Mitigating Vision-Language Fairness
Disparities in Semantic Dataset Deduplication,
CVPR24(13905-13916)
IEEE DOI
2410
Training, Measurement, Costs, Semantics, Skin, Data models, multimodal,
fairness, vision-language, foundation models, human-centered ai, deduplication
BibRef
El Banani, M.[Mohamed],
Raj, A.[Amit],
Maninis, K.K.[Kevis-Kokitsi],
Kar, A.[Abhishek],
Li, Y.Z.[Yuan-Zhen],
Rubinstein, M.[Michael],
Sun, D.Q.[De-Qing],
Guibas, L.J.[Leonidas J.],
Johnson, J.[Justin],
Jampani, V.[Varun],
Probing the 3D Awareness of Visual Foundation Models,
CVPR24(21795-21806)
IEEE DOI Code:
WWW Link.
2410
Training, Visualization, Solid modeling, Image segmentation, Codes,
Representation Learning, 3D Vision, Foundation Models, 3D Awareness
BibRef
Taher, M.R.H.[Mohammad Reza Hosseinzadeh],
Gotway, M.B.[Michael B.],
Liang, J.M.[Jian-Ming],
Representing Part-Whole Hierarchies in Foundation Models by Learning
Localizability, Composability, and Decomposability from Anatomy via
Self-Supervision,
CVPR24(11269-11281)
IEEE DOI
2410
Deep learning, Visualization, Image coding, Semantics,
Anatomical structure, Self-supervised learning
BibRef
Hong, L.Y.[Ling-Yi],
Yan, S.L.[Shi-Lin],
Zhang, R.R.[Ren-Rui],
Li, W.Y.[Wan-Yun],
Zhou, X.Y.[Xin-Yu],
Guo, P.[Pinxue],
Jiang, K.X.[Kai-Xun],
Chen, Y.T.[Yi-Ting],
Li, J.L.[Jing-Lun],
Chen, Z.Y.[Zhao-Yu],
Zhang, W.Q.[Wen-Qiang],
OneTracker: Unifying Visual Object Tracking with Foundation Models
and Efficient Tuning,
CVPR24(19079-19091)
IEEE DOI
2410
Location awareness, Visualization, Adaptation models,
Target tracking, Benchmark testing, Multi-Modality
BibRef
Zhong, F.W.[Fang-Wei],
Wu, K.[Kui],
Ci, H.[Hai],
Wang, C.[Churan],
Chen, H.[Hao],
Empowering Embodied Visual Tracking with Visual Foundation Models and
Offline RL,
ECCV24(LXXIII: 139-155).
Springer DOI
2412
Project:
WWW Link.
BibRef
Wang, Y.[Yi],
Li, K.C.[Kun-Chang],
Li, X.H.[Xin-Hao],
Yu, J.S.[Jia-Shuo],
He, Y.[Yinan],
Chen, G.[Guo],
Pei, B.Q.[Bao-Qi],
Zheng, R.K.[Rong-Kun],
Wang, Z.[Zun],
Shi, Y.S.[Yan-Song],
Jiang, T.X.[Tian-Xiang],
Li, S.Z.[Song-Ze],
Xu, J.[Jilan],
Zhang, H.J.[Hong-Jie],
Huang, Y.F.[Yi-Fei],
Qiao, Y.[Yu],
Wang, Y.[Yali],
Wang, L.M.[Li-Min],
InternVideo2: Scaling Foundation Models for Multimodal Video
Understanding,
ECCV24(LXXXV: 396-416).
Springer DOI
2412
BibRef
Tian, Y.[Yuan],
Lu, G.[Guo],
Zhai, G.T.[Guang-Tao],
Free-VSC: Free Semantics from Visual Foundation Models for Unsupervised
Video Semantic Compression,
ECCV24(XLIX: 163-183).
Springer DOI
2412
BibRef
Tian, Y.[Yuan],
Lu, G.[Guo],
Zhai, G.T.[Guang-Tao],
Gao, Z.Y.[Zhi-Yong],
Non-Semantics Suppressed Mask Learning for Unsupervised Video
Semantic Compression,
ICCV23(13564-13576)
IEEE DOI
2401
BibRef
Zhang, C.[Chenhui],
Wang, S.[Sherrie],
Good at captioning, bad at counting: Benchmarking GPT-4V on Earth
observation data,
EarthVision24(7839-7849)
IEEE DOI
2410
Location awareness, Earth, Visualization, Satellites, Natural languages,
Training data, Land surface, foundation model, benchmark
BibRef
Gao, Z.T.[Zi-Teng],
Tong, Z.[Zhan],
Lin, K.Q.[Kevin Qinghong],
Chen, J.[Joya],
Shou, M.Z.[Mike Zheng],
Bootstrapping SparseFormers from Vision Foundation Models,
CVPR24(17710-17721)
IEEE DOI Code:
WWW Link.
2410
Training, Visualization, Accuracy, Semantic segmentation,
Transformers
BibRef
Chen, F.[Feng],
Giuffrida, M.V.[Mario Valerio],
Tsaftaris, S.A.[Sotirios A.],
Adapting Vision Foundation Models for Plant Phenotyping,
CVPPA23(604-613)
IEEE DOI
2401
BibRef
Wang, T.[Tan],
Lin, K.[Kevin],
Li, L.J.[Lin-Jie],
Lin, C.C.[Chung-Ching],
Yang, Z.Y.[Zheng-Yuan],
Zhang, H.W.[Han-Wang],
Liu, Z.C.[Zi-Cheng],
Wang, L.J.[Li-Juan],
Equivariant Similarity for Vision-Language Foundation Models,
ICCV23(11964-11974)
IEEE DOI
2401
BibRef
Ge, Y.Y.[Yu-Ying],
Macaluso, A.[Annabella],
Li, L.E.[Li Erran],
Luo, P.[Ping],
Wang, X.L.[Xiao-Long],
Policy Adaptation from Foundation Model Feedback,
CVPR23(19059-19069)
IEEE DOI
2309
BibRef
Dombrowski, M.[Mischa],
Reynaud, H.[Hadrien],
Baugh, M.[Matthew],
Kainz, B.[Bernhard],
Foreground-Background Separation through Concept Distillation from
Generative Image Foundation Models,
ICCV23(988-998)
IEEE DOI Code:
WWW Link.
2401
BibRef
Salin, E.[Emmanuelle],
Ayache, S.[Stéphane],
Favre, B.[Benoit],
Towards an Exhaustive Evaluation of Vision-Language Foundation Models,
MMFM23(339-352)
IEEE DOI
2401
BibRef
Wang, W.H.[Wen-Hai],
Dai, J.F.[Ji-Feng],
Chen, Z.[Zhe],
Huang, Z.H.[Zhen-Hang],
Li, Z.Q.[Zhi-Qi],
Zhu, X.Z.[Xi-Zhou],
Hu, X.W.[Xiao-Wei],
Lu, T.[Tong],
Lu, L.W.[Le-Wei],
Li, H.S.[Hong-Sheng],
Wang, X.G.[Xiao-Gang],
Qiao, Y.[Yu],
InternImage: Exploring Large-Scale Vision Foundation Models with
Deformable Convolutions,
CVPR23(14408-14419)
IEEE DOI
2309
BibRef
Shin, G.[Gyungin],
Xie, W.[Weidi],
Albanie, S.[Samuel],
NamedMask: Distilling Segmenters from Complementary Foundation Models,
L3D-IVU23(4961-4970)
IEEE DOI
2309
BibRef