14.5.9.6 Vision Transformers, ViT

Chapter Contents
Vision Transformers. Transformers.
A subset: See also Attention in Vision Transformers.
Shift, Scale, and Distortion Invariance. Shifted Window: See also SWIN Transformer.
Video specific: See also Video Transformers.
Semantic Segmentation: See also Vision Transformers for Semantic Segmentation.
See also Zero-Shot Learning.
See also Detection Transformer, DETR Applications.

Bazi, Y.[Yakoub], Bashmal, L.[Laila], Al Rahhal, M.M.[Mohamad M.], Al Dayil, R.[Reham], Al Ajlan, N.[Naif],
Vision Transformers for Remote Sensing Image Classification,
RS(13), No. 3, 2021, pp. xx-yy.
DOI Link 2102
BibRef

Li, T.[Tao], Zhang, Z.[Zheng], Pei, L.[Lishen], Gan, Y.[Yan],
HashFormer: Vision Transformer Based Deep Hashing for Image Retrieval,
SPLetters(29), 2022, pp. 827-831.
IEEE DOI 2204
Transformers, Binary codes, Task analysis, Training, Image retrieval, Feature extraction, Databases, Binary embedding, image retrieval BibRef

Jiang, B.[Bo], Zhao, K.K.[Kang-Kang], Tang, J.[Jin],
RGTransformer: Region-Graph Transformer for Image Representation and Few-Shot Classification,
SPLetters(29), 2022, pp. 792-796.
IEEE DOI 2204
Measurement, Transformers, Image representation, Feature extraction, Visualization, transformer BibRef

Chen, Z.M.[Zhao-Min], Cui, Q.[Quan], Zhao, B.[Borui], Song, R.J.[Ren-Jie], Zhang, X.Q.[Xiao-Qin], Yoshie, O.[Osamu],
SST: Spatial and Semantic Transformers for Multi-Label Image Recognition,
IP(31), 2022, pp. 2570-2583.
IEEE DOI 2204
Correlation, Semantics, Transformers, Image recognition, Task analysis, Training, Feature extraction, label correlation BibRef

Wang, G.H.[Guang-Hui], Li, B.[Bin], Zhang, T.[Tao], Zhang, S.[Shubi],
A Network Combining a Transformer and a Convolutional Neural Network for Remote Sensing Image Change Detection,
RS(14), No. 9, 2022, pp. xx-yy.
DOI Link 2205
BibRef

Luo, G.[Gen], Zhou, Y.[Yiyi], Sun, X.S.[Xiao-Shuai], Wang, Y.[Yan], Cao, L.J.[Liu-Juan], Wu, Y.J.[Yong-Jian], Huang, F.Y.[Fei-Yue], Ji, R.R.[Rong-Rong],
Towards Lightweight Transformer Via Group-Wise Transformation for Vision-and-Language Tasks,
IP(31), 2022, pp. 3386-3398.
IEEE DOI 2205
Transformers, Task analysis, Computational modeling, Benchmark testing, Visualization, Convolution, Head, reference expression comprehension BibRef

Wang, J.Y.[Jia-Yun], Chakraborty, R.[Rudrasis], Yu, S.X.[Stella X.],
Transformer for 3D Point Clouds,
PAMI(44), No. 8, August 2022, pp. 4419-4431.
IEEE DOI 2207
Convolution, Feature extraction, Shape, Semantics, Task analysis, Measurement, point cloud, transformation, deformable, segmentation, 3D detection BibRef

Li, Z.K.[Ze-Kun], Liu, Y.F.[Yu-Fan], Li, B.[Bing], Feng, B.L.[Bai-Lan], Wu, K.[Kebin], Peng, C.W.[Cheng-Wei], Hu, W.M.[Wei-Ming],
SDTP: Semantic-Aware Decoupled Transformer Pyramid for Dense Image Prediction,
CirSysVideo(32), No. 9, September 2022, pp. 6160-6173.
IEEE DOI 2209
Transformers, Semantics, Task analysis, Detectors, Image segmentation, Head, Convolution, Transformer, dense prediction, multi-level interaction BibRef

Wu, J.J.[Jia-Jing], Wei, Z.Q.[Zhi-Qiang], Zhang, J.P.[Jin-Peng], Zhang, Y.S.[Yu-Shi], Jia, D.N.[Dong-Ning], Yin, B.[Bo], Yu, Y.C.[Yun-Chao],
Full-Coupled Convolutional Transformer for Surface-Based Duct Refractivity Inversion,
RS(14), No. 17, 2022, pp. xx-yy.
DOI Link 2209
BibRef

Jiang, K.[Kai], Peng, P.[Peng], Lian, Y.[Youzao], Xu, W.S.[Wei-Sheng],
The encoding method of position embeddings in vision transformer,
JVCIR(89), 2022, pp. 103664.
Elsevier DOI 2212
Vision transformer, Position embeddings, Gabor filters BibRef
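
As background for the entry above: position embeddings in a ViT are simply added to the patch-token matrix before the encoder. A minimal NumPy sketch of the standard fixed sinusoidal variant (token count and width are illustrative; the cited paper's Gabor-filter scheme is not reproduced here):

```python
import numpy as np

def sinusoidal_position_embeddings(num_patches: int, dim: int) -> np.ndarray:
    """Fixed sinusoidal position embeddings, one row per patch token."""
    positions = np.arange(num_patches)[:, None]                    # (N, 1)
    freqs = np.exp(-np.log(10000.0) * np.arange(0, dim, 2) / dim)  # (dim/2,)
    angles = positions * freqs[None, :]                            # (N, dim/2)
    emb = np.zeros((num_patches, dim))
    emb[:, 0::2] = np.sin(angles)   # even channels: sine
    emb[:, 1::2] = np.cos(angles)   # odd channels: cosine
    return emb

# Patch tokens from a 224x224 image with 16x16 patches: 196 tokens.
tokens = np.random.randn(196, 64)
tokens = tokens + sinusoidal_position_embeddings(196, 64)
```

Learnable embeddings (a trained parameter of the same shape) are the other common choice; both inject position by addition, so the encoder itself stays permutation-agnostic.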

Han, K.[Kai], Wang, Y.H.[Yun-He], Chen, H.T.[Han-Ting], Chen, X.H.[Xing-Hao], Guo, J.Y.[Jian-Yuan], Liu, Z.H.[Zhen-Hua], Tang, Y.[Yehui], Xiao, A.[An], Xu, C.J.[Chun-Jing], Xu, Y.X.[Yi-Xing], Yang, Z.H.[Zhao-Hui], Zhang, Y.[Yiman], Tao, D.C.[Da-Cheng],
A Survey on Vision Transformer,
PAMI(45), No. 1, January 2023, pp. 87-110.
IEEE DOI 2212
Survey, Vision Transformer. Transformers, Task analysis, Encoding, Computational modeling, Visualization, Object detection, high-level vision, video BibRef

Hou, Q.[Qibin], Jiang, Z.H.[Zi-Hang], Yuan, L.[Li], Cheng, M.M.[Ming-Ming], Yan, S.C.[Shui-Cheng], Feng, J.S.[Jia-Shi],
Vision Permutator: A Permutable MLP-Like Architecture for Visual Recognition,
PAMI(45), No. 1, January 2023, pp. 1328-1334.
IEEE DOI 2212
Transformers, Encoding, Visualization, Convolutional codes, Mixers, Computer architecture, Training data, Vision permutator, deep neural network BibRef

Yu, W.H.[Wei-Hao], Si, C.Y.[Chen-Yang], Zhou, P.[Pan], Luo, M.[Mi], Zhou, Y.C.[Yi-Chen], Feng, J.S.[Jia-Shi], Yan, S.C.[Shui-Cheng], Wang, X.C.[Xin-Chao],
MetaFormer Baselines for Vision,
PAMI(46), No. 2, February 2024, pp. 896-912.
IEEE DOI 2401
BibRef
And: A1, A4, A3, A2, A5, A8, A6, A7:
MetaFormer is Actually What You Need for Vision,
CVPR22(10809-10819)
IEEE DOI 2210
The abstracted architecture of Transformer. Computational modeling, Focusing, Transformers, Pattern recognition, Task analysis, retrieval BibRef

Zhou, D.[Daquan], Hou, Q.[Qibin], Yang, L.J.[Lin-Jie], Jin, X.J.[Xiao-Jie], Feng, J.S.[Jia-Shi],
Token Selection is a Simple Booster for Vision Transformers,
PAMI(45), No. 11, November 2023, pp. 12738-12746.
IEEE DOI 2310
BibRef

Yuan, L.[Li], Hou, Q.[Qibin], Jiang, Z.H.[Zi-Hang], Feng, J.S.[Jia-Shi], Yan, S.C.[Shui-Cheng],
VOLO: Vision Outlooker for Visual Recognition,
PAMI(45), No. 5, May 2023, pp. 6575-6586.
IEEE DOI 2304
Transformers, Computer architecture, Computational modeling, Training, Data models, Task analysis, Visualization, image classification BibRef

Ren, S.[Sucheng], Zhou, D.[Daquan], He, S.F.[Sheng-Feng], Feng, J.S.[Jia-Shi], Wang, X.C.[Xin-Chao],
Shunted Self-Attention via Multi-Scale Token Aggregation,
CVPR22(10843-10852)
IEEE DOI 2210
Degradation, Deep learning, Costs, Computational modeling, Merging, Efficient learning and inferences BibRef

Wu, Y.H.[Yu-Huan], Liu, Y.[Yun], Zhan, X.[Xin], Cheng, M.M.[Ming-Ming],
P2T: Pyramid Pooling Transformer for Scene Understanding,
PAMI(45), No. 11, November 2023, pp. 12760-12771.
IEEE DOI 2310
BibRef

Li, Y.[Yehao], Yao, T.[Ting], Pan, Y.W.[Ying-Wei], Mei, T.[Tao],
Contextual Transformer Networks for Visual Recognition,
PAMI(45), No. 2, February 2023, pp. 1489-1500.
IEEE DOI 2301
Transformers, Convolution, Visualization, Task analysis, Image recognition, Object detection, Transformer, image recognition BibRef

Wang, H.[Hang], Du, Y.[Youtian], Zhang, Y.[Yabin], Li, S.[Shuai], Zhang, L.[Lei],
One-Stage Visual Relationship Referring With Transformers and Adaptive Message Passing,
IP(32), 2023, pp. 190-202.
IEEE DOI 2301
Visualization, Proposals, Transformers, Task analysis, Detectors, Message passing, Predictive models, gated message passing BibRef

Kim, B.[Boah], Kim, J.[Jeongsol], Ye, J.C.[Jong Chul],
Task-Agnostic Vision Transformer for Distributed Learning of Image Processing,
IP(32), 2023, pp. 203-218.
IEEE DOI 2301
Task analysis, Transformers, Servers, Distance learning, Computer aided instruction, Tail, Head, Distributed learning, task-agnostic learning BibRef

Park, S.[Sangjoon], Ye, J.C.[Jong Chul],
Multi-Task Distributed Learning Using Vision Transformer With Random Patch Permutation,
MedImg(42), No. 7, July 2023, pp. 2091-2105.
IEEE DOI 2307
Task analysis, Transformers, Head, Tail, Servers, Multitasking, Distance learning, Federated learning, split learning, privacy preservation BibRef

Kiya, H.[Hitoshi], Iijima, R.[Ryota], Maungmaung, A.[Aprilpyone], Kinoshita, Y.[Yuma],
Image and Model Transformation with Secret Key for Vision Transformer,
IEICE(E106-D), No. 1, January 2023, pp. 2-11.
WWW Link. 2301
BibRef

Zhang, H.F.[Hao-Fei], Mao, F.[Feng], Xue, M.Q.[Meng-Qi], Fang, G.F.[Gong-Fan], Feng, Z.L.[Zun-Lei], Song, J.[Jie], Song, M.L.[Ming-Li],
Knowledge Amalgamation for Object Detection With Transformers,
IP(32), 2023, pp. 2093-2106.
IEEE DOI 2304
Transformers, Task analysis, Object detection, Detectors, Training, Feature extraction, Model reusing, vision transformers BibRef

Li, Y.[Ying], Chen, K.[Kehan], Sun, S.L.[Shi-Lei], He, C.[Chu],
Multi-scale homography estimation based on dual feature aggregation transformer,
IET-IPR(17), No. 5, 2023, pp. 1403-1416.
DOI Link 2304
image matching, image registration BibRef

Wang, G.Q.[Guan-Qun], Chen, H.[He], Chen, L.[Liang], Zhuang, Y.[Yin], Zhang, S.H.[Shang-Hang], Zhang, T.[Tong], Dong, H.[Hao], Gao, P.[Peng],
P2FEViT: Plug-and-Play CNN Feature Embedded Hybrid Vision Transformer for Remote Sensing Image Classification,
RS(15), No. 7, 2023, pp. 1773.
DOI Link 2304
BibRef

Zhang, Q.M.[Qi-Ming], Xu, Y.F.[Yu-Fei], Zhang, J.[Jing], Tao, D.C.[Da-Cheng],
ViTAEv2: Vision Transformer Advanced by Exploring Inductive Bias for Image Recognition and Beyond,
IJCV(131), No. 5, May 2023, pp. 1141-1162.
Springer DOI 2305
BibRef

Zhang, J.N.[Jiang-Ning], Li, X.T.[Xiang-Tai], Wang, Y.B.[Ya-Biao], Wang, C.J.[Cheng-Jie], Yang, Y.B.[Yi-Bo], Liu, Y.[Yong], Tao, D.C.[Da-Cheng],
EATFormer: Improving Vision Transformer Inspired by Evolutionary Algorithm,
IJCV(132), No. 1, January 2024, pp. 3509-3536.
Springer DOI 2409
BibRef

Fan, X.Y.[Xin-Yi], Liu, H.J.[Hua-Jun],
FlexFormer: Flexible Transformer for efficient visual recognition,
PRL(169), 2023, pp. 95-101.
Elsevier DOI 2305
Vision transformer, Frequency analysis, Image classification BibRef

Cho, S.[Seokju], Hong, S.[Sunghwan], Kim, S.[Seungryong],
CATs++: Boosting Cost Aggregation With Convolutions and Transformers,
PAMI(45), No. 6, June 2023, pp. 7174-7194.
IEEE DOI
WWW Link. 2305
Costs, Transformers, Correlation, Semantics, Feature extraction, Task analysis, Cost aggregation, efficient transformer, semantic visual correspondence BibRef

Kim, B.J.[Bum Jun], Choi, H.[Hyeyeon], Jang, H.[Hyeonah], Lee, D.G.[Dong Gu], Jeong, W.[Wonseok], Kim, S.W.[Sang Woo],
Improved robustness of vision transformers via prelayernorm in patch embedding,
PR(141), 2023, pp. 109659.
Elsevier DOI 2306
Vision transformer, Patch embedding, Contrast enhancement, Robustness, Layer normalization, Convolutional neural network, Deep learning BibRef

Wang, Z.W.[Zi-Wei], Wang, C.Y.[Chang-Yuan], Xu, X.W.[Xiu-Wei], Zhou, J.[Jie], Lu, J.W.[Ji-Wen],
Quantformer: Learning Extremely Low-Precision Vision Transformers,
PAMI(45), No. 7, July 2023, pp. 8813-8826.
IEEE DOI 2306
Quantization (signal), Transformers, Computational modeling, Search problems, Object detection, Image color analysis, vision transformers BibRef

Sun, S.Y.[Shu-Yang], Yue, X.Y.[Xiao-Yu], Zhao, H.S.[Heng-Shuang], Torr, P.H.S.[Philip H.S.], Bai, S.[Song],
Patch-Based Separable Transformer for Visual Recognition,
PAMI(45), No. 7, July 2023, pp. 9241-9247.
IEEE DOI 2306
Task analysis, Current transformers, Visualization, Feature extraction, Convolutional neural networks, instance segmentation BibRef

Yue, X.Y.[Xiao-Yu], Sun, S.Y.[Shu-Yang], Kuang, Z.H.[Zhang-Hui], Wei, M.[Meng], Torr, P.H.S.[Philip H.S.], Zhang, W.[Wayne], Lin, D.[Dahua],
Vision Transformer with Progressive Sampling,
ICCV21(377-386)
IEEE DOI 2203
Codes, Computational modeling, Interference, Transformers, Feature extraction, Recognition and classification, Representation learning BibRef

Peng, Z.L.[Zhi-Liang], Guo, Z.H.[Zong-Hao], Huang, W.[Wei], Wang, Y.W.[Yao-Wei], Xie, L.X.[Ling-Xi], Jiao, J.B.[Jian-Bin], Tian, Q.[Qi], Ye, Q.X.[Qi-Xiang],
Conformer: Local Features Coupling Global Representations for Recognition and Detection,
PAMI(45), No. 8, August 2023, pp. 9454-9468.
IEEE DOI 2307
Transformers, Feature extraction, Couplings, Visualization, Detectors, Convolution, Object detection, Feature fusion, vision transformer BibRef

Peng, Z.L.[Zhi-Liang], Huang, W.[Wei], Gu, S.Z.[Shan-Zhi], Xie, L.X.[Ling-Xi], Wang, Y.[Yaowei], Jiao, J.B.[Jian-Bin], Ye, Q.X.[Qi-Xiang],
Conformer: Local Features Coupling Global Representations for Visual Recognition,
ICCV21(357-366)
IEEE DOI 2203
Couplings, Representation learning, Visualization, Fuses, Convolution, Object detection, Transformers, Representation learning BibRef

Feng, Z.Z.[Zhan-Zhou], Zhang, S.L.[Shi-Liang],
Efficient Vision Transformer via Token Merger,
IP(32), 2023, pp. 4156-4169.
IEEE DOI 2307
Corporate acquisitions, Transformers, Semantics, Task analysis, Visualization, Merging, Computational efficiency, sparse representation BibRef

Yang, J.H.[Jia-Hao], Li, X.Y.[Xiang-Yang], Zheng, M.[Mao], Wang, Z.H.[Zi-Han], Zhu, Y.Q.[Yong-Qing], Guo, X.Q.[Xiao-Qian], Yuan, Y.C.[Yu-Chen], Chai, Z.[Zifeng], Jiang, S.Q.[Shu-Qiang],
MemBridge: Video-Language Pre-Training With Memory-Augmented Inter-Modality Bridge,
IP(32), 2023, pp. 4073-4087.
IEEE DOI 2307
WWW Link. Bridges, Transformers, Computer architecture, Task analysis, Visualization, Feature extraction, Memory modules, memory module BibRef

Wang, D.L.[Duo-Lin], Chen, Y.[Yadang], Naz, B.[Bushra], Sun, L.[Le], Li, B.Z.[Bao-Zhu],
Spatial-Aware Transformer (SAT): Enhancing Global Modeling in Transformer Segmentation for Remote Sensing Images,
RS(15), No. 14, 2023, pp. 3607.
DOI Link 2307
BibRef

Huang, X.Y.[Xin-Yan], Liu, F.[Fang], Cui, Y.H.[Yuan-Hao], Chen, P.[Puhua], Li, L.L.[Ling-Ling], Li, P.F.[Peng-Fang],
Faster and Better: A Lightweight Transformer Network for Remote Sensing Scene Classification,
RS(15), No. 14, 2023, pp. 3645.
DOI Link 2307
BibRef

Yao, T.[Ting], Li, Y.[Yehao], Pan, Y.W.[Ying-Wei], Wang, Y.[Yu], Zhang, X.P.[Xiao-Ping], Mei, T.[Tao],
Dual Vision Transformer,
PAMI(45), No. 9, September 2023, pp. 10870-10882.
IEEE DOI 2309
BibRef

Rao, Y.M.[Yong-Ming], Liu, Z.[Zuyan], Zhao, W.L.[Wen-Liang], Zhou, J.[Jie], Lu, J.W.[Ji-Wen],
Dynamic Spatial Sparsification for Efficient Vision Transformers and Convolutional Neural Networks,
PAMI(45), No. 9, September 2023, pp. 10883-10897.
IEEE DOI 2309
BibRef

Li, J.[Jie], Liu, Z.[Zhao], Li, L.[Li], Lin, J.Q.[Jun-Qin], Yao, J.[Jian], Tu, J.[Jingmin],
Multi-view convolutional vision transformer for 3D object recognition,
JVCIR(95), 2023, pp. 103906.
Elsevier DOI 2309
Multi-view, 3D object recognition, Feature fusion, Convolutional neural networks BibRef

Shang, J.H.[Jing-Huan], Li, X.[Xiang], Kahatapitiya, K.[Kumara], Lee, Y.C.[Yu-Cheol], Ryoo, M.S.[Michael S.],
StARformer: Transformer With State-Action-Reward Representations for Robot Learning,
PAMI(45), No. 11, November 2023, pp. 12862-12877.
IEEE DOI 2310
BibRef
Earlier: A1, A3, A2, A5, Only:
StARformer: Transformer with State-Action-Reward Representations for Visual Reinforcement Learning,
ECCV22(XXIX:462-479).
Springer DOI 2211
BibRef

Duan, H.R.[Hao-Ran], Long, Y.[Yang], Wang, S.D.[Shi-Dong], Zhang, H.F.[Hao-Feng], Willcocks, C.G.[Chris G.], Shao, L.[Ling],
Dynamic Unary Convolution in Transformers,
PAMI(45), No. 11, November 2023, pp. 12747-12759.
IEEE DOI 2310
BibRef

Chen, S.M.[Shi-Ming], Hong, Z.M.[Zi-Ming], Hou, W.J.[Wen-Jin], Xie, G.S.[Guo-Sen], Song, Y.B.[Yi-Bing], Zhao, J.[Jian], You, X.G.[Xin-Ge], Yan, S.C.[Shui-Cheng], Shao, L.[Ling],
TransZero++: Cross Attribute-Guided Transformer for Zero-Shot Learning,
PAMI(45), No. 11, November 2023, pp. 12844-12861.
IEEE DOI 2310
BibRef

Qian, S.J.[Sheng-Ju], Zhu, Y.[Yi], Li, W.B.[Wen-Bo], Li, M.[Mu], Jia, J.Y.[Jia-Ya],
What Makes for Good Tokenizers in Vision Transformer?,
PAMI(45), No. 11, November 2023, pp. 13011-13023.
IEEE DOI 2310
BibRef

Sun, W.X.[Wei-Xuan], Qin, Z.[Zhen], Deng, H.[Hui], Wang, J.[Jianyuan], Zhang, Y.[Yi], Zhang, K.[Kaihao], Barnes, N.[Nick], Birchfield, S.[Stan], Kong, L.P.[Ling-Peng], Zhong, Y.[Yiran],
Vicinity Vision Transformer,
PAMI(45), No. 10, October 2023, pp. 12635-12649.
IEEE DOI 2310
BibRef

Cao, C.J.[Chen-Jie], Dong, Q.[Qiaole], Fu, Y.W.[Yan-Wei],
ZITS++: Image Inpainting by Improving the Incremental Transformer on Structural Priors,
PAMI(45), No. 10, October 2023, pp. 12667-12684.
IEEE DOI 2310
BibRef

Fang, Y.X.[Yu-Xin], Wang, X.G.[Xing-Gang], Wu, R.[Rui], Liu, W.Y.[Wen-Yu],
What Makes for Hierarchical Vision Transformer?,
PAMI(45), No. 10, October 2023, pp. 12714-12720.
IEEE DOI 2310
BibRef

Xu, P.[Peng], Zhu, X.T.[Xia-Tian], Clifton, D.A.[David A.],
Multimodal Learning With Transformers: A Survey,
PAMI(45), No. 10, October 2023, pp. 12113-12132.
IEEE DOI 2310
BibRef

Liu, J.[Jun], Guo, H.R.[Hao-Ran], He, Y.[Yile], Li, H.L.[Hua-Li],
Vision Transformer-Based Ensemble Learning for Hyperspectral Image Classification,
RS(15), No. 21, 2023, pp. 5208.
DOI Link 2311
BibRef

Lin, M.B.[Ming-Bao], Chen, M.Z.[Meng-Zhao], Zhang, Y.X.[Yu-Xin], Shen, C.H.[Chun-Hua], Ji, R.R.[Rong-Rong], Cao, L.J.[Liu-Juan],
Super Vision Transformer,
IJCV(131), No. 12, December 2023, pp. 3136-3151.
Springer DOI 2311
BibRef

Li, Z.Y.[Zhong-Yu], Gao, S.H.[Shang-Hua], Cheng, M.M.[Ming-Ming],
SERE: Exploring Feature Self-Relation for Self-Supervised Transformer,
PAMI(45), No. 12, December 2023, pp. 15619-15631.
IEEE DOI 2311
BibRef

Yuan, Y.H.[Yu-Hui], Liang, W.C.[Wei-Cong], Ding, H.H.[Heng-Hui], Liang, Z.H.[Zhan-Hao], Zhang, C.[Chao], Hu, H.[Han],
Expediting Large-Scale Vision Transformer for Dense Prediction Without Fine-Tuning,
PAMI(46), No. 1, January 2024, pp. 250-266.
IEEE DOI 2312
BibRef

Jiao, J.[Jiayu], Tang, Y.M.[Yu-Ming], Lin, K.Y.[Kun-Yu], Gao, Y.P.[Yi-Peng], Ma, A.J.[Andy J.], Wang, Y.W.[Yao-Wei], Zheng, W.S.[Wei-Shi],
DilateFormer: Multi-Scale Dilated Transformer for Visual Recognition,
MultMed(25), 2023, pp. 8906-8919.
IEEE DOI Code:
HTML Version. 2312
BibRef

Li, Z.H.[Zi-Han], Li, Y.X.[Yun-Xiang], Li, Q.D.[Qing-De], Wang, P.[Puyang], Guo, D.[Dazhou], Lu, L.[Le], Jin, D.[Dakai], Zhang, Y.[You], Hong, Q.Q.[Qing-Qi],
LViT: Language Meets Vision Transformer in Medical Image Segmentation,
MedImg(43), No. 1, January 2024, pp. 96-107.
IEEE DOI Code:
WWW Link. 2401
BibRef

Fu, K.[Kexue], Yuan, M.Z.[Ming-Zhi], Liu, S.L.[Shao-Lei], Wang, M.[Manning],
Boosting Point-BERT by Multi-Choice Tokens,
CirSysVideo(34), No. 1, January 2024, pp. 438-447.
IEEE DOI 2401
self-supervised pre-training task.
See also Point-BERT: Pre-training 3D Point Cloud Transformers with Masked Point Modeling. BibRef

Ghosal, S.S.[Soumya Suvra], Li, Y.X.[Yi-Xuan],
Are Vision Transformers Robust to Spurious Correlations?,
IJCV(132), No. 3, March 2024, pp. 689-709.
Springer DOI 2402
BibRef

Yan, F.Y.[Fang-Yuan], Yan, B.[Bin], Liang, W.[Wei], Pei, M.T.[Ming-Tao],
Token labeling-guided multi-scale medical image classification,
PRL(178), 2024, pp. 28-34.
Elsevier DOI 2402
Medical image classification, Vision transformer, Token labeling BibRef

Li, Y.X.[Yue-Xiang], Huang, Y.W.[Ya-Wen], He, N.[Nanjun], Ma, K.[Kai], Zheng, Y.F.[Ye-Feng],
Improving vision transformer for medical image classification via token-wise perturbation,
JVCIR(98), 2024, pp. 104022.
Elsevier DOI 2402
Self-supervised learning, Vision transformer, Image classification BibRef

Nguyen, H.[Hung], Kim, C.[Chanho], Li, F.[Fuxin],
Space-time recurrent memory network,
CVIU(241), 2024, pp. 103943.
Elsevier DOI 2403
Deep learning architectures and techniques, Segmentation, Memory network, Transformer BibRef

Kheldouni, A.[Amine], Boumhidi, J.[Jaouad],
A Study of Bidirectional Encoder Representations from Transformers for Sequential Recommendations,
ISCV22(1-5)
IEEE DOI 2208
Knowledge engineering, Recurrent neural networks, Predictive models, Markov processes BibRef

Chen, Z.[Ziyi], Bai, C.Y.[Chen-Yao], Zhu, Y.L.[Yun-Long], Lu, X.W.[Xi-Wen],
TUT: Template-Augmented U-Net Transformer for Unsupervised Anomaly Detection,
SPLetters(31), 2024, pp. 780-784.
IEEE DOI 2404
Image reconstruction, Decoding, Convolution, Vectors, Anomaly detection, Head, Self-supervised learning, unsupervised learning BibRef

Xiao, Q.[Qiao], Zhang, Y.[Yu], Yang, Q.[Qiang],
Selective Random Walk for Transfer Learning in Heterogeneous Label Spaces,
PAMI(46), No. 6, June 2024, pp. 4476-4488.
IEEE DOI 2405
Transfer learning, Bridges, Metalearning, Adaptation models, Training, Task analysis, Transfer learning, selective random walk BibRef

Zhang, J.S.[Jin-Song], Gu, L.F.[Ling-Feng], Lai, Y.K.[Yu-Kun], Wang, X.Y.[Xue-Yang], Li, K.[Kun],
Toward Grouping in Large Scenes With Occlusion-Aware Spatio-Temporal Transformers,
CirSysVideo(34), No. 5, May 2024, pp. 3919-3929.
IEEE DOI 2405
Feature extraction, Trajectory, Transformers, Task analysis, Data mining, Video sequences, Group detection, large-scale scenes, spatio-temporal transformers BibRef

Akkaya, I.B.[Ibrahim Batuhan], Kathiresan, S.S.[Senthilkumar S.], Arani, E.[Elahe], Zonooz, B.[Bahram],
Enhancing performance of vision transformers on small datasets through local inductive bias incorporation,
PR(153), 2024, pp. 110510.
Elsevier DOI Code:
WWW Link. 2405
Vision transformer, Inductive bias, Locality, Small dataset BibRef

Yao, T.[Ting], Li, Y.[Yehao], Pan, Y.W.[Ying-Wei], Mei, T.[Tao],
HIRI-ViT: Scaling Vision Transformer With High Resolution Inputs,
PAMI(46), No. 9, September 2024, pp. 6431-6442.
IEEE DOI 2408
Transformers, Convolution, Convolutional neural networks, Computational efficiency, Spatial resolution, Visualization, vision transformer BibRef

Lu, J.C.[Jia-Chen], Zhang, J.G.[Jun-Ge], Zhu, X.T.[Xia-Tian], Feng, J.F.[Jian-Feng], Xiang, T.[Tao], Zhang, L.[Li],
Softmax-Free Linear Transformers,
IJCV(132), No. 8, August 2024, pp. 3355-3374.
Springer DOI Code:
WWW Link. 2408
Approximate the self-attention by a linear function. BibRef
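
As background for the annotation above: linear (softmax-free) attention factorizes the attention product so the sequence-length cost drops from quadratic to linear. A minimal NumPy sketch using an elu-based positive feature map (a common generic choice, not necessarily the formulation of the cited paper):

```python
import numpy as np

def linear_attention(Q, K, V, eps=1e-6):
    """Softmax-free attention: phi(Q) (phi(K)^T V), computed in O(N d^2)
    instead of the O(N^2 d) of standard softmax attention.
    phi(x) = elu(x) + 1 keeps all attention weights positive."""
    phi = lambda x: np.where(x > 0, x + 1.0, np.exp(x))
    Qp, Kp = phi(Q), phi(K)
    kv = Kp.T @ V                      # (d, d_v) summary, independent of N
    norm = Qp @ Kp.sum(axis=0) + eps   # per-query normalizer, shape (N,)
    return (Qp @ kv) / norm[:, None]

N, d = 196, 32
Q, K, V = (np.random.randn(N, d) for _ in range(3))
out = linear_attention(Q, K, V)        # (196, 32)
```

Because `kv` and the normalizer are accumulated once over keys, the token count N never appears in a quadratic term; each output row is still a convex combination of the value rows.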

Xu, G.Y.[Guang-Yi], Ye, J.Y.[Jun-Yong], Liu, X.Y.[Xin-Yuan], Wen, X.B.[Xu-Bin], Li, Y.[Youwei], Wang, J.J.[Jing-Jing],
Lv-Adapter: Adapting Vision Transformers for Visual Classification with Linear-layers and Vectors,
CVIU(246), 2024, pp. 104049.
Elsevier DOI 2408
Deep learning, Vision Transformers, Fine-tuning, Plug and play, Transfer learning BibRef

Li, C.H.[Cheng-Hao], Zhang, C.N.[Chao-Ning],
Toward a deeper understanding: RetNet viewed through Convolution,
PR(155), 2024, pp. 110625.
Elsevier DOI Code:
WWW Link. 2408
Boost local response of ViT. Convolutional neural network, Vision transformer, RetNet BibRef
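
As background for the annotation above: a cheap way to strengthen the local response of ViT tokens is a depthwise (per-channel) convolution over the patch grid. A minimal NumPy sketch using a 3x3 averaging kernel (grid and kernel sizes are illustrative, not taken from the cited paper):

```python
import numpy as np

def depthwise_conv_tokens(tokens, grid=(14, 14), kernel=3):
    """Per-channel kxk average over the patch-token grid, a simple stand-in
    for the depthwise convolutions used to add locality to ViTs."""
    h, w = grid
    n, c = tokens.shape
    x = tokens.reshape(h, w, c)        # restore 2D spatial layout
    pad = kernel // 2
    xp = np.pad(x, ((pad, pad), (pad, pad), (0, 0)), mode="edge")
    out = np.zeros_like(x)
    for i in range(kernel):            # sum the kxk shifted windows
        for j in range(kernel):
            out += xp[i:i + h, j:j + w, :]
    return (out / (kernel * kernel)).reshape(n, c)

tokens = np.random.randn(196, 64)      # 14x14 patch grid, 64 channels
local = depthwise_conv_tokens(tokens)  # same shape, locally smoothed
```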

Yan, L.Q.[Long-Quan], Yan, R.X.[Rui-Xiang], Chai, B.[Bosong], Geng, G.H.[Guo-Hua], Zhou, P.[Pengbo], Gao, J.[Jian],
DM-GAN: CNN hybrid vits for training GANs under limited data,
PR(156), 2024, pp. 110810.
Elsevier DOI 2408
GAN, Few-shot, Vision transformer, Proprietary artifact image BibRef

Feng, Q.H.[Qi-Hua], Li, P.Y.[Pei-Ya], Lu, Z.X.[Zhi-Xun], Li, C.Z.[Chao-Zhuo], Wang, Z.[Zefan], Liu, Z.Q.[Zhi-Quan], Duan, C.H.[Chun-Hui], Huang, F.[Feiran], Weng, J.[Jian], Yu, P.S.[Philip S.],
EViT: Privacy-Preserving Image Retrieval via Encrypted Vision Transformer in Cloud Computing,
CirSysVideo(34), No. 8, August 2024, pp. 7467-7483.
IEEE DOI Code:
WWW Link. 2408
Feature extraction, Encryption, Codes, Cloud computing, Transform coding, Streaming media, Ciphers, Image retrieval, self-supervised learning BibRef

Liao, H.X.[Hui-Xian], Li, X.[Xiaosen], Qin, X.[Xiao], Wang, W.J.[Wen-Ji], He, G.[Guodui], Huang, H.J.[Hao-Jie], Guo, X.[Xu], Chun, X.[Xin], Zhang, J.[Jinyong], Fu, Y.Q.[Yun-Qin], Qin, Z.Y.[Zheng-You],
EPSViTs: A hybrid architecture for image classification based on parameter-shared multi-head self-attention,
IVC(149), 2024, pp. 105130.
Elsevier DOI 2408
Image classification, Multi-head self-attention, Parameter-shared, Hybrid architecture BibRef

Yang, F.F.[Fei-Fan], Chen, G.[Gang], Duan, J.S.[Jian-Shu],
Skip-Encoder and Skip-Decoder for Detection Transformer in Optical Remote Sensing,
RS(16), No. 16, 2024, pp. 2884.
DOI Link 2408
BibRef

Naeem, M.F.[Muhammad Ferjad], Xian, Y.Q.[Yong-Qin], Van Gool, L.J.[Luc J.], Tombari, F.[Federico],
I2DFormer+: Learning Image to Document Summary Attention for Zero-Shot Image Classification,
IJCV(132), No. 1, January 2024, pp. 3806-3822.
Springer DOI 2409
BibRef

Naeem, M.F.[Muhammad Ferjad], Khan, M.G.Z.A.[Muhammad Gul Zain Ali], Xian, Y.Q.[Yong-Qin], Afzal, M.Z.[Muhammad Zeshan], Stricker, D.[Didier], Van Gool, L.J.[Luc J.], Tombari, F.[Federico],
I2MVFormer: Large Language Model Generated Multi-View Document Supervision for Zero-Shot Image Classification,
CVPR23(15169-15179)
IEEE DOI 2309
BibRef

Kim, S.[Sunpil], Yoon, G.J.[Gang-Joon], Song, J.[Jinjoo], Yoon, S.M.[Sang Min],
Simultaneous image patch attention and pruning for patch selective transformer,
IVC(150), 2024, pp. 105239.
Elsevier DOI 2409
Patch pruning, Patch emphasis, Attentive patch selection, Vision transformer BibRef

Wang, H.Y.[Hong-Yu], Ma, S.M.[Shu-Ming], Dong, L.[Li], Huang, S.[Shaohan], Zhang, D.D.[Dong-Dong], Wei, F.[Furu],
DeepNet: Scaling Transformers to 1,000 Layers,
PAMI(46), No. 10, October 2024, pp. 6761-6774.
IEEE DOI 2409
Transformers, Training, Optimization, Stability analysis, Machine translation, Decoding, Computational modeling, Big models, transformers BibRef

Edalati, A.[Ali], Hameed, M.G.A.[Marawan Gamal Abdel], Mosleh, A.[Ali],
Generalized Kronecker-based Adapters for Parameter-efficient Fine-tuning of Vision Transformers,
CRV23(97-104)
IEEE DOI 2406
Adaptation models, Tensors, Limiting, Computational modeling, Transformers, Convolutional neural networks BibRef

Herzig, R.[Roei], Abramovich, O.[Ofir], Ben Avraham, E.[Elad], Arbelle, A.[Assaf], Karlinsky, L.[Leonid], Shamir, A.[Ariel], Darrell, T.J.[Trevor J.], Globerson, A.[Amir],
PromptonomyViT: Multi-Task Prompt Learning Improves Video Transformers using Synthetic Scene Data,
WACV24(6789-6801)
IEEE DOI Code:
WWW Link. 2404
Graphics, Solid modeling, Annotations, Transformers, Multitasking, Task analysis, Algorithms, Video recognition and understanding, Image recognition and understanding BibRef

Marouf, I.E.[Imad Eddine], Tartaglione, E.[Enzo], Lathuilière, S.[Stéphane],
Mini but Mighty: Finetuning ViTs with Mini Adapters,
WACV24(1721-1730)
IEEE DOI 2404
Training, Costs, Neurons, Transfer learning, Estimation, Computer architecture, Algorithms BibRef

Kim, G.[Gihyun], Kim, J.[Juyeop], Lee, J.S.[Jong-Seok],
Exploring Adversarial Robustness of Vision Transformers in the Spectral Perspective,
WACV24(3964-3973)
IEEE DOI 2404
Deep learning, Perturbation methods, Frequency-domain analysis, Linearity, Transformers, Robustness, High frequency, Algorithms, adversarial attack and defense methods BibRef

Xu, X.[Xuwei], Wang, S.[Sen], Chen, Y.D.[Yu-Dong], Zheng, Y.P.[Yan-Ping], Wei, Z.W.[Zhe-Wei], Liu, J.J.[Jia-Jun],
GTP-ViT: Efficient Vision Transformers via Graph-based Token Propagation,
WACV24(86-95)
IEEE DOI Code:
WWW Link. 2404
Source coding, Computational modeling, Merging, Broadcasting, Transformers, Computational complexity, Algorithms BibRef

Han, Q.[Qiu], Zhang, G.J.[Gong-Jie], Huang, J.X.[Jia-Xing], Gao, P.[Peng], Wei, Z.[Zhang], Lu, S.J.[Shi-Jian],
Efficient MAE towards Large-Scale Vision Transformers,
WACV24(595-604)
IEEE DOI 2404
Measurement, Degradation, Visualization, Runtime, Computational modeling, Transformers, Algorithms BibRef

Park, J.W.[Jong-Woo], Kahatapitiya, K.[Kumara], Kim, D.H.[Dong-Hyun], Sudalairaj, S.[Shivchander], Fan, Q.F.[Quan-Fu], Ryoo, M.S.[Michael S.],
Grafting Vision Transformers,
WACV24(1134-1143)
IEEE DOI Code:
WWW Link. 2404
Codes, Computational modeling, Semantics, Information sharing, Computer architecture, Transformers, Algorithms, Image recognition and understanding BibRef

Shimizu, S.[Shuki], Tamaki, T.[Toru],
Joint learning of images and videos with a single Vision Transformer,
MVA23(1-6)
DOI Link 2403
Training, Image recognition, Machine vision, Transformers, Tuning, Videos BibRef

Li, K.C.[Kun-Chang], Wang, Y.[Yali], Li, Y.Z.[Yi-Zhuo], Wang, Y.[Yi], He, Y.[Yinan], Wang, L.M.[Li-Min], Qiao, Y.[Yu],
Unmasked Teacher: Towards Training-Efficient Video Foundation Models,
ICCV23(19891-19903)
IEEE DOI 2401
BibRef

Ding, S.R.[Shuang-Rui], Zhao, P.S.[Pei-Sen], Zhang, X.P.[Xiao-Peng], Qian, R.[Rui], Xiong, H.K.[Hong-Kai], Tian, Q.[Qi],
Prune Spatio-temporal Tokens by Semantic-aware Temporal Accumulation,
ICCV23(16899-16910)
IEEE DOI Code:
WWW Link. 2401
BibRef

Chen, M.Z.[Meng-Zhao], Lin, M.[Mingbao], Lin, Z.H.[Zhi-Hang], Zhang, Y.X.[Yu-Xin], Chao, F.[Fei], Ji, R.R.[Rong-Rong],
SMMix: Self-Motivated Image Mixing for Vision Transformers,
ICCV23(17214-17224)
IEEE DOI Code:
WWW Link. 2401
BibRef

Kim, D.[Dahun], Angelova, A.[Anelia], Kuo, W.C.[Wei-Cheng],
Contrastive Feature Masking Open-Vocabulary Vision Transformer,
ICCV23(15556-15566)
IEEE DOI 2401
BibRef

Zhang, Y.[Yuke], Chen, D.[Dake], Kundu, S.[Souvik], Li, C.H.[Cheng-Hao], Beerel, P.A.[Peter A.],
SAL-ViT: Towards Latency Efficient Private Inference on ViT using Selective Attention Search with a Learnable Softmax Approximation,
ICCV23(5093-5102)
IEEE DOI 2401
BibRef

Li, Z.K.[Zhi-Kai], Gu, Q.Y.[Qing-Yi],
I-ViT: Integer-only Quantization for Efficient Vision Transformer Inference,
ICCV23(17019-17029)
IEEE DOI Code:
WWW Link. 2401
BibRef

Frumkin, N.[Natalia], Gope, D.[Dibakar], Marculescu, D.[Diana],
Jumping through Local Minima: Quantization in the Loss Landscape of Vision Transformers,
ICCV23(16932-16942)
IEEE DOI Code:
WWW Link. 2401
BibRef

Li, Z.K.[Zhi-Kai], Xiao, J.R.[Jun-Rui], Yang, L.W.[Lian-Wei], Gu, Q.Y.[Qing-Yi],
RepQ-ViT: Scale Reparameterization for Post-Training Quantization of Vision Transformers,
ICCV23(17181-17190)
IEEE DOI Code:
WWW Link. 2401
BibRef

Havtorn, J.D.[Jakob Drachmann], Royer, A.[Amélie], Blankevoort, T.[Tijmen], Bejnordi, B.E.[Babak Ehteshami],
MSViT: Dynamic Mixed-scale Tokenization for Vision Transformers,
NIVT23(838-848)
IEEE DOI 2401
BibRef

Haurum, J.B.[Joakim Bruslund], Escalera, S.[Sergio], Taylor, G.W.[Graham W.], Moeslund, T.B.[Thomas B.],
Which Tokens to Use? Investigating Token Reduction in Vision Transformers,
NIVT23(773-783)
IEEE DOI Code:
WWW Link. 2401
BibRef
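
As background for the token-reduction question investigated above: a widely used heuristic is to rank patch tokens by the attention they receive from the [CLS] token and keep only the top fraction. A minimal NumPy sketch (function name and keep ratio are illustrative, not from the cited paper):

```python
import numpy as np

def prune_tokens_by_cls_attention(tokens, attn_cls, keep_ratio=0.5):
    """Keep the top-k patch tokens ranked by [CLS]->patch attention.
    tokens: (1 + N, C) with the [CLS] token in row 0; attn_cls: (N,)."""
    n_patches = tokens.shape[0] - 1
    k = max(1, int(n_patches * keep_ratio))
    order = np.argsort(attn_cls)[::-1][:k]            # most-attended patches
    keep = np.concatenate(([0], 1 + np.sort(order)))  # always keep [CLS]
    return tokens[keep]

tokens = np.random.randn(197, 64)   # [CLS] + 196 patch tokens
attn_cls = np.random.rand(196)      # [CLS]->patch attention weights
pruned = prune_tokens_by_cls_attention(tokens, attn_cls, keep_ratio=0.5)
```

Pruning after a few encoder blocks (once attention maps are informative) shrinks the quadratic attention cost of every later block.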

Wang, X.[Xijun], Chu, X.J.[Xiao-Jie], Han, C.[Chunrui], Zhang, X.Y.[Xiang-Yu],
SCSC: Spatial Cross-scale Convolution Module to Strengthen both CNNs and Transformers,
NIVT23(731-741)
IEEE DOI 2401
BibRef

Chen, Y.H.[Yi-Hsin], Weng, Y.C.[Ying-Chieh], Kao, C.H.[Chia-Hao], Chien, C.[Cheng], Chiu, W.C.[Wei-Chen], Peng, W.H.[Wen-Hsiao],
TransTIC: Transferring Transformer-based Image Compression from Human Perception to Machine Perception,
ICCV23(23240-23250)
IEEE DOI 2401
BibRef

Li, Y.[Yanyu], Hu, J.[Ju], Wen, Y.[Yang], Evangelidis, G.[Georgios], Salahi, K.[Kamyar], Wang, Y.Z.[Yan-Zhi], Tulyakov, S.[Sergey], Ren, J.[Jian],
Rethinking Vision Transformers for MobileNet Size and Speed,
ICCV23(16843-16854)
IEEE DOI 2401
BibRef

Nurgazin, M.[Maxat], Tu, N.A.[Nguyen Anh],
A Comparative Study of Vision Transformer Encoders and Few-shot Learning for Medical Image Classification,
CVAMD23(2505-2513)
IEEE DOI 2401
BibRef

Yeganeh, Y.[Yousef], Farshad, A.[Azade], Weinberger, P.[Peter], Ahmadi, S.A.[Seyed-Ahmad], Adeli, E.[Ehsan], Navab, N.[Nassir],
Transformers Pay Attention to Convolutions Leveraging Emerging Properties of ViTs by Dual Attention-Image Network,
CVAMD23(2296-2307)
IEEE DOI 2401
BibRef

Zheng, J.H.[Jia-Hao], Yang, L.Q.[Long-Qi], Li, Y.[Yiying], Yang, K.[Ke], Wang, Z.Y.[Zhi-Yuan], Zhou, J.[Jun],
Lightweight Vision Transformer with Spatial and Channel Enhanced Self-Attention,
REDLCV23(1484-1488)
IEEE DOI 2401
BibRef

Xie, W.[Wei], Zhao, Z.[Zimeng], Li, S.Y.[Shi-Ying], Zuo, B.H.[Bing-Hui], Wang, Y.G.[Yan-Gang],
Nonrigid Object Contact Estimation With Regional Unwrapping Transformer,
ICCV23(9308-9317)
IEEE DOI 2401
BibRef

Vasu, P.K.A.[Pavan Kumar Anasosalu], Gabriel, J.[James], Zhu, J.[Jeff], Tuzel, O.[Oncel], Ranjan, A.[Anurag],
FastViT: A Fast Hybrid Vision Transformer using Structural Reparameterization,
ICCV23(5762-5772)
IEEE DOI Code:
WWW Link. 2401
BibRef

Hyeon-Woo, N.[Nam], Yu-Ji, K.[Kim], Heo, B.[Byeongho], Han, D.Y.[Dong-Yoon], Oh, S.J.[Seong Joon], Oh, T.H.[Tae-Hyun],
Scratching Visual Transformer's Back with Uniform Attention,
ICCV23(5784-5795)
IEEE DOI 2401
BibRef

Tang, C.[Chen], Zhang, L.L.[Li Lyna], Jiang, H.Q.[Hui-Qiang], Xu, J.H.[Jia-Hang], Cao, T.[Ting], Zhang, Q.[Quanlu], Yang, Y.Q.[Yu-Qing], Wang, Z.[Zhi], Yang, M.[Mao],
ElasticViT: Conflict-aware Supernet Training for Deploying Fast Vision Transformer on Diverse Mobile Devices,
ICCV23(5806-5817)
IEEE DOI 2401
BibRef

Ren, S.[Sucheng], Yang, X.Y.[Xing-Yi], Liu, S.[Songhua], Wang, X.C.[Xin-Chao],
SG-Former: Self-guided Transformer with Evolving Token Reallocation,
ICCV23(5980-5991)
IEEE DOI Code:
WWW Link. 2401
BibRef

Lin, W.F.[Wei-Feng], Wu, Z.H.[Zi-Heng], Chen, J.[Jiayu], Huang, J.[Jun], Jin, L.W.[Lian-Wen],
Scale-Aware Modulation Meet Transformer,
ICCV23(5992-6003)
IEEE DOI Code:
WWW Link. 2401
BibRef

Zhang, H.K.[Hao-Kui], Hu, W.Z.[Wen-Ze], Wang, X.Y.[Xiao-Yu],
FcaFormer: Forward Cross Attention in Hybrid Vision Transformer,
ICCV23(6037-6046)
IEEE DOI Code:
WWW Link. 2401
BibRef

He, Y.F.[Ye-Fei], Lou, Z.Y.[Zhen-Yu], Zhang, L.[Luoming], Liu, J.[Jing], Wu, W.J.[Wei-Jia], Zhou, H.[Hong], Zhuang, B.[Bohan],
BiViT: Extremely Compressed Binary Vision Transformers,
ICCV23(5628-5640)
IEEE DOI 2401
BibRef

Dutson, M.[Matthew], Li, Y.[Yin], Gupta, M.[Mohit],
Eventful Transformers: Leveraging Temporal Redundancy in Vision Transformers,
ICCV23(16865-16877)
IEEE DOI 2401
BibRef

Wang, Z.Q.[Zi-Qing], Fang, Y.T.[Yue-Tong], Cao, J.H.[Jia-Hang], Zhang, Q.[Qiang], Wang, Z.[Zhongrui], Xu, R.[Renjing],
Masked Spiking Transformer,
ICCV23(1761-1771)
IEEE DOI Code:
WWW Link. 2401
BibRef

Peebles, W.[William], Xie, S.[Saining],
Scalable Diffusion Models with Transformers,
ICCV23(4172-4182)
IEEE DOI 2401
BibRef

Zeng, W.X.[Wen-Xuan], Li, M.[Meng], Xiong, W.J.[Wen-Jie], Tong, T.[Tong], Lu, W.J.[Wen-Jie], Tan, J.[Jin], Wang, R.S.[Run-Sheng], Huang, R.[Ru],
MPCViT: Searching for Accurate and Efficient MPC-Friendly Vision Transformer with Heterogeneous Attention,
ICCV23(5029-5040)
IEEE DOI Code:
WWW Link. 2401
BibRef

Mentzer, F.[Fabian], Agustsson, E.[Eirikur], Tschannen, M.[Michael],
M2T: Masking Transformers Twice for Faster Decoding,
ICCV23(5317-5326)
IEEE DOI 2401
BibRef

Psomas, B.[Bill], Kakogeorgiou, I.[Ioannis], Karantzalos, K.[Konstantinos], Avrithis, Y.[Yannis],
Keep It SimPool: Who Said Supervised Transformers Suffer from Attention Deficit?,
ICCV23(5327-5337)
IEEE DOI Code:
WWW Link. 2401
BibRef

Xiao, H.[Han], Zheng, W.Z.[Wen-Zhao], Zhu, Z.[Zheng], Zhou, J.[Jie], Lu, J.W.[Ji-Wen],
Token-Label Alignment for Vision Transformers,
ICCV23(5472-5481)
IEEE DOI Code:
WWW Link. 2401
BibRef

Yu, R.Y.[Run-Yi], Wang, Z.N.[Zhen-Nan], Wang, Y.H.[Yin-Huai], Li, K.[Kehan], Liu, C.[Chang], Duan, H.[Haoyi], Ji, X.Y.[Xiang-Yang], Chen, J.[Jie],
LaPE: Layer-adaptive Position Embedding for Vision Transformers with Independent Layer Normalization,
ICCV23(5863-5873)
IEEE DOI 2401
BibRef

Roy, A.[Anurag], Verma, V.K.[Vinay K.], Voonna, S.[Sravan], Ghosh, K.[Kripabandhu], Ghosh, S.[Saptarshi], Das, A.[Abir],
Exemplar-Free Continual Transformer with Convolutions,
ICCV23(5874-5884)
IEEE DOI 2401
BibRef

Xu, Y.X.[Yi-Xing], Li, C.[Chao], Li, D.[Dong], Sheng, X.[Xiao], Jiang, F.[Fan], Tian, L.[Lu], Sirasao, A.[Ashish],
FDViT: Improve the Hierarchical Architecture of Vision Transformer,
ICCV23(5927-5937)
IEEE DOI 2401
BibRef

Han, D.C.[Dong-Chen], Pan, X.[Xuran], Han, Y.Z.[Yi-Zeng], Song, S.[Shiji], Huang, G.[Gao],
FLatten Transformer: Vision Transformer using Focused Linear Attention,
ICCV23(5938-5948)
IEEE DOI Code:
WWW Link. 2401
BibRef

Chen, Y.J.[Yong-Jie], Liu, H.M.[Hong-Min], Yin, H.R.[Hao-Ran], Fan, B.[Bin],
Building Vision Transformers with Hierarchy Aware Feature Aggregation,
ICCV23(5885-5895)
IEEE DOI 2401
BibRef

Quétu, V.[Victor], Milovanovic, M.[Marta], Tartaglione, E.[Enzo],
Sparse Double Descent in Vision Transformers: Real or Phantom Threat?,
CIAP23(II:490-502).
Springer DOI 2312
BibRef

Ak, K.E.[Kenan Emir], Lee, G.G.[Gwang-Gook], Xu, Y.[Yan], Shen, M.W.[Ming-Wei],
Leveraging Efficient Training and Feature Fusion in Transformers for Multimodal Classification,
ICIP23(1420-1424)
IEEE DOI 2312
BibRef

Popovic, N.[Nikola], Paudel, D.P.[Danda Pani], Probst, T.[Thomas], Van Gool, L.J.[Luc J.],
Token-Consistent Dropout For Calibrated Vision Transformers,
ICIP23(1030-1034)
IEEE DOI 2312
BibRef

Sajjadi, M.S.M.[Mehdi S. M.], Mahendran, A.[Aravindh], Kipf, T.[Thomas], Pot, E.[Etienne], Duckworth, D.[Daniel], Lucic, M.[Mario], Greff, K.[Klaus],
RUST: Latent Neural Scene Representations from Unposed Imagery,
CVPR23(17297-17306)
IEEE DOI 2309
BibRef

Bowman, B.[Benjamin], Achille, A.[Alessandro], Zancato, L.[Luca], Trager, M.[Matthew], Perera, P.[Pramuditha], Paolini, G.[Giovanni], Soatto, S.[Stefano],
À-la-carte Prompt Tuning (APT): Combining Distinct Data Via Composable Prompting,
CVPR23(14984-14993)
IEEE DOI 2309
BibRef

Nakhli, R.[Ramin], Moghadam, P.A.[Puria Azadi], Mi, H.Y.[Hao-Yang], Farahani, H.[Hossein], Baras, A.[Alexander], Gilks, B.[Blake], Bashashati, A.[Ali],
Sparse Multi-Modal Graph Transformer with Shared-Context Processing for Representation Learning of Giga-pixel Images,
CVPR23(11547-11557)
IEEE DOI 2309
BibRef

Gärtner, E.[Erik], Metz, L.[Luke], Andriluka, M.[Mykhaylo], Freeman, C.D.[C. Daniel], Sminchisescu, C.[Cristian],
Transformer-Based Learned Optimization,
CVPR23(11970-11979)
IEEE DOI 2309
BibRef

Li, J.C.[Jia-Chen], Hassani, A.[Ali], Walton, S.[Steven], Shi, H.[Humphrey],
ConvMLP: Hierarchical Convolutional MLPs for Vision,
WFM23(6307-6316)
IEEE DOI 2309
multi-layer perceptron BibRef

Walmer, M.[Matthew], Suri, S.[Saksham], Gupta, K.[Kamal], Shrivastava, A.[Abhinav],
Teaching Matters: Investigating the Role of Supervision in Vision Transformers,
CVPR23(7486-7496)
IEEE DOI 2309
BibRef

Wang, S.G.[Shi-Guang], Xie, T.[Tao], Cheng, J.[Jian], Zhang, X.C.[Xing-Cheng], Liu, H.J.[Hai-Jun],
MDL-NAS: A Joint Multi-domain Learning Framework for Vision Transformer,
CVPR23(20094-20104)
IEEE DOI 2309
BibRef

Ko, D.[Dohwan], Choi, J.[Joonmyung], Choi, H.K.[Hyeong Kyu], On, K.W.[Kyoung-Woon], Roh, B.[Byungseok], Kim, H.W.J.[Hyun-Woo J.],
MELTR: Meta Loss Transformer for Learning to Fine-tune Video Foundation Models,
CVPR23(20105-20115)
IEEE DOI 2309
BibRef

Ren, S.[Sucheng], Wei, F.Y.[Fang-Yun], Zhang, Z.[Zheng], Hu, H.[Han],
TinyMIM: An Empirical Study of Distilling MIM Pre-trained Models,
CVPR23(3687-3697)
IEEE DOI 2309
BibRef

He, J.F.[Jian-Feng], Gao, Y.[Yuan], Zhang, T.Z.[Tian-Zhu], Zhang, Z.[Zhe], Wu, F.[Feng],
D2Former: Jointly Learning Hierarchical Detectors and Contextual Descriptors via Agent-Based Transformers,
CVPR23(2904-2914)
IEEE DOI 2309
BibRef

Chen, X.Y.[Xuan-Yao], Liu, Z.J.[Zhi-Jian], Tang, H.T.[Hao-Tian], Yi, L.[Li], Zhao, H.[Hang], Han, S.[Song],
SparseViT: Revisiting Activation Sparsity for Efficient High-Resolution Vision Transformer,
CVPR23(2061-2070)
IEEE DOI 2309
BibRef

Wei, S.Y.[Si-Yuan], Ye, T.Z.[Tian-Zhu], Zhang, S.[Shen], Tang, Y.[Yao], Liang, J.J.[Jia-Jun],
Joint Token Pruning and Squeezing Towards More Aggressive Compression of Vision Transformers,
CVPR23(2092-2101)
IEEE DOI 2309
BibRef

Lin, Y.B.[Yan-Bo], Sung, Y.L.[Yi-Lin], Lei, J.[Jie], Bansal, M.[Mohit], Bertasius, G.[Gedas],
Vision Transformers are Parameter-Efficient Audio-Visual Learners,
CVPR23(2299-2309)
IEEE DOI 2309
BibRef

Das, R.[Rajshekhar], Dukler, Y.[Yonatan], Ravichandran, A.[Avinash], Swaminathan, A.[Ashwin],
Learning Expressive Prompting With Residuals for Vision Transformers,
CVPR23(3366-3377)
IEEE DOI 2309
BibRef

Zheng, M.X.[Meng-Xin], Lou, Q.[Qian], Jiang, L.[Lei],
TrojViT: Trojan Insertion in Vision Transformers,
CVPR23(4025-4034)
IEEE DOI 2309
BibRef

Guo, Y.[Yong], Stutz, D.[David], Schiele, B.[Bernt],
Improving Robustness of Vision Transformers by Reducing Sensitivity to Patch Corruptions,
CVPR23(4108-4118)
IEEE DOI 2309
BibRef

Li, Y.X.[Yan-Xi], Xu, C.[Chang],
Trade-off between Robustness and Accuracy of Vision Transformers,
CVPR23(7558-7568)
IEEE DOI 2309
BibRef

Tarasiou, M.[Michail], Chavez, E.[Erik], Zafeiriou, S.[Stefanos],
ViTs for SITS: Vision Transformers for Satellite Image Time Series,
CVPR23(10418-10428)
IEEE DOI 2309
BibRef

Yu, Z.Z.[Zhong-Zhi], Wu, S.[Shang], Fu, Y.G.[Yong-Gan], Zhang, S.[Shunyao], Lin, Y.Y.C.[Ying-Yan Celine],
Hint-Aug: Drawing Hints from Foundation Vision Transformers towards Boosted Few-shot Parameter-Efficient Tuning,
CVPR23(11102-11112)
IEEE DOI 2309
BibRef

Kim, D.[Dahun], Angelova, A.[Anelia], Kuo, W.C.[Wei-Cheng],
Region-Aware Pretraining for Open-Vocabulary Object Detection with Vision Transformers,
CVPR23(11144-11154)
IEEE DOI 2309
BibRef

Hou, J.[Ji], Dai, X.L.[Xiao-Liang], He, Z.J.[Zi-Jian], Dai, A.[Angela], Nießner, M.[Matthias],
Mask3D: Pretraining 2D Vision Transformers by Learning Masked 3D Priors,
CVPR23(13510-13519)
IEEE DOI 2309
BibRef

Xu, Z.Z.[Zheng-Zhuo], Liu, R.K.[Rui-Kang], Yang, S.[Shuo], Chai, Z.H.[Zeng-Hao], Yuan, C.[Chun],
Learning Imbalanced Data with Vision Transformers,
CVPR23(15793-15803)
IEEE DOI 2309
BibRef

Zhang, J.P.[Jian-Ping], Huang, Y.Z.[Yi-Zhan], Wu, W.B.[Wei-Bin], Lyu, M.R.[Michael R.],
Transferable Adversarial Attacks on Vision Transformers with Token Gradient Regularization,
CVPR23(16415-16424)
IEEE DOI 2309
BibRef

Yang, H.[Huanrui], Yin, H.X.[Hong-Xu], Shen, M.[Maying], Molchanov, P.[Pavlo], Li, H.[Hai], Kautz, J.[Jan],
Global Vision Transformer Pruning with Hessian-Aware Saliency,
CVPR23(18547-18557)
IEEE DOI 2309
BibRef

Nakamura, R.[Ryo], Kataoka, H.[Hirokatsu], Takashima, S.[Sora], Noriega, E.J.M.[Edgar Josafat Martinez], Yokota, R.[Rio], Inoue, N.[Nakamasa],
Pre-training Vision Transformers with Very Limited Synthesized Images,
ICCV23(20303-20312)
IEEE DOI 2401
BibRef

Takashima, S.[Sora], Hayamizu, R.[Ryo], Inoue, N.[Nakamasa], Kataoka, H.[Hirokatsu], Yokota, R.[Rio],
Visual Atoms: Pre-Training Vision Transformers with Sinusoidal Waves,
CVPR23(18579-18588)
IEEE DOI 2309
BibRef

Kang, D.[Dahyun], Koniusz, P.[Piotr], Cho, M.[Minsu], Murray, N.[Naila],
Distilling Self-Supervised Vision Transformers for Weakly-Supervised Few-Shot Classification and Segmentation,
CVPR23(19627-19638)
IEEE DOI 2309
BibRef

Liu, Y.J.[Yi-Jiang], Yang, H.R.[Huan-Rui], Dong, Z.[Zhen], Keutzer, K.[Kurt], Du, L.[Li], Zhang, S.H.[Shang-Hang],
NoisyQuant: Noisy Bias-Enhanced Post-Training Activation Quantization for Vision Transformers,
CVPR23(20321-20330)
IEEE DOI 2309
BibRef

Park, J.[Jeongsoo], Johnson, J.[Justin],
RGB No More: Minimally-Decoded JPEG Vision Transformers,
CVPR23(22334-22346)
IEEE DOI 2309
BibRef

Yu, C.[Chong], Chen, T.[Tao], Gan, Z.X.[Zhong-Xue], Fan, J.Y.[Jia-Yuan],
Boost Vision Transformer with GPU-Friendly Sparsity and Quantization,
CVPR23(22658-22668)
IEEE DOI 2309
BibRef

Bao, F.[Fan], Nie, S.[Shen], Xue, K.W.[Kai-Wen], Cao, Y.[Yue], Li, C.X.[Chong-Xuan], Su, H.[Hang], Zhu, J.[Jun],
All are Worth Words: A ViT Backbone for Diffusion Models,
CVPR23(22669-22679)
IEEE DOI 2309
BibRef

Li, B.[Bonan], Hu, Y.[Yinhan], Nie, X.C.[Xue-Cheng], Han, C.Y.[Cong-Ying], Jiang, X.J.[Xiang-Jian], Guo, T.D.[Tian-De], Liu, L.Q.[Luo-Qi],
DropKey for Vision Transformer,
CVPR23(22700-22709)
IEEE DOI 2309
BibRef

Lan, S.Y.[Shi-Yi], Yang, X.[Xitong], Yu, Z.[Zhiding], Wu, Z.[Zuxuan], Alvarez, J.M.[Jose M.], Anandkumar, A.[Anima],
Vision Transformers are Good Mask Auto-Labelers,
CVPR23(23745-23755)
IEEE DOI 2309
BibRef

Yu, L.[Lu], Xiang, W.[Wei],
X-Pruner: eXplainable Pruning for Vision Transformers,
CVPR23(24355-24363)
IEEE DOI 2309
BibRef

Singh, A.[Apoorv],
Training Strategies for Vision Transformers for Object Detection,
WAD23(110-118)
IEEE DOI 2309
BibRef

Hukkelås, H.[Håkon], Lindseth, F.[Frank],
Does Image Anonymization Impact Computer Vision Training?,
WAD23(140-150)
IEEE DOI 2309
BibRef

Marnissi, M.A.[Mohamed Amine],
Revolutionizing Thermal Imaging: GAN-Based Vision Transformers for Image Enhancement,
ICIP23(2735-2739)
IEEE DOI 2312
BibRef

Marnissi, M.A.[Mohamed Amine], Fathallah, A.[Abir],
GAN-based Vision Transformer for High-Quality Thermal Image Enhancement,
GCV23(817-825)
IEEE DOI 2309
BibRef

Scheibenreif, L.[Linus], Mommert, M.[Michael], Borth, D.[Damian],
Masked Vision Transformers for Hyperspectral Image Classification,
EarthVision23(2166-2176)
IEEE DOI 2309
BibRef

Komorowski, P.[Piotr], Baniecki, H.[Hubert], Biecek, P.[Przemyslaw],
Towards Evaluating Explanations of Vision Transformers for Medical Imaging,
XAI4CV23(3726-3732)
IEEE DOI 2309
BibRef

Nalmpantis, A.[Angelos], Panagiotopoulos, A.[Apostolos], Gkountouras, J.[John], Papakostas, K.[Konstantinos], Aziz, W.[Wilker],
Vision DiffMask: Faithful Interpretation of Vision Transformers with Differentiable Patch Masking,
XAI4CV23(3756-3763)
IEEE DOI 2309
BibRef

Ronen, T.[Tomer], Levy, O.[Omer], Golbert, A.[Avram],
Vision Transformers with Mixed-Resolution Tokenization,
ECV23(4613-4622)
IEEE DOI 2309
BibRef

Le, P.H.C.[Phuoc-Hoan Charles], Li, X.[Xinlin],
BinaryViT: Pushing Binary Vision Transformers Towards Convolutional Models,
ECV23(4665-4674)
IEEE DOI 2309
BibRef

Ma, D.[Dongning], Zhao, P.F.[Peng-Fei], Jiao, X.[Xun],
PerfHD: Efficient ViT Architecture Performance Ranking using Hyperdimensional Computing,
NAS23(2230-2237)
IEEE DOI 2309
BibRef

Wang, J.[Jun], Alamayreh, O.[Omran], Tondi, B.[Benedetta], Barni, M.[Mauro],
Open Set Classification of GAN-based Image Manipulations via a ViT-based Hybrid Architecture,
WMF23(953-962)
IEEE DOI 2309
BibRef

Tian, R.[Rui], Wu, Z.[Zuxuan], Dai, Q.[Qi], Hu, H.[Han], Qiao, Y.[Yu], Jiang, Y.G.[Yu-Gang],
ResFormer: Scaling ViTs with Multi-Resolution Training,
CVPR23(22721-22731)
IEEE DOI 2309
BibRef

Li, Y.[Yi], Min, K.[Kyle], Tripathi, S.[Subarna], Vasconcelos, N.M.[Nuno M.],
SViTT: Temporal Learning of Sparse Video-Text Transformers,
CVPR23(18919-18929)
IEEE DOI 2309
BibRef

Beyer, L.[Lucas], Izmailov, P.[Pavel], Kolesnikov, A.[Alexander], Caron, M.[Mathilde], Kornblith, S.[Simon], Zhai, X.H.[Xiao-Hua], Minderer, M.[Matthias], Tschannen, M.[Michael], Alabdulmohsin, I.[Ibrahim], Pavetic, F.[Filip],
FlexiViT: One Model for All Patch Sizes,
CVPR23(14496-14506)
IEEE DOI 2309
BibRef

Chang, S.N.[Shu-Ning], Wang, P.[Pichao], Lin, M.[Ming], Wang, F.[Fan], Zhang, D.J.H.[David Jun-Hao], Jin, R.[Rong], Shou, M.Z.[Mike Zheng],
Making Vision Transformers Efficient from A Token Sparsification View,
CVPR23(6195-6205)
IEEE DOI 2309
BibRef

Phan, L.[Lam], Nguyen, H.T.H.[Hiep Thi Hong], Warrier, H.[Harikrishna], Gupta, Y.[Yogesh],
Patch Embedding as Local Features: Unifying Deep Local and Global Features via Vision Transformer for Image Retrieval,
ACCV22(II:204-221).
Springer DOI 2307
BibRef

Guo, X.D.[Xin-Dong], Sun, Y.[Yu], Zhao, R.[Rong], Kuang, L.Q.[Li-Qun], Han, X.[Xie],
SWPT: Spherical Window-based Point Cloud Transformer,
ACCV22(I:396-412).
Springer DOI 2307
BibRef

Wang, W.J.[Wen-Ju], Chen, G.[Gang], Zhou, H.R.[Hao-Ran], Wang, X.L.[Xiao-Lin],
OVPT: Optimal Viewset Pooling Transformer for 3d Object Recognition,
ACCV22(I:486-503).
Springer DOI 2307
BibRef

Kim, D.[Daeho], Kim, J.[Jaeil],
Vision Transformer Compression and Architecture Exploration with Efficient Embedding Space Search,
ACCV22(III:524-540).
Springer DOI 2307
BibRef

Lee, Y.S.[Yun-Sung], Lee, G.[Gyuseong], Ryoo, K.[Kwangrok], Go, H.[Hyojun], Park, J.[Jihye], Kim, S.[Seungryong],
Towards Flexible Inductive Bias via Progressive Reparameterization Scheduling,
VIPriors22(706-720).
Springer DOI 2304
Transformers vs. CNN different benefits. Best of both. BibRef

Amir, S.[Shir], Gandelsman, Y.[Yossi], Bagon, S.[Shai], Dekel, T.[Tali],
On the Effectiveness of VIT Features as Local Semantic Descriptors,
SelfLearn22(39-55).
Springer DOI 2304
BibRef

Deng, X.[Xuran], Liu, C.B.[Chuan-Bin], Lu, Z.Y.[Zhi-Ying],
Recombining Vision Transformer Architecture for Fine-grained Visual Categorization,
MMMod23(II: 127-138).
Springer DOI 2304
BibRef

Tonkes, V.[Vincent], Sabatelli, M.[Matthia],
How Well Do Vision Transformers (VTs) Transfer to the Non-natural Image Domain? An Empirical Study Involving Art Classification,
VisArt22(234-250).
Springer DOI 2304
BibRef

Rangrej, S.B.[Samrudhdhi B.], Liang, K.J.[Kevin J.], Hassner, T.[Tal], Clark, J.J.[James J.],
GliTr: Glimpse Transformers with Spatiotemporal Consistency for Online Action Prediction,
WACV23(3402-3412)
IEEE DOI 2302
Predictive models, Transformers, Cameras, Spatiotemporal phenomena, Sensors, Observability BibRef

Liu, Y.[Yue], Matsoukas, C.[Christos], Strand, F.[Fredrik], Azizpour, H.[Hossein], Smith, K.[Kevin],
PatchDropout: Economizing Vision Transformers Using Patch Dropout,
WACV23(3942-3951)
IEEE DOI 2302
Training, Image resolution, Computational modeling, Biological system modeling, Memory management, Transformers, Biomedical/healthcare/medicine BibRef

Song, C.H.[Chull Hwan], Yoon, J.Y.[Joo-Young], Choi, S.[Shunghyun], Avrithis, Y.[Yannis],
Boosting vision transformers for image retrieval,
WACV23(107-117)
IEEE DOI 2302
Training, Location awareness, Image retrieval, Self-supervised learning, Image representation, Transformers BibRef

Yang, J.[Jinyu], Liu, J.J.[Jing-Jing], Xu, N.[Ning], Huang, J.Z.[Jun-Zhou],
TVT: Transferable Vision Transformer for Unsupervised Domain Adaptation,
WACV23(520-530)
IEEE DOI 2302
Benchmark testing, Image representation, Transformers, Convolutional neural networks, Task analysis, and algorithms (including transfer) BibRef

Saavedra-Ruiz, M.[Miguel], Morin, S.[Sacha], Paull, L.[Liam],
Monocular Robot Navigation with Self-Supervised Pretrained Vision Transformers,
CRV22(197-204)
IEEE DOI 2301
Adaptation models, Image segmentation, Image resolution, Navigation, Transformers, Robot sensing systems, Visual Servoing BibRef

Patel, K.[Krushi], Bur, A.M.[Andrés M.], Li, F.J.[Feng-Jun], Wang, G.H.[Guang-Hui],
Aggregating Global Features into Local Vision Transformer,
ICPR22(1141-1147)
IEEE DOI 2212
Source coding, Computational modeling, Information processing, Performance gain, Transformers BibRef

Shen, Z.Q.[Zhi-Qiang], Liu, Z.[Zechun], Xing, E.[Eric],
Sliced Recursive Transformer,
ECCV22(XXIV:727-744).
Springer DOI 2211
BibRef

Shao, Y.[Yidi], Loy, C.C.[Chen Change], Dai, B.[Bo],
Transformer with Implicit Edges for Particle-Based Physics Simulation,
ECCV22(XIX:549-564).
Springer DOI 2211
BibRef

Wang, W.[Wen], Zhang, J.[Jing], Cao, Y.[Yang], Shen, Y.L.[Yong-Liang], Tao, D.C.[Da-Cheng],
Towards Data-Efficient Detection Transformers,
ECCV22(IX:88-105).
Springer DOI 2211
BibRef

Lorenzana, M.B.[Marlon Bran], Engstrom, C.[Craig], Chandra, S.S.[Shekhar S.],
Transformer Compressed Sensing Via Global Image Tokens,
ICIP22(3011-3015)
IEEE DOI 2211
Training, Limiting, Image resolution, Neural networks, Image representation, Transformers, MRI BibRef

Lu, X.Y.[Xiao-Yong], Du, S.[Songlin],
NCTR: Neighborhood Consensus Transformer for Feature Matching,
ICIP22(2726-2730)
IEEE DOI 2211
Learning systems, Impedance matching, Aggregates, Pose estimation, Neural networks, Transformers, Local feature matching, graph neural network BibRef

Jeny, A.A.[Afsana Ahsan], Junayed, M.S.[Masum Shah], Islam, M.B.[Md Baharul],
An Efficient End-To-End Image Compression Transformer,
ICIP22(1786-1790)
IEEE DOI 2211
Image coding, Correlation, Limiting, Computational modeling, Rate-distortion, Video compression, Transformers, entropy model BibRef

Bai, J.W.[Jia-Wang], Yuan, L.[Li], Xia, S.T.[Shu-Tao], Yan, S.C.[Shui-Cheng], Li, Z.F.[Zhi-Feng], Liu, W.[Wei],
Improving Vision Transformers by Revisiting High-Frequency Components,
ECCV22(XXIV:1-18).
Springer DOI 2211
BibRef

Li, K.[Kehan], Yu, R.[Runyi], Wang, Z.[Zhennan], Yuan, L.[Li], Song, G.[Guoli], Chen, J.[Jie],
Locality Guidance for Improving Vision Transformers on Tiny Datasets,
ECCV22(XXIV:110-127).
Springer DOI 2211
BibRef

Tu, Z.Z.[Zheng-Zhong], Talebi, H.[Hossein], Zhang, H.[Han], Yang, F.[Feng], Milanfar, P.[Peyman], Bovik, A.C.[Alan C.], Li, Y.[Yinxiao],
MaxViT: Multi-axis Vision Transformer,
ECCV22(XXIV:459-479).
Springer DOI 2211
BibRef

Yang, R.[Rui], Ma, H.L.[Hai-Long], Wu, J.[Jie], Tang, Y.S.[Yan-Song], Xiao, X.F.[Xue-Feng], Zheng, M.[Min], Li, X.[Xiu],
ScalableViT: Rethinking the Context-Oriented Generalization of Vision Transformer,
ECCV22(XXIV:480-496).
Springer DOI 2211
BibRef

Touvron, H.[Hugo], Cord, M.[Matthieu], El-Nouby, A.[Alaaeldin], Verbeek, J.[Jakob], Jégou, H.[Hervé],
Three Things Everyone Should Know About Vision Transformers,
ECCV22(XXIV:497-515).
Springer DOI 2211
BibRef

Touvron, H.[Hugo], Cord, M.[Matthieu], Jégou, H.[Hervé],
DeiT III: Revenge of the ViT,
ECCV22(XXIV:516-533).
Springer DOI 2211
BibRef

Li, Y.H.[Yang-Hao], Mao, H.Z.[Han-Zi], Girshick, R.[Ross], He, K.M.[Kai-Ming],
Exploring Plain Vision Transformer Backbones for Object Detection,
ECCV22(IX:280-296).
Springer DOI 2211
BibRef

Yu, Q.H.[Qi-Hang], Wang, H.Y.[Hui-Yu], Qiao, S.Y.[Si-Yuan], Collins, M.[Maxwell], Zhu, Y.K.[Yu-Kun], Adam, H.[Hartwig], Yuille, A.L.[Alan L.], Chen, L.C.[Liang-Chieh],
k-means Mask Transformer,
ECCV22(XXIX:288-307).
Springer DOI 2211
BibRef

Pham, K.[Khoi], Kafle, K.[Kushal], Lin, Z.[Zhe], Ding, Z.H.[Zhi-Hong], Cohen, S.[Scott], Tran, Q.[Quan], Shrivastava, A.[Abhinav],
Improving Closed and Open-Vocabulary Attribute Prediction Using Transformers,
ECCV22(XXV:201-219).
Springer DOI 2211
BibRef

Yu, W.X.[Wen-Xin], Zhang, H.[Hongru], Lan, T.X.[Tian-Xiang], Hu, Y.C.[Yu-Cheng], Yin, D.[Dong],
CBPT: A New Backbone for Enhancing Information Transmission of Vision Transformers,
ICIP22(156-160)
IEEE DOI 2211
Merging, Information processing, Object detection, Transformers, Computational complexity, Vision Transformer, Backbone BibRef

Takeda, M.[Mana], Yanai, K.[Keiji],
Continual Learning in Vision Transformer,
ICIP22(616-620)
IEEE DOI 2211
Learning systems, Image recognition, Transformers, Natural language processing, Convolutional neural networks, Vision Transformer BibRef

Zhou, W.L.[Wei-Lian], Kamata, S.I.[Sei-Ichiro], Luo, Z.[Zhengbo], Xue, X.[Xi],
Rethinking Unified Spectral-Spatial-Based Hyperspectral Image Classification Under 3D Configuration of Vision Transformer,
ICIP22(711-715)
IEEE DOI 2211
Flowcharts, Correlation, Convolution, Transformers, Hyperspectral image classification, 3D coordinate positional embedding BibRef

Li, J.[Junbo], Zhang, H.[Huan], Xie, C.[Cihang],
ViP: Unified Certified Detection and Recovery for Patch Attack with Vision Transformers,
ECCV22(XXV:573-587).
Springer DOI 2211
BibRef

Cao, Y.H.[Yun-Hao], Yu, H.[Hao], Wu, J.X.[Jian-Xin],
Training Vision Transformers with only 2040 Images,
ECCV22(XXV:220-237).
Springer DOI 2211
BibRef

Wang, C.[Cong], Xu, H.M.[Hong-Min], Zhang, X.[Xiong], Wang, L.[Li], Zheng, Z.[Zhitong], Liu, H.F.[Hai-Feng],
Convolutional Embedding Makes Hierarchical Vision Transformer Stronger,
ECCV22(XX:739-756).
Springer DOI 2211
BibRef

Wu, B.[Boxi], Gu, J.D.[Jin-Dong], Li, Z.F.[Zhi-Feng], Cai, D.[Deng], He, X.F.[Xiao-Fei], Liu, W.[Wei],
Towards Efficient Adversarial Training on Vision Transformers,
ECCV22(XIII:307-325).
Springer DOI 2211
BibRef

Gu, J.D.[Jin-Dong], Tresp, V.[Volker], Qin, Y.[Yao],
Are Vision Transformers Robust to Patch Perturbations?,
ECCV22(XII:404-421).
Springer DOI 2211
BibRef

Zong, Z.[Zhuofan], Li, K.[Kunchang], Song, G.[Guanglu], Wang, Y.[Yali], Qiao, Y.[Yu], Leng, B.[Biao], Liu, Y.[Yu],
Self-slimmed Vision Transformer,
ECCV22(XI:432-448).
Springer DOI 2211
BibRef

Fayyaz, M.[Mohsen], Koohpayegani, S.A.[Soroush Abbasi], Jafari, F.R.[Farnoush Rezaei], Sengupta, S.[Sunando], Joze, H.R.V.[Hamid Reza Vaezi], Sommerlade, E.[Eric], Pirsiavash, H.[Hamed], Gall, J.[Jürgen],
Adaptive Token Sampling for Efficient Vision Transformers,
ECCV22(XI:396-414).
Springer DOI 2211
BibRef

Li, Z.K.[Zhi-Kai], Ma, L.P.[Li-Ping], Chen, M.J.[Meng-Juan], Xiao, J.R.[Jun-Rui], Gu, Q.Y.[Qing-Yi],
Patch Similarity Aware Data-Free Quantization for Vision Transformers,
ECCV22(XI:154-170).
Springer DOI 2211
BibRef

Weng, Z.J.[Ze-Jia], Yang, X.T.[Xi-Tong], Li, A.[Ang], Wu, Z.X.[Zu-Xuan], Jiang, Y.G.[Yu-Gang],
Semi-supervised Vision Transformers,
ECCV22(XXX:605-620).
Springer DOI 2211
BibRef

Su, T.[Tong], Ye, S.[Shuo], Song, C.Q.[Cheng-Qun], Cheng, J.[Jun],
Mask-ViT: An Object Mask Embedding in Vision Transformer for Fine-Grained Visual Classification,
ICIP22(1626-1630)
IEEE DOI 2211
Knowledge engineering, Visualization, Focusing, Interference, Benchmark testing, Transformers, Feature extraction, Knowledge Embedding BibRef

Gai, L.[Lulu], Chen, W.[Wei], Gao, R.[Rui], Chen, Y.W.[Yan-Wei], Qiao, X.[Xu],
Using Vision Transformers in 3-D Medical Image Classifications,
ICIP22(696-700)
IEEE DOI 2211
Deep learning, Training, Visualization, Transfer learning, Optimization methods, Self-supervised learning, Transformers, 3-D medical image classifications BibRef

Wu, K.[Kan], Zhang, J.[Jinnian], Peng, H.[Houwen], Liu, M.C.[Meng-Chen], Xiao, B.[Bin], Fu, J.L.[Jian-Long], Yuan, L.[Lu],
TinyViT: Fast Pretraining Distillation for Small Vision Transformers,
ECCV22(XXI:68-85).
Springer DOI 2211
BibRef

Gao, L.[Li], Nie, D.[Dong], Li, B.[Bo], Ren, X.F.[Xiao-Feng],
Doubly-Fused ViT: Fuse Information from Vision Transformer Doubly with Local Representation,
ECCV22(XXIII:744-761).
Springer DOI 2211
BibRef

Yao, T.[Ting], Pan, Y.W.[Ying-Wei], Li, Y.[Yehao], Ngo, C.W.[Chong-Wah], Mei, T.[Tao],
Wave-ViT: Unifying Wavelet and Transformers for Visual Representation Learning,
ECCV22(XXV:328-345).
Springer DOI 2211
BibRef

Yuan, Z.H.[Zhi-Hang], Xue, C.H.[Chen-Hao], Chen, Y.Q.[Yi-Qi], Wu, Q.[Qiang], Sun, G.Y.[Guang-Yu],
PTQ4ViT: Post-training Quantization for Vision Transformers with Twin Uniform Quantization,
ECCV22(XII:191-207).
Springer DOI 2211
BibRef

Kong, Z.L.[Zheng-Lun], Dong, P.Y.[Pei-Yan], Ma, X.L.[Xiao-Long], Meng, X.[Xin], Niu, W.[Wei], Sun, M.S.[Meng-Shu], Shen, X.[Xuan], Yuan, G.[Geng], Ren, B.[Bin], Tang, H.[Hao], Qin, M.H.[Ming-Hai], Wang, Y.Z.[Yan-Zhi],
SPViT: Enabling Faster Vision Transformers via Latency-Aware Soft Token Pruning,
ECCV22(XI:620-640).
Springer DOI 2211
BibRef

Pan, J.T.[Jun-Ting], Bulat, A.[Adrian], Tan, F.[Fuwen], Zhu, X.T.[Xia-Tian], Dudziak, L.[Lukasz], Li, H.S.[Hong-Sheng], Tzimiropoulos, G.[Georgios], Martinez, B.[Brais],
EdgeViTs: Competing Light-Weight CNNs on Mobile Devices with Vision Transformers,
ECCV22(XI:294-311).
Springer DOI 2211
BibRef

Xiang, H.[Hao], Xu, R.S.[Run-Sheng], Ma, J.Q.[Jia-Qi],
HM-ViT: Hetero-modal Vehicle-to-Vehicle Cooperative Perception with Vision Transformer,
ICCV23(284-295)
IEEE DOI Code:
WWW Link. 2401
BibRef

Xu, R.S.[Run-Sheng], Xiang, H.[Hao], Tu, Z.Z.[Zheng-Zhong], Xia, X.[Xin], Yang, M.H.[Ming-Hsuan], Ma, J.Q.[Jia-Qi],
V2X-ViT: Vehicle-to-Everything Cooperative Perception with Vision Transformer,
ECCV22(XXIX:107-124).
Springer DOI 2211
BibRef

Liu, Y.[Yong], Mai, S.Q.[Si-Qi], Chen, X.N.[Xiang-Ning], Hsieh, C.J.[Cho-Jui], You, Y.[Yang],
Towards Efficient and Scalable Sharpness-Aware Minimization,
CVPR22(12350-12360)
IEEE DOI Code:
WWW Link. 2210
Training, Schedules, Scalability, Perturbation methods, Stochastic processes, Transformers, Minimization, Vision applications and systems BibRef

Ren, P.Z.[Peng-Zhen], Li, C.[Changlin], Wang, G.[Guangrun], Xiao, Y.[Yun], Du, Q.[Qing], Liang, X.D.[Xiao-Dan], Chang, X.J.[Xiao-Jun],
Beyond Fixation: Dynamic Window Visual Transformer,
CVPR22(11977-11987)
IEEE DOI 2210
Performance evaluation, Visualization, Systematics, Computational modeling, Scalability, Transformers, Deep learning architectures and techniques BibRef

Bhattacharjee, D.[Deblina], Zhang, T.[Tong], Süsstrunk, S.[Sabine], Salzmann, M.[Mathieu],
MulT: An End-to-End Multitask Learning Transformer,
CVPR22(12021-12031)
IEEE DOI 2210
Heart, Image segmentation, Computational modeling, Image edge detection, Semantics, Estimation, Predictive models, Scene analysis and understanding BibRef

Fang, J.[Jiemin], Xie, L.X.[Ling-Xi], Wang, X.G.[Xing-Gang], Zhang, X.P.[Xiao-Peng], Liu, W.Y.[Wen-Yu], Tian, Q.[Qi],
MSG-Transformer: Exchanging Local Spatial Information by Manipulating Messenger Tokens,
CVPR22(12053-12062)
IEEE DOI 2210
Deep learning, Visualization, Neural networks, Graphics processing units, retrieval BibRef

Sandler, M.[Mark], Zhmoginov, A.[Andrey], Vladymyrov, M.[Max], Jackson, A.[Andrew],
Fine-tuning Image Transformers using Learnable Memory,
CVPR22(12145-12154)
IEEE DOI 2210
Deep learning, Adaptation models, Costs, Computational modeling, Memory management, Transformers, Transfer/low-shot/long-tail learning BibRef

Yu, X.[Xumin], Tang, L.[Lulu], Rao, Y.M.[Yong-Ming], Huang, T.J.[Tie-Jun], Zhou, J.[Jie], Lu, J.W.[Ji-Wen],
Point-BERT: Pre-training 3D Point Cloud Transformers with Masked Point Modeling,
CVPR22(19291-19300)
IEEE DOI 2210
Point cloud compression, Solid modeling, Computational modeling, Bit error rate, Transformers, Pattern recognition, Deep learning architectures and techniques BibRef

Park, C.[Chunghyun], Jeong, Y.[Yoonwoo], Cho, M.[Minsu], Park, J.[Jaesik],
Fast Point Transformer,
CVPR22(16928-16937)
IEEE DOI 2210
Point cloud compression, Shape, Semantics, Neural networks, Transformers, grouping and shape analysis BibRef

Zeng, W.[Wang], Jin, S.[Sheng], Liu, W.T.[Wen-Tao], Qian, C.[Chen], Luo, P.[Ping], Ouyang, W.L.[Wan-Li], Wang, X.G.[Xiao-Gang],
Not All Tokens Are Equal: Human-centric Visual Analysis via Token Clustering Transformer,
CVPR22(11091-11101)
IEEE DOI 2210
Visualization, Shape, Pose estimation, Semantics, Pose estimation and tracking, Deep learning architectures and techniques BibRef

Tu, Z.Z.[Zheng-Zhong], Talebi, H.[Hossein], Zhang, H.[Han], Yang, F.[Feng], Milanfar, P.[Peyman], Bovik, A.[Alan], Li, Y.X.[Yin-Xiao],
MAXIM: Multi-Axis MLP for Image Processing,
CVPR22(5759-5770)
IEEE DOI Code:
WWW Link. 2210
Training, Photography, Adaptation models, Visualization, Computational modeling, Transformers, Low-level vision, Computational photography BibRef

Yun, S.[Sukmin], Lee, H.[Hankook], Kim, J.[Jaehyung], Shin, J.[Jinwoo],
Patch-level Representation Learning for Self-supervised Vision Transformers,
CVPR22(8344-8353)
IEEE DOI 2210
Training, Representation learning, Visualization, Neural networks, Object detection, Self-supervised learning, Transformers, Self- semi- meta- unsupervised learning BibRef

Hou, Z.J.[Ze-Jiang], Kung, S.Y.[Sun-Yuan],
Multi-Dimensional Vision Transformer Compression via Dependency Guided Gaussian Process Search,
EVW22(3668-3677)
IEEE DOI 2210
Adaptation models, Image coding, Head, Computational modeling, Neurons, Gaussian processes, Transformers BibRef

Salman, H.[Hadi], Jain, S.[Saachi], Wong, E.[Eric], Madry, A.[Aleksander],
Certified Patch Robustness via Smoothed Vision Transformers,
CVPR22(15116-15126)
IEEE DOI 2210
Visualization, Smoothing methods, Costs, Computational modeling, Transformers, Adversarial attack and defense BibRef

Wang, Y.K.[Yi-Kai], Chen, X.H.[Xing-Hao], Cao, L.[Lele], Huang, W.B.[Wen-Bing], Sun, F.C.[Fu-Chun], Wang, Y.H.[Yun-He],
Multimodal Token Fusion for Vision Transformers,
CVPR22(12176-12185)
IEEE DOI 2210
Point cloud compression, Image segmentation, Shape, Semantics, Object detection, Vision+X BibRef

Tang, Y.[Yehui], Han, K.[Kai], Wang, Y.H.[Yun-He], Xu, C.[Chang], Guo, J.Y.[Jian-Yuan], Xu, C.[Chao], Tao, D.C.[Da-Cheng],
Patch Slimming for Efficient Vision Transformers,
CVPR22(12155-12164)
IEEE DOI 2210
Visualization, Quantization (signal), Computational modeling, Aggregates, Benchmark testing, Representation learning BibRef

Zhang, J.[Jinnian], Peng, H.[Houwen], Wu, K.[Kan], Liu, M.C.[Meng-Chen], Xiao, B.[Bin], Fu, J.L.[Jian-Long], Yuan, L.[Lu],
MiniViT: Compressing Vision Transformers with Weight Multiplexing,
CVPR22(12135-12144)
IEEE DOI 2210
Multiplexing, Performance evaluation, Image coding, Codes, Computational modeling, Benchmark testing, Vision applications and systems BibRef

Chen, T.L.[Tian-Long], Zhang, Z.Y.[Zhen-Yu], Cheng, Y.[Yu], Awadallah, A.[Ahmed], Wang, Z.Y.[Zhang-Yang],
The Principle of Diversity: Training Stronger Vision Transformers Calls for Reducing All Levels of Redundancy,
CVPR22(12010-12020)
IEEE DOI 2210
Training, Convolutional codes, Deep learning, Computational modeling, Redundancy, Deep learning architectures and techniques BibRef

Yin, H.X.[Hong-Xu], Vahdat, A.[Arash], Alvarez, J.M.[Jose M.], Mallya, A.[Arun], Kautz, J.[Jan], Molchanov, P.[Pavlo],
A-ViT: Adaptive Tokens for Efficient Vision Transformer,
CVPR22(10799-10808)
IEEE DOI 2210
Training, Adaptive systems, Network architecture, Transformers, Throughput, Hardware, Complexity theory, Efficient learning and inferences BibRef

Lu, J.H.[Jia-Hao], Zhang, X.S.[Xi Sheryl], Zhao, T.L.[Tian-Li], He, X.Y.[Xiang-Yu], Cheng, J.[Jian],
APRIL: Finding the Achilles' Heel on Privacy for Vision Transformers,
CVPR22(10041-10050)
IEEE DOI 2210
Privacy, Data privacy, Federated learning, Computational modeling, Training data, Transformers, Market research, Privacy and federated learning BibRef

Hatamizadeh, A.[Ali], Yin, H.X.[Hong-Xu], Roth, H.[Holger], Li, W.Q.[Wen-Qi], Kautz, J.[Jan], Xu, D.[Daguang], Molchanov, P.[Pavlo],
GradViT: Gradient Inversion of Vision Transformers,
CVPR22(10011-10020)
IEEE DOI 2210
Measurement, Differential privacy, Neural networks, Transformers, Pattern recognition, Security, Iterative methods, Privacy and federated learning BibRef

Zhang, H.F.[Hao-Fei], Duan, J.R.[Jia-Rui], Xue, M.Q.[Meng-Qi], Song, J.[Jie], Sun, L.[Li], Song, M.L.[Ming-Li],
Bootstrapping ViTs: Towards Liberating Vision Transformers from Pre-training,
CVPR22(8934-8943)
IEEE DOI 2210
Training, Upper bound, Neural networks, Training data, Network architecture, Transformers, Computer vision theory, Efficient learning and inferences BibRef

Chavan, A.[Arnav], Shen, Z.Q.[Zhi-Qiang], Liu, Z.[Zhuang], Liu, Z.[Zechun], Cheng, K.T.[Kwang-Ting], Xing, E.[Eric],
Vision Transformer Slimming: Multi-Dimension Searching in Continuous Optimization Space,
CVPR22(4921-4931)
IEEE DOI 2210
Training, Performance evaluation, Image coding, Force, Graphics processing units, Vision applications and systems BibRef

Chen, Z.Y.[Zhao-Yu], Li, B.[Bo], Wu, S.[Shuang], Xu, J.H.[Jiang-He], Ding, S.H.[Shou-Hong], Zhang, W.Q.[Wen-Qiang],
Shape Matters: Deformable Patch Attack,
ECCV22(IV:529-548).
Springer DOI 2211
BibRef

Chen, Z.Y.[Zhao-Yu], Li, B.[Bo], Xu, J.H.[Jiang-He], Wu, S.[Shuang], Ding, S.H.[Shou-Hong], Zhang, W.Q.[Wen-Qiang],
Towards Practical Certifiable Patch Defense with Vision Transformer,
CVPR22(15127-15137)
IEEE DOI 2210
Smoothing methods, Toy manufacturing industry, Semantics, Network architecture, Transformers, Robustness, Adversarial attack and defense BibRef

Chen, R.J.[Richard J.], Chen, C.[Chengkuan], Li, Y.C.[Yi-Cong], Chen, T.Y.[Tiffany Y.], Trister, A.D.[Andrew D.], Krishnan, R.G.[Rahul G.], Mahmood, F.[Faisal],
Scaling Vision Transformers to Gigapixel Images via Hierarchical Self-Supervised Learning,
CVPR22(16123-16134)
IEEE DOI 2210
Training, Visualization, Self-supervised learning, Image representation, Transformers, Self- semi- meta- unsupervised learning BibRef

Yang, Z.[Zhao], Wang, J.Q.[Jia-Qi], Tang, Y.S.[Yan-Song], Chen, K.[Kai], Zhao, H.S.[Heng-Shuang], Torr, P.H.S.[Philip H.S.],
LAVT: Language-Aware Vision Transformer for Referring Image Segmentation,
CVPR22(18134-18144)
IEEE DOI 2210
Image segmentation, Visualization, Image coding, Shape, Linguistics, Transformers, Feature extraction, Segmentation, grouping and shape analysis BibRef

Scheibenreif, L.[Linus], Hanna, J.[Joëlle], Mommert, M.[Michael], Borth, D.[Damian],
Self-supervised Vision Transformers for Land-cover Segmentation and Classification,
EarthVision22(1421-1430)
IEEE DOI 2210
Training, Earth, Image segmentation, Computational modeling, Conferences, Transformers BibRef

Zhai, X.H.[Xiao-Hua], Kolesnikov, A.[Alexander], Houlsby, N.[Neil], Beyer, L.[Lucas],
Scaling Vision Transformers,
CVPR22(1204-1213)
IEEE DOI 2210
Training, Error analysis, Computational modeling, Neural networks, Memory management, Training data, Transfer/low-shot/long-tail learning BibRef

Guo, J.Y.[Jian-Yuan], Han, K.[Kai], Wu, H.[Han], Tang, Y.[Yehui], Chen, X.H.[Xing-Hao], Wang, Y.H.[Yun-He], Xu, C.[Chang],
CMT: Convolutional Neural Networks Meet Vision Transformers,
CVPR22(12165-12175)
IEEE DOI 2210
Visualization, Image recognition, Force, Object detection, Transformers, Representation learning BibRef

Meng, L.C.[Ling-Chen], Li, H.D.[Heng-Duo], Chen, B.C.[Bor-Chun], Lan, S.Y.[Shi-Yi], Wu, Z.X.[Zu-Xuan], Jiang, Y.G.[Yu-Gang], Lim, S.N.[Ser-Nam],
AdaViT: Adaptive Vision Transformers for Efficient Image Recognition,
CVPR22(12299-12308)
IEEE DOI 2210
Image recognition, Head, Law enforcement, Computational modeling, Redundancy, Transformers, Efficient learning and inferences, retrieval BibRef

Herrmann, C.[Charles], Sargent, K.[Kyle], Jiang, L.[Lu], Zabih, R.[Ramin], Chang, H.[Huiwen], Liu, C.[Ce], Krishnan, D.[Dilip], Sun, D.Q.[De-Qing],
Pyramid Adversarial Training Improves ViT Performance,
CVPR22(13409-13419)
IEEE DOI 2210
Training, Image recognition, Stochastic processes, Transformers, Robustness, retrieval, Recognition: detection BibRef

Li, C.L.[Chang-Lin], Zhuang, B.[Bohan], Wang, G.R.[Guang-Run], Liang, X.D.[Xiao-Dan], Chang, X.J.[Xiao-Jun], Yang, Y.[Yi],
Automated Progressive Learning for Efficient Training of Vision Transformers,
CVPR22(12476-12486)
IEEE DOI 2210
Training, Adaptation models, Schedules, Computational modeling, Estimation, Manuals, Transformers, Representation learning BibRef

Pu, M.Y.[Meng-Yang], Huang, Y.P.[Ya-Ping], Liu, Y.M.[Yu-Ming], Guan, Q.J.[Qing-Ji], Ling, H.B.[Hai-Bin],
EDTER: Edge Detection with Transformer,
CVPR22(1392-1402)
IEEE DOI 2210
Head, Image edge detection, Semantics, Detectors, Transformers, Feature extraction, Segmentation, grouping and shape analysis, Scene analysis and understanding BibRef

Zhu, R.[Rui], Li, Z.Q.[Zheng-Qin], Matai, J.[Janarbek], Porikli, F.M.[Fatih M.], Chandraker, M.[Manmohan],
IRISformer: Dense Vision Transformers for Single-Image Inverse Rendering in Indoor Scenes,
CVPR22(2812-2821)
IEEE DOI 2210
Photorealism, Shape, Computational modeling, Lighting, Transformers, Physics-based vision and shape-from-X BibRef

Ermolov, A.[Aleksandr], Mirvakhabova, L.[Leyla], Khrulkov, V.[Valentin], Sebe, N.[Nicu], Oseledets, I.[Ivan],
Hyperbolic Vision Transformers: Combining Improvements in Metric Learning,
CVPR22(7399-7409)
IEEE DOI 2210
Measurement, Geometry, Visualization, Semantics, Self-supervised learning, Transformer cores, Transformers, Representation learning BibRef

Zhang, C.Z.[Chong-Zhi], Zhang, M.Y.[Ming-Yuan], Zhang, S.H.[Shang-Hang], Jin, D.S.[Dai-Sheng], Zhou, Q.[Qiang], Cai, Z.A.[Zhong-Ang], Zhao, H.[Haiyu], Liu, X.L.[Xiang-Long], Liu, Z.W.[Zi-Wei],
Delving Deep into the Generalization of Vision Transformers under Distribution Shifts,
CVPR22(7267-7276)
IEEE DOI 2210
Training, Representation learning, Systematics, Shape, Taxonomy, Self-supervised learning, Transformers, Recognition: detection, Representation learning BibRef

Hou, Z.[Zhi], Yu, B.[Baosheng], Tao, D.C.[Da-Cheng],
BatchFormer: Learning to Explore Sample Relationships for Robust Representation Learning,
CVPR22(7246-7256)
IEEE DOI 2210
Training, Deep learning, Representation learning, Neural networks, Tail, Transformers, Transfer/low-shot/long-tail learning, Self- semi- meta- unsupervised learning BibRef

Zamir, S.W.[Syed Waqas], Arora, A.[Aditya], Khan, S.[Salman], Hayat, M.[Munawar], Khan, F.S.[Fahad Shahbaz], Yang, M.H.[Ming-Hsuan],
Restormer: Efficient Transformer for High-Resolution Image Restoration,
CVPR22(5718-5729)
IEEE DOI 2210
Computational modeling, Transformer cores, Transformers, Data models, Image restoration, Task analysis, Deep learning architectures and techniques BibRef

Lin, K.[Kevin], Wang, L.J.[Li-Juan], Liu, Z.C.[Zi-Cheng],
Mesh Graphormer,
ICCV21(12919-12928)
IEEE DOI 2203
Convolutional codes, Solid modeling, Network topology, Transformers, Gestures and body pose BibRef

Casey, E.[Evan], Pérez, V.[Víctor], Li, Z.R.[Zhuo-Ru],
The Animation Transformer: Visual Correspondence via Segment Matching,
ICCV21(11303-11312)
IEEE DOI 2203
Visualization, Image segmentation, Image color analysis, Production, Animation, Transformers, grouping and shape BibRef

Reizenstein, J.[Jeremy], Shapovalov, R.[Roman], Henzler, P.[Philipp], Sbordone, L.[Luca], Labatut, P.[Patrick], Novotny, D.[David],
Common Objects in 3D: Large-Scale Learning and Evaluation of Real-life 3D Category Reconstruction,
ICCV21(10881-10891)
IEEE DOI 2203
Award, Marr Prize, HM. Point cloud compression, Transformers, Rendering (computer graphics), Cameras, Image reconstruction, 3D from multiview and other sensors BibRef

Feng, W.X.[Wei-Xin], Wang, Y.J.[Yuan-Jiang], Ma, L.H.[Li-Hua], Yuan, Y.[Ye], Zhang, C.[Chi],
Temporal Knowledge Consistency for Unsupervised Visual Representation Learning,
ICCV21(10150-10160)
IEEE DOI 2203
Training, Representation learning, Visualization, Protocols, Object detection, Semisupervised learning, Transformers, Transfer/Low-shot/Semi/Unsupervised Learning BibRef

Wu, H.P.[Hai-Ping], Xiao, B.[Bin], Codella, N.[Noel], Liu, M.C.[Meng-Chen], Dai, X.Y.[Xi-Yang], Yuan, L.[Lu], Zhang, L.[Lei],
CvT: Introducing Convolutions to Vision Transformers,
ICCV21(22-31)
IEEE DOI 2203
Code, Vision Transformer.
WWW Link. Convolutional codes, Image resolution, Image recognition, Performance gain, Transformers, Distortion BibRef

Touvron, H.[Hugo], Cord, M.[Matthieu], Sablayrolles, A.[Alexandre], Synnaeve, G.[Gabriel], Jégou, H.[Hervé],
Going deeper with Image Transformers,
ICCV21(32-42)
IEEE DOI 2203
Training, Neural networks, Training data, Data models, Circuit faults, Recognition and classification, Optimization and learning methods BibRef

Zhao, J.W.[Jia-Wei], Yan, K.[Ke], Zhao, Y.F.[Yi-Fan], Guo, X.W.[Xiao-Wei], Huang, F.Y.[Fei-Yue], Li, J.[Jia],
Transformer-based Dual Relation Graph for Multi-label Image Recognition,
ICCV21(163-172)
IEEE DOI 2203
Image recognition, Correlation, Computational modeling, Semantics, Benchmark testing, Representation learning BibRef

Pan, Z.Z.[Zi-Zheng], Zhuang, B.[Bohan], Liu, J.[Jing], He, H.Y.[Hao-Yu], Cai, J.F.[Jian-Fei],
Scalable Vision Transformers with Hierarchical Pooling,
ICCV21(367-376)
IEEE DOI 2203
Visualization, Image recognition, Computational modeling, Scalability, Transformers, Computational efficiency, Efficient training and inference methods BibRef

Yuan, L.[Li], Chen, Y.P.[Yun-Peng], Wang, T.[Tao], Yu, W.H.[Wei-Hao], Shi, Y.J.[Yu-Jun], Jiang, Z.H.[Zi-Hang], Tay, F.E.H.[Francis E. H.], Feng, J.S.[Jia-Shi], Yan, S.C.[Shui-Cheng],
Tokens-to-Token ViT: Training Vision Transformers from Scratch on ImageNet,
ICCV21(538-547)
IEEE DOI 2203
Training, Image resolution, Computational modeling, Image edge detection, Transformers BibRef

Wu, B.[Bichen], Xu, C.F.[Chen-Feng], Dai, X.L.[Xiao-Liang], Wan, A.[Alvin], Zhang, P.Z.[Pei-Zhao], Yan, Z.C.[Zhi-Cheng], Tomizuka, M.[Masayoshi], Gonzalez, J.[Joseph], Keutzer, K.[Kurt], Vajda, P.[Peter],
Visual Transformers: Where Do Transformers Really Belong in Vision Models?,
ICCV21(579-589)
IEEE DOI 2203
Training, Visualization, Image segmentation, Lips, Computational modeling, Semantics, Vision applications and systems BibRef

Hu, R.H.[Rong-Hang], Singh, A.[Amanpreet],
UniT: Multimodal Multitask Learning with a Unified Transformer,
ICCV21(1419-1429)
IEEE DOI 2203
Training, Natural languages, Object detection, Predictive models, Transformers, Multitasking, Representation learning BibRef

Qiu, Y.[Yue], Yamamoto, S.[Shintaro], Nakashima, K.[Kodai], Suzuki, R.[Ryota], Iwata, K.[Kenji], Kataoka, H.[Hirokatsu], Satoh, Y.[Yutaka],
Describing and Localizing Multiple Changes with Transformers,
ICCV21(1951-1960)
IEEE DOI 2203
Measurement, Location awareness, Codes, Natural languages, Benchmark testing, Transformers, Vision applications and systems BibRef

Song, M.[Myungseo], Choi, J.[Jinyoung], Han, B.H.[Bo-Hyung],
Variable-Rate Deep Image Compression through Spatially-Adaptive Feature Transform,
ICCV21(2360-2369)
IEEE DOI 2203
Training, Image coding, Neural networks, Rate-distortion, Transforms, Network architecture, Computational photography, Low-level and physics-based vision BibRef

Sheng, H.[Hualian], Cai, S.[Sijia], Liu, Y.[Yuan], Deng, B.[Bing], Huang, J.Q.[Jian-Qiang], Hua, X.S.[Xian-Sheng], Zhao, M.J.[Min-Jian],
Improving 3D Object Detection with Channel-wise Transformer,
ICCV21(2723-2732)
IEEE DOI 2203
Point cloud compression, Object detection, Detectors, Transforms, Transformers, Encoding, Detection and localization in 2D and 3D BibRef

Zhang, P.C.[Peng-Chuan], Dai, X.[Xiyang], Yang, J.W.[Jian-Wei], Xiao, B.[Bin], Yuan, L.[Lu], Zhang, L.[Lei], Gao, J.F.[Jian-Feng],
Multi-Scale Vision Longformer: A New Vision Transformer for High-Resolution Image Encoding,
ICCV21(2978-2988)
IEEE DOI 2203
Image segmentation, Image coding, Computational modeling, Memory management, Object detection, Transformers, Representation learning BibRef

Dong, Q.[Qi], Tu, Z.W.[Zhuo-Wen], Liao, H.F.[Hao-Fu], Zhang, Y.T.[Yu-Ting], Mahadevan, V.[Vijay], Soatto, S.[Stefano],
Visual Relationship Detection Using Part-and-Sum Transformers with Composite Queries,
ICCV21(3530-3539)
IEEE DOI 2203
Visualization, Detectors, Transformers, Task analysis, Standards, Detection and localization in 2D and 3D, Representation learning BibRef

Fan, H.Q.[Hao-Qi], Xiong, B.[Bo], Mangalam, K.[Karttikeya], Li, Y.[Yanghao], Yan, Z.C.[Zhi-Cheng], Malik, J.[Jitendra], Feichtenhofer, C.[Christoph],
Multiscale Vision Transformers,
ICCV21(6804-6815)
IEEE DOI 2203
Visualization, Image recognition, Codes, Computational modeling, Transformers, Complexity theory, Recognition and classification BibRef

Mahmood, K.[Kaleel], Mahmood, R.[Rigel], van Dijk, M.[Marten],
On the Robustness of Vision Transformers to Adversarial Examples,
ICCV21(7818-7827)
IEEE DOI 2203
Transformers, Robustness, Adversarial machine learning, Security, Machine learning architectures and formulations BibRef

Chen, X.L.[Xin-Lei], Xie, S.[Saining], He, K.[Kaiming],
An Empirical Study of Training Self-Supervised Vision Transformers,
ICCV21(9620-9629)
IEEE DOI 2203
Training, Benchmark testing, Transformers, Standards, Representation learning, Recognition and classification, Transfer/Low-shot/Semi/Unsupervised Learning BibRef

Yuan, Y.[Ye], Weng, X.[Xinshuo], Ou, Y.[Yanglan], Kitani, K.[Kris],
AgentFormer: Agent-Aware Transformers for Socio-Temporal Multi-Agent Forecasting,
ICCV21(9793-9803)
IEEE DOI 2203
Uncertainty, Stochastic processes, Predictive models, Transformers, Encoding, Trajectory, Motion and tracking, Vision for robotics and autonomous vehicles BibRef

Wu, K.[Kan], Peng, H.W.[Hou-Wen], Chen, M.H.[Ming-Hao], Fu, J.L.[Jian-Long], Chao, H.Y.[Hong-Yang],
Rethinking and Improving Relative Position Encoding for Vision Transformer,
ICCV21(10013-10021)
IEEE DOI 2203
Image coding, Codes, Computational modeling, Transformers, Encoding, Natural language processing, Datasets and evaluation, Recognition and classification BibRef

Bhojanapalli, S.[Srinadh], Chakrabarti, A.[Ayan], Glasner, D.[Daniel], Li, D.[Daliang], Unterthiner, T.[Thomas], Veit, A.[Andreas],
Understanding Robustness of Transformers for Image Classification,
ICCV21(10211-10221)
IEEE DOI 2203
Perturbation methods, Transformers, Robustness, Data models, Convolutional neural networks, Recognition and classification BibRef

Yan, B.[Bin], Peng, H.[Houwen], Fu, J.L.[Jian-Long], Wang, D.[Dong], Lu, H.C.[Hu-Chuan],
Learning Spatio-Temporal Transformer for Visual Tracking,
ICCV21(10428-10437)
IEEE DOI 2203
Visualization, Target tracking, Smoothing methods, Pipelines, Benchmark testing, Transformers BibRef

Heo, B.[Byeongho], Yun, S.[Sangdoo], Han, D.Y.[Dong-Yoon], Chun, S.[Sanghyuk], Choe, J.[Junsuk], Oh, S.J.[Seong Joon],
Rethinking Spatial Dimensions of Vision Transformers,
ICCV21(11916-11925)
IEEE DOI 2203
Dimensionality reduction, Computational modeling, Object detection, Transformers, Robustness, Recognition and classification BibRef

Voskou, A.[Andreas], Panousis, K.P.[Konstantinos P.], Kosmopoulos, D.[Dimitrios], Metaxas, D.N.[Dimitris N.], Chatzis, S.[Sotirios],
Stochastic Transformer Networks with Linear Competing Units: Application to end-to-end SL Translation,
ICCV21(11926-11935)
IEEE DOI 2203
Training, Memory management, Stochastic processes, Gesture recognition, Benchmark testing, Assistive technologies BibRef

Ranftl, R.[René], Bochkovskiy, A.[Alexey], Koltun, V.[Vladlen],
Vision Transformers for Dense Prediction,
ICCV21(12159-12168)
IEEE DOI 2203
Image resolution, Semantics, Neural networks, Estimation, Training data, grouping and shape BibRef

Chen, M.H.[Ming-Hao], Peng, H.W.[Hou-Wen], Fu, J.L.[Jian-Long], Ling, H.B.[Hai-Bin],
AutoFormer: Searching Transformers for Visual Recognition,
ICCV21(12250-12260)
IEEE DOI 2203
Training, Convolutional codes, Visualization, Head, Search methods, Manuals, Recognition and classification BibRef

Yuan, K.[Kun], Guo, S.P.[Shao-Peng], Liu, Z.W.[Zi-Wei], Zhou, A.[Aojun], Yu, F.W.[Feng-Wei], Wu, W.[Wei],
Incorporating Convolution Designs into Visual Transformers,
ICCV21(559-568)
IEEE DOI 2203
Training, Visualization, Costs, Convolution, Training data, Transformers, Feature extraction, Recognition and classification, Efficient training and inference methods BibRef

Chen, Z.[Zhengsu], Xie, L.X.[Ling-Xi], Niu, J.W.[Jian-Wei], Liu, X.F.[Xue-Feng], Wei, L.[Longhui], Tian, Q.[Qi],
Visformer: The Vision-friendly Transformer,
ICCV21(569-578)
IEEE DOI 2203
Convolutional codes, Training, Visualization, Protocols, Computational modeling, Fitting, Recognition and classification, Representation learning BibRef

Yao, Z.L.[Zhu-Liang], Cao, Y.[Yue], Lin, Y.T.[Yu-Tong], Liu, Z.[Ze], Zhang, Z.[Zheng], Hu, H.[Han],
Leveraging Batch Normalization for Vision Transformers,
NeurArch21(413-422)
IEEE DOI 2112
Training, Transformers, Feeds BibRef

Graham, B.[Ben], El-Nouby, A.[Alaaeldin], Touvron, H.[Hugo], Stock, P.[Pierre], Joulin, A.[Armand], Jégou, H.[Hervé], Douze, M.[Matthijs],
LeViT: a Vision Transformer in ConvNet's Clothing for Faster Inference,
ICCV21(12239-12249)
IEEE DOI 2203
Training, Image resolution, Neural networks, Parallel processing, Transformers, Feature extraction, Representation learning BibRef

Horváth, J.[János], Baireddy, S.[Sriram], Hao, H.X.[Han-Xiang], Montserrat, D.M.[Daniel Mas], Delp, E.J.[Edward J.],
Manipulation Detection in Satellite Images Using Vision Transformer,
WMF21(1032-1041)
IEEE DOI 2109
BibRef
Earlier: A1, A4, A3, A5, Only:
Manipulation Detection in Satellite Images Using Deep Belief Networks,
WMF20(2832-2840)
IEEE DOI 2008
Image sensors, Satellites, Splicing, Forestry, Tools. Satellites, Image reconstruction, Training, Forgery, Heating systems, Feature extraction BibRef

Beal, J.[Josh], Wu, H.Y.[Hao-Yu], Park, D.H.[Dong Huk], Zhai, A.[Andrew], Kislyuk, D.[Dmitry],
Billion-Scale Pretraining with Vision Transformers for Multi-Task Visual Representations,
WACV22(1431-1440)
IEEE DOI 2202
Visualization, Solid modeling, Systematics, Computational modeling, Transformers, Semi- and Un- supervised Learning BibRef

Chapter on Pattern Recognition, Clustering, Statistics, Grammars, Learning, Neural Nets, Genetic Algorithms continues in
Attention in Vision Transformers.


Last update: Sep 28, 2024 at 17:47:54