14.5.10.5.1 Vision Transformers, ViT

Vision Transformers. Transformers. Shift, Scale, and Distortion Invariance. Video specific:
See also Video Transformers.
See also Zero-Shot Learning.
See also Detection Transformer, DETR Applications.

Bazi, Y.[Yakoub], Bashmal, L.[Laila], Al Rahhal, M.M.[Mohamad M.], Al Dayil, R.[Reham], Al Ajlan, N.[Naif],
Vision Transformers for Remote Sensing Image Classification,
RS(13), No. 3, 2021, pp. xx-yy.
DOI Link 2102
BibRef

Hu, H.Q.[Hao-Qi], Lu, X.F.[Xiao-Feng], Zhang, X.P.[Xin-Peng], Zhang, T.X.[Tian-Xing], Sun, G.L.[Guang-Ling],
Inheritance Attention Matrix-Based Universal Adversarial Perturbations on Vision Transformers,
SPLetters(28), 2021, pp. 1923-1927.
IEEE DOI 2110
Perturbation methods, Robustness, Visualization, Transformers, Optimization, Task analysis, Head, Vision Transformers, self-attention BibRef

Li, T.[Tao], Zhang, Z.[Zheng], Pei, L.[Lishen], Gan, Y.[Yan],
HashFormer: Vision Transformer Based Deep Hashing for Image Retrieval,
SPLetters(29), 2022, pp. 827-831.
IEEE DOI 2204
Transformers, Binary codes, Task analysis, Training, Image retrieval, Feature extraction, Databases, Binary embedding, image retrieval BibRef

Jiang, B.[Bo], Zhao, K.K.[Kang-Kang], Tang, J.[Jin],
RGTransformer: Region-Graph Transformer for Image Representation and Few-Shot Classification,
SPLetters(29), 2022, pp. 792-796.
IEEE DOI 2204
Measurement, Transformers, Image representation, Feature extraction, Visualization, transformer BibRef

Chen, Z.M.[Zhao-Min], Cui, Q.[Quan], Zhao, B.[Borui], Song, R.J.[Ren-Jie], Zhang, X.Q.[Xiao-Qin], Yoshie, O.[Osamu],
SST: Spatial and Semantic Transformers for Multi-Label Image Recognition,
IP(31), 2022, pp. 2570-2583.
IEEE DOI 2204
Correlation, Semantics, Transformers, Image recognition, Task analysis, Training, Feature extraction, label correlation BibRef

Xue, Z.X.[Zhi-Xiang], Tan, X.[Xiong], Yu, X.[Xuchu], Liu, B.[Bing], Yu, A.[Anzhu], Zhang, P.Q.[Peng-Qiang],
Deep Hierarchical Vision Transformer for Hyperspectral and LiDAR Data Classification,
IP(31), 2022, pp. 3095-3110.
IEEE DOI 2205
Feature extraction, Transformers, Hyperspectral imaging, Laser radar, Data mining, Collaboration, Data models, cross attention fusion BibRef

Wang, G.H.[Guang-Hui], Li, B.[Bin], Zhang, T.[Tao], Zhang, S.[Shubi],
A Network Combining a Transformer and a Convolutional Neural Network for Remote Sensing Image Change Detection,
RS(14), No. 9, 2022, pp. xx-yy.
DOI Link 2205
BibRef

Luo, G.[Gen], Zhou, Y.[Yiyi], Sun, X.S.[Xiao-Shuai], Wang, Y.[Yan], Cao, L.J.[Liu-Juan], Wu, Y.J.[Yong-Jian], Huang, F.Y.[Fei-Yue], Ji, R.R.[Rong-Rong],
Towards Lightweight Transformer Via Group-Wise Transformation for Vision-and-Language Tasks,
IP(31), 2022, pp. 3386-3398.
IEEE DOI 2205
Transformers, Task analysis, Computational modeling, Benchmark testing, Visualization, Convolution, Head, reference expression comprehension BibRef

Tu, Y.B.[Yun-Bin], Li, L.[Liang], Su, L.[Li], Gao, S.X.[Sheng-Xiang], Yan, C.G.[Cheng-Gang], Zha, Z.J.[Zheng-Jun], Yu, Z.T.[Zheng-Tao], Huang, Q.M.[Qing-Ming],
I2-Transformer: Intra- and Inter-Relation Embedding Transformer for TV Show Captioning,
IP(31), 2022, pp. 3565-3577.
IEEE DOI 2206
Transformers, Semantics, Task analysis, Visualization, TV, Graph neural networks, TV Show captioning, transformer BibRef

Heo, J.[Jiseong], Wang, Y.[Yooseung], Park, J.[Jihun],
Occlusion-aware spatial attention transformer for occluded object recognition,
PRL(159), 2022, pp. 70-76.
Elsevier DOI 2206
Occluded object recognition, Visual transformer, Spatial attention BibRef

Wang, J.Y.[Jia-Yun], Chakraborty, R.[Rudrasis], Yu, S.X.[Stella X.],
Transformer for 3D Point Clouds,
PAMI(44), No. 8, August 2022, pp. 4419-4431.
IEEE DOI 2207
Convolution, Feature extraction, Shape, Semantics, Task analysis, Measurement, point cloud, transformation, deformable, segmentation, 3D detection BibRef

Wang, L.[Libo], Li, R.[Rui], Zhang, C.[Ce], Fang, S.H.[Sheng-Hui], Duan, C.X.[Chen-Xi], Meng, X.L.[Xiao-Liang], Atkinson, P.M.[Peter M.],
UNetFormer: A UNet-like transformer for efficient semantic segmentation of remote sensing urban scene imagery,
PandRS(190), 2022, pp. 196-214.
Elsevier DOI 2208
Award, U.V. Helava, ISPRS. Semantic Segmentation, Remote Sensing, Vision Transformer, Fully Transformer Network, Global-local Context, Urban Scene BibRef

Kheldouni, A.[Amine], Boumhidi, J.[Jaouad],
A Study of Bidirectional Encoder Representations from Transformers for Sequential Recommendations,
ISCV22(1-5)
IEEE DOI 2208
Knowledge engineering, Recurrent neural networks, Predictive models, Markov processes BibRef

Li, Z.[Zekun], Liu, Y.F.[Yu-Fan], Li, B.[Bing], Feng, B.L.[Bai-Lan], Wu, K.[Kebin], Peng, C.W.[Cheng-Wei], Hu, W.M.[Wei-Ming],
SDTP: Semantic-Aware Decoupled Transformer Pyramid for Dense Image Prediction,
CirSysVideo(32), No. 9, September 2022, pp. 6160-6173.
IEEE DOI 2209
Transformers, Semantics, Task analysis, Detectors, Image segmentation, Head, Convolution, Transformer, dense prediction, multi-level interaction BibRef

Wu, J.J.[Jia-Jing], Wei, Z.Q.[Zhi-Qiang], Zhang, J.P.[Jin-Peng], Zhang, Y.[Yushi], Jia, D.N.[Dong-Ning], Yin, B.[Bo], Yu, Y.C.[Yun-Chao],
Full-Coupled Convolutional Transformer for Surface-Based Duct Refractivity Inversion,
RS(14), No. 17, 2022, pp. xx-yy.
DOI Link 2209
BibRef

Dalmaz, O.[Onat], Yurt, M.[Mahmut], Çukur, T.[Tolga],
ResViT: Residual Vision Transformers for Multimodal Medical Image Synthesis,
MedImg(41), No. 10, October 2022, pp. 2598-2614.
IEEE DOI 2210
Transformers, Biomedical imaging, Subspace constraints, Task analysis, Image synthesis, Magnetic resonance imaging, unified BibRef

Jiang, K.[Kai], Peng, P.[Peng], Lian, Y.[Youzao], Xu, W.S.[Wei-Sheng],
The encoding method of position embeddings in vision transformer,
JVCIR(89), 2022, pp. 103664.
Elsevier DOI 2212
Vision transformer, Position embeddings, Gabor filters BibRef

Han, K.[Kai], Wang, Y.H.[Yun-He], Chen, H.[Hanting], Chen, X.[Xinghao], Guo, J.[Jianyuan], Liu, Z.H.[Zhen-Hua], Tang, Y.[Yehui], Xiao, A.[An], Xu, C.J.[Chun-Jing], Xu, Y.X.[Yi-Xing], Yang, Z.H.[Zhao-Hui], Zhang, Y.[Yiman], Tao, D.C.[Da-Cheng],
A Survey on Vision Transformer,
PAMI(45), No. 1, January 2023, pp. 87-110.
IEEE DOI 2212
Survey, Vision Transformer. Transformers, Task analysis, Encoding, Computational modeling, Visualization, Object detection, high-level vision, video BibRef

Hou, Q.[Qibin], Jiang, Z.[Zihang], Yuan, L.[Li], Cheng, M.M.[Ming-Ming], Yan, S.C.[Shui-Cheng], Feng, J.S.[Jia-Shi],
Vision Permutator: A Permutable MLP-Like Architecture for Visual Recognition,
PAMI(45), No. 1, January 2023, pp. 1328-1334.
IEEE DOI 2212
Transformers, Encoding, Visualization, Convolutional codes, Mixers, Computer architecture, Training data, Vision permutator, deep neural network BibRef

Wu, Y.H.[Yu-Huan], Liu, Y.[Yun], Zhan, X.[Xin], Cheng, M.M.[Ming-Ming],
P2T: Pyramid Pooling Transformer for Scene Understanding,
PAMI(45), No. 11, November 2023, pp. 12760-12771.
IEEE DOI 2310
BibRef

Zhou, D.[Daquan], Hou, Q.[Qibin], Yang, L.J.[Lin-Jie], Jin, X.J.[Xiao-Jie], Feng, J.S.[Jia-Shi],
Token Selection is a Simple Booster for Vision Transformers,
PAMI(45), No. 11, November 2023, pp. 12738-12746.
IEEE DOI 2310
BibRef

Yu, X.H.[Xiao-Han], Wang, J.[Jun], Zhao, Y.[Yang], Gao, Y.S.[Yong-Sheng],
Mix-ViT: Mixing attentive vision transformer for ultra-fine-grained visual categorization,
PR(135), 2023, pp. 109131.
Elsevier DOI 2212
Ultra-fine-grained visual categorization, Vision transformer, Self-supervised learning, Attentive mixing BibRef

Li, Y.[Yehao], Yao, T.[Ting], Pan, Y.W.[Ying-Wei], Mei, T.[Tao],
Contextual Transformer Networks for Visual Recognition,
PAMI(45), No. 2, February 2023, pp. 1489-1500.
IEEE DOI 2301
Transformers, Convolution, Visualization, Task analysis, Image recognition, Object detection, Transformer, image recognition BibRef

Wang, H.[Hang], Du, Y.[Youtian], Zhang, Y.[Yabin], Li, S.[Shuai], Zhang, L.[Lei],
One-Stage Visual Relationship Referring With Transformers and Adaptive Message Passing,
IP(32), 2023, pp. 190-202.
IEEE DOI 2301
Visualization, Proposals, Transformers, Task analysis, Detectors, Message passing, Predictive models, gated message passing BibRef

Kim, B.[Boah], Kim, J.[Jeongsol], Ye, J.C.[Jong Chul],
Task-Agnostic Vision Transformer for Distributed Learning of Image Processing,
IP(32), 2023, pp. 203-218.
IEEE DOI 2301
Task analysis, Transformers, Servers, Distance learning, Computer aided instruction, Tail, Head, Distributed learning, task-agnostic learning BibRef

Park, S.[Sangjoon], Ye, J.C.[Jong Chul],
Multi-Task Distributed Learning Using Vision Transformer With Random Patch Permutation,
MedImg(42), No. 7, July 2023, pp. 2091-2105.
IEEE DOI 2307
Task analysis, Transformers, Head, Tail, Servers, Multitasking, Distance learning, Federated learning, split learning, privacy preservation BibRef

Kiya, H.[Hitoshi], Iijima, R.[Ryota], Maungmaung, A.[Aprilpyone], Kinoshita, Y.[Yuma],
Image and Model Transformation with Secret Key for Vision Transformer,
IEICE(E106-D), No. 1, January 2023, pp. 2-11.
WWW Link. 2301
BibRef

Lin, X.[Xiao], Sun, S.Z.[Shu-Zhou], Huang, W.[Wei], Sheng, B.[Bin], Li, P.[Ping], Feng, D.D.[David Dagan],
EAPT: Efficient Attention Pyramid Transformer for Image Processing,
MultMed(25), 2023, pp. 50-61.
IEEE DOI 2301
Transformers, Encoding, Task analysis, Semantics, Feature extraction, Costs, Convolutional neural networks, Transformer, semantic segmentation BibRef

Mou, C.[Chong], Zhang, J.[Jian],
TransCL: Transformer Makes Strong and Flexible Compressive Learning,
PAMI(45), No. 4, April 2023, pp. 5236-5251.
IEEE DOI 2303
Task analysis, Transformers, Image reconstruction, Image coding, Compressed sensing, Sensors, Cameras, Compressed sensing, semantic segmentation BibRef

Yuan, L.[Li], Hou, Q.[Qibin], Jiang, Z.[Zihang], Feng, J.S.[Jia-Shi], Yan, S.C.[Shui-Cheng],
VOLO: Vision Outlooker for Visual Recognition,
PAMI(45), No. 5, May 2023, pp. 6575-6586.
IEEE DOI 2304
Transformers, Computer architecture, Computational modeling, Training, Data models, Task analysis, Visualization, image classification BibRef

Zhang, H.F.[Hao-Fei], Mao, F.[Feng], Xue, M.Q.[Meng-Qi], Fang, G.F.[Gong-Fan], Feng, Z.L.[Zun-Lei], Song, J.[Jie], Song, M.L.[Ming-Li],
Knowledge Amalgamation for Object Detection With Transformers,
IP(32), 2023, pp. 2093-2106.
IEEE DOI 2304
Transformers, Task analysis, Object detection, Detectors, Training, Feature extraction, Model reusing, vision transformers BibRef

Li, Y.[Ying], Chen, K.[Kehan], Sun, S.L.[Shi-Lei], He, C.[Chu],
Multi-scale homography estimation based on dual feature aggregation transformer,
IET-IPR(17), No. 5, 2023, pp. 1403-1416.
DOI Link 2304
image matching, image registration BibRef

Wang, G.Q.[Guan-Qun], Chen, H.[He], Chen, L.[Liang], Zhuang, Y.[Yin], Zhang, S.H.[Shang-Hang], Zhang, T.[Tong], Dong, H.[Hao], Gao, P.[Peng],
P2FEViT: Plug-and-Play CNN Feature Embedded Hybrid Vision Transformer for Remote Sensing Image Classification,
RS(15), No. 7, 2023, pp. 1773.
DOI Link 2304
BibRef

Zhang, Q.M.[Qi-Ming], Xu, Y.F.[Yu-Fei], Zhang, J.[Jing], Tao, D.C.[Da-Cheng],
ViTAEv2: Vision Transformer Advanced by Exploring Inductive Bias for Image Recognition and Beyond,
IJCV(131), No. 5, May 2023, pp. 1141-1162.
Springer DOI 2305
BibRef

Fan, X.[Xinyi], Liu, H.J.[Hua-Jun],
FlexFormer: Flexible Transformer for efficient visual recognition,
PRL(169), 2023, pp. 95-101.
Elsevier DOI 2305
Vision transformer, Frequency analysis, Image classification BibRef

Cho, S.[Seokju], Hong, S.[Sunghwan], Kim, S.[Seungryong],
CATs++: Boosting Cost Aggregation With Convolutions and Transformers,
PAMI(45), No. 6, June 2023, pp. 7174-7194.
IEEE DOI
WWW Link. 2305
Costs, Transformers, Correlation, Semantics, Feature extraction, Task analysis, Cost aggregation, efficient transformer, semantic visual correspondence BibRef

Kim, B.J.[Bum Jun], Choi, H.[Hyeyeon], Jang, H.[Hyeonah], Lee, D.G.[Dong Gu], Jeong, W.[Wonseok], Kim, S.W.[Sang Woo],
Improved robustness of vision transformers via prelayernorm in patch embedding,
PR(141), 2023, pp. 109659.
Elsevier DOI 2306
Vision transformer, Patch embedding, Contrast enhancement, Robustness, Layer normalization, Convolutional neural network, Deep learning BibRef

He, Q.[Qibin], Sun, X.[Xian], Yan, Z.Y.[Zhi-Yuan], Wang, B.[Bing], Zhu, Z.[Zicong], Diao, W.H.[Wen-Hui], Yang, M.Y.[Michael Ying],
AST: Adaptive Self-supervised Transformer for optical remote sensing representation,
PandRS(200), 2023, pp. 41-54.
Elsevier DOI 2306
Cross-scale transformer, Interpretation, Masked image modeling, Optical remote sensing, Representation learning BibRef

Wang, Z.[Ziwei], Wang, C.Y.[Chang-Yuan], Xu, X.[Xiuwei], Zhou, J.[Jie], Lu, J.W.[Ji-Wen],
Quantformer: Learning Extremely Low-Precision Vision Transformers,
PAMI(45), No. 7, July 2023, pp. 8813-8826.
IEEE DOI 2306
Quantization (signal), Transformers, Computational modeling, Search problems, Object detection, Image color analysis, vision transformers BibRef

Sun, S.Y.[Shu-Yang], Yue, X.Y.[Xiao-Yu], Zhao, H.S.[Heng-Shuang], Torr, P.H.S.[Philip H.S.], Bai, S.[Song],
Patch-Based Separable Transformer for Visual Recognition,
PAMI(45), No. 7, July 2023, pp. 9241-9247.
IEEE DOI 2306
Task analysis, Current transformers, Visualization, Feature extraction, Convolutional neural networks, instance segmentation BibRef

Yue, X.Y.[Xiao-Yu], Sun, S.Y.[Shu-Yang], Kuang, Z.H.[Zhang-Hui], Wei, M.[Meng], Torr, P.H.S.[Philip H.S.], Zhang, W.[Wayne], Lin, D.[Dahua],
Vision Transformer with Progressive Sampling,
ICCV21(377-386)
IEEE DOI 2203
Codes, Computational modeling, Interference, Transformers, Feature extraction, Recognition and classification, Representation learning BibRef

Zheng, F.[Fujian], Lin, S.[Shuai], Zhou, W.[Wei], Huang, H.[Hong],
A Lightweight Dual-Branch Swin Transformer for Remote Sensing Scene Classification,
RS(15), No. 11, 2023, pp. 2865.
DOI Link 2306
BibRef

Yu, L.[Lu], Xiang, W.[Wei], Fang, J.[Juan], Chen, Y.P.P.[Yi-Ping Phoebe], Chi, L.[Lianhua],
eX-ViT: A Novel explainable vision transformer for weakly supervised semantic segmentation,
PR(142), 2023, pp. 109666.
Elsevier DOI 2307
Explainable, Attention map, Transformer, Weakly supervised BibRef

Peng, Z.L.[Zhi-Liang], Guo, Z.H.[Zong-Hao], Huang, W.[Wei], Wang, Y.W.[Yao-Wei], Xie, L.X.[Ling-Xi], Jiao, J.B.[Jian-Bin], Tian, Q.[Qi], Ye, Q.X.[Qi-Xiang],
Conformer: Local Features Coupling Global Representations for Recognition and Detection,
PAMI(45), No. 8, August 2023, pp. 9454-9468.
IEEE DOI 2307
Transformers, Feature extraction, Couplings, Visualization, Detectors, Convolution, Object detection, Feature fusion, vision transformer BibRef

Peng, Z.L.[Zhi-Liang], Huang, W.[Wei], Gu, S.Z.[Shan-Zhi], Xie, L.X.[Ling-Xi], Wang, Y.[Yaowei], Jiao, J.B.[Jian-Bin], Ye, Q.X.[Qi-Xiang],
Conformer: Local Features Coupling Global Representations for Visual Recognition,
ICCV21(357-366)
IEEE DOI 2203
Couplings, Representation learning, Visualization, Fuses, Convolution, Object detection, Transformers, Representation learning BibRef

Feng, Z.Z.[Zhan-Zhou], Zhang, S.L.[Shi-Liang],
Efficient Vision Transformer via Token Merger,
IP(32), 2023, pp. 4156-4169.
IEEE DOI 2307
Corporate acquisitions, Transformers, Semantics, Task analysis, Visualization, Merging, Computational efficiency, sparse representation BibRef

Yang, J.H.[Jia-Hao], Li, X.Y.[Xiang-Yang], Zheng, M.[Mao], Wang, Z.[Zihan], Zhu, Y.Q.[Yong-Qing], Guo, X.Q.[Xiao-Qian], Yuan, Y.C.[Yu-Chen], Chai, Z.[Zifeng], Jiang, S.Q.[Shu-Qiang],
MemBridge: Video-Language Pre-Training With Memory-Augmented Inter-Modality Bridge,
IP(32), 2023, pp. 4073-4087.
IEEE DOI 2307
WWW Link. Bridges, Transformers, Computer architecture, Task analysis, Visualization, Feature extraction, Memory modules, memory module BibRef

Wang, D.L.[Duo-Lin], Chen, Y.[Yadang], Naz, B.[Bushra], Sun, L.[Le], Li, B.Z.[Bao-Zhu],
Spatial-Aware Transformer (SAT): Enhancing Global Modeling in Transformer Segmentation for Remote Sensing Images,
RS(15), No. 14, 2023, pp. 3607.
DOI Link 2307
BibRef

Huang, X.Y.[Xin-Yan], Liu, F.[Fang], Cui, Y.H.[Yuan-Hao], Chen, P.[Puhua], Li, L.L.[Ling-Ling], Li, P.F.[Peng-Fang],
Faster and Better: A Lightweight Transformer Network for Remote Sensing Scene Classification,
RS(15), No. 14, 2023, pp. 3645.
DOI Link 2307
BibRef

Yao, T.[Ting], Li, Y.[Yehao], Pan, Y.W.[Ying-Wei], Wang, Y.[Yu], Zhang, X.P.[Xiao-Ping], Mei, T.[Tao],
Dual Vision Transformer,
PAMI(45), No. 9, September 2023, pp. 10870-10882.
IEEE DOI 2309
Survey, Vision Transformer. BibRef

Rao, Y.M.[Yong-Ming], Liu, Z.[Zuyan], Zhao, W.L.[Wen-Liang], Zhou, J.[Jie], Lu, J.W.[Ji-Wen],
Dynamic Spatial Sparsification for Efficient Vision Transformers and Convolutional Neural Networks,
PAMI(45), No. 9, September 2023, pp. 10883-10897.
IEEE DOI 2309
BibRef

Li, J.[Jie], Liu, Z.[Zhao], Li, L.[Li], Lin, J.Q.[Jun-Qin], Yao, J.[Jian], Tu, J.[Jingmin],
Multi-view convolutional vision transformer for 3D object recognition,
JVCIR(95), 2023, pp. 103906.
Elsevier DOI 2309
Multi-view, 3D object recognition, Feature fusion, Convolutional neural networks BibRef

Wu, G.[Gaojie], Zheng, W.S.[Wei-Shi], Lu, Y.T.[Yu-Tong], Tian, Q.[Qi],
PSLT: A Light-Weight Vision Transformer With Ladder Self-Attention and Progressive Shift,
PAMI(45), No. 9, September 2023, pp. 11120-11135.
IEEE DOI 2309
BibRef

Shang, J.H.[Jing-Huan], Li, X.[Xiang], Kahatapitiya, K.[Kumara], Lee, Y.C.[Yu-Cheol], Ryoo, M.S.[Michael S.],
StARformer: Transformer With State-Action-Reward Representations for Robot Learning,
PAMI(45), No. 11, November 2023, pp. 12862-12877.
IEEE DOI 2310
BibRef
Earlier: A1, A3, A2, A5, Only:
StARformer: Transformer with State-Action-Reward Representations for Visual Reinforcement Learning,
ECCV22(XXIX:462-479).
Springer DOI 2211
BibRef

Duan, H.R.[Hao-Ran], Long, Y.[Yang], Wang, S.D.[Shi-Dong], Zhang, H.F.[Hao-Feng], Willcocks, C.G.[Chris G.], Shao, L.[Ling],
Dynamic Unary Convolution in Transformers,
PAMI(45), No. 11, November 2023, pp. 12747-12759.
IEEE DOI 2310
BibRef

Chen, S.M.[Shi-Ming], Hong, Z.M.[Zi-Ming], Hou, W.J.[Wen-Jin], Xie, G.S.[Guo-Sen], Song, Y.B.[Yi-Bing], Zhao, J.[Jian], You, X.G.[Xin-Ge], Yan, S.C.[Shui-Cheng], Shao, L.[Ling],
TransZero++: Cross Attribute-Guided Transformer for Zero-Shot Learning,
PAMI(45), No. 11, November 2023, pp. 12844-12861.
IEEE DOI 2310
BibRef

Qian, S.J.[Sheng-Ju], Zhu, Y.[Yi], Li, W.[Wenbo], Li, M.[Mu], Jia, J.Y.[Jia-Ya],
What Makes for Good Tokenizers in Vision Transformer?,
PAMI(45), No. 11, November 2023, pp. 13011-13023.
IEEE DOI 2310
BibRef

Sun, W.X.[Wei-Xuan], Qin, Z.[Zhen], Deng, H.[Hui], Wang, J.[Jianyuan], Zhang, Y.[Yi], Zhang, K.[Kaihao], Barnes, N.[Nick], Birchfield, S.[Stan], Kong, L.P.[Ling-Peng], Zhong, Y.[Yiran],
Vicinity Vision Transformer,
PAMI(45), No. 10, October 2023, pp. 12635-12649.
IEEE DOI 2310
BibRef

Cao, C.J.[Chen-Jie], Dong, Q.[Qiaole], Fu, Y.W.[Yan-Wei],
ZITS++: Image Inpainting by Improving the Incremental Transformer on Structural Priors,
PAMI(45), No. 10, October 2023, pp. 12667-12684.
IEEE DOI 2310
BibRef

Fang, Y.X.[Yu-Xin], Wang, X.G.[Xing-Gang], Wu, R.[Rui], Liu, W.Y.[Wen-Yu],
What Makes for Hierarchical Vision Transformer?,
PAMI(45), No. 10, October 2023, pp. 12714-12720.
IEEE DOI 2310
BibRef

Xu, P.[Peng], Zhu, X.T.[Xia-Tian], Clifton, D.A.[David A.],
Multimodal Learning With Transformers: A Survey,
PAMI(45), No. 10, October 2023, pp. 12113-12132.
IEEE DOI 2310
BibRef

Li, K.C.[Kun-Chang], Wang, Y.[Yali], Zhang, J.[Junhao], Gao, P.[Peng], Song, G.[Guanglu], Liu, Y.[Yu], Li, H.S.[Hong-Sheng], Qiao, Y.[Yu],
UniFormer: Unifying Convolution and Self-Attention for Visual Recognition,
PAMI(45), No. 10, October 2023, pp. 12581-12600.
IEEE DOI 2310
Unify CNN and Transformers BibRef

Liu, J.[Jun], Guo, H.R.[Hao-Ran], He, Y.[Yile], Li, H.L.[Hua-Li],
Vision Transformer-Based Ensemble Learning for Hyperspectral Image Classification,
RS(15), No. 21, 2023, pp. 5208.
DOI Link 2311
BibRef

Lin, M.B.[Ming-Bao], Chen, M.Z.[Meng-Zhao], Zhang, Y.X.[Yu-Xin], Shen, C.H.[Chun-Hua], Ji, R.R.[Rong-Rong], Cao, L.J.[Liu-Juan],
Super Vision Transformer,
IJCV(131), No. 12, December 2023, pp. 3136-3151.
Springer DOI 2311
BibRef

Li, H.L.[Hao-Ling], Xue, M.Q.[Meng-Qi], Song, J.[Jie], Zhang, H.F.[Hao-Fei], Huang, W.Q.[Wen-Qi], Liang, L.[Lingyu], Song, M.L.[Ming-Li],
Constituent Attention for Vision Transformers,
CVIU(237), 2023, pp. 103838.
Elsevier DOI Code:
WWW Link. 2311
Vision Transformer, Attention mechanism, Classification, Interpretability for deep learning BibRef

Li, Z.Y.[Zhong-Yu], Gao, S.[Shanghua], Cheng, M.M.[Ming-Ming],
SERE: Exploring Feature Self-Relation for Self-Supervised Transformer,
PAMI(45), No. 12, December 2023, pp. 15619-15631.
IEEE DOI 2311
BibRef

Sajjadi, M.S.M.[Mehdi S. M.], Mahendran, A.[Aravindh], Kipf, T.[Thomas], Pot, E.[Etienne], Duckworth, D.[Daniel], Lucic, M.[Mario], Greff, K.[Klaus],
RUST: Latent Neural Scene Representations from Unposed Imagery,
CVPR23(17297-17306)
IEEE DOI 2309
BibRef

Ling, Z.X.[Zhi-Xin], Xing, Z.[Zhen], Zhou, X.D.[Xiang-Dong], Cao, M.L.[Man-Liang], Zhou, G.C.[Gui-Chun],
PanoSwin: a Pano-style Swin Transformer for Panorama Understanding,
CVPR23(17755-17764)
IEEE DOI 2309
BibRef

Bowman, B.[Benjamin], Achille, A.[Alessandro], Zancato, L.[Luca], Trager, M.[Matthew], Perera, P.[Pramuditha], Paolini, G.[Giovanni], Soatto, S.[Stefano],
À-la-carte Prompt Tuning (APT): Combining Distinct Data Via Composable Prompting,
CVPR23(14984-14993)
IEEE DOI 2309
BibRef

Nakhli, R.[Ramin], Moghadam, P.A.[Puria Azadi], Mi, H.Y.[Hao-Yang], Farahani, H.[Hossein], Baras, A.[Alexander], Gilks, B.[Blake], Bashashati, A.[Ali],
Sparse Multi-Modal Graph Transformer with Shared-Context Processing for Representation Learning of Giga-pixel Images,
CVPR23(11547-11557)
IEEE DOI 2309
BibRef

Gärtner, E.[Erik], Metz, L.[Luke], Andriluka, M.[Mykhaylo], Freeman, C.D.[C. Daniel], Sminchisescu, C.[Cristian],
Transformer-Based Learned Optimization,
CVPR23(11970-11979)
IEEE DOI 2309
BibRef

Ding, M.Y.[Ming-Yu], Shen, Y.[Yikang], Fan, L.J.[Li-Jie], Chen, Z.F.[Zhen-Fang], Chen, Z.[Zitian], Luo, P.[Ping], Tenenbaum, J.[Josh], Gan, C.[Chuang],
Visual Dependency Transformers: Dependency Tree Emerges from Reversed Attention,
CVPR23(14528-14539)
IEEE DOI 2309
BibRef

Song, J.C.[Jie-Chong], Mou, C.[Chong], Wang, S.Q.[Shi-Qi], Ma, S.W.[Si-Wei], Zhang, J.[Jian],
Optimization-Inspired Cross-Attention Transformer for Compressive Sensing,
CVPR23(6174-6184)
IEEE DOI 2309
BibRef

Li, J.C.[Jia-Chen], Hassani, A.[Ali], Walton, S.[Steven], Shi, H.[Humphrey],
ConvMLP: Hierarchical Convolutional MLPs for Vision,
WFM23(6307-6316)
IEEE DOI 2309
multi-layer perceptron BibRef

Hassani, A.[Ali], Walton, S.[Steven], Li, J.C.[Jia-Chen], Li, S.[Shen], Shi, H.[Humphrey],
Neighborhood Attention Transformer,
CVPR23(6185-6194)
IEEE DOI 2309
BibRef

Walmer, M.[Matthew], Suri, S.[Saksham], Gupta, K.[Kamal], Shrivastava, A.[Abhinav],
Teaching Matters: Investigating the Role of Supervision in Vision Transformers,
CVPR23(7486-7496)
IEEE DOI 2309
BibRef

Wang, S.G.[Shi-Guang], Xie, T.[Tao], Cheng, J.[Jian], Zhang, X.C.[Xing-Cheng], Liu, H.J.[Hai-Jun],
MDL-NAS: A Joint Multi-domain Learning Framework for Vision Transformer,
CVPR23(20094-20104)
IEEE DOI 2309
BibRef

Ko, D.[Dohwan], Choi, J.[Joonmyung], Choi, H.K.[Hyeong Kyu], On, K.W.[Kyoung-Woon], Roh, B.[Byungseok], Kim, H.W.J.[Hyun-Woo J.],
MELTR: Meta Loss Transformer for Learning to Fine-tune Video Foundation Models,
CVPR23(20105-20115)
IEEE DOI 2309
BibRef

Ren, S.[Sucheng], Wei, F.Y.[Fang-Yun], Zhang, Z.[Zheng], Hu, H.[Han],
TinyMIM: An Empirical Study of Distilling MIM Pre-trained Models,
CVPR23(3687-3697)
IEEE DOI 2309
BibRef

He, J.F.[Jian-Feng], Gao, Y.[Yuan], Zhang, T.Z.[Tian-Zhu], Zhang, Z.[Zhe], Wu, F.[Feng],
D2Former: Jointly Learning Hierarchical Detectors and Contextual Descriptors via Agent-Based Transformers,
CVPR23(2904-2914)
IEEE DOI 2309
BibRef

Liu, Z.J.[Zhi-Jian], Yang, X.Y.[Xin-Yu], Tang, H.T.[Hao-Tian], Yang, S.[Shang], Han, S.[Song],
FlatFormer: Flattened Window Attention for Efficient Point Cloud Transformer,
CVPR23(1200-1211)
IEEE DOI 2309
BibRef

Chen, X.[Xuanyao], Liu, Z.J.[Zhi-Jian], Tang, H.T.[Hao-Tian], Yi, L.[Li], Zhao, H.[Hang], Han, S.[Song],
SparseViT: Revisiting Activation Sparsity for Efficient High-Resolution Vision Transformer,
CVPR23(2061-2070)
IEEE DOI 2309
BibRef

Pan, X.[Xuran], Ye, T.Z.[Tian-Zhu], Xia, Z.[Zhuofan], Song, S.[Shiji], Huang, G.[Gao],
Slide-Transformer: Hierarchical Vision Transformer with Local Self-Attention,
CVPR23(2082-2091)
IEEE DOI 2309
BibRef

Wei, S.Y.[Si-Yuan], Ye, T.Z.[Tian-Zhu], Zhang, S.[Shen], Tang, Y.[Yao], Liang, J.J.[Jia-Jun],
Joint Token Pruning and Squeezing Towards More Aggressive Compression of Vision Transformers,
CVPR23(2092-2101)
IEEE DOI 2309
BibRef

Lin, Y.B.[Yan-Bo], Sung, Y.L.[Yi-Lin], Lei, J.[Jie], Bansal, M.[Mohit], Bertasius, G.[Gedas],
Vision Transformers are Parameter-Efficient Audio-Visual Learners,
CVPR23(2299-2309)
IEEE DOI 2309
BibRef

Das, R.[Rajshekhar], Dukler, Y.[Yonatan], Ravichandran, A.[Avinash], Swaminathan, A.[Ashwin],
Learning Expressive Prompting With Residuals for Vision Transformers,
CVPR23(3366-3377)
IEEE DOI 2309
BibRef

Zheng, M.X.[Meng-Xin], Lou, Q.[Qian], Jiang, L.[Lei],
TrojViT: Trojan Insertion in Vision Transformers,
CVPR23(4025-4034)
IEEE DOI 2309
BibRef

Guo, Y.[Yong], Stutz, D.[David], Schiele, B.[Bernt],
Improving Robustness of Vision Transformers by Reducing Sensitivity to Patch Corruptions,
CVPR23(4108-4118)
IEEE DOI 2309
BibRef

Liu, J.[Jihao], Huang, X.[Xin], Zheng, J.L.[Jin-Liang], Liu, Y.[Yu], Li, H.S.[Hong-Sheng],
MixMAE: Mixed and Masked Autoencoder for Efficient Pretraining of Hierarchical Vision Transformers,
CVPR23(6252-6261)
IEEE DOI 2309
BibRef

Li, Y.X.[Yan-Xi], Xu, C.[Chang],
Trade-off between Robustness and Accuracy of Vision Transformers,
CVPR23(7558-7568)
IEEE DOI 2309
BibRef

Zhu, L.[Lei], Wang, X.J.[Xin-Jiang], Ke, Z.[Zhanghan], Zhang, W.[Wayne], Lau, R.[Rynson],
BiFormer: Vision Transformer with Bi-Level Routing Attention,
CVPR23(10323-10333)
IEEE DOI 2309
BibRef

Long, S.[Sifan], Zhao, Z.[Zhen], Pi, J.[Jimin], Wang, S.S.[Sheng-Sheng], Wang, J.D.[Jing-Dong],
Beyond Attentive Tokens: Incorporating Token Importance and Diversity for Efficient Vision Transformers,
CVPR23(10334-10343)
IEEE DOI 2309
BibRef

Tarasiou, M.[Michail], Chavez, E.[Erik], Zafeiriou, S.[Stefanos],
ViTs for SITS: Vision Transformers for Satellite Image Time Series,
CVPR23(10418-10428)
IEEE DOI 2309
BibRef

Yu, Z.Z.[Zhong-Zhi], Wu, S.[Shang], Fu, Y.G.[Yong-Gan], Zhang, S.[Shunyao], Lin, Y.Y.C.[Ying-Yan Celine],
Hint-Aug: Drawing Hints from Foundation Vision Transformers towards Boosted Few-shot Parameter-Efficient Tuning,
CVPR23(11102-11112)
IEEE DOI 2309
BibRef

Kim, D.[Dahun], Angelova, A.[Anelia], Kuo, W.C.[Wei-Cheng],
Region-Aware Pretraining for Open-Vocabulary Object Detection with Vision Transformers,
CVPR23(11144-11154)
IEEE DOI 2309
BibRef

Hou, J.[Ji], Dai, X.L.[Xiao-Liang], He, Z.J.[Zi-Jian], Dai, A.[Angela], Nießner, M.[Matthias],
Mask3D: Pretraining 2D Vision Transformers by Learning Masked 3D Priors,
CVPR23(13510-13519)
IEEE DOI 2309
BibRef

Liu, X.Y.[Xin-Yu], Peng, H.[Houwen], Zheng, N.X.[Ning-Xin], Yang, Y.Q.[Yu-Qing], Hu, H.[Han], Yuan, Y.X.[Yi-Xuan],
EfficientViT: Memory Efficient Vision Transformer with Cascaded Group Attention,
CVPR23(14420-14430)
IEEE DOI 2309
BibRef

You, H.R.[Hao-Ran], Xiong, Y.[Yunyang], Dai, X.L.[Xiao-Liang], Wu, B.[Bichen], Zhang, P.Z.[Pei-Zhao], Fan, H.Q.[Hao-Qi], Vajda, P.[Peter], Lin, Y.Y.C.[Ying-Yan Celine],
Castling-ViT: Compressing Self-Attention via Switching Towards Linear-Angular Attention at Vision Transformer Inference,
CVPR23(14431-14442)
IEEE DOI 2309
BibRef

Xu, Z.Z.[Zheng-Zhuo], Liu, R.[Ruikang], Yang, S.[Shuo], Chai, Z.[Zenghao], Yuan, C.[Chun],
Learning Imbalanced Data with Vision Transformers,
CVPR23(15793-15803)
IEEE DOI 2309
BibRef

Zhang, J.P.[Jian-Ping], Huang, Y.Z.[Yi-Zhan], Wu, W.B.[Wei-Bin], Lyu, M.R.[Michael R.],
Transferable Adversarial Attacks on Vision Transformers with Token Gradient Regularization,
CVPR23(16415-16424)
IEEE DOI 2309
BibRef

Yang, H.[Huanrui], Yin, H.X.[Hong-Xu], Shen, M.[Maying], Molchanov, P.[Pavlo], Li, H.[Hai], Kautz, J.[Jan],
Global Vision Transformer Pruning with Hessian-Aware Saliency,
CVPR23(18547-18557)
IEEE DOI 2309
BibRef

Grainger, R.[Ryan], Paniagua, T.[Thomas], Song, X.[Xi], Cuntoor, N.[Naresh], Lee, M.W.[Mun Wai], Wu, T.F.[Tian-Fu],
PaCa-ViT: Learning Patch-to-Cluster Attention in Vision Transformers,
CVPR23(18568-18578)
IEEE DOI 2309
BibRef

Takashima, S.[Sora], Hayamizu, R.[Ryo], Inoue, N.[Nakamasa], Kataoka, H.[Hirokatsu], Yokota, R.[Rio],
Visual Atoms: Pre-Training Vision Transformers with Sinusoidal Waves,
CVPR23(18579-18588)
IEEE DOI 2309
BibRef

Kang, D.[Dahyun], Koniusz, P.[Piotr], Cho, M.[Minsu], Murray, N.[Naila],
Distilling Self-Supervised Vision Transformers for Weakly-Supervised Few-Shot Classification and Segmentation,
CVPR23(19627-19638)
IEEE DOI 2309
BibRef

Liu, Y.J.[Yi-Jiang], Yang, H.[Huanrui], Dong, Z.[Zhen], Keutzer, K.[Kurt], Du, L.[Li], Zhang, S.[Shanghang],
NoisyQuant: Noisy Bias-Enhanced Post-Training Activation Quantization for Vision Transformers,
CVPR23(20321-20330)
IEEE DOI 2309
BibRef

Park, J.[Jeongsoo], Johnson, J.[Justin],
RGB No More: Minimally-Decoded JPEG Vision Transformers,
CVPR23(22334-22346)
IEEE DOI 2309
BibRef

Yu, C.[Chong], Chen, T.[Tao], Gan, Z.X.[Zhong-Xue], Fan, J.Y.[Jia-Yuan],
Boost Vision Transformer with GPU-Friendly Sparsity and Quantization,
CVPR23(22658-22668)
IEEE DOI 2309
BibRef

Bao, F.[Fan], Nie, S.[Shen], Xue, K.[Kaiwen], Cao, Y.[Yue], Li, C.X.[Chong-Xuan], Su, H.[Hang], Zhu, J.[Jun],
All are Worth Words: A ViT Backbone for Diffusion Models,
CVPR23(22669-22679)
IEEE DOI 2309
BibRef

Wei, C.[Cong], Duke, B.[Brendan], Jiang, R.[Ruowei], Aarabi, P.[Parham], Taylor, G.W.[Graham W.], Shkurti, F.[Florian],
Sparsifiner: Learning Sparse Instance-Dependent Attention for Efficient Vision Transformers,
CVPR23(22680-22689)
IEEE DOI 2309
BibRef

Li, B.[Bonan], Hu, Y.[Yinhan], Nie, X.C.[Xue-Cheng], Han, C.Y.[Cong-Ying], Jiang, X.J.[Xiang-Jian], Guo, T.D.[Tian-De], Liu, L.Q.[Luo-Qi],
DropKey for Vision Transformer,
CVPR23(22700-22709)
IEEE DOI 2309
BibRef

Lan, S.Y.[Shi-Yi], Yang, X.[Xitong], Yu, Z.[Zhiding], Wu, Z.[Zuxuan], Alvarez, J.M.[Jose M.], Anandkumar, A.[Anima],
Vision Transformers are Good Mask Auto-Labelers,
CVPR23(23745-23755)
IEEE DOI 2309
BibRef

Yu, L.[Lu], Xiang, W.[Wei],
X-Pruner: eXplainable Pruning for Vision Transformers,
CVPR23(24355-24363)
IEEE DOI 2309
BibRef

Singh, A.[Apoorv],
Training Strategies for Vision Transformers for Object Detection,
WAD23(110-118)
IEEE DOI 2309
BibRef

Hukkelås, H.[Håkon], Lindseth, F.[Frank],
Does Image Anonymization Impact Computer Vision Training?,
WAD23(140-150)
IEEE DOI 2309
BibRef

Marnissi, M.A.[Mohamed Amine], Fathallah, A.[Abir],
GAN-based Vision Transformer for High-Quality Thermal Image Enhancement,
GCV23(817-825)
IEEE DOI 2309
BibRef

Scheibenreif, L.[Linus], Mommert, M.[Michael], Borth, D.[Damian],
Masked Vision Transformers for Hyperspectral Image Classification,
EarthVision23(2166-2176)
IEEE DOI 2309
BibRef

Komorowski, P.[Piotr], Baniecki, H.[Hubert], Biecek, P.[Przemyslaw],
Towards Evaluating Explanations of Vision Transformers for Medical Imaging,
XAI4CV23(3726-3732)
IEEE DOI 2309
BibRef

Nalmpantis, A.[Angelos], Panagiotopoulos, A.[Apostolos], Gkountouras, J.[John], Papakostas, K.[Konstantinos], Aziz, W.[Wilker],
Vision DiffMask: Faithful Interpretation of Vision Transformers with Differentiable Patch Masking,
XAI4CV23(3756-3763)
IEEE DOI 2309
BibRef

Ronen, T.[Tomer], Levy, O.[Omer], Golbert, A.[Avram],
Vision Transformers with Mixed-Resolution Tokenization,
ECV23(4613-4622)
IEEE DOI 2309
BibRef

Le, P.H.C.[Phuoc-Hoan Charles], Li, X.[Xinlin],
BinaryViT: Pushing Binary Vision Transformers Towards Convolutional Models,
ECV23(4665-4674)
IEEE DOI 2309
BibRef

Bhattacharyya, M.[Mayukh], Chattopadhyay, S.[Soumitri], Nag, S.[Sayan],
DeCAtt: Efficient Vision Transformers with Decorrelated Attention Heads,
ECV23(4695-4699)
IEEE DOI 2309
BibRef

Ma, D.[Dongning], Zhao, P.F.[Peng-Fei], Jiao, X.[Xun],
PerfHD: Efficient ViT Architecture Performance Ranking using Hyperdimensional Computing,
NAS23(2230-2237)
IEEE DOI 2309
BibRef

Wang, J.[Jun], Alamayreh, O.[Omran], Tondi, B.[Benedetta], Barni, M.[Mauro],
Open Set Classification of GAN-based Image Manipulations via a ViT-based Hybrid Architecture,
WMF23(953-962)
IEEE DOI 2309
BibRef

Tian, R.[Rui], Wu, Z.[Zuxuan], Dai, Q.[Qi], Hu, H.[Han], Qiao, Y.[Yu], Jiang, Y.G.[Yu-Gang],
ResFormer: Scaling ViTs with Multi-Resolution Training,
CVPR23(22721-22731)
IEEE DOI 2309
BibRef

Li, Y.[Yi], Min, K.[Kyle], Tripathi, S.[Subarna], Vasconcelos, N.M.[Nuno M.],
SViTT: Temporal Learning of Sparse Video-Text Transformers,
CVPR23(18919-18929)
IEEE DOI 2309
BibRef

Beyer, L.[Lucas], Izmailov, P.[Pavel], Kolesnikov, A.[Alexander], Caron, M.[Mathilde], Kornblith, S.[Simon], Zhai, X.H.[Xiao-Hua], Minderer, M.[Matthias], Tschannen, M.[Michael], Alabdulmohsin, I.[Ibrahim], Pavetic, F.[Filip],
FlexiViT: One Model for All Patch Sizes,
CVPR23(14496-14506)
IEEE DOI 2309
BibRef

Chang, S.N.[Shu-Ning], Wang, P.[Pichao], Lin, M.[Ming], Wang, F.[Fan], Zhang, D.J.[David Junhao], Jin, R.[Rong], Shou, M.Z.[Mike Zheng],
Making Vision Transformers Efficient from A Token Sparsification View,
CVPR23(6195-6205)
IEEE DOI 2309
BibRef

Naeem, M.F.[Muhammad Ferjad], Khan, M.G.Z.A.[Muhammad Gul Zain Ali], Xian, Y.Q.[Yong-Qin], Afzal, M.Z.[Muhammad Zeshan], Stricker, D.[Didier], Van Gool, L.J.[Luc J.], Tombari, F.[Federico],
I2MVFormer: Large Language Model Generated Multi-View Document Supervision for Zero-Shot Image Classification,
CVPR23(15169-15179)
IEEE DOI 2309
BibRef

Tatsunami, Y.[Yuki], Taki, M.[Masato],
RaftMLP: How Much Can Be Done Without Attention and with Less Spatial Locality?,
ACCV22(VI:459-475).
Springer DOI 2307

WWW Link. Addresses computational complexity. BibRef

Phan, L.[Lam], Nguyen, H.T.H.[Hiep Thi Hong], Warrier, H.[Harikrishna], Gupta, Y.[Yogesh],
Patch Embedding as Local Features: Unifying Deep Local and Global Features via Vision Transformer for Image Retrieval,
ACCV22(II:204-221).
Springer DOI 2307
BibRef

Guo, X.D.[Xin-Dong], Sun, Y.[Yu], Zhao, R.[Rong], Kuang, L.Q.[Li-Qun], Han, X.[Xie],
SWPT: Spherical Window-based Point Cloud Transformer,
ACCV22(I:396-412).
Springer DOI 2307
BibRef

Wang, W.J.[Wen-Ju], Chen, G.[Gang], Zhou, H.R.[Hao-Ran], Wang, X.L.[Xiao-Lin],
OVPT: Optimal Viewset Pooling Transformer for 3d Object Recognition,
ACCV22(I:486-503).
Springer DOI 2307
BibRef

Kim, D.[Daeho], Kim, J.[Jaeil],
Vision Transformer Compression and Architecture Exploration with Efficient Embedding Space Search,
ACCV22(III:524-540).
Springer DOI 2307
BibRef

Bolya, D.[Daniel], Fu, C.Y.[Cheng-Yang], Dai, X.L.[Xiao-Liang], Zhang, P.Z.[Pei-Zhao], Hoffman, J.[Judy],
Hydra Attention: Efficient Attention with Many Heads,
CADK22(35-49).
Springer DOI 2304
Transformer computation explodes with large images. Uses many attention heads. BibRef

Lee, Y.S.[Yun-Sung], Lee, G.[Gyuseong], Ryoo, K.[Kwangrok], Go, H.[Hyojun], Park, J.[Jihye], Kim, S.[Seungryong],
Towards Flexible Inductive Bias via Progressive Reparameterization Scheduling,
VIPriors22(706-720).
Springer DOI 2304
Transformers and CNNs have different benefits. Best of both. BibRef

Amir, S.[Shir], Gandelsman, Y.[Yossi], Bagon, S.[Shai], Dekel, T.[Tali],
On the Effectiveness of ViT Features as Local Semantic Descriptors,
SelfLearn22(39-55).
Springer DOI 2304
BibRef

Deng, X.[Xuran], Liu, C.B.[Chuan-Bin], Lu, Z.[Zhiying],
Recombining Vision Transformer Architecture for Fine-grained Visual Categorization,
MMMod23(II:127-138).
Springer DOI 2304
BibRef

Tonkes, V.[Vincent], Sabatelli, M.[Matthia],
How Well Do Vision Transformers (VTs) Transfer to the Non-natural Image Domain? An Empirical Study Involving Art Classification,
VisArt22(234-250).
Springer DOI 2304
BibRef

Li, B.C.[Bing-Chen], Li, X.[Xin], Lu, Y.T.[Yi-Ting], Liu, S.[Sen], Feng, R.[Ruoyu], Chen, Z.B.[Zhi-Bo],
HST: Hierarchical Swin Transformer for Compressed Image Super-resolution,
AIM22(651-668).
Springer DOI 2304
BibRef

Conde, M.V.[Marcos V.], Choi, U.J.[Ui-Jin], Burchi, M.[Maxime], Timofte, R.[Radu],
Swin2SR: SwinV2 Transformer for Compressed Image Super-resolution and Restoration,
AIM22(669-687).
Springer DOI 2304
BibRef

Rangrej, S.B.[Samrudhdhi B.], Liang, K.J.[Kevin J.], Hassner, T.[Tal], Clark, J.J.[James J.],
GliTr: Glimpse Transformers with Spatiotemporal Consistency for Online Action Prediction,
WACV23(3402-3412)
IEEE DOI 2302
Predictive models, Transformers, Cameras, Spatiotemporal phenomena, Sensors, Observability BibRef

Mo, S.T.[Shen-Tong], Sun, Z.[Zhun], Li, C.[Chao],
Multi-level Contrastive Learning for Self-Supervised Vision Transformers,
WACV23(2777-2786)
IEEE DOI 2302
Training, Representation learning, Head, Semantic segmentation, Self-supervised learning, visual reasoning BibRef

Yun, J.[Jooyeol], Lee, S.[Sanghyeon], Park, M.H.[Min-Ho], Choo, J.[Jaegul],
iColoriT: Towards Propagating Local Hints to the Right Region in Interactive Colorization by Leveraging Vision Transformer,
WACV23(1787-1796)
IEEE DOI 2302
Convolutional codes, Image color analysis, Stacking, Gray-scale, Transformers, Algorithms: Computational photography, image and video synthesis BibRef

Liu, Y.[Yue], Matsoukas, C.[Christos], Strand, F.[Fredrik], Azizpour, H.[Hossein], Smith, K.[Kevin],
PatchDropout: Economizing Vision Transformers Using Patch Dropout,
WACV23(3942-3951)
IEEE DOI 2302
Training, Image resolution, Computational modeling, Biological system modeling, Memory management, Transformers, Biomedical/healthcare/medicine BibRef

Chen, X.Y.[Xiang-Yu], Hu, Q.[Qinghao], Li, K.[Kaidong], Zhong, C.[Cuncong], Wang, G.H.[Guang-Hui],
Accumulated Trivial Attention Matters in Vision Transformers on Small Datasets,
WACV23(3973-3981)
IEEE DOI 2302
Codes, Focusing, Transformers, Convolutional neural networks, Task analysis, Algorithms: Machine learning architectures, and algorithms (including transfer) BibRef

Lan, H.[Hai], Wang, X.[Xihao], Shen, H.[Hao], Liang, P.[Peidong], Wei, X.[Xian],
Couplformer: Rethinking Vision Transformer with Coupling Attention,
WACV23(6464-6473)
IEEE DOI 2302
Couplings, Visualization, Image segmentation, Computational modeling, Memory management, Object detection BibRef

Marin, D.[Dmitrii], Chang, J.H.R.[Jen-Hao Rick], Ranjan, A.[Anurag], Prabhu, A.[Anish], Rastegari, M.[Mohammad], Tuzel, O.[Oncel],
Token Pooling in Vision Transformers for Image Classification,
WACV23(12-21)
IEEE DOI 2302
Filtering, Semantic segmentation, Pose estimation, Transformers, Encoding, Convolutional neural networks, and algorithms (including transfer) BibRef

Song, C.H.[Chull Hwan], Yoon, J.Y.[Joo-Young], Choi, S.[Shunghyun], Avrithis, Y.[Yannis],
Boosting Vision Transformers for Image Retrieval,
WACV23(107-117)
IEEE DOI 2302
Training, Location awareness, Image retrieval, Self-supervised learning, Image representation, Transformers BibRef

Yang, J.[Jinyu], Liu, J.J.[Jing-Jing], Xu, N.[Ning], Huang, J.Z.[Jun-Zhou],
TVT: Transferable Vision Transformer for Unsupervised Domain Adaptation,
WACV23(520-530)
IEEE DOI 2302
Benchmark testing, Image representation, Transformers, Convolutional neural networks, Task analysis, and algorithms (including transfer) BibRef

Lin, K.E.[Kai-En], Yen-Chen, L.[Lin], Lai, W.S.[Wei-Sheng], Lin, T.Y.[Tsung-Yi], Shih, Y.C.[Yi-Chang], Ramamoorthi, R.[Ravi],
Vision Transformer for NeRF-Based View Synthesis from a Single Input Image,
WACV23(806-815)
IEEE DOI 2302
Shape, Pose estimation, Feature extraction, Transformers, Cameras, Algorithms: Computational photography, 3D computer vision BibRef

Saavedra-Ruiz, M.[Miguel], Morin, S.[Sacha], Paull, L.[Liam],
Monocular Robot Navigation with Self-Supervised Pretrained Vision Transformers,
CRV22(197-204)
IEEE DOI 2301
Adaptation models, Image segmentation, Image resolution, Navigation, Transformers, Robot sensing systems, Visual Servoing BibRef

Debnath, B.[Biplob], Po, O.[Oliver], Chowdhury, F.A.[Farhan Asif], Chakradhar, S.[Srimat],
Cosine Similarity based Few-Shot Video Classifier with Attention-based Aggregation,
ICPR22(1273-1279)
IEEE DOI 2212
Training, Head, Pipelines, Benchmark testing, Feature extraction, Transformers BibRef

Patel, K.[Krushi], Bur, A.M.[Andrés M.], Li, F.J.[Feng-Jun], Wang, G.H.[Guang-Hui],
Aggregating Global Features into Local Vision Transformer,
ICPR22(1141-1147)
IEEE DOI 2212
Source coding, Computational modeling, Information processing, Performance gain, Transformers BibRef

Shen, Z.Q.[Zhi-Qiang], Liu, Z.[Zechun], Xing, E.[Eric],
Sliced Recursive Transformer,
ECCV22(XXIV:727-744).
Springer DOI 2211
BibRef

Shao, Y.[Yidi], Loy, C.C.[Chen Change], Dai, B.[Bo],
Transformer with Implicit Edges for Particle-Based Physics Simulation,
ECCV22(XIX:549-564).
Springer DOI 2211
BibRef

Wang, W.[Wen], Zhang, J.[Jing], Cao, Y.[Yang], Shen, Y.L.[Yong-Liang], Tao, D.C.[Da-Cheng],
Towards Data-Efficient Detection Transformers,
ECCV22(IX:88-105).
Springer DOI 2211
BibRef

Mari, C.R.[Carlos Roig], Gonzalez, D.V.[David Varas], Bou-Balust, E.[Elisenda],
Multi-Scale Transformer-Based Feature Combination for Image Retrieval,
ICIP22(3166-3170)
IEEE DOI 2211
Visualization, Semantics, Image retrieval, Feature extraction, Transformers, Internet, Image retrieval, Attention, Multi-scale, Feature combination BibRef

Lorenzana, M.B.[Marlon Bran], Engstrom, C.[Craig], Chandra, S.S.[Shekhar S.],
Transformer Compressed Sensing Via Global Image Tokens,
ICIP22(3011-3015)
IEEE DOI 2211
Training, Limiting, Image resolution, Neural networks, Image representation, Transformers, MRI BibRef

Furukawa, R.[Ryouichi], Hotta, K.[Kazuhiro],
Local Embedding for Axial Attention,
ICIP22(2586-2590)
IEEE DOI 2211
Deep learning, Image segmentation, Visualization, Computational modeling, Neural networks, Transformers BibRef

Lu, X.Y.[Xiao-Yong], Du, S.[Songlin],
NCTR: Neighborhood Consensus Transformer for Feature Matching,
ICIP22(2726-2730)
IEEE DOI 2211
Learning systems, Impedance matching, Aggregates, Pose estimation, Neural networks, Transformers, Local feature matching, graph neural network BibRef

Jeny, A.A.[Afsana Ahsan], Junayed, M.S.[Masum Shah], Islam, M.B.[Md Baharul],
An Efficient End-To-End Image Compression Transformer,
ICIP22(1786-1790)
IEEE DOI 2211
Image coding, Correlation, Limiting, Computational modeling, Rate-distortion, Video compression, Transformers, entropy model BibRef

Kakogeorgiou, I.[Ioannis], Gidaris, S.[Spyros], Psomas, B.[Bill], Avrithis, Y.[Yannis], Bursuc, A.[Andrei], Karantzalos, K.[Konstantinos], Komodakis, N.[Nikos],
What to Hide from Your Students: Attention-Guided Masked Image Modeling,
ECCV22(XXX:300-318).
Springer DOI 2211

WWW Link. BibRef

Bai, J.W.[Jia-Wang], Yuan, L.[Li], Xia, S.T.[Shu-Tao], Yan, S.C.[Shui-Cheng], Li, Z.F.[Zhi-Feng], Liu, W.[Wei],
Improving Vision Transformers by Revisiting High-Frequency Components,
ECCV22(XXIV:1-18).
Springer DOI 2211
BibRef

Ding, M.Y.[Ming-Yu], Xiao, B.[Bin], Codella, N.[Noel], Luo, P.[Ping], Wang, J.D.[Jing-Dong], Yuan, L.[Lu],
DaViT: Dual Attention Vision Transformers,
ECCV22(XXIV:74-92).
Springer DOI 2211
BibRef

Li, K.[Kehan], Yu, R.[Runyi], Wang, Z.[Zhennan], Yuan, L.[Li], Song, G.[Guoli], Chen, J.[Jie],
Locality Guidance for Improving Vision Transformers on Tiny Datasets,
ECCV22(XXIV:110-127).
Springer DOI 2211
BibRef

Wang, P.C.[Pi-Chao], Wang, X.[Xue], Wang, F.[Fan], Lin, M.[Ming], Chang, S.N.[Shu-Ning], Li, H.[Hao], Jin, R.[Rong],
KVT: k-NN Attention for Boosting Vision Transformers,
ECCV22(XXIV:285-302).
Springer DOI 2211
BibRef

Tu, Z.Z.[Zheng-Zhong], Talebi, H.[Hossein], Zhang, H.[Han], Yang, F.[Feng], Milanfar, P.[Peyman], Bovik, A.C.[Alan C.], Li, Y.[Yinxiao],
MaxViT: Multi-axis Vision Transformer,
ECCV22(XXIV:459-479).
Springer DOI 2211
BibRef

Yang, R.[Rui], Ma, H.L.[Hai-Long], Wu, J.[Jie], Tang, Y.S.[Yan-Song], Xiao, X.F.[Xue-Feng], Zheng, M.[Min], Li, X.[Xiu],
ScalableViT: Rethinking the Context-Oriented Generalization of Vision Transformer,
ECCV22(XXIV:480-496).
Springer DOI 2211
BibRef

Touvron, H.[Hugo], Cord, M.[Matthieu], El-Nouby, A.[Alaaeldin], Verbeek, J.[Jakob], Jégou, H.[Hervé],
Three Things Everyone Should Know About Vision Transformers,
ECCV22(XXIV:497-515).
Springer DOI 2211
BibRef

Touvron, H.[Hugo], Cord, M.[Matthieu], Jégou, H.[Hervé],
DeiT III: Revenge of the ViT,
ECCV22(XXIV:516-533).
Springer DOI 2211
BibRef

Li, Y.H.[Yang-Hao], Mao, H.Z.[Han-Zi], Girshick, R.[Ross], He, K.M.[Kai-Ming],
Exploring Plain Vision Transformer Backbones for Object Detection,
ECCV22(IX:280-296).
Springer DOI 2211
BibRef

Yu, Q.H.[Qi-Hang], Wang, H.Y.[Hui-Yu], Qiao, S.Y.[Si-Yuan], Collins, M.[Maxwell], Zhu, Y.K.[Yu-Kun], Adam, H.[Hartwig], Yuille, A.L.[Alan L.], Chen, L.C.[Liang-Chieh],
k-means Mask Transformer,
ECCV22(XXIX:288-307).
Springer DOI 2211
BibRef

Lezama, J.[José], Chang, H.[Huiwen], Jiang, L.[Lu], Essa, I.[Irfan],
Improved Masked Image Generation with Token-Critic,
ECCV22(XXIII:70-86).
Springer DOI 2211
Generative transformer. BibRef

Rao, Y.M.[Yong-Ming], Zhao, W.L.[Wen-Liang], Zhou, J.[Jie], Lu, J.W.[Ji-Wen],
AMixer: Adaptive Weight Mixing for Self-Attention Free Vision Transformers,
ECCV22(XXI:50-67).
Springer DOI 2211
BibRef

Pham, K.[Khoi], Kafle, K.[Kushal], Lin, Z.[Zhe], Ding, Z.H.[Zhi-Hong], Cohen, S.[Scott], Tran, Q.[Quan], Shrivastava, A.[Abhinav],
Improving Closed and Open-Vocabulary Attribute Prediction Using Transformers,
ECCV22(XXV:201-219).
Springer DOI 2211
BibRef

Yu, W.X.[Wen-Xin], Zhang, H.[Hongru], Lan, T.X.[Tian-Xiang], Hu, Y.C.[Yu-Cheng], Yin, D.[Dong],
CBPT: A New Backbone for Enhancing Information Transmission of Vision Transformers,
ICIP22(156-160)
IEEE DOI 2211
Merging, Information processing, Object detection, Transformers, Computational complexity, Vision Transformer, Backbone BibRef

Takeda, M.[Mana], Yanai, K.[Keiji],
Continual Learning in Vision Transformer,
ICIP22(616-620)
IEEE DOI 2211
Learning systems, Image recognition, Transformers, Natural language processing, Convolutional neural networks, Vision Transformer BibRef

Zhou, W.L.[Wei-Lian], Kamata, S.I.[Sei-Ichiro], Luo, Z.[Zhengbo], Xue, X.[Xi],
Rethinking Unified Spectral-Spatial-Based Hyperspectral Image Classification Under 3D Configuration of Vision Transformer,
ICIP22(711-715)
IEEE DOI 2211
Flowcharts, Correlation, Convolution, Transformers, Hyperspectral image classification, 3D coordinate positional embedding BibRef

Li, A.[Ang], Jiao, J.[Jichao], Li, N.[Ning], Qi, W.[Wangjing], Xu, W.[Wei], Pang, M.[Min],
Conmw Transformer: A General Vision Transformer Backbone With Merged-Window Attention,
ICIP22(1551-1555)
IEEE DOI 2211
Image resolution, Convolution, Transformers, Feature extraction, Tokenization, Computational efficiency, Vision Transformer, hybrid architecture BibRef

Li, J.[Junbo], Zhang, H.[Huan], Xie, C.[Cihang],
ViP: Unified Certified Detection and Recovery for Patch Attack with Vision Transformers,
ECCV22(XXV:573-587).
Springer DOI 2211
BibRef

Zhang, Q.M.[Qi-Ming], Xu, Y.F.[Yu-Fei], Zhang, J.[Jing], Tao, D.C.[Da-Cheng],
VSA: Learning Varied-Size Window Attention in Vision Transformers,
ECCV22(XXV:466-483).
Springer DOI 2211
BibRef

Cao, Y.H.[Yun-Hao], Yu, H.[Hao], Wu, J.X.[Jian-Xin],
Training Vision Transformers with only 2040 Images,
ECCV22(XXV:220-237).
Springer DOI 2211
BibRef

Wang, C.[Cong], Xu, H.M.[Hong-Min], Zhang, X.[Xiong], Wang, L.[Li], Zheng, Z.[Zhitong], Liu, H.F.[Hai-Feng],
Convolutional Embedding Makes Hierarchical Vision Transformer Stronger,
ECCV22(XX:739-756).
Springer DOI 2211
BibRef

Wu, B.[Boxi], Gu, J.D.[Jin-Dong], Li, Z.F.[Zhi-Feng], Cai, D.[Deng], He, X.F.[Xiao-Fei], Liu, W.[Wei],
Towards Efficient Adversarial Training on Vision Transformers,
ECCV22(XIII:307-325).
Springer DOI 2211
BibRef

Gu, J.D.[Jin-Dong], Tresp, V.[Volker], Qin, Y.[Yao],
Are Vision Transformers Robust to Patch Perturbations?,
ECCV22(XII:404-421).
Springer DOI 2211
BibRef

Zong, Z.[Zhuofan], Li, K.[Kunchang], Song, G.[Guanglu], Wang, Y.[Yali], Qiao, Y.[Yu], Leng, B.[Biao], Liu, Y.[Yu],
Self-slimmed Vision Transformer,
ECCV22(XI:432-448).
Springer DOI 2211
BibRef

Fayyaz, M.[Mohsen], Koohpayegani, S.A.[Soroush Abbasi], Jafari, F.R.[Farnoush Rezaei], Sengupta, S.[Sunando], Joze, H.R.V.[Hamid Reza Vaezi], Sommerlade, E.[Eric], Pirsiavash, H.[Hamed], Gall, J.[Jürgen],
Adaptive Token Sampling for Efficient Vision Transformers,
ECCV22(XI:396-414).
Springer DOI 2211
BibRef

Li, Z.K.[Zhi-Kai], Ma, L.P.[Li-Ping], Chen, M.J.[Meng-Juan], Xiao, J.R.[Jun-Rui], Gu, Q.Y.[Qing-Yi],
Patch Similarity Aware Data-Free Quantization for Vision Transformers,
ECCV22(XI:154-170).
Springer DOI 2211
BibRef

Weng, Z.J.[Ze-Jia], Yang, X.T.[Xi-Tong], Li, A.[Ang], Wu, Z.X.[Zu-Xuan], Jiang, Y.G.[Yu-Gang],
Semi-supervised Vision Transformers,
ECCV22(XXX:605-620).
Springer DOI 2211
BibRef

Mallick, R.[Rupayan], Benois-Pineau, J.[Jenny], Zemmari, A.[Akka],
I Saw: A Self-Attention Weighted Method for Explanation of Visual Transformers,
ICIP22(3271-3275)
IEEE DOI 2211
Measurement, Correlation coefficient, Visualization, Image segmentation, Databases, Object detection, Transformers, Gaze Fixation Density Maps BibRef

Su, T.[Tong], Ye, S.[Shuo], Song, C.Q.[Cheng-Qun], Cheng, J.[Jun],
Mask-ViT: An Object Mask Embedding in Vision Transformer for Fine-Grained Visual Classification,
ICIP22(1626-1630)
IEEE DOI 2211
Knowledge engineering, Visualization, Focusing, Interference, Benchmark testing, Transformers, Feature extraction, Knowledge Embedding BibRef

Gai, L.[Lulu], Chen, W.[Wei], Gao, R.[Rui], Chen, Y.W.[Yan-Wei], Qiao, X.[Xu],
Using Vision Transformers in 3-D Medical Image Classifications,
ICIP22(696-700)
IEEE DOI 2211
Deep learning, Training, Visualization, Transfer learning, Optimization methods, Self-supervised learning, Transformers, 3-D medical image classifications BibRef

Wu, K.[Kan], Zhang, J.[Jinnian], Peng, H.[Houwen], Liu, M.C.[Meng-Chen], Xiao, B.[Bin], Fu, J.L.[Jian-Long], Yuan, L.[Lu],
TinyViT: Fast Pretraining Distillation for Small Vision Transformers,
ECCV22(XXI:68-85).
Springer DOI 2211
BibRef

Gao, L.[Li], Nie, D.[Dong], Li, B.[Bo], Ren, X.F.[Xiao-Feng],
Doubly-Fused ViT: Fuse Information from Vision Transformer Doubly with Local Representation,
ECCV22(XXIII:744-761).
Springer DOI 2211
BibRef

Yao, T.[Ting], Pan, Y.W.[Ying-Wei], Li, Y.[Yehao], Ngo, C.W.[Chong-Wah], Mei, T.[Tao],
Wave-ViT: Unifying Wavelet and Transformers for Visual Representation Learning,
ECCV22(XXV:328-345).
Springer DOI 2211
BibRef

Yuan, Z.H.[Zhi-Hang], Xue, C.H.[Chen-Hao], Chen, Y.Q.[Yi-Qi], Wu, Q.[Qiang], Sun, G.Y.[Guang-Yu],
PTQ4ViT: Post-training Quantization for Vision Transformers with Twin Uniform Quantization,
ECCV22(XII:191-207).
Springer DOI 2211
BibRef

Kong, Z.L.[Zheng-Lun], Dong, P.Y.[Pei-Yan], Ma, X.L.[Xiao-Long], Meng, X.[Xin], Niu, W.[Wei], Sun, M.S.[Meng-Shu], Shen, X.[Xuan], Yuan, G.[Geng], Ren, B.[Bin], Tang, H.[Hao], Qin, M.[Minghai], Wang, Y.Z.[Yan-Zhi],
SPViT: Enabling Faster Vision Transformers via Latency-Aware Soft Token Pruning,
ECCV22(XI:620-640).
Springer DOI 2211
BibRef

Pan, J.[Junting], Bulat, A.[Adrian], Tan, F.[Fuwen], Zhu, X.T.[Xia-Tian], Dudziak, L.[Lukasz], Li, H.S.[Hong-Sheng], Tzimiropoulos, G.[Georgios], Martinez, B.[Brais],
EdgeViTs: Competing Light-Weight CNNs on Mobile Devices with Vision Transformers,
ECCV22(XI:294-311).
Springer DOI 2211
BibRef

Xu, R.S.[Run-Sheng], Xiang, H.[Hao], Tu, Z.Z.[Zheng-Zhong], Xia, X.[Xin], Yang, M.H.[Ming-Hsuan], Ma, J.Q.[Jia-Qi],
V2X-ViT: Vehicle-to-Everything Cooperative Perception with Vision Transformer,
ECCV22(XXIX:107-124).
Springer DOI 2211
BibRef

Liu, Y.[Yong], Mai, S.Q.[Si-Qi], Chen, X.N.[Xiang-Ning], Hsieh, C.J.[Cho-Jui], You, Y.[Yang],
Towards Efficient and Scalable Sharpness-Aware Minimization,
CVPR22(12350-12360)
IEEE DOI 2210

WWW Link. Training, Schedules, Scalability, Perturbation methods, Stochastic processes, Transformers, Minimization, Vision applications and systems BibRef

Ren, P.Z.[Peng-Zhen], Li, C.[Changlin], Wang, G.[Guangrun], Xiao, Y.[Yun], Du, Q.[Qing], Liang, X.D.[Xiao-Dan], Chang, X.J.[Xiao-Jun],
Beyond Fixation: Dynamic Window Visual Transformer,
CVPR22(11977-11987)
IEEE DOI 2210
Performance evaluation, Visualization, Systematics, Computational modeling, Scalability, Transformers, Deep learning architectures and techniques BibRef

Liu, Z.[Ze], Hu, H.[Han], Lin, Y.T.[Yu-Tong], Yao, Z.L.[Zhu-Liang], Xie, Z.D.[Zhen-Da], Wei, Y.X.[Yi-Xuan], Ning, J.[Jia], Cao, Y.[Yue], Zhang, Z.[Zheng], Dong, L.[Li], Wei, F.[Furu], Guo, B.[Baining],
Swin Transformer V2: Scaling Up Capacity and Resolution,
CVPR22(11999-12009)
IEEE DOI 2210
Training, Representation learning, Adaptation models, Image resolution, Computational modeling, Semantics BibRef

Bhattacharjee, D.[Deblina], Zhang, T.[Tong], Süsstrunk, S.[Sabine], Salzmann, M.[Mathieu],
MulT: An End-to-End Multitask Learning Transformer,
CVPR22(12021-12031)
IEEE DOI 2210
Heart, Image segmentation, Computational modeling, Image edge detection, Semantics, Estimation, Predictive models, Scene analysis and understanding BibRef

Fang, J.[Jiemin], Xie, L.X.[Ling-Xi], Wang, X.G.[Xing-Gang], Zhang, X.P.[Xiao-Peng], Liu, W.Y.[Wen-Yu], Tian, Q.[Qi],
MSG-Transformer: Exchanging Local Spatial Information by Manipulating Messenger Tokens,
CVPR22(12053-12062)
IEEE DOI 2210
Deep learning, Visualization, Neural networks, Graphics processing units, retrieval BibRef

Sandler, M.[Mark], Zhmoginov, A.[Andrey], Vladymyrov, M.[Max], Jackson, A.[Andrew],
Fine-tuning Image Transformers using Learnable Memory,
CVPR22(12145-12154)
IEEE DOI 2210
Deep learning, Adaptation models, Costs, Computational modeling, Memory management, Transformers, Transfer/low-shot/long-tail learning BibRef

Yu, X.[Xumin], Tang, L.[Lulu], Rao, Y.M.[Yong-Ming], Huang, T.J.[Tie-Jun], Zhou, J.[Jie], Lu, J.W.[Ji-Wen],
Point-BERT: Pre-training 3D Point Cloud Transformers with Masked Point Modeling,
CVPR22(19291-19300)
IEEE DOI 2210
Point cloud compression, Solid modeling, Computational modeling, Bit error rate, Transformers, Pattern recognition, Deep learning architectures and techniques BibRef

Park, C.[Chunghyun], Jeong, Y.[Yoonwoo], Cho, M.[Minsu], Park, J.[Jaesik],
Fast Point Transformer,
CVPR22(16928-16937)
IEEE DOI 2210
Point cloud compression, Shape, Semantics, Neural networks, Transformers, grouping and shape analysis BibRef

Ren, S.[Sucheng], Zhou, D.[Daquan], He, S.F.[Sheng-Feng], Feng, J.S.[Jia-Shi], Wang, X.C.[Xin-Chao],
Shunted Self-Attention via Multi-Scale Token Aggregation,
CVPR22(10843-10852)
IEEE DOI 2210
Degradation, Deep learning, Costs, Computational modeling, Merging, Efficient learning and inferences BibRef

Zeng, W.[Wang], Jin, S.[Sheng], Liu, W.T.[Wen-Tao], Qian, C.[Chen], Luo, P.[Ping], Ouyang, W.L.[Wan-Li], Wang, X.G.[Xiao-Gang],
Not All Tokens Are Equal: Human-centric Visual Analysis via Token Clustering Transformer,
CVPR22(11091-11101)
IEEE DOI 2210
Visualization, Shape, Pose estimation, Semantics, Pose estimation and tracking, Deep learning architectures and techniques BibRef

Yu, W.H.[Wei-Hao], Luo, M.[Mi], Zhou, P.[Pan], Si, C.Y.[Chen-Yang], Zhou, Y.C.[Yi-Chen], Wang, X.C.[Xin-Chao], Feng, J.S.[Jia-Shi], Yan, S.C.[Shui-Cheng],
MetaFormer is Actually What You Need for Vision,
CVPR22(10809-10819)
IEEE DOI 2210
Computational modeling, Focusing, Transformers, Pattern recognition, Task analysis, retrieval BibRef

Xie, Z.D.[Zhen-Da], Zhang, Z.[Zheng], Cao, Y.[Yue], Lin, Y.T.[Yu-Tong], Bao, J.M.[Jian-Min], Yao, Z.L.[Zhu-Liang], Dai, Q.[Qi], Hu, H.[Han],
SimMIM: A Simple Framework for Masked Image Modeling,
CVPR22(9643-9653)
IEEE DOI 2210

WWW Link. Representation learning, Training, Head, Self-supervised learning, Predictive models, Data models, Self- semi- meta- Representation learning BibRef

Song, Z.[Zikai], Yu, J.Q.[Jun-Qing], Chen, Y.P.P.[Yi-Ping Phoebe], Yang, W.[Wei],
Transformer Tracking with Cyclic Shifting Window Attention,
CVPR22(8781-8790)
IEEE DOI 2210

WWW Link. Visualization, Target tracking, Image recognition, Optimization methods, Benchmark testing BibRef

Tu, Z.Z.[Zheng-Zhong], Talebi, H.[Hossein], Zhang, H.[Han], Yang, F.[Feng], Milanfar, P.[Peyman], Bovik, A.[Alan], Li, Y.X.[Yin-Xiao],
MAXIM: Multi-Axis MLP for Image Processing,
CVPR22(5759-5770)
IEEE DOI 2210

WWW Link. Training, Photography, Adaptation models, Visualization, Computational modeling, Transformers, Low-level vision, Computational photography BibRef

Yun, S.[Sukmin], Lee, H.[Hankook], Kim, J.[Jaehyung], Shin, J.[Jinwoo],
Patch-level Representation Learning for Self-supervised Vision Transformers,
CVPR22(8344-8353)
IEEE DOI 2210
Training, Representation learning, Visualization, Neural networks, Object detection, Self-supervised learning, Transformers, Self- semi- meta- unsupervised learning BibRef

Hou, Z.J.[Ze-Jiang], Kung, S.Y.[Sun-Yuan],
Multi-Dimensional Vision Transformer Compression via Dependency Guided Gaussian Process Search,
EVW22(3668-3677)
IEEE DOI 2210
Adaptation models, Image coding, Head, Computational modeling, Neurons, Gaussian processes, Transformers BibRef

Salman, H.[Hadi], Jain, S.[Saachi], Wong, E.[Eric], Madry, A.[Aleksander],
Certified Patch Robustness via Smoothed Vision Transformers,
CVPR22(15116-15126)
IEEE DOI 2210
Visualization, Smoothing methods, Costs, Computational modeling, Transformers, Adversarial attack and defense BibRef

Wang, Y.K.[Yi-Kai], Chen, X.H.[Xing-Hao], Cao, L.[Lele], Huang, W.B.[Wen-Bing], Sun, F.C.[Fu-Chun], Wang, Y.H.[Yun-He],
Multimodal Token Fusion for Vision Transformers,
CVPR22(12176-12185)
IEEE DOI 2210
Point cloud compression, Image segmentation, Shape, Semantics, Object detection, Vision+X BibRef

Tang, Y.[Yehui], Han, K.[Kai], Wang, Y.H.[Yun-He], Xu, C.[Chang], Guo, J.Y.[Jian-Yuan], Xu, C.[Chao], Tao, D.C.[Da-Cheng],
Patch Slimming for Efficient Vision Transformers,
CVPR22(12155-12164)
IEEE DOI 2210
Visualization, Quantization (signal), Computational modeling, Aggregates, Benchmark testing, Representation learning BibRef

Zhang, J.[Jinnian], Peng, H.[Houwen], Wu, K.[Kan], Liu, M.C.[Meng-Chen], Xiao, B.[Bin], Fu, J.L.[Jian-Long], Yuan, L.[Lu],
MiniViT: Compressing Vision Transformers with Weight Multiplexing,
CVPR22(12135-12144)
IEEE DOI 2210
Multiplexing, Performance evaluation, Image coding, Codes, Computational modeling, Benchmark testing, Vision applications and systems BibRef

Chen, J.N.[Jie-Neng], Sun, S.Y.[Shu-Yang], He, J.[Ju], Torr, P.H.S.[Philip H.S.], Yuille, A.L.[Alan L.], Bai, S.[Song],
TransMix: Attend to Mix for Vision Transformers,
CVPR22(12125-12134)
IEEE DOI 2210
Training, Image segmentation, Codes, Semantics, Object detection, Benchmark testing, Transformers, Representation learning BibRef

Dong, X.Y.[Xiao-Yi], Bao, J.M.[Jian-Min], Chen, D.D.[Dong-Dong], Zhang, W.M.[Wei-Ming], Yu, N.H.[Neng-Hai], Yuan, L.[Lu], Chen, D.[Dong], Guo, B.[Baining],
CSWin Transformer: A General Vision Transformer Backbone with Cross-Shaped Windows,
CVPR22(12114-12124)
IEEE DOI 2210
Image segmentation, Costs, Mathematical analysis, Training data, Transformer cores, Transformers, grouping and shape analysis BibRef

Liu, H.[Hao], Jiang, X.H.[Xing-Hua], Li, X.[Xin], Bao, Z.M.[Zhi-Min], Jiang, D.Q.[De-Qiang], Ren, B.[Bo],
NomMer: Nominate Synergistic Context in Vision Transformer for Visual Recognition,
CVPR22(12063-12072)
IEEE DOI 2210
Visualization, Image segmentation, Semantics, Redundancy, Object detection, Deep learning architectures and techniques BibRef

Chen, T.L.[Tian-Long], Zhang, Z.Y.[Zhen-Yu], Cheng, Y.[Yu], Awadallah, A.[Ahmed], Wang, Z.Y.[Zhang-Yang],
The Principle of Diversity: Training Stronger Vision Transformers Calls for Reducing All Levels of Redundancy,
CVPR22(12010-12020)
IEEE DOI 2210
Training, Convolutional codes, Deep learning, Computational modeling, Redundancy, Deep learning architectures and techniques BibRef

Yang, C.[Chenglin], Wang, Y.[Yilin], Zhang, J.M.[Jian-Ming], Zhang, H.[He], Wei, Z.J.[Zi-Jun], Lin, Z.[Zhe], Yuille, A.L.[Alan L.],
Lite Vision Transformer with Enhanced Self-Attention,
CVPR22(11988-11998)
IEEE DOI 2210
Convolutional codes, Image segmentation, Visualization, Convolution, Semantics, Merging, Predictive models, Deep learning architectures and techniques BibRef

Yin, H.X.[Hong-Xu], Vahdat, A.[Arash], Alvarez, J.M.[Jose M.], Mallya, A.[Arun], Kautz, J.[Jan], Molchanov, P.[Pavlo],
A-ViT: Adaptive Tokens for Efficient Vision Transformer,
CVPR22(10799-10808)
IEEE DOI 2210
Training, Adaptive systems, Network architecture, Transformers, Throughput, Hardware, Complexity theory, Efficient learning and inferences BibRef

Lu, J.H.[Jia-Hao], Zhang, X.S.[Xi Sheryl], Zhao, T.L.[Tian-Li], He, X.Y.[Xiang-Yu], Cheng, J.[Jian],
APRIL: Finding the Achilles' Heel on Privacy for Vision Transformers,
CVPR22(10041-10050)
IEEE DOI 2210
Privacy, Data privacy, Federated learning, Computational modeling, Training data, Transformers, Market research, Privacy and federated learning BibRef

Hatamizadeh, A.[Ali], Yin, H.X.[Hong-Xu], Roth, H.[Holger], Li, W.Q.[Wen-Qi], Kautz, J.[Jan], Xu, D.[Daguang], Molchanov, P.[Pavlo],
GradViT: Gradient Inversion of Vision Transformers,
CVPR22(10011-10020)
IEEE DOI 2210
Measurement, Differential privacy, Neural networks, Transformers, Pattern recognition, Security, Iterative methods, Privacy and federated learning BibRef

Zhang, H.[Haofei], Duan, J.R.[Jia-Rui], Xue, M.Q.[Meng-Qi], Song, J.[Jie], Sun, L.[Li], Song, M.L.[Ming-Li],
Bootstrapping ViTs: Towards Liberating Vision Transformers from Pre-training,
CVPR22(8934-8943)
IEEE DOI 2210
Training, Upper bound, Neural networks, Training data, Network architecture, Transformers, Computer vision theory, Efficient learning and inferences BibRef

Chavan, A.[Arnav], Shen, Z.Q.[Zhi-Qiang], Liu, Z.[Zhuang], Liu, Z.[Zechun], Cheng, K.T.[Kwang-Ting], Xing, E.[Eric],
Vision Transformer Slimming: Multi-Dimension Searching in Continuous Optimization Space,
CVPR22(4921-4931)
IEEE DOI 2210
Training, Performance evaluation, Image coding, Force, Graphics processing units, Vision applications and systems BibRef

Xia, Z.F.[Zhuo-Fan], Pan, X.[Xuran], Song, S.[Shiji], Li, L.E.[Li Erran], Huang, G.[Gao],
Vision Transformer with Deformable Attention,
CVPR22(4784-4793)
IEEE DOI 2210
Deformable models, Adaptation models, Computational modeling, Predictive models, Transformers, Data models, grouping and shape analysis BibRef

Hong, W.X.[Wei-Xiang], Lao, J.W.[Jiang-Wei], Ren, W.[Wang], Wang, J.[Jian], Chen, J.D.[Jing-Dong], Chu, W.[Wei],
Training Object Detectors from Scratch: An Empirical Study in the Era of Vision Transformer,
CVPR22(4652-4661)
IEEE DOI 2210
Training, Visualization, Semantics, Detectors, Object detection, Transformers, Recognition: detection, categorization, retrieval, Deep learning architectures and techniques BibRef

Chen, Z.Y.[Zhao-Yu], Li, B.[Bo], Wu, S.[Shuang], Xu, J.H.[Jiang-He], Ding, S.H.[Shou-Hong], Zhang, W.Q.[Wen-Qiang],
Shape Matters: Deformable Patch Attack,
ECCV22(IV:529-548).
Springer DOI 2211
BibRef

Chen, Z.Y.[Zhao-Yu], Li, B.[Bo], Xu, J.H.[Jiang-He], Wu, S.[Shuang], Ding, S.H.[Shou-Hong], Zhang, W.Q.[Wen-Qiang],
Towards Practical Certifiable Patch Defense with Vision Transformer,
CVPR22(15127-15137)
IEEE DOI 2210
Smoothing methods, Toy manufacturing industry, Semantics, Network architecture, Transformers, Robustness, Adversarial attack and defense BibRef

Chen, R.J.[Richard J.], Chen, C.[Chengkuan], Li, Y.C.[Yi-Cong], Chen, T.Y.[Tiffany Y.], Trister, A.D.[Andrew D.], Krishnan, R.G.[Rahul G.], Mahmood, F.[Faisal],
Scaling Vision Transformers to Gigapixel Images via Hierarchical Self-Supervised Learning,
CVPR22(16123-16134)
IEEE DOI 2210
Training, Visualization, Self-supervised learning, Image representation, Transformers, Self- semi- meta- unsupervised learning BibRef

Yang, Z.[Zhao], Wang, J.Q.[Jia-Qi], Tang, Y.S.[Yan-Song], Chen, K.[Kai], Zhao, H.S.[Heng-Shuang], Torr, P.H.S.[Philip H.S.],
LAVT: Language-Aware Vision Transformer for Referring Image Segmentation,
CVPR22(18134-18144)
IEEE DOI 2210
Image segmentation, Visualization, Image coding, Shape, Linguistics, Transformers, Feature extraction, Segmentation, grouping and shape analysis BibRef

Scheibenreif, L.[Linus], Hanna, J.[Joëlle], Mommert, M.[Michael], Borth, D.[Damian],
Self-supervised Vision Transformers for Land-cover Segmentation and Classification,
EarthVision22(1421-1430)
IEEE DOI 2210
Training, Earth, Image segmentation, Computational modeling, Conferences, Transformers BibRef

Zhai, X.H.[Xiao-Hua], Kolesnikov, A.[Alexander], Houlsby, N.[Neil], Beyer, L.[Lucas],
Scaling Vision Transformers,
CVPR22(1204-1213)
IEEE DOI 2210
Training, Error analysis, Computational modeling, Neural networks, Memory management, Training data, Transfer/low-shot/long-tail learning BibRef

Guo, J.Y.[Jian-Yuan], Han, K.[Kai], Wu, H.[Han], Tang, Y.[Yehui], Chen, X.H.[Xing-Hao], Wang, Y.H.[Yun-He], Xu, C.[Chang],
CMT: Convolutional Neural Networks Meet Vision Transformers,
CVPR22(12165-12175)
IEEE DOI 2210
Visualization, Image recognition, Force, Object detection, Transformers, Representation learning BibRef

Meng, L.C.[Ling-Chen], Li, H.D.[Heng-Duo], Chen, B.C.[Bor-Chun], Lan, S.Y.[Shi-Yi], Wu, Z.X.[Zu-Xuan], Jiang, Y.G.[Yu-Gang], Lim, S.N.[Ser-Nam],
AdaViT: Adaptive Vision Transformers for Efficient Image Recognition,
CVPR22(12299-12308)
IEEE DOI 2210
Image recognition, Head, Law enforcement, Computational modeling, Redundancy, Transformers, Efficient learning and inferences, retrieval BibRef

Herrmann, C.[Charles], Sargent, K.[Kyle], Jiang, L.[Lu], Zabih, R.[Ramin], Chang, H.[Huiwen], Liu, C.[Ce], Krishnan, D.[Dilip], Sun, D.Q.[De-Qing],
Pyramid Adversarial Training Improves ViT Performance,
CVPR22(13409-13419)
IEEE DOI 2210
Training, Image recognition, Stochastic processes, Transformers, Robustness, retrieval, Recognition: detection BibRef

Li, C.L.[Chang-Lin], Zhuang, B.[Bohan], Wang, G.R.[Guang-Run], Liang, X.D.[Xiao-Dan], Chang, X.J.[Xiao-Jun], Yang, Y.[Yi],
Automated Progressive Learning for Efficient Training of Vision Transformers,
CVPR22(12476-12486)
IEEE DOI 2210
Training, Adaptation models, Schedules, Computational modeling, Estimation, Manuals, Transformers, Representation learning BibRef

Yu, T.[Tong], Khalitov, R.[Ruslan], Cheng, L.[Lei], Yang, Z.R.[Zhi-Rong],
Paramixer: Parameterizing Mixing Links in Sparse Factors Works Better than Dot-Product Self-Attention,
CVPR22(681-690)
IEEE DOI 2210
Protocols, Costs, Scalability, Neural networks, Stacking, Genomics, Transformers, Deep learning architectures and techniques, Representation learning BibRef

Guo, J.Y.[Jian-Yuan], Tang, Y.H.[Ye-Hui], Han, K.[Kai], Chen, X.H.[Xing-Hao], Wu, H.[Han], Xu, C.[Chao], Xu, C.[Chang], Wang, Y.H.[Yun-He],
Hire-MLP: Vision MLP via Hierarchical Rearrangement,
CVPR22(816-826)
IEEE DOI 2210
Representation learning, Image segmentation, Semantics, Object detection, Transformers, Representation learning BibRef

Cheng, B.[Bowen], Misra, I.[Ishan], Schwing, A.G.[Alexander G.], Kirillov, A.[Alexander], Girdhar, R.[Rohit],
Masked-attention Mask Transformer for Universal Image Segmentation,
CVPR22(1280-1289)
IEEE DOI 2210
Image segmentation, Shape, Computational modeling, Semantics, Transformers, Feature extraction, retrieval BibRef

Pu, M.Y.[Meng-Yang], Huang, Y.P.[Ya-Ping], Liu, Y.M.[Yu-Ming], Guan, Q.J.[Qing-Ji], Ling, H.B.[Hai-Bin],
EDTER: Edge Detection with Transformer,
CVPR22(1392-1402)
IEEE DOI 2210
Head, Image edge detection, Semantics, Detectors, Transformers, Feature extraction, Segmentation, grouping and shape analysis, Scene analysis and understanding BibRef

Rangrej, S.B.[Samrudhdhi B.], Srinidhi, C.L.[Chetan L.], Clark, J.J.[James J.],
Consistency driven Sequential Transformers Attention Model for Partially Observable Scenes,
CVPR22(2508-2517)
IEEE DOI 2210
Training, Computational modeling, Imaging, Predictive models, Transformers, Prediction algorithms, Visual reasoning BibRef

Zhu, R.[Rui], Li, Z.Q.[Zheng-Qin], Matai, J.[Janarbek], Porikli, F.M.[Fatih M.], Chandraker, M.[Manmohan],
IRISformer: Dense Vision Transformers for Single-Image Inverse Rendering in Indoor Scenes,
CVPR22(2812-2821)
IEEE DOI 2210
Photorealism, Shape, Computational modeling, Lighting, Transformers, Physics-based vision and shape-from-X BibRef

Ermolov, A.[Aleksandr], Mirvakhabova, L.[Leyla], Khrulkov, V.[Valentin], Sebe, N.[Nicu], Oseledets, I.[Ivan],
Hyperbolic Vision Transformers: Combining Improvements in Metric Learning,
CVPR22(7399-7409)
IEEE DOI 2210
Measurement, Geometry, Visualization, Semantics, Self-supervised learning, Transformer cores, Transformers, Representation learning BibRef

Lee, Y.[Youngwan], Kim, J.[Jonghee], Willette, J.[Jeffrey], Hwang, S.J.[Sung Ju],
MPViT: Multi-Path Vision Transformer for Dense Prediction,
CVPR22(7277-7286)
IEEE DOI 2210
Image segmentation, Semantics, Object detection, Transformers, Feature extraction, Pattern recognition, Recognition: detection, Representation learning BibRef

Zhang, C.Z.[Chong-Zhi], Zhang, M.Y.[Ming-Yuan], Zhang, S.H.[Shang-Hang], Jin, D.S.[Dai-Sheng], Zhou, Q.[Qiang], Cai, Z.A.[Zhong-Ang], Zhao, H.[Haiyu], Liu, X.L.[Xiang-Long], Liu, Z.[Ziwei],
Delving Deep into the Generalization of Vision Transformers under Distribution Shifts,
CVPR22(7267-7276)
IEEE DOI 2210
Training, Representation learning, Systematics, Shape, Taxonomy, Self-supervised learning, Transformers, Recognition: detection, Representation learning BibRef

Hou, Z.[Zhi], Yu, B.[Baosheng], Tao, D.C.[Da-Cheng],
BatchFormer: Learning to Explore Sample Relationships for Robust Representation Learning,
CVPR22(7246-7256)
IEEE DOI 2210
Training, Deep learning, Representation learning, Neural networks, Tail, Transformers, Transfer/low-shot/long-tail learning, Self- semi- meta- unsupervised learning BibRef

Zamir, S.W.[Syed Waqas], Arora, A.[Aditya], Khan, S.[Salman], Hayat, M.[Munawar], Khan, F.S.[Fahad Shahbaz], Yang, M.H.[Ming-Hsuan],
Restormer: Efficient Transformer for High-Resolution Image Restoration,
CVPR22(5718-5729)
IEEE DOI 2210
Computational modeling, Transformer cores, Transformers, Data models, Image restoration, Task analysis, Deep learning architectures and techniques BibRef

Zhao, H.S.[Heng-Shuang], Jiang, L.[Li], Jia, J.Y.[Jia-Ya], Torr, P.H.S.[Philip H.S.], Koltun, V.[Vladlen],
Point Transformer,
ICCV21(16239-16248)
IEEE DOI 2203
Point cloud compression, Measurement, Image segmentation, Semantics, Object detection, Transformer cores, Recognition and classification BibRef

Lin, K.[Kevin], Wang, L.J.[Li-Juan], Liu, Z.C.[Zi-Cheng],
Mesh Graphormer,
ICCV21(12919-12928)
IEEE DOI 2203
Convolutional codes, Solid modeling, Network topology, Transformers, Gestures and body pose BibRef

Casey, E.[Evan], Pérez, V.[Víctor], Li, Z.[Zhuoru],
The Animation Transformer: Visual Correspondence via Segment Matching,
ICCV21(11303-11312)
IEEE DOI 2203
Visualization, Image segmentation, Image color analysis, Production, Animation, Transformers, grouping and shape BibRef

Reizenstein, J.[Jeremy], Shapovalov, R.[Roman], Henzler, P.[Philipp], Sbordone, L.[Luca], Labatut, P.[Patrick], Novotny, D.[David],
Common Objects in 3D: Large-Scale Learning and Evaluation of Real-life 3D Category Reconstruction,
ICCV21(10881-10891)
IEEE DOI 2203
Award, Marr Prize, HM. Point cloud compression, Transformers, Rendering (computer graphics), Cameras, Image reconstruction, 3D from multiview and other sensors BibRef

Mariotti, O.[Octave], Aodha, O.M.[Oisin Mac], Bilen, H.[Hakan],
ViewNet: Unsupervised Viewpoint Estimation from Conditional Generation,
ICCV21(10398-10408)
IEEE DOI 2203
Training, Annotations, Estimation, Benchmark testing, Transformers, Representation learning, Transfer/Low-shot/Semi/Unsupervised Learning BibRef

Feng, W.X.[Wei-Xin], Wang, Y.J.[Yuan-Jiang], Ma, L.H.[Li-Hua], Yuan, Y.[Ye], Zhang, C.[Chi],
Temporal Knowledge Consistency for Unsupervised Visual Representation Learning,
ICCV21(10150-10160)
IEEE DOI 2203
Training, Representation learning, Visualization, Protocols, Object detection, Semisupervised learning, Transformers, Transfer/Low-shot/Semi/Unsupervised Learning BibRef

Wu, H.P.[Hai-Ping], Xiao, B.[Bin], Codella, N.[Noel], Liu, M.C.[Meng-Chen], Dai, X.Y.[Xi-Yang], Yuan, L.[Lu], Zhang, L.[Lei],
CvT: Introducing Convolutions to Vision Transformers,
ICCV21(22-31)
IEEE DOI 2203
Code, Vision Transformer.
WWW Link. Convolutional codes, Image resolution, Image recognition, Performance gain, Transformers, Distortion BibRef

Touvron, H.[Hugo], Cord, M.[Matthieu], Sablayrolles, A.[Alexandre], Synnaeve, G.[Gabriel], Jégou, H.[Hervé],
Going deeper with Image Transformers,
ICCV21(32-42)
IEEE DOI 2203
Training, Neural networks, Training data, Data models, Circuit faults, Recognition and classification, Optimization and learning methods BibRef

Zhao, J.W.[Jia-Wei], Yan, K.[Ke], Zhao, Y.F.[Yi-Fan], Guo, X.W.[Xiao-Wei], Huang, F.Y.[Fei-Yue], Li, J.[Jia],
Transformer-based Dual Relation Graph for Multi-label Image Recognition,
ICCV21(163-172)
IEEE DOI 2203
Image recognition, Correlation, Computational modeling, Semantics, Benchmark testing, Representation learning BibRef

Chen, C.F.R.[Chun-Fu Richard], Fan, Q.F.[Quan-Fu], Panda, R.[Rameswar],
CrossViT: Cross-Attention Multi-Scale Vision Transformer for Image Classification,
ICCV21(347-356)
IEEE DOI 2203
Image segmentation, Image recognition, Computational modeling, Semantics, Memory management, Object detection, Representation learning BibRef

Pan, Z.Z.[Zi-Zheng], Zhuang, B.[Bohan], Liu, J.[Jing], He, H.Y.[Hao-Yu], Cai, J.F.[Jian-Fei],
Scalable Vision Transformers with Hierarchical Pooling,
ICCV21(367-376)
IEEE DOI 2203
Visualization, Image recognition, Computational modeling, Scalability, Transformers, Computational efficiency, Efficient training and inference methods BibRef

Chefer, H.[Hila], Gur, S.[Shir], Wolf, L.B.[Lior B.],
Generic Attention-model Explainability for Interpreting Bi-Modal and Encoder-Decoder Transformers,
ICCV21(387-396)
IEEE DOI 2203
Measurement, Visualization, Image segmentation, Computational modeling, Object detection BibRef

Yuan, L.[Li], Chen, Y.P.[Yun-Peng], Wang, T.[Tao], Yu, W.H.[Wei-Hao], Shi, Y.J.[Yu-Jun], Jiang, Z.H.[Zi-Hang], Tay, F.E.H.[Francis E. H.], Feng, J.S.[Jia-Shi], Yan, S.C.[Shui-Cheng],
Tokens-to-Token ViT: Training Vision Transformers from Scratch on ImageNet,
ICCV21(538-547)
IEEE DOI 2203
Training, Image resolution, Computational modeling, Image edge detection, Transformers BibRef

Wu, B.[Bichen], Xu, C.F.[Chen-Feng], Dai, X.L.[Xiao-Liang], Wan, A.[Alvin], Zhang, P.Z.[Pei-Zhao], Yan, Z.C.[Zhi-Cheng], Tomizuka, M.[Masayoshi], Gonzalez, J.[Joseph], Keutzer, K.[Kurt], Vajda, P.[Peter],
Visual Transformers: Where Do Transformers Really Belong in Vision Models?,
ICCV21(579-589)
IEEE DOI 2203
Training, Visualization, Image segmentation, Lips, Computational modeling, Semantics, Vision applications and systems BibRef

Hu, R.H.[Rong-Hang], Singh, A.[Amanpreet],
UniT: Multimodal Multitask Learning with a Unified Transformer,
ICCV21(1419-1429)
IEEE DOI 2203
Training, Natural languages, Object detection, Predictive models, Transformers, Multitasking, Representation learning BibRef

Qiu, Y.[Yue], Yamamoto, S.[Shintaro], Nakashima, K.[Kodai], Suzuki, R.[Ryota], Iwata, K.[Kenji], Kataoka, H.[Hirokatsu], Satoh, Y.[Yutaka],
Describing and Localizing Multiple Changes with Transformers,
ICCV21(1951-1960)
IEEE DOI 2203
Measurement, Location awareness, Codes, Natural languages, Benchmark testing, Transformers, Vision applications and systems BibRef

Song, M.[Myungseo], Choi, J.[Jinyoung], Han, B.H.[Bo-Hyung],
Variable-Rate Deep Image Compression through Spatially-Adaptive Feature Transform,
ICCV21(2360-2369)
IEEE DOI 2203
Training, Image coding, Neural networks, Rate-distortion, Transforms, Network architecture, Computational photography, Low-level and physics-based vision BibRef

Sheng, H.[Hualian], Cai, S.[Sijia], Liu, Y.[Yuan], Deng, B.[Bing], Huang, J.Q.[Jian-Qiang], Hua, X.S.[Xian-Sheng], Zhao, M.J.[Min-Jian],
Improving 3D Object Detection with Channel-wise Transformer,
ICCV21(2723-2732)
IEEE DOI 2203
Point cloud compression, Object detection, Detectors, Transforms, Transformers, Encoding, Detection and localization in 2D and 3D BibRef

Zhang, P.C.[Peng-Chuan], Dai, X.[Xiyang], Yang, J.W.[Jian-Wei], Xiao, B.[Bin], Yuan, L.[Lu], Zhang, L.[Lei], Gao, J.F.[Jian-Feng],
Multi-Scale Vision Longformer: A New Vision Transformer for High-Resolution Image Encoding,
ICCV21(2978-2988)
IEEE DOI 2203
Image segmentation, Image coding, Computational modeling, Memory management, Object detection, Transformers, Representation learning BibRef

Dong, Q.[Qi], Tu, Z.W.[Zhuo-Wen], Liao, H.[Haofu], Zhang, Y.T.[Yu-Ting], Mahadevan, V.[Vijay], Soatto, S.[Stefano],
Visual Relationship Detection Using Part-and-Sum Transformers with Composite Queries,
ICCV21(3530-3539)
IEEE DOI 2203
Visualization, Detectors, Transformers, Task analysis, Standards, Detection and localization in 2D and 3D, Representation learning BibRef

Fan, H.Q.[Hao-Qi], Xiong, B.[Bo], Mangalam, K.[Karttikeya], Li, Y.[Yanghao], Yan, Z.C.[Zhi-Cheng], Malik, J.[Jitendra], Feichtenhofer, C.[Christoph],
Multiscale Vision Transformers,
ICCV21(6804-6815)
IEEE DOI 2203
Visualization, Image recognition, Codes, Computational modeling, Transformers, Complexity theory, Recognition and classification BibRef

Mahmood, K.[Kaleel], Mahmood, R.[Rigel], van Dijk, M.[Marten],
On the Robustness of Vision Transformers to Adversarial Examples,
ICCV21(7818-7827)
IEEE DOI 2203
Transformers, Robustness, Adversarial machine learning, Security, Machine learning architectures and formulations BibRef

Chen, X.L.[Xin-Lei], Xie, S.[Saining], He, K.[Kaiming],
An Empirical Study of Training Self-Supervised Vision Transformers,
ICCV21(9620-9629)
IEEE DOI 2203
Training, Benchmark testing, Transformers, Standards, Representation learning, Recognition and classification, Transfer/Low-shot/Semi/Unsupervised Learning BibRef

Caron, M.[Mathilde], Touvron, H.[Hugo], Misra, I.[Ishan], Jégou, H.[Hervé], Mairal, J.[Julien], Bojanowski, P.[Piotr], Joulin, A.[Armand],
Emerging Properties in Self-Supervised Vision Transformers,
ICCV21(9630-9640)
IEEE DOI 2203
Training, Image segmentation, Semantics, Layout, Image retrieval, Representation learning, Transfer/Low-shot/Semi/Unsupervised Learning BibRef

Yuan, Y.[Ye], Weng, X.[Xinshuo], Ou, Y.[Yanglan], Kitani, K.[Kris],
AgentFormer: Agent-Aware Transformers for Socio-Temporal Multi-Agent Forecasting,
ICCV21(9793-9803)
IEEE DOI 2203
Uncertainty, Stochastic processes, Predictive models, Transformers, Encoding, Trajectory, Motion and tracking, Vision for robotics and autonomous vehicles BibRef

Xu, W.J.[Wei-Jian], Xu, Y.F.[Yi-Fan], Chang, T.[Tyler], Tu, Z.W.[Zhuo-Wen],
Co-Scale Conv-Attentional Image Transformers,
ICCV21(9961-9970)
IEEE DOI 2203
Image segmentation, Computational modeling, Object detection, Transformers, Convolutional neural networks, Task analysis, Recognition and classification BibRef

Wu, K.[Kan], Peng, H.W.[Hou-Wen], Chen, M.H.[Ming-Hao], Fu, J.L.[Jian-Long], Chao, H.Y.[Hong-Yang],
Rethinking and Improving Relative Position Encoding for Vision Transformer,
ICCV21(10013-10021)
IEEE DOI 2203
Image coding, Codes, Computational modeling, Transformers, Encoding, Natural language processing, Datasets and evaluation, Recognition and classification BibRef

Bhojanapalli, S.[Srinadh], Chakrabarti, A.[Ayan], Glasner, D.[Daniel], Li, D.[Daliang], Unterthiner, T.[Thomas], Veit, A.[Andreas],
Understanding Robustness of Transformers for Image Classification,
ICCV21(10211-10221)
IEEE DOI 2203
Perturbation methods, Transformers, Robustness, Data models, Convolutional neural networks, Recognition and classification BibRef

Yan, B.[Bin], Peng, H.[Houwen], Fu, J.L.[Jian-Long], Wang, D.[Dong], Lu, H.C.[Hu-Chuan],
Learning Spatio-Temporal Transformer for Visual Tracking,
ICCV21(10428-10437)
IEEE DOI 2203
Visualization, Target tracking, Smoothing methods, Pipelines, Benchmark testing, Transformers BibRef

Heo, B.[Byeongho], Yun, S.[Sangdoo], Han, D.Y.[Dong-Yoon], Chun, S.[Sanghyuk], Choe, J.[Junsuk], Oh, S.J.[Seong Joon],
Rethinking Spatial Dimensions of Vision Transformers,
ICCV21(11916-11925)
IEEE DOI 2203
Dimensionality reduction, Computational modeling, Object detection, Transformers, Robustness, Recognition and classification BibRef

Voskou, A.[Andreas], Panousis, K.P.[Konstantinos P.], Kosmopoulos, D.[Dimitrios], Metaxas, D.N.[Dimitris N.], Chatzis, S.[Sotirios],
Stochastic Transformer Networks with Linear Competing Units: Application to end-to-end SL Translation,
ICCV21(11926-11935)
IEEE DOI 2203
Training, Memory management, Stochastic processes, Gesture recognition, Benchmark testing, Assistive technologies BibRef

Ranftl, R.[René], Bochkovskiy, A.[Alexey], Koltun, V.[Vladlen],
Vision Transformers for Dense Prediction,
ICCV21(12159-12168)
IEEE DOI 2203
Image resolution, Semantics, Neural networks, Estimation, Training data, grouping and shape BibRef

Chen, M.H.[Ming-Hao], Peng, H.W.[Hou-Wen], Fu, J.L.[Jian-Long], Ling, H.B.[Hai-Bin],
AutoFormer: Searching Transformers for Visual Recognition,
ICCV21(12250-12260)
IEEE DOI 2203
Training, Convolutional codes, Visualization, Head, Search methods, Manuals, Recognition and classification BibRef

Yang, G.L.[Guang-Lei], Tang, H.[Hao], Ding, M.L.[Ming-Li], Sebe, N.[Nicu], Ricci, E.[Elisa],
Transformer-Based Attention Networks for Continuous Pixel-Wise Prediction,
ICCV21(16249-16259)
IEEE DOI 2203
Correlation, Estimation, Logic gates, Transformers, Natural language processing, Vision applications and systems BibRef

Yuan, K.[Kun], Guo, S.P.[Shao-Peng], Liu, Z.[Ziwei], Zhou, A.[Aojun], Yu, F.W.[Feng-Wei], Wu, W.[Wei],
Incorporating Convolution Designs into Visual Transformers,
ICCV21(559-568)
IEEE DOI 2203
Training, Visualization, Costs, Convolution, Training data, Transformers, Feature extraction, Recognition and classification, Efficient training and inference methods BibRef

Chen, Z.[Zhengsu], Xie, L.X.[Ling-Xi], Niu, J.W.[Jian-Wei], Liu, X.F.[Xue-Feng], Wei, L.[Longhui], Tian, Q.[Qi],
Visformer: The Vision-friendly Transformer,
ICCV21(569-578)
IEEE DOI 2203
Convolutional codes, Training, Visualization, Protocols, Computational modeling, Fitting, Recognition and classification, Representation learning BibRef

Wang, W.[Wenhai], Xie, E.[Enze], Li, X.[Xiang], Fan, D.P.[Deng-Ping], Song, K.[Kaitao], Liang, D.[Ding], Lu, T.[Tong], Luo, P.[Ping], Shao, L.[Ling],
Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions,
ICCV21(548-558)
IEEE DOI 2203
Image resolution, Costs, Semantics, Object detection, Transformers, Feature extraction, Recognition and classification, grouping and shape BibRef

Yao, Z.L.[Zhu-Liang], Cao, Y.[Yue], Lin, Y.T.[Yu-Tong], Liu, Z.[Ze], Zhang, Z.[Zheng], Hu, H.[Han],
Leveraging Batch Normalization for Vision Transformers,
NeurArch21(413-422)
IEEE DOI 2112
Training, Transformers, Feeds BibRef

Kim, K.[Kyungmin], Wu, B.C.[Bi-Chen], Dai, X.L.[Xiao-Liang], Zhang, P.Z.[Pei-Zhao], Yan, Z.C.[Zhi-Cheng], Vajda, P.[Peter], Kim, S.[Seon],
Rethinking the Self-Attention in Vision Transformers,
ECV21(3065-3069)
IEEE DOI 2109
Computational modeling, Pattern recognition BibRef

Zhang, Z.X.[Zi-Xiao], Lu, X.Q.[Xiao-Qiang], Cao, G.J.[Guo-Jin], Yang, Y.T.[Yu-Ting], Jiao, L.C.[Li-Cheng], Liu, F.[Fang],
ViT-YOLO: Transformer-Based YOLO for Object Detection,
VisDrone21(2799-2808)
IEEE DOI 2112
Semantics, Detectors, Object detection, Feature extraction, Robustness BibRef

Kong, D.[Daehyeon], Kong, K.[Kyeongbo], Kim, K.[Kyunghun], Min, S.J.[Sung-Jun], Kang, S.J.[Suk-Ju],
Image-Adaptive Hint Generation via Vision Transformer for Outpainting,
WACV22(4029-4038)
IEEE DOI 2202
Image synthesis, Neural networks, Complex networks, Benchmark testing, Transformers, Vision Systems and Applications BibRef

Graham, B.[Ben], El-Nouby, A.[Alaaeldin], Touvron, H.[Hugo], Stock, P.[Pierre], Joulin, A.[Armand], Jégou, H.[Hervé], Douze, M.[Matthijs],
LeViT: a Vision Transformer in ConvNet's Clothing for Faster Inference,
ICCV21(12239-12249)
IEEE DOI 2203
Training, Image resolution, Neural networks, Parallel processing, Transformers, Feature extraction, Representation learning BibRef

Horváth, J.[János], Baireddy, S.[Sriram], Hao, H.X.[Han-Xiang], Montserrat, D.M.[Daniel Mas], Delp, E.J.[Edward J.],
Manipulation Detection in Satellite Images Using Vision Transformer,
WMF21(1032-1041)
IEEE DOI 2109
BibRef
Earlier: A1, A4, A3, A5, Only:
Manipulation Detection in Satellite Images Using Deep Belief Networks,
WMF20(2832-2840)
IEEE DOI 2008
Image sensors, Satellites, Splicing, Forestry, Tools. Satellites, Image reconstruction, Training, Forgery, Heating systems, Feature extraction BibRef

Beal, J.[Josh], Wu, H.Y.[Hao-Yu], Park, D.H.[Dong Huk], Zhai, A.[Andrew], Kislyuk, D.[Dmitry],
Billion-Scale Pretraining with Vision Transformers for Multi-Task Visual Representations,
WACV22(1431-1440)
IEEE DOI 2202
Visualization, Solid modeling, Systematics, Computational modeling, Transformers, Semi- and Un- supervised Learning BibRef

Chapter on Pattern Recognition, Clustering, Statistics, Grammars, Learning, Neural Nets, Genetic Algorithms continues in
Video Transformers.


Last update: Nov 30, 2023 at 15:51:27