14.5.10.6 Vision Transformers, ViT

Vision Transformers. Transformers.
A subset: See also Attention in Vision Transformers.
Shift, Scale, and Distortion Invariance.
Shifted Window: See also SWIN Transformer.
Video specific: See also Video Transformers.
See also Zero-Shot Learning.
See also Detection Transformer, DETR Applications.
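
For orientation, the following is a minimal illustrative sketch (Python/NumPy; not taken from any entry below, and the names patchify, patch_size, embed width, and num_heads are hypothetical defaults) of the two operations that most ViT variants indexed in this section build on: splitting an image into patch tokens and applying one multi-head self-attention layer over them.

import numpy as np

rng = np.random.default_rng(0)

def patchify(image, patch_size=16):
    # Split an HxWxC image into (num_patches, patch_size*patch_size*C) tokens.
    H, W, C = image.shape
    assert H % patch_size == 0 and W % patch_size == 0
    p = image.reshape(H // patch_size, patch_size, W // patch_size, patch_size, C)
    return p.transpose(0, 2, 1, 3, 4).reshape(-1, patch_size * patch_size * C)

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(tokens, num_heads=4):
    # One multi-head self-attention layer with random projection weights.
    N, D = tokens.shape
    head_dim = D // num_heads
    Wq, Wk, Wv, Wo = (rng.standard_normal((D, D)) / np.sqrt(D) for _ in range(4))
    q, k, v = tokens @ Wq, tokens @ Wk, tokens @ Wv
    # Reshape to (heads, tokens, head_dim) and attend within each head.
    q, k, v = (m.reshape(N, num_heads, head_dim).transpose(1, 0, 2) for m in (q, k, v))
    attn = softmax(q @ k.transpose(0, 2, 1) / np.sqrt(head_dim))
    return (attn @ v).transpose(1, 0, 2).reshape(N, D) @ Wo

image = rng.standard_normal((224, 224, 3))      # dummy input image
tokens = patchify(image)                        # (196, 768) for 16x16 patches
embed = tokens @ (rng.standard_normal((tokens.shape[1], 256)) * 0.02)  # linear patch embedding
print(self_attention(embed).shape)              # -> (196, 256)

Full ViT pipelines additionally add learned position embeddings, a class token, MLP blocks, and layer normalization; the papers indexed below vary and extend these components.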

Bazi, Y.[Yakoub], Bashmal, L.[Laila], Al Rahhal, M.M.[Mohamad M.], Al Dayil, R.[Reham], Al Ajlan, N.[Naif],
Vision Transformers for Remote Sensing Image Classification,
RS(13), No. 3, 2021, pp. xx-yy.
DOI Link 2102
BibRef

Li, T.[Tao], Zhang, Z.[Zheng], Pei, L.[Lishen], Gan, Y.[Yan],
HashFormer: Vision Transformer Based Deep Hashing for Image Retrieval,
SPLetters(29), 2022, pp. 827-831.
IEEE DOI 2204
Transformers, Binary codes, Task analysis, Training, Image retrieval, Feature extraction, Databases, Binary embedding, image retrieval BibRef

Jiang, B.[Bo], Zhao, K.K.[Kang-Kang], Tang, J.[Jin],
RGTransformer: Region-Graph Transformer for Image Representation and Few-Shot Classification,
SPLetters(29), 2022, pp. 792-796.
IEEE DOI 2204
Measurement, Transformers, Image representation, Feature extraction, Visualization, transformer BibRef

Chen, Z.M.[Zhao-Min], Cui, Q.[Quan], Zhao, B.[Borui], Song, R.J.[Ren-Jie], Zhang, X.Q.[Xiao-Qin], Yoshie, O.[Osamu],
SST: Spatial and Semantic Transformers for Multi-Label Image Recognition,
IP(31), 2022, pp. 2570-2583.
IEEE DOI 2204
Correlation, Semantics, Transformers, Image recognition, Task analysis, Training, Feature extraction, label correlation BibRef

Wang, G.H.[Guang-Hui], Li, B.[Bin], Zhang, T.[Tao], Zhang, S.[Shubi],
A Network Combining a Transformer and a Convolutional Neural Network for Remote Sensing Image Change Detection,
RS(14), No. 9, 2022, pp. xx-yy.
DOI Link 2205
BibRef

Luo, G.[Gen], Zhou, Y.[Yiyi], Sun, X.S.[Xiao-Shuai], Wang, Y.[Yan], Cao, L.J.[Liu-Juan], Wu, Y.J.[Yong-Jian], Huang, F.Y.[Fei-Yue], Ji, R.R.[Rong-Rong],
Towards Lightweight Transformer Via Group-Wise Transformation for Vision-and-Language Tasks,
IP(31), 2022, pp. 3386-3398.
IEEE DOI 2205
Transformers, Task analysis, Computational modeling, Benchmark testing, Visualization, Convolution, Head, reference expression comprehension BibRef

Tu, Y.B.[Yun-Bin], Li, L.[Liang], Su, L.[Li], Gao, S.X.[Sheng-Xiang], Yan, C.G.[Cheng-Gang], Zha, Z.J.[Zheng-Jun], Yu, Z.T.[Zheng-Tao], Huang, Q.M.[Qing-Ming],
I2-Transformer: Intra- and Inter-Relation Embedding Transformer for TV Show Captioning,
IP(31), 2022, pp. 3565-3577.
IEEE DOI 2206
Transformers, Semantics, Task analysis, Visualization, TV, Graph neural networks, TV Show captioning, transformer BibRef

Wang, J.Y.[Jia-Yun], Chakraborty, R.[Rudrasis], Yu, S.X.[Stella X.],
Transformer for 3D Point Clouds,
PAMI(44), No. 8, August 2022, pp. 4419-4431.
IEEE DOI 2207
Convolution, Feature extraction, Shape, Semantics, Task analysis, Measurement, point cloud, transformation, deformable, segmentation, 3D detection BibRef

Wang, L.[Libo], Li, R.[Rui], Zhang, C.[Ce], Fang, S.H.[Sheng-Hui], Duan, C.X.[Chen-Xi], Meng, X.L.[Xiao-Liang], Atkinson, P.M.[Peter M.],
UNetFormer: A UNet-like transformer for efficient semantic segmentation of remote sensing urban scene imagery,
PandRS(190), 2022, pp. 196-214.
Elsevier DOI 2208
Award, U.V. Helava, ISPRS. Semantic Segmentation, Remote Sensing, Vision Transformer, Fully Transformer Network, Global-local Context, Urban Scene BibRef

Kheldouni, A.[Amine], Boumhidi, J.[Jaouad],
A Study of Bidirectional Encoder Representations from Transformers for Sequential Recommendations,
ISCV22(1-5)
IEEE DOI 2208
Knowledge engineering, Recurrent neural networks, Predictive models, Markov processes BibRef

Li, Z.K.[Ze-Kun], Liu, Y.F.[Yu-Fan], Li, B.[Bing], Feng, B.L.[Bai-Lan], Wu, K.[Kebin], Peng, C.W.[Cheng-Wei], Hu, W.M.[Wei-Ming],
SDTP: Semantic-Aware Decoupled Transformer Pyramid for Dense Image Prediction,
CirSysVideo(32), No. 9, September 2022, pp. 6160-6173.
IEEE DOI 2209
Transformers, Semantics, Task analysis, Detectors, Image segmentation, Head, Convolution, Transformer, dense prediction, multi-level interaction BibRef

Wu, J.J.[Jia-Jing], Wei, Z.Q.[Zhi-Qiang], Zhang, J.P.[Jin-Peng], Zhang, Y.S.[Yu-Shi], Jia, D.N.[Dong-Ning], Yin, B.[Bo], Yu, Y.C.[Yun-Chao],
Full-Coupled Convolutional Transformer for Surface-Based Duct Refractivity Inversion,
RS(14), No. 17, 2022, pp. xx-yy.
DOI Link 2209
BibRef

Jiang, K.[Kai], Peng, P.[Peng], Lian, Y.[Youzao], Xu, W.S.[Wei-Sheng],
The encoding method of position embeddings in vision transformer,
JVCIR(89), 2022, pp. 103664.
Elsevier DOI 2212
Vision transformer, Position embeddings, Gabor filters BibRef

Han, K.[Kai], Wang, Y.H.[Yun-He], Chen, H.[Hanting], Chen, X.[Xinghao], Guo, J.[Jianyuan], Liu, Z.H.[Zhen-Hua], Tang, Y.[Yehui], Xiao, A.[An], Xu, C.J.[Chun-Jing], Xu, Y.X.[Yi-Xing], Yang, Z.H.[Zhao-Hui], Zhang, Y.[Yiman], Tao, D.C.[Da-Cheng],
A Survey on Vision Transformer,
PAMI(45), No. 1, January 2023, pp. 87-110.
IEEE DOI 2212
Survey, Vision Transformer. Transformers, Task analysis, Encoding, Computational modeling, Visualization, Object detection, high-level vision, video BibRef

Hou, Q.[Qibin], Jiang, Z.[Zihang], Yuan, L.[Li], Cheng, M.M.[Ming-Ming], Yan, S.C.[Shui-Cheng], Feng, J.S.[Jia-Shi],
Vision Permutator: A Permutable MLP-Like Architecture for Visual Recognition,
PAMI(45), No. 1, January 2023, pp. 1328-1334.
IEEE DOI 2212
Transformers, Encoding, Visualization, Convolutional codes, Mixers, Computer architecture, Training data, Vision permutator, deep neural network BibRef

Yu, W.H.[Wei-Hao], Si, C.Y.[Chen-Yang], Zhou, P.[Pan], Luo, M.[Mi], Zhou, Y.C.[Yi-Chen], Feng, J.S.[Jia-Shi], Yan, S.C.[Shui-Cheng], Wang, X.C.[Xin-Chao],
MetaFormer Baselines for Vision,
PAMI(46), No. 2, February 2024, pp. 896-912.
IEEE DOI 2401
BibRef
And: A1, A4, A3, A2, A5, A8, A6, A7:
MetaFormer is Actually What You Need for Vision,
CVPR22(10809-10819)
IEEE DOI 2210
The abstracted architecture of the Transformer. Computational modeling, Focusing, Transformers, Pattern recognition, Task analysis, retrieval BibRef

Zhou, D.[Daquan], Hou, Q.[Qibin], Yang, L.J.[Lin-Jie], Jin, X.J.[Xiao-Jie], Feng, J.S.[Jia-Shi],
Token Selection is a Simple Booster for Vision Transformers,
PAMI(45), No. 11, November 2023, pp. 12738-12746.
IEEE DOI 2310
BibRef

Yuan, L.[Li], Hou, Q.[Qibin], Jiang, Z.[Zihang], Feng, J.S.[Jia-Shi], Yan, S.C.[Shui-Cheng],
VOLO: Vision Outlooker for Visual Recognition,
PAMI(45), No. 5, May 2023, pp. 6575-6586.
IEEE DOI 2304
Transformers, Computer architecture, Computational modeling, Training, Data models, Task analysis, Visualization, image classification BibRef

Ren, S.[Sucheng], Zhou, D.[Daquan], He, S.F.[Sheng-Feng], Feng, J.S.[Jia-Shi], Wang, X.C.[Xin-Chao],
Shunted Self-Attention via Multi-Scale Token Aggregation,
CVPR22(10843-10852)
IEEE DOI 2210
Degradation, Deep learning, Costs, Computational modeling, Merging, Efficient learning and inferences BibRef

Wu, Y.H.[Yu-Huan], Liu, Y.[Yun], Zhan, X.[Xin], Cheng, M.M.[Ming-Ming],
P2T: Pyramid Pooling Transformer for Scene Understanding,
PAMI(45), No. 11, November 2023, pp. 12760-12771.
IEEE DOI 2310
BibRef

Li, Y.[Yehao], Yao, T.[Ting], Pan, Y.W.[Ying-Wei], Mei, T.[Tao],
Contextual Transformer Networks for Visual Recognition,
PAMI(45), No. 2, February 2023, pp. 1489-1500.
IEEE DOI 2301
Transformers, Convolution, Visualization, Task analysis, Image recognition, Object detection, Transformer, image recognition BibRef

Wang, H.[Hang], Du, Y.[Youtian], Zhang, Y.[Yabin], Li, S.[Shuai], Zhang, L.[Lei],
One-Stage Visual Relationship Referring With Transformers and Adaptive Message Passing,
IP(32), 2023, pp. 190-202.
IEEE DOI 2301
Visualization, Proposals, Transformers, Task analysis, Detectors, Message passing, Predictive models, gated message passing BibRef

Kim, B.[Boah], Kim, J.[Jeongsol], Ye, J.C.[Jong Chul],
Task-Agnostic Vision Transformer for Distributed Learning of Image Processing,
IP(32), 2023, pp. 203-218.
IEEE DOI 2301
Task analysis, Transformers, Servers, Distance learning, Computer aided instruction, Tail, Head, Distributed learning, task-agnostic learning BibRef

Park, S.[Sangjoon], Ye, J.C.[Jong Chul],
Multi-Task Distributed Learning Using Vision Transformer With Random Patch Permutation,
MedImg(42), No. 7, July 2023, pp. 2091-2105.
IEEE DOI 2307
Task analysis, Transformers, Head, Tail, Servers, Multitasking, Distance learning, Federated learning, split learning, privacy preservation BibRef

Kiya, H.[Hitoshi], Iijima, R.[Ryota], Maungmaung, A.[Aprilpyone], Kinoshita, Y.[Yuma],
Image and Model Transformation with Secret Key for Vision Transformer,
IEICE(E106-D), No. 1, January 2023, pp. 2-11.
WWW Link. 2301
BibRef

Mou, C.[Chong], Zhang, J.[Jian],
TransCL: Transformer Makes Strong and Flexible Compressive Learning,
PAMI(45), No. 4, April 2023, pp. 5236-5251.
IEEE DOI 2303
Task analysis, Transformers, Image reconstruction, Image coding, Compressed sensing, Sensors, Cameras, Compressed sensing, semantic segmentation BibRef

Zhang, H.F.[Hao-Fei], Mao, F.[Feng], Xue, M.Q.[Meng-Qi], Fang, G.F.[Gong-Fan], Feng, Z.L.[Zun-Lei], Song, J.[Jie], Song, M.L.[Ming-Li],
Knowledge Amalgamation for Object Detection With Transformers,
IP(32), 2023, pp. 2093-2106.
IEEE DOI 2304
Transformers, Task analysis, Object detection, Detectors, Training, Feature extraction, Model reusing, vision transformers BibRef

Li, Y.[Ying], Chen, K.[Kehan], Sun, S.L.[Shi-Lei], He, C.[Chu],
Multi-scale homography estimation based on dual feature aggregation transformer,
IET-IPR(17), No. 5, 2023, pp. 1403-1416.
DOI Link 2304
image matching, image registration BibRef

Wang, G.Q.[Guan-Qun], Chen, H.[He], Chen, L.[Liang], Zhuang, Y.[Yin], Zhang, S.H.[Shang-Hang], Zhang, T.[Tong], Dong, H.[Hao], Gao, P.[Peng],
P2FEViT: Plug-and-Play CNN Feature Embedded Hybrid Vision Transformer for Remote Sensing Image Classification,
RS(15), No. 7, 2023, pp. 1773.
DOI Link 2304
BibRef

Zhang, Q.M.[Qi-Ming], Xu, Y.F.[Yu-Fei], Zhang, J.[Jing], Tao, D.C.[Da-Cheng],
ViTAEv2: Vision Transformer Advanced by Exploring Inductive Bias for Image Recognition and Beyond,
IJCV(131), No. 5, May 2023, pp. 1141-1162.
Springer DOI 2305
BibRef

Fan, X.[Xinyi], Liu, H.J.[Hua-Jun],
FlexFormer: Flexible Transformer for efficient visual recognition,
PRL(169), 2023, pp. 95-101.
Elsevier DOI 2305
Vision transformer, Frequency analysis, Image classification BibRef

Cho, S.[Seokju], Hong, S.[Sunghwan], Kim, S.[Seungryong],
CATs++: Boosting Cost Aggregation With Convolutions and Transformers,
PAMI(45), No. 6, June 2023, pp. 7174-7194.
IEEE DOI
WWW Link. 2305
Costs, Transformers, Correlation, Semantics, Feature extraction, Task analysis, Cost aggregation, efficient transformer, semantic visual correspondence BibRef

Kim, B.J.[Bum Jun], Choi, H.[Hyeyeon], Jang, H.[Hyeonah], Lee, D.G.[Dong Gu], Jeong, W.[Wonseok], Kim, S.W.[Sang Woo],
Improved robustness of vision transformers via prelayernorm in patch embedding,
PR(141), 2023, pp. 109659.
Elsevier DOI 2306
Vision transformer, Patch embedding, Contrast enhancement, Robustness, Layer normalization, Convolutional neural network, Deep learning BibRef

He, Q.[Qibin], Sun, X.[Xian], Yan, Z.Y.[Zhi-Yuan], Wang, B.[Bing], Zhu, Z.[Zicong], Diao, W.H.[Wen-Hui], Yang, M.Y.[Michael Ying],
AST: Adaptive Self-supervised Transformer for optical remote sensing representation,
PandRS(200), 2023, pp. 41-54.
Elsevier DOI 2306
Cross-scale transformer, Interpretation, Masked image modeling, Optical remote sensing, Representation learning BibRef

Wang, Z.W.[Zi-Wei], Wang, C.Y.[Chang-Yuan], Xu, X.W.[Xiu-Wei], Zhou, J.[Jie], Lu, J.W.[Ji-Wen],
Quantformer: Learning Extremely Low-Precision Vision Transformers,
PAMI(45), No. 7, July 2023, pp. 8813-8826.
IEEE DOI 2306
Quantization (signal), Transformers, Computational modeling, Search problems, Object detection, Image color analysis, vision transformers BibRef

Sun, S.Y.[Shu-Yang], Yue, X.Y.[Xiao-Yu], Zhao, H.S.[Heng-Shuang], Torr, P.H.S.[Philip H.S.], Bai, S.[Song],
Patch-Based Separable Transformer for Visual Recognition,
PAMI(45), No. 7, July 2023, pp. 9241-9247.
IEEE DOI 2306
Task analysis, Current transformers, Visualization, Feature extraction, Convolutional neural networks, instance segmentation BibRef

Yue, X.Y.[Xiao-Yu], Sun, S.Y.[Shu-Yang], Kuang, Z.H.[Zhang-Hui], Wei, M.[Meng], Torr, P.H.S.[Philip H.S.], Zhang, W.[Wayne], Lin, D.[Dahua],
Vision Transformer with Progressive Sampling,
ICCV21(377-386)
IEEE DOI 2203
Codes, Computational modeling, Interference, Transformers, Feature extraction, Recognition and classification, Representation learning BibRef

Peng, Z.L.[Zhi-Liang], Guo, Z.H.[Zong-Hao], Huang, W.[Wei], Wang, Y.W.[Yao-Wei], Xie, L.X.[Ling-Xi], Jiao, J.B.[Jian-Bin], Tian, Q.[Qi], Ye, Q.X.[Qi-Xiang],
Conformer: Local Features Coupling Global Representations for Recognition and Detection,
PAMI(45), No. 8, August 2023, pp. 9454-9468.
IEEE DOI 2307
Transformers, Feature extraction, Couplings, Visualization, Detectors, Convolution, Object detection, Feature fusion, vision transformer BibRef

Peng, Z.L.[Zhi-Liang], Huang, W.[Wei], Gu, S.Z.[Shan-Zhi], Xie, L.X.[Ling-Xi], Wang, Y.[Yaowei], Jiao, J.B.[Jian-Bin], Ye, Q.X.[Qi-Xiang],
Conformer: Local Features Coupling Global Representations for Visual Recognition,
ICCV21(357-366)
IEEE DOI 2203
Couplings, Representation learning, Visualization, Fuses, Convolution, Object detection, Transformers, Representation learning BibRef

Feng, Z.Z.[Zhan-Zhou], Zhang, S.L.[Shi-Liang],
Efficient Vision Transformer via Token Merger,
IP(32), 2023, pp. 4156-4169.
IEEE DOI 2307
Corporate acquisitions, Transformers, Semantics, Task analysis, Visualization, Merging, Computational efficiency, sparse representation BibRef

Yang, J.H.[Jia-Hao], Li, X.Y.[Xiang-Yang], Zheng, M.[Mao], Wang, Z.[Zihan], Zhu, Y.Q.[Yong-Qing], Guo, X.Q.[Xiao-Qian], Yuan, Y.C.[Yu-Chen], Chai, Z.[Zifeng], Jiang, S.Q.[Shu-Qiang],
MemBridge: Video-Language Pre-Training With Memory-Augmented Inter-Modality Bridge,
IP(32), 2023, pp. 4073-4087.
IEEE DOI 2307

WWW Link. Bridges, Transformers, Computer architecture, Task analysis, Visualization, Feature extraction, Memory modules, memory module BibRef

Wang, D.L.[Duo-Lin], Chen, Y.[Yadang], Naz, B.[Bushra], Sun, L.[Le], Li, B.Z.[Bao-Zhu],
Spatial-Aware Transformer (SAT): Enhancing Global Modeling in Transformer Segmentation for Remote Sensing Images,
RS(15), No. 14, 2023, pp. 3607.
DOI Link 2307
BibRef

Huang, X.Y.[Xin-Yan], Liu, F.[Fang], Cui, Y.H.[Yuan-Hao], Chen, P.[Puhua], Li, L.L.[Ling-Ling], Li, P.F.[Peng-Fang],
Faster and Better: A Lightweight Transformer Network for Remote Sensing Scene Classification,
RS(15), No. 14, 2023, pp. 3645.
DOI Link 2307
BibRef

Yao, T.[Ting], Li, Y.[Yehao], Pan, Y.W.[Ying-Wei], Wang, Y.[Yu], Zhang, X.P.[Xiao-Ping], Mei, T.[Tao],
Dual Vision Transformer,
PAMI(45), No. 9, September 2023, pp. 10870-10882.
IEEE DOI 2309
Survey, Vision Transformer. BibRef

Rao, Y.M.[Yong-Ming], Liu, Z.[Zuyan], Zhao, W.L.[Wen-Liang], Zhou, J.[Jie], Lu, J.W.[Ji-Wen],
Dynamic Spatial Sparsification for Efficient Vision Transformers and Convolutional Neural Networks,
PAMI(45), No. 9, September 2023, pp. 10883-10897.
IEEE DOI 2309
BibRef

Li, J.[Jie], Liu, Z.[Zhao], Li, L.[Li], Lin, J.Q.[Jun-Qin], Yao, J.[Jian], Tu, J.[Jingmin],
Multi-view convolutional vision transformer for 3D object recognition,
JVCIR(95), 2023, pp. 103906.
Elsevier DOI 2309
Multi-view, 3D object recognition, Feature fusion, Convolutional neural networks BibRef

Shang, J.H.[Jing-Huan], Li, X.[Xiang], Kahatapitiya, K.[Kumara], Lee, Y.C.[Yu-Cheol], Ryoo, M.S.[Michael S.],
StARformer: Transformer With State-Action-Reward Representations for Robot Learning,
PAMI(45), No. 11, November 2023, pp. 12862-12877.
IEEE DOI 2310
BibRef
Earlier: A1, A3, A2, A5, Only:
StARformer: Transformer with State-Action-Reward Representations for Visual Reinforcement Learning,
ECCV22(XXIX:462-479).
Springer DOI 2211
BibRef

Duan, H.R.[Hao-Ran], Long, Y.[Yang], Wang, S.D.[Shi-Dong], Zhang, H.F.[Hao-Feng], Willcocks, C.G.[Chris G.], Shao, L.[Ling],
Dynamic Unary Convolution in Transformers,
PAMI(45), No. 11, November 2023, pp. 12747-12759.
IEEE DOI 2310
BibRef

Chen, S.M.[Shi-Ming], Hong, Z.M.[Zi-Ming], Hou, W.J.[Wen-Jin], Xie, G.S.[Guo-Sen], Song, Y.B.[Yi-Bing], Zhao, J.[Jian], You, X.G.[Xin-Ge], Yan, S.C.[Shui-Cheng], Shao, L.[Ling],
TransZero++: Cross Attribute-Guided Transformer for Zero-Shot Learning,
PAMI(45), No. 11, November 2023, pp. 12844-12861.
IEEE DOI 2310
BibRef

Qian, S.J.[Sheng-Ju], Zhu, Y.[Yi], Li, W.B.[Wen-Bo], Li, M.[Mu], Jia, J.Y.[Jia-Ya],
What Makes for Good Tokenizers in Vision Transformer?,
PAMI(45), No. 11, November 2023, pp. 13011-13023.
IEEE DOI 2310
BibRef

Sun, W.X.[Wei-Xuan], Qin, Z.[Zhen], Deng, H.[Hui], Wang, J.[Jianyuan], Zhang, Y.[Yi], Zhang, K.[Kaihao], Barnes, N.[Nick], Birchfield, S.[Stan], Kong, L.P.[Ling-Peng], Zhong, Y.[Yiran],
Vicinity Vision Transformer,
PAMI(45), No. 10, October 2023, pp. 12635-12649.
IEEE DOI 2310
BibRef

Cao, C.J.[Chen-Jie], Dong, Q.[Qiaole], Fu, Y.W.[Yan-Wei],
ZITS++: Image Inpainting by Improving the Incremental Transformer on Structural Priors,
PAMI(45), No. 10, October 2023, pp. 12667-12684.
IEEE DOI 2310
BibRef

Fang, Y.X.[Yu-Xin], Wang, X.G.[Xing-Gang], Wu, R.[Rui], Liu, W.Y.[Wen-Yu],
What Makes for Hierarchical Vision Transformer?,
PAMI(45), No. 10, October 2023, pp. 12714-12720.
IEEE DOI 2310
BibRef

Xu, P.[Peng], Zhu, X.T.[Xia-Tian], Clifton, D.A.[David A.],
Multimodal Learning With Transformers: A Survey,
PAMI(45), No. 10, October 2023, pp. 12113-12132.
IEEE DOI 2310
BibRef

Liu, J.[Jun], Guo, H.R.[Hao-Ran], He, Y.[Yile], Li, H.L.[Hua-Li],
Vision Transformer-Based Ensemble Learning for Hyperspectral Image Classification,
RS(15), No. 21, 2023, pp. 5208.
DOI Link 2311
BibRef

Lin, M.B.[Ming-Bao], Chen, M.Z.[Meng-Zhao], Zhang, Y.X.[Yu-Xin], Shen, C.H.[Chun-Hua], Ji, R.R.[Rong-Rong], Cao, L.J.[Liu-Juan],
Super Vision Transformer,
IJCV(131), No. 12, December 2023, pp. 3136-3151.
Springer DOI 2311
BibRef

Li, Z.Y.[Zhong-Yu], Gao, S.H.[Shang-Hua], Cheng, M.M.[Ming-Ming],
SERE: Exploring Feature Self-Relation for Self-Supervised Transformer,
PAMI(45), No. 12, December 2023, pp. 15619-15631.
IEEE DOI 2311
BibRef

Yuan, Y.H.[Yu-Hui], Liang, W.C.[Wei-Cong], Ding, H.H.[Heng-Hui], Liang, Z.H.[Zhan-Hao], Zhang, C.[Chao], Hu, H.[Han],
Expediting Large-Scale Vision Transformer for Dense Prediction Without Fine-Tuning,
PAMI(46), No. 1, January 2024, pp. 250-266.
IEEE DOI 2312
BibRef

Jiao, J.[Jiayu], Tang, Y.M.[Yu-Ming], Lin, K.Y.[Kun-Yu], Gao, Y.P.[Yi-Peng], Ma, A.J.[Andy J.], Wang, Y.W.[Yao-Wei], Zheng, W.S.[Wei-Shi],
DilateFormer: Multi-Scale Dilated Transformer for Visual Recognition,
MultMed(25), 2023, pp. 8906-8919.
IEEE DOI Code:
HTML Version. 2312
BibRef

Li, Z.[Zihan], Li, Y.X.[Yun-Xiang], Li, Q.D.[Qing-De], Wang, P.[Puyang], Guo, D.[Dazhou], Lu, L.[Le], Jin, D.[Dakai], Zhang, Y.[You], Hong, Q.Q.[Qing-Qi],
LViT: Language Meets Vision Transformer in Medical Image Segmentation,
MedImg(43), No. 1, January 2024, pp. 96-107.
IEEE DOI Code:
WWW Link. 2401
BibRef

Fu, K.[Kexue], Yuan, M.Z.[Ming-Zhi], Liu, S.L.[Shao-Lei], Wang, M.[Manning],
Boosting Point-BERT by Multi-Choice Tokens,
CirSysVideo(34), No. 1, January 2024, pp. 438-447.
IEEE DOI 2401
self-supervised pre-training task.
See also Point-BERT: Pre-training 3D Point Cloud Transformers with Masked Point Modeling. BibRef

Ghosal, S.S.[Soumya Suvra], Li, Y.X.[Yi-Xuan],
Are Vision Transformers Robust to Spurious Correlations?,
IJCV(132), No. 3, March 2024, pp. 689-709.
Springer DOI 2402
BibRef

Yan, F.Y.[Fang-Yuan], Yan, B.[Bin], Liang, W.[Wei], Pei, M.T.[Ming-Tao],
Token labeling-guided multi-scale medical image classification,
PRL(178), 2024, pp. 28-34.
Elsevier DOI 2402
Medical image classification, Vision transformer, Token labeling BibRef

Li, Y.X.[Yue-Xiang], Huang, Y.W.[Ya-Wen], He, N.[Nanjun], Ma, K.[Kai], Zheng, Y.F.[Ye-Feng],
Improving vision transformer for medical image classification via token-wise perturbation,
JVCIR(98), 2024, pp. 104022.
Elsevier DOI 2402
Self-supervised learning, Vision transformer, Image classification BibRef


Li, K.C.[Kun-Chang], Wang, Y.[Yali], Li, Y.Z.[Yi-Zhuo], Wang, Y.[Yi], He, Y.[Yinan], Wang, L.M.[Li-Min], Qiao, Y.[Yu],
Unmasked Teacher: Towards Training-Efficient Video Foundation Models,
ICCV23(19891-19903)
IEEE DOI 2401
BibRef

Ding, S.R.[Shuang-Rui], Zhao, P.S.[Pei-Sen], Zhang, X.P.[Xiao-Peng], Qian, R.[Rui], Xiong, H.K.[Hong-Kai], Tian, Q.[Qi],
Prune Spatio-temporal Tokens by Semantic-aware Temporal Accumulation,
ICCV23(16899-16910)
IEEE DOI Code:
WWW Link. 2401
BibRef

Chen, M.Z.[Meng-Zhao], Lin, M.[Mingbao], Lin, Z.H.[Zhi-Hang], Zhang, Y.X.[Yu-Xin], Chao, F.[Fei], Ji, R.R.[Rong-Rong],
SMMix: Self-Motivated Image Mixing for Vision Transformers,
ICCV23(17214-17224)
IEEE DOI Code:
WWW Link. 2401
BibRef

Kim, D.[Dahun], Angelova, A.[Anelia], Kuo, W.C.[Wei-Cheng],
Contrastive Feature Masking Open-Vocabulary Vision Transformer,
ICCV23(15556-15566)
IEEE DOI 2401
BibRef

Zhang, Y.[Yuke], Chen, D.[Dake], Kundu, S.[Souvik], Li, C.H.[Cheng-Hao], Beerel, P.A.[Peter A.],
SAL-ViT: Towards Latency Efficient Private Inference on ViT using Selective Attention Search with a Learnable Softmax Approximation,
ICCV23(5093-5102)
IEEE DOI 2401
BibRef

Li, Z.[Zhikai], Gu, Q.Y.[Qing-Yi],
I-ViT: Integer-only Quantization for Efficient Vision Transformer Inference,
ICCV23(17019-17029)
IEEE DOI Code:
WWW Link. 2401
BibRef

Frumkin, N.[Natalia], Gope, D.[Dibakar], Marculescu, D.[Diana],
Jumping through Local Minima: Quantization in the Loss Landscape of Vision Transformers,
ICCV23(16932-16942)
IEEE DOI Code:
WWW Link. 2401
BibRef

Li, Z.[Zhikai], Xiao, J.[Junrui], Yang, L.[Lianwei], Gu, Q.Y.[Qing-Yi],
RepQ-ViT: Scale Reparameterization for Post-Training Quantization of Vision Transformers,
ICCV23(17181-17190)
IEEE DOI Code:
WWW Link. 2401
BibRef

Havtorn, J.D.[Jakob Drachmann], Royer, A.[Amélie], Blankevoort, T.[Tijmen], Bejnordi, B.E.[Babak Ehteshami],
MSViT: Dynamic Mixed-scale Tokenization for Vision Transformers,
NIVT23(838-848)
IEEE DOI 2401
BibRef

Haurum, J.B.[Joakim Bruslund], Escalera, S.[Sergio], Taylor, G.W.[Graham W.], Moeslund, T.B.[Thomas B.],
Which Tokens to Use? Investigating Token Reduction in Vision Transformers,
NIVT23(773-783)
IEEE DOI Code:
WWW Link. 2401
BibRef

Wang, X.[Xijun], Chu, X.J.[Xiao-Jie], Han, C.[Chunrui], Zhang, X.Y.[Xiang-Yu],
SCSC: Spatial Cross-scale Convolution Module to Strengthen both CNNs and Transformers,
NIVT23(731-741)
IEEE DOI 2401
BibRef

Chen, Y.H.[Yi-Hsin], Weng, Y.C.[Ying-Chieh], Kao, C.H.[Chia-Hao], Chien, C.[Cheng], Chiu, W.C.[Wei-Chen], Peng, W.H.[Wen-Hsiao],
TransTIC: Transferring Transformer-based Image Compression from Human Perception to Machine Perception,
ICCV23(23240-23250)
IEEE DOI 2401
BibRef

Li, Y.[Yanyu], Hu, J.[Ju], Wen, Y.[Yang], Evangelidis, G.[Georgios], Salahi, K.[Kamyar], Wang, Y.Z.[Yan-Zhi], Tulyakov, S.[Sergey], Ren, J.[Jian],
Rethinking Vision Transformers for MobileNet Size and Speed,
ICCV23(16843-16854)
IEEE DOI 2401
BibRef

Nurgazin, M.[Maxat], Tu, N.A.[Nguyen Anh],
A Comparative Study of Vision Transformer Encoders and Few-shot Learning for Medical Image Classification,
CVAMD23(2505-2513)
IEEE DOI 2401
BibRef

Yeganeh, Y.[Yousef], Farshad, A.[Azade], Weinberger, P.[Peter], Ahmadi, S.A.[Seyed-Ahmad], Adeli, E.[Ehsan], Navab, N.[Nassir],
Transformers Pay Attention to Convolutions Leveraging Emerging Properties of ViTs by Dual Attention-Image Network,
CVAMD23(2296-2307)
IEEE DOI 2401
BibRef

Zheng, J.H.[Jia-Hao], Yang, L.Q.[Long-Qi], Li, Y.[Yiying], Yang, K.[Ke], Wang, Z.Y.[Zhi-Yuan], Zhou, J.[Jun],
Lightweight Vision Transformer with Spatial and Channel Enhanced Self-Attention,
REDLCV23(1484-1488)
IEEE DOI 2401
BibRef

Xie, W.[Wei], Zhao, Z.[Zimeng], Li, S.Y.[Shi-Ying], Zuo, B.H.[Bing-Hui], Wang, Y.G.[Yan-Gang],
Nonrigid Object Contact Estimation With Regional Unwrapping Transformer,
ICCV23(9308-9317)
IEEE DOI 2401
BibRef

Vasu, P.K.A.[Pavan Kumar Anasosalu], Gabriel, J.[James], Zhu, J.[Jeff], Tuzel, O.[Oncel], Ranjan, A.[Anurag],
FastViT: A Fast Hybrid Vision Transformer using Structural Reparameterization,
ICCV23(5762-5772)
IEEE DOI Code:
WWW Link. 2401
BibRef

Hyeon-Woo, N.[Nam], Yu-Ji, K.[Kim], Heo, B.[Byeongho], Han, D.Y.[Dong-Yoon], Oh, S.J.[Seong Joon], Oh, T.H.[Tae-Hyun],
Scratching Visual Transformer's Back with Uniform Attention,
ICCV23(5784-5795)
IEEE DOI 2401
BibRef

Tang, C.[Chen], Zhang, L.L.[Li Lyna], Jiang, H.Q.[Hui-Qiang], Xu, J.H.[Jia-Hang], Cao, T.[Ting], Zhang, Q.[Quanlu], Yang, Y.Q.[Yu-Qing], Wang, Z.[Zhi], Yang, M.[Mao],
ElasticViT: Conflict-aware Supernet Training for Deploying Fast Vision Transformer on Diverse Mobile Devices,
ICCV23(5806-5817)
IEEE DOI 2401
BibRef

Ren, S.[Sucheng], Yang, X.Y.[Xing-Yi], Liu, S.[Songhua], Wang, X.C.[Xin-Chao],
SG-Former: Self-guided Transformer with Evolving Token Reallocation,
ICCV23(5980-5991)
IEEE DOI Code:
WWW Link. 2401
BibRef

Lin, W.F.[Wei-Feng], Wu, Z.H.[Zi-Heng], Chen, J.[Jiayu], Huang, J.[Jun], Jin, L.W.[Lian-Wen],
Scale-Aware Modulation Meet Transformer,
ICCV23(5992-6003)
IEEE DOI Code:
WWW Link. 2401
BibRef

Zhang, H.K.[Hao-Kui], Hu, W.Z.[Wen-Ze], Wang, X.Y.[Xiao-Yu],
Fcaformer: Forward Cross Attention in Hybrid Vision Transformer,
ICCV23(6037-6046)
IEEE DOI Code:
WWW Link. 2401
BibRef

He, Y.F.[Ye-Fei], Lou, Z.Y.[Zhen-Yu], Zhang, L.[Luoming], Liu, J.[Jing], Wu, W.J.[Wei-Jia], Zhou, H.[Hong], Zhuang, B.[Bohan],
BiViT: Extremely Compressed Binary Vision Transformers,
ICCV23(5628-5640)
IEEE DOI 2401
BibRef

Dutson, M.[Matthew], Li, Y.[Yin], Gupta, M.[Mohit],
Eventful Transformers: Leveraging Temporal Redundancy in Vision Transformers,
ICCV23(16865-16877)
IEEE DOI 2401
BibRef

Wang, Z.Q.[Zi-Qing], Fang, Y.T.[Yue-Tong], Cao, J.H.[Jia-Hang], Zhang, Q.[Qiang], Wang, Z.[Zhongrui], Xu, R.[Renjing],
Masked Spiking Transformer,
ICCV23(1761-1771)
IEEE DOI Code:
WWW Link. 2401
BibRef

Peebles, W.[William], Xie, S.[Saining],
Scalable Diffusion Models with Transformers,
ICCV23(4172-4182)
IEEE DOI 2401
BibRef

Zeng, W.X.[Wen-Xuan], Li, M.[Meng], Xiong, W.J.[Wen-Jie], Tong, T.[Tong], Lu, W.J.[Wen-Jie], Tan, J.[Jin], Wang, R.S.[Run-Sheng], Huang, R.[Ru],
MPCViT: Searching for Accurate and Efficient MPC-Friendly Vision Transformer with Heterogeneous Attention,
ICCV23(5029-5040)
IEEE DOI Code:
WWW Link. 2401
BibRef

Mentzer, F.[Fabian], Agustsson, E.[Eirikur], Tschannen, M.[Michael],
M2T: Masking Transformers Twice for Faster Decoding,
ICCV23(5317-5326)
IEEE DOI 2401
BibRef

Psomas, B.[Bill], Kakogeorgiou, I.[Ioannis], Karantzalos, K.[Konstantinos], Avrithis, Y.[Yannis],
Keep It SimPool: Who Said Supervised Transformers Suffer from Attention Deficit?,
ICCV23(5327-5337)
IEEE DOI Code:
WWW Link. 2401
BibRef

Xiao, H.[Han], Zheng, W.Z.[Wen-Zhao], Zhu, Z.[Zheng], Zhou, J.[Jie], Lu, J.W.[Ji-Wen],
Token-Label Alignment for Vision Transformers,
ICCV23(5472-5481)
IEEE DOI Code:
WWW Link. 2401
BibRef

Yu, R.Y.[Run-Yi], Wang, Z.N.[Zhen-Nan], Wang, Y.H.[Yin-Huai], Li, K.[Kehan], Liu, C.[Chang], Duan, H.[Haoyi], Ji, X.Y.[Xiang-Yang], Chen, J.[Jie],
LaPE: Layer-adaptive Position Embedding for Vision Transformers with Independent Layer Normalization,
ICCV23(5863-5873)
IEEE DOI 2401
BibRef

Roy, A.[Anurag], Verma, V.K.[Vinay K.], Voonna, S.[Sravan], Ghosh, K.[Kripabandhu], Ghosh, S.[Saptarshi], Das, A.[Abir],
Exemplar-Free Continual Transformer with Convolutions,
ICCV23(5874-5884)
IEEE DOI 2401
BibRef

Xu, Y.X.[Yi-Xing], Li, C.[Chao], Li, D.[Dong], Sheng, X.[Xiao], Jiang, F.[Fan], Tian, L.[Lu], Sirasao, A.[Ashish],
FDViT: Improve the Hierarchical Architecture of Vision Transformer,
ICCV23(5927-5937)
IEEE DOI 2401
BibRef

Han, D.C.[Dong-Chen], Pan, X.[Xuran], Han, Y.Z.[Yi-Zeng], Song, S.[Shiji], Huang, G.[Gao],
FLatten Transformer: Vision Transformer using Focused Linear Attention,
ICCV23(5938-5948)
IEEE DOI Code:
WWW Link. 2401
BibRef

Chen, Y.J.[Yong-Jie], Liu, H.M.[Hong-Min], Yin, H.R.[Hao-Ran], Fan, B.[Bin],
Building Vision Transformers with Hierarchy Aware Feature Aggregation,
ICCV23(5885-5895)
IEEE DOI 2401
BibRef

Quétu, V.[Victor], Milovanovic, M.[Marta], Tartaglione, E.[Enzo],
Sparse Double Descent in Vision Transformers: Real or Phantom Threat?,
CIAP23(II:490-502).
Springer DOI 2312
BibRef

Ak, K.E.[Kenan Emir], Lee, G.G.[Gwang-Gook], Xu, Y.[Yan], Shen, M.W.[Ming-Wei],
Leveraging Efficient Training and Feature Fusion in Transformers for Multimodal Classification,
ICIP23(1420-1424)
IEEE DOI 2312
BibRef

Popovic, N.[Nikola], Paudel, D.P.[Danda Pani], Probst, T.[Thomas], Van Gool, L.J.[Luc J.],
Token-Consistent Dropout For Calibrated Vision Transformers,
ICIP23(1030-1034)
IEEE DOI 2312
BibRef

Sajjadi, M.S.M.[Mehdi S. M.], Mahendran, A.[Aravindh], Kipf, T.[Thomas], Pot, E.[Etienne], Duckworth, D.[Daniel], Lucic, M.[Mario], Greff, K.[Klaus],
RUST: Latent Neural Scene Representations from Unposed Imagery,
CVPR23(17297-17306)
IEEE DOI 2309
BibRef

Bowman, B.[Benjamin], Achille, A.[Alessandro], Zancato, L.[Luca], Trager, M.[Matthew], Perera, P.[Pramuditha], Paolini, G.[Giovanni], Soatto, S.[Stefano],
À-la-carte Prompt Tuning (APT): Combining Distinct Data Via Composable Prompting,
CVPR23(14984-14993)
IEEE DOI 2309
BibRef

Nakhli, R.[Ramin], Moghadam, P.A.[Puria Azadi], Mi, H.Y.[Hao-Yang], Farahani, H.[Hossein], Baras, A.[Alexander], Gilks, B.[Blake], Bashashati, A.[Ali],
Sparse Multi-Modal Graph Transformer with Shared-Context Processing for Representation Learning of Giga-pixel Images,
CVPR23(11547-11557)
IEEE DOI 2309
BibRef

Gärtner, E.[Erik], Metz, L.[Luke], Andriluka, M.[Mykhaylo], Freeman, C.D.[C. Daniel], Sminchisescu, C.[Cristian],
Transformer-Based Learned Optimization,
CVPR23(11970-11979)
IEEE DOI 2309
BibRef

Li, J.C.[Jia-Chen], Hassani, A.[Ali], Walton, S.[Steven], Shi, H.[Humphrey],
ConvMLP: Hierarchical Convolutional MLPs for Vision,
WFM23(6307-6316)
IEEE DOI 2309
multi-layer perceptron BibRef

Walmer, M.[Matthew], Suri, S.[Saksham], Gupta, K.[Kamal], Shrivastava, A.[Abhinav],
Teaching Matters: Investigating the Role of Supervision in Vision Transformers,
CVPR23(7486-7496)
IEEE DOI 2309
BibRef

Wang, S.G.[Shi-Guang], Xie, T.[Tao], Cheng, J.[Jian], Zhang, X.C.[Xing-Cheng], Liu, H.J.[Hai-Jun],
MDL-NAS: A Joint Multi-domain Learning Framework for Vision Transformer,
CVPR23(20094-20104)
IEEE DOI 2309
BibRef

Ko, D.[Dohwan], Choi, J.[Joonmyung], Choi, H.K.[Hyeong Kyu], On, K.W.[Kyoung-Woon], Roh, B.[Byungseok], Kim, H.W.J.[Hyun-Woo J.],
MELTR: Meta Loss Transformer for Learning to Fine-tune Video Foundation Models,
CVPR23(20105-20115)
IEEE DOI 2309
BibRef

Ren, S.[Sucheng], Wei, F.Y.[Fang-Yun], Zhang, Z.[Zheng], Hu, H.[Han],
TinyMIM: An Empirical Study of Distilling MIM Pre-trained Models,
CVPR23(3687-3697)
IEEE DOI 2309
BibRef

He, J.F.[Jian-Feng], Gao, Y.[Yuan], Zhang, T.Z.[Tian-Zhu], Zhang, Z.[Zhe], Wu, F.[Feng],
D2Former: Jointly Learning Hierarchical Detectors and Contextual Descriptors via Agent-Based Transformers,
CVPR23(2904-2914)
IEEE DOI 2309
BibRef

Chen, X.Y.[Xuan-Yao], Liu, Z.J.[Zhi-Jian], Tang, H.T.[Hao-Tian], Yi, L.[Li], Zhao, H.[Hang], Han, S.[Song],
SparseViT: Revisiting Activation Sparsity for Efficient High-Resolution Vision Transformer,
CVPR23(2061-2070)
IEEE DOI 2309
BibRef

Wei, S.Y.[Si-Yuan], Ye, T.Z.[Tian-Zhu], Zhang, S.[Shen], Tang, Y.[Yao], Liang, J.J.[Jia-Jun],
Joint Token Pruning and Squeezing Towards More Aggressive Compression of Vision Transformers,
CVPR23(2092-2101)
IEEE DOI 2309
BibRef

Lin, Y.B.[Yan-Bo], Sung, Y.L.[Yi-Lin], Lei, J.[Jie], Bansal, M.[Mohit], Bertasius, G.[Gedas],
Vision Transformers are Parameter-Efficient Audio-Visual Learners,
CVPR23(2299-2309)
IEEE DOI 2309
BibRef

Das, R.[Rajshekhar], Dukler, Y.[Yonatan], Ravichandran, A.[Avinash], Swaminathan, A.[Ashwin],
Learning Expressive Prompting With Residuals for Vision Transformers,
CVPR23(3366-3377)
IEEE DOI 2309
BibRef

Zheng, M.X.[Meng-Xin], Lou, Q.[Qian], Jiang, L.[Lei],
TrojViT: Trojan Insertion in Vision Transformers,
CVPR23(4025-4034)
IEEE DOI 2309
BibRef

Guo, Y.[Yong], Stutz, D.[David], Schiele, B.[Bernt],
Improving Robustness of Vision Transformers by Reducing Sensitivity to Patch Corruptions,
CVPR23(4108-4118)
IEEE DOI 2309
BibRef

Li, Y.X.[Yan-Xi], Xu, C.[Chang],
Trade-off between Robustness and Accuracy of Vision Transformers,
CVPR23(7558-7568)
IEEE DOI 2309
BibRef

Tarasiou, M.[Michail], Chavez, E.[Erik], Zafeiriou, S.[Stefanos],
ViTs for SITS: Vision Transformers for Satellite Image Time Series,
CVPR23(10418-10428)
IEEE DOI 2309
BibRef

Yu, Z.Z.[Zhong-Zhi], Wu, S.[Shang], Fu, Y.G.[Yong-Gan], Zhang, S.[Shunyao], Lin, Y.Y.C.[Ying-Yan Celine],
Hint-Aug: Drawing Hints from Foundation Vision Transformers towards Boosted Few-shot Parameter-Efficient Tuning,
CVPR23(11102-11112)
IEEE DOI 2309
BibRef

Kim, D.[Dahun], Angelova, A.[Anelia], Kuo, W.C.[Wei-Cheng],
Region-Aware Pretraining for Open-Vocabulary Object Detection with Vision Transformers,
CVPR23(11144-11154)
IEEE DOI 2309
BibRef

Hou, J.[Ji], Dai, X.L.[Xiao-Liang], He, Z.J.[Zi-Jian], Dai, A.[Angela], Nießner, M.[Matthias],
Mask3D: Pretraining 2D Vision Transformers by Learning Masked 3D Priors,
CVPR23(13510-13519)
IEEE DOI 2309
BibRef

Xu, Z.Z.[Zheng-Zhuo], Liu, R.[Ruikang], Yang, S.[Shuo], Chai, Z.[Zenghao], Yuan, C.[Chun],
Learning Imbalanced Data with Vision Transformers,
CVPR23(15793-15803)
IEEE DOI 2309
BibRef

Zhang, J.P.[Jian-Ping], Huang, Y.Z.[Yi-Zhan], Wu, W.B.[Wei-Bin], Lyu, M.R.[Michael R.],
Transferable Adversarial Attacks on Vision Transformers with Token Gradient Regularization,
CVPR23(16415-16424)
IEEE DOI 2309
BibRef

Yang, H.[Huanrui], Yin, H.X.[Hong-Xu], Shen, M.[Maying], Molchanov, P.[Pavlo], Li, H.[Hai], Kautz, J.[Jan],
Global Vision Transformer Pruning with Hessian-Aware Saliency,
CVPR23(18547-18557)
IEEE DOI 2309
BibRef

Nakamura, R.[Ryo], Kataoka, H.[Hirokatsu], Takashima, S.[Sora], Noriega, E.J.M.[Edgar Josafat Martinez], Yokota, R.[Rio], Inoue, N.[Nakamasa],
Pre-training Vision Transformers with Very Limited Synthesized Images,
ICCV23(20303-20312)
IEEE DOI 2401
BibRef

Takashima, S.[Sora], Hayamizu, R.[Ryo], Inoue, N.[Nakamasa], Kataoka, H.[Hirokatsu], Yokota, R.[Rio],
Visual Atoms: Pre-Training Vision Transformers with Sinusoidal Waves,
CVPR23(18579-18588)
IEEE DOI 2309
BibRef

Kang, D.[Dahyun], Koniusz, P.[Piotr], Cho, M.[Minsu], Murray, N.[Naila],
Distilling Self-Supervised Vision Transformers for Weakly-Supervised Few-Shot Classification and Segmentation,
CVPR23(19627-19638)
IEEE DOI 2309
BibRef

Liu, Y.J.[Yi-Jiang], Yang, H.R.[Huan-Rui], Dong, Z.[Zhen], Keutzer, K.[Kurt], Du, L.[Li], Zhang, S.H.[Shang-Hang],
NoisyQuant: Noisy Bias-Enhanced Post-Training Activation Quantization for Vision Transformers,
CVPR23(20321-20330)
IEEE DOI 2309
BibRef

Park, J.[Jeongsoo], Johnson, J.[Justin],
RGB No More: Minimally-Decoded JPEG Vision Transformers,
CVPR23(22334-22346)
IEEE DOI 2309
BibRef

Yu, C.[Chong], Chen, T.[Tao], Gan, Z.X.[Zhong-Xue], Fan, J.Y.[Jia-Yuan],
Boost Vision Transformer with GPU-Friendly Sparsity and Quantization,
CVPR23(22658-22668)
IEEE DOI 2309
BibRef

Bao, F.[Fan], Nie, S.[Shen], Xue, K.W.[Kai-Wen], Cao, Y.[Yue], Li, C.X.[Chong-Xuan], Su, H.[Hang], Zhu, J.[Jun],
All are Worth Words: A ViT Backbone for Diffusion Models,
CVPR23(22669-22679)
IEEE DOI 2309
BibRef

Li, B.[Bonan], Hu, Y.[Yinhan], Nie, X.C.[Xue-Cheng], Han, C.Y.[Cong-Ying], Jiang, X.J.[Xiang-Jian], Guo, T.D.[Tian-De], Liu, L.Q.[Luo-Qi],
DropKey for Vision Transformer,
CVPR23(22700-22709)
IEEE DOI 2309
BibRef

Lan, S.Y.[Shi-Yi], Yang, X.[Xitong], Yu, Z.[Zhiding], Wu, Z.[Zuxuan], Alvarez, J.M.[Jose M.], Anandkumar, A.[Anima],
Vision Transformers are Good Mask Auto-Labelers,
CVPR23(23745-23755)
IEEE DOI 2309
BibRef

Yu, L.[Lu], Xiang, W.[Wei],
X-Pruner: eXplainable Pruning for Vision Transformers,
CVPR23(24355-24363)
IEEE DOI 2309
BibRef

Singh, A.[Apoorv],
Training Strategies for Vision Transformers for Object Detection,
WAD23(110-118)
IEEE DOI 2309
BibRef

Hukkelås, H.[Håkon], Lindseth, F.[Frank],
Does Image Anonymization Impact Computer Vision Training?,
WAD23(140-150)
IEEE DOI 2309
BibRef

Marnissi, M.A.[Mohamed Amine],
Revolutionizing Thermal Imaging: GAN-Based Vision Transformers for Image Enhancement,
ICIP23(2735-2739)
IEEE DOI 2312
BibRef

Marnissi, M.A.[Mohamed Amine], Fathallah, A.[Abir],
GAN-based Vision Transformer for High-Quality Thermal Image Enhancement,
GCV23(817-825)
IEEE DOI 2309
BibRef

Scheibenreif, L.[Linus], Mommert, M.[Michael], Borth, D.[Damian],
Masked Vision Transformers for Hyperspectral Image Classification,
EarthVision23(2166-2176)
IEEE DOI 2309
BibRef

Komorowski, P.[Piotr], Baniecki, H.[Hubert], Biecek, P.[Przemyslaw],
Towards Evaluating Explanations of Vision Transformers for Medical Imaging,
XAI4CV23(3726-3732)
IEEE DOI 2309
BibRef

Nalmpantis, A.[Angelos], Panagiotopoulos, A.[Apostolos], Gkountouras, J.[John], Papakostas, K.[Konstantinos], Aziz, W.[Wilker],
Vision DiffMask: Faithful Interpretation of Vision Transformers with Differentiable Patch Masking,
XAI4CV23(3756-3763)
IEEE DOI 2309
BibRef

Ronen, T.[Tomer], Levy, O.[Omer], Golbert, A.[Avram],
Vision Transformers with Mixed-Resolution Tokenization,
ECV23(4613-4622)
IEEE DOI 2309
BibRef

Le, P.H.C.[Phuoc-Hoan Charles], Li, X.[Xinlin],
BinaryViT: Pushing Binary Vision Transformers Towards Convolutional Models,
ECV23(4665-4674)
IEEE DOI 2309
BibRef

Ma, D.[Dongning], Zhao, P.F.[Peng-Fei], Jiao, X.[Xun],
PerfHD: Efficient ViT Architecture Performance Ranking using Hyperdimensional Computing,
NAS23(2230-2237)
IEEE DOI 2309
BibRef

Wang, J.[Jun], Alamayreh, O.[Omran], Tondi, B.[Benedetta], Barni, M.[Mauro],
Open Set Classification of GAN-based Image Manipulations via a ViT-based Hybrid Architecture,
WMF23(953-962)
IEEE DOI 2309
BibRef

Tian, R.[Rui], Wu, Z.[Zuxuan], Dai, Q.[Qi], Hu, H.[Han], Qiao, Y.[Yu], Jiang, Y.G.[Yu-Gang],
ResFormer: Scaling ViTs with Multi-Resolution Training,
CVPR23(22721-22731)
IEEE DOI 2309
BibRef

Li, Y.[Yi], Min, K.[Kyle], Tripathi, S.[Subarna], Vasconcelos, N.M.[Nuno M.],
SViTT: Temporal Learning of Sparse Video-Text Transformers,
CVPR23(18919-18929)
IEEE DOI 2309
BibRef

Beyer, L.[Lucas], Izmailov, P.[Pavel], Kolesnikov, A.[Alexander], Caron, M.[Mathilde], Kornblith, S.[Simon], Zhai, X.H.[Xiao-Hua], Minderer, M.[Matthias], Tschannen, M.[Michael], Alabdulmohsin, I.[Ibrahim], Pavetic, F.[Filip],
FlexiViT: One Model for All Patch Sizes,
CVPR23(14496-14506)
IEEE DOI 2309
BibRef

Chang, S.N.[Shu-Ning], Wang, P.[Pichao], Lin, M.[Ming], Wang, F.[Fan], Zhang, D.J.H.[David Jun-Hao], Jin, R.[Rong], Shou, M.Z.[Mike Zheng],
Making Vision Transformers Efficient from A Token Sparsification View,
CVPR23(6195-6205)
IEEE DOI 2309
BibRef

Naeem, M.F.[Muhammad Ferjad], Khan, M.G.Z.A.[Muhammad Gul Zain Ali], Xian, Y.Q.[Yong-Qin], Afzal, M.Z.[Muhammad Zeshan], Stricker, D.[Didier], Van Gool, L.J.[Luc J.], Tombari, F.[Federico],
I2MVFormer: Large Language Model Generated Multi-View Document Supervision for Zero-Shot Image Classification,
CVPR23(15169-15179)
IEEE DOI 2309
BibRef

Phan, L.[Lam], Nguyen, H.T.H.[Hiep Thi Hong], Warrier, H.[Harikrishna], Gupta, Y.[Yogesh],
Patch Embedding as Local Features: Unifying Deep Local and Global Features via Vision Transformer for Image Retrieval,
ACCV22(II:204-221).
Springer DOI 2307
BibRef

Guo, X.D.[Xin-Dong], Sun, Y.[Yu], Zhao, R.[Rong], Kuang, L.Q.[Li-Qun], Han, X.[Xie],
SWPT: Spherical Window-based Point Cloud Transformer,
ACCV22(I:396-412).
Springer DOI 2307
BibRef

Wang, W.J.[Wen-Ju], Chen, G.[Gang], Zhou, H.R.[Hao-Ran], Wang, X.L.[Xiao-Lin],
OVPT: Optimal Viewset Pooling Transformer for 3d Object Recognition,
ACCV22(I:486-503).
Springer DOI 2307
BibRef

Kim, D.[Daeho], Kim, J.[Jaeil],
Vision Transformer Compression and Architecture Exploration with Efficient Embedding Space Search,
ACCV22(III:524-540).
Springer DOI 2307
BibRef

Lee, Y.S.[Yun-Sung], Lee, G.[Gyuseong], Ryoo, K.[Kwangrok], Go, H.[Hyojun], Park, J.[Jihye], Kim, S.[Seungryong],
Towards Flexible Inductive Bias via Progressive Reparameterization Scheduling,
VIPriors22(706-720).
Springer DOI 2304
Transformers vs. CNNs: different benefits; combines the best of both. BibRef

Amir, S.[Shir], Gandelsman, Y.[Yossi], Bagon, S.[Shai], Dekel, T.[Tali],
On the Effectiveness of VIT Features as Local Semantic Descriptors,
SelfLearn22(39-55).
Springer DOI 2304
BibRef

Deng, X.[Xuran], Liu, C.B.[Chuan-Bin], Lu, Z.Y.[Zhi-Ying],
Recombining Vision Transformer Architecture for Fine-grained Visual Categorization,
MMMod23(II: 127-138).
Springer DOI 2304
BibRef

Tonkes, V.[Vincent], Sabatelli, M.[Matthia],
How Well Do Vision Transformers (VTs) Transfer to the Non-natural Image Domain? An Empirical Study Involving Art Classification,
VisArt22(234-250).
Springer DOI 2304
BibRef

Rangrej, S.B.[Samrudhdhi B.], Liang, K.J.[Kevin J.], Hassner, T.[Tal], Clark, J.J.[James J.],
GliTr: Glimpse Transformers with Spatiotemporal Consistency for Online Action Prediction,
WACV23(3402-3412)
IEEE DOI 2302
Predictive models, Transformers, Cameras, Spatiotemporal phenomena, Sensors, Observability BibRef

Mo, S.T.[Shen-Tong], Sun, Z.[Zhun], Li, C.[Chao],
Multi-level Contrastive Learning for Self-Supervised Vision Transformers,
WACV23(2777-2786)
IEEE DOI 2302
Training, Representation learning, Head, Semantic segmentation, Self-supervised learning, visual reasoning BibRef

Liu, Y.[Yue], Matsoukas, C.[Christos], Strand, F.[Fredrik], Azizpour, H.[Hossein], Smith, K.[Kevin],
PatchDropout: Economizing Vision Transformers Using Patch Dropout,
WACV23(3942-3951)
IEEE DOI 2302
Training, Image resolution, Computational modeling, Biological system modeling, Memory management, Transformers, Biomedical/healthcare/medicine BibRef

Marin, D.[Dmitrii], Chang, J.H.R.[Jen-Hao Rick], Ranjan, A.[Anurag], Prabhu, A.[Anish], Rastegari, M.[Mohammad], Tuzel, O.[Oncel],
Token Pooling in Vision Transformers for Image Classification,
WACV23(12-21)
IEEE DOI 2302
Filtering, Semantic segmentation, Pose estimation, Transformers, Encoding, Convolutional neural networks, and algorithms (including transfer) BibRef

Song, C.H.[Chull Hwan], Yoon, J.Y.[Joo-Young], Choi, S.[Shunghyun], Avrithis, Y.[Yannis],
Boosting vision transformers for image retrieval,
WACV23(107-117)
IEEE DOI 2302
Training, Location awareness, Image retrieval, Self-supervised learning, Image representation, Transformers BibRef

Yang, J.[Jinyu], Liu, J.J.[Jing-Jing], Xu, N.[Ning], Huang, J.Z.[Jun-Zhou],
TVT: Transferable Vision Transformer for Unsupervised Domain Adaptation,
WACV23(520-530)
IEEE DOI 2302
Benchmark testing, Image representation, Transformers, Convolutional neural networks, Task analysis, and algorithms (including transfer) BibRef

Saavedra-Ruiz, M.[Miguel], Morin, S.[Sacha], Paull, L.[Liam],
Monocular Robot Navigation with Self-Supervised Pretrained Vision Transformers,
CRV22(197-204)
IEEE DOI 2301
Adaptation models, Image segmentation, Image resolution, Navigation, Transformers, Robot sensing systems, Visual Servoing BibRef

Patel, K.[Krushi], Bur, A.M.[Andrés M.], Li, F.J.[Feng-Jun], Wang, G.H.[Guang-Hui],
Aggregating Global Features into Local Vision Transformer,
ICPR22(1141-1147)
IEEE DOI 2212
Source coding, Computational modeling, Information processing, Performance gain, Transformers BibRef

Shen, Z.Q.[Zhi-Qiang], Liu, Z.[Zechun], Xing, E.[Eric],
Sliced Recursive Transformer,
ECCV22(XXIV:727-744).
Springer DOI 2211
BibRef

Shao, Y.[Yidi], Loy, C.C.[Chen Change], Dai, B.[Bo],
Transformer with Implicit Edges for Particle-Based Physics Simulation,
ECCV22(XIX:549-564).
Springer DOI 2211
BibRef

Wang, W.[Wen], Zhang, J.[Jing], Cao, Y.[Yang], Shen, Y.L.[Yong-Liang], Tao, D.C.[Da-Cheng],
Towards Data-Efficient Detection Transformers,
ECCV22(IX:88-105).
Springer DOI 2211
BibRef

Lorenzana, M.B.[Marlon Bran], Engstrom, C.[Craig], Chandra, S.S.[Shekhar S.],
Transformer Compressed Sensing Via Global Image Tokens,
ICIP22(3011-3015)
IEEE DOI 2211
Training, Limiting, Image resolution, Neural networks, Image representation, Transformers, MRI BibRef

Lu, X.Y.[Xiao-Yong], Du, S.[Songlin],
NCTR: Neighborhood Consensus Transformer for Feature Matching,
ICIP22(2726-2730)
IEEE DOI 2211
Learning systems, Impedance matching, Aggregates, Pose estimation, Neural networks, Transformers, Local feature matching, graph neural network BibRef

Jeny, A.A.[Afsana Ahsan], Junayed, M.S.[Masum Shah], Islam, M.B.[Md Baharul],
An Efficient End-To-End Image Compression Transformer,
ICIP22(1786-1790)
IEEE DOI 2211
Image coding, Correlation, Limiting, Computational modeling, Rate-distortion, Video compression, Transformers, entropy model BibRef

Bai, J.W.[Jia-Wang], Yuan, L.[Li], Xia, S.T.[Shu-Tao], Yan, S.C.[Shui-Cheng], Li, Z.F.[Zhi-Feng], Liu, W.[Wei],
Improving Vision Transformers by Revisiting High-Frequency Components,
ECCV22(XXIV:1-18).
Springer DOI 2211
BibRef

Li, K.[Kehan], Yu, R.[Runyi], Wang, Z.[Zhennan], Yuan, L.[Li], Song, G.[Guoli], Chen, J.[Jie],
Locality Guidance for Improving Vision Transformers on Tiny Datasets,
ECCV22(XXIV:110-127).
Springer DOI 2211
BibRef

Tu, Z.Z.[Zheng-Zhong], Talebi, H.[Hossein], Zhang, H.[Han], Yang, F.[Feng], Milanfar, P.[Peyman], Bovik, A.C.[Alan C.], Li, Y.[Yinxiao],
MaxViT: Multi-axis Vision Transformer,
ECCV22(XXIV:459-479).
Springer DOI 2211
BibRef

Yang, R.[Rui], Ma, H.L.[Hai-Long], Wu, J.[Jie], Tang, Y.S.[Yan-Song], Xiao, X.F.[Xue-Feng], Zheng, M.[Min], Li, X.[Xiu],
ScalableViT: Rethinking the Context-Oriented Generalization of Vision Transformer,
ECCV22(XXIV:480-496).
Springer DOI 2211
BibRef

Touvron, H.[Hugo], Cord, M.[Matthieu], El-Nouby, A.[Alaaeldin], Verbeek, J.[Jakob], Jégou, H.[Hervé],
Three Things Everyone Should Know About Vision Transformers,
ECCV22(XXIV:497-515).
Springer DOI 2211
BibRef

Touvron, H.[Hugo], Cord, M.[Matthieu], Jégou, H.[Hervé],
DeiT III: Revenge of the ViT,
ECCV22(XXIV:516-533).
Springer DOI 2211
BibRef

Li, Y.H.[Yang-Hao], Mao, H.Z.[Han-Zi], Girshick, R.[Ross], He, K.M.[Kai-Ming],
Exploring Plain Vision Transformer Backbones for Object Detection,
ECCV22(IX:280-296).
Springer DOI 2211
BibRef

Yu, Q.H.[Qi-Hang], Wang, H.Y.[Hui-Yu], Qiao, S.Y.[Si-Yuan], Collins, M.[Maxwell], Zhu, Y.K.[Yu-Kun], Adam, H.[Hartwig], Yuille, A.L.[Alan L.], Chen, L.C.[Liang-Chieh],
k-means Mask Transformer,
ECCV22(XXIX:288-307).
Springer DOI 2211
BibRef

Pham, K.[Khoi], Kafle, K.[Kushal], Lin, Z.[Zhe], Ding, Z.H.[Zhi-Hong], Cohen, S.[Scott], Tran, Q.[Quan], Shrivastava, A.[Abhinav],
Improving Closed and Open-Vocabulary Attribute Prediction Using Transformers,
ECCV22(XXV:201-219).
Springer DOI 2211
BibRef

Yu, W.X.[Wen-Xin], Zhang, H.[Hongru], Lan, T.X.[Tian-Xiang], Hu, Y.C.[Yu-Cheng], Yin, D.[Dong],
CBPT: A New Backbone for Enhancing Information Transmission of Vision Transformers,
ICIP22(156-160)
IEEE DOI 2211
Merging, Information processing, Object detection, Transformers, Computational complexity, Vision Transformer, Backbone BibRef

Takeda, M.[Mana], Yanai, K.[Keiji],
Continual Learning in Vision Transformer,
ICIP22(616-620)
IEEE DOI 2211
Learning systems, Image recognition, Transformers, Natural language processing, Convolutional neural networks, Vision Transformer BibRef

Zhou, W.L.[Wei-Lian], Kamata, S.I.[Sei-Ichiro], Luo, Z.[Zhengbo], Xue, X.[Xi],
Rethinking Unified Spectral-Spatial-Based Hyperspectral Image Classification Under 3D Configuration of Vision Transformer,
ICIP22(711-715)
IEEE DOI 2211
Flowcharts, Correlation, Convolution, Transformers, Hyperspectral image classification, 3D coordinate positional embedding BibRef

Li, J.[Junbo], Zhang, H.[Huan], Xie, C.[Cihang],
ViP: Unified Certified Detection and Recovery for Patch Attack with Vision Transformers,
ECCV22(XXV:573-587).
Springer DOI 2211
BibRef

Cao, Y.H.[Yun-Hao], Yu, H.[Hao], Wu, J.X.[Jian-Xin],
Training Vision Transformers with only 2040 Images,
ECCV22(XXV:220-237).
Springer DOI 2211
BibRef

Wang, C.[Cong], Xu, H.M.[Hong-Min], Zhang, X.[Xiong], Wang, L.[Li], Zheng, Z.[Zhitong], Liu, H.F.[Hai-Feng],
Convolutional Embedding Makes Hierarchical Vision Transformer Stronger,
ECCV22(XX:739-756).
Springer DOI 2211
BibRef

Wu, B.[Boxi], Gu, J.D.[Jin-Dong], Li, Z.F.[Zhi-Feng], Cai, D.[Deng], He, X.F.[Xiao-Fei], Liu, W.[Wei],
Towards Efficient Adversarial Training on Vision Transformers,
ECCV22(XIII:307-325).
Springer DOI 2211
BibRef

Gu, J.D.[Jin-Dong], Tresp, V.[Volker], Qin, Y.[Yao],
Are Vision Transformers Robust to Patch Perturbations?,
ECCV22(XII:404-421).
Springer DOI 2211
BibRef

Zong, Z.[Zhuofan], Li, K.[Kunchang], Song, G.[Guanglu], Wang, Y.[Yali], Qiao, Y.[Yu], Leng, B.[Biao], Liu, Y.[Yu],
Self-slimmed Vision Transformer,
ECCV22(XI:432-448).
Springer DOI 2211
BibRef

Fayyaz, M.[Mohsen], Koohpayegani, S.A.[Soroush Abbasi], Jafari, F.R.[Farnoush Rezaei], Sengupta, S.[Sunando], Joze, H.R.V.[Hamid Reza Vaezi], Sommerlade, E.[Eric], Pirsiavash, H.[Hamed], Gall, J.[Jürgen],
Adaptive Token Sampling for Efficient Vision Transformers,
ECCV22(XI:396-414).
Springer DOI 2211
BibRef

Li, Z.K.[Zhi-Kai], Ma, L.P.[Li-Ping], Chen, M.J.[Meng-Juan], Xiao, J.R.[Jun-Rui], Gu, Q.Y.[Qing-Yi],
Patch Similarity Aware Data-Free Quantization for Vision Transformers,
ECCV22(XI:154-170).
Springer DOI 2211
BibRef

Weng, Z.J.[Ze-Jia], Yang, X.T.[Xi-Tong], Li, A.[Ang], Wu, Z.X.[Zu-Xuan], Jiang, Y.G.[Yu-Gang],
Semi-supervised Vision Transformers,
ECCV22(XXX:605-620).
Springer DOI 2211
BibRef

Su, T.[Tong], Ye, S.[Shuo], Song, C.Q.[Cheng-Qun], Cheng, J.[Jun],
Mask-ViT: an Object Mask Embedding in Vision Transformer for Fine-Grained Visual Classification,
ICIP22(1626-1630)
IEEE DOI 2211
Knowledge engineering, Visualization, Focusing, Interference, Benchmark testing, Transformers, Feature extraction, Knowledge Embedding BibRef

Gai, L.[Lulu], Chen, W.[Wei], Gao, R.[Rui], Chen, Y.W.[Yan-Wei], Qiao, X.[Xu],
Using Vision Transformers in 3-D Medical Image Classifications,
ICIP22(696-700)
IEEE DOI 2211
Deep learning, Training, Visualization, Transfer learning, Optimization methods, Self-supervised learning, Transformers, 3-D medical image classifications BibRef

Wu, K.[Kan], Zhang, J.[Jinnian], Peng, H.[Houwen], Liu, M.C.[Meng-Chen], Xiao, B.[Bin], Fu, J.L.[Jian-Long], Yuan, L.[Lu],
TinyViT: Fast Pretraining Distillation for Small Vision Transformers,
ECCV22(XXI:68-85).
Springer DOI 2211
BibRef

Gao, L.[Li], Nie, D.[Dong], Li, B.[Bo], Ren, X.F.[Xiao-Feng],
Doubly-Fused ViT: Fuse Information from Vision Transformer Doubly with Local Representation,
ECCV22(XXIII:744-761).
Springer DOI 2211
BibRef

Yao, T.[Ting], Pan, Y.W.[Ying-Wei], Li, Y.[Yehao], Ngo, C.W.[Chong-Wah], Mei, T.[Tao],
Wave-ViT: Unifying Wavelet and Transformers for Visual Representation Learning,
ECCV22(XXV:328-345).
Springer DOI 2211
BibRef

Yuan, Z.H.[Zhi-Hang], Xue, C.H.[Chen-Hao], Chen, Y.Q.[Yi-Qi], Wu, Q.[Qiang], Sun, G.Y.[Guang-Yu],
PTQ4ViT: Post-training Quantization for Vision Transformers with Twin Uniform Quantization,
ECCV22(XII:191-207).
Springer DOI 2211
BibRef

Kong, Z.L.[Zheng-Lun], Dong, P.Y.[Pei-Yan], Ma, X.L.[Xiao-Long], Meng, X.[Xin], Niu, W.[Wei], Sun, M.S.[Meng-Shu], Shen, X.[Xuan], Yuan, G.[Geng], Ren, B.[Bin], Tang, H.[Hao], Qin, M.[Minghai], Wang, Y.Z.[Yan-Zhi],
SPViT: Enabling Faster Vision Transformers via Latency-Aware Soft Token Pruning,
ECCV22(XI:620-640).
Springer DOI 2211
BibRef

Pan, J.[Junting], Bulat, A.[Adrian], Tan, F.[Fuwen], Zhu, X.T.[Xia-Tian], Dudziak, L.[Lukasz], Li, H.S.[Hong-Sheng], Tzimiropoulos, G.[Georgios], Martinez, B.[Brais],
EdgeViTs: Competing Light-Weight CNNs on Mobile Devices with Vision Transformers,
ECCV22(XI:294-311).
Springer DOI 2211
BibRef

Xiang, H.[Hao], Xu, R.S.[Run-Sheng], Ma, J.Q.[Jia-Qi],
HM-ViT: Hetero-modal Vehicle-to-Vehicle Cooperative Perception with Vision Transformer,
ICCV23(284-295)
IEEE DOI Code:
WWW Link. 2401
BibRef

Xu, R.S.[Run-Sheng], Xiang, H.[Hao], Tu, Z.Z.[Zheng-Zhong], Xia, X.[Xin], Yang, M.H.[Ming-Hsuan], Ma, J.Q.[Jia-Qi],
V2X-ViT: Vehicle-to-Everything Cooperative Perception with Vision Transformer,
ECCV22(XXIX:107-124).
Springer DOI 2211
BibRef

Liu, Y.[Yong], Mai, S.Q.[Si-Qi], Chen, X.N.[Xiang-Ning], Hsieh, C.J.[Cho-Jui], You, Y.[Yang],
Towards Efficient and Scalable Sharpness-Aware Minimization,
CVPR22(12350-12360)
IEEE DOI 2210

WWW Link. Training, Schedules, Scalability, Perturbation methods, Stochastic processes, Transformers, Minimization, Vision applications and systems BibRef

Ren, P.Z.[Peng-Zhen], Li, C.[Changlin], Wang, G.[Guangrun], Xiao, Y.[Yun], Du, Q.[Qing], Liang, X.D.[Xiao-Dan], Chang, X.J.[Xiao-Jun],
Beyond Fixation: Dynamic Window Visual Transformer,
CVPR22(11977-11987)
IEEE DOI 2210
Performance evaluation, Visualization, Systematics, Computational modeling, Scalability, Transformers, Deep learning architectures and techniques BibRef

Bhattacharjee, D.[Deblina], Zhang, T.[Tong], Süsstrunk, S.[Sabine], Salzmann, M.[Mathieu],
MulT: An End-to-End Multitask Learning Transformer,
CVPR22(12021-12031)
IEEE DOI 2210
Heart, Image segmentation, Computational modeling, Image edge detection, Semantics, Estimation, Predictive models, Scene analysis and understanding BibRef

Fang, J.[Jiemin], Xie, L.X.[Ling-Xi], Wang, X.G.[Xing-Gang], Zhang, X.P.[Xiao-Peng], Liu, W.Y.[Wen-Yu], Tian, Q.[Qi],
MSG-Transformer: Exchanging Local Spatial Information by Manipulating Messenger Tokens,
CVPR22(12053-12062)
IEEE DOI 2210
Deep learning, Visualization, Neural networks, Graphics processing units, retrieval BibRef

Sandler, M.[Mark], Zhmoginov, A.[Andrey], Vladymyrov, M.[Max], Jackson, A.[Andrew],
Fine-tuning Image Transformers using Learnable Memory,
CVPR22(12145-12154)
IEEE DOI 2210
Deep learning, Adaptation models, Costs, Computational modeling, Memory management, Transformers, Transfer/low-shot/long-tail learning BibRef

Yu, X.[Xumin], Tang, L.[Lulu], Rao, Y.M.[Yong-Ming], Huang, T.J.[Tie-Jun], Zhou, J.[Jie], Lu, J.W.[Ji-Wen],
Point-BERT: Pre-training 3D Point Cloud Transformers with Masked Point Modeling,
CVPR22(19291-19300)
IEEE DOI 2210
Point cloud compression, Solid modeling, Computational modeling, Bit error rate, Transformers, Pattern recognition, Deep learning architectures and techniques BibRef

Park, C.[Chunghyun], Jeong, Y.[Yoonwoo], Cho, M.[Minsu], Park, J.[Jaesik],
Fast Point Transformer,
CVPR22(16928-16937)
IEEE DOI 2210
Point cloud compression, Shape, Semantics, Neural networks, Transformers, grouping and shape analysis BibRef

Zeng, W.[Wang], Jin, S.[Sheng], Liu, W.T.[Wen-Tao], Qian, C.[Chen], Luo, P.[Ping], Ouyang, W.L.[Wan-Li], Wang, X.G.[Xiao-Gang],
Not All Tokens Are Equal: Human-centric Visual Analysis via Token Clustering Transformer,
CVPR22(11091-11101)
IEEE DOI 2210
Visualization, Shape, Pose estimation, Semantics, Pose estimation and tracking, Deep learning architectures and techniques BibRef

Xie, Z.D.[Zhen-Da], Zhang, Z.[Zheng], Cao, Y.[Yue], Lin, Y.T.[Yu-Tong], Bao, J.M.[Jian-Min], Yao, Z.L.[Zhu-Liang], Dai, Q.[Qi], Hu, H.[Han],
SimMIM: a Simple Framework for Masked Image Modeling,
CVPR22(9643-9653)
IEEE DOI 2210

WWW Link. Representation learning, Training, Head, Self-supervised learning, Predictive models, Data models, Self- semi- meta- Representation learning BibRef

Tu, Z.Z.[Zheng-Zhong], Talebi, H.[Hossein], Zhang, H.[Han], Yang, F.[Feng], Milanfar, P.[Peyman], Bovik, A.[Alan], Li, Y.X.[Yin-Xiao],
MAXIM: Multi-Axis MLP for Image Processing,
CVPR22(5759-5770)
IEEE DOI 2210

WWW Link. Training, Photography, Adaptation models, Visualization, Computational modeling, Transformers, Low-level vision, Computational photography BibRef

Yun, S.[Sukmin], Lee, H.[Hankook], Kim, J.[Jaehyung], Shin, J.[Jinwoo],
Patch-level Representation Learning for Self-supervised Vision Transformers,
CVPR22(8344-8353)
IEEE DOI 2210
Training, Representation learning, Visualization, Neural networks, Object detection, Self-supervised learning, Transformers, Self- semi- meta- unsupervised learning BibRef

Hou, Z.J.[Ze-Jiang], Kung, S.Y.[Sun-Yuan],
Multi-Dimensional Vision Transformer Compression via Dependency Guided Gaussian Process Search,
EVW22(3668-3677)
IEEE DOI 2210
Adaptation models, Image coding, Head, Computational modeling, Neurons, Gaussian processes, Transformers BibRef

Salman, H.[Hadi], Jain, S.[Saachi], Wong, E.[Eric], Madry, A.[Aleksander],
Certified Patch Robustness via Smoothed Vision Transformers,
CVPR22(15116-15126)
IEEE DOI 2210
Visualization, Smoothing methods, Costs, Computational modeling, Transformers, Adversarial attack and defense BibRef

Wang, Y.K.[Yi-Kai], Chen, X.H.[Xing-Hao], Cao, L.[Lele], Huang, W.B.[Wen-Bing], Sun, F.C.[Fu-Chun], Wang, Y.H.[Yun-He],
Multimodal Token Fusion for Vision Transformers,
CVPR22(12176-12185)
IEEE DOI 2210
Point cloud compression, Image segmentation, Shape, Semantics, Object detection, Vision+X BibRef

Tang, Y.[Yehui], Han, K.[Kai], Wang, Y.H.[Yun-He], Xu, C.[Chang], Guo, J.Y.[Jian-Yuan], Xu, C.[Chao], Tao, D.C.[Da-Cheng],
Patch Slimming for Efficient Vision Transformers,
CVPR22(12155-12164)
IEEE DOI 2210
Visualization, Quantization (signal), Computational modeling, Aggregates, Benchmark testing, Representation learning BibRef

Zhang, J.[Jinnian], Peng, H.[Houwen], Wu, K.[Kan], Liu, M.C.[Meng-Chen], Xiao, B.[Bin], Fu, J.L.[Jian-Long], Yuan, L.[Lu],
MiniViT: Compressing Vision Transformers with Weight Multiplexing,
CVPR22(12135-12144)
IEEE DOI 2210
Multiplexing, Performance evaluation, Image coding, Codes, Computational modeling, Benchmark testing, Vision applications and systems BibRef

Chen, J.N.[Jie-Neng], Sun, S.Y.[Shu-Yang], He, J.[Ju], Torr, P.H.S.[Philip H.S.], Yuille, A.L.[Alan L.], Bai, S.[Song],
TransMix: Attend to Mix for Vision Transformers,
CVPR22(12125-12134)
IEEE DOI 2210
Training, Image segmentation, Codes, Semantics, Object detection, Benchmark testing, Transformers, Representation learning BibRef

Liu, H.[Hao], Jiang, X.H.[Xing-Hua], Li, X.[Xin], Bao, Z.M.[Zhi-Min], Jiang, D.Q.[De-Qiang], Ren, B.[Bo],
NomMer: Nominate Synergistic Context in Vision Transformer for Visual Recognition,
CVPR22(12063-12072)
IEEE DOI 2210
Visualization, Image segmentation, Semantics, Redundancy, Object detection, Deep learning architectures and techniques BibRef

Chen, T.L.[Tian-Long], Zhang, Z.Y.[Zhen-Yu], Cheng, Y.[Yu], Awadallah, A.[Ahmed], Wang, Z.Y.[Zhang-Yang],
The Principle of Diversity: Training Stronger Vision Transformers Calls for Reducing All Levels of Redundancy,
CVPR22(12010-12020)
IEEE DOI 2210
Training, Convolutional codes, Deep learning, Computational modeling, Redundancy, Deep learning architectures and techniques BibRef

Yin, H.X.[Hong-Xu], Vahdat, A.[Arash], Alvarez, J.M.[Jose M.], Mallya, A.[Arun], Kautz, J.[Jan], Molchanov, P.[Pavlo],
A-ViT: Adaptive Tokens for Efficient Vision Transformer,
CVPR22(10799-10808)
IEEE DOI 2210
Training, Adaptive systems, Network architecture, Transformers, Throughput, Hardware, Complexity theory, Efficient learning and inferences BibRef

Lu, J.H.[Jia-Hao], Zhang, X.S.[Xi Sheryl], Zhao, T.L.[Tian-Li], He, X.Y.[Xiang-Yu], Cheng, J.[Jian],
APRIL: Finding the Achilles' Heel on Privacy for Vision Transformers,
CVPR22(10041-10050)
IEEE DOI 2210
Privacy, Data privacy, Federated learning, Computational modeling, Training data, Transformers, Market research, Privacy and federated learning BibRef

Hatamizadeh, A.[Ali], Yin, H.X.[Hong-Xu], Roth, H.[Holger], Li, W.Q.[Wen-Qi], Kautz, J.[Jan], Xu, D.[Daguang], Molchanov, P.[Pavlo],
GradViT: Gradient Inversion of Vision Transformers,
CVPR22(10011-10020)
IEEE DOI 2210
Measurement, Differential privacy, Neural networks, Transformers, Pattern recognition, Security, Iterative methods, Privacy and federated learning BibRef

Zhang, H.[Haofei], Duan, J.R.[Jia-Rui], Xue, M.Q.[Meng-Qi], Song, J.[Jie], Sun, L.[Li], Song, M.L.[Ming-Li],
Bootstrapping ViTs: Towards Liberating Vision Transformers from Pre-training,
CVPR22(8934-8943)
IEEE DOI 2210
Training, Upper bound, Neural networks, Training data, Network architecture, Transformers, Computer vision theory, Efficient learning and inferences BibRef

Chavan, A.[Arnav], Shen, Z.Q.[Zhi-Qiang], Liu, Z.[Zhuang], Liu, Z.[Zechun], Cheng, K.T.[Kwang-Ting], Xing, E.[Eric],
Vision Transformer Slimming: Multi-Dimension Searching in Continuous Optimization Space,
CVPR22(4921-4931)
IEEE DOI 2210
Training, Performance evaluation, Image coding, Force, Graphics processing units, Vision applications and systems BibRef

Hong, W.X.[Wei-Xiang], Lao, J.W.[Jiang-Wei], Ren, W.[Wang], Wang, J.[Jian], Chen, J.D.[Jing-Dong], Chu, W.[Wei],
Training Object Detectors from Scratch: An Empirical Study in the Era of Vision Transformer,
CVPR22(4652-4661)
IEEE DOI 2210
Training, Visualization, Semantics, Detectors, Object detection, Transformers, Recognition: detection, categorization, retrieval, Deep learning architectures and techniques BibRef

Chen, Z.Y.[Zhao-Yu], Li, B.[Bo], Wu, S.[Shuang], Xu, J.H.[Jiang-He], Ding, S.H.[Shou-Hong], Zhang, W.Q.[Wen-Qiang],
Shape Matters: Deformable Patch Attack,
ECCV22(IV:529-548).
Springer DOI 2211
BibRef

Chen, Z.Y.[Zhao-Yu], Li, B.[Bo], Xu, J.H.[Jiang-He], Wu, S.[Shuang], Ding, S.H.[Shou-Hong], Zhang, W.Q.[Wen-Qiang],
Towards Practical Certifiable Patch Defense with Vision Transformer,
CVPR22(15127-15137)
IEEE DOI 2210
Smoothing methods, Toy manufacturing industry, Semantics, Network architecture, Transformers, Robustness, Adversarial attack and defense BibRef

Chen, R.J.[Richard J.], Chen, C.[Chengkuan], Li, Y.C.[Yi-Cong], Chen, T.Y.[Tiffany Y.], Trister, A.D.[Andrew D.], Krishnan, R.G.[Rahul G.], Mahmood, F.[Faisal],
Scaling Vision Transformers to Gigapixel Images via Hierarchical Self-Supervised Learning,
CVPR22(16123-16134)
IEEE DOI 2210
Training, Visualization, Self-supervised learning, Image representation, Transformers, Self- semi- meta- unsupervised learning BibRef

Yang, Z.[Zhao], Wang, J.Q.[Jia-Qi], Tang, Y.S.[Yan-Song], Chen, K.[Kai], Zhao, H.S.[Heng-Shuang], Torr, P.H.S.[Philip H.S.],
LAVT: Language-Aware Vision Transformer for Referring Image Segmentation,
CVPR22(18134-18144)
IEEE DOI 2210
Image segmentation, Visualization, Image coding, Shape, Linguistics, Transformers, Feature extraction, Segmentation, grouping and shape analysis BibRef

Scheibenreif, L.[Linus], Hanna, J.[Joëlle], Mommert, M.[Michael], Borth, D.[Damian],
Self-supervised Vision Transformers for Land-cover Segmentation and Classification,
EarthVision22(1421-1430)
IEEE DOI 2210
Training, Earth, Image segmentation, Computational modeling, Conferences, Transformers BibRef

Zhai, X.H.[Xiao-Hua], Kolesnikov, A.[Alexander], Houlsby, N.[Neil], Beyer, L.[Lucas],
Scaling Vision Transformers,
CVPR22(1204-1213)
IEEE DOI 2210
Training, Error analysis, Computational modeling, Neural networks, Memory management, Training data, Transfer/low-shot/long-tail learning BibRef

Guo, J.Y.[Jian-Yuan], Han, K.[Kai], Wu, H.[Han], Tang, Y.[Yehui], Chen, X.H.[Xing-Hao], Wang, Y.H.[Yun-He], Xu, C.[Chang],
CMT: Convolutional Neural Networks Meet Vision Transformers,
CVPR22(12165-12175)
IEEE DOI 2210
Visualization, Image recognition, Force, Object detection, Transformers, Representation learning BibRef

Meng, L.C.[Ling-Chen], Li, H.D.[Heng-Duo], Chen, B.C.[Bor-Chun], Lan, S.Y.[Shi-Yi], Wu, Z.X.[Zu-Xuan], Jiang, Y.G.[Yu-Gang], Lim, S.N.[Ser-Nam],
AdaViT: Adaptive Vision Transformers for Efficient Image Recognition,
CVPR22(12299-12308)
IEEE DOI 2210
Image recognition, Head, Law enforcement, Computational modeling, Redundancy, Transformers, Efficient learning and inferences, retrieval BibRef

Herrmann, C.[Charles], Sargent, K.[Kyle], Jiang, L.[Lu], Zabih, R.[Ramin], Chang, H.[Huiwen], Liu, C.[Ce], Krishnan, D.[Dilip], Sun, D.Q.[De-Qing],
Pyramid Adversarial Training Improves ViT Performance,
CVPR22(13409-13419)
IEEE DOI 2210
Training, Image recognition, Stochastic processes, Transformers, Robustness, retrieval, Recognition: detection BibRef

Li, C.L.[Chang-Lin], Zhuang, B.[Bohan], Wang, G.R.[Guang-Run], Liang, X.D.[Xiao-Dan], Chang, X.J.[Xiao-Jun], Yang, Y.[Yi],
Automated Progressive Learning for Efficient Training of Vision Transformers,
CVPR22(12476-12486)
IEEE DOI 2210
Training, Adaptation models, Schedules, Computational modeling, Estimation, Manuals, Transformers, Representation learning BibRef

Guo, J.Y.[Jian-Yuan], Tang, Y.H.[Ye-Hui], Han, K.[Kai], Chen, X.H.[Xing-Hao], Wu, H.[Han], Xu, C.[Chao], Xu, C.[Chang], Wang, Y.H.[Yun-He],
Hire-MLP: Vision MLP via Hierarchical Rearrangement,
CVPR22(816-826)
IEEE DOI 2210
Representation learning, Image segmentation, Semantics, Object detection, Transformers, Representation learning BibRef

Pu, M.Y.[Meng-Yang], Huang, Y.P.[Ya-Ping], Liu, Y.M.[Yu-Ming], Guan, Q.J.[Qing-Ji], Ling, H.B.[Hai-Bin],
EDTER: Edge Detection with Transformer,
CVPR22(1392-1402)
IEEE DOI 2210
Head, Image edge detection, Semantics, Detectors, Transformers, Feature extraction, Segmentation, grouping and shape analysis, Scene analysis and understanding BibRef

Zhu, R.[Rui], Li, Z.Q.[Zheng-Qin], Matai, J.[Janarbek], Porikli, F.M.[Fatih M.], Chandraker, M.[Manmohan],
IRISformer: Dense Vision Transformers for Single-Image Inverse Rendering in Indoor Scenes,
CVPR22(2812-2821)
IEEE DOI 2210
Photorealism, Shape, Computational modeling, Lighting, Transformers, Physics-based vision and shape-from-X BibRef

Ermolov, A.[Aleksandr], Mirvakhabova, L.[Leyla], Khrulkov, V.[Valentin], Sebe, N.[Nicu], Oseledets, I.[Ivan],
Hyperbolic Vision Transformers: Combining Improvements in Metric Learning,
CVPR22(7399-7409)
IEEE DOI 2210
Measurement, Geometry, Visualization, Semantics, Self-supervised learning, Transformer cores, Transformers, Representation learning BibRef

Lee, Y.[Youngwan], Kim, J.[Jonghee], Willette, J.[Jeffrey], Hwang, S.J.[Sung Ju],
MPViT: Multi-Path Vision Transformer for Dense Prediction,
CVPR22(7277-7286)
IEEE DOI 2210
Image segmentation, Semantics, Object detection, Transformers, Feature extraction, Pattern recognition, Recognition: detection, Representation learning BibRef

Zhang, C.Z.[Chong-Zhi], Zhang, M.Y.[Ming-Yuan], Zhang, S.H.[Shang-Hang], Jin, D.S.[Dai-Sheng], Zhou, Q.[Qiang], Cai, Z.A.[Zhong-Ang], Zhao, H.[Haiyu], Liu, X.L.[Xiang-Long], Liu, Z.W.[Zi-Wei],
Delving Deep into the Generalization of Vision Transformers under Distribution Shifts,
CVPR22(7267-7276)
IEEE DOI 2210
Training, Representation learning, Systematics, Shape, Taxonomy, Self-supervised learning, Transformers, Recognition: detection, Representation learning BibRef

Hou, Z.[Zhi], Yu, B.[Baosheng], Tao, D.C.[Da-Cheng],
BatchFormer: Learning to Explore Sample Relationships for Robust Representation Learning,
CVPR22(7246-7256)
IEEE DOI 2210
Training, Deep learning, Representation learning, Neural networks, Tail, Transformers, Transfer/low-shot/long-tail learning, Self- semi- meta- unsupervised learning BibRef

Zamir, S.W.[Syed Waqas], Arora, A.[Aditya], Khan, S.[Salman], Hayat, M.[Munawar], Khan, F.S.[Fahad Shahbaz], Yang, M.H.[Ming-Hsuan],
Restormer: Efficient Transformer for High-Resolution Image Restoration,
CVPR22(5718-5729)
IEEE DOI 2210
Computational modeling, Transformer cores, Transformers, Data models, Image restoration, Task analysis, Deep learning architectures and techniques BibRef

Zhao, H.S.[Heng-Shuang], Jiang, L.[Li], Jia, J.Y.[Jia-Ya], Torr, P.H.S.[Philip H.S.], Koltun, V.[Vladlen],
Point Transformer,
ICCV21(16239-16248)
IEEE DOI 2203
Point cloud compression, Measurement, Image segmentation, Semantics, Object detection, Transformer cores, Recognition and classification BibRef

Lin, K.[Kevin], Wang, L.J.[Li-Juan], Liu, Z.C.[Zi-Cheng],
Mesh Graphormer,
ICCV21(12919-12928)
IEEE DOI 2203
Convolutional codes, Solid modeling, Network topology, Transformers, Gestures and body pose BibRef

Casey, E.[Evan], Pérez, V.[Víctor], Li, Z.[Zhuoru],
The Animation Transformer: Visual Correspondence via Segment Matching,
ICCV21(11303-11312)
IEEE DOI 2203
Visualization, Image segmentation, Image color analysis, Production, Animation, Transformers, grouping and shape BibRef

Reizenstein, J.[Jeremy], Shapovalov, R.[Roman], Henzler, P.[Philipp], Sbordone, L.[Luca], Labatut, P.[Patrick], Novotny, D.[David],
Common Objects in 3D: Large-Scale Learning and Evaluation of Real-life 3D Category Reconstruction,
ICCV21(10881-10891)
IEEE DOI 2203
Award, Marr Prize, HM. Point cloud compression, Transformers, Rendering (computer graphics), Cameras, Image reconstruction, 3D from multiview and other sensors BibRef

Feng, W.X.[Wei-Xin], Wang, Y.J.[Yuan-Jiang], Ma, L.H.[Li-Hua], Yuan, Y.[Ye], Zhang, C.[Chi],
Temporal Knowledge Consistency for Unsupervised Visual Representation Learning,
ICCV21(10150-10160)
IEEE DOI 2203
Training, Representation learning, Visualization, Protocols, Object detection, Semisupervised learning, Transformers, Transfer/Low-shot/Semi/Unsupervised Learning BibRef

Wu, H.P.[Hai-Ping], Xiao, B.[Bin], Codella, N.[Noel], Liu, M.C.[Meng-Chen], Dai, X.Y.[Xi-Yang], Yuan, L.[Lu], Zhang, L.[Lei],
CvT: Introducing Convolutions to Vision Transformers,
ICCV21(22-31)
IEEE DOI 2203
Code, Vision Transformer.
WWW Link. Convolutional codes, Image resolution, Image recognition, Performance gain, Transformers, Distortion, BibRef

Touvron, H.[Hugo], Cord, M.[Matthieu], Sablayrolles, A.[Alexandre], Synnaeve, G.[Gabriel], Jégou, H.[Hervé],
Going deeper with Image Transformers,
ICCV21(32-42)
IEEE DOI 2203
Training, Neural networks, Training data, Data models, Circuit faults, Recognition and classification, Optimization and learning methods BibRef

Zhao, J.W.[Jia-Wei], Yan, K.[Ke], Zhao, Y.F.[Yi-Fan], Guo, X.W.[Xiao-Wei], Huang, F.Y.[Fei-Yue], Li, J.[Jia],
Transformer-based Dual Relation Graph for Multi-label Image Recognition,
ICCV21(163-172)
IEEE DOI 2203
Image recognition, Correlation, Computational modeling, Semantics, Benchmark testing, Representation learning BibRef

Pan, Z.Z.[Zi-Zheng], Zhuang, B.[Bohan], Liu, J.[Jing], He, H.Y.[Hao-Yu], Cai, J.F.[Jian-Fei],
Scalable Vision Transformers with Hierarchical Pooling,
ICCV21(367-376)
IEEE DOI 2203
Visualization, Image recognition, Computational modeling, Scalability, Transformers, Computational efficiency, Efficient training and inference methods BibRef

Yuan, L.[Li], Chen, Y.P.[Yun-Peng], Wang, T.[Tao], Yu, W.H.[Wei-Hao], Shi, Y.J.[Yu-Jun], Jiang, Z.H.[Zi-Hang], Tay, F.E.H.[Francis E. H.], Feng, J.S.[Jia-Shi], Yan, S.C.[Shui-Cheng],
Tokens-to-Token ViT: Training Vision Transformers from Scratch on ImageNet,
ICCV21(538-547)
IEEE DOI 2203
Training, Image resolution, Computational modeling, Image edge detection, Transformers, BibRef

Wu, B.[Bichen], Xu, C.F.[Chen-Feng], Dai, X.L.[Xiao-Liang], Wan, A.[Alvin], Zhang, P.Z.[Pei-Zhao], Yan, Z.C.[Zhi-Cheng], Tomizuka, M.[Masayoshi], Gonzalez, J.[Joseph], Keutzer, K.[Kurt], Vajda, P.[Peter],
Visual Transformers: Where Do Transformers Really Belong in Vision Models?,
ICCV21(579-589)
IEEE DOI 2203
Training, Visualization, Image segmentation, Lips, Computational modeling, Semantics, Vision applications and systems BibRef

Hu, R.H.[Rong-Hang], Singh, A.[Amanpreet],
UniT: Multimodal Multitask Learning with a Unified Transformer,
ICCV21(1419-1429)
IEEE DOI 2203
Training, Natural languages, Object detection, Predictive models, Transformers, Multitasking, Representation learning BibRef

Qiu, Y.[Yue], Yamamoto, S.[Shintaro], Nakashima, K.[Kodai], Suzuki, R.[Ryota], Iwata, K.[Kenji], Kataoka, H.[Hirokatsu], Satoh, Y.[Yutaka],
Describing and Localizing Multiple Changes with Transformers,
ICCV21(1951-1960)
IEEE DOI 2203
Measurement, Location awareness, Codes, Natural languages, Benchmark testing, Transformers, Vision applications and systems BibRef

Song, M.[Myungseo], Choi, J.[Jinyoung], Han, B.H.[Bo-Hyung],
Variable-Rate Deep Image Compression through Spatially-Adaptive Feature Transform,
ICCV21(2360-2369)
IEEE DOI 2203
Training, Image coding, Neural networks, Rate-distortion, Transforms, Network architecture, Computational photography, Low-level and physics-based vision BibRef

Sheng, H.[Hualian], Cai, S.[Sijia], Liu, Y.[Yuan], Deng, B.[Bing], Huang, J.Q.[Jian-Qiang], Hua, X.S.[Xian-Sheng], Zhao, M.J.[Min-Jian],
Improving 3D Object Detection with Channel-wise Transformer,
ICCV21(2723-2732)
IEEE DOI 2203
Point cloud compression, Object detection, Detectors, Transforms, Transformers, Encoding, Detection and localization in 2D and 3D, BibRef

Zhang, P.C.[Peng-Chuan], Dai, X.[Xiyang], Yang, J.W.[Jian-Wei], Xiao, B.[Bin], Yuan, L.[Lu], Zhang, L.[Lei], Gao, J.F.[Jian-Feng],
Multi-Scale Vision Longformer: A New Vision Transformer for High-Resolution Image Encoding,
ICCV21(2978-2988)
IEEE DOI 2203
Image segmentation, Image coding, Computational modeling, Memory management, Object detection, Transformers, Representation learning BibRef

Dong, Q.[Qi], Tu, Z.W.[Zhuo-Wen], Liao, H.[Haofu], Zhang, Y.T.[Yu-Ting], Mahadevan, V.[Vijay], Soatto, S.[Stefano],
Visual Relationship Detection Using Part-and-Sum Transformers with Composite Queries,
ICCV21(3530-3539)
IEEE DOI 2203
Visualization, Detectors, Transformers, Task analysis, Standards, Detection and localization in 2D and 3D, Representation learning BibRef

Fan, H.Q.[Hao-Qi], Xiong, B.[Bo], Mangalam, K.[Karttikeya], Li, Y.[Yanghao], Yan, Z.C.[Zhi-Cheng], Malik, J.[Jitendra], Feichtenhofer, C.[Christoph],
Multiscale Vision Transformers,
ICCV21(6804-6815)
IEEE DOI 2203
Visualization, Image recognition, Codes, Computational modeling, Transformers, Complexity theory, Recognition and classification BibRef

Mahmood, K.[Kaleel], Mahmood, R.[Rigel], van Dijk, M.[Marten],
On the Robustness of Vision Transformers to Adversarial Examples,
ICCV21(7818-7827)
IEEE DOI 2203
Transformers, Robustness, Adversarial machine learning, Security, Machine learning architectures and formulations BibRef

Chen, X.L.[Xin-Lei], Xie, S.[Saining], He, K.[Kaiming],
An Empirical Study of Training Self-Supervised Vision Transformers,
ICCV21(9620-9629)
IEEE DOI 2203
Training, Benchmark testing, Transformers, Standards, Representation learning, Recognition and classification, Transfer/Low-shot/Semi/Unsupervised Learning BibRef

Caron, M.[Mathilde], Touvron, H.[Hugo], Misra, I.[Ishan], Jégou, H.[Hervé], Mairal, J.[Julien], Bojanowski, P.[Piotr], Joulin, A.[Armand],
Emerging Properties in Self-Supervised Vision Transformers,
ICCV21(9630-9640)
IEEE DOI 2203
Training, Image segmentation, Semantics, Layout, Image retrieval, Representation learning, Transfer/Low-shot/Semi/Unsupervised Learning BibRef

Yuan, Y.[Ye], Weng, X.[Xinshuo], Ou, Y.[Yanglan], Kitani, K.[Kris],
AgentFormer: Agent-Aware Transformers for Socio-Temporal Multi-Agent Forecasting,
ICCV21(9793-9803)
IEEE DOI 2203
Uncertainty, Stochastic processes, Predictive models, Transformers, Encoding, Trajectory, Motion and tracking, Vision for robotics and autonomous vehicles BibRef

Wu, K.[Kan], Peng, H.W.[Hou-Wen], Chen, M.H.[Ming-Hao], Fu, J.L.[Jian-Long], Chao, H.Y.[Hong-Yang],
Rethinking and Improving Relative Position Encoding for Vision Transformer,
ICCV21(10013-10021)
IEEE DOI 2203
Image coding, Codes, Computational modeling, Transformers, Encoding, Natural language processing, Datasets and evaluation, Recognition and classification BibRef

Bhojanapalli, S.[Srinadh], Chakrabarti, A.[Ayan], Glasner, D.[Daniel], Li, D.[Daliang], Unterthiner, T.[Thomas], Veit, A.[Andreas],
Understanding Robustness of Transformers for Image Classification,
ICCV21(10211-10221)
IEEE DOI 2203
Perturbation methods, Transformers, Robustness, Data models, Convolutional neural networks, Recognition and classification BibRef

Yan, B.[Bin], Peng, H.[Houwen], Fu, J.L.[Jian-Long], Wang, D.[Dong], Lu, H.C.[Hu-Chuan],
Learning Spatio-Temporal Transformer for Visual Tracking,
ICCV21(10428-10437)
IEEE DOI 2203
Visualization, Target tracking, Smoothing methods, Pipelines, Benchmark testing, Transformers, BibRef

Heo, B.[Byeongho], Yun, S.[Sangdoo], Han, D.Y.[Dong-Yoon], Chun, S.[Sanghyuk], Choe, J.[Junsuk], Oh, S.J.[Seong Joon],
Rethinking Spatial Dimensions of Vision Transformers,
ICCV21(11916-11925)
IEEE DOI 2203
Dimensionality reduction, Computational modeling, Object detection, Transformers, Robustness, Recognition and classification BibRef

Voskou, A.[Andreas], Panousis, K.P.[Konstantinos P.], Kosmopoulos, D.[Dimitrios], Metaxas, D.N.[Dimitris N.], Chatzis, S.[Sotirios],
Stochastic Transformer Networks with Linear Competing Units: Application to end-to-end SL Translation,
ICCV21(11926-11935)
IEEE DOI 2203
Training, Memory management, Stochastic processes, Gesture recognition, Benchmark testing, Assistive technologies, BibRef

Ranftl, R.[René], Bochkovskiy, A.[Alexey], Koltun, V.[Vladlen],
Vision Transformers for Dense Prediction,
ICCV21(12159-12168)
IEEE DOI 2203
Image resolution, Semantics, Neural networks, Estimation, Training data, grouping and shape BibRef

Chen, M.H.[Ming-Hao], Peng, H.W.[Hou-Wen], Fu, J.L.[Jian-Long], Ling, H.B.[Hai-Bin],
AutoFormer: Searching Transformers for Visual Recognition,
ICCV21(12250-12260)
IEEE DOI 2203
Training, Convolutional codes, Visualization, Head, Search methods, Manuals, Recognition and classification BibRef

Yuan, K.[Kun], Guo, S.P.[Shao-Peng], Liu, Z.W.[Zi-Wei], Zhou, A.[Aojun], Yu, F.W.[Feng-Wei], Wu, W.[Wei],
Incorporating Convolution Designs into Visual Transformers,
ICCV21(559-568)
IEEE DOI 2203
Training, Visualization, Costs, Convolution, Training data, Transformers, Feature extraction, Recognition and classification, Efficient training and inference methods BibRef

Chen, Z.[Zhengsu], Xie, L.X.[Ling-Xi], Niu, J.W.[Jian-Wei], Liu, X.F.[Xue-Feng], Wei, L.[Longhui], Tian, Q.[Qi],
Visformer: The Vision-friendly Transformer,
ICCV21(569-578)
IEEE DOI 2203
Convolutional codes, Training, Visualization, Protocols, Computational modeling, Fitting, Recognition and classification, Representation learning BibRef

Wang, W.[Wenhai], Xie, E.[Enze], Li, X.[Xiang], Fan, D.P.[Deng-Ping], Song, K.[Kaitao], Liang, D.[Ding], Lu, T.[Tong], Luo, P.[Ping], Shao, L.[Ling],
Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions,
ICCV21(548-558)
IEEE DOI 2203
Image resolution, Costs, Semantics, Object detection, Transformers, Feature extraction, Recognition and classification, grouping and shape BibRef

Yao, Z.L.[Zhu-Liang], Cao, Y.[Yue], Lin, Y.T.[Yu-Tong], Liu, Z.[Ze], Zhang, Z.[Zheng], Hu, H.[Han],
Leveraging Batch Normalization for Vision Transformers,
NeurArch21(413-422)
IEEE DOI 2112
Training, Transformers, Feeds BibRef

Zhang, Z.X.[Zi-Xiao], Lu, X.Q.[Xiao-Qiang], Cao, G.J.[Guo-Jin], Yang, Y.T.[Yu-Ting], Jiao, L.C.[Li-Cheng], Liu, F.[Fang],
ViT-YOLO: Transformer-Based YOLO for Object Detection,
VisDrone21(2799-2808)
IEEE DOI 2112
Semantics, Detectors, Object detection, Feature extraction, Robustness BibRef

Graham, B.[Ben], El-Nouby, A.[Alaaeldin], Touvron, H.[Hugo], Stock, P.[Pierre], Joulin, A.[Armand], Jégou, H.[Hervé], Douze, M.[Matthijs],
LeViT: a Vision Transformer in ConvNet's Clothing for Faster Inference,
ICCV21(12239-12249)
IEEE DOI 2203
Training, Image resolution, Neural networks, Parallel processing, Transformers, Feature extraction, Representation learning BibRef

Horváth, J.[János], Baireddy, S.[Sriram], Hao, H.X.[Han-Xiang], Montserrat, D.M.[Daniel Mas], Delp, E.J.[Edward J.],
Manipulation Detection in Satellite Images Using Vision Transformer,
WMF21(1032-1041)
IEEE DOI 2109
BibRef
Earlier: A1, A4, A3, A5, Only:
Manipulation Detection in Satellite Images Using Deep Belief Networks,
WMF20(2832-2840)
IEEE DOI 2008
Image sensors, Satellites, Splicing, Forestry, Tools. Satellites, Image reconstruction, Training, Forgery, Heating systems, Feature extraction BibRef

Beal, J.[Josh], Wu, H.Y.[Hao-Yu], Park, D.H.[Dong Huk], Zhai, A.[Andrew], Kislyuk, D.[Dmitry],
Billion-Scale Pretraining with Vision Transformers for Multi-Task Visual Representations,
WACV22(1431-1440)
IEEE DOI 2202
Visualization, Solid modeling, Systematics, Computational modeling, Transformers, Semi- and Un- supervised Learning BibRef

Chapter on Pattern Recognition, Clustering, Statistics, Grammars, Learning, Neural Nets, Genetic Algorithms continues in
Attention in Vision Transformers.


Last update: Mar 16, 2024 at 20:36:19