Hu, H.Q.[Hao-Qi],
Lu, X.F.[Xiao-Feng],
Zhang, X.P.[Xin-Peng],
Zhang, T.X.[Tian-Xing],
Sun, G.L.[Guang-Ling],
Inheritance Attention Matrix-Based Universal Adversarial
Perturbations on Vision Transformers,
SPLetters(28), 2021, pp. 1923-1927.
IEEE DOI
2110
Perturbation methods, Robustness, Visualization, Transformers,
Optimization, Task analysis, Head, Vision Transformers, self-attention
BibRef
Xue, Z.X.[Zhi-Xiang],
Tan, X.[Xiong],
Yu, X.[Xuchu],
Liu, B.[Bing],
Yu, A.Z.[An-Zhu],
Zhang, P.Q.[Peng-Qiang],
Deep Hierarchical Vision Transformer for Hyperspectral and LiDAR Data
Classification,
IP(31), 2022, pp. 3095-3110.
IEEE DOI
2205
Feature extraction, Transformers, Hyperspectral imaging,
Laser radar, Data mining, Collaboration, Data models,
cross attention fusion
BibRef
Heo, J.[Jiseong],
Wang, Y.[Yooseung],
Park, J.[Jihun],
Occlusion-aware spatial attention transformer for occluded object
recognition,
PRL(159), 2022, pp. 70-76.
Elsevier DOI
2206
Occluded object recognition, Visual transformer, Spatial attention
BibRef
Yu, X.H.[Xiao-Han],
Wang, J.[Jun],
Zhao, Y.[Yang],
Gao, Y.S.[Yong-Sheng],
Mix-ViT: Mixing attentive vision transformer for ultra-fine-grained
visual categorization,
PR(135), 2023, pp. 109131.
Elsevier DOI
2212
Ultra-fine-grained visual categorization, Vision transformer,
Self-supervised learning, Attentive mixing
BibRef
Wu, G.[Gaojie],
Zheng, W.S.[Wei-Shi],
Lu, Y.T.[Yu-Tong],
Tian, Q.[Qi],
PSLT: A Light-Weight Vision Transformer With Ladder Self-Attention
and Progressive Shift,
PAMI(45), No. 9, September 2023, pp. 11120-11135.
IEEE DOI
2309
BibRef
Li, K.C.[Kun-Chang],
Wang, Y.[Yali],
Zhang, J.H.[Jun-Hao],
Gao, P.[Peng],
Song, G.L.[Guang-Lu],
Liu, Y.[Yu],
Li, H.S.[Hong-Sheng],
Qiao, Y.[Yu],
UniFormer: Unifying Convolution and Self-Attention for Visual
Recognition,
PAMI(45), No. 10, October 2023, pp. 12581-12600.
IEEE DOI
2310
Unify CNN and Transformers
BibRef
Li, H.L.[Hao-Ling],
Xue, M.Q.[Meng-Qi],
Song, J.[Jie],
Zhang, H.F.[Hao-Fei],
Huang, W.Q.[Wen-Qi],
Liang, L.Y.[Ling-Yu],
Song, M.L.[Ming-Li],
Constituent Attention for Vision Transformers,
CVIU(237), 2023, pp. 103838.
Elsevier DOI Code:
WWW Link.
2311
Vision Transformer, Attention mechanism, Classification,
Interpretability for deep learning
BibRef
Qin, R.[Ruiru],
Wang, C.Z.[Chuan-Zhi],
Wu, Y.M.[Yong-Mei],
Du, H.[Huafei],
Lv, M.Y.[Ming-Yun],
A U-Shaped Convolution-Aided Transformer with Double Attention for
Hyperspectral Image Classification,
RS(16), No. 2, 2024, pp. 288.
DOI Link
2402
BibRef
Wang, W.X.[Wen-Xiao],
Chen, W.[Wei],
Qiu, Q.[Qibo],
Chen, L.[Long],
Wu, B.X.[Bo-Xi],
Lin, B.B.[Bin-Bin],
He, X.F.[Xiao-Fei],
Liu, W.[Wei],
CrossFormer++: A Versatile Vision Transformer Hinging on Cross-Scale
Attention,
PAMI(46), No. 5, May 2024, pp. 3123-3136.
IEEE DOI
2404
Transformers, Task analysis, Feature extraction, Visualization,
Object detection, Costs, Adaptation models, Image classification,
vision transformer
BibRef
Zhang, Q.M.[Qi-Ming],
Zhang, J.[Jing],
Xu, Y.F.[Yu-Fei],
Tao, D.C.[Da-Cheng],
Vision Transformer With Quadrangle Attention,
PAMI(46), No. 5, May 2024, pp. 3608-3624.
IEEE DOI
2404
Transformers, Task analysis, Shape, Feature extraction,
Adaptation models, Semantic segmentation, vision transformer
BibRef
Huang, L.[Lan],
Bai, X.Y.[Xing-Yu],
Zeng, J.[Jia],
Yu, M.Q.[Meng-Qiang],
Pang, W.[Wei],
Wang, K.P.[Kang-Ping],
FAM: Improving columnar vision transformer with feature attention
mechanism,
CVIU(242), 2024, pp. 103981.
Elsevier DOI
2404
Vision transformer, Feature adjustment, Network structure improvement
BibRef
Li, M.X.[Ming-Xiu],
Yu, W.[Wei],
Liu, Q.L.[Qing-Lin],
Li, Z.L.[Zong-Lin],
Li, R.[Ru],
Zhong, B.[Bineng],
Zhang, S.P.[Sheng-Ping],
Hybrid Transformers With Attention-Guided Spatial Embeddings for
Makeup Transfer and Removal,
CirSysVideo(34), No. 4, April 2024, pp. 2876-2890.
IEEE DOI
2404
Faces, Feature extraction, Semantics, Transformers, Shape,
Image color analysis, Data mining, Makeup transfer, makeup removal,
vision transformer
BibRef
Nie, X.S.[Xue-Song],
Jin, H.Y.[Hao-Yuan],
Yan, Y.F.[Yun-Feng],
Chen, X.[Xi],
Zhu, Z.H.[Zhi-Hang],
Qi, D.L.[Dong-Lian],
ScopeViT: Scale-Aware Vision Transformer,
PR(153), 2024, pp. 110470.
Elsevier DOI
2405
Vision transformer, Multi-scale features, Efficient attention mechanism
BibRef
Hanyu, T.[Taisei],
Yamazaki, K.[Kashu],
Tran, M.[Minh],
McCann, R.A.[Roy A.],
Liao, H.T.[Hai-Tao],
Rainwater, C.[Chase],
Adkins, M.[Meredith],
Cothren, J.[Jackson],
Le, N.[Ngan],
AerialFormer: Multi-Resolution Transformer for Aerial Image
Segmentation,
RS(16), No. 16, 2024, pp. 2930.
DOI Link
2408
BibRef
Wang, D.Z.[De-Zheng],
Wei, X.Y.[Xiao-Yi],
Chen, C.Y.[Cong-Yan],
CAST: An innovative framework for Cross-dimensional Attention
Structure in Transformers,
PR(159), 2025, pp. 111153.
Elsevier DOI
2412
Cross-dimensional attention structure,
Static attention mechanism, Time series forecasting
BibRef
van Engelenhoven, A.[Adjorn],
Strisciuglio, N.[Nicola],
Talavera, E.[Estefanía],
CAST: Clustering self-Attention using Surrogate Tokens for efficient
transformers,
PRL(186), 2024, pp. 30-36.
Elsevier DOI
2412
Self-attention mechanism, Clustering self-attention mechanism,
Complexity, Efficient transformers, LRA benchmark
BibRef
Zheng, G.Y.[Guang-Yao],
Zang, B.[Bo],
Yang, P.H.[Peng-Hui],
Zhang, W.B.[Wen-Bo],
Li, B.[Bin],
FE-SKViT: A Feature-Enhanced ViT Model with Skip Attention for
Automatic Modulation Recognition,
RS(16), No. 22, 2024, pp. 4204.
DOI Link
2412
BibRef
Lu, J.C.[Jia-Chen],
Zhang, J.G.[Jun-Ge],
Zhu, X.T.[Xia-Tian],
Feng, J.F.[Jian-Feng],
Xiang, T.[Tao],
Zhang, L.[Li],
Softmax-Free Linear Transformers,
IJCV(132), No. 8, August 2024, pp. 3355-3374.
Springer DOI Code:
WWW Link.
2408
Approximate the self-attention by a linear function.
BibRef
Li, C.H.[Cheng-Hao],
Zhang, C.N.[Chao-Ning],
Toward a deeper understanding: RetNet viewed through Convolution,
PR(155), 2024, pp. 110625.
Elsevier DOI Code:
WWW Link.
2408
Boost local response of ViT.
Convolutional neural network, Vision transformer, RetNet
BibRef
Liao, H.X.[Hui-Xian],
Li, X.S.[Xiao-Sen],
Qin, X.[Xiao],
Wang, W.J.[Wen-Ji],
He, G.D.[Guo-Dui],
Huang, H.J.[Hao-Jie],
Guo, X.[Xu],
Chun, X.[Xin],
Zhang, J.Y.[Jin-Yong],
Fu, Y.Q.[Yun-Qin],
Qin, Z.Y.[Zheng-You],
EPSViTs: A hybrid architecture for image classification based on
parameter-shared multi-head self-attention,
IVC(149), 2024, pp. 105130.
Elsevier DOI
2408
Image classification, Multi-head self-attention,
Parameter-shared, Hybrid architecture
BibRef
Sa, J.W.[Jae-Won],
Ryu, J.[Junhwan],
Kim, H.[Heegon],
ECTFormer: An efficient Conv-Transformer model design for image
recognition,
PR(159), 2025, pp. 111092.
Elsevier DOI
2412
Conv-Transformer network, Lightweight architecture,
Dynamic kernel sizes, Efficient overlapping patchify,
Efficient self-attention mechanism
BibRef
Li, J.F.[Jin-Feng],
Feng, M.L.[Mei-Ling],
Xia, C.Y.[Cheng-Yi],
DBCvT: Double Branch Convolutional Transformer for Medical Image
Classification,
PRL(186), 2024, pp. 250-257.
Elsevier DOI
2412
Convolutional Neural Networks, Transformer, Self-attention,
Channel attention, Medical Image Classification
BibRef
Liao, Y.[Yi],
Gao, Y.S.[Yong-Sheng],
Zhang, W.C.[Wei-Chuan],
Dynamic accumulated attention map for interpreting evolution of
decision-making in vision transformer,
PR(165), 2025, pp. 111607.
Elsevier DOI Code:
WWW Link.
2505
Explanation map, Attention flow, Vision transformer, Image classification
BibRef
Shi, Y.L.[Yu-Long],
Sun, M.W.[Ming-Wei],
Wang, Y.S.[Yong-Shuai],
Ma, J.H.[Jia-Hao],
Chen, Z.Q.[Zeng-Qiang],
EViT: An Eagle Vision Transformer With Bi-Fovea Self-Attention,
Cyber(55), No. 3, March 2025, pp. 1288-1300.
IEEE DOI Code:
WWW Link.
2503
Visualization, Transformers, Physiology, Photoreceptors,
Computational modeling, Biological information theory,
eagle vision transformer (EViTs)
BibRef
Long, W.[Wei],
Chen, Z.Y.[Zi-Yang],
Li, W.T.[Wen-Ting],
Zhang, Y.J.[Yong-Jun],
Yao, H.[He],
Peng, J.X.[Jia-Xin],
Cui, Z.W.[Zhong-Wei],
Leveraging negative correlation for Full-Range Self-Attention in
Vision Transformers,
PR(169), 2026, pp. 111899.
Elsevier DOI
2509
Self-attention, Full-range self-attention, Vision transformer,
Image classification
BibRef
Shan, J.[Jiquan],
Wang, J.X.[Jun-Xiao],
Zhao, L.F.[Li-Feng],
Cai, L.[Liang],
Zhang, H.Y.[Hong-Yuan],
Liritzis, I.[Ioannis],
AnchorFormer: Differentiable anchor attention for efficient vision
transformer,
PRL(197), 2025, pp. 124-131.
Elsevier DOI
2510
Vision transformer, Efficient transformer, Anchor attention
BibRef
Bae, J.[Jongseong],
Kim, S.[Susang],
Cho, M.[Minsu],
Kim, H.Y.[Ha Young],
MVFormer: Diversifying feature normalization and token mixing for
efficient vision transformers,
PRL(197), 2025, pp. 72-80.
Elsevier DOI
2510
Vision transformer, Diverse feature learning, Normalization, Token mixer
BibRef
Li, Y.[Yang],
Jiao, L.C.[Li-Cheng],
Liu, X.[Xu],
Liu, F.[Fang],
Li, L.L.[Ling-Ling],
Chen, P.[Puhua],
Semantic-Aware Wavelet Transformer for Pyramid Learning Object
Detection,
MultMed(27), 2025, pp. 8016-8028.
IEEE DOI
2511
Transformers, Discrete wavelet transforms, Object detection,
Semantics, Spatial resolution, Head, Correlation, Convolution,
and computation burden
BibRef
Liu, Z.[Zuyan],
Rao, Y.M.[Yong-Ming],
Zhao, W.L.[Wen-Liang],
Zhou, J.[Jie],
Lu, J.W.[Ji-Wen],
Efficient High-Order Spatial Interactions for Visual Perception,
PAMI(48), No. 1, January 2026, pp. 33-46.
IEEE DOI
2512
BibRef
Earlier: A2, A3, A4, A5, Only:
AMixer:
Adaptive Weight Mixing for Self-Attention Free Vision Transformers,
ECCV22(XXI:50-67).
Springer DOI
2211
Transformers, Convolution, Point cloud compression, Visualization,
Logic gates, Computational modeling, Training, Image recognition,
recursive gated convolution.
BibRef
Guo, H.L.[Han-Lin],
Lv, W.J.[Wei-Jia],
Shen, Z.[Zhi],
Wang, D.H.[Da-Han],
Zhang, Y.[Yukang],
CPFormer-Net: Correspondence Pruning Transformer With Structured
Context Aggregation,
SPLetters(33), 2026, pp. 111-115.
IEEE DOI
2512
Transformers, Semantics, Feature extraction, Context modeling,
Computer architecture, Attention mechanisms, Cognition, Accuracy,
structured context aggregation
BibRef
Hang, J.F.[Jing-Fan],
Yang, X.Q.[Xian-Qiang],
Enhancing local attention with global information interaction via
progressive cluster propagation,
PR(172), 2026, pp. 112713.
Elsevier DOI Code:
WWW Link.
2601
K-Means clustering, Vision transformer, Image recognition,
Semantic segmentation, Object detection
BibRef
Lin, S.H.[Si-Hao],
Lyu, P.M.[Pu-Meng],
Liu, D.R.[Dong-Rui],
Li, Z.H.[Zhi-Hui],
Wang, W.G.[Wen-Guan],
Chang, X.J.[Xiao-Jun],
Zheng, Y.H.[Yu-Hui],
Entropy-Guided Condensing for Vision Transformer,
IJCV(134), No. 1, January 2026, pp. 86.
Springer DOI
2602
BibRef
Wu, C.[Chong],
Che, M.L.[Mao-Lin],
Yan, H.[Hong],
The CUR Decomposition of Self-Attention Matrices in Vision
Transformers,
PAMI(48), No. 4, April 2026, pp. 4792-4809.
IEEE DOI
2603
Matrix decomposition, Complexity theory, Transformers, Kernel,
Attention mechanisms, Linear approximation, Sparse matrices,
vision transformer
BibRef
Li, Y.[Yuan],
Wu, X.[Xiang],
Wang, J.C.[Jia-Cun],
Bo, Y.M.[Yu-Ming],
Ni, F.[Feng],
Jiang, C.H.[Chang-Hui],
Differential attention vision transformer with adaptive spatial
feature conditioning for remote sensing scene classification,
PR(178), 2026, pp. 113461.
Elsevier DOI
2605
Vision transformer, Differential attention,
Scene classification, Remote sensing, Adaptive feature conditioning
BibRef
Liu, Y.H.[Yi-Hang],
Wen, Y.[Ying],
Yang, L.Z.[Long-Zhen],
He, L.H.[Liang-Hua],
Zhou, M.[MengChu],
A General Framework for Efficient Medical Image Analysis via Shared
Attention Vision Transformer,
MedImg(45), No. 5, May 2026, pp. 2001-2014.
IEEE DOI Code:
WWW Link.
2605
Biomedical imaging, Visualization, Feature extraction,
Image analysis, Adaptation models, Tuning, Transfer learning,
parameter efficiency
BibRef
Zhou, S.[Sai],
Liu, M.[Meiqin],
Zhou, J.[Jing],
Zheng, R.H.[Rong-Hao],
Enhancing Vision Transformer With Shift Expansion Linear Attention
for Image Classification and Object Tracking,
CirSysVideo(36), No. 6, June 2026, pp. 9042-9056.
IEEE DOI Code:
WWW Link.
2606
Head, Object tracking, Image classification, Feature extraction,
Transformers, Accuracy, Videos, Optimization, object tracking
BibRef
Savathrakis, G.[Giorgos],
Argyros, A.[Antonis],
Enact: Entropy-Based Clustering of Attention Input for Reducing the
Computational Needs of Object Detection Transformers,
ICIP25(295-300)
IEEE DOI Code:
WWW Link.
2601
Training, Accuracy, Codes, Memory management, Graphics processing units,
Object detection, Transformers, Entropy
BibRef
Fan, Q.H.[Qi-Hang],
Huang, H.B.[Huai-Bo],
He, R.[Ran],
Breaking the Low-Rank Dilemma of Linear Attention,
CVPR25(25271-25280)
IEEE DOI Code:
WWW Link.
2508
Degradation, Training, Attention mechanisms,
Computational modeling, Buildings, Transformers, Complexity theory
BibRef
Miao, Z.C.[Zi-Chen],
Chen, W.[Wei],
Qiu, Q.[Qiang],
Coeff-Tuning: A Graph Filter Subspace View for Tuning Attention-Based
Large Models,
CVPR25(20146-20146)
IEEE DOI
2508
Training, Measurement, Convex hulls, Convolution,
Computational modeling, Transformers, Tuning
BibRef
Sun, Y.W.[Yu-Wei],
Ochiai, H.[Hideya],
Wu, Z.R.[Zhi-Rong],
Lin, S.[Stephen],
Kanai, R.[Ryota],
Associative Transformer,
CVPR25(4518-4527)
IEEE DOI
2508
Training, Associative memory, Adaptation models, Attention mechanisms,
Transformers, Cognition, Trojan horses, Videos, transformers
BibRef
Chen, L.Y.[Li-Yan],
Meyer, G.P.[Gregory P.],
Zhang, Z.[Zaiwei],
Wolff, E.M.[Eric M.],
Vernaza, P.[Paul],
Flash3D: Super-scaling Point Transformers through Joint
Hardware-Geometry Locality,
CVPR25(6595-6604)
IEEE DOI
2508
Point cloud compression, Attention mechanisms, Costs, Fuses,
Memory management, Graphics processing units, Transformers, flashattention
BibRef
Zhang, W.[Wei],
Zhang, B.P.[Bao-Peng],
Teng, Z.[Zhu],
Luo, W.X.[Wen-Xin],
Zou, J.[Junnan],
Fan, J.P.[Jian-Ping],
Less Attention is More: Prompt Transformer for Generalized Category
Discovery,
CVPR25(30322-30331)
IEEE DOI Code:
WWW Link.
2508
Visualization, Adaptation models, Refining, Transformers,
Feature extraction, Brain modeling, Standards, Visual perception
BibRef
Zhu, J.C.[Jia-Chen],
Chen, X.L.[Xin-Lei],
He, K.[Kaiming],
LeCun, Y.[Yann],
Liu, Z.[Zhuang],
Transformers without Normalization,
CVPR25(14901-14911)
IEEE DOI
2508
Training, Computational modeling, Self-supervised learning,
Artificial neural networks, Transformers, Tuning, transformer, normalization
BibRef
Peng, Z.L.[Ze-Lin],
Huang, Y.[Yu],
Xu, Z.Q.[Zheng-Qin],
Tang, F.L.[Fei-Long],
Hu, M.[Ming],
Yang, X.K.[Xiao-Kang],
Shen, W.[Wei],
Star with Bilinear Mapping,
CVPR25(25292-25302)
IEEE DOI Code:
WWW Link.
2508
Transformer-like, but gets global context.
Computational modeling, Semantic segmentation, Stars, Transformers,
Complexity theory, Computational efficiency, Context modeling,
Image classification
BibRef
Nottebaum, M.[Moritz],
Dunnhofer, M.[Matteo],
Micheloni, C.[Christian],
LowFormer: Hardware Efficient Design for Convolutional Transformer
Backbones,
WACV25(7008-7018)
IEEE DOI Code:
WWW Link.
2505
Convolutional codes, Accuracy, Semantic segmentation,
Graphics processing units, attention
BibRef
Chowdhury, A.R.[Amartya Roy],
Diddigi, R.B.[Raghuram Bharadwaj],
Prabuchandran, K.J.,
Tripathi, A.M.[Achyut Mani],
Bandit-based Attention Mechanism in Vision Transformers,
WACV25(9597-9606)
IEEE DOI Code:
WWW Link.
2505
Training, Codes, Computational modeling, Focusing, Transformer cores,
Transformers, Throughput, Computational efficiency, Complexity theory
BibRef
Alam, Q.M.[Quazi Mishkatul],
Tarchoun, B.[Bilel],
Alouani, I.[Ihsen],
Abu-Ghazaleh, N.[Nael],
Adversarial Attention Deficit: Fooling Deformable Vision Transformers
with Collaborative Adversarial Patches,
WACV25(7123-7132)
IEEE DOI
2505
Deformable models, Noise, Collaboration, Object detection, Transformers
BibRef
Ren, S.[Sucheng],
Zhou, D.[Daquan],
He, S.F.[Sheng-Feng],
Feng, J.S.[Jia-Shi],
Wang, X.C.[Xin-Chao],
Shunted Self-Attention via Multi-Scale Token Aggregation,
CVPR22(10843-10852)
IEEE DOI
2210
Degradation, Deep learning, Costs, Computational modeling, Merging,
Efficient learning and inferences
BibRef
Qiang, Y.[Yao],
Li, C.Y.[Cheng-Yin],
Khanduri, P.[Prashant],
Zhu, D.X.[Dong-Xiao],
Fairness-aware Vision Transformer via Debiased Self-attention,
ECCV24(XXXVII: 358-376).
Springer DOI
2412
BibRef
Gong, H.H.[Hui-Hui],
Dong, M.J.[Min-Jing],
Ma, S.Q.[Si-Qi],
Camtepe, S.[Seyit],
Nepal, S.[Surya],
Xu, C.[Chang],
Random Entangled Tokens for Adversarially Robust Vision Transformer,
CVPR24(24554-24563)
IEEE DOI
2410
Training, Benchmark testing, Transformers,
Robustness, Vision Transformers, Self-Attention Mechanism
BibRef
Lee, S.[Sanghyeok],
Choi, J.[Joonmyung],
Kim, H.W.J.[Hyun-Woo J.],
Multi-Criteria Token Fusion with One-Step-Ahead Attention for
Efficient Vision Transformers,
CVPR24(15741-15750)
IEEE DOI Code:
WWW Link.
2410
Training, Degradation, Costs, Fuses, Computational modeling,
Transformers, Efficient ViTs, Token Fusion, Token Reduction, Token Merging
BibRef
Zhang, S.X.[Shuo-Xi],
Liu, H.P.[Han-Peng],
Lin, S.[Stephen],
He, K.[Kun],
You Only Need Less Attention at Each Stage in Vision Transformers,
CVPR24(6057-6066)
IEEE DOI
2410
Deep learning, Computational modeling,
Transformers, Computational efficiency, efficient training
BibRef
Li, L.[Lujun],
Wei, Z.[Zimian],
Dong, P.[Peijie],
Luo, W.H.[Wen-Han],
Xue, W.[Wei],
Liu, Q.F.[Qi-Feng],
Guo, Y.[Yike],
Attnzero: Efficient Attention Discovery for Vision Transformers,
ECCV24(V: 20-37).
Springer DOI
2412
BibRef
Bao-Long, N.H.[Nguyen-Huu],
Zhang, C.Y.[Chen-Yu],
Shi, Y.Z.[Yu-Zhi],
Hirakawa, T.[Tsubasa],
Yamashita, T.[Takayoshi],
Matsui, T.[Tohgoroh],
Fujiyoshi, H.[Hironobu],
Debiformer: Vision Transformer with Deformable Agent Bi-level Routing
Attention,
ACCV24(X: 445-462).
Springer DOI
2412
BibRef
Yang, X.[Xuan],
Yuan, L.Z.[Liang-Zhe],
Wilber, K.[Kimberly],
Sharma, A.[Astuti],
Gu, X.Y.[Xiu-Ye],
Qiao, S.Y.[Si-Yuan],
Debats, S.[Stephanie],
Wang, H.S.[Hui-Sheng],
Adam, H.[Hartwig],
Sirotenko, M.[Mikhail],
Chen, L.C.[Liang-Chieh],
PolyMaX: General Dense Prediction with Mask Transformer,
WACV24(1039-1050)
IEEE DOI
2404
Codes, Image synthesis, Semantic segmentation, Estimation,
Benchmark testing, Algorithms,
Image recognition and understanding
BibRef
Nie, X.S.[Xue-Song],
Chen, X.[Xi],
Jin, H.Y.[Hao-Yuan],
Zhu, Z.H.[Zhi-Hang],
Yan, Y.F.[Yun-Feng],
Qi, D.L.[Dong-Lian],
Triplet Attention Transformer for Spatiotemporal Predictive Learning,
WACV24(7021-7030)
IEEE DOI
2404
Computational modeling, Self-supervised learning,
Predictive models, Parallel processing, Transformers, and algorithms
BibRef
Cai, H.[Han],
Li, J.[Junyan],
Hu, M.[Muyan],
Gan, C.[Chuang],
Han, S.[Song],
EfficientViT: Lightweight Multi-Scale Attention for High-Resolution
Dense Prediction,
ICCV23(17256-17267)
IEEE DOI
2401
BibRef
Ryu, J.B.[Jong-Bin],
Han, D.Y.[Dong-Yoon],
Lim, J.W.[Jong-Woo],
Gramian Attention Heads are Strong yet Efficient Vision Learners,
ICCV23(5818-5828)
IEEE DOI Code:
WWW Link.
2401
BibRef
Xu, R.H.[Rui-Han],
Zhang, H.[Haokui],
Hu, W.Z.[Wen-Ze],
Zhang, S.L.[Shi-Liang],
Wang, X.Y.[Xiao-Yu],
ParCNetV2: Oversized Kernel with Enhanced Attention,
ICCV23(5729-5739)
IEEE DOI Code:
WWW Link.
2401
BibRef
Zhao, B.Y.[Bing-Yin],
Yu, Z.[Zhiding],
Lan, S.Y.[Shi-Yi],
Cheng, Y.T.[Yu-Tao],
Anandkumar, A.[Anima],
Lao, Y.J.[Ying-Jie],
Alvarez, J.M.[Jose M.],
Fully Attentional Networks with Self-emerging Token Labeling,
ICCV23(5562-5572)
IEEE DOI
2401
BibRef
Guo, Y.[Yong],
Stutz, D.[David],
Schiele, B.[Bernt],
Robustifying Token Attention for Vision Transformers,
ICCV23(17511-17522)
IEEE DOI Code:
WWW Link.
2401
BibRef
Zhao, Y.P.[You-Peng],
Tang, H.D.[Hua-Dong],
Jiang, Y.Y.[Ying-Ying],
A, Y.[Yong],
Wu, Q.[Qiang],
Wang, J.[Jun],
Parameter-Efficient Vision Transformer with Linear Attention,
ICIP23(1275-1279)
IEEE DOI
2312
BibRef
Shi, L.[Lili],
Huang, H.D.[Hai-Duo],
Song, B.[Bowei],
Tan, M.[Meng],
Zhao, W.Z.[Wen-Zhe],
Xia, T.[Tian],
Ren, P.J.[Peng-Ju],
TAQ: Top-K Attention-Aware Quantization for Vision Transformers,
ICIP23(1750-1754)
IEEE DOI
2312
BibRef
Baili, N.[Nada],
Frigui, H.[Hichem],
ADA-VIT: Attention-Guided Data Augmentation for Vision Transformers,
ICIP23(385-389)
IEEE DOI
2312
BibRef
Ding, M.Y.[Ming-Yu],
Shen, Y.K.[Yi-Kang],
Fan, L.J.[Li-Jie],
Chen, Z.F.[Zhen-Fang],
Chen, Z.[Zitian],
Luo, P.[Ping],
Tenenbaum, J.[Josh],
Gan, C.[Chuang],
Visual Dependency Transformers:
Dependency Tree Emerges from Reversed Attention,
CVPR23(14528-14539)
IEEE DOI
2309
BibRef
Song, J.C.[Jie-Chong],
Mou, C.[Chong],
Wang, S.Q.[Shi-Qi],
Ma, S.W.[Si-Wei],
Zhang, J.[Jian],
Optimization-Inspired Cross-Attention Transformer for Compressive
Sensing,
CVPR23(6174-6184)
IEEE DOI
2309
BibRef
Hassani, A.[Ali],
Walton, S.[Steven],
Li, J.C.[Jia-Chen],
Li, S.[Shen],
Shi, H.[Humphrey],
Neighborhood Attention Transformer,
CVPR23(6185-6194)
IEEE DOI
2309
BibRef
Liu, Z.J.[Zhi-Jian],
Yang, X.Y.[Xin-Yu],
Tang, H.T.[Hao-Tian],
Yang, S.[Shang],
Han, S.[Song],
FlatFormer: Flattened Window Attention for Efficient Point Cloud
Transformer,
CVPR23(1200-1211)
IEEE DOI
2309
BibRef
Pan, X.[Xuran],
Ye, T.Z.[Tian-Zhu],
Xia, Z.F.[Zhuo-Fan],
Song, S.[Shiji],
Huang, G.[Gao],
Slide-Transformer: Hierarchical Vision Transformer with Local
Self-Attention,
CVPR23(2082-2091)
IEEE DOI
2309
BibRef
Zhu, L.[Lei],
Wang, X.J.[Xin-Jiang],
Ke, Z.H.[Zhang-Han],
Zhang, W.[Wayne],
Lau, R.[Rynson],
BiFormer: Vision Transformer with Bi-Level Routing Attention,
CVPR23(10323-10333)
IEEE DOI
2309
BibRef
Long, S.[Sifan],
Zhao, Z.[Zhen],
Pi, J.[Jimin],
Wang, S.S.[Sheng-Sheng],
Wang, J.D.[Jing-Dong],
Beyond Attentive Tokens: Incorporating Token Importance and Diversity
for Efficient Vision Transformers,
CVPR23(10334-10343)
IEEE DOI
2309
BibRef
Liu, X.Y.[Xin-Yu],
Peng, H.[Houwen],
Zheng, N.X.[Ning-Xin],
Yang, Y.Q.[Yu-Qing],
Hu, H.[Han],
Yuan, Y.X.[Yi-Xuan],
EfficientViT: Memory Efficient Vision Transformer with Cascaded Group
Attention,
CVPR23(14420-14430)
IEEE DOI
2309
BibRef
You, H.R.[Hao-Ran],
Xiong, Y.[Yunyang],
Dai, X.L.[Xiao-Liang],
Wu, B.[Bichen],
Zhang, P.Z.[Pei-Zhao],
Fan, H.Q.[Hao-Qi],
Vajda, P.[Peter],
Lin, Y.Y.C.[Ying-Yan Celine],
Castling-ViT: Compressing Self-Attention via Switching Towards
Linear-Angular Attention at Vision Transformer Inference,
CVPR23(14431-14442)
IEEE DOI
2309
BibRef
Grainger, R.[Ryan],
Paniagua, T.[Thomas],
Song, X.[Xi],
Cuntoor, N.[Naresh],
Lee, M.W.[Mun Wai],
Wu, T.F.[Tian-Fu],
PaCa-ViT: Learning Patch-to-Cluster Attention in Vision Transformers,
CVPR23(18568-18578)
IEEE DOI
2309
BibRef
Wei, C.[Cong],
Duke, B.[Brendan],
Jiang, R.[Ruowei],
Aarabi, P.[Parham],
Taylor, G.W.[Graham W.],
Shkurti, F.[Florian],
Sparsifiner: Learning Sparse Instance-Dependent Attention for
Efficient Vision Transformers,
CVPR23(22680-22689)
IEEE DOI
2309
BibRef
Bhattacharyya, M.[Mayukh],
Chattopadhyay, S.[Soumitri],
Nag, S.[Sayan],
DeCAtt: Efficient Vision Transformers with Decorrelated Attention
Heads,
ECV23(4695-4699)
IEEE DOI
2309
BibRef
Zhang, Y.[Yuke],
Chen, D.[Dake],
Kundu, S.[Souvik],
Li, C.H.[Cheng-Hao],
Beerel, P.A.[Peter A.],
SAL-ViT: Towards Latency Efficient Private Inference on ViT using
Selective Attention Search with a Learnable Softmax Approximation,
ICCV23(5093-5102)
IEEE DOI
2401
BibRef
Yeganeh, Y.[Yousef],
Farshad, A.[Azade],
Weinberger, P.[Peter],
Ahmadi, S.A.[Seyed-Ahmad],
Adeli, E.[Ehsan],
Navab, N.[Nassir],
Transformers Pay Attention to Convolutions Leveraging Emerging
Properties of ViTs by Dual Attention-Image Network,
CVAMD23(2296-2307)
IEEE DOI
2401
BibRef
Zheng, J.H.[Jia-Hao],
Yang, L.Q.[Long-Qi],
Li, Y.Y.[Yi-Ying],
Yang, K.[Ke],
Wang, Z.Y.[Zhi-Yuan],
Zhou, J.[Jun],
Lightweight Vision Transformer with Spatial and Channel Enhanced
Self-Attention,
REDLCV23(1484-1488)
IEEE DOI
2401
BibRef
Hyeon-Woo, N.[Nam],
Yu-Ji, K.[Kim],
Heo, B.[Byeongho],
Han, D.Y.[Dong-Yoon],
Oh, S.J.[Seong Joon],
Oh, T.H.[Tae-Hyun],
Scratching Visual Transformer's Back with Uniform Attention,
ICCV23(5784-5795)
IEEE DOI
2401
BibRef
Zhang, H.K.[Hao-Kui],
Hu, W.Z.[Wen-Ze],
Wang, X.Y.[Xiao-Yu],
Fcaformer: Forward Cross Attention in Hybrid Vision Transformer,
ICCV23(6037-6046)
IEEE DOI Code:
WWW Link.
2401
BibRef
Zeng, W.X.[Wen-Xuan],
Li, M.[Meng],
Xiong, W.J.[Wen-Jie],
Tong, T.[Tong],
Lu, W.J.[Wen-Jie],
Tan, J.[Jin],
Wang, R.S.[Run-Sheng],
Huang, R.[Ru],
MPCViT: Searching for Accurate and Efficient MPC-Friendly Vision
Transformer with Heterogeneous Attention,
ICCV23(5029-5040)
IEEE DOI Code:
WWW Link.
2401
BibRef
Psomas, B.[Bill],
Kakogeorgiou, I.[Ioannis],
Karantzalos, K.[Konstantinos],
Avrithis, Y.[Yannis],
Keep It SimPool: Who Said Supervised Transformers Suffer from
Attention Deficit?,
ICCV23(5327-5337)
IEEE DOI Code:
WWW Link.
2401
BibRef
Han, D.C.[Dong-Chen],
Pan, X.[Xuran],
Han, Y.Z.[Yi-Zeng],
Song, S.[Shiji],
Huang, G.[Gao],
FLatten Transformer: Vision Transformer using Focused Linear
Attention,
ICCV23(5938-5948)
IEEE DOI Code:
WWW Link.
2401
BibRef
Tatsunami, Y.[Yuki],
Taki, M.[Masato],
RaftMLP: How Much Can Be Done Without Attention and with Less Spatial
Locality?,
ACCV22(VI:459-475).
Springer DOI
2307
WWW Link. Address computational complexity.
BibRef
Bolya, D.[Daniel],
Fu, C.Y.[Cheng-Yang],
Dai, X.L.[Xiao-Liang],
Zhang, P.Z.[Pei-Zhao],
Hoffman, J.[Judy],
Hydra Attention: Efficient Attention with Many Heads,
CADK22(35-49).
Springer DOI
2304
Transformer computation explodes with large images. Multiple heads.
BibRef
Chen, X.Y.[Xiang-Yu],
Hu, Q.H.[Qing-Hao],
Li, K.[Kaidong],
Zhong, C.[Cuncong],
Wang, G.H.[Guang-Hui],
Accumulated Trivial Attention Matters in Vision Transformers on Small
Datasets,
WACV23(3973-3981)
IEEE DOI
2302
Codes, Focusing, Transformers, Convolutional neural networks,
Task analysis, Algorithms: Machine learning architectures,
and algorithms (including transfer)
BibRef
Lan, H.[Hai],
Wang, X.[Xihao],
Shen, H.[Hao],
Liang, P.D.[Pei-Dong],
Wei, X.[Xian],
Couplformer: Rethinking Vision Transformer with Coupling Attention,
WACV23(6464-6473)
IEEE DOI
2302
Couplings, Visualization, Image segmentation,
Computational modeling, Memory management, Object detection
BibRef
Debnath, B.[Biplob],
Po, O.[Oliver],
Chowdhury, F.A.[Farhan Asif],
Chakradhar, S.[Srimat],
Cosine Similarity based Few-Shot Video Classifier with
Attention-based Aggregation,
ICPR22(1273-1279)
IEEE DOI
2212
Training, Head, Pipelines, Benchmark testing, Feature extraction,
Transformers
BibRef
Mari, C.R.[Carlos Roig],
Gonzalez, D.V.[David Varas],
Bou-Balust, E.[Elisenda],
Multi-Scale Transformer-Based Feature Combination for Image Retrieval,
ICIP22(3166-3170)
IEEE DOI
2211
Visualization, Semantics, Image retrieval, Feature extraction,
Transformers, Internet, Image retrieval, Attention, Multi-scale,
Feature combination
BibRef
Furukawa, R.[Ryouichi],
Hotta, K.[Kazuhiro],
Local Embedding for Axial Attention,
ICIP22(2586-2590)
IEEE DOI
2211
Deep learning, Image segmentation, Visualization,
Computational modeling, Neural networks, Transformers.
BibRef
Ding, M.Y.[Ming-Yu],
Xiao, B.[Bin],
Codella, N.[Noel],
Luo, P.[Ping],
Wang, J.D.[Jing-Dong],
Yuan, L.[Lu],
DaViT: Dual Attention Vision Transformers,
ECCV22(XXIV:74-92).
Springer DOI
2211
BibRef
Wang, P.C.[Pi-Chao],
Wang, X.[Xue],
Wang, F.[Fan],
Lin, M.[Ming],
Chang, S.N.[Shu-Ning],
Li, H.[Hao],
Jin, R.[Rong],
KVT: k-NN Attention for Boosting Vision Transformers,
ECCV22(XXIV:285-302).
Springer DOI
2211
BibRef
Li, A.[Ang],
Jiao, J.C.[Ji-Chao],
Li, N.[Ning],
Qi, W.J.[Wang-Jing],
Xu, W.[Wei],
Pang, M.[Min],
Conmw Transformer: A General Vision Transformer Backbone With
Merged-Window Attention,
ICIP22(1551-1555)
IEEE DOI
2211
Image resolution, Convolution, Transformers, Feature extraction, Tokenization,
Computational efficiency, Vision Transformer, hybrid architecture
BibRef
Zhang, Q.M.[Qi-Ming],
Xu, Y.F.[Yu-Fei],
Zhang, J.[Jing],
Tao, D.C.[Da-Cheng],
VSA: Learning Varied-Size Window Attention in Vision Transformers,
ECCV22(XXV:466-483).
Springer DOI
2211
BibRef
Mallick, R.[Rupayan],
Benois-Pineau, J.[Jenny],
Zemmari, A.[Akka],
I Saw: A Self-Attention Weighted Method for Explanation of Visual
Transformers,
ICIP22(3271-3275)
IEEE DOI
2211
Measurement, Correlation coefficient, Visualization,
Image segmentation, Databases, Object detection, Transformers,
Gaze Fixation Density Maps
BibRef
Song, Z.K.[Zi-Kai],
Yu, J.Q.[Jun-Qing],
Chen, Y.P.P.[Yi-Ping Phoebe],
Yang, W.[Wei],
Transformer Tracking with Cyclic Shifting Window Attention,
CVPR22(8781-8790)
IEEE DOI
2210
WWW Link. Visualization, Target tracking, Image recognition,
Optimization methods, Benchmark testing
BibRef
Yang, C.L.[Cheng-Lin],
Wang, Y.L.[Yi-Lin],
Zhang, J.M.[Jian-Ming],
Zhang, H.[He],
Wei, Z.J.[Zi-Jun],
Lin, Z.[Zhe],
Yuille, A.L.[Alan L.],
Lite Vision Transformer with Enhanced Self-Attention,
CVPR22(11988-11998)
IEEE DOI
2210
Convolutional codes, Image segmentation, Visualization,
Convolution, Semantics, Merging, Predictive models,
Deep learning architectures and techniques
BibRef
Xia, Z.F.[Zhuo-Fan],
Pan, X.[Xuran],
Song, S.[Shiji],
Li, L.E.[Li Erran],
Huang, G.[Gao],
Vision Transformer with Deformable Attention,
CVPR22(4784-4793)
IEEE DOI
2210
Deformable models, Adaptation models, Computational modeling,
Predictive models, Transformers, Data models,
grouping and shape analysis
BibRef
Yu, T.[Tong],
Khalitov, R.[Ruslan],
Cheng, L.[Lei],
Yang, Z.R.[Zhi-Rong],
Paramixer: Parameterizing Mixing Links in Sparse Factors Works Better
than Dot-Product Self-Attention,
CVPR22(681-690)
IEEE DOI
2210
Protocols, Costs, Scalability, Neural networks, Stacking, Genomics,
Transformers, Deep learning architectures and techniques,
Representation learning
BibRef
Cheng, B.[Bowen],
Misra, I.[Ishan],
Schwing, A.G.[Alexander G.],
Kirillov, A.[Alexander],
Girdhar, R.[Rohit],
Masked-attention Mask Transformer for Universal Image Segmentation,
CVPR22(1280-1289)
IEEE DOI
2210
Image segmentation, Shape, Computational modeling, Semantics,
Transformers, Feature extraction, retrieval
BibRef
Rangrej, S.B.[Samrudhdhi B.],
Srinidhi, C.L.[Chetan L.],
Clark, J.J.[James J.],
Consistency driven Sequential Transformers Attention Model for
Partially Observable Scenes,
CVPR22(2508-2517)
IEEE DOI
2210
Training, Computational modeling, Imaging, Predictive models,
Transformers, Prediction algorithms, Visual reasoning
BibRef
Chen, C.F.R.[Chun-Fu Richard],
Fan, Q.F.[Quan-Fu],
Panda, R.[Rameswar],
CrossViT: Cross-Attention Multi-Scale Vision Transformer for Image
Classification,
ICCV21(347-356)
IEEE DOI
2203
Image segmentation, Image recognition, Computational modeling,
Semantics, Memory management, Object detection, Representation learning
BibRef
Chefer, H.[Hila],
Gur, S.[Shir],
Wolf, L.B.[Lior B.],
Generic Attention-model Explainability for Interpreting Bi-Modal and
Encoder-Decoder Transformers,
ICCV21(387-396)
IEEE DOI
2203
Measurement, Visualization, Image segmentation,
Computational modeling, Object detection
BibRef
Xu, W.J.[Wei-Jian],
Xu, Y.F.[Yi-Fan],
Chang, T.[Tyler],
Tu, Z.W.[Zhuo-Wen],
Co-Scale Conv-Attentional Image Transformers,
ICCV21(9961-9970)
IEEE DOI
2203
Image segmentation, Computational modeling, Object detection,
Transformers, Convolutional neural networks, Task analysis,
Recognition and classification
BibRef
Yang, G.L.[Guang-Lei],
Tang, H.[Hao],
Ding, M.L.[Ming-Li],
Sebe, N.[Nicu],
Ricci, E.[Elisa],
Transformer-Based Attention Networks for Continuous Pixel-Wise
Prediction,
ICCV21(16249-16259)
IEEE DOI
2203
Correlation, Estimation, Logic gates,
Transformers, Natural language processing,
Vision applications and systems
BibRef
Kim, K.[Kyungmin],
Wu, B.C.[Bi-Chen],
Dai, X.L.[Xiao-Liang],
Zhang, P.Z.[Pei-Zhao],
Yan, Z.C.[Zhi-Cheng],
Vajda, P.[Peter],
Kim, S.[Seon],
Rethinking the Self-Attention in Vision Transformers,
ECV21(3065-3069)
IEEE DOI
2109
Computational modeling
BibRef
Chapter on Pattern Recognition, Clustering, Statistics, Grammars, Learning, Neural Nets, Genetic Algorithms continues in
Video Transformers.