14.5.9.6.1 Attention in Vision Transformers

Vision Transformers. Transformers.

Hu, H.Q.[Hao-Qi], Lu, X.F.[Xiao-Feng], Zhang, X.P.[Xin-Peng], Zhang, T.X.[Tian-Xing], Sun, G.L.[Guang-Ling],
Inheritance Attention Matrix-Based Universal Adversarial Perturbations on Vision Transformers,
SPLetters(28), 2021, pp. 1923-1927.
IEEE DOI 2110
Perturbation methods, Robustness, Visualization, Transformers, Optimization, Task analysis, Head, Vision Transformers, self-attention BibRef

Xue, Z.X.[Zhi-Xiang], Tan, X.[Xiong], Yu, X.[Xuchu], Liu, B.[Bing], Yu, A.[Anzhu], Zhang, P.Q.[Peng-Qiang],
Deep Hierarchical Vision Transformer for Hyperspectral and LiDAR Data Classification,
IP(31), 2022, pp. 3095-3110.
IEEE DOI 2205
Feature extraction, Transformers, Hyperspectral imaging, Laser radar, Data mining, Collaboration, Data models, cross attention fusion BibRef

Heo, J.[Jiseong], Wang, Y.[Yooseung], Park, J.[Jihun],
Occlusion-aware spatial attention transformer for occluded object recognition,
PRL(159), 2022, pp. 70-76.
Elsevier DOI 2206
Occluded object recognition, Visual transformer, Spatial attention BibRef

Yu, X.H.[Xiao-Han], Wang, J.[Jun], Zhao, Y.[Yang], Gao, Y.S.[Yong-Sheng],
Mix-ViT: Mixing attentive vision transformer for ultra-fine-grained visual categorization,
PR(135), 2023, pp. 109131.
Elsevier DOI 2212
Ultra-fine-grained visual categorization, Vision transformer, Self-supervised learning, Attentive mixing BibRef

Wu, G.[Gaojie], Zheng, W.S.[Wei-Shi], Lu, Y.T.[Yu-Tong], Tian, Q.[Qi],
PSLT: A Light-Weight Vision Transformer With Ladder Self-Attention and Progressive Shift,
PAMI(45), No. 9, September 2023, pp. 11120-11135.
IEEE DOI 2309
BibRef

Li, K.C.[Kun-Chang], Wang, Y.[Yali], Zhang, J.H.[Jun-Hao], Gao, P.[Peng], Song, G.[Guanglu], Liu, Y.[Yu], Li, H.S.[Hong-Sheng], Qiao, Y.[Yu],
UniFormer: Unifying Convolution and Self-Attention for Visual Recognition,
PAMI(45), No. 10, October 2023, pp. 12581-12600.
IEEE DOI 2310
Unify CNN and Transformers BibRef

Li, H.L.[Hao-Ling], Xue, M.Q.[Meng-Qi], Song, J.[Jie], Zhang, H.F.[Hao-Fei], Huang, W.Q.[Wen-Qi], Liang, L.[Lingyu], Song, M.L.[Ming-Li],
Constituent Attention for Vision Transformers,
CVIU(237), 2023, pp. 103838.
Elsevier DOI Code:
WWW Link. 2311
Vision Transformer, Attention mechanism, Classification, Interpretability for deep learning BibRef

Qin, R.[Ruiru], Wang, C.Z.[Chuan-Zhi], Wu, Y.M.[Yong-Mei], Du, H.[Huafei], Lv, M.Y.[Ming-Yun],
A U-Shaped Convolution-Aided Transformer with Double Attention for Hyperspectral Image Classification,
RS(16), No. 2, 2024, pp. 288.
DOI Link 2402
BibRef

Wang, W.X.[Wen-Xiao], Chen, W.[Wei], Qiu, Q.[Qibo], Chen, L.[Long], Wu, B.[Boxi], Lin, B.B.[Bin-Bin], He, X.F.[Xiao-Fei], Liu, W.[Wei],
CrossFormer++: A Versatile Vision Transformer Hinging on Cross-Scale Attention,
PAMI(46), No. 5, May 2024, pp. 3123-3136.
IEEE DOI 2404
Transformers, Task analysis, Feature extraction, Visualization, Object detection, Costs, Adaptation models, Image classification, vision transformer BibRef

Zhang, Q.M.[Qi-Ming], Zhang, J.[Jing], Xu, Y.F.[Yu-Fei], Tao, D.C.[Da-Cheng],
Vision Transformer With Quadrangle Attention,
PAMI(46), No. 5, May 2024, pp. 3608-3624.
IEEE DOI 2404
Transformers, Task analysis, Shape, Feature extraction, Adaptation models, Semantic segmentation, vision transformer BibRef

Huang, L.[Lan], Bai, X.Y.[Xing-Yu], Zeng, J.[Jia], Yu, M.Q.[Meng-Qiang], Pang, W.[Wei], Wang, K.P.[Kang-Ping],
FAM: Improving columnar vision transformer with feature attention mechanism,
CVIU(242), 2024, pp. 103981.
Elsevier DOI 2404
Vision transformer, Feature adjustment, Network structure improvement BibRef

Li, M.X.[Ming-Xiu], Yu, W.[Wei], Liu, Q.L.[Qing-Lin], Li, Z.L.[Zong-Lin], Li, R.[Ru], Zhong, B.[Bineng], Zhang, S.P.[Sheng-Ping],
Hybrid Transformers With Attention-Guided Spatial Embeddings for Makeup Transfer and Removal,
CirSysVideo(34), No. 4, April 2024, pp. 2876-2890.
IEEE DOI 2404
Faces, Feature extraction, Semantics, Transformers, Shape, Image color analysis, Data mining, Makeup transfer, makeup removal, vision transformer BibRef


Yang, X.[Xuan], Yuan, L.Z.[Liang-Zhe], Wilber, K.[Kimberly], Sharma, A.[Astuti], Gu, X.Y.[Xiu-Ye], Qiao, S.Y.[Si-Yuan], Debats, S.[Stephanie], Wang, H.S.[Hui-Sheng], Adam, H.[Hartwig], Sirotenko, M.[Mikhail], Chen, L.C.[Liang-Chieh],
PolyMaX: General Dense Prediction with Mask Transformer,
WACV24(1039-1050)
IEEE DOI 2404
Codes, Image synthesis, Semantic segmentation, Estimation, Computer architecture, Benchmark testing, Algorithms, Image recognition and understanding BibRef

Nie, X.S.[Xue-Song], Chen, X.[Xi], Jin, H.Y.[Hao-Yuan], Zhu, Z.H.[Zhi-Hang], Yan, Y.F.[Yun-Feng], Qi, D.L.[Dong-Lian],
Triplet Attention Transformer for Spatiotemporal Predictive Learning,
WACV24(7021-7030)
IEEE DOI 2404
Computational modeling, Self-supervised learning, Predictive models, Parallel processing, Transformers, and algorithms BibRef

Cai, H.[Han], Li, J.[Junyan], Hu, M.[Muyan], Gan, C.[Chuang], Han, S.[Song],
EfficientViT: Lightweight Multi-Scale Attention for High-Resolution Dense Prediction,
ICCV23(17256-17267)
IEEE DOI 2401
BibRef

Ryu, J.[Jongbin], Han, D.Y.[Dong-Yoon], Lim, J.W.[Jong-Woo],
Gramian Attention Heads are Strong yet Efficient Vision Learners,
ICCV23(5818-5828)
IEEE DOI Code:
WWW Link. 2401
BibRef

Xu, R.H.[Rui-Han], Zhang, H.[Haokui], Hu, W.Z.[Wen-Ze], Zhang, S.L.[Shi-Liang], Wang, X.Y.[Xiao-Yu],
ParCNetV2: Oversized Kernel with Enhanced Attention,
ICCV23(5729-5739)
IEEE DOI Code:
WWW Link. 2401
BibRef

Zhao, B.Y.[Bing-Yin], Yu, Z.[Zhiding], Lan, S.Y.[Shi-Yi], Cheng, Y.[Yutao], Anandkumar, A.[Anima], Lao, Y.J.[Ying-Jie], Alvarez, J.M.[Jose M.],
Fully Attentional Networks with Self-emerging Token Labeling,
ICCV23(5562-5572)
IEEE DOI 2401
BibRef

Guo, Y.[Yong], Stutz, D.[David], Schiele, B.[Bernt],
Robustifying Token Attention for Vision Transformers,
ICCV23(17511-17522)
IEEE DOI Code:
WWW Link. 2401
BibRef

Zhao, Y.[Youpeng], Tang, H.D.[Hua-Dong], Jiang, Y.Y.[Ying-Ying], A, Y.[Yong], Wu, Q.[Qiang], Wang, J.[Jun],
Parameter-Efficient Vision Transformer with Linear Attention,
ICIP23(1275-1279)
IEEE DOI 2312
BibRef

Shi, L.[Lili], Huang, H.D.[Hai-Duo], Song, B.[Bowei], Tan, M.[Meng], Zhao, W.Z.[Wen-Zhe], Xia, T.[Tian], Ren, P.J.[Peng-Ju],
TAQ: Top-K Attention-Aware Quantization for Vision Transformers,
ICIP23(1750-1754)
IEEE DOI 2312
BibRef

Baili, N.[Nada], Frigui, H.[Hichem],
ADA-VIT: Attention-Guided Data Augmentation for Vision Transformers,
ICIP23(385-389)
IEEE DOI 2312
BibRef

Ding, M.Y.[Ming-Yu], Shen, Y.[Yikang], Fan, L.J.[Li-Jie], Chen, Z.F.[Zhen-Fang], Chen, Z.[Zitian], Luo, P.[Ping], Tenenbaum, J.[Josh], Gan, C.[Chuang],
Visual Dependency Transformers: Dependency Tree Emerges from Reversed Attention,
CVPR23(14528-14539)
IEEE DOI 2309
BibRef

Song, J.C.[Jie-Chong], Mou, C.[Chong], Wang, S.Q.[Shi-Qi], Ma, S.W.[Si-Wei], Zhang, J.[Jian],
Optimization-Inspired Cross-Attention Transformer for Compressive Sensing,
CVPR23(6174-6184)
IEEE DOI 2309
BibRef

Hassani, A.[Ali], Walton, S.[Steven], Li, J.C.[Jia-Chen], Li, S.[Shen], Shi, H.[Humphrey],
Neighborhood Attention Transformer,
CVPR23(6185-6194)
IEEE DOI 2309
BibRef

Liu, Z.J.[Zhi-Jian], Yang, X.Y.[Xin-Yu], Tang, H.T.[Hao-Tian], Yang, S.[Shang], Han, S.[Song],
FlatFormer: Flattened Window Attention for Efficient Point Cloud Transformer,
CVPR23(1200-1211)
IEEE DOI 2309
BibRef

Pan, X.[Xuran], Ye, T.Z.[Tian-Zhu], Xia, Z.F.[Zhuo-Fan], Song, S.[Shiji], Huang, G.[Gao],
Slide-Transformer: Hierarchical Vision Transformer with Local Self-Attention,
CVPR23(2082-2091)
IEEE DOI 2309
BibRef

Zhu, L.[Lei], Wang, X.J.[Xin-Jiang], Ke, Z.H.[Zhang-Han], Zhang, W.[Wayne], Lau, R.[Rynson],
BiFormer: Vision Transformer with Bi-Level Routing Attention,
CVPR23(10323-10333)
IEEE DOI 2309
BibRef

Long, S.[Sifan], Zhao, Z.[Zhen], Pi, J.[Jimin], Wang, S.S.[Sheng-Sheng], Wang, J.D.[Jing-Dong],
Beyond Attentive Tokens: Incorporating Token Importance and Diversity for Efficient Vision Transformers,
CVPR23(10334-10343)
IEEE DOI 2309
BibRef

Liu, X.Y.[Xin-Yu], Peng, H.[Houwen], Zheng, N.X.[Ning-Xin], Yang, Y.Q.[Yu-Qing], Hu, H.[Han], Yuan, Y.X.[Yi-Xuan],
EfficientViT: Memory Efficient Vision Transformer with Cascaded Group Attention,
CVPR23(14420-14430)
IEEE DOI 2309
BibRef

You, H.R.[Hao-Ran], Xiong, Y.[Yunyang], Dai, X.L.[Xiao-Liang], Wu, B.[Bichen], Zhang, P.Z.[Pei-Zhao], Fan, H.Q.[Hao-Qi], Vajda, P.[Peter], Lin, Y.Y.C.[Ying-Yan Celine],
Castling-ViT: Compressing Self-Attention via Switching Towards Linear-Angular Attention at Vision Transformer Inference,
CVPR23(14431-14442)
IEEE DOI 2309
BibRef

Grainger, R.[Ryan], Paniagua, T.[Thomas], Song, X.[Xi], Cuntoor, N.[Naresh], Lee, M.W.[Mun Wai], Wu, T.F.[Tian-Fu],
PaCa-ViT: Learning Patch-to-Cluster Attention in Vision Transformers,
CVPR23(18568-18578)
IEEE DOI 2309
BibRef

Wei, C.[Cong], Duke, B.[Brendan], Jiang, R.[Ruowei], Aarabi, P.[Parham], Taylor, G.W.[Graham W.], Shkurti, F.[Florian],
Sparsifiner: Learning Sparse Instance-Dependent Attention for Efficient Vision Transformers,
CVPR23(22680-22689)
IEEE DOI 2309
BibRef

Bhattacharyya, M.[Mayukh], Chattopadhyay, S.[Soumitri], Nag, S.[Sayan],
DeCAtt: Efficient Vision Transformers with Decorrelated Attention Heads,
ECV23(4695-4699)
IEEE DOI 2309
BibRef

Tatsunami, Y.[Yuki], Taki, M.[Masato],
RaftMLP: How Much Can Be Done Without Attention and with Less Spatial Locality?,
ACCV22(VI:459-475).
Springer DOI 2307
WWW Link. Addresses computational complexity. BibRef

Bolya, D.[Daniel], Fu, C.Y.[Cheng-Yang], Dai, X.L.[Xiao-Liang], Zhang, P.Z.[Pei-Zhao], Hoffman, J.[Judy],
Hydra Attention: Efficient Attention with Many Heads,
CADK22(35-49).
Springer DOI 2304
Transformer computation explodes with large images. Uses multiple heads. BibRef

Chen, X.Y.[Xiang-Yu], Hu, Q.[Qinghao], Li, K.[Kaidong], Zhong, C.[Cuncong], Wang, G.H.[Guang-Hui],
Accumulated Trivial Attention Matters in Vision Transformers on Small Datasets,
WACV23(3973-3981)
IEEE DOI 2302
Codes, Focusing, Transformers, Convolutional neural networks, Task analysis, Algorithms: Machine learning architectures, and algorithms (including transfer) BibRef

Lan, H.[Hai], Wang, X.[Xihao], Shen, H.[Hao], Liang, P.[Peidong], Wei, X.[Xian],
Couplformer: Rethinking Vision Transformer with Coupling Attention,
WACV23(6464-6473)
IEEE DOI 2302
Couplings, Visualization, Image segmentation, Computational modeling, Memory management, Object detection BibRef

Debnath, B.[Biplob], Po, O.[Oliver], Chowdhury, F.A.[Farhan Asif], Chakradhar, S.[Srimat],
Cosine Similarity based Few-Shot Video Classifier with Attention-based Aggregation,
ICPR22(1273-1279)
IEEE DOI 2212
Training, Head, Pipelines, Benchmark testing, Feature extraction, Transformers BibRef

Mari, C.R.[Carlos Roig], Gonzalez, D.V.[David Varas], Bou-Balust, E.[Elisenda],
Multi-Scale Transformer-Based Feature Combination for Image Retrieval,
ICIP22(3166-3170)
IEEE DOI 2211
Visualization, Semantics, Image retrieval, Feature extraction, Transformers, Internet, Attention, Multi-scale, Feature combination BibRef

Furukawa, R.[Ryouichi], Hotta, K.[Kazuhiro],
Local Embedding for Axial Attention,
ICIP22(2586-2590)
IEEE DOI 2211
Deep learning, Image segmentation, Visualization, Computational modeling, Neural networks, Transformers BibRef

Ding, M.Y.[Ming-Yu], Xiao, B.[Bin], Codella, N.[Noel], Luo, P.[Ping], Wang, J.D.[Jing-Dong], Yuan, L.[Lu],
DaViT: Dual Attention Vision Transformers,
ECCV22(XXIV:74-92).
Springer DOI 2211
BibRef

Wang, P.C.[Pi-Chao], Wang, X.[Xue], Wang, F.[Fan], Lin, M.[Ming], Chang, S.N.[Shu-Ning], Li, H.[Hao], Jin, R.[Rong],
KVT: k-NN Attention for Boosting Vision Transformers,
ECCV22(XXIV:285-302).
Springer DOI 2211
BibRef

Rao, Y.M.[Yong-Ming], Zhao, W.L.[Wen-Liang], Zhou, J.[Jie], Lu, J.W.[Ji-Wen],
AMixer: Adaptive Weight Mixing for Self-Attention Free Vision Transformers,
ECCV22(XXI:50-67).
Springer DOI 2211
BibRef

Li, A.[Ang], Jiao, J.[Jichao], Li, N.[Ning], Qi, W.[Wangjing], Xu, W.[Wei], Pang, M.[Min],
Conmw Transformer: A General Vision Transformer Backbone With Merged-Window Attention,
ICIP22(1551-1555)
IEEE DOI 2211
Image resolution, Convolution, Transformers, Feature extraction, Tokenization, Computational efficiency, Vision Transformer, hybrid architecture BibRef

Zhang, Q.M.[Qi-Ming], Xu, Y.F.[Yu-Fei], Zhang, J.[Jing], Tao, D.C.[Da-Cheng],
VSA: Learning Varied-Size Window Attention in Vision Transformers,
ECCV22(XXV:466-483).
Springer DOI 2211
BibRef

Mallick, R.[Rupayan], Benois-Pineau, J.[Jenny], Zemmari, A.[Akka],
I Saw: A Self-Attention Weighted Method for Explanation of Visual Transformers,
ICIP22(3271-3275)
IEEE DOI 2211
Measurement, Correlation coefficient, Visualization, Image segmentation, Databases, Object detection, Transformers, Gaze Fixation Density Maps BibRef

Song, Z.K.[Zi-Kai], Yu, J.Q.[Jun-Qing], Chen, Y.P.P.[Yi-Ping Phoebe], Yang, W.[Wei],
Transformer Tracking with Cyclic Shifting Window Attention,
CVPR22(8781-8790)
IEEE DOI 2210
WWW Link. Visualization, Target tracking, Image recognition, Optimization methods, Benchmark testing BibRef

Yang, C.L.[Cheng-Lin], Wang, Y.L.[Yi-Lin], Zhang, J.M.[Jian-Ming], Zhang, H.[He], Wei, Z.J.[Zi-Jun], Lin, Z.[Zhe], Yuille, A.L.[Alan L.],
Lite Vision Transformer with Enhanced Self-Attention,
CVPR22(11988-11998)
IEEE DOI 2210
Convolutional codes, Image segmentation, Visualization, Convolution, Semantics, Merging, Predictive models, Deep learning architectures and techniques BibRef

Xia, Z.F.[Zhuo-Fan], Pan, X.[Xuran], Song, S.[Shiji], Li, L.E.[Li Erran], Huang, G.[Gao],
Vision Transformer with Deformable Attention,
CVPR22(4784-4793)
IEEE DOI 2210
Deformable models, Adaptation models, Computational modeling, Predictive models, Transformers, Data models, grouping and shape analysis BibRef

Yu, T.[Tong], Khalitov, R.[Ruslan], Cheng, L.[Lei], Yang, Z.R.[Zhi-Rong],
Paramixer: Parameterizing Mixing Links in Sparse Factors Works Better than Dot-Product Self-Attention,
CVPR22(681-690)
IEEE DOI 2210
Protocols, Costs, Scalability, Neural networks, Stacking, Genomics, Transformers, Deep learning architectures and techniques, Representation learning BibRef

Cheng, B.[Bowen], Misra, I.[Ishan], Schwing, A.G.[Alexander G.], Kirillov, A.[Alexander], Girdhar, R.[Rohit],
Masked-attention Mask Transformer for Universal Image Segmentation,
CVPR22(1280-1289)
IEEE DOI 2210
Image segmentation, Shape, Computational modeling, Semantics, Transformers, Feature extraction, retrieval BibRef

Rangrej, S.B.[Samrudhdhi B.], Srinidhi, C.L.[Chetan L.], Clark, J.J.[James J.],
Consistency driven Sequential Transformers Attention Model for Partially Observable Scenes,
CVPR22(2508-2517)
IEEE DOI 2210
Training, Computational modeling, Imaging, Predictive models, Transformers, Prediction algorithms, Visual reasoning BibRef

Chen, C.F.R.[Chun-Fu Richard], Fan, Q.F.[Quan-Fu], Panda, R.[Rameswar],
CrossViT: Cross-Attention Multi-Scale Vision Transformer for Image Classification,
ICCV21(347-356)
IEEE DOI 2203
Image segmentation, Image recognition, Computational modeling, Semantics, Memory management, Object detection, Representation learning BibRef

Chefer, H.[Hila], Gur, S.[Shir], Wolf, L.B.[Lior B.],
Generic Attention-model Explainability for Interpreting Bi-Modal and Encoder-Decoder Transformers,
ICCV21(387-396)
IEEE DOI 2203
Measurement, Visualization, Image segmentation, Computational modeling, Object detection BibRef

Xu, W.J.[Wei-Jian], Xu, Y.F.[Yi-Fan], Chang, T.[Tyler], Tu, Z.W.[Zhuo-Wen],
Co-Scale Conv-Attentional Image Transformers,
ICCV21(9961-9970)
IEEE DOI 2203
Image segmentation, Computational modeling, Object detection, Transformers, Convolutional neural networks, Task analysis, Recognition and classification BibRef

Yang, G.L.[Guang-Lei], Tang, H.[Hao], Ding, M.L.[Ming-Li], Sebe, N.[Nicu], Ricci, E.[Elisa],
Transformer-Based Attention Networks for Continuous Pixel-Wise Prediction,
ICCV21(16249-16259)
IEEE DOI 2203
Correlation, Estimation, Logic gates, Transformers, Natural language processing, Vision applications and systems BibRef

Kim, K.[Kyungmin], Wu, B.C.[Bi-Chen], Dai, X.L.[Xiao-Liang], Zhang, P.Z.[Pei-Zhao], Yan, Z.C.[Zhi-Cheng], Vajda, P.[Peter], Kim, S.[Seon],
Rethinking the Self-Attention in Vision Transformers,
ECV21(3065-3069)
IEEE DOI 2109
Computational modeling, Pattern recognition BibRef

Chapter on Pattern Recognition, Clustering, Statistics, Grammars, Learning, Neural Nets, Genetic Algorithms continues in
Video Transformers.


Last update: May 6, 2024 at 15:50:14