Hu, H.Q.[Hao-Qi],
Lu, X.F.[Xiao-Feng],
Zhang, X.P.[Xin-Peng],
Zhang, T.X.[Tian-Xing],
Sun, G.L.[Guang-Ling],
Inheritance Attention Matrix-Based Universal Adversarial
Perturbations on Vision Transformers,
SPLetters(28), 2021, pp. 1923-1927.
IEEE DOI
2110
Perturbation methods, Robustness, Visualization, Transformers,
Optimization, Task analysis, Head, Vision Transformers, self-attention
BibRef
Xue, Z.X.[Zhi-Xiang],
Tan, X.[Xiong],
Yu, X.[Xuchu],
Liu, B.[Bing],
Yu, A.[Anzhu],
Zhang, P.Q.[Peng-Qiang],
Deep Hierarchical Vision Transformer for Hyperspectral and LiDAR Data
Classification,
IP(31), 2022, pp. 3095-3110.
IEEE DOI
2205
Feature extraction, Transformers, Hyperspectral imaging,
Laser radar, Data mining, Collaboration, Data models,
cross attention fusion
BibRef
Heo, J.[Jiseong],
Wang, Y.[Yooseung],
Park, J.[Jihun],
Occlusion-aware spatial attention transformer for occluded object
recognition,
PRL(159), 2022, pp. 70-76.
Elsevier DOI
2206
Occluded object recognition, Visual transformer, Spatial attention
BibRef
Yu, X.H.[Xiao-Han],
Wang, J.[Jun],
Zhao, Y.[Yang],
Gao, Y.S.[Yong-Sheng],
Mix-ViT: Mixing attentive vision transformer for ultra-fine-grained
visual categorization,
PR(135), 2023, pp. 109131.
Elsevier DOI
2212
Ultra-fine-grained visual categorization, Vision transformer,
Self-supervised learning, Attentive mixing
BibRef
Wu, G.[Gaojie],
Zheng, W.S.[Wei-Shi],
Lu, Y.T.[Yu-Tong],
Tian, Q.[Qi],
PSLT: A Light-Weight Vision Transformer With Ladder Self-Attention
and Progressive Shift,
PAMI(45), No. 9, September 2023, pp. 11120-11135.
IEEE DOI
2309
BibRef
Li, K.C.[Kun-Chang],
Wang, Y.[Yali],
Zhang, J.H.[Jun-Hao],
Gao, P.[Peng],
Song, G.[Guanglu],
Liu, Y.[Yu],
Li, H.S.[Hong-Sheng],
Qiao, Y.[Yu],
UniFormer: Unifying Convolution and Self-Attention for Visual
Recognition,
PAMI(45), No. 10, October 2023, pp. 12581-12600.
IEEE DOI
2310
Unify CNN and Transformers
BibRef
Li, H.L.[Hao-Ling],
Xue, M.Q.[Meng-Qi],
Song, J.[Jie],
Zhang, H.F.[Hao-Fei],
Huang, W.Q.[Wen-Qi],
Liang, L.[Lingyu],
Song, M.L.[Ming-Li],
Constituent Attention for Vision Transformers,
CVIU(237), 2023, pp. 103838.
Elsevier DOI Code:
WWW Link.
2311
Vision Transformer, Attention mechanism, Classification,
Interpretability for deep learning
BibRef
Qin, R.[Ruiru],
Wang, C.Z.[Chuan-Zhi],
Wu, Y.M.[Yong-Mei],
Du, H.[Huafei],
Lv, M.Y.[Ming-Yun],
A U-Shaped Convolution-Aided Transformer with Double Attention for
Hyperspectral Image Classification,
RS(16), No. 2, 2024, pp. 288.
DOI Link
2402
BibRef
Wang, W.X.[Wen-Xiao],
Chen, W.[Wei],
Qiu, Q.[Qibo],
Chen, L.[Long],
Wu, B.[Boxi],
Lin, B.B.[Bin-Bin],
He, X.F.[Xiao-Fei],
Liu, W.[Wei],
CrossFormer++: A Versatile Vision Transformer Hinging on Cross-Scale
Attention,
PAMI(46), No. 5, May 2024, pp. 3123-3136.
IEEE DOI
2404
Transformers, Task analysis, Feature extraction, Visualization,
Object detection, Costs, Adaptation models, Image classification,
vision transformer
BibRef
Zhang, Q.M.[Qi-Ming],
Zhang, J.[Jing],
Xu, Y.F.[Yu-Fei],
Tao, D.C.[Da-Cheng],
Vision Transformer With Quadrangle Attention,
PAMI(46), No. 5, May 2024, pp. 3608-3624.
IEEE DOI
2404
Transformers, Task analysis, Shape, Feature extraction,
Adaptation models, Semantic segmentation, vision transformer
BibRef
Huang, L.[Lan],
Bai, X.Y.[Xing-Yu],
Zeng, J.[Jia],
Yu, M.Q.[Meng-Qiang],
Pang, W.[Wei],
Wang, K.P.[Kang-Ping],
FAM: Improving columnar vision transformer with feature attention
mechanism,
CVIU(242), 2024, pp. 103981.
Elsevier DOI
2404
Vision transformer, Feature adjustment, Network structure improvement
BibRef
Li, M.X.[Ming-Xiu],
Yu, W.[Wei],
Liu, Q.L.[Qing-Lin],
Li, Z.L.[Zong-Lin],
Li, R.[Ru],
Zhong, B.[Bineng],
Zhang, S.P.[Sheng-Ping],
Hybrid Transformers With Attention-Guided Spatial Embeddings for
Makeup Transfer and Removal,
CirSysVideo(34), No. 4, April 2024, pp. 2876-2890.
IEEE DOI
2404
Faces, Feature extraction, Semantics, Transformers, Shape,
Image color analysis, Data mining, Makeup transfer, makeup removal,
vision transformer
BibRef
Nie, X.S.[Xue-Song],
Chen, X.[Xi],
Jin, H.Y.[Hao-Yuan],
Zhu, Z.H.[Zhi-Hang],
Yan, Y.F.[Yun-Feng],
Qi, D.L.[Dong-Lian],
Triplet Attention Transformer for Spatiotemporal Predictive Learning,
WACV24(7021-7030)
IEEE DOI
2404
Computational modeling, Self-supervised learning,
Predictive models, Parallel processing, Transformers
BibRef
Cai, H.[Han],
Li, J.[Junyan],
Hu, M.[Muyan],
Gan, C.[Chuang],
Han, S.[Song],
EfficientViT: Lightweight Multi-Scale Attention for High-Resolution
Dense Prediction,
ICCV23(17256-17267)
IEEE DOI
2401
BibRef
Ryu, J.[Jongbin],
Han, D.Y.[Dong-Yoon],
Lim, J.W.[Jong-Woo],
Gramian Attention Heads are Strong yet Efficient Vision Learners,
ICCV23(5818-5828)
IEEE DOI Code:
WWW Link.
2401
BibRef
Xu, R.H.[Rui-Han],
Zhang, H.[Haokui],
Hu, W.Z.[Wen-Ze],
Zhang, S.L.[Shi-Liang],
Wang, X.Y.[Xiao-Yu],
ParCNetV2: Oversized Kernel with Enhanced Attention,
ICCV23(5729-5739)
IEEE DOI Code:
WWW Link.
2401
BibRef
Zhao, B.Y.[Bing-Yin],
Yu, Z.[Zhiding],
Lan, S.Y.[Shi-Yi],
Cheng, Y.[Yutao],
Anandkumar, A.[Anima],
Lao, Y.J.[Ying-Jie],
Alvarez, J.M.[Jose M.],
Fully Attentional Networks with Self-emerging Token Labeling,
ICCV23(5562-5572)
IEEE DOI
2401
BibRef
Guo, Y.[Yong],
Stutz, D.[David],
Schiele, B.[Bernt],
Robustifying Token Attention for Vision Transformers,
ICCV23(17511-17522)
IEEE DOI Code:
WWW Link.
2401
BibRef
Zhao, Y.[Youpeng],
Tang, H.D.[Hua-Dong],
Jiang, Y.Y.[Ying-Ying],
A, Y.[Yong],
Wu, Q.[Qiang],
Wang, J.[Jun],
Parameter-Efficient Vision Transformer with Linear Attention,
ICIP23(1275-1279)
IEEE DOI
2312
BibRef
Shi, L.[Lili],
Huang, H.D.[Hai-Duo],
Song, B.[Bowei],
Tan, M.[Meng],
Zhao, W.Z.[Wen-Zhe],
Xia, T.[Tian],
Ren, P.J.[Peng-Ju],
TAQ: Top-K Attention-Aware Quantization for Vision Transformers,
ICIP23(1750-1754)
IEEE DOI
2312
BibRef
Baili, N.[Nada],
Frigui, H.[Hichem],
ADA-VIT: Attention-Guided Data Augmentation for Vision Transformers,
ICIP23(385-389)
IEEE DOI
2312
BibRef
Ding, M.Y.[Ming-Yu],
Shen, Y.[Yikang],
Fan, L.J.[Li-Jie],
Chen, Z.F.[Zhen-Fang],
Chen, Z.[Zitian],
Luo, P.[Ping],
Tenenbaum, J.[Josh],
Gan, C.[Chuang],
Visual Dependency Transformers:
Dependency Tree Emerges from Reversed Attention,
CVPR23(14528-14539)
IEEE DOI
2309
BibRef
Song, J.C.[Jie-Chong],
Mou, C.[Chong],
Wang, S.Q.[Shi-Qi],
Ma, S.W.[Si-Wei],
Zhang, J.[Jian],
Optimization-Inspired Cross-Attention Transformer for Compressive
Sensing,
CVPR23(6174-6184)
IEEE DOI
2309
BibRef
Hassani, A.[Ali],
Walton, S.[Steven],
Li, J.C.[Jia-Chen],
Li, S.[Shen],
Shi, H.[Humphrey],
Neighborhood Attention Transformer,
CVPR23(6185-6194)
IEEE DOI
2309
BibRef
Liu, Z.J.[Zhi-Jian],
Yang, X.Y.[Xin-Yu],
Tang, H.T.[Hao-Tian],
Yang, S.[Shang],
Han, S.[Song],
FlatFormer: Flattened Window Attention for Efficient Point Cloud
Transformer,
CVPR23(1200-1211)
IEEE DOI
2309
BibRef
Pan, X.[Xuran],
Ye, T.Z.[Tian-Zhu],
Xia, Z.F.[Zhuo-Fan],
Song, S.[Shiji],
Huang, G.[Gao],
Slide-Transformer: Hierarchical Vision Transformer with Local
Self-Attention,
CVPR23(2082-2091)
IEEE DOI
2309
BibRef
Zhu, L.[Lei],
Wang, X.J.[Xin-Jiang],
Ke, Z.H.[Zhang-Han],
Zhang, W.[Wayne],
Lau, R.[Rynson],
BiFormer: Vision Transformer with Bi-Level Routing Attention,
CVPR23(10323-10333)
IEEE DOI
2309
BibRef
Long, S.[Sifan],
Zhao, Z.[Zhen],
Pi, J.[Jimin],
Wang, S.S.[Sheng-Sheng],
Wang, J.D.[Jing-Dong],
Beyond Attentive Tokens: Incorporating Token Importance and Diversity
for Efficient Vision Transformers,
CVPR23(10334-10343)
IEEE DOI
2309
BibRef
Liu, X.Y.[Xin-Yu],
Peng, H.[Houwen],
Zheng, N.X.[Ning-Xin],
Yang, Y.Q.[Yu-Qing],
Hu, H.[Han],
Yuan, Y.X.[Yi-Xuan],
EfficientViT: Memory Efficient Vision Transformer with Cascaded Group
Attention,
CVPR23(14420-14430)
IEEE DOI
2309
BibRef
You, H.R.[Hao-Ran],
Xiong, Y.[Yunyang],
Dai, X.L.[Xiao-Liang],
Wu, B.[Bichen],
Zhang, P.Z.[Pei-Zhao],
Fan, H.Q.[Hao-Qi],
Vajda, P.[Peter],
Lin, Y.Y.C.[Ying-Yan Celine],
Castling-ViT: Compressing Self-Attention via Switching Towards
Linear-Angular Attention at Vision Transformer Inference,
CVPR23(14431-14442)
IEEE DOI
2309
BibRef
Grainger, R.[Ryan],
Paniagua, T.[Thomas],
Song, X.[Xi],
Cuntoor, N.[Naresh],
Lee, M.W.[Mun Wai],
Wu, T.F.[Tian-Fu],
PaCa-ViT: Learning Patch-to-Cluster Attention in Vision Transformers,
CVPR23(18568-18578)
IEEE DOI
2309
BibRef
Wei, C.[Cong],
Duke, B.[Brendan],
Jiang, R.[Ruowei],
Aarabi, P.[Parham],
Taylor, G.W.[Graham W.],
Shkurti, F.[Florian],
Sparsifiner: Learning Sparse Instance-Dependent Attention for
Efficient Vision Transformers,
CVPR23(22680-22689)
IEEE DOI
2309
BibRef
Bhattacharyya, M.[Mayukh],
Chattopadhyay, S.[Soumitri],
Nag, S.[Sayan],
DeCAtt: Efficient Vision Transformers with Decorrelated Attention
Heads,
ECV23(4695-4699)
IEEE DOI
2309
BibRef
Tatsunami, Y.[Yuki],
Taki, M.[Masato],
RaftMLP: How Much Can Be Done Without Attention and with Less Spatial
Locality?,
ACCV22(VI:459-475).
Springer DOI
2307
WWW Link. Addresses computational complexity.
BibRef
Bolya, D.[Daniel],
Fu, C.Y.[Cheng-Yang],
Dai, X.L.[Xiao-Liang],
Zhang, P.Z.[Pei-Zhao],
Hoffman, J.[Judy],
Hydra Attention: Efficient Attention with Many Heads,
CADK22(35-49).
Springer DOI
2304
Transformer computation explodes with large images. Uses many attention heads.
BibRef
Chen, X.Y.[Xiang-Yu],
Hu, Q.[Qinghao],
Li, K.[Kaidong],
Zhong, C.[Cuncong],
Wang, G.H.[Guang-Hui],
Accumulated Trivial Attention Matters in Vision Transformers on Small
Datasets,
WACV23(3973-3981)
IEEE DOI
2302
Codes, Focusing, Transformers, Convolutional neural networks,
Task analysis, Algorithms: Machine learning architectures,
and algorithms (including transfer)
BibRef
Lan, H.[Hai],
Wang, X.[Xihao],
Shen, H.[Hao],
Liang, P.[Peidong],
Wei, X.[Xian],
Couplformer: Rethinking Vision Transformer with Coupling Attention,
WACV23(6464-6473)
IEEE DOI
2302
Couplings, Visualization, Image segmentation,
Computational modeling, Memory management, Object detection
BibRef
Debnath, B.[Biplob],
Po, O.[Oliver],
Chowdhury, F.A.[Farhan Asif],
Chakradhar, S.[Srimat],
Cosine Similarity based Few-Shot Video Classifier with
Attention-based Aggregation,
ICPR22(1273-1279)
IEEE DOI
2212
Training, Head, Pipelines, Benchmark testing, Feature extraction,
Transformers
BibRef
Mari, C.R.[Carlos Roig],
Gonzalez, D.V.[David Varas],
Bou-Balust, E.[Elisenda],
Multi-Scale Transformer-Based Feature Combination for Image Retrieval,
ICIP22(3166-3170)
IEEE DOI
2211
Visualization, Semantics, Image retrieval, Feature extraction,
Transformers, Internet, Attention, Multi-scale,
Feature combination
BibRef
Furukawa, R.[Ryouichi],
Hotta, K.[Kazuhiro],
Local Embedding for Axial Attention,
ICIP22(2586-2590)
IEEE DOI
2211
Deep learning, Image segmentation, Visualization,
Computational modeling, Neural networks, Transformers
BibRef
Ding, M.Y.[Ming-Yu],
Xiao, B.[Bin],
Codella, N.[Noel],
Luo, P.[Ping],
Wang, J.D.[Jing-Dong],
Yuan, L.[Lu],
DaViT: Dual Attention Vision Transformers,
ECCV22(XXIV:74-92).
Springer DOI
2211
BibRef
Wang, P.C.[Pi-Chao],
Wang, X.[Xue],
Wang, F.[Fan],
Lin, M.[Ming],
Chang, S.N.[Shu-Ning],
Li, H.[Hao],
Jin, R.[Rong],
KVT: k-NN Attention for Boosting Vision Transformers,
ECCV22(XXIV:285-302).
Springer DOI
2211
BibRef
Rao, Y.M.[Yong-Ming],
Zhao, W.L.[Wen-Liang],
Zhou, J.[Jie],
Lu, J.W.[Ji-Wen],
AMixer:
Adaptive Weight Mixing for Self-Attention Free Vision Transformers,
ECCV22(XXI:50-67).
Springer DOI
2211
BibRef
Li, A.[Ang],
Jiao, J.[Jichao],
Li, N.[Ning],
Qi, W.[Wangjing],
Xu, W.[Wei],
Pang, M.[Min],
Conmw Transformer: A General Vision Transformer Backbone With
Merged-Window Attention,
ICIP22(1551-1555)
IEEE DOI
2211
Image resolution, Convolution, Transformers, Feature extraction, Tokenization,
Computational efficiency, Vision Transformer, hybrid architecture
BibRef
Zhang, Q.M.[Qi-Ming],
Xu, Y.F.[Yu-Fei],
Zhang, J.[Jing],
Tao, D.C.[Da-Cheng],
VSA: Learning Varied-Size Window Attention in Vision Transformers,
ECCV22(XXV:466-483).
Springer DOI
2211
BibRef
Mallick, R.[Rupayan],
Benois-Pineau, J.[Jenny],
Zemmari, A.[Akka],
I Saw: A Self-Attention Weighted Method for Explanation of Visual
Transformers,
ICIP22(3271-3275)
IEEE DOI
2211
Measurement, Correlation coefficient, Visualization,
Image segmentation, Databases, Object detection, Transformers,
Gaze Fixation Density Maps
BibRef
Song, Z.K.[Zi-Kai],
Yu, J.Q.[Jun-Qing],
Chen, Y.P.P.[Yi-Ping Phoebe],
Yang, W.[Wei],
Transformer Tracking with Cyclic Shifting Window Attention,
CVPR22(8781-8790)
IEEE DOI
2210
WWW Link. Visualization, Target tracking, Image recognition,
Optimization methods, Benchmark testing
BibRef
Yang, C.L.[Cheng-Lin],
Wang, Y.L.[Yi-Lin],
Zhang, J.M.[Jian-Ming],
Zhang, H.[He],
Wei, Z.J.[Zi-Jun],
Lin, Z.[Zhe],
Yuille, A.L.[Alan L.],
Lite Vision Transformer with Enhanced Self-Attention,
CVPR22(11988-11998)
IEEE DOI
2210
Convolutional codes, Image segmentation, Visualization,
Convolution, Semantics, Merging, Predictive models,
Deep learning architectures and techniques
BibRef
Xia, Z.F.[Zhuo-Fan],
Pan, X.[Xuran],
Song, S.[Shiji],
Li, L.E.[Li Erran],
Huang, G.[Gao],
Vision Transformer with Deformable Attention,
CVPR22(4784-4793)
IEEE DOI
2210
Deformable models, Adaptation models, Computational modeling,
Predictive models, Transformers, Data models,
grouping and shape analysis
BibRef
Yu, T.[Tong],
Khalitov, R.[Ruslan],
Cheng, L.[Lei],
Yang, Z.R.[Zhi-Rong],
Paramixer: Parameterizing Mixing Links in Sparse Factors Works Better
than Dot-Product Self-Attention,
CVPR22(681-690)
IEEE DOI
2210
Protocols, Costs, Scalability, Neural networks, Stacking, Genomics,
Transformers, Deep learning architectures and techniques,
Representation learning
BibRef
Cheng, B.[Bowen],
Misra, I.[Ishan],
Schwing, A.G.[Alexander G.],
Kirillov, A.[Alexander],
Girdhar, R.[Rohit],
Masked-attention Mask Transformer for Universal Image Segmentation,
CVPR22(1280-1289)
IEEE DOI
2210
Image segmentation, Shape, Computational modeling, Semantics,
Transformers, Feature extraction, retrieval
BibRef
Rangrej, S.B.[Samrudhdhi B.],
Srinidhi, C.L.[Chetan L.],
Clark, J.J.[James J.],
Consistency driven Sequential Transformers Attention Model for
Partially Observable Scenes,
CVPR22(2508-2517)
IEEE DOI
2210
Training, Computational modeling, Imaging, Predictive models,
Transformers, Prediction algorithms, Visual reasoning
BibRef
Chen, C.F.R.[Chun-Fu Richard],
Fan, Q.F.[Quan-Fu],
Panda, R.[Rameswar],
CrossViT: Cross-Attention Multi-Scale Vision Transformer for Image
Classification,
ICCV21(347-356)
IEEE DOI
2203
Image segmentation, Image recognition, Computational modeling,
Semantics, Memory management, Object detection, Representation learning
BibRef
Chefer, H.[Hila],
Gur, S.[Shir],
Wolf, L.B.[Lior B.],
Generic Attention-model Explainability for Interpreting Bi-Modal and
Encoder-Decoder Transformers,
ICCV21(387-396)
IEEE DOI
2203
Measurement, Visualization, Image segmentation,
Computational modeling, Object detection
BibRef
Xu, W.J.[Wei-Jian],
Xu, Y.F.[Yi-Fan],
Chang, T.[Tyler],
Tu, Z.W.[Zhuo-Wen],
Co-Scale Conv-Attentional Image Transformers,
ICCV21(9961-9970)
IEEE DOI
2203
Image segmentation, Computational modeling, Object detection,
Transformers, Convolutional neural networks, Task analysis,
Recognition and classification
BibRef
Yang, G.L.[Guang-Lei],
Tang, H.[Hao],
Ding, M.L.[Ming-Li],
Sebe, N.[Nicu],
Ricci, E.[Elisa],
Transformer-Based Attention Networks for Continuous Pixel-Wise
Prediction,
ICCV21(16249-16259)
IEEE DOI
2203
Correlation, Estimation, Logic gates,
Transformers, Natural language processing,
Vision applications and systems
BibRef
Kim, K.[Kyungmin],
Wu, B.C.[Bi-Chen],
Dai, X.L.[Xiao-Liang],
Zhang, P.Z.[Pei-Zhao],
Yan, Z.C.[Zhi-Cheng],
Vajda, P.[Peter],
Kim, S.[Seon],
Rethinking the Self-Attention in Vision Transformers,
ECV21(3065-3069)
IEEE DOI
2109
Computational modeling, Pattern recognition
BibRef