Bazi, Y.[Yakoub],
Bashmal, L.[Laila],
Al Rahhal, M.M.[Mohamad M.],
Al Dayil, R.[Reham],
Al Ajlan, N.[Naif],
Vision Transformers for Remote Sensing Image Classification,
RS(13), No. 3, 2021, pp. xx-yy.
DOI Link
2102
BibRef
Hu, H.Q.[Hao-Qi],
Lu, X.F.[Xiao-Feng],
Zhang, X.P.[Xin-Peng],
Zhang, T.X.[Tian-Xing],
Sun, G.L.[Guang-Ling],
Inheritance Attention Matrix-Based Universal Adversarial
Perturbations on Vision Transformers,
SPLetters(28), 2021, pp. 1923-1927.
IEEE DOI
2110
Perturbation methods, Robustness, Visualization, Transformers,
Optimization, Task analysis, Head, Vision Transformers, self-attention
BibRef
Li, T.[Tao],
Zhang, Z.[Zheng],
Pei, L.[Lishen],
Gan, Y.[Yan],
HashFormer: Vision Transformer Based Deep Hashing for Image Retrieval,
SPLetters(29), 2022, pp. 827-831.
IEEE DOI
2204
Transformers, Binary codes, Task analysis, Training, Image retrieval,
Feature extraction, Databases, Binary embedding, image retrieval
BibRef
Jiang, B.[Bo],
Zhao, K.K.[Kang-Kang],
Tang, J.[Jin],
RGTransformer: Region-Graph Transformer for Image Representation and
Few-Shot Classification,
SPLetters(29), 2022, pp. 792-796.
IEEE DOI
2204
Measurement, Transformers, Image representation,
Feature extraction, Visualization, transformer
BibRef
Chen, Z.M.[Zhao-Min],
Cui, Q.[Quan],
Zhao, B.[Borui],
Song, R.J.[Ren-Jie],
Zhang, X.Q.[Xiao-Qin],
Yoshie, O.[Osamu],
SST: Spatial and Semantic Transformers for Multi-Label Image
Recognition,
IP(31), 2022, pp. 2570-2583.
IEEE DOI
2204
Correlation, Semantics, Transformers, Image recognition,
Task analysis, Training, Feature extraction, label correlation
BibRef
Xue, Z.X.[Zhi-Xiang],
Tan, X.[Xiong],
Yu, X.[Xuchu],
Liu, B.[Bing],
Yu, A.[Anzhu],
Zhang, P.Q.[Peng-Qiang],
Deep Hierarchical Vision Transformer for Hyperspectral and LiDAR Data
Classification,
IP(31), 2022, pp. 3095-3110.
IEEE DOI
2205
Feature extraction, Transformers, Hyperspectral imaging,
Laser radar, Data mining, Collaboration, Data models,
cross attention fusion
BibRef
Wang, G.H.[Guang-Hui],
Li, B.[Bin],
Zhang, T.[Tao],
Zhang, S.[Shubi],
A Network Combining a Transformer and a Convolutional Neural Network
for Remote Sensing Image Change Detection,
RS(14), No. 9, 2022, pp. xx-yy.
DOI Link
2205
BibRef
Luo, G.[Gen],
Zhou, Y.[Yiyi],
Sun, X.S.[Xiao-Shuai],
Wang, Y.[Yan],
Cao, L.J.[Liu-Juan],
Wu, Y.J.[Yong-Jian],
Huang, F.Y.[Fei-Yue],
Ji, R.R.[Rong-Rong],
Towards Lightweight Transformer Via Group-Wise Transformation for
Vision-and-Language Tasks,
IP(31), 2022, pp. 3386-3398.
IEEE DOI
2205
Transformers, Task analysis, Computational modeling,
Benchmark testing, Visualization, Convolution, Head,
reference expression comprehension
BibRef
Tu, Y.B.[Yun-Bin],
Li, L.[Liang],
Su, L.[Li],
Gao, S.X.[Sheng-Xiang],
Yan, C.G.[Cheng-Gang],
Zha, Z.J.[Zheng-Jun],
Yu, Z.T.[Zheng-Tao],
Huang, Q.M.[Qing-Ming],
I2-Transformer: Intra- and Inter-Relation Embedding Transformer for
TV Show Captioning,
IP(31), 2022, pp. 3565-3577.
IEEE DOI
2206
Transformers, Semantics, Task analysis, Visualization, TV,
Graph neural networks, TV Show captioning, transformer
BibRef
Heo, J.[Jiseong],
Wang, Y.[Yooseung],
Park, J.[Jihun],
Occlusion-aware spatial attention transformer for occluded object
recognition,
PRL(159), 2022, pp. 70-76.
Elsevier DOI
2206
Occluded object recognition, Visual transformer, Spatial attention
BibRef
Wang, J.Y.[Jia-Yun],
Chakraborty, R.[Rudrasis],
Yu, S.X.[Stella X.],
Transformer for 3D Point Clouds,
PAMI(44), No. 8, August 2022, pp. 4419-4431.
IEEE DOI
2207
Convolution, Feature extraction, Shape, Semantics, Task analysis,
Measurement, point cloud, transformation, deformable, segmentation, 3D detection
BibRef
Wang, L.[Libo],
Li, R.[Rui],
Zhang, C.[Ce],
Fang, S.H.[Sheng-Hui],
Duan, C.X.[Chen-Xi],
Meng, X.L.[Xiao-Liang],
Atkinson, P.M.[Peter M.],
UNetFormer: A UNet-like transformer for efficient semantic
segmentation of remote sensing urban scene imagery,
PandRS(190), 2022, pp. 196-214.
Elsevier DOI
2208
Award, U.V. Helava, ISPRS. Semantic Segmentation, Remote Sensing, Vision Transformer,
Fully Transformer Network, Global-local Context, Urban Scene
BibRef
Kheldouni, A.[Amine],
Boumhidi, J.[Jaouad],
A Study of Bidirectional Encoder Representations from Transformers
for Sequential Recommendations,
ISCV22(1-5)
IEEE DOI
2208
Knowledge engineering, Recurrent neural networks,
Predictive models, Markov processes
BibRef
Li, Z.[Zekun],
Liu, Y.F.[Yu-Fan],
Li, B.[Bing],
Feng, B.L.[Bai-Lan],
Wu, K.[Kebin],
Peng, C.W.[Cheng-Wei],
Hu, W.M.[Wei-Ming],
SDTP: Semantic-Aware Decoupled Transformer Pyramid for Dense Image
Prediction,
CirSysVideo(32), No. 9, September 2022, pp. 6160-6173.
IEEE DOI
2209
Transformers, Semantics, Task analysis, Detectors,
Image segmentation, Head, Convolution, Transformer, dense prediction,
multi-level interaction
BibRef
Wu, J.J.[Jia-Jing],
Wei, Z.Q.[Zhi-Qiang],
Zhang, J.P.[Jin-Peng],
Zhang, Y.[Yushi],
Jia, D.N.[Dong-Ning],
Yin, B.[Bo],
Yu, Y.C.[Yun-Chao],
Full-Coupled Convolutional Transformer for Surface-Based Duct
Refractivity Inversion,
RS(14), No. 17, 2022, pp. xx-yy.
DOI Link
2209
BibRef
Dalmaz, O.[Onat],
Yurt, M.[Mahmut],
Çukur, T.[Tolga],
ResViT: Residual Vision Transformers for Multimodal Medical Image
Synthesis,
MedImg(41), No. 10, October 2022, pp. 2598-2614.
IEEE DOI
2210
Transformers, Biomedical imaging, Subspace constraints,
Task analysis, Image synthesis, Magnetic resonance imaging, unified
BibRef
Jiang, K.[Kai],
Peng, P.[Peng],
Lian, Y.[Youzao],
Xu, W.S.[Wei-Sheng],
The encoding method of position embeddings in vision transformer,
JVCIR(89), 2022, pp. 103664.
Elsevier DOI
2212
Vision transformer, Position embeddings, Gabor filters
BibRef
Han, K.[Kai],
Wang, Y.H.[Yun-He],
Chen, H.[Hanting],
Chen, X.[Xinghao],
Guo, J.[Jianyuan],
Liu, Z.H.[Zhen-Hua],
Tang, Y.[Yehui],
Xiao, A.[An],
Xu, C.J.[Chun-Jing],
Xu, Y.X.[Yi-Xing],
Yang, Z.H.[Zhao-Hui],
Zhang, Y.[Yiman],
Tao, D.C.[Da-Cheng],
A Survey on Vision Transformer,
PAMI(45), No. 1, January 2023, pp. 87-110.
IEEE DOI
2212
Survey, Vision Transformer. Transformers, Task analysis, Encoding, Computational modeling,
Visualization, Object detection, high-level vision,
video
BibRef
Hou, Q.[Qibin],
Jiang, Z.[Zihang],
Yuan, L.[Li],
Cheng, M.M.[Ming-Ming],
Yan, S.C.[Shui-Cheng],
Feng, J.S.[Jia-Shi],
Vision Permutator:
A Permutable MLP-Like Architecture for Visual Recognition,
PAMI(45), No. 1, January 2023, pp. 1328-1334.
IEEE DOI
2212
Transformers, Encoding, Visualization, Convolutional codes, Mixers,
Computer architecture, Training data, Vision permutator, deep neural network
BibRef
Wu, Y.H.[Yu-Huan],
Liu, Y.[Yun],
Zhan, X.[Xin],
Cheng, M.M.[Ming-Ming],
P2T: Pyramid Pooling Transformer for Scene Understanding,
PAMI(45), No. 11, November 2023, pp. 12760-12771.
IEEE DOI
2310
BibRef
Zhou, D.[Daquan],
Hou, Q.[Qibin],
Yang, L.J.[Lin-Jie],
Jin, X.J.[Xiao-Jie],
Feng, J.S.[Jia-Shi],
Token Selection is a Simple Booster for Vision Transformers,
PAMI(45), No. 11, November 2023, pp. 12738-12746.
IEEE DOI
2310
BibRef
Yu, X.H.[Xiao-Han],
Wang, J.[Jun],
Zhao, Y.[Yang],
Gao, Y.S.[Yong-Sheng],
Mix-ViT: Mixing attentive vision transformer for ultra-fine-grained
visual categorization,
PR(135), 2023, pp. 109131.
Elsevier DOI
2212
Ultra-fine-grained visual categorization, Vision transformer,
Self-supervised learning, Attentive mixing
BibRef
Li, Y.[Yehao],
Yao, T.[Ting],
Pan, Y.W.[Ying-Wei],
Mei, T.[Tao],
Contextual Transformer Networks for Visual Recognition,
PAMI(45), No. 2, February 2023, pp. 1489-1500.
IEEE DOI
2301
Transformers, Convolution, Visualization, Task analysis,
Image recognition, Object detection, Transformer, image recognition
BibRef
Wang, H.[Hang],
Du, Y.[Youtian],
Zhang, Y.[Yabin],
Li, S.[Shuai],
Zhang, L.[Lei],
One-Stage Visual Relationship Referring With Transformers and
Adaptive Message Passing,
IP(32), 2023, pp. 190-202.
IEEE DOI
2301
Visualization, Proposals, Transformers, Task analysis, Detectors,
Message passing, Predictive models, gated message passing
BibRef
Kim, B.[Boah],
Kim, J.[Jeongsol],
Ye, J.C.[Jong Chul],
Task-Agnostic Vision Transformer for Distributed Learning of Image
Processing,
IP(32), 2023, pp. 203-218.
IEEE DOI
2301
Task analysis, Transformers, Servers, Distance learning,
Computer aided instruction, Tail, Head, Distributed learning,
task-agnostic learning
BibRef
Park, S.[Sangjoon],
Ye, J.C.[Jong Chul],
Multi-Task Distributed Learning Using Vision Transformer With Random
Patch Permutation,
MedImg(42), No. 7, July 2023, pp. 2091-2105.
IEEE DOI
2307
Task analysis, Transformers, Head, Tail, Servers, Multitasking,
Distance learning, Federated learning, split learning,
privacy preservation
BibRef
Kiya, H.[Hitoshi],
Iijima, R.[Ryota],
Maungmaung, A.[Aprilpyone],
Kinoshita, Y.[Yuma],
Image and Model Transformation with Secret Key for Vision Transformer,
IEICE(E106-D), No. 1, January 2023, pp. 2-11.
WWW Link.
2301
BibRef
Lin, X.[Xiao],
Sun, S.Z.[Shu-Zhou],
Huang, W.[Wei],
Sheng, B.[Bin],
Li, P.[Ping],
Feng, D.D.[David Dagan],
EAPT: Efficient Attention Pyramid Transformer for Image Processing,
MultMed(25), 2023, pp. 50-61.
IEEE DOI
2301
Transformers, Encoding, Task analysis, Semantics, Feature extraction,
Costs, Convolutional neural networks, Transformer,
semantic segmentation
BibRef
Mou, C.[Chong],
Zhang, J.[Jian],
TransCL: Transformer Makes Strong and Flexible Compressive Learning,
PAMI(45), No. 4, April 2023, pp. 5236-5251.
IEEE DOI
2303
Task analysis, Transformers, Image reconstruction, Image coding,
Compressed sensing, Sensors, Cameras, Compressed sensing,
semantic segmentation
BibRef
Yuan, L.[Li],
Hou, Q.[Qibin],
Jiang, Z.[Zihang],
Feng, J.S.[Jia-Shi],
Yan, S.C.[Shui-Cheng],
VOLO: Vision Outlooker for Visual Recognition,
PAMI(45), No. 5, May 2023, pp. 6575-6586.
IEEE DOI
2304
Transformers, Computer architecture, Computational modeling,
Training, Data models, Task analysis, Visualization,
image classification
BibRef
Zhang, H.F.[Hao-Fei],
Mao, F.[Feng],
Xue, M.Q.[Meng-Qi],
Fang, G.F.[Gong-Fan],
Feng, Z.L.[Zun-Lei],
Song, J.[Jie],
Song, M.L.[Ming-Li],
Knowledge Amalgamation for Object Detection With Transformers,
IP(32), 2023, pp. 2093-2106.
IEEE DOI
2304
Transformers, Task analysis, Object detection, Detectors, Training,
Feature extraction, Model reusing, vision transformers
BibRef
Li, Y.[Ying],
Chen, K.[Kehan],
Sun, S.L.[Shi-Lei],
He, C.[Chu],
Multi-scale homography estimation based on dual feature aggregation
transformer,
IET-IPR(17), No. 5, 2023, pp. 1403-1416.
DOI Link
2304
image matching, image registration
BibRef
Wang, G.Q.[Guan-Qun],
Chen, H.[He],
Chen, L.[Liang],
Zhuang, Y.[Yin],
Zhang, S.H.[Shang-Hang],
Zhang, T.[Tong],
Dong, H.[Hao],
Gao, P.[Peng],
P2FEViT: Plug-and-Play CNN Feature Embedded Hybrid Vision Transformer
for Remote Sensing Image Classification,
RS(15), No. 7, 2023, pp. 1773.
DOI Link
2304
BibRef
Zhang, Q.M.[Qi-Ming],
Xu, Y.F.[Yu-Fei],
Zhang, J.[Jing],
Tao, D.C.[Da-Cheng],
ViTAEv2: Vision Transformer Advanced by Exploring Inductive Bias for
Image Recognition and Beyond,
IJCV(131), No. 5, May 2023, pp. 1141-1162.
Springer DOI
2305
BibRef
Fan, X.[Xinyi],
Liu, H.J.[Hua-Jun],
FlexFormer: Flexible Transformer for efficient visual recognition,
PRL(169), 2023, pp. 95-101.
Elsevier DOI
2305
Vision transformer, Frequency analysis, Image classification
BibRef
Cho, S.[Seokju],
Hong, S.[Sunghwan],
Kim, S.[Seungryong],
CATs++: Boosting Cost Aggregation With Convolutions and Transformers,
PAMI(45), No. 6, June 2023, pp. 7174-7194.
IEEE DOI
WWW Link.
2305
Costs, Transformers, Correlation, Semantics, Feature extraction,
Task analysis, Cost aggregation, efficient transformer,
semantic visual correspondence
BibRef
Kim, B.J.[Bum Jun],
Choi, H.[Hyeyeon],
Jang, H.[Hyeonah],
Lee, D.G.[Dong Gu],
Jeong, W.[Wonseok],
Kim, S.W.[Sang Woo],
Improved robustness of vision transformers via prelayernorm in patch
embedding,
PR(141), 2023, pp. 109659.
Elsevier DOI
2306
Vision transformer, Patch embedding, Contrast enhancement,
Robustness, Layer normalization, Convolutional neural network, Deep learning
BibRef
He, Q.[Qibin],
Sun, X.[Xian],
Yan, Z.Y.[Zhi-Yuan],
Wang, B.[Bing],
Zhu, Z.[Zicong],
Diao, W.H.[Wen-Hui],
Yang, M.Y.[Michael Ying],
AST: Adaptive Self-supervised Transformer for optical remote sensing
representation,
PandRS(200), 2023, pp. 41-54.
Elsevier DOI
2306
Cross-scale transformer, Interpretation, Masked image modeling,
Optical remote sensing, Representation learning
BibRef
Wang, Z.[Ziwei],
Wang, C.Y.[Chang-Yuan],
Xu, X.[Xiuwei],
Zhou, J.[Jie],
Lu, J.W.[Ji-Wen],
Quantformer: Learning Extremely Low-Precision Vision Transformers,
PAMI(45), No. 7, July 2023, pp. 8813-8826.
IEEE DOI
2306
Quantization (signal), Transformers, Computational modeling,
Search problems, Object detection, Image color analysis,
vision transformers
BibRef
Sun, S.Y.[Shu-Yang],
Yue, X.Y.[Xiao-Yu],
Zhao, H.S.[Heng-Shuang],
Torr, P.H.S.[Philip H.S.],
Bai, S.[Song],
Patch-Based Separable Transformer for Visual Recognition,
PAMI(45), No. 7, July 2023, pp. 9241-9247.
IEEE DOI
2306
Task analysis, Current transformers, Visualization,
Feature extraction, Convolutional neural networks,
instance segmentation
BibRef
Yue, X.Y.[Xiao-Yu],
Sun, S.Y.[Shu-Yang],
Kuang, Z.H.[Zhang-Hui],
Wei, M.[Meng],
Torr, P.H.S.[Philip H.S.],
Zhang, W.[Wayne],
Lin, D.[Dahua],
Vision Transformer with Progressive Sampling,
ICCV21(377-386)
IEEE DOI
2203
Codes, Computational modeling, Interference,
Transformers, Feature extraction, Recognition and classification,
Representation learning
BibRef
Zheng, F.[Fujian],
Lin, S.[Shuai],
Zhou, W.[Wei],
Huang, H.[Hong],
A Lightweight Dual-Branch Swin Transformer for Remote Sensing Scene
Classification,
RS(15), No. 11, 2023, pp. 2865.
DOI Link
2306
BibRef
Yu, L.[Lu],
Xiang, W.[Wei],
Fang, J.[Juan],
Chen, Y.P.P.[Yi-Ping Phoebe],
Chi, L.[Lianhua],
eX-ViT: A Novel explainable vision transformer for weakly supervised
semantic segmentation,
PR(142), 2023, pp. 109666.
Elsevier DOI
2307
Explainable, Attention map, Transformer, Weakly supervised
BibRef
Peng, Z.L.[Zhi-Liang],
Guo, Z.H.[Zong-Hao],
Huang, W.[Wei],
Wang, Y.W.[Yao-Wei],
Xie, L.X.[Ling-Xi],
Jiao, J.B.[Jian-Bin],
Tian, Q.[Qi],
Ye, Q.X.[Qi-Xiang],
Conformer: Local Features Coupling Global Representations for
Recognition and Detection,
PAMI(45), No. 8, August 2023, pp. 9454-9468.
IEEE DOI
2307
Transformers, Feature extraction, Couplings, Visualization,
Detectors, Convolution, Object detection, Feature fusion,
vision transformer
BibRef
Peng, Z.L.[Zhi-Liang],
Huang, W.[Wei],
Gu, S.Z.[Shan-Zhi],
Xie, L.X.[Ling-Xi],
Wang, Y.[Yaowei],
Jiao, J.B.[Jian-Bin],
Ye, Q.X.[Qi-Xiang],
Conformer: Local Features Coupling Global Representations for Visual
Recognition,
ICCV21(357-366)
IEEE DOI
2203
Couplings, Representation learning, Visualization, Fuses,
Convolution, Object detection, Transformers,
Representation learning
BibRef
Feng, Z.Z.[Zhan-Zhou],
Zhang, S.L.[Shi-Liang],
Efficient Vision Transformer via Token Merger,
IP(32), 2023, pp. 4156-4169.
IEEE DOI
2307
Corporate acquisitions, Transformers, Semantics, Task analysis,
Visualization, Merging, Computational efficiency, sparse representation
BibRef
Yang, J.H.[Jia-Hao],
Li, X.Y.[Xiang-Yang],
Zheng, M.[Mao],
Wang, Z.[Zihan],
Zhu, Y.Q.[Yong-Qing],
Guo, X.Q.[Xiao-Qian],
Yuan, Y.C.[Yu-Chen],
Chai, Z.[Zifeng],
Jiang, S.Q.[Shu-Qiang],
MemBridge: Video-Language Pre-Training With Memory-Augmented
Inter-Modality Bridge,
IP(32), 2023, pp. 4073-4087.
IEEE DOI
2307
WWW Link. Bridges, Transformers, Computer architecture, Task analysis,
Visualization, Feature extraction, Memory modules, memory module
BibRef
Wang, D.L.[Duo-Lin],
Chen, Y.[Yadang],
Naz, B.[Bushra],
Sun, L.[Le],
Li, B.Z.[Bao-Zhu],
Spatial-Aware Transformer (SAT): Enhancing Global Modeling in
Transformer Segmentation for Remote Sensing Images,
RS(15), No. 14, 2023, pp. 3607.
DOI Link
2307
BibRef
Huang, X.Y.[Xin-Yan],
Liu, F.[Fang],
Cui, Y.H.[Yuan-Hao],
Chen, P.[Puhua],
Li, L.L.[Ling-Ling],
Li, P.F.[Peng-Fang],
Faster and Better: A Lightweight Transformer Network for Remote
Sensing Scene Classification,
RS(15), No. 14, 2023, pp. 3645.
DOI Link
2307
BibRef
Yao, T.[Ting],
Li, Y.[Yehao],
Pan, Y.W.[Ying-Wei],
Wang, Y.[Yu],
Zhang, X.P.[Xiao-Ping],
Mei, T.[Tao],
Dual Vision Transformer,
PAMI(45), No. 9, September 2023, pp. 10870-10882.
IEEE DOI
2309
Survey, Vision Transformer.
BibRef
Rao, Y.M.[Yong-Ming],
Liu, Z.[Zuyan],
Zhao, W.L.[Wen-Liang],
Zhou, J.[Jie],
Lu, J.W.[Ji-Wen],
Dynamic Spatial Sparsification for Efficient Vision Transformers and
Convolutional Neural Networks,
PAMI(45), No. 9, September 2023, pp. 10883-10897.
IEEE DOI
2309
BibRef
Li, J.[Jie],
Liu, Z.[Zhao],
Li, L.[Li],
Lin, J.Q.[Jun-Qin],
Yao, J.[Jian],
Tu, J.[Jingmin],
Multi-view convolutional vision transformer for 3D object recognition,
JVCIR(95), 2023, pp. 103906.
Elsevier DOI
2309
Multi-view, 3D object recognition, Feature fusion, Convolutional neural networks
BibRef
Wu, G.[Gaojie],
Zheng, W.S.[Wei-Shi],
Lu, Y.T.[Yu-Tong],
Tian, Q.[Qi],
PSLT: A Light-Weight Vision Transformer With Ladder Self-Attention
and Progressive Shift,
PAMI(45), No. 9, September 2023, pp. 11120-11135.
IEEE DOI
2309
BibRef
Shang, J.H.[Jing-Huan],
Li, X.[Xiang],
Kahatapitiya, K.[Kumara],
Lee, Y.C.[Yu-Cheol],
Ryoo, M.S.[Michael S.],
StARformer: Transformer With State-Action-Reward Representations for
Robot Learning,
PAMI(45), No. 11, November 2023, pp. 12862-12877.
IEEE DOI
2310
BibRef
Earlier: A1, A3, A2, A5, Only:
StARformer: Transformer with State-Action-Reward Representations for
Visual Reinforcement Learning,
ECCV22(XXIX:462-479).
Springer DOI
2211
BibRef
Duan, H.R.[Hao-Ran],
Long, Y.[Yang],
Wang, S.D.[Shi-Dong],
Zhang, H.F.[Hao-Feng],
Willcocks, C.G.[Chris G.],
Shao, L.[Ling],
Dynamic Unary Convolution in Transformers,
PAMI(45), No. 11, November 2023, pp. 12747-12759.
IEEE DOI
2310
BibRef
Chen, S.M.[Shi-Ming],
Hong, Z.M.[Zi-Ming],
Hou, W.J.[Wen-Jin],
Xie, G.S.[Guo-Sen],
Song, Y.B.[Yi-Bing],
Zhao, J.[Jian],
You, X.G.[Xin-Ge],
Yan, S.C.[Shui-Cheng],
Shao, L.[Ling],
TransZero++:
Cross Attribute-Guided Transformer for Zero-Shot Learning,
PAMI(45), No. 11, November 2023, pp. 12844-12861.
IEEE DOI
2310
BibRef
Qian, S.J.[Sheng-Ju],
Zhu, Y.[Yi],
Li, W.[Wenbo],
Li, M.[Mu],
Jia, J.Y.[Jia-Ya],
What Makes for Good Tokenizers in Vision Transformer?,
PAMI(45), No. 11, November 2023, pp. 13011-13023.
IEEE DOI
2310
BibRef
Sun, W.X.[Wei-Xuan],
Qin, Z.[Zhen],
Deng, H.[Hui],
Wang, J.[Jianyuan],
Zhang, Y.[Yi],
Zhang, K.[Kaihao],
Barnes, N.[Nick],
Birchfield, S.[Stan],
Kong, L.P.[Ling-Peng],
Zhong, Y.[Yiran],
Vicinity Vision Transformer,
PAMI(45), No. 10, October 2023, pp. 12635-12649.
IEEE DOI
2310
BibRef
Cao, C.J.[Chen-Jie],
Dong, Q.[Qiaole],
Fu, Y.W.[Yan-Wei],
ZITS++: Image Inpainting by Improving the Incremental Transformer on
Structural Priors,
PAMI(45), No. 10, October 2023, pp. 12667-12684.
IEEE DOI
2310
BibRef
Fang, Y.X.[Yu-Xin],
Wang, X.G.[Xing-Gang],
Wu, R.[Rui],
Liu, W.Y.[Wen-Yu],
What Makes for Hierarchical Vision Transformer?,
PAMI(45), No. 10, October 2023, pp. 12714-12720.
IEEE DOI
2310
BibRef
Xu, P.[Peng],
Zhu, X.T.[Xia-Tian],
Clifton, D.A.[David A.],
Multimodal Learning With Transformers: A Survey,
PAMI(45), No. 10, October 2023, pp. 12113-12132.
IEEE DOI
2310
BibRef
Li, K.C.[Kun-Chang],
Wang, Y.[Yali],
Zhang, J.[Junhao],
Gao, P.[Peng],
Song, G.[Guanglu],
Liu, Y.[Yu],
Li, H.S.[Hong-Sheng],
Qiao, Y.[Yu],
UniFormer: Unifying Convolution and Self-Attention for Visual
Recognition,
PAMI(45), No. 10, October 2023, pp. 12581-12600.
IEEE DOI
2310
Unify CNN and Transformers
BibRef
Liu, J.[Jun],
Guo, H.R.[Hao-Ran],
He, Y.[Yile],
Li, H.L.[Hua-Li],
Vision Transformer-Based Ensemble Learning for Hyperspectral Image
Classification,
RS(15), No. 21, 2023, pp. 5208.
DOI Link
2311
BibRef
Lin, M.B.[Ming-Bao],
Chen, M.Z.[Meng-Zhao],
Zhang, Y.X.[Yu-Xin],
Shen, C.H.[Chun-Hua],
Ji, R.R.[Rong-Rong],
Cao, L.J.[Liu-Juan],
Super Vision Transformer,
IJCV(131), No. 12, December 2023, pp. 3136-3151.
Springer DOI
2311
BibRef
Li, H.L.[Hao-Ling],
Xue, M.Q.[Meng-Qi],
Song, J.[Jie],
Zhang, H.F.[Hao-Fei],
Huang, W.Q.[Wen-Qi],
Liang, L.[Lingyu],
Song, M.L.[Ming-Li],
Constituent Attention for Vision Transformers,
CVIU(237), 2023, pp. 103838.
Elsevier DOI Code:
WWW Link.
2311
Vision Transformer, Attention mechanism, Classification,
Interpretability for deep learning
BibRef
Li, Z.Y.[Zhong-Yu],
Gao, S.[Shanghua],
Cheng, M.M.[Ming-Ming],
SERE: Exploring Feature Self-Relation for Self-Supervised Transformer,
PAMI(45), No. 12, December 2023, pp. 15619-15631.
IEEE DOI
2311
BibRef
Ling, Z.X.[Zhi-Xin],
Xing, Z.[Zhen],
Zhou, X.D.[Xiang-Dong],
Cao, M.L.[Man-Liang],
Zhou, G.C.[Gui-Chun],
PanoSwin: a Pano-style Swin Transformer for Panorama Understanding,
CVPR23(17755-17764)
IEEE DOI
2309
BibRef
Bowman, B.[Benjamin],
Achille, A.[Alessandro],
Zancato, L.[Luca],
Trager, M.[Matthew],
Perera, P.[Pramuditha],
Paolini, G.[Giovanni],
Soatto, S.[Stefano],
À-la-carte Prompt Tuning (APT):
Combining Distinct Data Via Composable Prompting,
CVPR23(14984-14993)
IEEE DOI
2309
BibRef
Nakhli, R.[Ramin],
Moghadam, P.A.[Puria Azadi],
Mi, H.Y.[Hao-Yang],
Farahani, H.[Hossein],
Baras, A.[Alexander],
Gilks, B.[Blake],
Bashashati, A.[Ali],
Sparse Multi-Modal Graph Transformer with Shared-Context Processing
for Representation Learning of Giga-pixel Images,
CVPR23(11547-11557)
IEEE DOI
2309
BibRef
Gärtner, E.[Erik],
Metz, L.[Luke],
Andriluka, M.[Mykhaylo],
Freeman, C.D.[C. Daniel],
Sminchisescu, C.[Cristian],
Transformer-Based Learned Optimization,
CVPR23(11970-11979)
IEEE DOI
2309
BibRef
Ding, M.Y.[Ming-Yu],
Shen, Y.[Yikang],
Fan, L.J.[Li-Jie],
Chen, Z.F.[Zhen-Fang],
Chen, Z.[Zitian],
Luo, P.[Ping],
Tenenbaum, J.[Josh],
Gan, C.[Chuang],
Visual Dependency Transformers:
Dependency Tree Emerges from Reversed Attention,
CVPR23(14528-14539)
IEEE DOI
2309
BibRef
Song, J.C.[Jie-Chong],
Mou, C.[Chong],
Wang, S.Q.[Shi-Qi],
Ma, S.W.[Si-Wei],
Zhang, J.[Jian],
Optimization-Inspired Cross-Attention Transformer for Compressive
Sensing,
CVPR23(6174-6184)
IEEE DOI
2309
BibRef
Li, J.C.[Jia-Chen],
Hassani, A.[Ali],
Walton, S.[Steven],
Shi, H.[Humphrey],
ConvMLP: Hierarchical Convolutional MLPs for Vision,
WFM23(6307-6316)
IEEE DOI
2309
multi-layer perceptron
BibRef
Hassani, A.[Ali],
Walton, S.[Steven],
Li, J.C.[Jia-Chen],
Li, S.[Shen],
Shi, H.[Humphrey],
Neighborhood Attention Transformer,
CVPR23(6185-6194)
IEEE DOI
2309
BibRef
Walmer, M.[Matthew],
Suri, S.[Saksham],
Gupta, K.[Kamal],
Shrivastava, A.[Abhinav],
Teaching Matters:
Investigating the Role of Supervision in Vision Transformers,
CVPR23(7486-7496)
IEEE DOI
2309
BibRef
Wang, S.G.[Shi-Guang],
Xie, T.[Tao],
Cheng, J.[Jian],
Zhang, X.C.[Xing-Cheng],
Liu, H.J.[Hai-Jun],
MDL-NAS: A Joint Multi-domain Learning Framework for Vision
Transformer,
CVPR23(20094-20104)
IEEE DOI
2309
BibRef
Ko, D.[Dohwan],
Choi, J.[Joonmyung],
Choi, H.K.[Hyeong Kyu],
On, K.W.[Kyoung-Woon],
Roh, B.[Byungseok],
Kim, H.W.J.[Hyun-Woo J.],
MELTR: Meta Loss Transformer for Learning to Fine-tune Video
Foundation Models,
CVPR23(20105-20115)
IEEE DOI
2309
BibRef
Ren, S.[Sucheng],
Wei, F.Y.[Fang-Yun],
Zhang, Z.[Zheng],
Hu, H.[Han],
TinyMIM: An Empirical Study of Distilling MIM Pre-trained Models,
CVPR23(3687-3697)
IEEE DOI
2309
BibRef
He, J.F.[Jian-Feng],
Gao, Y.[Yuan],
Zhang, T.Z.[Tian-Zhu],
Zhang, Z.[Zhe],
Wu, F.[Feng],
D2Former: Jointly Learning Hierarchical Detectors and Contextual
Descriptors via Agent-Based Transformers,
CVPR23(2904-2914)
IEEE DOI
2309
BibRef
Liu, Z.J.[Zhi-Jian],
Yang, X.Y.[Xin-Yu],
Tang, H.T.[Hao-Tian],
Yang, S.[Shang],
Han, S.[Song],
FlatFormer: Flattened Window Attention for Efficient Point Cloud
Transformer,
CVPR23(1200-1211)
IEEE DOI
2309
BibRef
Chen, X.[Xuanyao],
Liu, Z.J.[Zhi-Jian],
Tang, H.T.[Hao-Tian],
Yi, L.[Li],
Zhao, H.[Hang],
Han, S.[Song],
SparseViT: Revisiting Activation Sparsity for Efficient
High-Resolution Vision Transformer,
CVPR23(2061-2070)
IEEE DOI
2309
BibRef
Pan, X.[Xuran],
Ye, T.Z.[Tian-Zhu],
Xia, Z.[Zhuofan],
Song, S.[Shiji],
Huang, G.[Gao],
Slide-Transformer: Hierarchical Vision Transformer with Local
Self-Attention,
CVPR23(2082-2091)
IEEE DOI
2309
BibRef
Wei, S.Y.[Si-Yuan],
Ye, T.Z.[Tian-Zhu],
Zhang, S.[Shen],
Tang, Y.[Yao],
Liang, J.J.[Jia-Jun],
Joint Token Pruning and Squeezing Towards More Aggressive Compression
of Vision Transformers,
CVPR23(2092-2101)
IEEE DOI
2309
BibRef
Lin, Y.B.[Yan-Bo],
Sung, Y.L.[Yi-Lin],
Lei, J.[Jie],
Bansal, M.[Mohit],
Bertasius, G.[Gedas],
Vision Transformers are Parameter-Efficient Audio-Visual Learners,
CVPR23(2299-2309)
IEEE DOI
2309
BibRef
Das, R.[Rajshekhar],
Dukler, Y.[Yonatan],
Ravichandran, A.[Avinash],
Swaminathan, A.[Ashwin],
Learning Expressive Prompting With Residuals for Vision Transformers,
CVPR23(3366-3377)
IEEE DOI
2309
BibRef
Zheng, M.X.[Meng-Xin],
Lou, Q.[Qian],
Jiang, L.[Lei],
TrojViT: Trojan Insertion in Vision Transformers,
CVPR23(4025-4034)
IEEE DOI
2309
BibRef
Guo, Y.[Yong],
Stutz, D.[David],
Schiele, B.[Bernt],
Improving Robustness of Vision Transformers by Reducing Sensitivity
to Patch Corruptions,
CVPR23(4108-4118)
IEEE DOI
2309
BibRef
Liu, J.[Jihao],
Huang, X.[Xin],
Zheng, J.L.[Jin-Liang],
Liu, Y.[Yu],
Li, H.S.[Hong-Sheng],
MixMAE: Mixed and Masked Autoencoder for Efficient Pretraining of
Hierarchical Vision Transformers,
CVPR23(6252-6261)
IEEE DOI
2309
BibRef
Li, Y.X.[Yan-Xi],
Xu, C.[Chang],
Trade-off between Robustness and Accuracy of Vision Transformers,
CVPR23(7558-7568)
IEEE DOI
2309
BibRef
Zhu, L.[Lei],
Wang, X.J.[Xin-Jiang],
Ke, Z.[Zhanghan],
Zhang, W.[Wayne],
Lau, R.[Rynson],
BiFormer: Vision Transformer with Bi-Level Routing Attention,
CVPR23(10323-10333)
IEEE DOI
2309
BibRef
Long, S.[Sifan],
Zhao, Z.[Zhen],
Pi, J.[Jimin],
Wang, S.S.[Sheng-Sheng],
Wang, J.D.[Jing-Dong],
Beyond Attentive Tokens: Incorporating Token Importance and Diversity
for Efficient Vision Transformers,
CVPR23(10334-10343)
IEEE DOI
2309
BibRef
Tarasiou, M.[Michail],
Chavez, E.[Erik],
Zafeiriou, S.[Stefanos],
ViTs for SITS: Vision Transformers for Satellite Image Time Series,
CVPR23(10418-10428)
IEEE DOI
2309
BibRef
Yu, Z.Z.[Zhong-Zhi],
Wu, S.[Shang],
Fu, Y.G.[Yong-Gan],
Zhang, S.[Shunyao],
Lin, Y.Y.C.[Ying-Yan Celine],
Hint-Aug: Drawing Hints from Foundation Vision Transformers towards
Boosted Few-shot Parameter-Efficient Tuning,
CVPR23(11102-11112)
IEEE DOI
2309
BibRef
Kim, D.[Dahun],
Angelova, A.[Anelia],
Kuo, W.C.[Wei-Cheng],
Region-Aware Pretraining for Open-Vocabulary Object Detection with
Vision Transformers,
CVPR23(11144-11154)
IEEE DOI
2309
BibRef
Hou, J.[Ji],
Dai, X.L.[Xiao-Liang],
He, Z.J.[Zi-Jian],
Dai, A.[Angela],
Nießner, M.[Matthias],
Mask3D: Pretraining 2D Vision Transformers by Learning Masked 3D
Priors,
CVPR23(13510-13519)
IEEE DOI
2309
BibRef
Liu, X.Y.[Xin-Yu],
Peng, H.[Houwen],
Zheng, N.X.[Ning-Xin],
Yang, Y.Q.[Yu-Qing],
Hu, H.[Han],
Yuan, Y.X.[Yi-Xuan],
EfficientViT: Memory Efficient Vision Transformer with Cascaded Group
Attention,
CVPR23(14420-14430)
IEEE DOI
2309
BibRef
You, H.R.[Hao-Ran],
Xiong, Y.[Yunyang],
Dai, X.L.[Xiao-Liang],
Wu, B.[Bichen],
Zhang, P.Z.[Pei-Zhao],
Fan, H.Q.[Hao-Qi],
Vajda, P.[Peter],
Lin, Y.Y.C.[Ying-Yan Celine],
Castling-ViT: Compressing Self-Attention via Switching Towards
Linear-Angular Attention at Vision Transformer Inference,
CVPR23(14431-14442)
IEEE DOI
2309
BibRef
Xu, Z.Z.[Zheng-Zhuo],
Liu, R.[Ruikang],
Yang, S.[Shuo],
Chai, Z.[Zenghao],
Yuan, C.[Chun],
Learning Imbalanced Data with Vision Transformers,
CVPR23(15793-15803)
IEEE DOI
2309
BibRef
Zhang, J.P.[Jian-Ping],
Huang, Y.Z.[Yi-Zhan],
Wu, W.B.[Wei-Bin],
Lyu, M.R.[Michael R.],
Transferable Adversarial Attacks on Vision Transformers with Token
Gradient Regularization,
CVPR23(16415-16424)
IEEE DOI
2309
BibRef
Yang, H.[Huanrui],
Yin, H.X.[Hong-Xu],
Shen, M.[Maying],
Molchanov, P.[Pavlo],
Li, H.[Hai],
Kautz, J.[Jan],
Global Vision Transformer Pruning with Hessian-Aware Saliency,
CVPR23(18547-18557)
IEEE DOI
2309
BibRef
Grainger, R.[Ryan],
Paniagua, T.[Thomas],
Song, X.[Xi],
Cuntoor, N.[Naresh],
Lee, M.W.[Mun Wai],
Wu, T.F.[Tian-Fu],
PaCa-ViT: Learning Patch-to-Cluster Attention in Vision Transformers,
CVPR23(18568-18578)
IEEE DOI
2309
BibRef
Takashima, S.[Sora],
Hayamizu, R.[Ryo],
Inoue, N.[Nakamasa],
Kataoka, H.[Hirokatsu],
Yokota, R.[Rio],
Visual Atoms: Pre-Training Vision Transformers with Sinusoidal Waves,
CVPR23(18579-18588)
IEEE DOI
2309
BibRef
Kang, D.[Dahyun],
Koniusz, P.[Piotr],
Cho, M.[Minsu],
Murray, N.[Naila],
Distilling Self-Supervised Vision Transformers for Weakly-Supervised
Few-Shot Classification and Segmentation,
CVPR23(19627-19638)
IEEE DOI
2309
BibRef
Liu, Y.J.[Yi-Jiang],
Yang, H.[Huanrui],
Dong, Z.[Zhen],
Keutzer, K.[Kurt],
Du, L.[Li],
Zhang, S.[Shanghang],
NoisyQuant: Noisy Bias-Enhanced Post-Training Activation Quantization
for Vision Transformers,
CVPR23(20321-20330)
IEEE DOI
2309
BibRef
Park, J.[Jeongsoo],
Johnson, J.[Justin],
RGB No More: Minimally-Decoded JPEG Vision Transformers,
CVPR23(22334-22346)
IEEE DOI
2309
BibRef
Yu, C.[Chong],
Chen, T.[Tao],
Gan, Z.X.[Zhong-Xue],
Fan, J.Y.[Jia-Yuan],
Boost Vision Transformer with GPU-Friendly Sparsity and Quantization,
CVPR23(22658-22668)
IEEE DOI
2309
BibRef
Bao, F.[Fan],
Nie, S.[Shen],
Xue, K.[Kaiwen],
Cao, Y.[Yue],
Li, C.X.[Chong-Xuan],
Su, H.[Hang],
Zhu, J.[Jun],
All are Worth Words: A ViT Backbone for Diffusion Models,
CVPR23(22669-22679)
IEEE DOI
2309
BibRef
Wei, C.[Cong],
Duke, B.[Brendan],
Jiang, R.[Ruowei],
Aarabi, P.[Parham],
Taylor, G.W.[Graham W.],
Shkurti, F.[Florian],
Sparsifiner: Learning Sparse Instance-Dependent Attention for
Efficient Vision Transformers,
CVPR23(22680-22689)
IEEE DOI
2309
BibRef
Li, B.[Bonan],
Hu, Y.[Yinhan],
Nie, X.C.[Xue-Cheng],
Han, C.Y.[Cong-Ying],
Jiang, X.J.[Xiang-Jian],
Guo, T.D.[Tian-De],
Liu, L.Q.[Luo-Qi],
DropKey for Vision Transformer,
CVPR23(22700-22709)
IEEE DOI
2309
BibRef
Lan, S.Y.[Shi-Yi],
Yang, X.[Xitong],
Yu, Z.[Zhiding],
Wu, Z.[Zuxuan],
Alvarez, J.M.[Jose M.],
Anandkumar, A.[Anima],
Vision Transformers are Good Mask Auto-Labelers,
CVPR23(23745-23755)
IEEE DOI
2309
BibRef
Yu, L.[Lu],
Xiang, W.[Wei],
X-Pruner: eXplainable Pruning for Vision Transformers,
CVPR23(24355-24363)
IEEE DOI
2309
BibRef
Singh, A.[Apoorv],
Training Strategies for Vision Transformers for Object Detection,
WAD23(110-118)
IEEE DOI
2309
BibRef
Hukkelås, H.[Håkon],
Lindseth, F.[Frank],
Does Image Anonymization Impact Computer Vision Training?,
WAD23(140-150)
IEEE DOI
2309
BibRef
Marnissi, M.A.[Mohamed Amine],
Fathallah, A.[Abir],
GAN-based Vision Transformer for High-Quality Thermal Image
Enhancement,
GCV23(817-825)
IEEE DOI
2309
BibRef
Scheibenreif, L.[Linus],
Mommert, M.[Michael],
Borth, D.[Damian],
Masked Vision Transformers for Hyperspectral Image Classification,
EarthVision23(2166-2176)
IEEE DOI
2309
BibRef
Komorowski, P.[Piotr],
Baniecki, H.[Hubert],
Biecek, P.[Przemyslaw],
Towards Evaluating Explanations of Vision Transformers for Medical
Imaging,
XAI4CV23(3726-3732)
IEEE DOI
2309
BibRef
Nalmpantis, A.[Angelos],
Panagiotopoulos, A.[Apostolos],
Gkountouras, J.[John],
Papakostas, K.[Konstantinos],
Aziz, W.[Wilker],
Vision DiffMask: Faithful Interpretation of Vision Transformers with
Differentiable Patch Masking,
XAI4CV23(3756-3763)
IEEE DOI
2309
BibRef
Ronen, T.[Tomer],
Levy, O.[Omer],
Golbert, A.[Avram],
Vision Transformers with Mixed-Resolution Tokenization,
ECV23(4613-4622)
IEEE DOI
2309
BibRef
Le, P.H.C.[Phuoc-Hoan Charles],
Li, X.[Xinlin],
BinaryViT: Pushing Binary Vision Transformers Towards Convolutional
Models,
ECV23(4665-4674)
IEEE DOI
2309
BibRef
Bhattacharyya, M.[Mayukh],
Chattopadhyay, S.[Soumitri],
Nag, S.[Sayan],
DeCAtt: Efficient Vision Transformers with Decorrelated Attention
Heads,
ECV23(4695-4699)
IEEE DOI
2309
BibRef
Ma, D.[Dongning],
Zhao, P.F.[Peng-Fei],
Jiao, X.[Xun],
PerfHD: Efficient ViT Architecture Performance Ranking using
Hyperdimensional Computing,
NAS23(2230-2237)
IEEE DOI
2309
BibRef
Wang, J.[Jun],
Alamayreh, O.[Omran],
Tondi, B.[Benedetta],
Barni, M.[Mauro],
Open Set Classification of GAN-based Image Manipulations via a
ViT-based Hybrid Architecture,
WMF23(953-962)
IEEE DOI
2309
BibRef
Tian, R.[Rui],
Wu, Z.[Zuxuan],
Dai, Q.[Qi],
Hu, H.[Han],
Qiao, Y.[Yu],
Jiang, Y.G.[Yu-Gang],
ResFormer: Scaling ViTs with Multi-Resolution Training,
CVPR23(22721-22731)
IEEE DOI
2309
BibRef
Li, Y.[Yi],
Min, K.[Kyle],
Tripathi, S.[Subarna],
Vasconcelos, N.M.[Nuno M.],
SViTT: Temporal Learning of Sparse Video-Text Transformers,
CVPR23(18919-18929)
IEEE DOI
2309
BibRef
Beyer, L.[Lucas],
Izmailov, P.[Pavel],
Kolesnikov, A.[Alexander],
Caron, M.[Mathilde],
Kornblith, S.[Simon],
Zhai, X.H.[Xiao-Hua],
Minderer, M.[Matthias],
Tschannen, M.[Michael],
Alabdulmohsin, I.[Ibrahim],
Pavetic, F.[Filip],
FlexiViT: One Model for All Patch Sizes,
CVPR23(14496-14506)
IEEE DOI
2309
BibRef
Chang, S.N.[Shu-Ning],
Wang, P.[Pichao],
Lin, M.[Ming],
Wang, F.[Fan],
Zhang, D.J.[David Junhao],
Jin, R.[Rong],
Shou, M.Z.[Mike Zheng],
Making Vision Transformers Efficient from A Token Sparsification View,
CVPR23(6195-6205)
IEEE DOI
2309
BibRef
Naeem, M.F.[Muhammad Ferjad],
Khan, M.G.Z.A.[Muhammad Gul Zain Ali],
Xian, Y.Q.[Yong-Qin],
Afzal, M.Z.[Muhammad Zeshan],
Stricker, D.[Didier],
Van Gool, L.J.[Luc J.],
Tombari, F.[Federico],
I2MVFormer: Large Language Model Generated Multi-View Document
Supervision for Zero-Shot Image Classification,
CVPR23(15169-15179)
IEEE DOI
2309
BibRef
Tatsunami, Y.[Yuki],
Taki, M.[Masato],
RaftMLP: How Much Can Be Done Without Attention and with Less Spatial
Locality?,
ACCV22(VI:459-475).
Springer DOI
2307
WWW Link. Addresses computational complexity.
BibRef
Phan, L.[Lam],
Nguyen, H.T.H.[Hiep Thi Hong],
Warrier, H.[Harikrishna],
Gupta, Y.[Yogesh],
Patch Embedding as Local Features: Unifying Deep Local and Global
Features via Vision Transformer for Image Retrieval,
ACCV22(II:204-221).
Springer DOI
2307
BibRef
Guo, X.D.[Xin-Dong],
Sun, Y.[Yu],
Zhao, R.[Rong],
Kuang, L.Q.[Li-Qun],
Han, X.[Xie],
SWPT: Spherical Window-based Point Cloud Transformer,
ACCV22(I:396-412).
Springer DOI
2307
BibRef
Wang, W.J.[Wen-Ju],
Chen, G.[Gang],
Zhou, H.R.[Hao-Ran],
Wang, X.L.[Xiao-Lin],
OVPT: Optimal Viewset Pooling Transformer for 3D Object Recognition,
ACCV22(I:486-503).
Springer DOI
2307
BibRef
Kim, D.[Daeho],
Kim, J.[Jaeil],
Vision Transformer Compression and Architecture Exploration with
Efficient Embedding Space Search,
ACCV22(III:524-540).
Springer DOI
2307
BibRef
Bolya, D.[Daniel],
Fu, C.Y.[Cheng-Yang],
Dai, X.L.[Xiao-Liang],
Zhang, P.Z.[Pei-Zhao],
Hoffman, J.[Judy],
Hydra Attention: Efficient Attention with Many Heads,
CADK22(35-49).
Springer DOI
2304
Transformer computation explodes with large images. Uses many attention heads.
BibRef
Lee, Y.S.[Yun-Sung],
Lee, G.[Gyuseong],
Ryoo, K.[Kwangrok],
Go, H.[Hyojun],
Park, J.[Jihye],
Kim, S.[Seungryong],
Towards Flexible Inductive Bias via Progressive Reparameterization
Scheduling,
VIPriors22(706-720).
Springer DOI
2304
Transformers and CNNs have different benefits. Aims for the best of both.
BibRef
Amir, S.[Shir],
Gandelsman, Y.[Yossi],
Bagon, S.[Shai],
Dekel, T.[Tali],
On the Effectiveness of ViT Features as Local Semantic Descriptors,
SelfLearn22(39-55).
Springer DOI
2304
BibRef
Deng, X.[Xuran],
Liu, C.B.[Chuan-Bin],
Lu, Z.[Zhiying],
Recombining Vision Transformer Architecture for Fine-grained Visual
Categorization,
MMMod23(II:127-138).
Springer DOI
2304
BibRef
Tonkes, V.[Vincent],
Sabatelli, M.[Matthia],
How Well Do Vision Transformers (VTs) Transfer to the Non-natural Image
Domain? An Empirical Study Involving Art Classification,
VisArt22(234-250).
Springer DOI
2304
BibRef
Li, B.C.[Bing-Chen],
Li, X.[Xin],
Lu, Y.T.[Yi-Ting],
Liu, S.[Sen],
Feng, R.[Ruoyu],
Chen, Z.B.[Zhi-Bo],
HST: Hierarchical Swin Transformer for Compressed Image
Super-resolution,
AIM22(651-668).
Springer DOI
2304
BibRef
Conde, M.V.[Marcos V.],
Choi, U.J.[Ui-Jin],
Burchi, M.[Maxime],
Timofte, R.[Radu],
Swin2SR: SwinV2 Transformer for Compressed Image Super-resolution and
Restoration,
AIM22(669-687).
Springer DOI
2304
BibRef
Rangrej, S.B.[Samrudhdhi B.],
Liang, K.J.[Kevin J.],
Hassner, T.[Tal],
Clark, J.J.[James J.],
GliTr: Glimpse Transformers with Spatiotemporal Consistency for
Online Action Prediction,
WACV23(3402-3412)
IEEE DOI
2302
Predictive models, Transformers, Cameras, Spatiotemporal phenomena,
Sensors, Observability
BibRef
Mo, S.T.[Shen-Tong],
Sun, Z.[Zhun],
Li, C.[Chao],
Multi-level Contrastive Learning for Self-Supervised Vision
Transformers,
WACV23(2777-2786)
IEEE DOI
2302
Training, Representation learning, Head, Semantic segmentation,
Self-supervised learning, visual reasoning
BibRef
Yun, J.[Jooyeol],
Lee, S.[Sanghyeon],
Park, M.H.[Min-Ho],
Choo, J.[Jaegul],
iColoriT: Towards Propagating Local Hints to the Right Region in
Interactive Colorization by Leveraging Vision Transformer,
WACV23(1787-1796)
IEEE DOI
2302
Convolutional codes, Image color analysis, Stacking, Gray-scale,
Transformers, Algorithms: Computational photography, image and video synthesis
BibRef
Liu, Y.[Yue],
Matsoukas, C.[Christos],
Strand, F.[Fredrik],
Azizpour, H.[Hossein],
Smith, K.[Kevin],
PatchDropout: Economizing Vision Transformers Using Patch Dropout,
WACV23(3942-3951)
IEEE DOI
2302
Training, Image resolution, Computational modeling,
Biological system modeling, Memory management, Transformers,
Biomedical/healthcare/medicine
BibRef
Chen, X.Y.[Xiang-Yu],
Hu, Q.[Qinghao],
Li, K.[Kaidong],
Zhong, C.[Cuncong],
Wang, G.H.[Guang-Hui],
Accumulated Trivial Attention Matters in Vision Transformers on Small
Datasets,
WACV23(3973-3981)
IEEE DOI
2302
Codes, Focusing, Transformers, Convolutional neural networks,
Task analysis, Algorithms: Machine learning architectures,
and algorithms (including transfer)
BibRef
Lan, H.[Hai],
Wang, X.[Xihao],
Shen, H.[Hao],
Liang, P.[Peidong],
Wei, X.[Xian],
Couplformer: Rethinking Vision Transformer with Coupling Attention,
WACV23(6464-6473)
IEEE DOI
2302
Couplings, Visualization, Image segmentation,
Computational modeling, Memory management, Object detection
BibRef
Marin, D.[Dmitrii],
Chang, J.H.R.[Jen-Hao Rick],
Ranjan, A.[Anurag],
Prabhu, A.[Anish],
Rastegari, M.[Mohammad],
Tuzel, O.[Oncel],
Token Pooling in Vision Transformers for Image Classification,
WACV23(12-21)
IEEE DOI
2302
Filtering, Semantic segmentation, Pose estimation, Transformers,
Encoding, Convolutional neural networks, and algorithms (including transfer)
BibRef
Song, C.H.[Chull Hwan],
Yoon, J.Y.[Joo-Young],
Choi, S.[Shunghyun],
Avrithis, Y.[Yannis],
Boosting Vision Transformers for Image Retrieval,
WACV23(107-117)
IEEE DOI
2302
Training, Location awareness, Image retrieval,
Self-supervised learning, Image representation, Transformers
BibRef
Yang, J.[Jinyu],
Liu, J.J.[Jing-Jing],
Xu, N.[Ning],
Huang, J.Z.[Jun-Zhou],
TVT: Transferable Vision Transformer for Unsupervised Domain
Adaptation,
WACV23(520-530)
IEEE DOI
2302
Benchmark testing, Image representation, Transformers,
Convolutional neural networks, Task analysis,
and algorithms (including transfer)
BibRef
Lin, K.E.[Kai-En],
Yen-Chen, L.[Lin],
Lai, W.S.[Wei-Sheng],
Lin, T.Y.[Tsung-Yi],
Shih, Y.C.[Yi-Chang],
Ramamoorthi, R.[Ravi],
Vision Transformer for NeRF-Based View Synthesis from a Single Input
Image,
WACV23(806-815)
IEEE DOI
2302
Shape, Pose estimation, Feature extraction, Transformers, Cameras,
Algorithms: Computational photography,
3D computer vision
BibRef
Saavedra-Ruiz, M.[Miguel],
Morin, S.[Sacha],
Paull, L.[Liam],
Monocular Robot Navigation with Self-Supervised Pretrained Vision
Transformers,
CRV22(197-204)
IEEE DOI
2301
Adaptation models, Image segmentation, Image resolution,
Navigation, Transformers, Robot sensing systems, Visual Servoing
BibRef
Debnath, B.[Biplob],
Po, O.[Oliver],
Chowdhury, F.A.[Farhan Asif],
Chakradhar, S.[Srimat],
Cosine Similarity based Few-Shot Video Classifier with
Attention-based Aggregation,
ICPR22(1273-1279)
IEEE DOI
2212
Training, Head, Pipelines, Benchmark testing, Feature extraction,
Transformers
BibRef
Patel, K.[Krushi],
Bur, A.M.[Andrés M.],
Li, F.J.[Feng-Jun],
Wang, G.H.[Guang-Hui],
Aggregating Global Features into Local Vision Transformer,
ICPR22(1141-1147)
IEEE DOI
2212
Source coding, Computational modeling,
Information processing, Performance gain, Transformers
BibRef
Shen, Z.Q.[Zhi-Qiang],
Liu, Z.[Zechun],
Xing, E.[Eric],
Sliced Recursive Transformer,
ECCV22(XXIV:727-744).
Springer DOI
2211
BibRef
Shao, Y.[Yidi],
Loy, C.C.[Chen Change],
Dai, B.[Bo],
Transformer with Implicit Edges for Particle-Based Physics Simulation,
ECCV22(XIX:549-564).
Springer DOI
2211
BibRef
Wang, W.[Wen],
Zhang, J.[Jing],
Cao, Y.[Yang],
Shen, Y.L.[Yong-Liang],
Tao, D.C.[Da-Cheng],
Towards Data-Efficient Detection Transformers,
ECCV22(IX:88-105).
Springer DOI
2211
BibRef
Mari, C.R.[Carlos Roig],
Gonzalez, D.V.[David Varas],
Bou-Balust, E.[Elisenda],
Multi-Scale Transformer-Based Feature Combination for Image Retrieval,
ICIP22(3166-3170)
IEEE DOI
2211
Visualization, Semantics, Image retrieval, Feature extraction,
Transformers, Internet, Attention, Multi-scale,
Feature combination
BibRef
Lorenzana, M.B.[Marlon Bran],
Engstrom, C.[Craig],
Chandra, S.S.[Shekhar S.],
Transformer Compressed Sensing Via Global Image Tokens,
ICIP22(3011-3015)
IEEE DOI
2211
Training, Limiting, Image resolution, Neural networks,
Image representation, Transformers, MRI
BibRef
Furukawa, R.[Ryouichi],
Hotta, K.[Kazuhiro],
Local Embedding for Axial Attention,
ICIP22(2586-2590)
IEEE DOI
2211
Deep learning, Image segmentation, Visualization,
Computational modeling, Neural networks, Transformers
BibRef
Lu, X.Y.[Xiao-Yong],
Du, S.[Songlin],
NCTR: Neighborhood Consensus Transformer for Feature Matching,
ICIP22(2726-2730)
IEEE DOI
2211
Learning systems, Impedance matching, Aggregates, Pose estimation,
Neural networks, Transformers, Local feature matching,
graph neural network
BibRef
Jeny, A.A.[Afsana Ahsan],
Junayed, M.S.[Masum Shah],
Islam, M.B.[Md Baharul],
An Efficient End-To-End Image Compression Transformer,
ICIP22(1786-1790)
IEEE DOI
2211
Image coding, Correlation, Limiting, Computational modeling,
Rate-distortion, Video compression, Transformers, entropy model
BibRef
Kakogeorgiou, I.[Ioannis],
Gidaris, S.[Spyros],
Psomas, B.[Bill],
Avrithis, Y.[Yannis],
Bursuc, A.[Andrei],
Karantzalos, K.[Konstantinos],
Komodakis, N.[Nikos],
What to Hide from Your Students: Attention-Guided Masked Image Modeling,
ECCV22(XXX:300-318).
Springer DOI
2211
WWW Link.
BibRef
Bai, J.W.[Jia-Wang],
Yuan, L.[Li],
Xia, S.T.[Shu-Tao],
Yan, S.C.[Shui-Cheng],
Li, Z.F.[Zhi-Feng],
Liu, W.[Wei],
Improving Vision Transformers by Revisiting High-Frequency Components,
ECCV22(XXIV:1-18).
Springer DOI
2211
BibRef
Ding, M.Y.[Ming-Yu],
Xiao, B.[Bin],
Codella, N.[Noel],
Luo, P.[Ping],
Wang, J.D.[Jing-Dong],
Yuan, L.[Lu],
DaViT: Dual Attention Vision Transformers,
ECCV22(XXIV:74-92).
Springer DOI
2211
BibRef
Li, K.[Kehan],
Yu, R.[Runyi],
Wang, Z.[Zhennan],
Yuan, L.[Li],
Song, G.[Guoli],
Chen, J.[Jie],
Locality Guidance for Improving Vision Transformers on Tiny Datasets,
ECCV22(XXIV:110-127).
Springer DOI
2211
BibRef
Wang, P.C.[Pi-Chao],
Wang, X.[Xue],
Wang, F.[Fan],
Lin, M.[Ming],
Chang, S.N.[Shu-Ning],
Li, H.[Hao],
Jin, R.[Rong],
KVT: k-NN Attention for Boosting Vision Transformers,
ECCV22(XXIV:285-302).
Springer DOI
2211
BibRef
Tu, Z.Z.[Zheng-Zhong],
Talebi, H.[Hossein],
Zhang, H.[Han],
Yang, F.[Feng],
Milanfar, P.[Peyman],
Bovik, A.C.[Alan C.],
Li, Y.[Yinxiao],
MaxViT: Multi-axis Vision Transformer,
ECCV22(XXIV:459-479).
Springer DOI
2211
BibRef
Yang, R.[Rui],
Ma, H.L.[Hai-Long],
Wu, J.[Jie],
Tang, Y.S.[Yan-Song],
Xiao, X.F.[Xue-Feng],
Zheng, M.[Min],
Li, X.[Xiu],
ScalableViT: Rethinking the Context-Oriented Generalization of Vision
Transformer,
ECCV22(XXIV:480-496).
Springer DOI
2211
BibRef
Touvron, H.[Hugo],
Cord, M.[Matthieu],
El-Nouby, A.[Alaaeldin],
Verbeek, J.[Jakob],
Jégou, H.[Hervé],
Three Things Everyone Should Know About Vision Transformers,
ECCV22(XXIV:497-515).
Springer DOI
2211
BibRef
Touvron, H.[Hugo],
Cord, M.[Matthieu],
Jégou, H.[Hervé],
DeiT III: Revenge of the ViT,
ECCV22(XXIV:516-533).
Springer DOI
2211
BibRef
Li, Y.H.[Yang-Hao],
Mao, H.Z.[Han-Zi],
Girshick, R.[Ross],
He, K.M.[Kai-Ming],
Exploring Plain Vision Transformer Backbones for Object Detection,
ECCV22(IX:280-296).
Springer DOI
2211
BibRef
Yu, Q.H.[Qi-Hang],
Wang, H.Y.[Hui-Yu],
Qiao, S.Y.[Si-Yuan],
Collins, M.[Maxwell],
Zhu, Y.K.[Yu-Kun],
Adam, H.[Hartwig],
Yuille, A.L.[Alan L.],
Chen, L.C.[Liang-Chieh],
k-means Mask Transformer,
ECCV22(XXIX:288-307).
Springer DOI
2211
BibRef
Lezama, J.[José],
Chang, H.[Huiwen],
Jiang, L.[Lu],
Essa, I.[Irfan],
Improved Masked Image Generation with Token-Critic,
ECCV22(XXIII:70-86).
Springer DOI
2211
Generative transformer.
BibRef
Rao, Y.M.[Yong-Ming],
Zhao, W.L.[Wen-Liang],
Zhou, J.[Jie],
Lu, J.W.[Ji-Wen],
AMixer:
Adaptive Weight Mixing for Self-Attention Free Vision Transformers,
ECCV22(XXI:50-67).
Springer DOI
2211
BibRef
Pham, K.[Khoi],
Kafle, K.[Kushal],
Lin, Z.[Zhe],
Ding, Z.H.[Zhi-Hong],
Cohen, S.[Scott],
Tran, Q.[Quan],
Shrivastava, A.[Abhinav],
Improving Closed and Open-Vocabulary Attribute Prediction Using
Transformers,
ECCV22(XXV:201-219).
Springer DOI
2211
BibRef
Yu, W.X.[Wen-Xin],
Zhang, H.[Hongru],
Lan, T.X.[Tian-Xiang],
Hu, Y.C.[Yu-Cheng],
Yin, D.[Dong],
CBPT: A New Backbone for Enhancing Information Transmission of Vision
Transformers,
ICIP22(156-160)
IEEE DOI
2211
Merging, Information processing, Object detection, Transformers,
Computational complexity, Vision Transformer, Backbone
BibRef
Takeda, M.[Mana],
Yanai, K.[Keiji],
Continual Learning in Vision Transformer,
ICIP22(616-620)
IEEE DOI
2211
Learning systems, Image recognition, Transformers,
Natural language processing, Convolutional neural networks, Vision Transformer
BibRef
Zhou, W.L.[Wei-Lian],
Kamata, S.I.[Sei-Ichiro],
Luo, Z.[Zhengbo],
Xue, X.[Xi],
Rethinking Unified Spectral-Spatial-Based Hyperspectral Image
Classification Under 3D Configuration of Vision Transformer,
ICIP22(711-715)
IEEE DOI
2211
Flowcharts, Correlation, Convolution, Transformers,
Hyperspectral image classification, 3D coordinate positional embedding
BibRef
Li, A.[Ang],
Jiao, J.[Jichao],
Li, N.[Ning],
Qi, W.[Wangjing],
Xu, W.[Wei],
Pang, M.[Min],
Conmw Transformer: A General Vision Transformer Backbone With
Merged-Window Attention,
ICIP22(1551-1555)
IEEE DOI
2211
Image resolution, Convolution, Transformers, Feature extraction,
Tokenization, Computational efficiency, Vision Transformer,
hybrid architecture
BibRef
Li, J.[Junbo],
Zhang, H.[Huan],
Xie, C.[Cihang],
ViP: Unified Certified Detection and Recovery for Patch Attack with
Vision Transformers,
ECCV22(XXV:573-587).
Springer DOI
2211
BibRef
Zhang, Q.M.[Qi-Ming],
Xu, Y.F.[Yu-Fei],
Zhang, J.[Jing],
Tao, D.C.[Da-Cheng],
VSA: Learning Varied-Size Window Attention in Vision Transformers,
ECCV22(XXV:466-483).
Springer DOI
2211
BibRef
Cao, Y.H.[Yun-Hao],
Yu, H.[Hao],
Wu, J.X.[Jian-Xin],
Training Vision Transformers with only 2040 Images,
ECCV22(XXV:220-237).
Springer DOI
2211
BibRef
Wang, C.[Cong],
Xu, H.M.[Hong-Min],
Zhang, X.[Xiong],
Wang, L.[Li],
Zheng, Z.[Zhitong],
Liu, H.F.[Hai-Feng],
Convolutional Embedding Makes Hierarchical Vision Transformer Stronger,
ECCV22(XX:739-756).
Springer DOI
2211
BibRef
Wu, B.[Boxi],
Gu, J.D.[Jin-Dong],
Li, Z.F.[Zhi-Feng],
Cai, D.[Deng],
He, X.F.[Xiao-Fei],
Liu, W.[Wei],
Towards Efficient Adversarial Training on Vision Transformers,
ECCV22(XIII:307-325).
Springer DOI
2211
BibRef
Gu, J.D.[Jin-Dong],
Tresp, V.[Volker],
Qin, Y.[Yao],
Are Vision Transformers Robust to Patch Perturbations?,
ECCV22(XII:404-421).
Springer DOI
2211
BibRef
Zong, Z.[Zhuofan],
Li, K.[Kunchang],
Song, G.[Guanglu],
Wang, Y.[Yali],
Qiao, Y.[Yu],
Leng, B.[Biao],
Liu, Y.[Yu],
Self-slimmed Vision Transformer,
ECCV22(XI:432-448).
Springer DOI
2211
BibRef
Fayyaz, M.[Mohsen],
Koohpayegani, S.A.[Soroush Abbasi],
Jafari, F.R.[Farnoush Rezaei],
Sengupta, S.[Sunando],
Joze, H.R.V.[Hamid Reza Vaezi],
Sommerlade, E.[Eric],
Pirsiavash, H.[Hamed],
Gall, J.[Jürgen],
Adaptive Token Sampling for Efficient Vision Transformers,
ECCV22(XI:396-414).
Springer DOI
2211
BibRef
Li, Z.K.[Zhi-Kai],
Ma, L.P.[Li-Ping],
Chen, M.J.[Meng-Juan],
Xiao, J.R.[Jun-Rui],
Gu, Q.Y.[Qing-Yi],
Patch Similarity Aware Data-Free Quantization for Vision Transformers,
ECCV22(XI:154-170).
Springer DOI
2211
BibRef
Weng, Z.J.[Ze-Jia],
Yang, X.T.[Xi-Tong],
Li, A.[Ang],
Wu, Z.X.[Zu-Xuan],
Jiang, Y.G.[Yu-Gang],
Semi-supervised Vision Transformers,
ECCV22(XXX:605-620).
Springer DOI
2211
BibRef
Mallick, R.[Rupayan],
Benois-Pineau, J.[Jenny],
Zemmari, A.[Akka],
I Saw: A Self-Attention Weighted Method for Explanation of Visual
Transformers,
ICIP22(3271-3275)
IEEE DOI
2211
Measurement, Correlation coefficient, Visualization,
Image segmentation, Databases, Object detection, Transformers,
Gaze Fixation Density Maps
BibRef
Su, T.[Tong],
Ye, S.[Shuo],
Song, C.Q.[Cheng-Qun],
Cheng, J.[Jun],
Mask-ViT: An Object Mask Embedding in Vision Transformer for
Fine-Grained Visual Classification,
ICIP22(1626-1630)
IEEE DOI
2211
Knowledge engineering, Visualization, Focusing, Interference,
Benchmark testing, Transformers, Feature extraction,
Knowledge Embedding
BibRef
Gai, L.[Lulu],
Chen, W.[Wei],
Gao, R.[Rui],
Chen, Y.W.[Yan-Wei],
Qiao, X.[Xu],
Using Vision Transformers in 3-D Medical Image Classifications,
ICIP22(696-700)
IEEE DOI
2211
Deep learning, Training, Visualization, Transfer learning,
Optimization methods, Self-supervised learning, Transformers,
3-D medical image classifications
BibRef
Wu, K.[Kan],
Zhang, J.[Jinnian],
Peng, H.[Houwen],
Liu, M.C.[Meng-Chen],
Xiao, B.[Bin],
Fu, J.L.[Jian-Long],
Yuan, L.[Lu],
TinyViT: Fast Pretraining Distillation for Small Vision Transformers,
ECCV22(XXI:68-85).
Springer DOI
2211
BibRef
Gao, L.[Li],
Nie, D.[Dong],
Li, B.[Bo],
Ren, X.F.[Xiao-Feng],
Doubly-Fused ViT: Fuse Information from Vision Transformer Doubly with
Local Representation,
ECCV22(XXIII:744-761).
Springer DOI
2211
BibRef
Yao, T.[Ting],
Pan, Y.W.[Ying-Wei],
Li, Y.[Yehao],
Ngo, C.W.[Chong-Wah],
Mei, T.[Tao],
Wave-ViT: Unifying Wavelet and Transformers for Visual Representation
Learning,
ECCV22(XXV:328-345).
Springer DOI
2211
BibRef
Yuan, Z.H.[Zhi-Hang],
Xue, C.H.[Chen-Hao],
Chen, Y.Q.[Yi-Qi],
Wu, Q.[Qiang],
Sun, G.Y.[Guang-Yu],
PTQ4ViT: Post-training Quantization for Vision Transformers with Twin
Uniform Quantization,
ECCV22(XII:191-207).
Springer DOI
2211
BibRef
Kong, Z.L.[Zheng-Lun],
Dong, P.Y.[Pei-Yan],
Ma, X.L.[Xiao-Long],
Meng, X.[Xin],
Niu, W.[Wei],
Sun, M.S.[Meng-Shu],
Shen, X.[Xuan],
Yuan, G.[Geng],
Ren, B.[Bin],
Tang, H.[Hao],
Qin, M.[Minghai],
Wang, Y.Z.[Yan-Zhi],
SPViT:
Enabling Faster Vision Transformers via Latency-Aware Soft Token Pruning,
ECCV22(XI:620-640).
Springer DOI
2211
BibRef
Pan, J.[Junting],
Bulat, A.[Adrian],
Tan, F.[Fuwen],
Zhu, X.T.[Xia-Tian],
Dudziak, L.[Lukasz],
Li, H.S.[Hong-Sheng],
Tzimiropoulos, G.[Georgios],
Martinez, B.[Brais],
EdgeViTs: Competing Light-Weight CNNs on Mobile Devices with Vision
Transformers,
ECCV22(XI:294-311).
Springer DOI
2211
BibRef
Xu, R.S.[Run-Sheng],
Xiang, H.[Hao],
Tu, Z.Z.[Zheng-Zhong],
Xia, X.[Xin],
Yang, M.H.[Ming-Hsuan],
Ma, J.Q.[Jia-Qi],
V2X-ViT: Vehicle-to-Everything Cooperative Perception with Vision
Transformer,
ECCV22(XXIX:107-124).
Springer DOI
2211
BibRef
Liu, Y.[Yong],
Mai, S.Q.[Si-Qi],
Chen, X.N.[Xiang-Ning],
Hsieh, C.J.[Cho-Jui],
You, Y.[Yang],
Towards Efficient and Scalable Sharpness-Aware Minimization,
CVPR22(12350-12360)
IEEE DOI
2210
WWW Link. Training, Schedules, Scalability, Perturbation methods,
Stochastic processes, Transformers, Minimization,
Vision applications and systems
BibRef
Ren, P.Z.[Peng-Zhen],
Li, C.[Changlin],
Wang, G.[Guangrun],
Xiao, Y.[Yun],
Du, Q.[Qing],
Liang, X.D.[Xiao-Dan],
Chang, X.J.[Xiao-Jun],
Beyond Fixation: Dynamic Window Visual Transformer,
CVPR22(11977-11987)
IEEE DOI
2210
Performance evaluation, Visualization, Systematics,
Computational modeling, Scalability, Transformers,
Deep learning architectures and techniques
BibRef
Liu, Z.[Ze],
Hu, H.[Han],
Lin, Y.T.[Yu-Tong],
Yao, Z.L.[Zhu-Liang],
Xie, Z.D.[Zhen-Da],
Wei, Y.X.[Yi-Xuan],
Ning, J.[Jia],
Cao, Y.[Yue],
Zhang, Z.[Zheng],
Dong, L.[Li],
Wei, F.[Furu],
Guo, B.[Baining],
Swin Transformer V2: Scaling Up Capacity and Resolution,
CVPR22(11999-12009)
IEEE DOI
2210
Training, Representation learning, Adaptation models,
Image resolution, Computational modeling, Semantics
BibRef
Bhattacharjee, D.[Deblina],
Zhang, T.[Tong],
Süsstrunk, S.[Sabine],
Salzmann, M.[Mathieu],
MulT: An End-to-End Multitask Learning Transformer,
CVPR22(12021-12031)
IEEE DOI
2210
Heart, Image segmentation, Computational modeling,
Image edge detection, Semantics, Estimation, Predictive models,
Scene analysis and understanding
BibRef
Fang, J.[Jiemin],
Xie, L.X.[Ling-Xi],
Wang, X.G.[Xing-Gang],
Zhang, X.P.[Xiao-Peng],
Liu, W.Y.[Wen-Yu],
Tian, Q.[Qi],
MSG-Transformer:
Exchanging Local Spatial Information by Manipulating Messenger Tokens,
CVPR22(12053-12062)
IEEE DOI
2210
Deep learning, Visualization, Neural networks,
Graphics processing units, retrieval
BibRef
Sandler, M.[Mark],
Zhmoginov, A.[Andrey],
Vladymyrov, M.[Max],
Jackson, A.[Andrew],
Fine-tuning Image Transformers using Learnable Memory,
CVPR22(12145-12154)
IEEE DOI
2210
Deep learning, Adaptation models, Costs, Computational modeling,
Memory management, Transformers, Transfer/low-shot/long-tail learning
BibRef
Yu, X.[Xumin],
Tang, L.[Lulu],
Rao, Y.M.[Yong-Ming],
Huang, T.J.[Tie-Jun],
Zhou, J.[Jie],
Lu, J.W.[Ji-Wen],
Point-BERT: Pre-training 3D Point Cloud Transformers with Masked
Point Modeling,
CVPR22(19291-19300)
IEEE DOI
2210
Point cloud compression, Solid modeling, Computational modeling,
Bit error rate, Transformers, Pattern recognition,
Deep learning architectures and techniques
BibRef
Park, C.[Chunghyun],
Jeong, Y.[Yoonwoo],
Cho, M.[Minsu],
Park, J.[Jaesik],
Fast Point Transformer,
CVPR22(16928-16937)
IEEE DOI
2210
Point cloud compression, Shape, Semantics, Neural networks,
Transformers, grouping and shape analysis
BibRef
Ren, S.[Sucheng],
Zhou, D.[Daquan],
He, S.F.[Sheng-Feng],
Feng, J.S.[Jia-Shi],
Wang, X.C.[Xin-Chao],
Shunted Self-Attention via Multi-Scale Token Aggregation,
CVPR22(10843-10852)
IEEE DOI
2210
Degradation, Deep learning, Costs, Computational modeling, Merging,
Efficient learning and inferences
BibRef
Zeng, W.[Wang],
Jin, S.[Sheng],
Liu, W.T.[Wen-Tao],
Qian, C.[Chen],
Luo, P.[Ping],
Ouyang, W.L.[Wan-Li],
Wang, X.G.[Xiao-Gang],
Not All Tokens Are Equal:
Human-centric Visual Analysis via Token Clustering Transformer,
CVPR22(11091-11101)
IEEE DOI
2210
Visualization, Shape, Pose estimation, Semantics,
Pose estimation and tracking,
Deep learning architectures and techniques
BibRef
Yu, W.H.[Wei-Hao],
Luo, M.[Mi],
Zhou, P.[Pan],
Si, C.Y.[Chen-Yang],
Zhou, Y.C.[Yi-Chen],
Wang, X.C.[Xin-Chao],
Feng, J.S.[Jia-Shi],
Yan, S.C.[Shui-Cheng],
MetaFormer is Actually What You Need for Vision,
CVPR22(10809-10819)
IEEE DOI
2210
Computational modeling, Focusing,
Transformers, Pattern recognition, Task analysis, retrieval
BibRef
Xie, Z.D.[Zhen-Da],
Zhang, Z.[Zheng],
Cao, Y.[Yue],
Lin, Y.T.[Yu-Tong],
Bao, J.M.[Jian-Min],
Yao, Z.L.[Zhu-Liang],
Dai, Q.[Qi],
Hu, H.[Han],
SimMIM: A Simple Framework for Masked Image Modeling,
CVPR22(9643-9653)
IEEE DOI
2210
WWW Link. Representation learning, Training, Head, Self-supervised learning,
Predictive models, Data models, Self- semi- meta- Representation learning
BibRef
Song, Z.[Zikai],
Yu, J.Q.[Jun-Qing],
Chen, Y.P.P.[Yi-Ping Phoebe],
Yang, W.[Wei],
Transformer Tracking with Cyclic Shifting Window Attention,
CVPR22(8781-8790)
IEEE DOI
2210
WWW Link. Visualization, Target tracking, Image recognition,
Optimization methods, Benchmark testing
BibRef
Tu, Z.Z.[Zheng-Zhong],
Talebi, H.[Hossein],
Zhang, H.[Han],
Yang, F.[Feng],
Milanfar, P.[Peyman],
Bovik, A.[Alan],
Li, Y.X.[Yin-Xiao],
MAXIM: Multi-Axis MLP for Image Processing,
CVPR22(5759-5770)
IEEE DOI
2210
WWW Link. Training, Photography, Adaptation models, Visualization,
Computational modeling, Transformers, Low-level vision,
Computational photography
BibRef
Yun, S.[Sukmin],
Lee, H.[Hankook],
Kim, J.[Jaehyung],
Shin, J.[Jinwoo],
Patch-level Representation Learning for Self-supervised Vision
Transformers,
CVPR22(8344-8353)
IEEE DOI
2210
Training, Representation learning, Visualization, Neural networks,
Object detection, Self-supervised learning, Transformers,
Self- semi- meta- unsupervised learning
BibRef
Hou, Z.J.[Ze-Jiang],
Kung, S.Y.[Sun-Yuan],
Multi-Dimensional Vision Transformer Compression via Dependency
Guided Gaussian Process Search,
EVW22(3668-3677)
IEEE DOI
2210
Adaptation models, Image coding, Head, Computational modeling,
Neurons, Gaussian processes, Transformers
BibRef
Salman, H.[Hadi],
Jain, S.[Saachi],
Wong, E.[Eric],
Madry, A.[Aleksander],
Certified Patch Robustness via Smoothed Vision Transformers,
CVPR22(15116-15126)
IEEE DOI
2210
Visualization, Smoothing methods, Costs, Computational modeling,
Transformers, Adversarial attack and defense
BibRef
Wang, Y.K.[Yi-Kai],
Chen, X.H.[Xing-Hao],
Cao, L.[Lele],
Huang, W.B.[Wen-Bing],
Sun, F.C.[Fu-Chun],
Wang, Y.H.[Yun-He],
Multimodal Token Fusion for Vision Transformers,
CVPR22(12176-12185)
IEEE DOI
2210
Point cloud compression, Image segmentation, Shape, Semantics,
Object detection,
Vision+X
BibRef
Tang, Y.[Yehui],
Han, K.[Kai],
Wang, Y.H.[Yun-He],
Xu, C.[Chang],
Guo, J.Y.[Jian-Yuan],
Xu, C.[Chao],
Tao, D.C.[Da-Cheng],
Patch Slimming for Efficient Vision Transformers,
CVPR22(12155-12164)
IEEE DOI
2210
Visualization, Quantization (signal), Computational modeling,
Aggregates, Benchmark testing,
Representation learning
BibRef
Zhang, J.[Jinnian],
Peng, H.[Houwen],
Wu, K.[Kan],
Liu, M.C.[Meng-Chen],
Xiao, B.[Bin],
Fu, J.L.[Jian-Long],
Yuan, L.[Lu],
MiniViT: Compressing Vision Transformers with Weight Multiplexing,
CVPR22(12135-12144)
IEEE DOI
2210
Multiplexing, Performance evaluation, Image coding, Codes,
Computational modeling, Benchmark testing,
Vision applications and systems
BibRef
Chen, J.N.[Jie-Neng],
Sun, S.Y.[Shu-Yang],
He, J.[Ju],
Torr, P.H.S.[Philip H.S.],
Yuille, A.L.[Alan L.],
Bai, S.[Song],
TransMix: Attend to Mix for Vision Transformers,
CVPR22(12125-12134)
IEEE DOI
2210
Training, Image segmentation, Codes, Semantics, Object detection,
Benchmark testing, Transformers,
Representation learning
BibRef
Dong, X.Y.[Xiao-Yi],
Bao, J.M.[Jian-Min],
Chen, D.D.[Dong-Dong],
Zhang, W.M.[Wei-Ming],
Yu, N.H.[Neng-Hai],
Yuan, L.[Lu],
Chen, D.[Dong],
Guo, B.[Baining],
CSWin Transformer: A General Vision Transformer Backbone with
Cross-Shaped Windows,
CVPR22(12114-12124)
IEEE DOI
2210
Image segmentation, Costs, Mathematical analysis, Training data,
Transformer cores, Transformers,
grouping and shape analysis
BibRef
Liu, H.[Hao],
Jiang, X.H.[Xing-Hua],
Li, X.[Xin],
Bao, Z.M.[Zhi-Min],
Jiang, D.Q.[De-Qiang],
Ren, B.[Bo],
NomMer: Nominate Synergistic Context in Vision Transformer for Visual
Recognition,
CVPR22(12063-12072)
IEEE DOI
2210
Visualization, Image segmentation, Semantics, Redundancy,
Object detection, Deep learning architectures and techniques
BibRef
Chen, T.L.[Tian-Long],
Zhang, Z.Y.[Zhen-Yu],
Cheng, Y.[Yu],
Awadallah, A.[Ahmed],
Wang, Z.Y.[Zhang-Yang],
The Principle of Diversity: Training Stronger Vision Transformers
Calls for Reducing All Levels of Redundancy,
CVPR22(12010-12020)
IEEE DOI
2210
Training, Convolutional codes, Deep learning,
Computational modeling, Redundancy, Deep learning architectures and techniques
BibRef
Yang, C.[Chenglin],
Wang, Y.[Yilin],
Zhang, J.M.[Jian-Ming],
Zhang, H.[He],
Wei, Z.J.[Zi-Jun],
Lin, Z.[Zhe],
Yuille, A.L.[Alan L.],
Lite Vision Transformer with Enhanced Self-Attention,
CVPR22(11988-11998)
IEEE DOI
2210
Convolutional codes, Image segmentation, Visualization,
Convolution, Semantics, Merging, Predictive models, Deep learning architectures and techniques
BibRef
Yin, H.X.[Hong-Xu],
Vahdat, A.[Arash],
Alvarez, J.M.[Jose M.],
Mallya, A.[Arun],
Kautz, J.[Jan],
Molchanov, P.[Pavlo],
A-ViT: Adaptive Tokens for Efficient Vision Transformer,
CVPR22(10799-10808)
IEEE DOI
2210
Training, Adaptive systems, Network architecture, Transformers,
Throughput, Hardware, Complexity theory,
Efficient learning and inferences
BibRef
Lu, J.H.[Jia-Hao],
Zhang, X.S.[Xi Sheryl],
Zhao, T.L.[Tian-Li],
He, X.Y.[Xiang-Yu],
Cheng, J.[Jian],
APRIL: Finding the Achilles' Heel on Privacy for Vision Transformers,
CVPR22(10041-10050)
IEEE DOI
2210
Privacy, Data privacy, Federated learning, Computational modeling,
Training data, Transformers, Market research, Privacy and federated learning
BibRef
Hatamizadeh, A.[Ali],
Yin, H.X.[Hong-Xu],
Roth, H.[Holger],
Li, W.Q.[Wen-Qi],
Kautz, J.[Jan],
Xu, D.[Daguang],
Molchanov, P.[Pavlo],
GradViT: Gradient Inversion of Vision Transformers,
CVPR22(10011-10020)
IEEE DOI
2210
Measurement, Differential privacy, Neural networks, Transformers,
Pattern recognition, Security, Iterative methods, Privacy and federated learning
BibRef
Zhang, H.[Haofei],
Duan, J.R.[Jia-Rui],
Xue, M.Q.[Meng-Qi],
Song, J.[Jie],
Sun, L.[Li],
Song, M.L.[Ming-Li],
Bootstrapping ViTs: Towards Liberating Vision Transformers from
Pre-training,
CVPR22(8934-8943)
IEEE DOI
2210
Training, Upper bound, Neural networks, Training data,
Network architecture, Transformers, Computer vision theory,
Efficient learning and inferences
BibRef
Chavan, A.[Arnav],
Shen, Z.Q.[Zhi-Qiang],
Liu, Z.[Zhuang],
Liu, Z.[Zechun],
Cheng, K.T.[Kwang-Ting],
Xing, E.[Eric],
Vision Transformer Slimming:
Multi-Dimension Searching in Continuous Optimization Space,
CVPR22(4921-4931)
IEEE DOI
2210
Training, Performance evaluation, Image coding, Force,
Graphics processing units,
Vision applications and systems
BibRef
Xia, Z.F.[Zhuo-Fan],
Pan, X.[Xuran],
Song, S.[Shiji],
Li, L.E.[Li Erran],
Huang, G.[Gao],
Vision Transformer with Deformable Attention,
CVPR22(4784-4793)
IEEE DOI
2210
Deformable models, Adaptation models, Computational modeling,
Predictive models, Transformers, Data models,
grouping and shape analysis
BibRef
Hong, W.X.[Wei-Xiang],
Lao, J.W.[Jiang-Wei],
Ren, W.[Wang],
Wang, J.[Jian],
Chen, J.D.[Jing-Dong],
Chu, W.[Wei],
Training Object Detectors from Scratch: An Empirical Study in the Era
of Vision Transformer,
CVPR22(4652-4661)
IEEE DOI
2210
Training, Visualization, Semantics, Detectors, Object detection,
Transformers, Recognition: detection, categorization, retrieval, Deep learning architectures and techniques
BibRef
Chen, Z.Y.[Zhao-Yu],
Li, B.[Bo],
Wu, S.[Shuang],
Xu, J.H.[Jiang-He],
Ding, S.H.[Shou-Hong],
Zhang, W.Q.[Wen-Qiang],
Shape Matters: Deformable Patch Attack,
ECCV22(IV:529-548).
Springer DOI
2211
BibRef
Chen, Z.Y.[Zhao-Yu],
Li, B.[Bo],
Xu, J.H.[Jiang-He],
Wu, S.[Shuang],
Ding, S.H.[Shou-Hong],
Zhang, W.Q.[Wen-Qiang],
Towards Practical Certifiable Patch Defense with Vision Transformer,
CVPR22(15127-15137)
IEEE DOI
2210
Smoothing methods, Toy manufacturing industry, Semantics,
Network architecture, Transformers, Robustness,
Adversarial attack and defense
BibRef
Chen, R.J.[Richard J.],
Chen, C.[Chengkuan],
Li, Y.C.[Yi-Cong],
Chen, T.Y.[Tiffany Y.],
Trister, A.D.[Andrew D.],
Krishnan, R.G.[Rahul G.],
Mahmood, F.[Faisal],
Scaling Vision Transformers to Gigapixel Images via Hierarchical
Self-Supervised Learning,
CVPR22(16123-16134)
IEEE DOI
2210
Training, Visualization, Self-supervised learning,
Image representation, Transformers,
Self- semi- meta- unsupervised learning
BibRef
Yang, Z.[Zhao],
Wang, J.Q.[Jia-Qi],
Tang, Y.S.[Yan-Song],
Chen, K.[Kai],
Zhao, H.S.[Heng-Shuang],
Torr, P.H.S.[Philip H.S.],
LAVT: Language-Aware Vision Transformer for Referring Image
Segmentation,
CVPR22(18134-18144)
IEEE DOI
2210
Image segmentation, Visualization, Image coding, Shape, Linguistics,
Transformers, Feature extraction, Segmentation, grouping and shape analysis
BibRef
Scheibenreif, L.[Linus],
Hanna, J.[Joëlle],
Mommert, M.[Michael],
Borth, D.[Damian],
Self-supervised Vision Transformers for Land-cover Segmentation and
Classification,
EarthVision22(1421-1430)
IEEE DOI
2210
Training, Earth, Image segmentation, Computational modeling,
Conferences, Transformers
BibRef
Zhai, X.H.[Xiao-Hua],
Kolesnikov, A.[Alexander],
Houlsby, N.[Neil],
Beyer, L.[Lucas],
Scaling Vision Transformers,
CVPR22(1204-1213)
IEEE DOI
2210
Training, Error analysis, Computational modeling, Neural networks,
Memory management, Training data,
Transfer/low-shot/long-tail learning
BibRef
Guo, J.Y.[Jian-Yuan],
Han, K.[Kai],
Wu, H.[Han],
Tang, Y.[Yehui],
Chen, X.H.[Xing-Hao],
Wang, Y.H.[Yun-He],
Xu, C.[Chang],
CMT: Convolutional Neural Networks Meet Vision Transformers,
CVPR22(12165-12175)
IEEE DOI
2210
Visualization, Image recognition, Force,
Object detection, Transformers,
Representation learning
BibRef
Meng, L.C.[Ling-Chen],
Li, H.D.[Heng-Duo],
Chen, B.C.[Bor-Chun],
Lan, S.Y.[Shi-Yi],
Wu, Z.X.[Zu-Xuan],
Jiang, Y.G.[Yu-Gang],
Lim, S.N.[Ser-Nam],
AdaViT: Adaptive Vision Transformers for Efficient Image Recognition,
CVPR22(12299-12308)
IEEE DOI
2210
Image recognition, Head, Law enforcement, Computational modeling,
Redundancy, Transformers, Efficient learning and inferences,
retrieval
BibRef
Herrmann, C.[Charles],
Sargent, K.[Kyle],
Jiang, L.[Lu],
Zabih, R.[Ramin],
Chang, H.[Huiwen],
Liu, C.[Ce],
Krishnan, D.[Dilip],
Sun, D.Q.[De-Qing],
Pyramid Adversarial Training Improves ViT Performance,
CVPR22(13409-13419)
IEEE DOI
2210
Training, Image recognition, Stochastic processes,
Transformers, Robustness,
Recognition: detection, categorization, retrieval
BibRef
Li, C.L.[Chang-Lin],
Zhuang, B.[Bohan],
Wang, G.R.[Guang-Run],
Liang, X.D.[Xiao-Dan],
Chang, X.J.[Xiao-Jun],
Yang, Y.[Yi],
Automated Progressive Learning for Efficient Training of Vision
Transformers,
CVPR22(12476-12486)
IEEE DOI
2210
Training, Adaptation models, Schedules, Computational modeling,
Estimation, Manuals, Transformers, Representation learning
BibRef
Yu, T.[Tong],
Khalitov, R.[Ruslan],
Cheng, L.[Lei],
Yang, Z.R.[Zhi-Rong],
Paramixer: Parameterizing Mixing Links in Sparse Factors Works Better
than Dot-Product Self-Attention,
CVPR22(681-690)
IEEE DOI
2210
Protocols, Costs, Scalability, Neural networks, Stacking, Genomics,
Transformers, Deep learning architectures and techniques,
Representation learning
BibRef
Guo, J.Y.[Jian-Yuan],
Tang, Y.H.[Ye-Hui],
Han, K.[Kai],
Chen, X.H.[Xing-Hao],
Wu, H.[Han],
Xu, C.[Chao],
Xu, C.[Chang],
Wang, Y.H.[Yun-He],
Hire-MLP: Vision MLP via Hierarchical Rearrangement,
CVPR22(816-826)
IEEE DOI
2210
Representation learning, Image segmentation, Semantics,
Object detection, Transformers
BibRef
Cheng, B.[Bowen],
Misra, I.[Ishan],
Schwing, A.G.[Alexander G.],
Kirillov, A.[Alexander],
Girdhar, R.[Rohit],
Masked-attention Mask Transformer for Universal Image Segmentation,
CVPR22(1280-1289)
IEEE DOI
2210
Image segmentation, Shape, Computational modeling, Semantics,
Transformers, Feature extraction,
retrieval
BibRef
Pu, M.Y.[Meng-Yang],
Huang, Y.P.[Ya-Ping],
Liu, Y.M.[Yu-Ming],
Guan, Q.J.[Qing-Ji],
Ling, H.B.[Hai-Bin],
EDTER: Edge Detection with Transformer,
CVPR22(1392-1402)
IEEE DOI
2210
Head, Image edge detection, Semantics, Detectors, Transformers,
Feature extraction, Segmentation, grouping and shape analysis,
Scene analysis and understanding
BibRef
Rangrej, S.B.[Samrudhdhi B.],
Srinidhi, C.L.[Chetan L.],
Clark, J.J.[James J.],
Consistency driven Sequential Transformers Attention Model for
Partially Observable Scenes,
CVPR22(2508-2517)
IEEE DOI
2210
Training, Computational modeling, Imaging, Predictive models,
Transformers, Prediction algorithms, Visual reasoning
BibRef
Zhu, R.[Rui],
Li, Z.Q.[Zheng-Qin],
Matai, J.[Janarbek],
Porikli, F.M.[Fatih M.],
Chandraker, M.[Manmohan],
IRISformer: Dense Vision Transformers for Single-Image Inverse
Rendering in Indoor Scenes,
CVPR22(2812-2821)
IEEE DOI
2210
Photorealism, Shape, Computational modeling, Lighting,
Transformers,
Physics-based vision and shape-from-X
BibRef
Ermolov, A.[Aleksandr],
Mirvakhabova, L.[Leyla],
Khrulkov, V.[Valentin],
Sebe, N.[Nicu],
Oseledets, I.[Ivan],
Hyperbolic Vision Transformers: Combining Improvements in Metric
Learning,
CVPR22(7399-7409)
IEEE DOI
2210
Measurement, Geometry, Visualization, Semantics,
Self-supervised learning, Transformer cores, Transformers,
Representation learning
BibRef
Lee, Y.[Youngwan],
Kim, J.[Jonghee],
Willette, J.[Jeffrey],
Hwang, S.J.[Sung Ju],
MPViT: Multi-Path Vision Transformer for Dense Prediction,
CVPR22(7277-7286)
IEEE DOI
2210
Image segmentation, Semantics, Object detection, Transformers,
Feature extraction, Pattern recognition, Recognition: detection,
Representation learning
BibRef
Zhang, C.Z.[Chong-Zhi],
Zhang, M.Y.[Ming-Yuan],
Zhang, S.H.[Shang-Hang],
Jin, D.S.[Dai-Sheng],
Zhou, Q.[Qiang],
Cai, Z.A.[Zhong-Ang],
Zhao, H.[Haiyu],
Liu, X.L.[Xiang-Long],
Liu, Z.[Ziwei],
Delving Deep into the Generalization of Vision Transformers under
Distribution Shifts,
CVPR22(7267-7276)
IEEE DOI
2210
Training, Representation learning, Systematics, Shape, Taxonomy,
Self-supervised learning, Transformers, Recognition: detection
BibRef
Hou, Z.[Zhi],
Yu, B.[Baosheng],
Tao, D.C.[Da-Cheng],
BatchFormer: Learning to Explore Sample Relationships for Robust
Representation Learning,
CVPR22(7246-7256)
IEEE DOI
2210
Training, Deep learning, Representation learning, Neural networks,
Tail, Transformers, Transfer/low-shot/long-tail learning,
Self- semi- meta- unsupervised learning
BibRef
Zamir, S.W.[Syed Waqas],
Arora, A.[Aditya],
Khan, S.[Salman],
Hayat, M.[Munawar],
Khan, F.S.[Fahad Shahbaz],
Yang, M.H.[Ming-Hsuan],
Restormer: Efficient Transformer for High-Resolution Image
Restoration,
CVPR22(5718-5729)
IEEE DOI
2210
Computational modeling, Transformer cores,
Transformers, Data models, Image restoration, Task analysis,
Deep learning architectures and techniques
BibRef
Zhao, H.S.[Heng-Shuang],
Jiang, L.[Li],
Jia, J.Y.[Jia-Ya],
Torr, P.H.S.[Philip H.S.],
Koltun, V.[Vladlen],
Point Transformer,
ICCV21(16239-16248)
IEEE DOI
2203
Point cloud compression, Measurement, Image segmentation,
Semantics, Object detection, Transformer cores,
Recognition and classification
BibRef
Lin, K.[Kevin],
Wang, L.J.[Li-Juan],
Liu, Z.C.[Zi-Cheng],
Mesh Graphormer,
ICCV21(12919-12928)
IEEE DOI
2203
Convolutional codes, Solid modeling, Network topology,
Transformers, Gestures and body pose
BibRef
Casey, E.[Evan],
Pérez, V.[Víctor],
Li, Z.[Zhuoru],
The Animation Transformer: Visual Correspondence via Segment Matching,
ICCV21(11303-11312)
IEEE DOI
2203
Visualization, Image segmentation, Image color analysis,
Production, Animation, Transformers,
grouping and shape
BibRef
Reizenstein, J.[Jeremy],
Shapovalov, R.[Roman],
Henzler, P.[Philipp],
Sbordone, L.[Luca],
Labatut, P.[Patrick],
Novotny, D.[David],
Common Objects in 3D: Large-Scale Learning and Evaluation of
Real-life 3D Category Reconstruction,
ICCV21(10881-10891)
IEEE DOI
2203
Award, Marr Prize, HM. Point cloud compression, Transformers,
Rendering (computer graphics), Cameras, Image reconstruction,
3D from multiview and other sensors
BibRef
Mariotti, O.[Octave],
Mac Aodha, O.[Oisin],
Bilen, H.[Hakan],
ViewNet: Unsupervised Viewpoint Estimation from Conditional
Generation,
ICCV21(10398-10408)
IEEE DOI
2203
Training, Annotations, Estimation, Benchmark testing, Transformers,
Representation learning, Transfer/Low-shot/Semi/Unsupervised Learning
BibRef
Feng, W.X.[Wei-Xin],
Wang, Y.J.[Yuan-Jiang],
Ma, L.H.[Li-Hua],
Yuan, Y.[Ye],
Zhang, C.[Chi],
Temporal Knowledge Consistency for Unsupervised Visual Representation
Learning,
ICCV21(10150-10160)
IEEE DOI
2203
Training, Representation learning, Visualization, Protocols,
Object detection, Semisupervised learning, Transformers,
Transfer/Low-shot/Semi/Unsupervised Learning
BibRef
Wu, H.P.[Hai-Ping],
Xiao, B.[Bin],
Codella, N.[Noel],
Liu, M.C.[Meng-Chen],
Dai, X.Y.[Xi-Yang],
Yuan, L.[Lu],
Zhang, L.[Lei],
CvT: Introducing Convolutions to Vision Transformers,
ICCV21(22-31)
IEEE DOI
2203
Code, Vision Transformer.
WWW Link. Convolutional codes, Image resolution, Image recognition,
Performance gain, Transformers, Distortion
BibRef
Touvron, H.[Hugo],
Cord, M.[Matthieu],
Sablayrolles, A.[Alexandre],
Synnaeve, G.[Gabriel],
Jégou, H.[Hervé],
Going deeper with Image Transformers,
ICCV21(32-42)
IEEE DOI
2203
Training, Neural networks, Training data,
Data models, Circuit faults, Recognition and classification,
Optimization and learning methods
BibRef
Zhao, J.W.[Jia-Wei],
Yan, K.[Ke],
Zhao, Y.F.[Yi-Fan],
Guo, X.W.[Xiao-Wei],
Huang, F.Y.[Fei-Yue],
Li, J.[Jia],
Transformer-based Dual Relation Graph for Multi-label Image
Recognition,
ICCV21(163-172)
IEEE DOI
2203
Image recognition, Correlation, Computational modeling, Semantics,
Benchmark testing,
Representation learning
BibRef
Chen, C.F.R.[Chun-Fu Richard],
Fan, Q.F.[Quan-Fu],
Panda, R.[Rameswar],
CrossViT: Cross-Attention Multi-Scale Vision Transformer for Image
Classification,
ICCV21(347-356)
IEEE DOI
2203
Image segmentation, Image recognition, Computational modeling,
Semantics, Memory management, Object detection,
Representation learning
BibRef
Pan, Z.Z.[Zi-Zheng],
Zhuang, B.[Bohan],
Liu, J.[Jing],
He, H.Y.[Hao-Yu],
Cai, J.F.[Jian-Fei],
Scalable Vision Transformers with Hierarchical Pooling,
ICCV21(367-376)
IEEE DOI
2203
Visualization, Image recognition, Computational modeling,
Scalability, Transformers, Computational efficiency,
Efficient training and inference methods
BibRef
Chefer, H.[Hila],
Gur, S.[Shir],
Wolf, L.B.[Lior B.],
Generic Attention-model Explainability for Interpreting Bi-Modal and
Encoder-Decoder Transformers,
ICCV21(387-396)
IEEE DOI
2203
Measurement, Visualization, Image segmentation,
Computational modeling, Object detection
BibRef
Yuan, L.[Li],
Chen, Y.P.[Yun-Peng],
Wang, T.[Tao],
Yu, W.H.[Wei-Hao],
Shi, Y.J.[Yu-Jun],
Jiang, Z.H.[Zi-Hang],
Tay, F.E.H.[Francis E. H.],
Feng, J.S.[Jia-Shi],
Yan, S.C.[Shui-Cheng],
Tokens-to-Token ViT:
Training Vision Transformers from Scratch on ImageNet,
ICCV21(538-547)
IEEE DOI
2203
Training, Image resolution, Computational modeling,
Image edge detection, Transformers
BibRef
Wu, B.[Bichen],
Xu, C.F.[Chen-Feng],
Dai, X.L.[Xiao-Liang],
Wan, A.[Alvin],
Zhang, P.Z.[Pei-Zhao],
Yan, Z.C.[Zhi-Cheng],
Tomizuka, M.[Masayoshi],
Gonzalez, J.[Joseph],
Keutzer, K.[Kurt],
Vajda, P.[Peter],
Visual Transformers: Where Do Transformers Really Belong in Vision
Models?,
ICCV21(579-589)
IEEE DOI
2203
Training, Visualization, Image segmentation, Lips,
Computational modeling, Semantics,
Vision applications and systems
BibRef
Hu, R.H.[Rong-Hang],
Singh, A.[Amanpreet],
UniT: Multimodal Multitask Learning with a Unified Transformer,
ICCV21(1419-1429)
IEEE DOI
2203
Training, Natural languages,
Object detection, Predictive models, Transformers, Multitasking,
Representation learning
BibRef
Qiu, Y.[Yue],
Yamamoto, S.[Shintaro],
Nakashima, K.[Kodai],
Suzuki, R.[Ryota],
Iwata, K.[Kenji],
Kataoka, H.[Hirokatsu],
Satoh, Y.[Yutaka],
Describing and Localizing Multiple Changes with Transformers,
ICCV21(1951-1960)
IEEE DOI
2203
Measurement, Location awareness, Codes, Natural languages,
Benchmark testing, Transformers,
Vision applications and systems
BibRef
Song, M.[Myungseo],
Choi, J.[Jinyoung],
Han, B.H.[Bo-Hyung],
Variable-Rate Deep Image Compression through Spatially-Adaptive
Feature Transform,
ICCV21(2360-2369)
IEEE DOI
2203
Training, Image coding, Neural networks, Rate-distortion, Transforms,
Network architecture, Computational photography,
Low-level and physics-based vision
BibRef
Sheng, H.[Hualian],
Cai, S.[Sijia],
Liu, Y.[Yuan],
Deng, B.[Bing],
Huang, J.Q.[Jian-Qiang],
Hua, X.S.[Xian-Sheng],
Zhao, M.J.[Min-Jian],
Improving 3D Object Detection with Channel-wise Transformer,
ICCV21(2723-2732)
IEEE DOI
2203
Point cloud compression, Object detection, Detectors, Transforms,
Transformers, Encoding, Detection and localization in 2D and 3D
BibRef
Zhang, P.C.[Peng-Chuan],
Dai, X.[Xiyang],
Yang, J.W.[Jian-Wei],
Xiao, B.[Bin],
Yuan, L.[Lu],
Zhang, L.[Lei],
Gao, J.F.[Jian-Feng],
Multi-Scale Vision Longformer: A New Vision Transformer for
High-Resolution Image Encoding,
ICCV21(2978-2988)
IEEE DOI
2203
Image segmentation, Image coding, Computational modeling,
Memory management, Object detection, Transformers,
Representation learning
BibRef
Dong, Q.[Qi],
Tu, Z.W.[Zhuo-Wen],
Liao, H.[Haofu],
Zhang, Y.T.[Yu-Ting],
Mahadevan, V.[Vijay],
Soatto, S.[Stefano],
Visual Relationship Detection Using Part-and-Sum Transformers with
Composite Queries,
ICCV21(3530-3539)
IEEE DOI
2203
Visualization, Detectors, Transformers, Task analysis, Standards,
Detection and localization in 2D and 3D,
Representation learning
BibRef
Fan, H.Q.[Hao-Qi],
Xiong, B.[Bo],
Mangalam, K.[Karttikeya],
Li, Y.[Yanghao],
Yan, Z.C.[Zhi-Cheng],
Malik, J.[Jitendra],
Feichtenhofer, C.[Christoph],
Multiscale Vision Transformers,
ICCV21(6804-6815)
IEEE DOI
2203
Visualization, Image recognition, Codes, Computational modeling,
Transformers, Complexity theory,
Recognition and classification
BibRef
Mahmood, K.[Kaleel],
Mahmood, R.[Rigel],
van Dijk, M.[Marten],
On the Robustness of Vision Transformers to Adversarial Examples,
ICCV21(7818-7827)
IEEE DOI
2203
Transformers, Robustness,
Adversarial machine learning, Security,
Machine learning architectures and formulations
BibRef
Chen, X.L.[Xin-Lei],
Xie, S.[Saining],
He, K.[Kaiming],
An Empirical Study of Training Self-Supervised Vision Transformers,
ICCV21(9620-9629)
IEEE DOI
2203
Training, Benchmark testing, Transformers, Standards,
Representation learning, Recognition and classification, Transfer/Low-shot/Semi/Unsupervised Learning
BibRef
Caron, M.[Mathilde],
Touvron, H.[Hugo],
Misra, I.[Ishan],
Jégou, H.[Hervé],
Mairal, J.[Julien],
Bojanowski, P.[Piotr],
Joulin, A.[Armand],
Emerging Properties in Self-Supervised Vision Transformers,
ICCV21(9630-9640)
IEEE DOI
2203
Training, Image segmentation, Semantics, Layout, Image retrieval,
Representation learning,
Transfer/Low-shot/Semi/Unsupervised Learning
BibRef
Yuan, Y.[Ye],
Weng, X.[Xinshuo],
Ou, Y.[Yanglan],
Kitani, K.[Kris],
AgentFormer: Agent-Aware Transformers for Socio-Temporal Multi-Agent
Forecasting,
ICCV21(9793-9803)
IEEE DOI
2203
Uncertainty, Stochastic processes, Predictive models, Transformers,
Encoding, Trajectory, Motion and tracking,
Vision for robotics and autonomous vehicles
BibRef
Xu, W.J.[Wei-Jian],
Xu, Y.F.[Yi-Fan],
Chang, T.[Tyler],
Tu, Z.W.[Zhuo-Wen],
Co-Scale Conv-Attentional Image Transformers,
ICCV21(9961-9970)
IEEE DOI
2203
Image segmentation, Computational modeling, Object detection,
Transformers, Convolutional neural networks, Task analysis,
Recognition and classification
BibRef
Wu, K.[Kan],
Peng, H.W.[Hou-Wen],
Chen, M.H.[Ming-Hao],
Fu, J.L.[Jian-Long],
Chao, H.Y.[Hong-Yang],
Rethinking and Improving Relative Position Encoding for Vision
Transformer,
ICCV21(10013-10021)
IEEE DOI
2203
Image coding, Codes, Computational modeling, Transformers, Encoding,
Natural language processing, Datasets and evaluation,
Recognition and classification
BibRef
Bhojanapalli, S.[Srinadh],
Chakrabarti, A.[Ayan],
Glasner, D.[Daniel],
Li, D.[Daliang],
Unterthiner, T.[Thomas],
Veit, A.[Andreas],
Understanding Robustness of Transformers for Image Classification,
ICCV21(10211-10221)
IEEE DOI
2203
Perturbation methods, Transformers,
Robustness, Data models, Convolutional neural networks,
Recognition and classification
BibRef
Yan, B.[Bin],
Peng, H.[Houwen],
Fu, J.L.[Jian-Long],
Wang, D.[Dong],
Lu, H.C.[Hu-Chuan],
Learning Spatio-Temporal Transformer for Visual Tracking,
ICCV21(10428-10437)
IEEE DOI
2203
Visualization, Target tracking, Smoothing methods, Pipelines,
Benchmark testing, Transformers
BibRef
Heo, B.[Byeongho],
Yun, S.[Sangdoo],
Han, D.Y.[Dong-Yoon],
Chun, S.[Sanghyuk],
Choe, J.[Junsuk],
Oh, S.J.[Seong Joon],
Rethinking Spatial Dimensions of Vision Transformers,
ICCV21(11916-11925)
IEEE DOI
2203
Dimensionality reduction, Computational modeling,
Object detection, Transformers, Robustness,
Recognition and classification
BibRef
Voskou, A.[Andreas],
Panousis, K.P.[Konstantinos P.],
Kosmopoulos, D.[Dimitrios],
Metaxas, D.N.[Dimitris N.],
Chatzis, S.[Sotirios],
Stochastic Transformer Networks with Linear Competing Units:
Application to end-to-end SL Translation,
ICCV21(11926-11935)
IEEE DOI
2203
Training, Memory management, Stochastic processes,
Gesture recognition, Benchmark testing, Assistive technologies
BibRef
Ranftl, R.[René],
Bochkovskiy, A.[Alexey],
Koltun, V.[Vladlen],
Vision Transformers for Dense Prediction,
ICCV21(12159-12168)
IEEE DOI
2203
Image resolution, Semantics, Neural networks, Estimation,
Training data,
grouping and shape
BibRef
Chen, M.H.[Ming-Hao],
Peng, H.W.[Hou-Wen],
Fu, J.L.[Jian-Long],
Ling, H.B.[Hai-Bin],
AutoFormer: Searching Transformers for Visual Recognition,
ICCV21(12250-12260)
IEEE DOI
2203
Training, Convolutional codes, Visualization, Head, Search methods,
Manuals,
Recognition and classification
BibRef
Yang, G.L.[Guang-Lei],
Tang, H.[Hao],
Ding, M.L.[Ming-Li],
Sebe, N.[Nicu],
Ricci, E.[Elisa],
Transformer-Based Attention Networks for Continuous Pixel-Wise
Prediction,
ICCV21(16249-16259)
IEEE DOI
2203
Correlation, Estimation, Logic gates,
Transformers, Natural language processing,
Vision applications and systems
BibRef
Yuan, K.[Kun],
Guo, S.P.[Shao-Peng],
Liu, Z.[Ziwei],
Zhou, A.[Aojun],
Yu, F.W.[Feng-Wei],
Wu, W.[Wei],
Incorporating Convolution Designs into Visual Transformers,
ICCV21(559-568)
IEEE DOI
2203
Training, Visualization, Costs, Convolution, Training data,
Transformers, Feature extraction, Recognition and classification,
Efficient training and inference methods
BibRef
Chen, Z.[Zhengsu],
Xie, L.X.[Ling-Xi],
Niu, J.W.[Jian-Wei],
Liu, X.F.[Xue-Feng],
Wei, L.[Longhui],
Tian, Q.[Qi],
Visformer: The Vision-friendly Transformer,
ICCV21(569-578)
IEEE DOI
2203
Convolutional codes, Training, Visualization, Protocols,
Computational modeling, Fitting, Recognition and classification,
Representation learning
BibRef
Wang, W.[Wenhai],
Xie, E.[Enze],
Li, X.[Xiang],
Fan, D.P.[Deng-Ping],
Song, K.[Kaitao],
Liang, D.[Ding],
Lu, T.[Tong],
Luo, P.[Ping],
Shao, L.[Ling],
Pyramid Vision Transformer:
A Versatile Backbone for Dense Prediction without Convolutions,
ICCV21(548-558)
IEEE DOI
2203
Image resolution, Costs, Semantics, Object detection, Transformers,
Feature extraction, Recognition and classification,
grouping and shape
BibRef
Yao, Z.L.[Zhu-Liang],
Cao, Y.[Yue],
Lin, Y.T.[Yu-Tong],
Liu, Z.[Ze],
Zhang, Z.[Zheng],
Hu, H.[Han],
Leveraging Batch Normalization for Vision Transformers,
NeruArch21(413-422)
IEEE DOI
2112
Training, Transformers, Feeds
BibRef
Kim, K.[Kyungmin],
Wu, B.C.[Bi-Chen],
Dai, X.L.[Xiao-Liang],
Zhang, P.Z.[Pei-Zhao],
Yan, Z.C.[Zhi-Cheng],
Vajda, P.[Peter],
Kim, S.[Seon],
Rethinking the Self-Attention in Vision Transformers,
ECV21(3065-3069)
IEEE DOI
2109
Computational modeling, Pattern recognition
BibRef
Zhang, Z.X.[Zi-Xiao],
Lu, X.Q.[Xiao-Qiang],
Cao, G.J.[Guo-Jin],
Yang, Y.T.[Yu-Ting],
Jiao, L.C.[Li-Cheng],
Liu, F.[Fang],
ViT-YOLO: Transformer-Based YOLO for Object Detection,
VisDrone21(2799-2808)
IEEE DOI
2112
Semantics, Detectors, Object detection,
Feature extraction, Robustness
BibRef
Kong, D.[Daehyeon],
Kong, K.[Kyeongbo],
Kim, K.[Kyunghun],
Min, S.J.[Sung-Jun],
Kang, S.J.[Suk-Ju],
Image-Adaptive Hint Generation via Vision Transformer for Outpainting,
WACV22(4029-4038)
IEEE DOI
2202
Image synthesis, Neural networks,
Complex networks, Benchmark testing, Transformers,
Vision Systems and Applications
BibRef
Graham, B.[Ben],
El-Nouby, A.[Alaaeldin],
Touvron, H.[Hugo],
Stock, P.[Pierre],
Joulin, A.[Armand],
Jégou, H.[Hervé],
Douze, M.[Matthijs],
LeViT: a Vision Transformer in ConvNet's Clothing for Faster
Inference,
ICCV21(12239-12249)
IEEE DOI
2203
Training, Image resolution, Neural networks,
Parallel processing, Transformers, Feature extraction,
Representation learning
BibRef
Horváth, J.[János],
Baireddy, S.[Sriram],
Hao, H.X.[Han-Xiang],
Montserrat, D.M.[Daniel Mas],
Delp, E.J.[Edward J.],
Manipulation Detection in Satellite Images Using Vision Transformer,
WMF21(1032-1041)
IEEE DOI
2109
BibRef
Earlier: A1, A4, A3, A5, Only:
Manipulation Detection in Satellite Images Using Deep Belief Networks,
WMF20(2832-2840)
IEEE DOI
2008
Image sensors, Satellites, Splicing, Forestry, Tools.
Satellites, Image reconstruction, Training, Forgery,
Heating systems, Feature extraction
BibRef
Beal, J.[Josh],
Wu, H.Y.[Hao-Yu],
Park, D.H.[Dong Huk],
Zhai, A.[Andrew],
Kislyuk, D.[Dmitry],
Billion-Scale Pretraining with Vision Transformers for Multi-Task
Visual Representations,
WACV22(1431-1440)
IEEE DOI
2202
Visualization, Solid modeling, Systematics,
Computational modeling, Transformers,
Semi- and Un- supervised Learning
BibRef
Chapter on Pattern Recognition, Clustering, Statistics, Grammars, Learning, Neural Nets, Genetic Algorithms continues in
Video Transformers.