14.5.10.6.2 Video Transformers

Video Transformers. Transformers.

Selva, J.[Javier], Johansen, A.S.[Anders S.], Escalera, S.[Sergio], Nasrollahi, K.[Kamal], Moeslund, T.B.[Thomas B.], Clapés, A.[Albert],
Video Transformers: A Survey,
PAMI(45), No. 11, November 2023, pp. 12922-12943.
IEEE DOI 2310
Survey, Video Transformers. BibRef

Zhang, Z.C.[Zi-Chao], Chen, Z.D.[Zhen-Duo], Wang, Y.X.[Yong-Xin], Luo, X.[Xin], Xu, X.S.[Xin-Shun],
A vision transformer for fine-grained classification by reducing noise and enhancing discriminative information,
PR(145), 2024, pp. 109979.
Elsevier DOI Code: WWW Link. 2311
Vision transformer, Complementary information integration, Region attention, Fine-grained image recognition BibRef

Xian, K.[Ke], Peng, J.[Juewen], Cao, Z.G.[Zhi-Guo], Zhang, J.M.[Jian-Ming], Lin, G.S.[Guo-Sheng],
ViTA: Video Transformer Adaptor for Robust Video Depth Estimation,
MultMed(26), 2024, pp. 3302-3316.
IEEE DOI 2402
Transformers, Optical flow, Estimation, Training, Streaming media, Optical losses, Bidirectional control, spatio-temporal consistency loss BibRef


Piergiovanni, A.[AJ], Kuo, W.C.[Wei-Cheng], Angelova, A.[Anelia],
Rethinking Video ViTs: Sparse Video Tubes for Joint Image and Video Learning,
CVPR23(2214-2224)
IEEE DOI 2309
BibRef

Park, J.[Jungin], Lee, J.Y.[Ji-Young], Sohn, K.H.[Kwang-Hoon],
Dual-Path Adaptation from Image to Video Transformers,
CVPR23(2203-2213)
IEEE DOI 2309
BibRef

Karim, R.[Rezaul], Zhao, H.[He], Wildes, R.P.[Richard P.], Siam, M.[Mennatullah],
MED-VT: Multiscale Encoder-Decoder Video Transformer with Application to Object Segmentation,
CVPR23(6323-6333)
IEEE DOI 2309
BibRef

Yu, L.J.[Li-Jun], Cheng, Y.[Yong], Sohn, K.[Kihyuk], Lezama, J.[José], Zhang, H.[Han], Chang, H.[Huiwen], Hauptmann, A.G.[Alexander G.], Yang, M.H.[Ming-Hsuan], Hao, Y.[Yuan], Essa, I.[Irfan], Jiang, L.[Lu],
MAGVIT: Masked Generative Video Transformer,
CVPR23(10459-10469)
IEEE DOI 2309
BibRef

Xing, Z.[Zhen], Dai, Q.[Qi], Hu, H.[Han], Chen, J.J.[Jing-Jing], Wu, Z.[Zuxuan], Jiang, Y.G.[Yu-Gang],
SVFormer: Semi-supervised Video Transformer for Action Recognition,
CVPR23(18816-18826)
IEEE DOI 2309
BibRef

Xie, F.[Fei], Chu, L.[Lei], Li, J.H.[Jia-Hao], Lu, Y.[Yan], Ma, C.[Chao],
VideoTrack: Learning to Track Objects via Video Transformer,
CVPR23(22826-22835)
IEEE DOI 2309
BibRef

Qiu, Z.W.[Zhong-Wei], Yang, Q.S.[Qian-Sheng], Wang, J.[Jian], Feng, H.C.[Hao-Cheng], Han, J.Y.[Jun-Yu], Ding, E.[Errui], Xu, C.[Chang], Fu, D.M.[Dong-Mei], Wang, J.D.[Jing-Dong],
PSVT: End-to-End Multi-Person 3D Pose and Shape Estimation with Progressive Video Transformers,
CVPR23(21254-21263)
IEEE DOI 2309
BibRef

Yang, J.[Jing], Chen, J.W.[Jun-Wen], Yanai, K.[Keiji],
Transformer-based Cross-modal Recipe Embeddings with Large Batch Training,
MMMod23(II: 471-482).
Springer DOI 2304
BibRef

Huang, K.W.[Kuan-Wei], Chen, G.C.F.[Geoff Chih-Fan], Chang, P.W.[Po-Wen], Lin, S.C.[Sheng-Chieh], Hsu, C.[ChiaJung], Thengane, V.[Vishal], Lin, J.Y.Y.[Joshua Yao-Yu],
Strong Gravitational Lensing Parameter Estimation with Vision Transformer,
AI4Space22(143-153).
Springer DOI 2304
BibRef

Zheng, M.[Minyan], Luo, J.P.[Jian-Ping],
Space-time Video Super-resolution 3d Transformer,
MMMod23(II: 374-385).
Springer DOI 2304
BibRef

Ye, X.[Xi], Bilodeau, G.A.[Guillaume-Alexandre],
VPTR: Efficient Transformers for Video Prediction,
ICPR22(3492-3499)
IEEE DOI 2212
Representation learning, Source coding, Predictive models, Transformers, Spatiotemporal phenomena, Task analysis BibRef

Liang, Y.X.[Yu-Xuan], Zhou, P.[Pan], Zimmermann, R.[Roger], Yan, S.C.[Shui-Cheng],
DualFormer: Local-Global Stratified Transformer for Efficient Video Recognition,
ECCV22(XXXIV:577-595).
Springer DOI 2211
WWW Link. Transformer architecture to address computational costs. BibRef

Wang, J.[Junke], Yang, X.T.[Xi-Tong], Li, H.D.[Heng-Duo], Liu, L.[Li], Wu, Z.X.[Zu-Xuan], Jiang, Y.G.[Yu-Gang],
Efficient Video Transformers with Spatial-Temporal Token Selection,
ECCV22(XXXV:69-86).
Springer DOI 2211
BibRef

Yuan, J.[Jing], Barmpoutis, P.[Panagiotis], Stathaki, T.[Tania],
Multi-Scale Deformable Transformer Encoder Based Single-Stage Pedestrian Detection,
ICIP22(2906-2910)
IEEE DOI 2211
Head, Detectors, Prediction methods, Feature extraction, Transformers, Video surveillance, Robustness, Pedestrian detection, vision transformer BibRef

Yun, H.[Heeseung], Lee, S.[Sehun], Kim, G.[Gunhee],
Panoramic Vision Transformer for Saliency Detection in 360° Videos,
ECCV22(XXXV:422-439).
Springer DOI 2211
BibRef

Sun, G.X.[Guan-Xiong], Hua, Y.[Yang], Hu, G.S.[Guo-Sheng], Robertson, N.[Neil],
TDViT: Temporal Dilated Video Transformer for Dense Video Tasks,
ECCV22(XXXV:285-301).
Springer DOI 2211
BibRef

Wang, Y.H.[Yong-Hua], Zhang, J.C.[Jing-Chi], Li, Z.G.[Zhen-Gang], Zeng, X.[Xing], Zhang, Z.[Zhen], Zhang, D.[Diankai], Long, Y.[Yunlin], Wang, N.[Ning],
Neural Network-based In-Loop Filter for CLIC 2022,
CLIC22(1773-1776)
IEEE DOI 2210
Deep learning, Video coding, Image coding, Convolution, Rate-distortion, Network architecture, Video compression BibRef

Chang, H.W.[Hui-Wen], Zhang, H.[Han], Jiang, L.[Lu], Liu, C.[Ce], Freeman, W.T.[William T.],
MaskGIT: Masked Generative Image Transformer,
CVPR22(11305-11315)
IEEE DOI 2210
Training, Visualization, Image synthesis, Computational modeling, Transformers, Decoding, Image and video synthesis and generation, Self- semi- meta- unsupervised learning BibRef

Herzig, R.[Roei], Ben-Avraham, E.[Elad], Mangalam, K.[Karttikeya], Bar, A.[Amir], Chechik, G.[Gal], Rohrbach, A.[Anna], Darrell, T.J.[Trevor J.], Globerson, A.[Amir],
Object-Region Video Transformers,
CVPR22(3138-3149)
IEEE DOI 2210
Visualization, Fuses, Computational modeling, Transformers, Trajectory, Action and event recognition BibRef

Wang, R.[Rui], Chen, D.D.[Dong-Dong], Wu, Z.[Zuxuan], Chen, Y.P.[Yin-Peng], Dai, X.[Xiyang], Liu, M.C.[Meng-Chen], Jiang, Y.G.[Yu-Gang], Zhou, L.[Luowei], Yuan, L.[Lu],
BEVT: BERT Pretraining of Video Transformers,
CVPR22(14713-14723)
IEEE DOI 2210
Representation learning, Codes, Computational modeling, Bit error rate, Transformers, Data models, Video analysis and understanding BibRef

Wu, C.Y.[Chao-Yuan], Li, Y.[Yanghao], Mangalam, K.[Karttikeya], Fan, H.Q.[Hao-Qi], Xiong, B.[Bo], Malik, J.[Jitendra], Feichtenhofer, C.[Christoph],
MeMViT: Memory-Augmented Multiscale Vision Transformer for Efficient Long-Term Video Recognition,
CVPR22(13577-13587)
IEEE DOI 2210
Visualization, Costs, Codes, Computational modeling, Cache memory, Action and event recognition, Video analysis and understanding BibRef

Mangalam, K.[Karttikeya], Fan, H.Q.[Hao-Qi], Li, Y.[Yanghao], Wu, C.Y.[Chao-Yuan], Xiong, B.[Bo], Feichtenhofer, C.[Christoph], Malik, J.[Jitendra],
Reversible Vision Transformers,
CVPR22(10820-10830)
IEEE DOI 2210
Training, Adaptation models, Visualization, Computational modeling, Memory management, Object detection, Transformers, Video analysis and understanding BibRef

Li, Y.[Yanghao], Wu, C.Y.[Chao-Yuan], Fan, H.Q.[Hao-Qi], Mangalam, K.[Karttikeya], Xiong, B.[Bo], Malik, J.[Jitendra], Feichtenhofer, C.[Christoph],
MViTv2: Improved Multiscale Vision Transformers for Classification and Detection,
CVPR22(4794-4804)
IEEE DOI 2210
Representation learning, Visualization, Image segmentation, Image recognition, Object detection, Video analysis and understanding BibRef

Ranasinghe, K.[Kanchana], Naseer, M.[Muzammal], Khan, S.[Salman], Khan, F.S.[Fahad Shahbaz], Ryoo, M.S.[Michael S.],
Self-supervised Video Transformer,
CVPR22(2864-2874)
IEEE DOI 2210
Training, Video sequences, Benchmark testing, Transformers, Encoding, Spatiotemporal phenomena, Video analysis and understanding, Self- semi- meta- unsupervised learning BibRef

Yang, S.[Shusheng], Wang, X.G.[Xing-Gang], Li, Y.[Yu], Fang, Y.X.[Yu-Xin], Fang, J.[Jiemin], Liu, W.Y.[Wen-Yu], Zhao, X.[Xun], Shan, Y.[Ying],
Temporally Efficient Vision Transformer for Video Instance Segmentation,
CVPR22(2875-2885)
IEEE DOI 2210
Image segmentation, Visualization, Head, Shape, Transformers, Spatiotemporal phenomena, Computational efficiency, grouping and shape analysis BibRef

Liu, Z.[Ze], Ning, J.[Jia], Cao, Y.[Yue], Wei, Y.X.[Yi-Xuan], Zhang, Z.[Zheng], Lin, S.[Stephen], Hu, H.[Han],
Video Swin Transformer,
CVPR22(3192-3201)
IEEE DOI 2210
Adaptation models, Image recognition, Computational modeling, Benchmark testing, Transformers, Solids, Recognition: detection BibRef

Yan, S.[Shen], Xiong, X.[Xuehan], Arnab, A.[Anurag], Lu, Z.C.[Zhi-Chao], Zhang, M.[Mi], Sun, C.[Chen], Schmid, C.[Cordelia],
Multiview Transformers for Video Recognition,
CVPR22(3323-3333)
IEEE DOI 2210
Fuses, Computational modeling, Transformers, Cognition, Spatiotemporal phenomena, Action and event recognition BibRef

Shao, R.Z.[Rui-Zhi], Wu, G.[Gaochang], Zhou, Y.M.[Yue-Mei], Fu, Y.[Ying], Fang, L.[Lu], Liu, Y.B.[Ye-Bin],
LocalTrans: A Multiscale Local Transformer Network for Cross-Resolution Homography Estimation,
ICCV21(14870-14879)
IEEE DOI 2203
Photography, Superresolution, Estimation, Transformers, Light fields, Kernel, Image and video synthesis, Computational photography BibRef

Rombach, R.[Robin], Esser, P.[Patrick], Ommer, B.[Björn],
Geometry-Free View Synthesis: Transformers and no 3D Priors,
ICCV21(14336-14346)
IEEE DOI 2203
Solid modeling, Visualization, Geometric modeling, Transformers, Image and video synthesis, Neural generative models BibRef

Tan, J.[Jing], Tang, J.Q.[Jia-Qi], Wang, L.M.[Li-Min], Wu, G.S.[Gang-Shan],
Relaxed Transformer Decoders for Direct Action Proposal Generation,
ICCV21(13506-13515)
IEEE DOI 2203
Visualization, Head, Pipelines, Estimation, Transformers, Decoding, Action and behavior recognition, Video analysis and understanding BibRef

Liu, S.[Song], Fan, H.Q.[Hao-Qi], Qian, S.S.[Sheng-Sheng], Chen, Y.[Yiru], Ding, W.[Wenkui], Wang, Z.Y.[Zhong-Yuan],
HiT: Hierarchical Transformer with Momentum Contrast for Video-Text Retrieval,
ICCV21(11895-11905)
IEEE DOI 2203
Training, Memory management, Streaming media, Performance gain, Benchmark testing, Transformers, Image and video retrieval, Vision + other modalities BibRef

Truong, T.D.[Thanh-Dat], Duong, C.N.[Chi Nhan], Vu, T.D.[The De], Pham, H.A.[Hoang Anh], Raj, B.[Bhiksha], Le, N.[Ngan], Luu, K.[Khoa],
The Right to Talk: An Audio-Visual Transformer Approach,
ICCV21(1085-1094)
IEEE DOI 2203
Location awareness, Visualization, Correlation, Interrupters, Transformers, Feature extraction, Regulation, Video analysis and understanding BibRef

Weng, W.M.[Wen-Ming], Zhang, Y.[Yueyi], Xiong, Z.W.[Zhi-Wei],
Event-based Video Reconstruction Using Transformer,
ICCV21(2543-2552)
IEEE DOI 2203
Visualization, Computational modeling, Semantics, Memory management, Transformers, Feature extraction, Image and video synthesis BibRef

Arnab, A.[Anurag], Dehghani, M.[Mostafa], Heigold, G.[Georg], Sun, C.[Chen], Lucic, M.[Mario], Schmid, C.[Cordelia],
ViViT: A Video Vision Transformer,
ICCV21(6816-6826)
IEEE DOI 2203
Training, Benchmark testing, Transformers, Spatiotemporal phenomena, Kinetic theory, Action and behavior recognition BibRef

Girdhar, R.[Rohit], Grauman, K.[Kristen],
Anticipative Video Transformer,
ICCV21(13485-13495)
IEEE DOI 2203
Video sequences, Predictive models, Benchmark testing, Transformers, Task analysis, Video analysis and understanding BibRef

Zhang, Y.[Yanyi], Li, X.Y.[Xin-Yu], Liu, C.H.[Chun-Hui], Shuai, B.[Bing], Zhu, Y.[Yi], Brattoli, B.[Biagio], Chen, H.[Hao], Marsic, I.[Ivan], Tighe, J.[Joseph],
VidTr: Video Transformer Without Convolutions,
ICCV21(13557-13567)
IEEE DOI 2203
Training, Costs, Error analysis, Computational modeling, Transformers, Cognition, Action and behavior recognition, Video analysis and understanding BibRef

Chen, J.W.[Jia-Wei], Ho, C.M.[Chiu Man],
MM-ViT: Multi-Modal Video Transformer for Compressed Video Action Recognition,
WACV22(786-797)
IEEE DOI 2202
Computational modeling, Benchmark testing, Transformers, Computational efficiency, Spatiotemporal phenomena, Analysis and Understanding BibRef

Li, S.Y.[Shu-Yan], Li, X.[Xiu], Lu, J.W.[Ji-Wen], Zhou, J.[Jie],
Self-supervised Video Hashing via Bidirectional Transformers,
CVPR21(13544-13553)
IEEE DOI 2111
Training, Hash functions, Visualization, Correlation, Benchmark testing, Transformers BibRef

Chapter on Pattern Recognition, Clustering, Statistics, Grammars, Learning, Neural Nets, Genetic Algorithms continues in Spiking Neural Networks.


Last update: Mar 16, 2024 at 20:36:19