Yang, J.H.[Jia-Hao],
Li, X.Y.[Xiang-Yang],
Zheng, M.[Mao],
Wang, Z.H.[Zi-Han],
Zhu, Y.Q.[Yong-Qing],
Guo, X.Q.[Xiao-Qian],
Yuan, Y.C.[Yu-Chen],
Chai, Z.[Zifeng],
Jiang, S.Q.[Shu-Qiang],
MemBridge: Video-Language Pre-Training With Memory-Augmented
Inter-Modality Bridge,
IP(32), 2023, pp. 4073-4087.
IEEE DOI
2307
WWW Link. Bridges, Transformers, Computer architecture, Task analysis,
Visualization, Feature extraction, Memory modules, memory module
BibRef
Selva, J.[Javier],
Johansen, A.S.[Anders S.],
Escalera, S.[Sergio],
Nasrollahi, K.[Kamal],
Moeslund, T.B.[Thomas B.],
Clapés, A.[Albert],
Video Transformers: A Survey,
PAMI(45), No. 11, November 2023, pp. 12922-12943.
IEEE DOI
2310
Survey, Video Transformers.
BibRef
Zhang, Z.C.[Zi-Chao],
Chen, Z.D.[Zhen-Duo],
Wang, Y.X.[Yong-Xin],
Luo, X.[Xin],
Xu, X.S.[Xin-Shun],
A vision transformer for fine-grained classification by reducing
noise and enhancing discriminative information,
PR(145), 2024, pp. 109979.
Elsevier DOI Code:
WWW Link.
2311
Vision transformer, Complementary information integration,
Region attention, Fine-grained image recognition
BibRef
Xian, K.[Ke],
Peng, J.[Juewen],
Cao, Z.G.[Zhi-Guo],
Zhang, J.M.[Jian-Ming],
Lin, G.S.[Guo-Sheng],
ViTA: Video Transformer Adaptor for Robust Video Depth Estimation,
MultMed(26), 2024, pp. 3302-3316.
IEEE DOI
2402
Transformers, Optical flow, Estimation, Training, Streaming media,
Optical losses, Bidirectional control,
spatio-temporal consistency loss
BibRef
Zhang, J.S.[Jin-Song],
Gu, L.F.[Ling-Feng],
Lai, Y.K.[Yu-Kun],
Wang, X.Y.[Xue-Yang],
Li, K.[Kun],
Toward Grouping in Large Scenes With Occlusion-Aware Spatio-Temporal
Transformers,
CirSysVideo(34), No. 5, May 2024, pp. 3919-3929.
IEEE DOI
2405
Feature extraction, Trajectory, Transformers, Task analysis,
Data mining, Video sequences, Group detection, large-scale scenes,
spatio-temporal transformers
BibRef
Lu, Y.W.[Ya-Wen],
Liu, D.F.[Dong-Fang],
Wang, Q.F.[Qi-Fan],
Han, C.[Cheng],
Cui, Y.M.[Yi-Ming],
Cao, Z.W.[Zhi-Wen],
Zhang, X.L.[Xue-Ling],
Chen, Y.J.V.[Ying-Jie Victor],
Fan, H.[Heng],
ProMotion: Prototypes as Motion Learners,
CVPR24(28109-28119)
IEEE DOI
2410
Uncertainty, Computational modeling, Prototypes, Transformers, Robustness
BibRef
Choi, J.[Joonmyung],
Lee, S.[Sanghyeok],
Chu, J.W.[Jae-Won],
Choi, M.[Minhyuk],
Kim, H.W.J.[Hyun-Woo J.],
vid-TLDR: Training Free Token merging for Light-Weight Video
Transformer,
CVPR24(18771-18781)
IEEE DOI Code:
WWW Link.
2410
Training, Codes, Computational modeling, Merging, Detectors,
Transformers, Efficient ViTs, Video Understanding, Token Merging,
Video Transformers
BibRef
Kowal, M.[Matthew],
Dave, A.[Achal],
Ambrus, R.[Rares],
Gaidon, A.[Adrien],
Derpanis, K.G.[Konstantinos G.],
Tokmakov, P.[Pavel],
Understanding Video Transformers via Universal Concept Discovery,
CVPR24(10946-10956)
IEEE DOI
2410
Representation learning, Heuristic algorithms,
Object segmentation, Predictive models, Transformers
BibRef
Herzig, R.[Roei],
Abramovich, O.[Ofir],
Ben-Avraham, E.[Elad],
Arbelle, A.[Assaf],
Karlinsky, L.[Leonid],
Shamir, A.[Ariel],
Darrell, T.J.[Trevor J.],
Globerson, A.[Amir],
PromptonomyViT: Multi-Task Prompt Learning Improves Video
Transformers using Synthetic Scene Data,
WACV24(6789-6801)
IEEE DOI Code:
WWW Link.
2404
Graphics, Solid modeling, Annotations, Transformers, Multitasking,
Task analysis, Algorithms, Video recognition and understanding,
Image recognition and understanding
BibRef
Li, K.C.[Kun-Chang],
Wang, Y.[Yali],
Li, Y.Z.[Yi-Zhuo],
Wang, Y.[Yi],
He, Y.[Yinan],
Wang, L.M.[Li-Min],
Qiao, Y.[Yu],
Unmasked Teacher: Towards Training-Efficient Video Foundation Models,
ICCV23(19891-19903)
IEEE DOI
2401
BibRef
Ko, D.[Dohwan],
Choi, J.[Joonmyung],
Choi, H.K.[Hyeong Kyu],
On, K.W.[Kyoung-Woon],
Roh, B.[Byungseok],
Kim, H.W.J.[Hyun-Woo J.],
MELTR: Meta Loss Transformer for Learning to Fine-tune Video
Foundation Models,
CVPR23(20105-20115)
IEEE DOI
2309
BibRef
Piergiovanni, A.J.,
Kuo, W.C.[Wei-Cheng],
Angelova, A.[Anelia],
Rethinking Video ViTs: Sparse Video Tubes for Joint Image and Video
Learning,
CVPR23(2214-2224)
IEEE DOI
2309
BibRef
Park, J.[Jungin],
Lee, J.Y.[Ji-Young],
Sohn, K.H.[Kwang-Hoon],
Dual-Path Adaptation from Image to Video Transformers,
CVPR23(2203-2213)
IEEE DOI
2309
BibRef
Karim, R.[Rezaul],
Zhao, H.[He],
Wildes, R.P.[Richard P.],
Siam, M.[Mennatullah],
MED-VT: Multiscale Encoder-Decoder Video Transformer with Application
to Object Segmentation,
CVPR23(6323-6333)
IEEE DOI
2309
BibRef
Yu, L.J.[Li-Jun],
Cheng, Y.[Yong],
Sohn, K.[Kihyuk],
Lezama, J.[José],
Zhang, H.[Han],
Chang, H.[Huiwen],
Hauptmann, A.G.[Alexander G.],
Yang, M.H.[Ming-Hsuan],
Hao, Y.[Yuan],
Essa, I.[Irfan],
Jiang, L.[Lu],
MAGVIT: Masked Generative Video Transformer,
CVPR23(10459-10469)
IEEE DOI
2309
BibRef
Xing, Z.[Zhen],
Dai, Q.[Qi],
Hu, H.[Han],
Chen, J.J.[Jing-Jing],
Wu, Z.X.[Zu-Xuan],
Jiang, Y.G.[Yu-Gang],
SVFormer: Semi-supervised Video Transformer for Action Recognition,
CVPR23(18816-18826)
IEEE DOI
2309
BibRef
Xie, F.[Fei],
Chu, L.[Lei],
Li, J.H.[Jia-Hao],
Lu, Y.[Yan],
Ma, C.[Chao],
VideoTrack: Learning to Track Objects via Video Transformer,
CVPR23(22826-22835)
IEEE DOI
2309
BibRef
Qiu, Z.W.[Zhong-Wei],
Yang, Q.S.[Qian-Sheng],
Wang, J.[Jian],
Feng, H.C.[Hao-Cheng],
Han, J.Y.[Jun-Yu],
Ding, E.[Errui],
Xu, C.[Chang],
Fu, D.M.[Dong-Mei],
Wang, J.D.[Jing-Dong],
PSVT: End-to-End Multi-Person 3D Pose and Shape Estimation with
Progressive Video Transformers,
CVPR23(21254-21263)
IEEE DOI
2309
BibRef
Yang, J.[Jing],
Chen, J.W.[Jun-Wen],
Yanai, K.[Keiji],
Transformer-based Cross-modal Recipe Embeddings with Large Batch
Training,
MMMod23(II: 471-482).
Springer DOI
2304
BibRef
Huang, K.W.[Kuan-Wei],
Chen, G.C.F.[Geoff Chih-Fan],
Chang, P.W.[Po-Wen],
Lin, S.C.[Sheng-Chieh],
Hsu, C.[ChiaJung],
Thengane, V.[Vishal],
Lin, J.Y.Y.[Joshua Yao-Yu],
Strong Gravitational Lensing Parameter Estimation with Vision
Transformer,
AI4Space22(143-153).
Springer DOI
2304
BibRef
Zheng, M.[Minyan],
Luo, J.P.[Jian-Ping],
Space-time Video Super-resolution 3D Transformer,
MMMod23(II: 374-385).
Springer DOI
2304
BibRef
Ye, X.[Xi],
Bilodeau, G.A.[Guillaume-Alexandre],
VPTR: Efficient Transformers for Video Prediction,
ICPR22(3492-3499)
IEEE DOI
2212
Representation learning, Source coding, Predictive models,
Transformers, Spatiotemporal phenomena, Task analysis
BibRef
Liang, Y.X.[Yu-Xuan],
Zhou, P.[Pan],
Zimmermann, R.[Roger],
Yan, S.C.[Shui-Cheng],
DualFormer: Local-Global Stratified Transformer for Efficient Video
Recognition,
ECCV22(XXXIV:577-595).
Springer DOI
2211
WWW Link. Transformer architecture to address computational costs.
BibRef
Wang, J.[Junke],
Yang, X.T.[Xi-Tong],
Li, H.D.[Heng-Duo],
Liu, L.[Li],
Wu, Z.X.[Zu-Xuan],
Jiang, Y.G.[Yu-Gang],
Efficient Video Transformers with Spatial-Temporal Token Selection,
ECCV22(XXXV:69-86).
Springer DOI
2211
BibRef
Yuan, J.[Jing],
Barmpoutis, P.[Panagiotis],
Stathaki, T.[Tania],
Multi-Scale Deformable Transformer Encoder Based Single-Stage
Pedestrian Detection,
ICIP22(2906-2910)
IEEE DOI
2211
Head, Detectors, Prediction methods, Feature extraction,
Transformers, Video surveillance, Robustness, Pedestrian detection,
vision transformer
BibRef
Yun, H.[Heeseung],
Lee, S.[Sehun],
Kim, G.[Gunhee],
Panoramic Vision Transformer for Saliency Detection in 360° Videos,
ECCV22(XXXV:422-439).
Springer DOI
2211
BibRef
Sun, G.X.[Guan-Xiong],
Hua, Y.[Yang],
Hu, G.S.[Guo-Sheng],
Robertson, N.[Neil],
TDViT: Temporal Dilated Video Transformer for Dense Video Tasks,
ECCV22(XXXV:285-301).
Springer DOI
2211
BibRef
Wang, Y.H.[Yong-Hua],
Zhang, J.C.[Jing-Chi],
Li, Z.G.[Zhen-Gang],
Zeng, X.[Xing],
Zhang, Z.[Zhen],
Zhang, D.[Diankai],
Long, Y.[Yunlin],
Wang, N.[Ning],
Neural Network-based In-Loop Filter for CLIC 2022,
CLIC22(1773-1776)
IEEE DOI
2210
Deep learning, Video coding, Image coding, Convolution,
Rate-distortion, Network architecture, Video compression
BibRef
Chang, H.W.[Hui-Wen],
Zhang, H.[Han],
Jiang, L.[Lu],
Liu, C.[Ce],
Freeman, W.T.[William T.],
MaskGIT: Masked Generative Image Transformer,
CVPR22(11305-11315)
IEEE DOI
2210
Training, Visualization, Image synthesis, Computational modeling,
Transformers, Decoding, Image and video synthesis and generation,
Self- semi- meta- unsupervised learning
BibRef
Herzig, R.[Roei],
Ben-Avraham, E.[Elad],
Mangalam, K.[Karttikeya],
Bar, A.[Amir],
Chechik, G.[Gal],
Rohrbach, A.[Anna],
Darrell, T.J.[Trevor J.],
Globerson, A.[Amir],
Object-Region Video Transformers,
CVPR22(3138-3149)
IEEE DOI
2210
Visualization, Fuses, Computational modeling,
Transformers, Trajectory,
Action and event recognition
BibRef
Wang, R.[Rui],
Chen, D.D.[Dong-Dong],
Wu, Z.X.[Zu-Xuan],
Chen, Y.P.[Yin-Peng],
Dai, X.[Xiyang],
Liu, M.C.[Meng-Chen],
Jiang, Y.G.[Yu-Gang],
Zhou, L.[Luowei],
Yuan, L.[Lu],
BEVT: BERT Pretraining of Video Transformers,
CVPR22(14713-14723)
IEEE DOI
2210
Representation learning, Codes, Computational modeling,
Bit error rate, Transformers, Data models, Video analysis and understanding
BibRef
Wu, C.Y.[Chao-Yuan],
Li, Y.[Yanghao],
Mangalam, K.[Karttikeya],
Fan, H.Q.[Hao-Qi],
Xiong, B.[Bo],
Malik, J.[Jitendra],
Feichtenhofer, C.[Christoph],
MeMViT: Memory-Augmented Multiscale Vision Transformer for Efficient
Long-Term Video Recognition,
CVPR22(13577-13587)
IEEE DOI
2210
Visualization, Costs, Codes, Computational modeling, Cache memory,
Action and event recognition, Video analysis and understanding
BibRef
Mangalam, K.[Karttikeya],
Fan, H.Q.[Hao-Qi],
Li, Y.[Yanghao],
Wu, C.Y.[Chao-Yuan],
Xiong, B.[Bo],
Feichtenhofer, C.[Christoph],
Malik, J.[Jitendra],
Reversible Vision Transformers,
CVPR22(10820-10830)
IEEE DOI
2210
Training, Adaptation models, Visualization, Computational modeling,
Memory management, Object detection, Transformers,
Video analysis and understanding
BibRef
Li, Y.[Yanghao],
Wu, C.Y.[Chao-Yuan],
Fan, H.Q.[Hao-Qi],
Mangalam, K.[Karttikeya],
Xiong, B.[Bo],
Malik, J.[Jitendra],
Feichtenhofer, C.[Christoph],
MViTv2: Improved Multiscale Vision Transformers for Classification
and Detection,
CVPR22(4794-4804)
IEEE DOI
2210
Representation learning, Visualization, Image segmentation,
Image recognition, Object detection,
Video analysis and understanding
BibRef
Ranasinghe, K.[Kanchana],
Naseer, M.[Muzammal],
Khan, S.[Salman],
Khan, F.S.[Fahad Shahbaz],
Ryoo, M.S.[Michael S.],
Self-supervised Video Transformer,
CVPR22(2864-2874)
IEEE DOI
2210
Training, Video sequences, Benchmark testing, Transformers, Encoding,
Spatiotemporal phenomena, Video analysis and understanding,
Self- semi- meta- unsupervised learning
BibRef
Yang, S.[Shusheng],
Wang, X.G.[Xing-Gang],
Li, Y.[Yu],
Fang, Y.X.[Yu-Xin],
Fang, J.[Jiemin],
Liu, W.Y.[Wen-Yu],
Zhao, X.[Xun],
Shan, Y.[Ying],
Temporally Efficient Vision Transformer for Video Instance
Segmentation,
CVPR22(2875-2885)
IEEE DOI
2210
Image segmentation, Visualization, Head, Shape, Transformers,
Spatiotemporal phenomena, Computational efficiency,
grouping and shape analysis
BibRef
Liu, Z.[Ze],
Ning, J.[Jia],
Cao, Y.[Yue],
Wei, Y.X.[Yi-Xuan],
Zhang, Z.[Zheng],
Lin, S.[Stephen],
Hu, H.[Han],
Video Swin Transformer,
CVPR22(3192-3201)
IEEE DOI
2210
Adaptation models, Image recognition, Computational modeling,
Benchmark testing, Transformers, Solids, Recognition: detection
BibRef
Yan, S.[Shen],
Xiong, X.[Xuehan],
Arnab, A.[Anurag],
Lu, Z.C.[Zhi-Chao],
Zhang, M.[Mi],
Sun, C.[Chen],
Schmid, C.[Cordelia],
Multiview Transformers for Video Recognition,
CVPR22(3323-3333)
IEEE DOI
2210
Fuses, Computational modeling, Transformers,
Cognition, Spatiotemporal phenomena,
Action and event recognition
BibRef
Shao, R.Z.[Rui-Zhi],
Wu, G.[Gaochang],
Zhou, Y.M.[Yue-Mei],
Fu, Y.[Ying],
Fang, L.[Lu],
Liu, Y.B.[Ye-Bin],
LocalTrans: A Multiscale Local Transformer Network for
Cross-Resolution Homography Estimation,
ICCV21(14870-14879)
IEEE DOI
2203
Photography, Superresolution, Estimation, Transformers, Light fields,
Kernel, Image and video synthesis, Computational photography
BibRef
Rombach, R.[Robin],
Esser, P.[Patrick],
Ommer, B.[Björn],
Geometry-Free View Synthesis: Transformers and no 3D Priors,
ICCV21(14336-14346)
IEEE DOI
2203
Solid modeling, Visualization, Geometric modeling,
Transformers, Image and video synthesis,
Neural generative models
BibRef
Tan, J.[Jing],
Tang, J.Q.[Jia-Qi],
Wang, L.M.[Li-Min],
Wu, G.S.[Gang-Shan],
Relaxed Transformer Decoders for Direct Action Proposal Generation,
ICCV21(13506-13515)
IEEE DOI
2203
Visualization, Head, Pipelines, Estimation, Transformers, Decoding,
Action and behavior recognition,
Video analysis and understanding
BibRef
Liu, S.[Song],
Fan, H.Q.[Hao-Qi],
Qian, S.S.[Sheng-Sheng],
Chen, Y.[Yiru],
Ding, W.[Wenkui],
Wang, Z.Y.[Zhong-Yuan],
HiT: Hierarchical Transformer with Momentum Contrast for Video-Text
Retrieval,
ICCV21(11895-11905)
IEEE DOI
2203
Training, Memory management, Streaming media, Performance gain,
Benchmark testing, Transformers, Image and video retrieval,
Vision + other modalities
BibRef
Truong, T.D.[Thanh-Dat],
Duong, C.N.[Chi Nhan],
Vu, T.D.[The De],
Pham, H.A.[Hoang Anh],
Raj, B.[Bhiksha],
Le, N.[Ngan],
Luu, K.[Khoa],
The Right to Talk: An Audio-Visual Transformer Approach,
ICCV21(1085-1094)
IEEE DOI
2203
Location awareness, Visualization, Correlation, Interrupters,
Transformers, Feature extraction, Regulation,
Video analysis and understanding
BibRef
Weng, W.M.[Wen-Ming],
Zhang, Y.[Yueyi],
Xiong, Z.W.[Zhi-Wei],
Event-based Video Reconstruction Using Transformer,
ICCV21(2543-2552)
IEEE DOI
2203
Visualization, Computational modeling, Semantics,
Memory management, Transformers, Feature extraction,
Image and video synthesis
BibRef
Arnab, A.[Anurag],
Dehghani, M.[Mostafa],
Heigold, G.[Georg],
Sun, C.[Chen],
Lucic, M.[Mario],
Schmid, C.[Cordelia],
ViViT: A Video Vision Transformer,
ICCV21(6816-6826)
IEEE DOI
2203
Training, Benchmark testing, Transformers,
Spatiotemporal phenomena, Kinetic theory,
Action and behavior recognition
BibRef
Girdhar, R.[Rohit],
Grauman, K.[Kristen],
Anticipative Video Transformer,
ICCV21(13485-13495)
IEEE DOI
2203
Video sequences, Predictive models,
Benchmark testing, Transformers, Task analysis,
Video analysis and understanding
BibRef
Zhang, Y.[Yanyi],
Li, X.Y.[Xin-Yu],
Liu, C.H.[Chun-Hui],
Shuai, B.[Bing],
Zhu, Y.[Yi],
Brattoli, B.[Biagio],
Chen, H.[Hao],
Marsic, I.[Ivan],
Tighe, J.[Joseph],
VidTr: Video Transformer Without Convolutions,
ICCV21(13557-13567)
IEEE DOI
2203
Training, Costs, Error analysis, Computational modeling,
Transformers, Cognition, Action and behavior recognition,
Video analysis and understanding
BibRef
Chen, J.W.[Jia-Wei],
Ho, C.M.[Chiu Man],
MM-ViT: Multi-Modal Video Transformer for Compressed Video Action
Recognition,
WACV22(786-797)
IEEE DOI
2202
Computational modeling, Benchmark testing,
Transformers, Computational efficiency, Spatiotemporal phenomena,
Analysis and Understanding
BibRef
Li, S.Y.[Shu-Yan],
Li, X.[Xiu],
Lu, J.W.[Ji-Wen],
Zhou, J.[Jie],
Self-supervised Video Hashing via Bidirectional Transformers,
CVPR21(13544-13553)
IEEE DOI
2111
Training, Hash functions, Visualization,
Correlation, Benchmark testing, Transformers
BibRef