Deep Video Understanding Dataset,
2020, used for workshops and challenges.
WWW Link.
Dataset, Video Understanding.
Brostow, G.J.[Gabriel J.],
Fauqueur, J.[Julien],
Cipolla, R.[Roberto],
Semantic object classes in video:
A high-definition ground truth database,
PRL(30), No. 2, 15 January 2009, pp. 88-97.
Elsevier DOI
0804
Object recognition; Video database; Video understanding; Semantic
segmentation; Label propagation
BibRef
Mac Aodha, O.[Oisin],
Brostow, G.J.[Gabriel J.],
Pollefeys, M.[Marc],
Segmenting video into classes of algorithm-suitability,
CVPR10(1054-1061).
IEEE DOI
1006
BibRef
Suresha, M.,
Kuppa, S.,
Raghukumar, D.S.,
A study on deep learning spatiotemporal models and feature extraction
techniques for video understanding,
MultInfoRetr(9), No. 2, June 2020, pp. 81-101.
Springer DOI
2005
BibRef
Kavoosifar, M.R.[Mohammad Reza],
Apiletti, D.[Daniele],
Baralis, E.[Elena],
Garza, P.[Paolo],
Huet, B.[Benoit],
Effective video hyperlinking by means of enriched feature sets and
monomodal query combinations,
MultInfoRetr(9), No. 3, September 2020, pp. 215-227.
Springer DOI
2008
BibRef
Tang, P.J.[Peng-Jie],
Tan, Y.L.[Yun-Lan],
Li, J.Z.[Jin-Zhong],
Tan, B.[Bin],
Translating video into language by enhancing visual and language
representations,
JVCIR(72), 2020, pp. 102875.
Elsevier DOI
2010
Video description, Feature enhancing, CNN, LSTM, Semantic
BibRef
Yu, J.,
Jiang, X.,
Qin, Z.,
Zhang, W.,
Hu, Y.,
Wu, Q.,
Learning Dual Encoding Model for Adaptive Visual Understanding in
Visual Dialogue,
IP(30), 2021, pp. 220-233.
IEEE DOI
2011
Visualization, Semantics, History, Task analysis, Cognition,
Feature extraction, Adaptation models, Dual encoding,
visual dialogue
BibRef
Duan, J.H.[Jin-Hao],
Xu, H.[Hua],
Lin, X.Z.[Xiao-Zhu],
Zhu, S.C.[Shang-Chao],
Du, Y.Z.[Yuan-Ze],
Multi-semantic long-range dependencies capturing for efficient video
representation learning,
IVC(104), 2020, pp. 103988.
Elsevier DOI
2012
Video representation learning,
Long-range dependencies capturing, Video classification
BibRef
Tan, H.L.[Hui Li],
Zhu, H.Y.[Hong-Yuan],
Lim, J.H.[Joo-Hwee],
Tan, C.[Cheston],
A comprehensive survey of procedural video datasets,
CVIU(202), 2021, pp. 103107.
Elsevier DOI
2012
Video datasets, depicting series of actions performed in some
constrained but non-unique order to achieve some intended high-level
goal.
BibRef
Lin, J.[Ji],
Gan, C.[Chuang],
Wang, K.[Kuan],
Han, S.[Song],
TSM: Temporal Shift Module for Efficient and Scalable Video
Understanding on Edge Devices,
PAMI(44), No. 5, May 2022, pp. 2760-2774.
IEEE DOI
2204
BibRef
Earlier: A1, A2, A4, Only:
TSM: Temporal Shift Module for Efficient Video Understanding,
ICCV19(7082-7092)
IEEE DOI
2004
Code, Video Understanding.
WWW Link. Computational modeling, Convolution, Streaming media, Training,
Solid modeling, Temporal shift module, video recognition,
network dissection.
convolutional neural nets, object detection,
video signal processing, video streaming, Real-time systems
BibRef
Zhou, W.[Wei],
Hou, Y.[Yi],
Ouyang, K.W.[Ke-Wei],
Zhou, S.L.[Shi-Lin],
Exploring complementary information of self-supervised pretext tasks
for unsupervised video pre-training,
IET-CV(16), No. 3, 2022, pp. 255-265.
DOI Link
2204
Both knowledge distillation and self-supervised learning.
convolutional neural nets, feature extraction,
unsupervised learning, video signal processing, image sequences
BibRef
Li, Z.Q.[Zhen-Qiang],
Wang, W.M.[Wei-Min],
Li, Z.Y.[Zuo-Yue],
Huang, Y.F.[Yi-Fei],
Sato, Y.[Yoichi],
Spatio-Temporal Perturbations for Video Attribution,
CirSysVideo(32), No. 4, April 2022, pp. 2043-2056.
IEEE DOI
2204
Measurement, Reliability, Task analysis, Spatiotemporal phenomena,
Visualization, Heating systems, Perturbation methods, video understanding
BibRef
Tao, L.[Li],
Wang, X.T.[Xue-Ting],
Yamasaki, T.[Toshihiko],
An Improved Inter-Intra Contrastive Learning Framework on
Self-Supervised Video Representation,
CirSysVideo(32), No. 8, August 2022, pp. 5266-5280.
IEEE DOI
2208
Task analysis, Learning systems, Data models, Optical imaging,
Feature extraction, Representation learning, Optical sensors,
spatio-temporal convolution
BibRef
Huang, L.[Lang],
Zhang, C.[Chao],
Zhang, H.Y.[Hong-Yang],
Self-Adaptive Training: Bridging Supervised and Self-Supervised
Learning,
PAMI(46), No. 3, March 2024, pp. 1362-1377.
IEEE DOI Code:
WWW Link.
2402
Training, Data models, Noise measurement, Deep learning,
Predictive models, Neural networks, Self-supervised learning,
robust learning under noise
BibRef
Huang, L.[Lang],
You, S.[Shan],
Zheng, M.K.[Ming-Kai],
Wang, F.[Fei],
Qian, C.[Chen],
Yamasaki, T.[Toshihiko],
Learning Where to Learn in Cross-View Self-Supervised Learning,
CVPR22(14431-14440)
IEEE DOI
2210
Representation learning, Image segmentation, Head, Aggregates,
Semantics, Self-supervised learning, Object detection,
Self-, semi-, meta-, and unsupervised learning
BibRef
Hu, Y.[Yaosi],
Yin, D.C.[Da-Cheng],
Wang, Y.W.[Yu-Wang],
Chen, Z.Z.[Zhen-Zhong],
Luo, C.[Chong],
Decomposing style, content, and motion for videos,
JVCIR(89), 2022, pp. 103686.
Elsevier DOI
2212
Video decomposition, Video synthesis, Self-supervised learning
BibRef
Hong, M.Y.[Ming-Yao],
Zhang, X.F.[Xin-Feng],
Li, G.R.[Guo-Rong],
Huang, Q.M.[Qing-Ming],
Fine-Grained Feature Generation for Generalized Zero-Shot Video
Classification,
IP(32), 2023, pp. 1599-1612.
IEEE DOI
2303
Visualization, Semantics, Task analysis, Training,
Generative adversarial networks, Feature extraction, Data models,
video classification
BibRef
Jin, X.[Xin],
Feng, R.[Ruoyu],
Sun, S.[Simeng],
Feng, R.[Runsen],
He, T.Y.[Tian-Yu],
Chen, Z.B.[Zhi-Bo],
Semantical video coding: Instill static-dynamic clues into structured
bitstream for AI tasks,
JVCIR(93), 2023, pp. 103816.
Elsevier DOI
2305
Video coding, Semantically structured bitstream, Intelligent analytics
BibRef
Schiappa, M.C.[Madeline C.],
Rawat, Y.S.[Yogesh S.],
Shah, M.[Mubarak],
Self-Supervised Learning for Videos: A Survey,
Surveys(55), No. 13s, July 2023, pp. xx-yy.
DOI Link
2309
Survey, Video Understanding.
Survey, Self-Supervised Learning. video understanding, zero-shot learning,
visual-language models, deep learning, multimodal learning
BibRef
Yang, X.M.[Xing-Ming],
Xiong, S.[Sixuan],
Wu, K.W.[Ke-Wei],
Shan, D.F.[Dong-Feng],
Xie, Z.[Zhao],
Attentive spatial-temporal contrastive learning for self-supervised
video representation,
IVC(137), 2023, pp. 104765.
Elsevier DOI
2309
Self-supervised learning, Spatial-temporal feature,
Contrastive learning, Spatial-temporal self-attention
BibRef
Miao, J.X.[Jia-Xu],
Wei, Y.C.[Yun-Chao],
Wang, X.H.[Xiao-Han],
Yang, Y.[Yi],
Temporal Pixel-Level Semantic Understanding Through the VSPW Dataset,
PAMI(45), No. 9, September 2023, pp. 11297-11308.
IEEE DOI
2309
WWW Link.
BibRef
Hu, D.[Di],
Wang, Z.[Zheng],
Nie, F.P.[Fei-Ping],
Wang, R.[Rong],
Li, X.L.[Xue-Long],
Self-Supervised Learning for Heterogeneous Audiovisual Scene Analysis,
MultMed(25), 2023, pp. 3534-3545.
IEEE DOI
2310
BibRef
Namitha, K.[Kalakunnath],
Geetha, M.[Madathilkulangara],
Athi, N.[Narayanan],
An Improved Interaction Estimation and Optimization Method for
Surveillance Video Synopsis,
MultMedMag(30), No. 3, July 2023, pp. 25-36.
IEEE DOI
2310
BibRef
Assefa, M.[Maregu],
Jiang, W.[Wei],
Alemu, K.G.[Kumie Gedamu],
Yilma, G.[Getinet],
Adhikari, D.[Deepak],
Ayalew, M.[Melese],
Seid, A.M.[Abegaz Mohammed],
Erbad, A.[Aiman],
Actor-Aware Self-Supervised Learning for Semi-Supervised Video
Representation Learning,
CirSysVideo(33), No. 11, November 2023, pp. 6679-6692.
IEEE DOI Code:
WWW Link.
2311
BibRef
Hu, Y.F.[Yu-Fan],
Gao, J.Y.[Jun-Yu],
Xu, C.S.[Chang-Sheng],
Learning Multi-Expert Distribution Calibration for Long-Tailed Video
Classification,
MultMed(26), 2024, pp. 555-567.
IEEE DOI
2402
Tail, Head, Calibration, Training, Data models, Task analysis,
Visualization, Long-tailed distribution, video classification,
multi-expert calibration
BibRef
Chen, Z.Y.[Zi-Yu],
Wang, H.L.[Han-Li],
Chen, C.W.[Chang Wen],
Self-Supervised Video Representation Learning by Serial Restoration
With Elastic Complexity,
MultMed(26), 2024, pp. 2235-2248.
IEEE DOI
2402
Task analysis, Feature extraction, Representation learning,
Manuals, Spatiotemporal phenomena, Image restoration,
nearest neighbor retrieval
BibRef
Chen, Z.L.[Zai-Long],
Wang, L.[Lei],
Wang, P.[Peng],
Gao, P.[Peng],
Question-Aware Global-Local Video Understanding Network for
Audio-Visual Question Answering,
CirSysVideo(34), No. 5, May 2024, pp. 4109-4119.
IEEE DOI
2405
Feature extraction, Visualization, Task analysis,
Question answering (information retrieval), Data mining, Fuses,
deep learning
BibRef
Cao, H.Z.[Hao-Zhi],
Xu, Y.C.[Yue-Cong],
Mao, K.Z.[Ke-Zhi],
Xie, L.H.[Li-Hua],
Yin, J.X.[Jian-Xiong],
See, S.[Simon],
Xu, Q.W.[Qian-Wen],
Yang, J.F.[Jian-Fei],
Self-Supervised Video Representation Learning by Video Incoherence
Detection,
Cyber(54), No. 6, June 2024, pp. 3810-3822.
IEEE DOI
2406
Spatiotemporal phenomena, Task analysis, Representation learning,
Cognition, Training, Self-supervised learning, Supervised learning,
video representation learning
BibRef
Zhang, Z.Q.[Zi-Qi],
Ma, Z.[Zongyang],
Yuan, C.F.[Chun-Feng],
Chen, Y.X.[Yu-Xin],
Wang, P.[Peijin],
Qi, Z.A.[Zhong-Ang],
Hao, C.L.[Cheng-Lei],
Li, B.[Bing],
Shan, Y.[Ying],
Hu, W.M.[Wei-Ming],
Maybank, S.[Stephen],
Chinese Title Generation for Short Videos:
Dataset, Metric and Algorithm,
PAMI(46), No. 7, July 2024, pp. 5192-5208.
IEEE DOI
2406
Videos, Task analysis, Measurement, Semantics, Benchmark testing,
Electronic commerce, Annotations, Video and language,
text-video retrieval
BibRef
Bi, S.[Shuai],
Hu, Z.P.[Zheng-Ping],
Zhang, H.[Hehao],
Di, J.[Jirui],
Sun, Z.[Zhe],
Motion-guided spatiotemporal multitask feature discrimination for
self-supervised video representation learning,
PR(155), 2024, pp. 110713.
Elsevier DOI
2408
Unsupervised learning, Self-supervised learning,
Cross-view learning, Multitask discrimination, Video action understanding
BibRef
Pang, B.[Bo],
Peng, G.[Gao],
Li, Y.Z.[Yi-Zhuo],
Lu, C.[Cewu],
Markov Progressive Framework, a Universal Paradigm for Modeling Long
Videos,
PAMI(46), No. 12, December 2024, pp. 9749-9765.
IEEE DOI
2411
Videos, Computational modeling, Semantics, Training, Transformers,
Task analysis, Solid modeling, Video understanding, progressive modeling
BibRef
Li, D.[Dong],
Jin, J.D.[Jian-Dong],
Zhang, Y.H.[Yu-Hao],
Zhong, Y.L.[Yan-Lin],
Wu, Y.Y.[Yao-Yang],
Chen, L.[Lan],
Wang, X.[Xiao],
Luo, B.[Bin],
Semantic-aware frame-event fusion based pattern recognition via large
vision-language models,
PR(158), 2025, pp. 111080.
Elsevier DOI Code:
WWW Link.
2411
RGB-event fusion, Large vision-language models,
Semantic information, Pattern recognition
BibRef
Wu, J.T.[Jian-Tao],
Mo, S.T.[Shen-Tong],
Atito, S.[Sara],
Feng, Z.H.[Zhen-Hua],
Kittler, J.V.[Josef V.],
Husain, S.S.[Syed Sameed],
Awais, M.[Muhammad],
Masked Momentum Contrastive Learning for Semantic Understanding by
Observation,
ICIP24(263-269)
IEEE DOI
2411
Visualization, Image segmentation, Thresholding (Imaging),
Protocols, Zero-shot learning, Large language models, Semantics,
zero-shot segmentation
BibRef
Yun, H.[Hoyeoung],
Ahn, J.[Jinwoo],
Kim, M.[Minseo],
Kim, E.S.[Eun-Sol],
Compositional Video Understanding with Spatiotemporal Structure-based
Transformers,
CVPR24(18751-18760)
IEEE DOI
2410
Learning systems, Visualization, Semantics, Computer architecture,
Transformers, Distance measurement
BibRef
Papalampidi, P.[Pinelopi],
Koppula, S.[Skanda],
Pathak, S.[Shreya],
Chiu, J.[Justin],
Heyward, J.[Joe],
Patraucean, V.[Viorica],
Shen, J.J.[Jia-Jun],
Miech, A.[Antoine],
Zisserman, A.[Andrew],
Nematzadeh, A.[Aida],
A Simple Recipe for Contrastively Pre-Training Video-First Encoders
Beyond 16 Frames,
CVPR24(14386-14397)
IEEE DOI
2410
Training, Visualization, Memory management, Benchmark testing, Encoding
BibRef
Wang, A.D.[An-Dong],
Wu, B.[Bo],
Chen, S.[Sunli],
Chen, Z.F.[Zhen-Fang],
Guan, H.T.[Hao-Tian],
Lee, W.N.[Wei-Ning],
Li, L.E.[Li Erran],
Gan, C.[Chuang],
SOK-Bench: A Situated Video Reasoning Benchmark with Aligned
Open-World Knowledge,
CVPR24(13384-13394)
IEEE DOI
2410
Visualization, Quality assurance, Reviews, Scalability, Manuals,
Benchmark testing, commonsense reasoning
BibRef
Zhong, Y.[Yang],
Baghel, B.K.[Bhiman Kumar],
Multimodal Understanding of Memes with Fair Explanations,
MULA24(2007-2017)
IEEE DOI
2410
Social networking (online), Computational modeling,
Benchmark testing, Cultural differences, Fairness
BibRef
Sheng, D.[Dianmo],
Chen, D.D.[Dong-Dong],
Tan, Z.T.[Zhen-Tao],
Liu, Q.[Qiankun],
Chu, Q.[Qi],
Bao, J.M.[Jian-Min],
Gong, T.[Tao],
Liu, B.[Bin],
Xu, S.W.[Sheng-Wei],
Yu, N.H.[Neng-Hai],
Towards More Unified In-Context Visual Understanding,
CVPR24(13362-13372)
IEEE DOI
2410
Visualization, Quantization (signal), Semantic segmentation,
Large language models, Pipelines, Transformers, Multitasking
BibRef
He, B.[Bo],
Li, H.[Hengduo],
Jang, Y.K.[Young Kyun],
Jia, M.L.[Meng-Lin],
Cao, X.F.[Xue-Fei],
Shah, A.[Ashish],
Shrivastava, A.[Abhinav],
Lim, S.N.[Ser-Nam],
MA-LMM: Memory-Augmented Large Multimodal Model for Long-Term Video
Understanding,
CVPR24(13504-13514)
IEEE DOI
2410
Analytical models, Large language models, Memory management,
Video sequences, Graphics processing units,
Long-Term Video Understanding
BibRef
Ma, F.[Fan],
Jin, X.J.[Xiao-Jie],
Wang, H.[Heng],
Xian, Y.C.[Yu-Chen],
Feng, J.S.[Jia-Shi],
Yang, Y.[Yi],
Vista-llama: Reducing Hallucination in Video Language Models via
Equal Distance to Visual Tokens,
CVPR24(13151-13160)
IEEE DOI Code:
WWW Link.
2410
Visualization, Attention mechanisms, Accuracy,
Large language models, Computational modeling, Benchmark testing,
Video Understanding
BibRef
Tan, C.[Chaolei],
Lai, J.H.[Jian-Huang],
Zheng, W.S.[Wei-Shi],
Hu, J.F.[Jian-Fang],
Siamese Learning with Joint Alignment and Regression for
Weakly-Supervised Video Paragraph Grounding,
CVPR24(13569-13580)
IEEE DOI
2410
Location awareness, Grounding, Annotations, Semantics,
Semisupervised learning, Transformers, siamese learning,
video understanding
BibRef
Zhang, C.Y.[Chao-Yi],
Lin, K.[Kevin],
Yang, Z.Y.[Zheng-Yuan],
Wang, J.F.[Jian-Feng],
Li, L.J.[Lin-Jie],
Lin, C.C.[Chung-Ching],
Liu, Z.C.[Zi-Cheng],
Wang, L.J.[Li-Juan],
MM-Narrator: Narrating Long-form Videos with Multimodal In-Context
Learning,
CVPR24(13647-13657)
IEEE DOI
2410
Measurement, Visualization, Accuracy, Annotations, Memory management,
Cognition, video understanding, LLM, in-context learning, multimodal,
vision-and-language
BibRef
Jin, P.[Peng],
Takanobu, R.[Ryuichi],
Zhang, W.[Wancai],
Cao, X.C.[Xiao-Chun],
Yuan, L.[Li],
Chat-UniVi: Unified Visual Representation Empowers Large Language
Models with Image and Video Understanding,
CVPR24(13700-13710)
IEEE DOI Code:
WWW Link.
2410
Bridges, Visualization, Codes, Large language models,
Computational modeling, Oral communication, Video Understanding
BibRef
Ren, S.[Shuhuai],
Yao, L.[Linli],
Li, S.C.[Shi-Cheng],
Sun, X.[Xu],
Hou, L.[Lu],
TimeChat: A Time-sensitive Multimodal Large Language Model for Long
Video Understanding,
CVPR24(14313-14323)
IEEE DOI Code:
WWW Link.
2410
Location awareness, Visualization, Codes, Grounding,
Large language models, Cognition,
long video understanding
BibRef
Xu, M.[Ming],
Gould, S.[Stephen],
Temporally Consistent Unbalanced Optimal Transport for Unsupervised
Action Segmentation,
CVPR24(14618-14627)
IEEE DOI
2410
Video on demand, Costs, Pipelines, Encoding,
Web sites, Noise measurement, long-form video understanding, procedural videos
BibRef
Chalk, J.[Jacob],
Huh, J.[Jaesung],
Kazakos, E.[Evangelos],
Zisserman, A.[Andrew],
Damen, D.[Dima],
TIM: A Time Interval Machine for Audio-Visual Action Recognition,
CVPR24(18153-18163)
IEEE DOI Code:
WWW Link.
2410
Measurement, Visualization, Adaptation models, Codes, Accuracy,
Computational modeling, action recognition, action detection,
video understanding
BibRef
Wang, J.[Junke],
Chen, D.D.[Dong-Dong],
Luo, C.[Chong],
He, B.[Bo],
Yuan, L.[Lu],
Wu, Z.X.[Zu-Xuan],
Jiang, Y.G.[Yu-Gang],
OmniViD: A Generative Framework for Universal Video Understanding,
CVPR24(18209-18220)
IEEE DOI Code:
WWW Link.
2410
Training, Location awareness, Visualization, Vocabulary,
Computer architecture, Benchmark testing, Robustness
BibRef
Zeng, R.[Runhao],
Chen, X.Y.[Xiao-Yong],
Liang, J.[JiaMing],
Wu, H.[Huisi],
Cao, G.Z.[Guang-Zhong],
Guo, Y.[Yong],
Benchmarking the Robustness of Temporal Action Detection Models
Against Temporal Corruptions,
CVPR24(18263-18274)
IEEE DOI Code:
WWW Link.
2410
Training, Location awareness, Source coding, Benchmark testing,
Feature extraction, Transformers, Temporal Action Detection,
Video Understanding
BibRef
Peirone, S.A.[Simone Alberto],
Pistilli, F.[Francesca],
Alliegro, A.[Antonio],
Averta, G.[Giuseppe],
A Backpack Full of Skills: Egocentric Video Understanding with
Diverse Task Perspectives,
CVPR24(18275-18285)
IEEE DOI
2410
Streaming media, Benchmark testing,
Egocentric Vision, Video Understanding
BibRef
Nguyen, T.T.[Trong-Thuan],
Nguyen, P.[Pha],
Luu, K.[Khoa],
HIG: Hierarchical Interlacement Graph Approach to Scene Graph
Generation in Video Understanding,
CVPR24(18384-18394)
IEEE DOI
2410
Visualization, Scalability, Computational modeling,
Benchmark testing, Task analysis, ASPIRe
BibRef
Tores, J.[Julie],
Sassatelli, L.[Lucile],
Wu, H.Y.[Hui-Yin],
Bergman, C.[Clement],
Andolfi, L.[Léa],
Ecrement, V.[Victor],
Precioso, F.[Frédéric],
Devars, T.[Thierry],
Guaresi, M.[Magali],
Julliard, V.[Virginie],
Lecossais, S.[Sarah],
Visual Objectification in Films: Towards a New AI Task for Video
Interpretation,
CVPR24(10864-10874)
IEEE DOI
2410
Representation learning, Visualization, Codes,
Computational modeling, Psychology, Motion pictures, objectification
BibRef
Jamal, M.A.[Muhammad Abdullah],
Mohareri, O.[Omid],
M33D: Learning 3D priors using Multi-Modal Masked Autoencoders for 2D
image and video understanding,
WACV24(2532-2542)
IEEE DOI
2404
Representation learning, Solid modeling, Semantic segmentation,
Estimation, Focusing, Algorithms, Machine learning architectures,
Biomedical / healthcare / medicine
BibRef
Tian, Y.[Yuan],
Lu, G.[Guo],
Zhai, G.T.[Guang-Tao],
Gao, Z.Y.[Zhi-Yong],
Non-Semantics Suppressed Mask Learning for Unsupervised Video
Semantic Compression,
ICCV23(13564-13576)
IEEE DOI
2401
BibRef
Li, K.C.[Kun-Chang],
Wang, Y.L.[Ya-Li],
He, Y.[Yinan],
Li, Y.Z.[Yi-Zhuo],
Wang, Y.[Yi],
Wang, L.M.[Li-Min],
Qiao, Y.[Yu],
UniFormerV2: Unlocking the Potential of Image ViTs for Video
Understanding,
ICCV23(1632-1643)
IEEE DOI
2401
BibRef
Afham, M.[Mohamed],
Shukla, S.N.[Satya Narayan],
Poursaeed, O.[Omid],
Zhang, P.[Pengchuan],
Shah, A.[Ashish],
Lim, S.N.[Ser-Nam],
Revisiting Kernel Temporal Segmentation as an Adaptive Tokenizer for
Long-form Video Understanding,
REDLCV23(1181-1186)
IEEE DOI
2401
BibRef
Strafforello, O.[Ombretta],
Schutte, K.[Klamer],
van Gemert, J.C.[Jan C.],
Are current long-term video understanding datasets long-term?,
CVEU23(2959-2968)
IEEE DOI
2401
BibRef
Zhao, Y.C.[Yu-Cheng],
Luo, C.[Chong],
Tang, C.X.[Chuan-Xin],
Chen, D.D.[Dong-Dong],
Codella, N.[Noel],
Zha, Z.J.[Zheng-Jun],
Streaming Video Model,
CVPR23(14602-14612)
IEEE DOI
2309
WWW Link.
BibRef
Maiya, S.R.[Shishira R],
Girish, S.[Sharath],
Ehrlich, M.[Max],
Wang, H.Y.[Han-Yu],
Lee, K.S.[Kwot Sin],
Poirson, P.[Patrick],
Wu, P.X.[Peng-Xiang],
Wang, C.[Chen],
Shrivastava, A.[Abhinav],
NIRVANA: Neural Implicit Representations of Videos with Adaptive
Networks and Autoregressive Patch-Wise Modeling,
CVPR23(14378-14387)
IEEE DOI
2309
BibRef
Zhang, Y.T.[Yi-Tian],
Bai, Y.[Yue],
Liu, C.[Chang],
Wang, H.[Huan],
Li, S.[Sheng],
Fu, Y.[Yun],
Frame Flexible Network,
CVPR23(10504-10513)
IEEE DOI
2309
WWW Link.
BibRef
Dessalene, E.[Eadom],
Maynord, M.[Michael],
Fermüller, C.[Cornelia],
Aloimonos, Y.[Yiannis],
Therbligs in Action: Video Understanding through Motion Primitives,
CVPR23(10618-10626)
IEEE DOI
2309
BibRef
Zhao, Y.[Yue],
Misra, I.[Ishan],
Krähenbühl, P.[Philipp],
Girdhar, R.[Rohit],
Learning Video Representations from Large Language Models,
CVPR23(6586-6597)
IEEE DOI
2309
BibRef
Wang, R.[Rui],
Chen, D.D.[Dong-Dong],
Wu, Z.X.[Zu-Xuan],
Chen, Y.P.[Yin-Peng],
Dai, X.[Xiyang],
Liu, M.C.[Meng-Chen],
Yuan, L.[Lu],
Jiang, Y.G.[Yu-Gang],
Masked Video Distillation: Rethinking Masked Feature Modeling for
Self-supervised Video Representation Learning,
CVPR23(6312-6322)
IEEE DOI
2309
BibRef
Yang, X.T.[Xi-Tong],
Chu, F.J.[Fu-Jen],
Feiszli, M.[Matt],
Goyal, R.[Raghav],
Torresani, L.[Lorenzo],
Tran, D.[Du],
Relational Space-Time Query in Long-Form Videos,
CVPR23(6398-6408)
IEEE DOI
2309
BibRef
Foo, L.G.[Lin Geng],
Gong, J.[Jia],
Fan, Z.P.[Zhi-Peng],
Liu, J.[Jun],
System-Status-Aware Adaptive Network for Online Streaming Video
Understanding,
CVPR23(10514-10523)
IEEE DOI
2309
BibRef
Dong, S.[Sixun],
Hu, H.Z.[Hua-Zhang],
Lian, D.Z.[Dong-Ze],
Luo, W.X.[Wei-Xin],
Qian, Y.C.[Yi-Cheng],
Gao, S.H.[Sheng-Hua],
Weakly Supervised Video Representation Learning with Unaligned Text
for Sequential Videos,
CVPR23(2437-2447)
IEEE DOI
2309
BibRef
Wang, J.[Jue],
Zhu, W.T.[Wen-Tao],
Wang, P.[Pichao],
Yu, X.[Xiang],
Liu, L.[Linda],
Omar, M.[Mohamed],
Hamid, R.[Raffay],
Selective Structured State-Spaces for Long-Form Video Understanding,
CVPR23(6387-6397)
IEEE DOI
2309
BibRef
Zhang, H.[Heng],
Liu, D.[Daqing],
Zheng, Q.[Qi],
Su, B.[Bing],
Modeling Video as Stochastic Processes for Fine-Grained Video
Representation Learning,
CVPR23(2225-2234)
IEEE DOI
2309
WWW Link.
BibRef
Kumar, Y.[Yogesh],
Mishra, A.[Anand],
Few-Shot Referring Relationships in Videos,
CVPR23(2289-2298)
IEEE DOI
2309
BibRef
Harzig, P.[Philipp],
Einfalt, M.[Moritz],
Lienhart, R.[Rainer],
Synchronized Audio-Visual Frames with Fractional Positional Encoding
for Transformers in Video-to-Text Translation,
ICIP22(2041-2045)
IEEE DOI
2211
Image coding, Video on demand, Art, Transformers, Synchronization,
Machine translation, Task analysis, Video-to-text, Transformer, Audio-visual
BibRef
Wiles, O.[Olivia],
Carreira, J.[João],
Barr, I.[Iain],
Zisserman, A.[Andrew],
Malinowski, M.[Mateusz],
Compressed Vision for Efficient Video Understanding,
ACCV22(VII:679-695).
Springer DOI
2307
BibRef
Rho, D.[Daniel],
Cho, J.[Junwoo],
Ko, J.H.[Jong Hwan],
Park, E.[Eunbyung],
Neural Residual Flow Fields for Efficient Video Representations,
ACCV22(II:458-474).
Springer DOI
2307
BibRef
Tian, F.R.[Feng-Rui],
Fan, J.W.[Jia-Wei],
Yu, X.[Xie],
Du, S.Y.[Shao-Yi],
Song, M.[Meina],
Zhao, Y.[Yu],
TCVM: Temporal Contrasting Video Montage Framework for Self-Supervised
Video Representation Learning,
ACCV22(II:526-542).
Springer DOI
2307
BibRef
Huang, Z.M.[Zhi-Meng],
Jia, C.M.[Chuan-Min],
Wang, S.S.[Shan-She],
Ma, S.W.[Si-Wei],
A Compressive Prior Guided Mask Predictive Coding Approach for Video
Analysis,
ACCV22(IV:469-484).
Springer DOI
2307
BibRef
Li, L.[Li],
Zhuang, L.S.[Lian-Sheng],
Gao, S.H.[Sheng-Hua],
Wang, S.[Shafei],
Havit: Hybrid-attention Based Vision Transformer for Video
Classification,
ACCV22(IV:502-517).
Springer DOI
2307
BibRef
Zhang, H.L.[Huan-Le],
Pirsiavash, H.[Hamed],
Liu, X.[Xin],
MASTAF: A Model-Agnostic Spatio-Temporal Attention Fusion Network for
Few-shot Video Classification,
WACV23(2507-2516)
IEEE DOI
2302
Computational modeling, Benchmark testing, Transformers,
Algorithms: Machine learning architectures, formulations
BibRef
Senocak, A.[Arda],
Kim, J.[Junsik],
Oh, T.H.[Tae-Hyun],
Li, D.Z.[Ding-Zeyu],
Kweon, I.S.[In So],
Event-Specific Audio-Visual Fusion Layers:
A Simple and New Perspective on Video Understanding,
WACV23(2236-2246)
IEEE DOI
2302
Benchmark testing, Multisensory integration, Floods, Task analysis,
Algorithms: Vision + language and/or other modalities
BibRef
Xia, B.Y.[Bo-Yang],
Wu, W.H.[Wen-Hao],
Wang, H.R.[Hao-Ran],
Su, R.[Rui],
He, D.L.[Dong-Liang],
Yang, H.[Haosen],
Fan, X.R.[Xiao-Ran],
Ouyang, W.L.[Wan-Li],
NSNet: Non-saliency Suppression Sampler for Efficient Video Recognition,
ECCV22(XXXIV:705-723).
Springer DOI
2211
BibRef
Xia, B.Y.[Bo-Yang],
Wang, Z.H.[Zhi-Hao],
Wu, W.H.[Wen-Hao],
Wang, H.R.[Hao-Ran],
Han, J.G.[Jun-Gong],
Temporal Saliency Query Network for Efficient Video Recognition,
ECCV22(XXXIV:741-759).
Springer DOI
2211
BibRef
Islam, M.M.[Md Mohaiminul],
Bertasius, G.[Gedas],
Long Movie Clip Classification with State-Space Video Models,
ECCV22(XXXV:87-104).
Springer DOI
2211
BibRef
Habibian, A.[Amirhossein],
Yahia, H.B.[Haitam Ben],
Abati, D.[Davide],
Gavves, E.[Efstratios],
Porikli, F.M.[Fatih M.],
Delta Distillation for Efficient Video Processing,
ECCV22(XXXV:213-229).
Springer DOI
2211
BibRef
Li, Z.Z.[Zi-Zhang],
Wang, M.M.[Meng-Meng],
Pi, H.J.[Huai-Jin],
Xu, K.[Kechun],
Mei, J.B.[Jian-Biao],
Liu, Y.[Yong],
E-NeRV: Expedite Neural Video Representation with Disentangled
Spatial-Temporal Context,
ECCV22(XXXV:267-284).
Springer DOI
2211
BibRef
Kosman, E.[Eitan],
di Castro, D.[Dotan],
GraphVid: It only Takes a Few Nodes to Understand a Video,
ECCV22(XXXV:195-212).
Springer DOI
2211
BibRef
Ju, C.[Chen],
Han, T.[Tengda],
Zheng, K.[Kunhao],
Zhang, Y.[Ya],
Xie, W.[Weidi],
Prompting Visual-Language Models for Efficient Video Understanding,
ECCV22(XXXV:105-124).
Springer DOI
2211
BibRef
Liang, S.X.[Shu-Xian],
Shen, X.[Xu],
Huang, J.Q.[Jian-Qiang],
Hua, X.S.[Xian-Sheng],
Delving into Details: Synopsis-to-Detail Networks for Video Recognition,
ECCV22(IV:262-278).
Springer DOI
2211
BibRef
Ur Rehman, Y.A.[Yasar Abbas],
Gao, Y.[Yan],
Shen, J.J.[Jia-Jun],
de Gusmão, P.P.B.[Pedro Porto Buarque],
Lane, N.[Nicholas],
Federated Self-supervised Learning for Video Understanding,
ECCV22(XXXI:506-522).
Springer DOI
2211
BibRef
Dadashzadeh, A.[Amirhossein],
Whone, A.[Alan],
Mirmehdi, M.[Majid],
Auxiliary Learning for Self-Supervised Video Representation via
Similarity-based Knowledge Distillation,
L3D-IVU22(4230-4239)
IEEE DOI
2210
Representation learning, Knowledge engineering, Training,
Predictive models, Data models, Reliability
BibRef
Li, Y.[Yi],
Vasconcelos, N.M.[Nuno M.],
Improving Video Model Transfer with Dynamic Representation Learning,
CVPR22(19258-19269)
IEEE DOI
2210
Representation learning, Knowledge engineering,
Analytical models, Computational modeling, Transfer learning,
Video analysis and understanding
BibRef
Guo, S.[Sheng],
Xiong, Z.H.[Zi-Hua],
Zhong, Y.J.[Yu-Jie],
Wang, L.M.[Li-Min],
Guo, X.B.[Xiao-Bo],
Han, B.[Bing],
Huang, W.L.[Wei-Lin],
Cross-Architecture Self-supervised Video Representation Learning,
CVPR22(19248-19257)
IEEE DOI
2210
Representation learning, Video sequences,
Self-supervised learning, Predictive models, Video analysis and understanding
BibRef
Xu, X.Y.[Xin-Yu],
Li, Y.L.[Yong-Lu],
Lu, C.[Cewu],
Learning to Anticipate Future with Dynamic Context Removal,
CVPR22(12724-12734)
IEEE DOI
2210
WWW Link. Training, Visualization, Schedules, Uncertainty, Benchmark testing,
Transformers, Cognition, Visual reasoning, Video analysis and understanding
BibRef
Gadre, S.Y.[Samir Yitzhak],
Ehsani, K.[Kiana],
Song, S.[Shuran],
Mottaghi, R.[Roozbeh],
Continuous Scene Representations for Embodied AI,
CVPR22(14829-14839)
IEEE DOI
2210
Training, Representation learning, Visualization, Image analysis,
Navigation, Tracking, Robot vision systems, Robot vision,
Scene analysis and understanding
BibRef
Liang, C.[Chen],
Wang, W.G.[Wen-Guan],
Zhou, T.F.[Tian-Fei],
Yang, Y.[Yi],
Visual Abductive Reasoning,
CVPR22(15544-15554)
IEEE DOI
2210
Visualization, Reactive power, Transformers, Cognition,
Task analysis, Vision+language,
Video analysis and understanding
BibRef
Kinfu, K.A.[Kaleab A.],
Vidal, R.[René],
Analysis and Extensions of Adversarial Training for Video
Classification,
RoSe22(3415-3424)
IEEE DOI
2210
Training, Noise reduction,
Generative adversarial networks, Robustness
BibRef
Xiao, F.[Fanyi],
Kundu, K.[Kaustav],
Tighe, J.[Joseph],
Modolo, D.[Davide],
Hierarchical Self-supervised Representation Learning for Movie
Understanding,
CVPR22(9717-9726)
IEEE DOI
2210
Representation learning, Measurement, Soft sensors, Semantics,
Self-supervised learning, Benchmark testing, Motion pictures,
Video analysis and understanding
BibRef
Li, L.L.[Liu-Lei],
Zhou, T.F.[Tian-Fei],
Wang, W.G.[Wen-Guan],
Yang, L.[Lu],
Li, J.W.[Jian-Wu],
Yang, Y.[Yi],
Locality-Aware Inter-and Intra-Video Reconstruction for
Self-Supervised Correspondence Learning,
CVPR22(8709-8720)
IEEE DOI
2210
Representation learning, Location awareness, Visualization,
Semantics, Reconstruction algorithms, Encoding, grouping and shape analysis
BibRef
Jiang, Y.F.[Yi-Fan],
Gong, X.Y.[Xin-Yu],
Wu, J.[Junru],
Shi, H.[Humphrey],
Yan, Z.C.[Zhi-Cheng],
Wang, Z.Y.[Zhang-Yang],
Auto-X3D: Ultra-Efficient Video Understanding via Finer-Grained
Neural Architecture Search,
WACV22(2354-2363)
IEEE DOI
2202
Computational modeling, Search methods,
X3D, Benchmark testing, Probabilistic logic,
Analysis and Understanding Deep Learning ->
Efficient Training and Inference Methods for Networks
BibRef
Chen, N.L.[Neng-Lun],
Chu, L.[Lei],
Pan, H.[Hao],
Lu, Y.[Yan],
Wang, W.P.[Wen-Ping],
Self-Supervised Image Representation Learning with Geometric Set
Consistency,
CVPR22(19270-19280)
IEEE DOI
2210
Image segmentation, Semantics, Training data, Object detection,
Image representation, Representation learning,
Self-, semi-, meta-, and unsupervised learning
BibRef
Lin, Y.Z.[Yuan-Ze],
Guo, X.[Xun],
Lu, Y.[Yan],
Self-Supervised Video Representation Learning with Meta-Contrastive
Network,
ICCV21(8219-8229)
IEEE DOI
2203
Training, Representation learning, Multitasking, Task analysis,
Transfer/Low-shot/Semi/Unsupervised Learning, Video analysis and understanding
BibRef
Guo, X.D.[Xu-Dong],
Guo, X.[Xun],
Lu, Y.[Yan],
SSAN: Separable Self-Attention Network for Video Representation
Learning,
CVPR21(12613-12622)
IEEE DOI
2111
Correlation, Pairwise error probability,
Computational modeling, Semantics, Cognition
BibRef
Yang, X.T.[Xi-Tong],
Fan, H.Q.[Hao-Qi],
Torresani, L.[Lorenzo],
Davis, L.S.[Larry S.],
Wang, H.[Heng],
Beyond Short Clips:
End-to-End Video-Level Learning with Collaborative Memories,
CVPR21(7563-7572)
IEEE DOI
2111
Training, Collaboration,
Predictive models, Fasteners
BibRef
Wu, C.Y.[Chao-Yuan],
Krähenbühl, P.[Philipp],
Towards Long-Form Video Understanding,
CVPR21(1884-1894)
IEEE DOI
2111
Visualization, Protocols, Computational modeling,
Machine vision, Benchmark testing
BibRef
Zhang, C.H.[Chu-Han],
Gupta, A.[Ankush],
Zisserman, A.[Andrew],
Temporal Query Networks for Fine-grained Video Understanding,
CVPR21(4484-4494)
IEEE DOI
2111
Training, Location awareness,
Videos
BibRef
Kangaspunta, J.[Juhana],
Piergiovanni, A.J.[AJ],
Jonschkowski, R.[Rico],
Ryoo, M.[Michael],
Angelova, A.[Anelia],
Adaptive Intermediate Representations for Video Understanding,
MULA21(1602-1612)
IEEE DOI
2109
Training, Visualization, Computational modeling,
Atmospheric modeling, Motion segmentation, Semantics, Performance gain
BibRef
Duan, H.D.[Hao-Dong],
Zhao, Y.[Yue],
Xiong, Y.J.[Yuan-Jun],
Liu, W.T.[Wen-Tao],
Lin, D.[Dahua],
Omni-sourced Webly-supervised Learning for Video Recognition,
ECCV20(XV:670-688).
Springer DOI
2011
BibRef
Jha, A.,
Kumar, A.,
Pande, S.,
Banerjee, B.,
Chaudhuri, S.,
MT-UNET: A Novel U-Net Based Multi-Task Architecture For Visual Scene
Understanding,
ICIP20(2191-2195)
IEEE DOI
2011
Task analysis, Decoding, Feature extraction, Semantics,
Loss measurement, Image segmentation, Estimation,
deep learning
BibRef
Diba, A.[Ali],
Fayyaz, M.[Mohsen],
Sharma, V.[Vivek],
Paluri, M.[Manohar],
Gall, J.[Jürgen],
Stiefelhagen, R.[Rainer],
Van Gool, L.J.[Luc J.],
Large Scale Holistic Video Understanding,
ECCV20(V:593-610).
Springer DOI
2011
BibRef
Voigtlaender, P.[Paul],
Changpinyo, S.[Soravit],
Pont-Tuset, J.[Jordi],
Soricut, R.[Radu],
Ferrari, V.[Vittorio],
Connecting Vision and Language with Video Localized Narratives,
CVPR23(2461-2471)
IEEE DOI
2309
BibRef
Pont-Tuset, J.[Jordi],
Uijlings, J.[Jasper],
Changpinyo, S.[Soravit],
Soricut, R.[Radu],
Ferrari, V.[Vittorio],
Connecting Vision and Language with Localized Narratives,
ECCV20(V:647-664).
Springer DOI
2011
BibRef
Hu, A.[Anthony],
Cotter, F.[Fergal],
Mohan, N.[Nikhil],
Gurau, C.[Corina],
Kendall, A.[Alex],
Probabilistic Future Prediction for Video Scene Understanding,
ECCV20(XVI:767-785).
Springer DOI
2010
BibRef
Mavroudi, E.[Effrosyni],
Haro, B.B.[Benjamín Béjar],
Vidal, R.[René],
Representation Learning on Visual-Symbolic Graphs for Video
Understanding,
ECCV20(XXIX:71-90).
Springer DOI
2010
BibRef
Sener, F.[Fadime],
Singhania, D.[Dipika],
Yao, A.[Angela],
Temporal Aggregate Representations for Long-range Video Understanding,
ECCV20(XVI:154-171).
Springer DOI
2010
BibRef
Tosi, F.,
Aleotti, F.,
Ramirez, P.Z.,
Poggi, M.,
Salti, S.,
di Stefano, L.,
Mattoccia, S.,
Distilled Semantics for Comprehensive Scene Understanding from Videos,
CVPR20(4653-4664)
IEEE DOI
2008
Semantics, Optical imaging, Cameras, Videos, Training, Estimation,
Computer vision
BibRef
Piergiovanni, A.J.,
Angelova, A.[Anelia],
Ryoo, M.S.[Michael S.],
Evolving Losses for Unsupervised Video Representation Learning,
CVPR20(130-139)
IEEE DOI
2008
Task analysis, Optical losses, Labeling, Training,
Evolutionary computation, Kinetic theory, Loss measurement
BibRef
Xiong, Y.,
Huang, Q.,
Guo, L.,
Zhou, H.,
Zhou, B.,
Lin, D.,
A Graph-Based Framework to Bridge Movies and Synopses,
ICCV19(4591-4600)
IEEE DOI
2004
Code, Video Understanding.
WWW Link.
entertainment, graph theory, video signal processing,
graph-based framework, video analytics, movie understanding,
Computer vision
BibRef
Kanehira, A.[Atsushi],
Takemoto, K.[Kentaro],
Inayoshi, S.[Sho],
Harada, T.[Tatsuya],
Multimodal Explanations by Predicting Counterfactuality in Videos,
CVPR19(8586-8594).
IEEE DOI
2002
BibRef
Kanehira, A.[Atsushi],
Harada, T.[Tatsuya],
Learning to Explain With Complemental Examples,
CVPR19(8595-8603).
IEEE DOI
2002
BibRef
Zhou, L.[Luowei],
Kalantidis, Y.[Yannis],
Chen, X.L.[Xin-Lei],
Corso, J.J.[Jason J.],
Rohrbach, M.[Marcus],
Grounded Video Description,
CVPR19(6571-6580).
IEEE DOI
2002
BibRef
Liu, X.Y.[Xing-Yu],
Lee, J.Y.[Joon-Young],
Jin, H.L.[Hai-Lin],
Learning Video Representations From Correspondence Proposals,
CVPR19(4268-4276).
IEEE DOI
2002
BibRef
Xiong, B.[Bo],
Kalantidis, Y.[Yannis],
Ghadiyaram, D.[Deepti],
Grauman, K.[Kristen],
Less Is More: Learning Highlight Detection From Video Duration,
CVPR19(1258-1267).
IEEE DOI
2002
BibRef
Zhang, D.[Da],
Dai, X.[Xiyang],
Wang, X.[Xin],
Wang, Y.F.[Yuan-Fang],
Davis, L.S.[Larry S.],
MAN: Moment Alignment Network for Natural Language Moment Retrieval via
Iterative Graph Adjustment,
CVPR19(1247-1257).
IEEE DOI
2002
Key moments in scene.
BibRef
Fan, L.,
Huang, W.,
Gan, C.,
Ermon, S.,
Gong, B.,
Huang, J.,
End-to-End Learning of Motion Representation for Video Understanding,
CVPR18(6016-6025)
IEEE DOI
1812
Optical imaging, Task analysis, Optical computing, Training,
Optical fiber networks, Brightness, Neural networks
BibRef
Huang, D.,
Ramanathan, V.,
Mahajan, D.,
Torresani, L.,
Paluri, M.,
Fei-Fei, L.,
Niebles, J.C.,
What Makes a Video a Video: Analyzing Temporal Information in Video
Understanding Models and Datasets,
CVPR18(7366-7375)
IEEE DOI
1812
Analytical models, Generators, Kinetic theory, Visualization,
Upper bound, Testing, Training
BibRef
Mahdisoltani, F.[Farzaneh],
Memisevic, R.[Roland],
Fleet, D.J.[David J.],
Hierarchical Video Understanding,
WiCV-E18(IV:659-663).
Springer DOI
1905
BibRef
Shin, K.S.[Kwang-Soo],
Jeon, J.[Junhyeong],
Lee, S.[Seungbin],
Lim, B.[Boyoung],
Jeong, M.S.[Min-Soo],
Nang, J.[Jongho],
Approach for Video Classification with Multi-label on YouTube-8M
Dataset,
Large-Scale18(IV:317-324).
Springer DOI
1905
BibRef
Skalic, M.[Miha],
Austin, D.[David],
Building A Size Constrained Predictive Models for Video Classification,
Large-Scale18(IV:297-305).
Springer DOI
1905
BibRef
Garg, S.[Shivam],
Learning Video Features for Multi-label Classification,
Large-Scale18(IV:325-337).
Springer DOI
1905
BibRef
Cho, C.[Choongyeun],
Antin, B.[Benjamin],
Arora, S.[Sanchit],
Ashrafi, S.[Shwan],
Duan, P.L.[Pei-Lin],
Huynh, D.T.[Dang The],
James, L.[Lee],
Nguyen, H.T.[Hang Tuan],
Solgi, M.[Mojtaba],
Than, C.V.[Cuong Van],
Large-Scale Video Classification with Feature Space Augmentation
Coupled with Learned Label Relations and Ensembling,
Large-Scale18(IV:338-346).
Springer DOI
1905
BibRef
Lin, R.C.[Rong-Cheng],
Xiao, J.[Jing],
Fan, J.P.[Jian-Ping],
NeXtVLAD: An Efficient Neural Network to Aggregate Frame-Level Features
for Large-Scale Video Classification,
Large-Scale18(IV:206-218).
Springer DOI
1905
BibRef
Tang, Y.Y.[Yong-Yi],
Zhang, X.[Xing],
Wang, J.W.[Jing-Wen],
Chen, S.X.[Shao-Xiang],
Ma, L.[Lin],
Jiang, Y.G.[Yu-Gang],
Non-local NetVLAD Encoding for Video Classification,
Large-Scale18(IV:219-228).
Springer DOI
1905
BibRef
Kmiec, S.[Sebastian],
Bae, J.[Juhan],
An, R.J.[Rui-Jian],
Learnable Pooling Methods for Video Classification,
Large-Scale18(IV:229-238).
Springer DOI
1905
BibRef
Liu, T.Q.[Tian-Qi],
Liu, B.[Bo],
Constrained-Size Tensorflow Models for YouTube-8M Video Understanding
Challenge,
Large-Scale18(IV:239-249).
Springer DOI
1905
BibRef
Lee, J.[Joonseok],
Natsev, A.P.[Apostol Paul],
Reade, W.[Walter],
Sukthankar, R.[Rahul],
Toderici, G.[George],
The 2nd YouTube-8M Large-Scale Video Understanding Challenge,
Large-Scale18(IV:193-205).
Springer DOI
1905
BibRef
Zolfaghari, M.[Mohammadreza],
Singh, K.[Kamaljeet],
Brox, T.[Thomas],
ECO: Efficient Convolutional Network for Online Video Understanding,
ECCV18(II:713-730).
Springer DOI
1810
BibRef
Sah, S.,
Nguyen, T.,
Dominguez, M.,
Such, F.P.,
Ptucha, R.,
Temporally Steered Gaussian Attention for Video Understanding,
DeepLearn-T17(2208-2216)
IEEE DOI
1709
Computational modeling, Decoding, Semantics, Standards,
Streaming media, Training, Visualization
BibRef
Jiang, Y.G.[Yu-Gang],
Ye, G.[Guangnan],
Chang, S.F.[Shih-Fu],
Ellis, D.[Daniel],
Loui, A.C.[Alexander C.],
Consumer video understanding: a benchmark database and an evaluation of
human and machine performance,
ICMR11(29).
DOI Link
1301
BibRef
Yang, Y.[Yang],
Liu, J.G.[Jin-Gen],
Shah, M.[Mubarak],
Video Scene Understanding Using Multi-scale Analysis,
ICCV09(1669-1676).
IEEE DOI
0909
BibRef
Chapter on Implementations and Applications, Databases, QBIC, Video Analysis, Hardware and Software, Inspection continues in
Surveillance Video Summarization, Surveillance Synopsis.