Deep Video Understanding Dataset,
2020, used for workshops and challenges.
WWW Link.
Dataset, Video Understanding.
Brostow, G.J.[Gabriel J.],
Fauqueur, J.[Julien],
Cipolla, R.[Roberto],
Semantic object classes in video:
A high-definition ground truth database,
PRL(30), No. 2, 15 January 2009, pp. 88-97.
Elsevier DOI
0804
Object recognition; Video database; Video understanding; Semantic
segmentation; Label propagation
BibRef
Aodha, O.M.[Oisin Mac],
Brostow, G.J.[Gabriel J.],
Pollefeys, M.[Marc],
Segmenting video into classes of algorithm-suitability,
CVPR10(1054-1061).
IEEE DOI
1006
BibRef
Suresha, M.,
Kuppa, S.,
Raghukumar, D.S.,
A study on deep learning spatiotemporal models and feature extraction
techniques for video understanding,
MultInfoRetr(9), No. 2, June 2020, pp. 81-101.
Springer DOI
2005
BibRef
Kavoosifar, M.R.[Mohammad Reza],
Apiletti, D.[Daniele],
Baralis, E.[Elena],
Garza, P.[Paolo],
Huet, B.[Benoit],
Effective video hyperlinking by means of enriched feature sets and
monomodal query combinations,
MultInfoRetr(9), No. 3, September 2020, pp. 215-227.
Springer DOI
2008
BibRef
Tang, P.J.[Peng-Jie],
Tan, Y.L.[Yun-Lan],
Li, J.Z.[Jin-Zhong],
Tan, B.[Bin],
Translating video into language by enhancing visual and language
representations,
JVCIR(72), 2020, pp. 102875.
Elsevier DOI
2010
Video description, Feature enhancing, CNN, LSTM, Semantic
BibRef
Yu, J.,
Jiang, X.,
Qin, Z.,
Zhang, W.,
Hu, Y.,
Wu, Q.,
Learning Dual Encoding Model for Adaptive Visual Understanding in
Visual Dialogue,
IP(30), 2021, pp. 220-233.
IEEE DOI
2011
Visualization, Semantics, History, Task analysis, Cognition,
Feature extraction, Adaptation models, Dual encoding,
visual dialogue
BibRef
Duan, J.H.[Jin-Hao],
Xu, H.[Hua],
Lin, X.Z.[Xiao-Zhu],
Zhu, S.C.[Shang-Chao],
Du, Y.Z.[Yuan-Ze],
Multi-semantic long-range dependencies capturing for efficient video
representation learning,
IVC(104), 2020, pp. 103988.
Elsevier DOI
2012
Video representation learning,
Long-range dependencies capturing, Video classification
BibRef
Tan, H.L.[Hui Li],
Zhu, H.Y.[Hong-Yuan],
Lim, J.H.[Joo-Hwee],
Tan, C.[Cheston],
A comprehensive survey of procedural video datasets,
CVIU(202), 2021, pp. 103107.
Elsevier DOI
2012
Video datasets depicting a series of actions performed in a
constrained but non-unique order to achieve an intended high-level
goal.
BibRef
Lin, J.[Ji],
Gan, C.[Chuang],
Wang, K.[Kuan],
Han, S.[Song],
TSM: Temporal Shift Module for Efficient and Scalable Video
Understanding on Edge Devices,
PAMI(44), No. 5, May 2022, pp. 2760-2774.
IEEE DOI
2204
BibRef
Earlier: A1, A2, A4, Only:
TSM: Temporal Shift Module for Efficient Video Understanding,
ICCV19(7082-7092)
IEEE DOI
2004
Code, Video Understanding.
WWW Link.
Computational modeling, Convolution, Streaming media, Training,
Solid modeling, Temporal shift module, video recognition,
network dissection.
convolutional neural nets, object detection,
video signal processing, video streaming, Real-time systems
BibRef
Zhou, W.[Wei],
Hou, Y.[Yi],
Ouyang, K.W.[Ke-Wei],
Zhou, S.L.[Shi-Lin],
Exploring complementary information of self-supervised pretext tasks
for unsupervised video pre-training,
IET-CV(16), No. 3, 2022, pp. 255-265.
DOI Link
2204
Both knowledge distillation and self-supervised learning.
convolutional neural nets, feature extraction,
unsupervised learning, video signal processing, image sequences
BibRef
Li, Z.Q.[Zhen-Qiang],
Wang, W.M.[Wei-Min],
Li, Z.Y.[Zuo-Yue],
Huang, Y.F.[Yi-Fei],
Sato, Y.[Yoichi],
Spatio-Temporal Perturbations for Video Attribution,
CirSysVideo(32), No. 4, April 2022, pp. 2043-2056.
IEEE DOI
2204
Measurement, Reliability, Task analysis, Spatiotemporal phenomena,
Visualization, Heating systems, Perturbation methods, video understanding
BibRef
Lin, Y.Z.[Yuan-Ze],
Guo, X.[Xun],
Lu, Y.[Yan],
Self-Supervised Video Representation Learning with Meta-Contrastive
Network,
ICCV21(8219-8229)
IEEE DOI
2203
Training, Representation learning, Multitasking, Task analysis,
Transfer/Low-shot/Semi/Unsupervised Learning, Video analysis and understanding
BibRef
Guo, X.D.[Xu-Dong],
Guo, X.[Xun],
Lu, Y.[Yan],
SSAN: Separable Self-Attention Network for Video Representation
Learning,
CVPR21(12613-12622)
IEEE DOI
2111
Correlation, Pairwise error probability,
Computational modeling, Semantics, Cognition, Pattern recognition
BibRef
Yang, X.T.[Xi-Tong],
Fan, H.Q.[Hao-Qi],
Torresani, L.[Lorenzo],
Davis, L.S.[Larry S.],
Wang, H.[Heng],
Beyond Short Clips:
End-to-End Video-Level Learning with Collaborative Memories,
CVPR21(7563-7572)
IEEE DOI
2111
Training, Collaboration, Computer architecture,
Predictive models, Fasteners, Pattern recognition
BibRef
Wu, C.Y.[Chao-Yuan],
Krähenbühl, P.[Philipp],
Towards Long-Form Video Understanding,
CVPR21(1884-1894)
IEEE DOI
2111
Visualization, Protocols, Computational modeling,
Machine vision, Computer architecture, Benchmark testing
BibRef
Zhang, C.H.[Chu-Han],
Gupta, A.[Ankush],
Zisserman, A.[Andrew],
Temporal Query Networks for Fine-grained Video Understanding,
CVPR21(4484-4494)
IEEE DOI
2111
Training, Location awareness,
Computer architecture, Pattern recognition, Videos
BibRef
Kangaspunta, J.[Juhana],
Piergiovanni, A.[AJ],
Jonschkowski, R.[Rico],
Ryoo, M.[Michael],
Angelova, A.[Anelia],
Adaptive Intermediate Representations for Video Understanding,
MULA21(1602-1612)
IEEE DOI
2109
Training, Visualization, Computational modeling,
Atmospheric modeling, Motion segmentation, Semantics, Performance gain
BibRef
Duan, H.D.[Hao-Dong],
Zhao, Y.[Yue],
Xiong, Y.J.[Yuan-Jun],
Liu, W.T.[Wen-Tao],
Lin, D.[Dahua],
Omni-sourced Webly-supervised Learning for Video Recognition,
ECCV20(XV:670-688).
Springer DOI
2011
BibRef
Jha, A.,
Kumar, A.,
Pande, S.,
Banerjee, B.,
Chaudhuri, S.,
MT-UNET: A Novel U-Net Based Multi-Task Architecture For Visual Scene
Understanding,
ICIP20(2191-2195)
IEEE DOI
2011
Task analysis, Decoding, Feature extraction, Semantics,
Loss measurement, Image segmentation, Estimation,
deep learning
BibRef
Diba, A.[Ali],
Fayyaz, M.[Mohsen],
Sharma, V.[Vivek],
Paluri, M.[Manohar],
Gall, J.[Jürgen],
Stiefelhagen, R.[Rainer],
Van Gool, L.J.[Luc J.],
Large Scale Holistic Video Understanding,
ECCV20(V:593-610).
Springer DOI
2011
BibRef
Pont-Tuset, J.[Jordi],
Uijlings, J.[Jasper],
Changpinyo, S.[Soravit],
Soricut, R.[Radu],
Ferrari, V.[Vittorio],
Connecting Vision and Language with Localized Narratives,
ECCV20(V:647-664).
Springer DOI
2011
BibRef
Hu, A.[Anthony],
Cotter, F.[Fergal],
Mohan, N.[Nikhil],
Gurau, C.[Corina],
Kendall, A.[Alex],
Probabilistic Future Prediction for Video Scene Understanding,
ECCV20(XVI: 767-785).
Springer DOI
2010
BibRef
Mavroudi, E.[Effrosyni],
Haro, B.B.[Benjamín Béjar],
Vidal, R.[René],
Representation Learning on Visual-Symbolic Graphs for Video
Understanding,
ECCV20(XXIX: 71-90).
Springer DOI
2010
BibRef
Sener, F.[Fadime],
Singhania, D.[Dipika],
Yao, A.[Angela],
Temporal Aggregate Representations for Long-range Video Understanding,
ECCV20(XVI: 154-171).
Springer DOI
2010
BibRef
Tosi, F.,
Aleotti, F.,
Ramirez, P.Z.,
Poggi, M.,
Salti, S.,
di Stefano, L.,
Mattoccia, S.,
Distilled Semantics for Comprehensive Scene Understanding from Videos,
CVPR20(4653-4664)
IEEE DOI
2008
Semantics, Optical imaging, Cameras, Videos, Training, Estimation,
Computer vision
BibRef
Piergiovanni, A.J.,
Angelova, A.[Anelia],
Ryoo, M.S.[Michael S.],
Evolving Losses for Unsupervised Video Representation Learning,
CVPR20(130-139)
IEEE DOI
2008
Task analysis, Optical losses, Labeling, Training,
Evolutionary computation, Kinetic theory, Loss measurement
BibRef
Xiong, Y.,
Huang, Q.,
Guo, L.,
Zhou, H.,
Zhou, B.,
Lin, D.,
A Graph-Based Framework to Bridge Movies and Synopses,
ICCV19(4591-4600)
IEEE DOI
2004
Code, Video Understanding.
WWW Link.
entertainment, graph theory, video signal processing,
graph-based framework, video analytics, movie understanding,
Computer vision
BibRef
Kanehira, A.[Atsushi],
Takemoto, K.[Kentaro],
Inayoshi, S.[Sho],
Harada, T.[Tatsuya],
Multimodal Explanations by Predicting Counterfactuality in Videos,
CVPR19(8586-8594).
IEEE DOI
2002
BibRef
Kanehira, A.[Atsushi],
Harada, T.[Tatsuya],
Learning to Explain With Complemental Examples,
CVPR19(8595-8603).
IEEE DOI
2002
BibRef
Zhou, L.[Luowei],
Kalantidis, Y.[Yannis],
Chen, X.L.[Xin-Lei],
Corso, J.J.[Jason J.],
Rohrbach, M.[Marcus],
Grounded Video Description,
CVPR19(6571-6580).
IEEE DOI
2002
BibRef
Liu, X.Y.[Xing-Yu],
Lee, J.Y.[Joon-Young],
Jin, H.L.[Hai-Lin],
Learning Video Representations From Correspondence Proposals,
CVPR19(4268-4276).
IEEE DOI
2002
BibRef
Xiong, B.[Bo],
Kalantidis, Y.[Yannis],
Ghadiyaram, D.[Deepti],
Grauman, K.[Kristen],
Less Is More: Learning Highlight Detection From Video Duration,
CVPR19(1258-1267).
IEEE DOI
2002
BibRef
Zhang, D.[Da],
Dai, X.[Xiyang],
Wang, X.[Xin],
Wang, Y.F.[Yuan-Fang],
Davis, L.S.[Larry S.],
MAN: Moment Alignment Network for Natural Language Moment Retrieval via
Iterative Graph Adjustment,
CVPR19(1247-1257).
IEEE DOI
2002
Key moments in scene.
BibRef
Fan, L.,
Huang, W.,
Gan, C.,
Ermon, S.,
Gong, B.,
Huang, J.,
End-to-End Learning of Motion Representation for Video Understanding,
CVPR18(6016-6025)
IEEE DOI
1812
Optical imaging, Task analysis, Optical computing, Training,
Optical fiber networks, Brightness, Neural networks
BibRef
Huang, D.,
Ramanathan, V.,
Mahajan, D.,
Torresani, L.,
Paluri, M.,
Fei-Fei, L.,
Niebles, J.C.,
What Makes a Video a Video: Analyzing Temporal Information in Video
Understanding Models and Datasets,
CVPR18(7366-7375)
IEEE DOI
1812
Analytical models, Generators, Kinetic theory, Visualization,
Upper bound, Testing, Training
BibRef
Mahdisoltani, F.[Farzaneh],
Memisevic, R.[Roland],
Fleet, D.J.[David J.],
Hierarchical Video Understanding,
WiCV-E18(IV:659-663).
Springer DOI
1905
BibRef
Shin, K.S.[Kwang-Soo],
Jeon, J.[Junhyeong],
Lee, S.[Seungbin],
Lim, B.[Boyoung],
Jeong, M.S.[Min-Soo],
Nang, J.[Jongho],
Approach for Video Classification with Multi-label on YouTube-8M
Dataset,
Large-Scale18(IV:317-324).
Springer DOI
1905
BibRef
Skalic, M.[Miha],
Austin, D.[David],
Building A Size Constrained Predictive Models for Video Classification,
Large-Scale18(IV:297-305).
Springer DOI
1905
BibRef
Garg, S.[Shivam],
Learning Video Features for Multi-label Classification,
Large-Scale18(IV:325-337).
Springer DOI
1905
BibRef
Cho, C.[Choongyeun],
Antin, B.[Benjamin],
Arora, S.[Sanchit],
Ashrafi, S.[Shwan],
Duan, P.[Peilin],
Huynh, D.T.[Dang The],
James, L.[Lee],
Nguyen, H.T.[Hang Tuan],
Solgi, M.[Mojtaba],
Than, C.V.[Cuong Van],
Large-Scale Video Classification with Feature Space Augmentation
Coupled with Learned Label Relations and Ensembling,
Large-Scale18(IV:338-346).
Springer DOI
1905
BibRef
Lin, R.C.[Rong-Cheng],
Xiao, J.[Jing],
Fan, J.P.[Jian-Ping],
NeXtVLAD: An Efficient Neural Network to Aggregate Frame-Level Features
for Large-Scale Video Classification,
Large-Scale18(IV:206-218).
Springer DOI
1905
BibRef
Tang, Y.Y.[Yong-Yi],
Zhang, X.[Xing],
Wang, J.W.[Jing-Wen],
Chen, S.X.[Shao-Xiang],
Ma, L.[Lin],
Jiang, Y.G.[Yu-Gang],
Non-local NetVLAD Encoding for Video Classification,
Large-Scale18(IV:219-228).
Springer DOI
1905
BibRef
Kmiec, S.[Sebastian],
Bae, J.[Juhan],
An, R.[Ruijian],
Learnable Pooling Methods for Video Classification,
Large-Scale18(IV:229-238).
Springer DOI
1905
BibRef
Liu, T.Q.[Tian-Qi],
Liu, B.[Bo],
Constrained-Size Tensorflow Models for YouTube-8M Video Understanding
Challenge,
Large-Scale18(IV:239-249).
Springer DOI
1905
BibRef
Lee, J.[Joonseok],
Natsev, A.P.[Apostol Paul],
Reade, W.[Walter],
Sukthankar, R.[Rahul],
Toderici, G.[George],
The 2nd YouTube-8M Large-Scale Video Understanding Challenge,
Large-Scale18(IV:193-205).
Springer DOI
1905
BibRef
Zolfaghari, M.[Mohammadreza],
Singh, K.[Kamaljeet],
Brox, T.[Thomas],
ECO: Efficient Convolutional Network for Online Video Understanding,
ECCV18(II: 713-730).
Springer DOI
1810
BibRef
Sah, S.,
Nguyen, T.,
Dominguez, M.,
Such, F.P.,
Ptucha, R.,
Temporally Steered Gaussian Attention for Video Understanding,
DeepLearn-T17(2208-2216)
IEEE DOI
1709
Computational modeling, Decoding, Semantics, Standards,
Streaming media, Training, Visualization
BibRef
Jiang, Y.G.[Yu-Gang],
Ye, G.[Guangnan],
Chang, S.F.[Shih-Fu],
Ellis, D.[Daniel],
Loui, A.C.[Alexander C.],
Consumer video understanding: a benchmark database and an evaluation of
human and machine performance,
ICMR11(29).
DOI Link
1301
BibRef
Yang, Y.[Yang],
Liu, J.G.[Jin-Gen],
Shah, M.[Mubarak],
Video Scene Understanding Using Multi-scale Analysis,
ICCV09(1669-1676).
IEEE DOI
0909
BibRef
Chapter on Implementations and Applications, Databases, QBIC, Video Analysis, Hardware and Software, Inspection continues in
Surveillance Video Summarization, Surveillance Synopsis.