Qiu, Z.F.[Zhao-Fan],
Yao, T.[Ting],
Mei, T.[Tao],
Learning Deep Spatio-Temporal Dependence for Semantic Video
Segmentation,
MultMed(20), No. 4, April 2018, pp. 939-949.
IEEE DOI
1804
BibRef
Earlier:
Learning Spatio-Temporal Representation with Pseudo-3D Residual
Networks,
ICCV17(5534-5542)
IEEE DOI
1802
Pseudo-3D residual blocks: 3D spatio-temporal representations built from 2D spatial and 1D temporal convolutions.
Image segmentation, Semantics,
Streaming media,
video segmentation,
convolution, feature extraction, image classification,
image recognition, image representation,
Visualization
BibRef
Qiu, Z.F.[Zhao-Fan],
Yao, T.[Ting],
Ngo, C.W.[Chong-Wah],
Tian, X.M.[Xin-Mei],
Mei, T.[Tao],
Learning Spatio-Temporal Representation With Local and Global Diffusion,
CVPR19(12048-12057).
IEEE DOI
2002
BibRef
Yao, T.,
Pan, Y.,
Li, Y.,
Qiu, Z.,
Mei, T.,
Boosting Image Captioning with Attributes,
ICCV17(4904-4912)
IEEE DOI
1802
BibRef
And: A2, A1, A3, A5, Only:
Video Captioning with Transferred Semantic Attributes,
CVPR17(984-992)
IEEE DOI
1711
image representation,
learning (artificial intelligence),
Semantics,
Natural languages,
Probability distribution, Recurrent neural networks, Visualization
BibRef
Zhao, B.,
Li, X.,
Lu, X.,
CAM-RNN: Co-Attention Model Based RNN for Video Captioning,
IP(28), No. 11, November 2019, pp. 5552-5565.
IEEE DOI
1909
Visualization, Task analysis, Logic gates,
Recurrent neural networks, Dogs, Semantics, Decoding,
recurrent neural network
BibRef
Yan, C.,
Tu, Y.,
Wang, X.,
Zhang, Y.,
Hao, X.,
Zhang, Y.,
Dai, Q.,
STAT: Spatial-Temporal Attention Mechanism for Video Captioning,
MultMed(22), No. 1, January 2020, pp. 229-241.
IEEE DOI
2001
BibRef
And:
Corrections:
MultMed(22), No. 3, March 2020, pp. 830-830.
IEEE DOI
2003
Video captioning, spatial-temporal attention mechanism,
encoder-decoder neural networks,
Mechatronics, Automation, Streaming media
BibRef
Aafaq, N.[Nayyer],
Mian, A.[Ajmal],
Liu, W.[Wei],
Gilani, S.Z.[Syed Zulqarnain],
Shah, M.[Mubarak],
Video Description:
A Survey of Methods, Datasets, and Evaluation Metrics,
Surveys(52), No. 6, October 2019, pp. xx-yy.
DOI Link
2001
video to text, Video description, video captioning, language in vision
BibRef
Zhang, Z.,
Xu, D.,
Ouyang, W.,
Tan, C.,
Show, Tell and Summarize: Dense Video Captioning Using Visual Cue
Aided Sentence Summarization,
CirSysVideo(30), No. 9, September 2020, pp. 3130-3139.
IEEE DOI
2009
Proposals, Visualization, Image segmentation, Feature extraction,
Semantics, Decoding, Task analysis, Dense video captioning,
hierarchical attention mechanism
BibRef
Zhang, W.[Wei],
Wang, B.R.[Bai-Rui],
Ma, L.[Lin],
Liu, W.[Wei],
Reconstruct and Represent Video Contents for Captioning via
Reinforcement Learning,
PAMI(42), No. 12, December 2020, pp. 3088-3101.
IEEE DOI
2011
Decoding, Image reconstruction, Semantics, Training data,
Visualization, Video sequences, Video captioning,
backward information
BibRef
Lee, S.[Sujin],
Kim, I.[Incheol],
DVC-Net: A deep neural network model for dense video captioning,
IET-CV(15), No. 1, 2021, pp. 12-23.
DOI Link
2106
BibRef
Qi, S.S.[Shan-Shan],
Yang, L.X.[Lu-Xi],
Video captioning via a symmetric bidirectional decoder,
IET-CV(15), No. 4, 2021, pp. 283-296.
DOI Link
2106
BibRef
Li, L.[Linghui],
Zhang, Y.D.[Yong-Dong],
Tang, S.[Sheng],
Xie, L.X.[Ling-Xi],
Li, X.Y.[Xiao-Yong],
Tian, Q.[Qi],
Adaptive Spatial Location With Balanced Loss for Video Captioning,
CirSysVideo(32), No. 1, January 2022, pp. 17-30.
IEEE DOI
2201
Task analysis, Redundancy, Feature extraction, Visualization,
Detectors, Training, Convolutional neural network,
balanced loss
BibRef
Zheng, Y.[Yi],
Zhang, Y.[Yuejie],
Feng, R.[Rui],
Zhang, T.[Tao],
Fan, W.G.[Wei-Guo],
Stacked Multimodal Attention Network for Context-Aware Video
Captioning,
CirSysVideo(32), No. 1, January 2022, pp. 31-42.
IEEE DOI
2201
Feature extraction, Visualization, Decoding, Training,
Biological system modeling, Context modeling, Predictive models,
reinforcement learning
BibRef
Li, L.[Liang],
Gao, X.Y.[Xing-Yu],
Deng, J.[Jincan],
Tu, Y.[Yunbin],
Zha, Z.J.[Zheng-Jun],
Huang, Q.M.[Qing-Ming],
Long Short-Term Relation Transformer With Global Gating for Video
Captioning,
IP(31), 2022, pp. 2726-2738.
IEEE DOI
2204
Transformers, Cognition, Visualization, Feature extraction, Decoding,
Task analysis, Semantics, Video captioning, relational reasoning, transformer
BibRef
Munusamy, H.[Hemalatha],
Sekhar, C.C.[C. Chandra],
Video captioning using Semantically Contextual Generative Adversarial
Network,
CVIU(221), 2022, pp. 103453.
Elsevier DOI
2206
Video captioning, Generative adversarial network,
Reinforcement learning, Generator, Discriminator
BibRef
Wang, H.[Hao],
Lin, G.S.[Guo-Sheng],
Hoi, S.C.H.[Steven C. H.],
Miao, C.Y.[Chun-Yan],
Cross-Modal Graph With Meta Concepts for Video Captioning,
IP(31), 2022, pp. 5150-5162.
IEEE DOI
2208
Semantics, Visualization, Feature extraction, Predictive models,
Task analysis, Computational modeling, Location awareness, vision-and-language
BibRef
Xiao, H.[Huanhou],
Shi, J.L.[Jing-Lun],
Diverse video captioning through latent variable expansion,
PRL(160), 2022, pp. 19-25.
Elsevier DOI
2208
Latent variables, Diverse captions, CGAN
BibRef
Prudviraj, J.[Jeripothula],
Reddy, M.I.[Malipatel Indrakaran],
Vishnu, C.[Chalavadi],
Mohan, C.K.[Chalavadi Krishna],
AAP-MIT: Attentive Atrous Pyramid Network and Memory Incorporated
Transformer for Multisentence Video Description,
IP(31), 2022, pp. 5559-5569.
IEEE DOI
2209
Transformers, Streaming media, Task analysis, Visualization,
Video description, Correlation, Natural languages,
transformers
BibRef
Xu, W.[Wanru],
Miao, Z.J.[Zhen-Jiang],
Yu, J.[Jian],
Tian, Y.[Yi],
Wan, L.[Lili],
Ji, Q.[Qiang],
Bridging Video and Text:
A Two-Step Polishing Transformer for Video Captioning,
CirSysVideo(32), No. 9, September 2022, pp. 6293-6307.
IEEE DOI
2209
Semantics, Visualization, Decoding, Transformers, Task analysis,
Planning, Training, Video captioning, transformer,
cross-modal modeling
BibRef
Wu, B.F.[Bo-Feng],
Niu, G.C.[Guo-Cheng],
Yu, J.[Jun],
Xiao, X.Y.[Xin-Yan],
Zhang, J.[Jian],
Wu, H.[Hua],
Towards Knowledge-Aware Video Captioning via Transitive Visual
Relationship Detection,
CirSysVideo(32), No. 10, October 2022, pp. 6753-6765.
IEEE DOI
2210
Visualization, Task analysis, Semantics, Feature extraction,
Decoding, Training, Vocabulary, Video captioning,
natural language processing
BibRef
Yan, L.Q.[Li-Qi],
Ma, S.Q.[Si-Qi],
Wang, Q.F.[Qi-Fan],
Chen, Y.J.[Ying-Jie],
Zhang, X.Y.[Xiang-Yu],
Savakis, A.[Andreas],
Liu, D.F.[Dong-Fang],
Video Captioning Using Global-Local Representation,
CirSysVideo(32), No. 10, October 2022, pp. 6642-6656.
IEEE DOI
2210
Training, Task analysis, Visualization, Vocabulary, Semantics,
Decoding, Correlation, video captioning, video representation,
visual analysis
BibRef
Subramaniam, A.[Arulkumar],
Vaidya, J.[Jayesh],
Ameen, M.A.M.[Muhammed Abdul Majeed],
Nambiar, A.[Athira],
Mittal, A.[Anurag],
Co-segmentation inspired attention module for video-based computer
vision tasks,
CVIU(223), 2022, pp. 103532.
Elsevier DOI
2210
Attention, Co-segmentation, Person re-ID, Video-captioning, Video classification
BibRef
Liu, F.L.[Feng-Lin],
Wu, X.[Xian],
You, C.[Chenyu],
Ge, S.[Shen],
Zou, Y.X.[Yue-Xian],
Sun, X.[Xu],
Aligning Source Visual and Target Language Domains for Unpaired Video
Captioning,
PAMI(44), No. 12, December 2022, pp. 9255-9268.
IEEE DOI
2212
Visualization, Pipelines, Training, Data models, Decoding,
Task analysis, Feature extraction, Video captioning, adversarial training
BibRef
Yuan, Y.T.[Yi-Tian],
Ma, L.[Lin],
Zhu, W.[Wenwu],
Syntax Customized Video Captioning by Imitating Exemplar Sentences,
PAMI(44), No. 12, December 2022, pp. 10209-10221.
IEEE DOI
2212
Syntactics, Semantics, Task analysis, Training, Decoding, Encoding,
Recurrent neural networks, Video captioning,
recurrent neural network
BibRef
Chen, H.R.[Hao-Ran],
Li, J.[Jianmin],
Frintrop, S.[Simone],
Hu, X.L.[Xiao-Lin],
The MSR-Video to Text dataset with clean annotations,
CVIU(225), 2022, pp. 103581.
Elsevier DOI
2212
MSR-VTT dataset, Data cleaning, Data analysis, Video captioning
BibRef
Moctezuma, D.[Daniela],
Ramírez-delReal, T.[Tania],
Ruiz, G.[Guillermo],
González-Chávez, O.[Othón],
Video captioning: A comparative review of where we are and which
could be the route,
CVIU(231), 2023, pp. 103671.
Elsevier DOI
2305
Natural language processing, Video captioning, Image understanding
BibRef
Ye, H.H.[Han-Hua],
Li, G.R.[Guo-Rong],
Qi, Y.[Yuankai],
Wang, S.H.[Shu-Hui],
Huang, Q.M.[Qing-Ming],
Yang, M.H.[Ming-Hsuan],
Hierarchical Modular Network for Video Captioning,
CVPR22(17918-17927)
IEEE DOI
2210
Bridges, Representation learning, Visualization, Semantics,
Supervised learning, Linguistics, Vision + language
BibRef
Lin, K.[Kevin],
Li, L.J.[Lin-Jie],
Lin, C.C.[Chung-Ching],
Ahmed, F.[Faisal],
Gan, Z.[Zhe],
Liu, Z.C.[Zi-Cheng],
Lu, Y.[Yumao],
Wang, L.J.[Li-Juan],
SwinBERT: End-to-End Transformers with Sparse Attention for Video
Captioning,
CVPR22(17928-17937)
IEEE DOI
2210
Adaptation models, Video sequences, Redundancy, Natural languages,
Transformers, Feature extraction, Vision + language
BibRef
Shi, Y.[Yaya],
Yang, X.[Xu],
Xu, H.Y.[Hai-Yang],
Yuan, C.F.[Chun-Feng],
Li, B.[Bing],
Hu, W.M.[Wei-Ming],
Zha, Z.J.[Zheng-Jun],
EMScore: Evaluating Video Captioning via Coarse-Grained and
Fine-Grained Embedding Matching,
CVPR22(17908-17917)
IEEE DOI
2210
Measurement, Visualization, Correlation, Systematics,
Computational modeling, Semantics, Vision + language
BibRef
Chen, S.X.[Shao-Xiang],
Jiang, Y.G.[Yu-Gang],
Motion Guided Region Message Passing for Video Captioning,
ICCV21(1523-1532)
IEEE DOI
2203
Location awareness, Visualization, Message passing,
Computational modeling, Detectors, Feature extraction,
Video analysis and understanding
BibRef
Joshi, P.,
Saharia, C.,
Singh, V.,
Gautam, D.,
Ramakrishnan, G.,
Jyothi, P.,
A Tale of Two Modalities for Video Captioning,
MMVAMTC19(3708-3712)
IEEE DOI
2004
audio signal processing, learning (artificial intelligence),
natural language processing, text analysis, multi modal
BibRef
Wang, T.[Teng],
Zhang, R.[Ruimao],
Lu, Z.C.[Zhi-Chao],
Zheng, F.[Feng],
Cheng, R.[Ran],
Luo, P.[Ping],
End-to-End Dense Video Captioning with Parallel Decoding,
ICCV21(6827-6837)
IEEE DOI
2203
Location awareness, Handheld computers, Stacking, Redundancy,
Pipelines, Transformers, Decoding,
Vision + language
BibRef
Yang, B.[Bang],
Zou, Y.X.[Yue-Xian],
Visual Oriented Encoder: Integrating Multimodal and Multi-Scale
Contexts for Video Captioning,
ICPR21(188-195)
IEEE DOI
2105
Visualization, Semantics, Natural languages, Benchmark testing,
Feature extraction, Encoding, Data mining
BibRef
Perez-Martin, J.[Jesus],
Bustos, B.[Benjamin],
Pérez, J.[Jorge],
Attentive Visual Semantic Specialized Network for Video Captioning,
ICPR21(5767-5774)
IEEE DOI
2105
Visualization, Adaptation models, Video description, Semantics,
Logic gates, Syntactics,
video captioning
BibRef
Lu, M.[Min],
Li, X.[Xueyong],
Liu, C.[Caihua],
Context Visual Information-based Deliberation Network for Video
Captioning,
ICPR21(9812-9818)
IEEE DOI
2105
Visualization, Semantics, Coherence, Benchmark testing,
Pattern recognition, Decoding
BibRef
Olivastri, S.,
Singh, G.,
Cuzzolin, F.,
End-to-End Video Captioning,
HVU19(1474-1482)
IEEE DOI
2004
convolutional neural nets, decoding, image recognition,
learning (artificial intelligence), recurrent neural nets
BibRef
Li, L.,
Gong, B.,
End-to-End Video Captioning With Multitask Reinforcement Learning,
WACV19(339-348)
IEEE DOI
1904
convolutional neural nets,
learning (artificial intelligence), recurrent neural nets,
Hardware
BibRef
Wang, B.,
Ma, L.,
Zhang, W.,
Liu, W.,
Reconstruction Network for Video Captioning,
CVPR18(7622-7631)
IEEE DOI
1812
Decoding, Semantics, Image reconstruction, Video sequences,
Visualization, Feature extraction, Natural languages
BibRef
Li, Y.,
Yao, T.,
Pan, Y.,
Chao, H.,
Mei, T.,
Jointly Localizing and Describing Events for Dense Video Captioning,
CVPR18(7492-7500)
IEEE DOI
1812
Proposals, Dogs, Complexity theory, Task analysis, Training, Optimization
BibRef
Wang, J.,
Jiang, W.,
Ma, L.,
Liu, W.,
Xu, Y.,
Bidirectional Attentive Fusion with Context Gating for Dense Video
Captioning,
CVPR18(7190-7198)
IEEE DOI
1812
Proposals, Visualization, Task analysis, Video sequences, Fuses,
Semantics, Feature extraction
BibRef
Wu, X.,
Li, G.,
Cao, Q.,
Ji, Q.,
Lin, L.,
Interpretable Video Captioning via Trajectory Structured Localization,
CVPR18(6829-6837)
IEEE DOI
1812
Trajectory, Feature extraction, Decoding, Visualization, Semantics,
Recurrent neural networks
BibRef
Wang, X.,
Chen, W.,
Wu, J.,
Wang, Y.,
Wang, W.Y.,
Video Captioning via Hierarchical Reinforcement Learning,
CVPR18(4213-4222)
IEEE DOI
1812
Task analysis, Semantics, Dogs, Neural networks,
Portable computers
BibRef
Zhou, L.,
Zhou, Y.,
Corso, J.J.,
Socher, R.,
Xiong, C.,
End-to-End Dense Video Captioning with Masked Transformer,
CVPR18(8739-8748)
IEEE DOI
1812
Proposals, Decoding, Encoding, Hidden Markov models, Feeds, Training,
Visualization
BibRef
Yang, D.,
Yuan, C.,
Hierarchical Context Encoding for Events Captioning in Videos,
ICIP18(1288-1292)
IEEE DOI
1809
Videos, Proposals, Task analysis, Mathematical model,
Computational modeling, Decoding, Measurement, Video captioning,
video summarization
BibRef
Shen, Z.Q.[Zhi-Qiang],
Li, J.G.[Jian-Guo],
Su, Z.[Zhou],
Li, M.J.[Min-Jun],
Chen, Y.R.[Yu-Rong],
Jiang, Y.G.[Yu-Gang],
Xue, X.Y.[Xiang-Yang],
Weakly Supervised Dense Video Captioning,
CVPR17(5159-5167)
IEEE DOI
1711
Motion segmentation, Neural networks, Training,
Visualization, Vocabulary
BibRef
Baraldi, L.,
Grana, C.,
Cucchiara, R.,
Hierarchical Boundary-Aware Neural Encoder for Video Captioning,
CVPR17(3185-3194)
IEEE DOI
1711
Encoding, Logic gates, Microprocessors,
Motion pictures, Streaming media, Visualization
BibRef
Pan, P.B.[Ping-Bo],
Xu, Z.W.[Zhong-Wen],
Yang, Y.[Yi],
Wu, F.[Fei],
Zhuang, Y.T.[Yue-Ting],
Hierarchical Recurrent Neural Encoder for Video Representation with
Application to Captioning,
CVPR16(1029-1038)
IEEE DOI
1612
Video captioning, where temporal information plays a crucial role.
BibRef
Yu, H.N.[Hao-Nan],
Wang, J.[Jiang],
Huang, Z.H.[Zhi-Heng],
Yang, Y.[Yi],
Xu, W.[Wei],
Video Paragraph Captioning Using Hierarchical Recurrent Neural
Networks,
CVPR16(4584-4593)
IEEE DOI
1612
Generating one or multiple sentences to describe a realistic video.
BibRef
Shin, A.[Andrew],
Ohnishi, K.[Katsunori],
Harada, T.[Tatsuya],
Beyond caption to narrative: Video captioning with multiple sentences,
ICIP16(3364-3368)
IEEE DOI
1610
Feature extraction
BibRef
Chapter on Implementations and Applications, Databases, QBIC, Video Analysis, Hardware and Software, Inspection continues in
Video Summarization, Abstract, MPEG Based, AVC, H264, MPEG Metadata.