19.4.5.6.1 Video Captioning

See also Annotation, Captioning, Image Captioning.

Qiu, Z.F.[Zhao-Fan], Yao, T.[Ting], Mei, T.[Tao],
Learning Deep Spatio-Temporal Dependence for Semantic Video Segmentation,
MultMed(20), No. 4, April 2018, pp. 939-949.
IEEE DOI 1804
BibRef
Earlier:
Learning Spatio-Temporal Representation with Pseudo-3D Residual Networks,
ICCV17(5534-5542)
IEEE DOI 1802
3D from 2D nets. Image segmentation, Semantics, Streaming media, video segmentation, convolution, feature extraction, image classification, image recognition, image representation, Visualization BibRef

Qiu, Z.F.[Zhao-Fan], Yao, T.[Ting], Ngo, C.W.[Chong-Wah], Tian, X.M.[Xin-Mei], Mei, T.[Tao],
Learning Spatio-Temporal Representation With Local and Global Diffusion,
CVPR19(12048-12057).
IEEE DOI 2002
BibRef

Yao, T., Pan, Y., Li, Y., Qiu, Z., Mei, T.,
Boosting Image Captioning with Attributes,
ICCV17(4904-4912)
IEEE DOI 1802
BibRef
And: A2, A1, A3, A5, Only:
Video Captioning with Transferred Semantic Attributes,
CVPR17(984-992)
IEEE DOI 1711
image representation, learning (artificial intelligence), Semantics, Natural languages, Probability distribution, Recurrent neural networks, Visualization BibRef

Zhao, B., Li, X., Lu, X.,
CAM-RNN: Co-Attention Model Based RNN for Video Captioning,
IP(28), No. 11, November 2019, pp. 5552-5565.
IEEE DOI 1909
Visualization, Task analysis, Logic gates, Recurrent neural networks, Dogs, Semantics, Decoding, recurrent neural network BibRef

Yan, C., Tu, Y., Wang, X., Zhang, Y., Hao, X., Zhang, Y., Dai, Q.,
STAT: Spatial-Temporal Attention Mechanism for Video Captioning,
MultMed(22), No. 1, January 2020, pp. 229-241.
IEEE DOI 2001
BibRef
And: Corrections: MultMed(22), No. 3, March 2020, p. 830.
IEEE DOI 2003
Video captioning, spatial-temporal attention mechanism, encoder-decoder neural networks, Mechatronics, Automation, Streaming media BibRef

Aafaq, N.[Nayyer], Mian, A.[Ajmal], Liu, W.[Wei], Gilani, S.Z.[Syed Zulqarnain], Shah, M.[Mubarak],
Video Description: A Survey of Methods, Datasets, and Evaluation Metrics,
Surveys(52), No. 6, October 2019, pp. xx-yy.
DOI Link 2001
video to text, Video description, video captioning, language in vision BibRef

Zhang, Z., Xu, D., Ouyang, W., Tan, C.,
Show, Tell and Summarize: Dense Video Captioning Using Visual Cue Aided Sentence Summarization,
CirSysVideo(30), No. 9, September 2020, pp. 3130-3139.
IEEE DOI 2009
Proposals, Visualization, Image segmentation, Feature extraction, Semantics, Decoding, Task analysis, Dense video captioning, hierarchical attention mechanism BibRef

Zhang, W.[Wei], Wang, B.R.[Bai-Rui], Ma, L.[Lin], Liu, W.[Wei],
Reconstruct and Represent Video Contents for Captioning via Reinforcement Learning,
PAMI(42), No. 12, December 2020, pp. 3088-3101.
IEEE DOI 2011
Decoding, Image reconstruction, Semantics, Training data, Visualization, Video sequences, Video captioning, backward information BibRef

Lee, S.[Sujin], Kim, I.[Incheol],
DVC-Net: A deep neural network model for dense video captioning,
IET-CV(15), No. 1, 2021, pp. 12-23.
DOI Link 2106
BibRef

Qi, S.S.[Shan-Shan], Yang, L.X.[Lu-Xi],
Video captioning via a symmetric bidirectional decoder,
IET-CV(15), No. 4, 2021, pp. 283-296.
DOI Link 2106
BibRef

Li, L.[Linghui], Zhang, Y.D.[Yong-Dong], Tang, S.[Sheng], Xie, L.X.[Ling-Xi], Li, X.Y.[Xiao-Yong], Tian, Q.[Qi],
Adaptive Spatial Location With Balanced Loss for Video Captioning,
CirSysVideo(32), No. 1, January 2022, pp. 17-30.
IEEE DOI 2201
Task analysis, Redundancy, Feature extraction, Visualization, Detectors, Training, Convolutional neural network, balanced loss BibRef

Zheng, Y.[Yi], Zhang, Y.[Yuejie], Feng, R.[Rui], Zhang, T.[Tao], Fan, W.G.[Wei-Guo],
Stacked Multimodal Attention Network for Context-Aware Video Captioning,
CirSysVideo(32), No. 1, January 2022, pp. 31-42.
IEEE DOI 2201
Feature extraction, Visualization, Decoding, Training, Biological system modeling, Context modeling, Predictive models, reinforcement learning BibRef

Li, L.[Liang], Gao, X.Y.[Xing-Yu], Deng, J.[Jincan], Tu, Y.[Yunbin], Zha, Z.J.[Zheng-Jun], Huang, Q.M.[Qing-Ming],
Long Short-Term Relation Transformer With Global Gating for Video Captioning,
IP(31), 2022, pp. 2726-2738.
IEEE DOI 2204
Transformers, Cognition, Visualization, Feature extraction, Decoding, Task analysis, Semantics, Video captioning, relational reasoning, transformer BibRef

Munusamy, H.[Hemalatha], Sekhar, C.C.[C. Chandra],
Video captioning using Semantically Contextual Generative Adversarial Network,
CVIU(221), 2022, pp. 103453.
Elsevier DOI 2206
Video captioning, Generative adversarial network, Reinforcement learning, Generator, Discriminator BibRef

Wang, H.[Hao], Lin, G.S.[Guo-Sheng], Hoi, S.C.H.[Steven C. H.], Miao, C.Y.[Chun-Yan],
Cross-Modal Graph With Meta Concepts for Video Captioning,
IP(31), 2022, pp. 5150-5162.
IEEE DOI 2208
Semantics, Visualization, Feature extraction, Predictive models, Task analysis, Computational modeling, Location awareness, vision-and-language BibRef

Xiao, H.[Huanhou], Shi, J.L.[Jing-Lun],
Diverse video captioning through latent variable expansion,
PRL(160), 2022, pp. 19-25.
Elsevier DOI 2208
Latent variables, Diverse captions, CGAN BibRef

Prudviraj, J.[Jeripothula], Reddy, M.I.[Malipatel Indrakaran], Vishnu, C.[Chalavadi], Mohan, C.K.[Chalavadi Krishna],
AAP-MIT: Attentive Atrous Pyramid Network and Memory Incorporated Transformer for Multisentence Video Description,
IP(31), 2022, pp. 5559-5569.
IEEE DOI 2209
Transformers, Streaming media, Task analysis, Visualization, Video description, Correlation, Natural languages, transformers BibRef

Xu, W.[Wanru], Miao, Z.J.[Zhen-Jiang], Yu, J.[Jian], Tian, Y.[Yi], Wan, L.[Lili], Ji, Q.[Qiang],
Bridging Video and Text: A Two-Step Polishing Transformer for Video Captioning,
CirSysVideo(32), No. 9, September 2022, pp. 6293-6307.
IEEE DOI 2209
Semantics, Visualization, Decoding, Transformers, Task analysis, Planning, Training, Video captioning, transformer, cross-modal modeling BibRef

Wu, B.F.[Bo-Feng], Niu, G.C.[Guo-Cheng], Yu, J.[Jun], Xiao, X.Y.[Xin-Yan], Zhang, J.[Jian], Wu, H.[Hua],
Towards Knowledge-Aware Video Captioning via Transitive Visual Relationship Detection,
CirSysVideo(32), No. 10, October 2022, pp. 6753-6765.
IEEE DOI 2210
Visualization, Task analysis, Semantics, Feature extraction, Decoding, Training, Vocabulary, Video captioning, natural language process BibRef

Yan, L.Q.[Li-Qi], Ma, S.Q.[Si-Qi], Wang, Q.F.[Qi-Fan], Chen, Y.J.[Ying-Jie], Zhang, X.Y.[Xiang-Yu], Savakis, A.[Andreas], Liu, D.F.[Dong-Fang],
Video Captioning Using Global-Local Representation,
CirSysVideo(32), No. 10, October 2022, pp. 6642-6656.
IEEE DOI 2210
Training, Task analysis, Visualization, Vocabulary, Semantics, Decoding, Correlation, video captioning, video representation, visual analysis BibRef

Subramaniam, A.[Arulkumar], Vaidya, J.[Jayesh], Ameen, M.A.M.[Muhammed Abdul Majeed], Nambiar, A.[Athira], Mittal, A.[Anurag],
Co-segmentation inspired attention module for video-based computer vision tasks,
CVIU(223), 2022, pp. 103532.
Elsevier DOI 2210
Attention, Co-segmentation, Person re-ID, Video-captioning, Video classification BibRef

Liu, F.L.[Feng-Lin], Wu, X.[Xian], You, C.[Chenyu], Ge, S.[Shen], Zou, Y.X.[Yue-Xian], Sun, X.[Xu],
Aligning Source Visual and Target Language Domains for Unpaired Video Captioning,
PAMI(44), No. 12, December 2022, pp. 9255-9268.
IEEE DOI 2212
Visualization, Pipelines, Training, Data models, Decoding, Task analysis, Feature extraction, Video captioning, adversarial training BibRef

Yuan, Y.T.[Yi-Tian], Ma, L.[Lin], Zhu, W.[Wenwu],
Syntax Customized Video Captioning by Imitating Exemplar Sentences,
PAMI(44), No. 12, December 2022, pp. 10209-10221.
IEEE DOI 2212
Syntactics, Semantics, Task analysis, Training, Decoding, Encoding, Recurrent neural networks, Video captioning, recurrent neural network BibRef

Chen, H.R.[Hao-Ran], Li, J.[Jianmin], Frintrop, S.[Simone], Hu, X.L.[Xiao-Lin],
The MSR-Video to Text dataset with clean annotations,
CVIU(225), 2022, pp. 103581.
Elsevier DOI 2212
MSR-VTT dataset, Data cleaning, Data analysis, Video captioning BibRef

Moctezuma, D.[Daniela], Ramírez-delReal, T.[Tania], Ruiz, G.[Guillermo], González-Chávez, O.[Othón],
Video captioning: A comparative review of where we are and which could be the route,
CVIU(231), 2023, pp. 103671.
Elsevier DOI 2305
Natural language processing, Video captioning, Image understanding BibRef


Seo, P.H.[Paul Hongsuck], Nagrani, A.[Arsha], Arnab, A.[Anurag], Schmid, C.[Cordelia],
End-to-end Generative Pretraining for Multimodal Video Captioning,
CVPR22(17938-17947)
IEEE DOI 2210
Representation learning, Computational modeling, Bidirectional control, Benchmark testing, Decoding, Self- semi- meta- unsupervised learning BibRef

Ye, H.H.[Han-Hua], Li, G.R.[Guo-Rong], Qi, Y.[Yuankai], Wang, S.H.[Shu-Hui], Huang, Q.M.[Qing-Ming], Yang, M.H.[Ming-Hsuan],
Hierarchical Modular Network for Video Captioning,
CVPR22(17918-17927)
IEEE DOI 2210
Bridges, Representation learning, Visualization, Semantics, Supervised learning, Linguistics, Vision + language BibRef

Lin, K.[Kevin], Li, L.J.[Lin-Jie], Lin, C.C.[Chung-Ching], Ahmed, F.[Faisal], Gan, Z.[Zhe], Liu, Z.C.[Zi-Cheng], Lu, Y.[Yumao], Wang, L.J.[Li-Juan],
SwinBERT: End-to-End Transformers with Sparse Attention for Video Captioning,
CVPR22(17928-17937)
IEEE DOI 2210
Adaptation models, Video sequences, Redundancy, Natural languages, Transformers, Feature extraction, Vision + language BibRef

Shi, Y.[Yaya], Yang, X.[Xu], Xu, H.Y.[Hai-Yang], Yuan, C.F.[Chun-Feng], Li, B.[Bing], Hu, W.M.[Wei-Ming], Zha, Z.J.[Zheng-Jun],
EMScore: Evaluating Video Captioning via Coarse-Grained and Fine-Grained Embedding Matching,
CVPR22(17908-17917)
IEEE DOI 2210
Measurement, Visualization, Correlation, Systematics, Computational modeling, Semantics, Vision + language BibRef

Chen, S.X.[Shao-Xiang], Jiang, Y.G.[Yu-Gang],
Motion Guided Region Message Passing for Video Captioning,
ICCV21(1523-1532)
IEEE DOI 2203
Location awareness, Visualization, Message passing, Computational modeling, Detectors, Feature extraction, Video analysis and understanding BibRef

Joshi, P., Saharia, C., Singh, V., Gautam, D., Ramakrishnan, G., Jyothi, P.,
A Tale of Two Modalities for Video Captioning,
MMVAMTC19(3708-3712)
IEEE DOI 2004
audio signal processing, learning (artificial intelligence), natural language processing, text analysis, multi modal BibRef

Wang, T.[Teng], Zhang, R.[Ruimao], Lu, Z.C.[Zhi-Chao], Zheng, F.[Feng], Cheng, R.[Ran], Luo, P.[Ping],
End-to-End Dense Video Captioning with Parallel Decoding,
ICCV21(6827-6837)
IEEE DOI 2203
Location awareness, Handheld computers, Stacking, Redundancy, Pipelines, Transformers, Decoding, Vision + language BibRef

Yang, B.[Bang], Zou, Y.X.[Yue-Xian],
Visual Oriented Encoder: Integrating Multimodal and Multi-Scale Contexts for Video Captioning,
ICPR21(188-195)
IEEE DOI 2105
Visualization, Semantics, Natural languages, Benchmark testing, Feature extraction, Encoding, Data mining BibRef

Perez-Martin, J.[Jesus], Bustos, B.[Benjamin], Pérez, J.[Jorge],
Attentive Visual Semantic Specialized Network for Video Captioning,
ICPR21(5767-5774)
IEEE DOI 2105
Visualization, Adaptation models, Video description, Semantics, Logic gates, Syntactics, video captioning BibRef

Lu, M.[Min], Li, X.[Xueyong], Liu, C.[Caihua],
Context Visual Information-based Deliberation Network for Video Captioning,
ICPR21(9812-9818)
IEEE DOI 2105
Visualization, Semantics, Coherence, Benchmark testing, Pattern recognition, Decoding BibRef

Olivastri, S., Singh, G., Cuzzolin, F.,
End-to-End Video Captioning,
HVU19(1474-1482)
IEEE DOI 2004
convolutional neural nets, decoding, image recognition, learning (artificial intelligence), recurrent neural nets BibRef

Li, L., Gong, B.,
End-to-End Video Captioning With Multitask Reinforcement Learning,
WACV19(339-348)
IEEE DOI 1904
convolutional neural nets, learning (artificial intelligence), recurrent neural nets, Hardware BibRef

Wang, B., Ma, L., Zhang, W., Liu, W.,
Reconstruction Network for Video Captioning,
CVPR18(7622-7631)
IEEE DOI 1812
Decoding, Semantics, Image reconstruction, Video sequences, Visualization, Feature extraction, Natural languages BibRef

Li, Y., Yao, T., Pan, Y., Chao, H., Mei, T.,
Jointly Localizing and Describing Events for Dense Video Captioning,
CVPR18(7492-7500)
IEEE DOI 1812
Proposals, Dogs, Complexity theory, Task analysis, Training, Optimization BibRef

Wang, J., Jiang, W., Ma, L., Liu, W., Xu, Y.,
Bidirectional Attentive Fusion with Context Gating for Dense Video Captioning,
CVPR18(7190-7198)
IEEE DOI 1812
Proposals, Visualization, Task analysis, Video sequences, Fuses, Semantics, Feature extraction BibRef

Wu, X., Li, G., Cao, Q., Ji, Q., Lin, L.,
Interpretable Video Captioning via Trajectory Structured Localization,
CVPR18(6829-6837)
IEEE DOI 1812
Trajectory, Feature extraction, Decoding, Visualization, Semantics, Recurrent neural networks BibRef

Wang, X., Chen, W., Wu, J., Wang, Y., Wang, W.Y.,
Video Captioning via Hierarchical Reinforcement Learning,
CVPR18(4213-4222)
IEEE DOI 1812
Task analysis, Semantics, Dogs, Neural networks, Portable computers BibRef

Zhou, L., Zhou, Y., Corso, J.J., Socher, R., Xiong, C.,
End-to-End Dense Video Captioning with Masked Transformer,
CVPR18(8739-8748)
IEEE DOI 1812
Proposals, Decoding, Encoding, Hidden Markov models, Feeds, Training, Visualization BibRef

Yang, D., Yuan, C.,
Hierarchical Context Encoding for Events Captioning in Videos,
ICIP18(1288-1292)
IEEE DOI 1809
Videos, Proposals, Task analysis, Mathematical model, Computational modeling, Decoding, Measurement, Video captioning, video summarization BibRef

Shen, Z.Q.[Zhi-Qiang], Li, J.G.[Jian-Guo], Su, Z.[Zhou], Li, M.J.[Min-Jun], Chen, Y.R.[Yu-Rong], Jiang, Y.G.[Yu-Gang], Xue, X.Y.[Xiang-Yang],
Weakly Supervised Dense Video Captioning,
CVPR17(5159-5167)
IEEE DOI 1711
Motion segmentation, Neural networks, Training, Visualization, Vocabulary BibRef

Baraldi, L., Grana, C., Cucchiara, R.,
Hierarchical Boundary-Aware Neural Encoder for Video Captioning,
CVPR17(3185-3194)
IEEE DOI 1711
Encoding, Logic gates, Microprocessors, Motion pictures, Streaming media, Visualization BibRef

Pan, P.B.[Ping-Bo], Xu, Z.W.[Zhong-Wen], Yang, Y.[Yi], Wu, F.[Fei], Zhuang, Y.T.[Yue-Ting],
Hierarchical Recurrent Neural Encoder for Video Representation with Application to Captioning,
CVPR16(1029-1038)
IEEE DOI 1612
Video captioning, where temporal information plays a crucial role. BibRef

Yu, H.N.[Hao-Nan], Wang, J.[Jiang], Huang, Z.H.[Zhi-Heng], Yang, Y.[Yi], Xu, W.[Wei],
Video Paragraph Captioning Using Hierarchical Recurrent Neural Networks,
CVPR16(4584-4593)
IEEE DOI 1612
Generating one or multiple sentences to describe a realistic video. BibRef

Shin, A.[Andrew], Ohnishi, K.[Katsunori], Harada, T.[Tatsuya],
Beyond caption to narrative: Video captioning with multiple sentences,
ICIP16(3364-3368)
IEEE DOI 1610
Feature extraction BibRef

Chapter on Implementations and Applications, Databases, QBIC, Video Analysis, Hardware and Software, Inspection continues in
Video Summarization, Abstract, MPEG Based, AVC, H264, MPEG Metadata.


Last update: Jun 1, 2023 at 10:05:03