Verma, Y.[Yashaswi],
Jawahar, C.V.,
A support vector approach for cross-modal search of images and texts,
CVIU(154), No. 1, 2017, pp. 48-63.
Elsevier DOI
1612
Image search
BibRef
Dutta, A.[Ayushi],
Verma, Y.[Yashaswi],
Jawahar, C.V.,
Recurrent Image Annotation with Explicit Inter-Label Dependencies,
ECCV20(XXIX: 191-207).
Springer DOI
2010
BibRef
Xue, J.F.[Jian-Fei],
Eguchi, K.[Koji],
Video Data Modeling Using Sequential Correspondence Hierarchical
Dirichlet Processes,
IEICE(E100-D), No. 1, January 2017, pp. 33-41.
WWW Link.
1701
multimodal data such as the mixture of visual words and speech words
extracted from video files
BibRef
Liu, A.A.[An-An],
Xu, N.[Ning],
Wong, Y.K.[Yong-Kang],
Li, J.[Junnan],
Su, Y.T.[Yu-Ting],
Kankanhalli, M.[Mohan],
Hierarchical & multimodal video captioning: Discovering and
transferring multimodal knowledge for vision to language,
CVIU(163), No. 1, 2017, pp. 113-125.
Elsevier DOI
1712
Video to text
BibRef
Guan, J.N.[Jin-Ning],
Wang, E.[Eric],
Repeated review based image captioning for image evidence review,
SP:IC(63), 2018, pp. 141-148.
Elsevier DOI
1804
Repeated review, Image captioning, Encoder-decoder, Multimodal layer
BibRef
Hu, M.,
Yang, Y.,
Shen, F.,
Zhang, L.,
Shen, H.T.,
Li, X.,
Robust Web Image Annotation via Exploring Multi-Facet and Structural
Knowledge,
IP(26), No. 10, October 2017, pp. 4871-4884.
IEEE DOI
1708
image annotation, image retrieval, iterative methods,
learning (artificial intelligence), multimedia systems,
optimisation, pattern classification, RMSL,
data structural information,
digital technologies,
image semantic indexing, image semantic retrieval,
robust multiview semi-supervised learning, visual features,
Manifolds, Multimedia communication, Semantics,
Semisupervised learning, Supervised learning, Image annotation,
l2,p-norm, multi-view learning, semi-supervised learning
BibRef
Gil-Gonzalez, J.,
Alvarez-Meza, A.,
Orozco-Gutierrez, A.,
Learning from multiple annotators using kernel alignment,
PRL(116), 2018, pp. 150-156.
Elsevier DOI
1812
Multiple annotators, Kernel methods, Classification
BibRef
Zheng, H.[He],
Wu, J.H.[Jia-Hong],
Liang, R.[Rui],
Li, Y.[Ye],
Li, X.Z.[Xu-Zhi],
Multi-task learning for captioning images with novel words,
IET-CV(13), No. 3, April 2019, pp. 294-301.
DOI Link
1904
BibRef
Park, C.C.,
Kim, B.,
Kim, G.,
Towards Personalized Image Captioning via Multimodal Memory Networks,
PAMI(41), No. 4, April 2019, pp. 999-1012.
IEEE DOI
1903
BibRef
Earlier:
Attend to You: Personalized Image Captioning with Context Sequence
Memory Networks,
CVPR17(6432-6440)
IEEE DOI
1711
Tagging, Twitter, Task analysis, Computational modeling, Writing,
Vocabulary, Context modeling, Image captioning, personalization,
convolutional neural networks.
Pattern recognition
BibRef
Yang, M.,
Zhao, W.,
Xu, W.,
Feng, Y.,
Zhao, Z.,
Chen, X.,
Lei, K.,
Multitask Learning for Cross-Domain Image Captioning,
MultMed(21), No. 4, April 2019, pp. 1047-1061.
IEEE DOI
1903
Task analysis, Image generation, Data models, Training data,
Neural networks, Training, Maximum likelihood estimation,
reinforcement learning
BibRef
Yu, N.,
Hu, X.,
Song, B.,
Yang, J.,
Zhang, J.,
Topic-Oriented Image Captioning Based on Order-Embedding,
IP(28), No. 6, June 2019, pp. 2743-2754.
IEEE DOI
1905
image classification, image matching, image retrieval,
learning (artificial intelligence),
cross-modal retrieval
BibRef
Li, X.,
Xu, C.,
Wang, X.,
Lan, W.,
Jia, Z.,
Yang, G.,
Xu, J.,
COCO-CN for Cross-Lingual Image Tagging, Captioning, and Retrieval,
MultMed(21), No. 9, September 2019, pp. 2347-2360.
IEEE DOI
1909
Image annotation, Task analysis, Training, Image retrieval, Internet,
Streaming media, Visualization, COCO-CN, Chinese language,
image retrieval
BibRef
Tian, C.[Chunna],
Tian, M.[Ming],
Jiang, M.M.[Meng-Meng],
Liu, H.[Heng],
Deng, D.H.[Dong-Hu],
How much do cross-modal related semantics benefit image captioning by
weighting attributes and re-ranking sentences?,
PRL(125), 2019, pp. 639-645.
Elsevier DOI
1909
Semantic attributes, Attribute reweighting,
Cross-modal related semantics, Sentence re-ranking
BibRef
Niu, Y.,
Lu, Z.,
Wen, J.,
Xiang, T.,
Chang, S.,
Multi-Modal Multi-Scale Deep Learning for Large-Scale Image
Annotation,
IP(28), No. 4, April 2019, pp. 1720-1731.
IEEE DOI
1901
feature extraction, image classification, image fusion,
image representation, learning (artificial intelligence),
label quantity prediction
BibRef
Huang, Y.,
Chen, J.,
Ouyang, W.,
Wan, W.,
Xue, Y.,
Image Captioning With End-to-End Attribute Detection and Subsequent
Attributes Prediction,
IP(29), 2020, pp. 4013-4026.
IEEE DOI
2002
Image captioning, semantic attention, end-to-end training,
multimodal attribute detector, subsequent attribute predictor
BibRef
Zhao, W.,
Wu, X.,
Luo, J.,
Cross-Domain Image Captioning via Cross-Modal Retrieval and Model
Adaptation,
IP(30), 2021, pp. 1180-1192.
IEEE DOI
2012
Adaptation models, Task analysis, Visualization,
Computational modeling, Linguistics, Semantics, Image segmentation,
model adaptation
BibRef
Wang, H.[Hang],
Du, Y.T.[You-Tian],
Zhang, G.X.[Guang-Xun],
Cai, Z.M.[Zhong-Min],
Su, C.[Chang],
Learning Fundamental Visual Concepts Based on Evolved Multi-Edge
Concept Graph,
MultMed(23), 2021, pp. 4400-4413.
IEEE DOI
2112
Visualization, Semantics, Image annotation, Image edge detection,
Data models, Adaptation models, Task analysis,
cross media
BibRef
Zhang, J.,
Mei, K.,
Zheng, Y.,
Fan, J.,
Integrating Part of Speech Guidance for Image Captioning,
MultMed(23), 2021, pp. 92-104.
IEEE DOI
2012
Visualization, Predictive models, Semantics, Feature extraction,
Task analysis, Speech processing, Part of speech,
multi-task learning
BibRef
Kim, D.J.[Dong-Jin],
Oh, T.H.[Tae-Hyun],
Choi, J.[Jinsoo],
Kweon, I.S.[In So],
Dense Relational Image Captioning via Multi-Task Triple-Stream
Networks,
PAMI(44), No. 11, November 2022, pp. 7348-7362.
IEEE DOI
2210
BibRef
Earlier: A1, A3, A2, A4:
Dense Relational Captioning: Triple-Stream Networks for
Relationship-Based Captioning,
CVPR19(6264-6273).
IEEE DOI
2002
Task analysis, Visualization, Proposals, Dogs, Motorcycles,
Natural languages, Genomics, Dense captioning, image captioning,
scene graph
BibRef
Nguyen, T.S.[Thanh-Son],
Fernando, B.[Basura],
Effective Multimodal Encoding for Image Paragraph Captioning,
IP(31), 2022, pp. 6381-6395.
IEEE DOI
2211
Image coding, Visualization, Encoding, Generators, Training,
Image reconstruction, Decoding, Multimodal encoding generation, autoencoder
BibRef
Duan, Y.Q.[Yi-Qun],
Wang, Z.[Zhen],
Li, Y.[Yi],
Wang, J.Y.[Jing-Ya],
Cross-domain multi-style merge for image captioning,
CVIU(228), 2023, pp. 103617.
Elsevier DOI
2302
Vision and language, Image captioning, Controllable generation
BibRef
Wu, X.X.[Xin-Xiao],
Li, T.[Tong],
Sentimental Visual Captioning using Multimodal Transformer,
IJCV(131), No. 1, January 2023, pp. 1073-1090.
Springer DOI
2303
BibRef
Ding, Z.W.[Zhi-Wei],
Lan, G.L.[Gui-Lin],
Song, Y.Z.[Yan-Zhi],
Yang, Z.W.[Zhou-Wang],
SGIR: Star Graph-Based Interaction for Efficient and Robust
Multimodal Representation,
MultMed(26), 2024, pp. 4217-4229.
IEEE DOI
2403
Multimodal representation method that learns private and hub
representations of modalities.
Stars, Feature extraction, Visualization, Transformers,
Task analysis, Medical services, Noise measurement, modal interaction
BibRef
Zhao, W.T.[Wen-Tian],
Wu, X.X.[Xin-Xiao],
Boosting Entity-Aware Image Captioning With Multi-Modal Knowledge
Graph,
MultMed(26), 2024, pp. 2659-2670.
IEEE DOI
2402
Visualization, Internet, Knowledge graphs, Online services, Encyclopedias,
Knowledge engineering, Knowledge based systems, knowledge graph
BibRef
Gao, J.L.[Jun-Long],
Li, J.[Jiguo],
Jia, C.M.[Chuan-Min],
Wang, S.S.[Shan-She],
Ma, S.W.[Si-Wei],
Gao, W.[Wen],
Cross Modal Compression With Variable Rate Prompt,
MultMed(26), 2024, pp. 3444-3456.
IEEE DOI
2402
Into text description.
Image coding, Semantics, Earth Observing System, Decoding,
Visualization, Training, Measurement, Cross modal compression,
variable rate prompt
BibRef
Gao, J.L.[Jun-Long],
Jia, C.M.[Chuan-Min],
Huang, Z.M.[Zhi-Meng],
Wang, S.S.[Shan-She],
Ma, S.W.[Si-Wei],
Gao, W.[Wen],
Rate-Distortion Optimized Cross Modal Compression With Multiple
Domains,
CirSysVideo(34), No. 8, August 2024, pp. 6978-6992.
IEEE DOI
2408
Image coding, Semantics, Rate-distortion, Image reconstruction,
Transform coding, Distortion, Image edge detection,
reinforcement learning
BibRef
Cao, S.[Shan],
An, G.[Gaoyun],
Cen, Y.G.[Yi-Gang],
Yang, Z.Q.[Zhao-Qilin],
Lin, W.S.[Wei-Si],
CAST: Cross-Modal Retrieval and Visual Conditioning for image
captioning,
PR(153), 2024, pp. 110555.
Elsevier DOI
2405
Image captioning, Image-text retriever,
Image and memory comprehender, Dual attention decoder
BibRef
Song, Z.J.[Zi-Jie],
Hu, Z.Z.[Zhen-Zhen],
Zhou, Y.[Yuanen],
Zhao, Y.[Ye],
Hong, R.C.[Ri-Chang],
Wang, M.[Meng],
Embedded Heterogeneous Attention Transformer for Cross-Lingual Image
Captioning,
MultMed(26), 2024, pp. 9008-9020.
IEEE DOI
2408
Visualization, Task analysis, Transformers, Tensors, Semantics,
Computational modeling, Cognition, Image captioning,
heterogeneous attention reasoning
BibRef
Li, Y.[Yinan],
Ji, J.Y.[Jia-Yi],
Sun, X.S.[Xiao-Shuai],
Zhou, Y.[Yiyi],
Luo, Y.P.[Yun-Peng],
Ji, R.R.[Rong-Rong],
M3ixup: A multi-modal data augmentation approach for image captioning,
PR(158), 2025, pp. 110941.
Elsevier DOI
2411
Image captioning, Multi-modal mixup, Data augmentation, Discriminate captioning
BibRef
Deng, H.Y.[Hong-Yu],
Xie, Y.S.[Yu-Shan],
Wang, Q.[Qi],
Wang, J.J.[Jian-Jun],
Ruan, W.J.[Wei-Jian],
Liu, W.[Wu],
Liu, Y.J.[Yong-Jin],
CDKM: Common and Distinct Knowledge Mining Network With Content
Interaction for Dense Captioning,
MultMed(26), 2024, pp. 10462-10473.
IEEE DOI
2411
Task analysis, Visualization, Feature extraction, Object detection,
Knowledge engineering, Correlation, Context modeling, multimodal
BibRef
Jin, B.[Bu],
Zheng, Y.P.[Yu-Peng],
Li, P.F.[Peng-Fei],
Li, W.[Weize],
Zheng, Y.H.[Yu-Hang],
Hu, S.[Sujie],
Liu, X.Y.[Xin-Yu],
Zhu, J.[Jinwei],
Yan, Z.J.[Zhi-Jie],
Sun, H.Y.[Hai-Yang],
Zhan, K.[Kun],
Jia, P.[Peng],
Long, X.X.[Xiao-Xiao],
Chen, Y.L.[Yi-Lun],
Zhao, H.[Hao],
Tod3cap: Towards 3d Dense Captioning in Outdoor Scenes,
ECCV24(XVIII: 367-384).
Springer DOI
2412
BibRef
Kim, M.J.[Min-Jung],
Lim, H.S.[Hyung Suk],
Lee, S.[Soonyoung],
Kim, B.[Bumsoo],
Kim, G.[Gunhee],
Bi-directional Contextual Attention for 3d Dense Captioning,
ECCV24(XVIII: 385-401).
Springer DOI
2412
BibRef
Zhao, Y.Z.[Yu-Zhong],
Liu, Y.[Yue],
Guo, Z.[Zonghao],
Wu, W.J.[Wei-Jia],
Gong, C.[Chen],
Ye, Q.X.[Qi-Xiang],
Wan, F.[Fang],
Controlcap: Controllable Region-level Captioning,
ECCV24(XXXVIII: 21-38).
Springer DOI
2412
BibRef
Wang, Z.[Zhen],
Jiang, X.Y.[Xin-Yun],
Xiao, J.[Jun],
Chen, T.[Tao],
Chen, L.[Long],
Decap: Towards Generalized Explicit Caption Editing via Diffusion
Mechanism,
ECCV24(XLIII: 365-381).
Springer DOI
2412
BibRef
Mao, S.Q.[Shun-Qi],
Zhang, C.Y.[Chao-Yi],
Su, H.[Hang],
Song, H.[Hwanjun],
Shalyminov, I.[Igor],
Cai, W.D.[Wei-Dong],
Controllable Contextualized Image Captioning: Directing the Visual
Narrative Through User-defined Highlights,
ECCV24(L: 464-481).
Springer DOI
2412
BibRef
Sarto, S.[Sara],
Cornia, M.[Marcella],
Baraldi, L.[Lorenzo],
Cucchiara, R.[Rita],
Bridge: Bridging Gaps in Image Captioning Evaluation with Stronger
Visual Cues,
ECCV24(LXXVIII: 70-87).
Springer DOI
2412
BibRef
Matsuda, K.[Kazuki],
Wada, Y.[Yuiga],
Sugiura, K.[Komei],
DENEB: A Hallucination-robust Automatic Evaluation Metric for Image
Captioning,
ACCV24(III: 166-182).
Springer DOI
2412
BibRef
Hu, J.C.[Jia Cheng],
Cavicchioli, R.[Roberto],
Capotondi, A.[Alessandro],
A Request for Clarity over the End of Sequence Token in the
Self-critical Sequence Training,
CIAP23(I:39-50).
Springer DOI Code:
WWW Link.
2312
BibRef
Hu, W.Z.[Wen-Zhe],
Wang, L.X.[Lan-Xiao],
Xu, L.F.[Lin-Feng],
Spatial-Semantic Attention for Grounded Image Captioning,
ICIP22(61-65)
IEEE DOI
2211
Measurement, Grounding, Semantics, Predictive models,
Feature extraction, Data mining, Proposals,
Multimodal
BibRef
Sharif, N.[Naeha],
Jalwana, M.A.A.K.[Mohammad A.A.K.],
Bennamoun, M.[Mohammed],
Liu, W.[Wei],
Shah, S.A.A.[Syed Afaq Ali],
Leveraging Linguistically-aware Object Relations and NASNet for Image
Captioning,
IVCNZ20(1-6)
IEEE DOI
2012
Visualization, Semantics, Pipelines, Computer architecture,
Knowledge discovery, Feature extraction, Task analysis,
NASNet
BibRef
Kuo, C.W.[Chia-Wen],
Kira, Z.[Zsolt],
Beyond a Pre-Trained Object Detector: Cross-Modal Textual and Visual
Context for Image Captioning,
CVPR22(17948-17958)
IEEE DOI
2210
Measurement, Visualization, Analytical models, Graphical models,
Grounding, Computational modeling, Genomics, Vision + language
BibRef
Zhou, M.Y.[Ming-Yang],
Zhou, L.W.[Luo-Wei],
Wang, S.H.[Shuo-Hang],
Cheng, Y.[Yu],
Li, L.J.[Lin-Jie],
Yu, Z.[Zhou],
Liu, J.J.[Jing-Jing],
UC2: Universal Cross-lingual Cross-modal Vision-and-Language
Pre-training,
CVPR21(4153-4163)
IEEE DOI
2111
Training, Visualization, Benchmark testing, Knowledge discovery,
Data models, Machine translation
BibRef
Laina, I.,
Rupprecht, C.,
Navab, N.,
Towards Unsupervised Image Captioning With Shared Multimodal
Embeddings,
ICCV19(7413-7423)
IEEE DOI
2004
natural language processing, text analysis,
multimodal embeddings, explicit supervision,
Semantics
BibRef
Akbari, H.[Hassan],
Karaman, S.[Svebor],
Bhargava, S.[Surabhi],
Chen, B.[Brian],
Vondrick, C.[Carl],
Chang, S.F.[Shih-Fu],
Multi-Level Multimodal Common Semantic Space for Image-Phrase Grounding,
CVPR19(12468-12478).
IEEE DOI
2002
BibRef
Chen, T.H.,
Liao, Y.H.,
Chuang, C.Y.,
Hsu, W.T.,
Fu, J.,
Sun, M.,
Show, Adapt and Tell:
Adversarial Training of Cross-Domain Image Captioner,
ICCV17(521-530)
IEEE DOI
1802
image processing, inference mechanisms, text analysis, MSCOCO,
adversarial training procedure, captioner act, critic networks,
Training data
BibRef
Pini, S.[Stefano],
Cornia, M.[Marcella],
Baraldi, L.[Lorenzo],
Cucchiara, R.[Rita],
Towards Video Captioning with Naming:
A Novel Dataset and a Multi-modal Approach,
CIAP17(II:384-395).
Springer DOI
1711
BibRef
Pan, J.Y.[Jia-Yu],
Yang, H.J.[Hyung-Jeong],
Faloutsos, C.[Christos],
MMSS: Graph-based Multi-modal Story-oriented Video Summarization and
Retrieval,
CMU-CS-TR-04-114. 2004.
HTML Version.
0501
BibRef
Pan, J.Y.[Jia-Yu],
Yang, H.J.[Hyung-Jeong],
Faloutsos, C.[Christos],
Duygulu, P.[Pinar],
GCap: Graph-based Automatic Image Captioning,
MMDE04(146).
IEEE DOI
0406
BibRef
Pan, J.Y.[Jia-Yu],
Advanced Tools for Video and Multimedia Mining,
CMU-CS-06-126, May 2006.
BibRef
0605
Ph.D.Thesis,
HTML Version.
BibRef
Chapter on Matching and Recognition Using Volumes, High Level Vision Techniques, Invariants continues in
Transformer for Captioning, Image Captioning.