13.6.9 Captioning, Image Captioning

Chapter Contents
Image Captioning. Captioning. Fine-Grained. The most important object or concept in the image.
See also Image Annotation.
See also Video Captioning.
See also LSTM: Long Short-Term Memory for Captioning, Image Captioning.
See also LSTM: Long Short-Term Memory.
See also Transformer for Captioning, Image Captioning.
See also Multi-Modal, Cross-Modal Captioning, Image Captioning.

Feng, Y.S.[Yan-Song], Lapata, M.,
Automatic Caption Generation for News Images,
PAMI(35), No. 4, April 2013, pp. 797-812.
IEEE DOI 1303
Use existing captions and tags, expand to similar images. BibRef

Vinyals, O.[Oriol], Toshev, A.[Alexander], Bengio, S.[Samy], Erhan, D.[Dumitru],
Show and Tell: Lessons Learned from the 2015 MSCOCO Image Captioning Challenge,
PAMI(39), No. 4, April 2017, pp. 652-663.
IEEE DOI 1703
BibRef
Earlier:
Show and tell: A neural image caption generator,
CVPR15(3156-3164)
IEEE DOI 1510
Computational modeling BibRef
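
As a concrete reference point for the CNN-encoder/LSTM-decoder recipe popularized by the Show and Tell line of work above, here is a minimal sketch in PyTorch. It is illustrative only, not the authors' implementation: the feature dimension, vocabulary size, and random inputs are assumptions, and a real system would use a pretrained CNN backbone and beam-search decoding.

```python
# Minimal Show-and-Tell-style sketch (assumptions: PyTorch, toy vocabulary,
# precomputed pooled CNN features standing in for a real backbone).
import torch
import torch.nn as nn

class ShowAndTellSketch(nn.Module):
    def __init__(self, feat_dim=2048, embed_dim=256, hidden_dim=512, vocab_size=1000):
        super().__init__()
        self.img_proj = nn.Linear(feat_dim, embed_dim)  # map image feature into word-embedding space
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, img_feats, captions):
        # img_feats: (B, feat_dim) pooled CNN features; captions: (B, T) token ids
        img_tok = self.img_proj(img_feats).unsqueeze(1)   # image acts as the first "word"
        word_tok = self.embed(captions[:, :-1])           # teacher forcing on the ground-truth prefix
        hidden, _ = self.lstm(torch.cat([img_tok, word_tok], dim=1))
        return self.out(hidden)                           # (B, T, vocab_size) next-word logits

# Toy usage: random features/captions, next-word cross-entropy.
model = ShowAndTellSketch()
feats, caps = torch.randn(4, 2048), torch.randint(0, 1000, (4, 12))
logits = model(feats, caps)
loss = nn.CrossEntropyLoss()(logits.reshape(-1, 1000), caps.reshape(-1))
loss.backward()
```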

Wang, J.Y.[Jing-Ya], Zhu, X.T.[Xia-Tian], Gong, S.G.[Shao-Gang],
Discovering visual concept structure with sparse and incomplete tags,
AI(250), No. 1, 2017, pp. 16-36.
Elsevier DOI 1708
Automatically discovering the semantic structure of tagged visual data (e.g. web videos and images). BibRef

Kilickaya, M.[Mert], Akkus, B.K.[Burak Kerim], Cakici, R.[Ruket], Erdem, A.[Aykut], Erdem, E.[Erkut], Ikizler-Cinbis, N.[Nazli],
Data-driven image captioning via salient region discovery,
IET-CV(11), No. 6, September 2017, pp. 398-406.
DOI Link 1709
BibRef

He, X.D.[Xiao-Dong], Deng, L.[Li],
Deep Learning for Image-to-Text Generation: A Technical Overview,
SPMag(34), No. 6, November 2017, pp. 109-116.
IEEE DOI 1712
BibRef
And: Errata: SPMag(35), No. 1, January 2018, pp. 178.
IEEE DOI Artificial intelligence, Image classification, Natural language processing, Pediatrics, Semantics, Training data, Visualization BibRef

Li, L.H.[Ling-Hui], Tang, S.[Sheng], Zhang, Y.D.[Yong-Dong], Deng, L.X.[Li-Xi], Tian, Q.[Qi],
GLA: Global-Local Attention for Image Description,
MultMed(20), No. 3, March 2018, pp. 726-737.
IEEE DOI 1802
Computational modeling, Decoding, Feature extraction, Image recognition, Natural language processing, recurrent neural network BibRef
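
Many attention-based captioners in this section (global-local, region-level, word-conditional semantic attention) share one core step: softly weighting per-region image features against the current decoder state. The sketch below shows that generic additive-attention step under assumed dimensions; it is not the GLA model or any specific paper's code.

```python
# Generic soft additive attention over region features (a sketch, with assumed
# dimensions; real models pair this with an LSTM or Transformer decoder).
import torch
import torch.nn as nn

class RegionAttention(nn.Module):
    def __init__(self, feat_dim=2048, hidden_dim=512, attn_dim=256):
        super().__init__()
        self.feat_fc = nn.Linear(feat_dim, attn_dim)
        self.hid_fc = nn.Linear(hidden_dim, attn_dim)
        self.score = nn.Linear(attn_dim, 1)

    def forward(self, region_feats, hidden):
        # region_feats: (B, R, feat_dim) per-region features; hidden: (B, hidden_dim) decoder state
        e = torch.tanh(self.feat_fc(region_feats) + self.hid_fc(hidden).unsqueeze(1))
        alpha = torch.softmax(self.score(e).squeeze(-1), dim=1)    # (B, R) attention weights
        context = (alpha.unsqueeze(-1) * region_feats).sum(dim=1)  # (B, feat_dim) attended context
        return context, alpha

# Toy usage: 36 regions per image, as in common bottom-up attention setups.
attn = RegionAttention()
context, alpha = attn(torch.randn(2, 36, 2048), torch.randn(2, 512))
```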

Lu, X., Wang, B., Zheng, X., Li, X.,
Exploring Models and Data for Remote Sensing Image Caption Generation,
GeoRS(56), No. 4, April 2018, pp. 2183-2195.
IEEE DOI 1804
Feature extraction, Image representation, Recurrent neural networks, Remote sensing, Semantics, semantic understanding BibRef

Wu, C.L.[Chun-Lei], Wei, Y.W.[Yi-Wei], Chu, X.L.[Xiao-Liang], Su, F.[Fei], Wang, L.Q.[Lei-Quan],
Modeling visual and word-conditional semantic attention for image captioning,
SP:IC(67), 2018, pp. 100-107.
Elsevier DOI 1808
Image captioning, Word-conditional semantic attention, Visual attention, Attention variation BibRef

Zhang, M., Yang, Y., Zhang, H., Ji, Y., Shen, H.T., Chua, T.,
More is Better: Precise and Detailed Image Captioning Using Online Positive Recall and Missing Concepts Mining,
IP(28), No. 1, January 2019, pp. 32-44.
IEEE DOI 1810
data mining, image representation, image retrieval, image segmentation, learning (artificial intelligence), element-wise selection BibRef

Gella, S.[Spandana], Keller, F.[Frank], Lapata, M.[Mirella],
Disambiguating Visual Verbs,
PAMI(41), No. 2, February 2019, pp. 311-322.
IEEE DOI 1901
Given an image and a verb, assign the correct sense of the verb. Visualization, Image recognition, Semantics, Natural language processing, Horses, Bicycles BibRef

Xu, N.[Ning], Liu, A.A.[An-An], Liu, J.[Jing], Nie, W.Z.[Wei-Zhi], Su, Y.T.[Yu-Ting],
Scene graph captioner: Image captioning based on structural visual representation,
JVCIR(58), 2019, pp. 477-485.
Elsevier DOI 1901
Image captioning, Scene graph, Structural representation, Attention BibRef

He, X.W.[Xin-Wei], Shi, B.G.[Bao-Guang], Bai, X.[Xiang], Xia, G.S.[Gui-Song], Zhang, Z.X.[Zhao-Xiang], Dong, W.S.[Wei-Sheng],
Image Caption Generation with Part of Speech Guidance,
PRL(119), 2019, pp. 229-237.
Elsevier DOI 1902
Image caption generation, Part-of-speech tags, Long Short-Term Memory, Visual attributes BibRef

Xiao, X.Y.[Xin-Yu], Wang, L.F.[Ling-Feng], Ding, K.[Kun], Xiang, S.M.[Shi-Ming], Pan, C.H.[Chun-Hong],
Dense semantic embedding network for image captioning,
PR(90), 2019, pp. 285-296.
Elsevier DOI 1903
Image captioning, Retrieval, High-level semantic information, Visual concept, Densely embedding, Long short-term memory BibRef

Zhang, X.R.[Xiang-Rong], Wang, X.[Xin], Tang, X.[Xu], Zhou, H.Y.[Hui-Yu], Li, C.[Chen],
Description Generation for Remote Sensing Images Using Attribute Attention Mechanism,
RS(11), No. 6, 2019, pp. xx-yy.
DOI Link 1903
BibRef

Ding, S.T.[Song-Tao], Qu, S.[Shiru], Xi, Y.L.[Yu-Ling], Sangaiah, A.K.[Arun Kumar], Wan, S.H.[Shao-Hua],
Image caption generation with high-level image features,
PRL(123), 2019, pp. 89-95.
Elsevier DOI 1906
Image captioning, Language model, Bottom-up attention mechanism, Faster R-CNN BibRef

Liu, X.X.[Xiao-Xiao], Xu, Q.Y.[Qing-Yang], Wang, N.[Ning],
A survey on deep neural network-based image captioning,
VC(35), No. 3, March 2019, pp. 445-470.
WWW Link. 1906
BibRef

Hossain, M.Z.[Md. Zakir], Sohel, F.[Ferdous], Shiratuddin, M.F.[Mohd Fairuz], Laga, H.[Hamid],
A Comprehensive Survey of Deep Learning for Image Captioning,
Surveys(51), No. 6, February 2019, pp. Article No 118.
DOI Link 1906
Survey, Captioning. BibRef

Zhang, Z.J.[Zong-Jian], Wu, Q.[Qiang], Wang, Y.[Yang], Chen, F.[Fang],
High-Quality Image Captioning With Fine-Grained and Semantic-Guided Visual Attention,
MultMed(21), No. 7, July 2019, pp. 1681-1693.
IEEE DOI 1906
BibRef
Earlier:
Fine-Grained and Semantic-Guided Visual Attention for Image Captioning,
WACV18(1709-1717)
IEEE DOI 1806
Visualization, Semantics, Feature extraction, Decoding, Task analysis, Object oriented modeling, Image resolution, fully convolutional network-long short term memory framework, feedforward neural nets, image representation, image segmentation, convolutional neural network BibRef

Li, X., Jiang, S.,
Know More Say Less: Image Captioning Based on Scene Graphs,
MultMed(21), No. 8, August 2019, pp. 2117-2130.
IEEE DOI 1908
convolutional neural nets, feature extraction, graph theory, image representation, learning (artificial intelligence), vision-language BibRef

Sharif, N.[Naeha], White, L.[Lyndon], Bennamoun, M.[Mohammed], Liu, W.[Wei], Shah, S.A.A.[Syed Afaq Ali],
LCEval: Learned Composite Metric for Caption Evaluation,
IJCV(127), No. 10, October 2019, pp. 1586-1610.
Springer DOI 1909
Fine-grained analysis. BibRef
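
For readers comparing learned metrics such as LCEval against classical n-gram metrics, the snippet below scores one candidate caption against two references with sentence-level BLEU. It assumes the NLTK package; the captions are invented for illustration.

```python
# Sentence-level BLEU scoring of a candidate caption (assumes NLTK is installed).
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

references = [["a", "dog", "runs", "on", "the", "beach"],
              ["a", "dog", "is", "running", "along", "a", "beach"]]
candidate = ["a", "dog", "runs", "along", "the", "beach"]

smooth = SmoothingFunction().method1  # avoid zero scores for short captions
print(f"BLEU-4: {sentence_bleu(references, candidate, smoothing_function=smooth):.3f}")
```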

Zhang, Z.Y.[Zheng-Yuan], Diao, W.H.[Wen-Hui], Zhang, W.K.[Wen-Kai], Yan, M.L.[Meng-Long], Gao, X.[Xin], Sun, X.[Xian],
LAM: Remote Sensing Image Captioning with Label-Attention Mechanism,
RS(11), No. 20, 2019, pp. xx-yy.
DOI Link 1910
BibRef

Fu, K.[Kun], Li, Y.[Yang], Zhang, W.K.[Wen-Kai], Yu, H.F.[Hong-Feng], Sun, X.[Xian],
Boosting Memory with a Persistent Memory Mechanism for Remote Sensing Image Captioning,
RS(12), No. 11, 2020, pp. xx-yy.
DOI Link 2006
BibRef

Tan, J.H., Chan, C.S., Chuah, J.H.,
COMIC: Toward A Compact Image Captioning Model With Attention,
MultMed(21), No. 10, October 2019, pp. 2686-2696.
IEEE DOI 1910
embedded systems; feature extraction; image retrieval; matrix algebra. BibRef

Zhou, L., Zhang, Y., Jiang, Y., Zhang, T., Fan, W.,
Re-Caption: Saliency-Enhanced Image Captioning Through Two-Phase Learning,
IP(29), No. 1, 2020, pp. 694-709.
IEEE DOI 1910
feature extraction, image processing, learning (artificial intelligence), visual attribute BibRef

Yang, L.[Liang], Hu, H.F.[Hai-Feng],
Visual Skeleton and Reparative Attention for Part-of-Speech image captioning system,
CVIU(189), 2019, pp. 102819.
Elsevier DOI 1911
Neural network, Visual attention, Image captioning BibRef

Wang, J.B.[Jun-Bo], Wang, W.[Wei], Wang, L.[Liang], Wang, Z.Y.[Zhi-Yong], Feng, D.D.[David Dagan], Tan, T.N.[Tie-Niu],
Learning Visual Relationship and Context-Aware Attention for Image Captioning,
PR(98), 2020, pp. 107075.
Elsevier DOI 1911
Image captioning, Relational reasoning, Context-aware attention BibRef

Xiao, X., Wang, L., Ding, K., Xiang, S., Pan, C.,
Deep Hierarchical Encoder-Decoder Network for Image Captioning,
MultMed(21), No. 11, November 2019, pp. 2942-2956.
IEEE DOI 1911
Visualization, Semantics, Hidden Markov models, Decoding, Logic gates, Training, Computer architecture, vision-sentence BibRef

Jiang, T.[Teng], Zhang, Z.[Zehan], Yang, Y.[Yupu],
Modeling coverage with semantic embedding for image caption generation,
VC(35), No. 11, November 2019, pp. 1655-1665.
WWW Link. 1911
BibRef

Lu, X., Wang, B., Zheng, X.,
Sound Active Attention Framework for Remote Sensing Image Captioning,
GeoRS(58), No. 3, March 2020, pp. 1985-2000.
IEEE DOI 2003
Active attention, remote sensing image captioning, semantic understanding BibRef

Li, Y.Y.[Yang-Yang], Fang, S.K.[Shuang-Kang], Jiao, L.C.[Li-Cheng], Liu, R.J.[Rui-Jiao], Shang, R.H.[Rong-Hua],
A Multi-Level Attention Model for Remote Sensing Image Captions,
RS(12), No. 6, 2020, pp. xx-yy.
DOI Link 2003
What are the important things in the image? BibRef

Chen, X.H.[Xing-Han], Zhang, M.X.[Ming-Xing], Wang, Z.[Zheng], Zuo, L.[Lin], Li, B.[Bo], Yang, Y.[Yang],
Leveraging unpaired out-of-domain data for image captioning,
PRL(132), 2020, pp. 132-140.
Elsevier DOI 2005
Image captioning, Out-of-domain data, Deep learning BibRef

Xu, N., Zhang, H., Liu, A., Nie, W., Su, Y., Nie, J., Zhang, Y.,
Multi-Level Policy and Reward-Based Deep Reinforcement Learning Framework for Image Captioning,
MultMed(22), No. 5, May 2020, pp. 1372-1383.
IEEE DOI 2005
Visualization, Measurement, Task analysis, Reinforcement learning, Optimization, Adaptation models, Semantics, Multi-level policy, image captioning BibRef

Guo, L., Liu, J., Lu, S., Lu, H.,
Show, Tell, and Polish: Ruminant Decoding for Image Captioning,
MultMed(22), No. 8, August 2020, pp. 2149-2162.
IEEE DOI 2007
Decoding, Visualization, Planning, Training, Semantics, Reinforcement learning, Task analysis, Image captioning, rumination BibRef

Feng, Q., Wu, Y., Fan, H., Yan, C., Xu, M., Yang, Y.,
Cascaded Revision Network for Novel Object Captioning,
CirSysVideo(30), No. 10, October 2020, pp. 3413-3421.
IEEE DOI 2010
Visualization, Semantics, Task analysis, Detectors, Training, Knowledge engineering, Feature extraction, Captioning, semantic matching BibRef

Wei, H.Y.[Hai-Yang], Li, Z.X.[Zhi-Xin], Zhang, C.L.[Can-Long], Ma, H.F.[Hui-Fang],
The synergy of double attention: Combine sentence-level and word-level attention for image captioning,
CVIU(201), 2020, pp. 103068.
Elsevier DOI 2011
Image captioning, Sentence-level attention, Word-level attention, Reinforcement learning BibRef

Shilpa, M.[Mohankumar], He, J.[Jun], Zhao, Y.J.[Yi-Jia], Sun, B.[Bo], Yu, L.J.[Le-Jun],
Feedback evaluations to promote image captioning,
IET-IPR(14), No. 13, November 2020, pp. 3021-3027.
DOI Link 2012
BibRef

Liu, H., Zhang, S., Lin, K., Wen, J., Li, J., Hu, X.,
Vocabulary-Wide Credit Assignment for Training Image Captioning Models,
IP(30), 2021, pp. 2450-2460.
IEEE DOI 2102
Training, Measurement, Task analysis, Vocabulary, Feature extraction, Maximum likelihood estimation, Adaptation models BibRef

Xu, N.[Ning], Tian, H.S.[Hong-Shuo], Wang, Y.H.[Yan-Hui], Nie, W.Z.[Wei-Zhi], Song, D.[Dan], Liu, A.A.[An-An], Liu, W.[Wu],
Coupled-dynamic learning for vision and language: Exploring Interaction between different tasks,
PR(113), 2021, pp. 107829.
Elsevier DOI 2103
Image captioning, Image synthesis, Coupled dynamics BibRef

Yang, L., Wang, H., Tang, P., Li, Q.,
CaptionNet: A Tailor-made Recurrent Neural Network for Generating Image Descriptions,
MultMed(23), 2021, pp. 835-845.
IEEE DOI 2103
Visualization, Feature extraction, Semantics, Task analysis, Predictive models, Computational modeling, reinforcement learning BibRef

Liu, A.A.[An-An], Wang, Y.H.[Yan-Hui], Xu, N.[Ning], Liu, S.[Shan], Li, X.[Xuanya],
Scene-Graph-Guided message passing network for dense captioning,
PRL(145), 2021, pp. 187-193.
Elsevier DOI 2104
Scene graph, Dense captioning, Message passing BibRef

Zhang, L.[Le], Zhang, Y.S.[Yan-Shuo], Zhao, X.[Xin], Zou, Z.X.[Ze-Xiao],
Image captioning via proximal policy optimization,
IVC(108), 2021, pp. 104126.
Elsevier DOI 2104
Image captioning, Reinforcement learning, Proximal policy optimization BibRef

Ji, J.Z.[Jun-Zhong], Du, Z.R.[Zhuo-Ran], Zhang, X.D.[Xiao-Dan],
Divergent-convergent attention for image captioning,
PR(115), 2021, pp. 107928.
Elsevier DOI 2104
Image Captioning, Divergent Observation, Convergent Attention BibRef

Wei, Y.W.[Yi-Wei], Wu, C.L.[Chun-Lei], Jia, Z.Y.[Zhi-Yang], Hu, X.[XuFei], Guo, S.[Shuang], Shi, H.T.[Hai-Tao],
Past is important: Improved image captioning by looking back in time,
SP:IC(94), 2021, pp. 116183.
Elsevier DOI 2104
Image captioning, Reinforcement learning, Visual attention BibRef

Zhang, Z.J.[Zong-Jian], Wu, Q.[Qiang], Wang, Y.[Yang], Chen, F.[Fang],
Exploring region relationships implicitly: Image captioning with visual relationship attention,
IVC(109), 2021, pp. 104146.
Elsevier DOI 2105
Image captioning, Visual relationship attention, Relationship-level attention parallel attention mechanism, Learned spatial constraint BibRef

Zhang, Z.J.[Zong-Jian], Wu, Q.[Qiang], Wang, Y.[Yang], Chen, F.[Fang],
Exploring Pairwise Relationships Adaptively From Linguistic Context in Image Captioning,
MultMed(24), 2022, pp. 3101-3113.
IEEE DOI 2206
Visualization, Linguistics, Decoding, Modulation, Context modeling, Adaptation models, Semantics, Bilinear attention, visual relationship attention BibRef

Li, X.L.[Xue-Long], Zhang, X.T.[Xue-Ting], Huang, W.[Wei], Wang, Q.[Qi],
Truncation Cross Entropy Loss for Remote Sensing Image Captioning,
GeoRS(59), No. 6, June 2021, pp. 5246-5257.
IEEE DOI 2106
Feature extraction, Remote sensing, Entropy, Semantics, Decoding, Optimization, Visualization, Image captioning, overfitting, truncation cross entropy (TCE) loss BibRef

Zhong, X.[Xian], Nie, G.Z.[Guo-Zhang], Huang, W.X.[Wen-Xin], Liu, W.X.[Wen-Xuan], Ma, B.[Bo], Lin, C.W.[Chia-Wen],
Attention-guided image captioning with adaptive global and local feature fusion,
JVCIR(78), 2021, pp. 103138.
Elsevier DOI 2107
Image captioning, Encoder-decoder, Spatial information, Adaptive attention BibRef

Sumbul, G.[Gencer], Nayak, S.[Sonali], Demir, B.[Begüm],
SD-RSIC: Summarization-Driven Deep Remote Sensing Image Captioning,
GeoRS(59), No. 8, August 2021, pp. 6922-6934.
IEEE DOI 2108
Training, Standards, Semantics, Feature extraction, Remote sensing, Neural networks, Task analysis, Caption summarization, remote sensing (RS) BibRef

Wu, J.[Jie], Chen, T.S.[Tian-Shui], Wu, H.F.[He-Feng], Yang, Z.[Zhi], Luo, G.C.[Guang-Chun], Lin, L.[Liang],
Fine-Grained Image Captioning With Global-Local Discriminative Objective,
MultMed(23), 2021, pp. 2413-2427.
IEEE DOI 2108
Training, Visualization, Task analysis, Semantics, Reinforcement learning, Pipelines, Maximum likelihood estimation, Self-retrieval BibRef

Wu, L.X.[Ling-Xiang], Xu, M.[Min], Sang, L.[Lei], Yao, T.[Ting], Mei, T.[Tao],
Noise Augmented Double-Stream Graph Convolutional Networks for Image Captioning,
CirSysVideo(31), No. 8, August 2021, pp. 3118-3127.
IEEE DOI 2108
Visualization, Training, Generators, Reinforcement learning, Decoding, Streaming media, Recurrent neural networks, Captioning, adaptive noise BibRef

Nivedita, M., Chandrashekar, P.[Priyanka], Mahapatra, S.[Shibani], Phamila, Y.A.V.[Y. Asnath Victy], Selvaperumal, S.K.[Sathish Kumar],
Image Captioning for Video Surveillance System using Neural Networks,
IJIG(21), No. 4, October 2021, pp. 2150044.
DOI Link 2110
BibRef

Wang, Q.[Qi], Huang, W.[Wei], Zhang, X.T.[Xue-Ting], Li, X.L.[Xue-Long],
Word-Sentence Framework for Remote Sensing Image Captioning,
GeoRS(59), No. 12, December 2021, pp. 10532-10543.
IEEE DOI 2112
Remote sensing, Feature extraction, Generators, Decoding, Task analysis, Visualization, Semantics, Deep learning, word-sentence framework BibRef

Wan, B.Y.[Bo-Yang], Jiang, W.H.[Wen-Hui], Fang, Y.M.[Yu-Ming], Zhu, M.W.[Min-Wei], Li, Q.[Qin], Liu, Y.[Yang],
Revisiting image captioning via maximum discrepancy competition,
PR(122), 2022, pp. 108358.
Elsevier DOI 2112
Image captioning, Model comparison, Attention mechanism BibRef

Chen, T.Y.[Tian-Yu], Li, Z.X.[Zhi-Xin], Wu, J.L.[Jing-Li], Ma, H.F.[Hui-Fang], Su, B.P.[Bian-Ping],
Improving image captioning with Pyramid Attention and SC-GAN,
IVC(117), 2022, pp. 104340.
Elsevier DOI 2112
Image captioning, Pyramid Attention network, Self-critical training, Reinforcement learning, Sequence-level learning BibRef
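
Several entries above train captioners with sequence-level rewards (self-critical training, PPO, CIDEr optimization). The sketch below shows the self-critical policy-gradient surrogate loss in its simplest form, where the greedy caption's reward serves as the baseline; the log-probabilities and rewards are placeholders rather than any paper's implementation.

```python
# Self-critical sequence training (SCST) surrogate loss: a sketch with
# placeholder log-probs and rewards (a real setup would sample captions and
# score them with CIDEr or a similar metric).
import torch

def scst_loss(sample_logprobs, sample_reward, greedy_reward):
    # sample_logprobs: (B, T) log-probs of the sampled caption tokens
    # sample_reward, greedy_reward: (B,) rewards of sampled vs. greedy captions
    advantage = (sample_reward - greedy_reward).unsqueeze(1)  # self-critical baseline
    return -(advantage * sample_logprobs).mean()              # policy-gradient surrogate

# Toy usage with random stand-ins.
logp = torch.randn(4, 12, requires_grad=True)
loss = scst_loss(logp, torch.rand(4), torch.rand(4))
loss.backward()
```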

Zhou, Y.J.[Yu-Jie], Long, J.F.[Jie-Feng], Xu, S.P.[Su-Ping], Shang, L.[Lin],
Attribute-driven image captioning via soft-switch pointer,
PRL(152), 2021, pp. 34-41.
Elsevier DOI 2112
Image captioning, Visual attributes detection, Attention, Pointing mechanism BibRef

Zha, Z.J.[Zheng-Jun], Liu, D.[Daqing], Zhang, H.W.[Han-Wang], Zhang, Y.D.[Yong-Dong], Wu, F.[Feng],
Context-Aware Visual Policy Network for Fine-Grained Image Captioning,
PAMI(44), No. 2, February 2022, pp. 710-722.
IEEE DOI 2201
Visualization, Task analysis, Cognition, Decision making, Training, Natural languages, Reinforcement learning, Image captioning, policy network BibRef

Wang, Q.Z.[Qing-Zhong], Wan, J.[Jia], Chan, A.B.[Antoni B.],
On Diversity in Image Captioning: Metrics and Methods,
PAMI(44), No. 2, February 2022, pp. 1035-1049.
IEEE DOI 2201
Measurement, Semantics, Learning (artificial intelligence), Vegetation, Legged locomotion, Training, Computational modeling, diversity metric BibRef

Wang, J.[Jiuniu], Xu, W.J.[Wen-Jia], Wang, Q.Z.[Qing-Zhong], Chan, A.B.[Antoni B.],
Compare and Reweight: Distinctive Image Captioning Using Similar Images Sets,
ECCV20(I:370-386).
Springer DOI 2011
BibRef

Luo, G.F.[Gai-Fang], Cheng, L.J.[Li-Jun], Jing, C.[Chao], Zhao, C.[Can], Song, G.Z.[Guo-Zhu],
A thorough review of models, evaluation metrics, and datasets on image captioning,
IET-IPR(16), No. 2, 2022, pp. 311-332.
DOI Link 2201
BibRef

Ben, H.X.[Hui-Xia], Pan, Y.W.[Ying-Wei], Li, Y.[Yehao], Yao, T.[Ting], Hong, R.C.[Ri-Chang], Wang, M.[Meng], Mei, T.[Tao],
Unpaired Image Captioning With semantic-Constrained Self-Learning,
MultMed(24), 2022, pp. 904-916.
IEEE DOI 2202
Semantics, Image recognition, Training, Visualization, Decoding, Task analysis, Dogs, Encoder-decoder networks, image captioning, self-supervised learning BibRef

Song, P.P.[Pei-Pei], Guo, D.[Dan], Zhou, J.X.[Jin-Xing], Xu, M.L.[Ming-Liang], Wang, M.[Meng],
Memorial GAN With Joint Semantic Optimization for Unpaired Image Captioning,
Cyber(53), No. 7, July 2023, pp. 4388-4399.
IEEE DOI 2307
Semantics, Synthetic aperture sonar, Visualization, Task analysis, Optimization, Generative adversarial networks, Correlation, unpaired image captioning BibRef

Li, Y.[Yehao], Yao, T.[Ting], Pan, Y.W.[Ying-Wei], Chao, H.Y.[Hong-Yang], Mei, T.[Tao],
Pointing Novel Objects in Image Captioning,
CVPR19(12489-12498).
IEEE DOI 2002
BibRef

Liu, M.F.[Mao-Fu], Hu, H.J.[Hui-Jun], Li, L.J.[Ling-Jun], Yu, Y.[Yan], Guan, W.L.[Wei-Li],
Chinese Image Caption Generation via Visual Attention and Topic Modeling,
Cyber(52), No. 2, February 2022, pp. 1247-1257.
IEEE DOI 2202
Visualization, Decoding, Semantics, Predictive models, Feature extraction, Natural language processing, visual attention BibRef

Yang, Q.Q.[Qiao-Qiao], Ni, Z.H.[Zi-Hao], Ren, P.[Peng],
Meta captioning: A meta learning based remote sensing image captioning framework,
PandRS(186), 2022, pp. 190-200.
Elsevier DOI 2203
Remote sensing image captioning, Meta learning BibRef

Yang, X.[Xu], Zhang, H.W.[Han-Wang], Cai, J.F.[Jian-Fei],
Auto-Encoding and Distilling Scene Graphs for Image Captioning,
PAMI(44), No. 5, May 2022, pp. 2313-2327.
IEEE DOI 2204
Visualization, Decoding, Training, Roads, Pipelines, Dictionaries, Semantics, Image captioning, scene graph, transfer learning, knowledge distillation BibRef

Yang, X.[Xu], Zhang, H.W.[Han-Wang], Cai, J.F.[Jian-Fei],
Deconfounded Image Captioning: A Causal Retrospect,
PAMI(45), No. 11, November 2023, pp. 12996-13010.
IEEE DOI 2310
BibRef

Yang, X.[Xu], Tang, K.[Kaihua], Zhang, H.W.[Han-Wang], Cai, J.F.[Jian-Fei],
Auto-Encoding Scene Graphs for Image Captioning,
CVPR19(10677-10686).
IEEE DOI 2002
BibRef

Yang, Z.P.[Zuo-Peng], Wang, P.B.[Peng-Bo], Chu, T.S.[Tian-Shu], Yang, J.[Jie],
Human-Centric Image Captioning,
PR(126), 2022, pp. 108545.
Elsevier DOI 2204
Human-centric, Image captioning, Feature hierarchization BibRef

Li, X.[Xuan], Zhang, W.K.[Wen-Kai], Sun, X.[Xian], Gao, X.[Xin],
Without detection: Two-step clustering features with local-global attention for image captioning,
IET-CV(16), No. 3, 2022, pp. 280-294.
DOI Link 2204
BibRef

Yu, L.T.[Li-Tao], Zhang, J.[Jian], Wu, Q.[Qiang],
Dual Attention on Pyramid Feature Maps for Image Captioning,
MultMed(24), 2022, pp. 1775-1786.
IEEE DOI 2204
Visualization, Decoding, Task analysis, Semantics, Feature extraction, Context modeling, Image captioning, pyramid attention BibRef

Zhang, M.[Min], Chen, J.X.[Jing-Xiang], Li, P.F.[Peng-Fei], Jiang, M.[Ming], Zhou, Z.[Zhe],
Topic scene graphs for image captioning,
IET-CV(16), No. 4, 2022, pp. 364-375.
DOI Link 2205
natural language processing BibRef

Yu, Q.[Qiang], Zhang, C.X.[Chun-Xia], Weng, L.[Lubin], Xiang, S.M.[Shi-Ming], Pan, C.H.[Chun-Hong],
Scene captioning with deep fusion of images and point clouds,
PRL(158), 2022, pp. 9-15.
Elsevier DOI 2205
Scene captioning, Point cloud, Deep fusion BibRef

Chaudhari, C.P.[Chaitrali Prasanna], Devane, S.[Satish],
Improved Framework using Rider Optimization Algorithm for Precise Image Caption Generation,
IJIG(22), No. 2, April 2022, pp. 2250021.
DOI Link 2205
BibRef

Shao, X.J.[Xiang-Jun], Xiang, Z.L.[Zheng-Long], Li, Y.X.[Yuan-Xiang], Zhang, M.J.[Ming-Jie],
Variational joint self-attention for image captioning,
IET-IPR(16), No. 8, 2022, pp. 2075-2086.
DOI Link 2205
BibRef

Li, Y.C.[Yao-Chen], Wu, C.[Chuan], Li, L.[Ling], Liu, Y.H.[Yue-Hu], Zhu, J.[Jihua],
Caption Generation From Road Images for Traffic Scene Modeling,
ITS(23), No. 7, July 2022, pp. 7805-7816.
IEEE DOI 2207
Semantics, Roads, Visualization, Feature extraction, Image reconstruction, Vehicle dynamics, Geometric analysis, visual relationship detection BibRef

Wang, Y.H.[Yan-Hui], Xu, N.[Ning], Liu, A.A.[An-An], Li, W.H.[Wen-Hui], Zhang, Y.D.[Yong-Dong],
High-Order Interaction Learning for Image Captioning,
CirSysVideo(32), No. 7, July 2022, pp. 4417-4430.
IEEE DOI 2207
Visualization, Semantics, Feature extraction, Decoding, Task analysis, Ions, Encoding, Image captioning, encoder-decoder framework BibRef

Guo, D.D.[Dan-Dan], Lu, R.Y.[Rui-Ying], Chen, B.[Bo], Zeng, Z.Q.[Ze-Qun], Zhou, M.Y.[Ming-Yuan],
Matching Visual Features to Hierarchical Semantic Topics for Image Paragraph Captioning,
IJCV(130), No. 8, August 2022, pp. 1920-1937.
Springer DOI 2207
BibRef

Demirel, B.[Berkan], Cinbis, R.G.[Ramazan Gokberk],
Caption generation on scenes with seen and unseen object categories,
IVC(124), 2022, pp. 104515.
Elsevier DOI 2208
Zero-shot learning, Zero-shot image captioning BibRef

Liu, Z.Y.[Zong-Yin], Dong, A.M.[An-Ming], Yu, J.G.[Ji-Guo], Han, Y.B.[Yu-Bing], Zhou, Y.[You], Zhao, K.[Kai],
Scene classification for remote sensing images with self-attention augmented CNN,
IET-IPR(16), No. 11, 2022, pp. 3085-3096.
DOI Link 2208
BibRef

Wu, X.X.[Xin-Xiao], Zhao, W.T.[Wen-Tian], Luo, J.B.[Jie-Bo],
Learning Cooperative Neural Modules for Stylized Image Captioning,
IJCV(130), No. 9, September 2022, pp. 2305-2320.
Springer DOI 2208
BibRef

Zhou, H.[Haonan], Du, X.P.[Xiao-Ping], Xia, L.[Lurui], Li, S.[Sen],
Self-Learning for Few-Shot Remote Sensing Image Captioning,
RS(14), No. 18, 2022, pp. xx-yy.
DOI Link 2209
BibRef

Stefanini, M.[Matteo], Cornia, M.[Marcella], Baraldi, L.[Lorenzo], Cascianelli, S.[Silvia], Fiameni, G.[Giuseppe], Cucchiara, R.[Rita],
From Show to Tell: A Survey on Deep Learning-Based Image Captioning,
PAMI(45), No. 1, January 2023, pp. 539-559.
IEEE DOI 2212
Survey, Image Captions. Visualization, Feature extraction, Task analysis, Convolutional neural networks, Additives, Image coding, Training BibRef

Wu, Y.[Yu], Jiang, L.[Lu], Yang, Y.[Yi],
Switchable Novel Object Captioner,
PAMI(45), No. 1, January 2023, pp. 1162-1173.
IEEE DOI 2212
Training, Visualization, Switches, Task analysis, Training data, Decoding, Convolutional neural networks, Image captioning, zero-shot learning BibRef

Yang, X.[Xu], Zhang, H.W.[Han-Wang], Gao, C.Y.[Chong-Yang], Cai, J.F.[Jian-Fei],
Learning to Collocate Visual-Linguistic Neural Modules for Image Captioning,
IJCV(131), No. 1, January 2023, pp. 82-100.
Springer DOI 2301
BibRef
Earlier: A1, A2, A4, Only:
Learning to Collocate Neural Modules for Image Captioning,
ICCV19(4249-4259)
IEEE DOI 2004
image processing, learning (artificial intelligence), natural language processing, neural nets, Neural networks BibRef

Ma, Y.W.[Yi-Wei], Ji, J.Y.[Jia-Yi], Sun, X.S.[Xiao-Shuai], Zhou, Y.[Yiyi], Ji, R.R.[Rong-Rong],
Towards local visual modeling for image captioning,
PR(138), 2023, pp. 109420.
Elsevier DOI 2303
Image captioning, Attention mechanism, Local visual modeling BibRef

Barati, A.[Alireza], Farsi, H.[Hassan], Mohamadzadeh, S.[Sajad],
Integration of the latent variable knowledge into deep image captioning with Bayesian modeling,
IET-IPR(17), No. 7, 2023, pp. 2256-2271.
DOI Link 2305
attention mechanism, automatic image captioning, deep neural networks, high-level semantic concepts, latent variable BibRef

Feng, J.L.[Jun-Long], Zhao, J.P.[Jian-Ping],
Effectively Utilizing the Category Labels for Image Captioning,
IEICE(E106-D), No. 5, May 2023, pp. 617-624.
WWW Link. 2305
BibRef

Wang, D.P.[De-Peng], Hu, Z.Z.[Zhen-Zhen], Zhou, Y.[Yuanen], Hong, R.C.[Ri-Chang], Wang, M.[Meng],
A Text-Guided Generation and Refinement Model for Image Captioning,
MultMed(25), 2023, pp. 2966-2977.
IEEE DOI 2309
BibRef

Wang, Q.[Qi], Huang, W.[Wei], Zhang, X.T.[Xue-Ting], Li, X.L.[Xue-Long],
GLCM: Global-Local Captioning Model for Remote Sensing Image Captioning,
Cyber(53), No. 11, November 2023, pp. 6910-6922.
IEEE DOI 2310
BibRef

Ji, J.Y.[Jia-Yi], Huang, X.Y.[Xiao-Yang], Sun, X.S.[Xiao-Shuai], Zhou, Y.[Yiyi], Luo, G.[Gen], Cao, L.J.[Liu-Juan], Liu, J.Z.[Jian-Zhuang], Shao, L.[Ling], Ji, R.R.[Rong-Rong],
Multi-Branch Distance-Sensitive Self-Attention Network for Image Captioning,
MultMed(25), 2023, pp. 3962-3974.
IEEE DOI 2310
BibRef

Cornia, M.[Marcella], Baraldi, L.[Lorenzo], Tal, A.[Ayellet], Cucchiara, R.[Rita],
Fully-attentive iterative networks for region-based controllable image and video captioning,
CVIU(237), 2023, pp. 103857.
Elsevier DOI 2311
Controllable captioning, Image captioning, Video captioning, Vision-and-language BibRef

Al-Qatf, M.[Majjed], Wang, X.[Xingfu], Hawbani, A.[Ammar], Abdussalam, A.[Amr], Alsamhi, S.H.[Saeed Hammod],
Image Captioning With Novel Topics Guidance and Retrieval-Based Topics Re-Weighting,
MultMed(25), 2023, pp. 5984-5999.
IEEE DOI 2311
BibRef

Zhu, P.P.[Pei-Pei], Wang, X.[Xiao], Luo, Y.[Yong], Sun, Z.L.[Zheng-Long], Zheng, W.S.[Wei-Shi], Wang, Y.[Yaowei], Chen, C.[Changwen],
Unpaired Image Captioning by Image-Level Weakly-Supervised Visual Concept Recognition,
MultMed(25), 2023, pp. 6702-6716.
IEEE DOI 2311
BibRef

Hu, N.N.[Nan-Nan], Ming, Y.[Yue], Fan, C.X.[Chun-Xiao], Feng, F.[Fan], Lyu, B.Y.[Bo-Yang],
TSFNet: Triple-Stream Image Captioning,
MultMed(25), 2023, pp. 6904-6916.
IEEE DOI 2311
BibRef

González-Chávez, O.[Othón], Ruiz, G.[Guillermo], Moctezuma, D.[Daniela], Ramirez-delReal, T.[Tania],
Are metrics measuring what they should? An evaluation of Image Captioning task metrics,
SP:IC(120), 2024, pp. 117071.
Elsevier DOI 2312
Metrics, Image Captioning, Image understanding, Language model BibRef

Padate, R.[Roshni], Jain, A.[Amit], Kalla, M.[Mukesh], Sharma, A.[Arvind],
A Widespread Assessment and Open Issues on Image Captioning Models,
IJIG(23), No. 6, 2023, pp. 2350057.
DOI Link 2312
BibRef

Shao, Z.[Zhuang], Han, J.G.[Jun-Gong], Debattista, K.[Kurt], Pang, Y.W.[Yan-Wei],
Textual Context-Aware Dense Captioning With Diverse Words,
MultMed(25), 2023, pp. 8753-8766.
IEEE DOI 2312
BibRef

Cheng, J.[Jun], Wu, F.[Fuxiang], Liu, L.[Liu], Zhang, Q.[Qieshi], Rutkowski, L.[Leszek], Tao, D.C.[Da-Cheng],
InDecGAN: Learning to Generate Complex Images From Captions via Independent Object-Level Decomposition and Enhancement,
MultMed(25), 2023, pp. 8279-8293.
IEEE DOI 2312
BibRef

Ding, N.[Ning], Deng, C.R.[Chao-Rui], Tan, M.K.[Ming-Kui], Du, Q.[Qing], Ge, Z.W.[Zhi-Wei], Wu, Q.[Qi],
Image Captioning With Controllable and Adaptive Length Levels,
PAMI(46), No. 2, February 2024, pp. 764-779.
IEEE DOI 2401
Length-controllable image captioning, non-autoregressive image captioning, length level reranking, refinement-enhanced sequence training BibRef

Xu, G.H.[Guang-Hui], Niu, S.C.[Shuai-Cheng], Tan, M.K.[Ming-Kui], Luo, Y.C.[Yu-Cheng], Du, Q.[Qing], Wu, Q.[Qi],
Towards Accurate Text-based Image Captioning with Content Diversity Exploration,
CVPR21(12632-12641)
IEEE DOI 2111
Visualization, Image resolution, Benchmark testing, Proposals, Optical character recognition software BibRef

Zhu, P.P.[Pei-Pei], Wang, X.[Xiao], Zhu, L.[Lin], Sun, Z.L.[Zheng-Long], Zheng, W.S.[Wei-Shi], Wang, Y.[Yaowei], Chen, C.W.[Chang-Wen],
Prompt-Based Learning for Unpaired Image Captioning,
MultMed(26), 2024, pp. 379-393.
IEEE DOI 2402
Measurement, Semantics, Task analysis, Visualization, Adversarial machine learning, Correlation, Training, Metric prompt, unpaired image captioning BibRef

Liu, A.A.[An-An], Zhai, Y.C.[Ying-Chen], Xu, N.[Ning], Tian, H.[Hongshuo], Nie, W.Z.[Wei-Zhi], Zhang, Y.D.[Yong-Dong],
Event-Aware Retrospective Learning for Knowledge-Based Image Captioning,
MultMed(26), 2024, pp. 4898-4911.
IEEE DOI 2404
Visualization, Knowledge engineering, Knowledge based systems, Correlation, Semantics, Genomics, Bioinformatics, Image captioning, retrospective learning BibRef

Song, L.F.[Li-Fei], Li, F.[Fei], Wang, Y.[Ying], Liu, Y.[Yu], Wang, Y.[Yuanhua], Xiang, S.M.[Shi-Ming],
Image captioning: Semantic selection unit with stacked residual attention,
IVC(144), 2024, pp. 104965.
Elsevier DOI 2404
Image captioning, Semantic attributes, Semantic selection unit, Transformer, Stacked residual attention BibRef

Ajankar, S.[Sonali], Dutta, T.[Tanima],
Image-Relevant Entities Knowledge-Aware News Image Captioning,
MultMedMag(31), No. 1, January 2024, pp. 88-98.
IEEE DOI 2404
Decoding, Task analysis, Feature extraction, Visualization, Encoding, Internet, Encyclopedias, Publishing, Image capture, Online services, Multisensory integration BibRef

Dai, Z.Z.[Zhuang-Zhuang], Tran, V.[Vu], Markham, A.[Andrew], Trigoni, N.[Niki], Rahman, M.A.[M. Arif], Wijayasingha, L.N.S., Stankovic, J.[John], Li, C.[Chen],
EgoCap and EgoFormer: First-person image captioning with context fusion,
PRL(181), 2024, pp. 50-56.
Elsevier DOI Code:
WWW Link. 2405
Image captioning, Storytelling, Dataset BibRef

Shao, Z.[Zhuang], Han, J.G.[Jun-Gong], Debattista, K.[Kurt], Pang, Y.W.[Yan-Wei],
DCMSTRD: End-to-end Dense Captioning via Multi-Scale Transformer Decoding,
MultMed(26), 2024, pp. 7581-7593.
IEEE DOI 2405
Decoding, Transformers, Visualization, Feature extraction, Task analysis, Computer architecture, Training, Dense captioning, multi-scale language decoder (MSLD) BibRef

Cornia, M.[Marcella], Baraldi, L.[Lorenzo], Fiameni, G.[Giuseppe], Cucchiara, R.[Rita],
Generating More Pertinent Captions by Leveraging Semantics and Style on Multi-Source Datasets,
IJCV(132), No. 5, May 2024, pp. 1701-1720.
Springer DOI 2405
BibRef

Barraco, M.[Manuele], Sarto, S.[Sara], Cornia, M.[Marcella], Baraldi, L.[Lorenzo], Cucchiara, R.[Rita],
With a Little Help from your own Past: Prototypical Memory Networks for Image Captioning,
ICCV23(3009-3019)
IEEE DOI Code:
WWW Link. 2401
BibRef

Barraco, M.[Manuele], Stefanini, M.[Matteo], Cornia, M.[Marcella], Cascianelli, S.[Silvia], Baraldi, L.[Lorenzo], Cucchiara, R.[Rita],
CaMEL: Mean Teacher Learning for Image Captioning,
ICPR22(4087-4094)
IEEE DOI 2212
Training, Measurement, Knowledge engineering, Visualization, Source coding, Natural languages, Feature extraction BibRef

Cornia, M.[Marcella], Baraldi, L.[Lorenzo], Cucchiara, R.[Rita],
Show, Control and Tell: A Framework for Generating Controllable and Grounded Captions,
CVPR19(8299-8308).
IEEE DOI 2002
BibRef

Wang, L.X.[Lan-Xiao], Qiu, H.Q.[He-Qian], Qiu, B.[Benliu], Meng, F.M.[Fan-Man], Wu, Q.B.[Qing-Bo], Li, H.L.[Hong-Liang],
TridentCap: Image-Fact-Style Trident Semantic Framework for Stylized Image Captioning,
CirSysVideo(34), No. 5, May 2024, pp. 3563-3575.
IEEE DOI Code:
WWW Link. 2405
Semantics, Decoding, Dogs, Task analysis, Feature extraction, Annotations, Visualization, Stylized image captioning, pseudo labels filter BibRef

Zhang, H.[Haonan], Zeng, P.P.[Peng-Peng], Gao, L.[Lianli], Lyu, X.Y.[Xin-Yu], Song, J.K.[Jing-Kuan], Shen, H.T.[Heng Tao],
SPT: Spatial Pyramid Transformer for Image Captioning,
CirSysVideo(34), No. 6, June 2024, pp. 4829-4842.
IEEE DOI Code:
WWW Link. 2406
Transformers, Visualization, Feature extraction, Semantics, Decoding, Task analysis, Spatial resolution, Image captioning, clustering BibRef

Wang, H.Y.[Heng-You], Song, K.[Kani], Jiang, X.[Xiang], He, Z.Q.[Zhi-Quan],
ragBERT: Relationship-aligned and grammar-wise BERT model for image captioning,
IVC(148), 2024, pp. 105105.
Elsevier DOI 2407
Image captioning, Relationship tags, Grammar, BERT BibRef

Li, J.Y.[Jing-Yu], Zhang, L.[Lei], Zhang, K.[Kun], Hu, B.[Bo], Xie, H.T.[Hong-Tao], Mao, Z.D.[Zhen-Dong],
Cascade Semantic Prompt Alignment Network for Image Captioning,
CirSysVideo(34), No. 7, July 2024, pp. 5266-5281.
IEEE DOI Code:
WWW Link. 2407
Semantics, Visualization, Feature extraction, Detectors, Integrated circuit modeling, Transformers, Task analysis, prompt BibRef

Cheng, Q.[Qimin], Xu, Y.Q.[Yu-Qi], Huang, Z.Y.[Zi-Yang],
VCC-DiffNet: Visual Conditional Control Diffusion Network for Remote Sensing Image Captioning,
RS(16), No. 16, 2024, pp. 2961.
DOI Link 2408
BibRef

Zou, Y.[Yang], Liao, S.Y.[Shi-Yu], Wang, Q.F.[Qi-Fei],
Chinese image captioning with fusion encoder and visual keyword search,
IET-IPR(18), No. 11, 2024, pp. 3055-3069.
DOI Link 2409
Chinese image captioning, fusion encoder, image retrieval, sentence-level optimization, visual keyword search BibRef

Chen, S.J.[Si-Jin], Zhu, H.Y.[Hong-Yuan], Li, M.S.[Ming-Sheng], Chen, X.[Xin], Guo, P.[Peng], Lei, Y.J.[Yin-Jie], Yu, G.[Gang], Li, T.[Taihao], Chen, T.[Tao],
Vote2Cap-DETR++: Decoupling Localization and Describing for End-to-End 3D Dense Captioning,
PAMI(46), No. 11, November 2024, pp. 7331-7347.
IEEE DOI 2410
BibRef
Earlier: A1, A2, A4, A6, A7, A9, Only:
End-to-End 3D Dense Captioning with Vote2Cap-DETR,
CVPR23(11124-11133)
IEEE DOI 2309
Location awareness, Task analysis, Transformers, Solid modeling, Decoding, Pipelines, 3D dense captioning, 3D scene understanding, transformers BibRef

Lv, F.X.[Fei-Xiao], Wang, R.[Rui], Jing, L.H.[Li-Hua], Dai, P.W.[Peng-Wen],
HIST: Hierarchical and sequential transformer for image captioning,
IET-CV(18), No. 7, 2024, pp. 1043-1056.
DOI Link 2411
computer vision, feature extraction, learning (artificial intelligence), neural nets BibRef

Du, R.[Runyan], Zhang, W.K.[Wen-Kai], Li, S.[Shuoke], Chen, J.L.[Jia-Liang], Guo, Z.[Zhi],
Spatial guided image captioning: Guiding attention with object's spatial interaction,
IET-IPR(18), No. 12, 2024, pp. 3368-3380.
DOI Link 2411
image representation, image texture BibRef

Li, Y.P.[Yun-Peng], Zhang, X.R.[Xiang-Rong], Zhang, T.Y.[Tian-Yang], Wang, G.C.[Guan-Chun], Wang, X.L.[Xin-Lin], Li, S.[Shuo],
A Patch-Level Region-Aware Module with a Multi-Label Framework for Remote Sensing Image Captioning,
RS(16), No. 21, 2024, pp. 3987.
DOI Link 2411
BibRef

Zhang, K.[Ke], Li, P.[Peijie], Wang, J.Q.[Jian-Qiang],
A Review of Deep Learning-Based Remote Sensing Image Caption: Methods, Models, Comparisons and Future Directions,
RS(16), No. 21, 2024, pp. 4113.
DOI Link 2411
BibRef

Udo, H.[Honori], Koshinaka, T.[Takafumi],
Reading is Believing: Revisiting Language Bottleneck Models for Image Classification,
ICIP24(943-949)
IEEE DOI 2411
Deep learning, Accuracy, Disasters, Closed box, Feature extraction, Transformers, Task analysis, Vision and Language, image captioning, Vision Transformer BibRef

Das, S.[Subham], Sekhar, C.C.[C. Chandra],
Leveraging Generated Image Captions for Visual Commonsense Reasoning,
ICIP24(2508-2514)
IEEE DOI 2411
Visualization, Accuracy, Image color analysis, Semantics, Natural languages, Transformers, Vision and Language Transformers BibRef

Chaffin, A.[Antoine], Kijak, E.[Ewa], Claveau, V.[Vincent],
Distinctive Image Captioning: Leveraging Ground Truth Captions in Clip Guided Reinforcement Learning,
ICIP24(2550-2556)
IEEE DOI 2411
Training, Vocabulary, Costs, Grounding, Computational modeling, Reinforcement learning, Image captioning, Cross-modal retrieval, Reinforcement learning BibRef

Jeong, K.[Kiyoon], Lee, W.[Woojun], Nam, W.[Woongchan], Ma, M.[Minjeong], Kang, P.[Pilsung],
Technical Report of NICE Challenge at CVPR 2024: Caption Re-ranking Evaluation Using Ensembled CLIP and Consensus Scores,
NICE24(7366-7372)
IEEE DOI Code:
WWW Link. 2410
Measurement, Training, Semantics, Pipelines, Writing, Caption Reranking, Image Captioning, Caption Evaluation BibRef

Kim, T.[Taehoon], Marsden, M.[Mark], Ahn, P.[Pyunghwan], Kim, S.[Sangyun], Lee, S.[Sihaeng], Sala, A.[Alessandra], Kim, S.H.[Seung Hwan],
Large-Scale Bidirectional Training for Zero-Shot Image Captioning,
NICE24(7373-7383)
IEEE DOI 2410
Training, Measurement, Accuracy, Computer architecture, Feature extraction, Data models, zero-shot, image captioning, vision-language BibRef

Kim, T.[Taehoon], Ahn, P.[Pyunghwan], Kim, S.[Sangyun], Lee, S.[Sihaeng], Marsden, M.[Mark], Sala, A.[Alessandra], Kim, S.H.[Seung Hwan], Han, B.H.[Bo-Hyung], Lee, K.M.[Kyoung Mu], Lee, H.L.[Hong-Lak], Bae, K.[Kyounghoon], Wu, X.Y.[Xiang-Yu], Gao, Y.[Yi], Zhang, H.L.[Hai-Liang], Yang, Y.[Yang], Guo, W.[Weili], Lu, J.F.[Jian-Feng], Oh, Y.[Youngtaek], Cho, J.W.[Jae Won], Kim, D.J.[Dong-Jin], Kweon, I.S.[In So], Kim, J.[Junmo], Kang, W.[Wooyoung], Jhoo, W.Y.[Won Young], Roh, B.[Byungseok], Mun, J.[Jonghwan], Oh, S.[Solgil], Ak, K.E.[Kenan Emir], Lee, G.G.[Gwang-Gook], Xu, Y.[Yan], Shen, M.W.[Ming-Wei], Hwang, K.[Kyomin], Shin, W.S.[Won-Sik], Lee, K.[Kamin], Park, W.[Wonhark], Lee, D.[Dongkwan], Kwak, N.[Nojun], Wang, Y.J.[Yu-Jin], Wang, Y.[Yimu], Gu, T.C.[Tian-Cheng], Lv, X.C.[Xing-Chang], Sun, M.[Mingmao],
NICE: CVPR 2023 Challenge on Zero-shot Image Captioning,
NICE24(7356-7365)
IEEE DOI 2410
Training, Adaptation models, Visualization, Computational modeling, Training data, Image captioning, Vision-language models, Multimodal representation BibRef

Urbanek, J.[Jack], Bordes, F.[Florian], Astolfi, P.[Pietro], Williamson, M.[Mary], Sharma, V.[Vasu], Romero-Soriano, A.[Adriana],
A Picture is Worth More Than 77 Text Tokens: Evaluating CLIP-Style Models on Dense Captions,
CVPR24(26690-26699)
IEEE DOI 2410
Training, Visualization, Computational modeling, Benchmark testing, Reliability BibRef

Nebbia, G.[Giacomo], Kovashka, A.[Adriana],
Image-caption difficulty for efficient weakly-supervised object detection from in-the-wild data,
L3D-IVU24(2596-2605)
IEEE DOI 2410
Training, Deep learning, Costs, Filtering, Computational modeling BibRef

Sakaino, H.[Hidetomo], Phuong, T.N.[Thao Nguyen], Duy, V.N.[Vinh Nguyen],
PV-Cap: 3D Dynamic Scene Understanding Through Open Physics-based Vocabulary,
AICity24(7932-7942)
IEEE DOI 2410
Deep learning, Training, Solid modeling, Vocabulary, Roads, Cameras, open vocabulary, 3D events, outdoor scene, dynamic caption, 3D-CPP, natural scene BibRef

Kong, F.[Fanjie], Chen, Y.B.[Yan-Bei], Cai, J.R.[Jia-Rui], Modolo, D.[Davide],
Hyperbolic Learning with Synthetic Captions for Open-World Detection,
CVPR24(16762-16771)
IEEE DOI 2410
Training, Visualization, Grounding, Noise, Detectors, Object detection, Hyperbolic Learning, Open-World, Detection, Synthetic Captions BibRef

Zeng, Z.Q.[Ze-Qun], Xie, Y.[Yan], Zhang, H.[Hao], Chen, C.[Chiyu], Chen, B.[Bo], Wang, Z.J.[Zheng-Jue],
MeaCap: Memory-Augmented Zero-shot Image Captioning,
CVPR24(14100-14110)
IEEE DOI Code:
WWW Link. 2410
Measurement, Codes, Accuracy, Integrated circuit modeling, zero-shot image captioning, hallucination BibRef

Wada, Y.[Yuiga], Kaneda, K.[Kanta], Saito, D.[Daichi], Sugiura, K.[Komei],
Polos: Multimodal Metric Learning from Human Feedback for Image Captioning,
CVPR24(13559-13568)
IEEE DOI 2410
Measurement, Correlation, Computational modeling, Contrastive learning, Benchmark testing, Feature extraction, human feedback BibRef

Huang, X.K.[Xiao-Ke], Wang, J.F.[Jian-Feng], Tang, Y.S.[Yan-Song], Zhang, Z.[Zheng], Hu, H.[Han], Lu, J.W.[Ji-Wen], Wang, L.J.[Li-Juan], Liu, Z.C.[Zi-Cheng],
Segment and Caption Anything,
CVPR24(13405-13417)
IEEE DOI 2410
Training, Costs, Computational modeling, Semantics, Memory management, Object detection, Segmentation, Image Captioning BibRef

Ge, Y.H.[Yun-Hao], Zeng, X.H.[Xiao-Hui], Huffman, J.S.[Jacob Samuel], Lin, T.Y.[Tsung-Yi], Liu, M.Y.[Ming-Yu], Cui, Y.[Yin],
Visual Fact Checker: Enabling High-Fidelity Detailed Caption Generation,
CVPR24(14033-14042)
IEEE DOI 2410
Visualization, Solid modeling, Computational modeling, Pipelines, Text to image, Object detection, captioning, LLM BibRef

Ruan, J.[Jie], Wu, Y.[Yue], Wan, X.J.[Xiao-Jun], Zhu, Y.S.[Yue-Sheng],
Describe Images in a Boring Way: Towards Cross-Modal Sarcasm Generation,
WACV24(5689-5698)
IEEE DOI 2404
Correlation, Codes, Training data, Data mining, Algorithms, Vision + language and/or other modalities BibRef

Hirsch, E.[Elad], Tal, A.[Ayellet],
CLID: Controlled-Length Image Descriptions with Limited Data,
WACV24(5519-5529)
IEEE DOI 2404
Training, Codes, Data models, Algorithms, Vision + language and/or other modalities BibRef

Petryk, S.[Suzanne], Whitehead, S.[Spencer], Gonzalez, J.E.[Joseph E.], Darrell, T.J.[Trevor J.], Rohrbach, A.[Anna], Rohrbach, M.[Marcus],
Simple Token-Level Confidence Improves Caption Correctness,
WACV24(5730-5740)
IEEE DOI 2404
Aggregates, Training data, Cognition, Data models, Algorithms, Vision + language and/or other modalities BibRef

Sabir, A.[Ahmed],
Word to Sentence Visual Semantic Similarity for Caption Generation: Lessons Learned,
MVA23(1-5)
DOI Link 2403
Visualization, Machine vision, Semantics, Context modeling BibRef

Verma, A.[Anand], Agarwal, S.[Saurabh], Arya, K.V., Petrlik, I.[Ivan], Esparza, R.[Roberto], Rodriguez, C.[Ciro],
Image Captioning with Reinforcement Learning,
ICCVMI23(1-7)
IEEE DOI 2403
Measurement, Training, Machine learning algorithms, Reinforcement learning, SPICE, MS COCO BibRef

Wei, Y.C.[Yi-Chao], Li, L.[Lin], Geng, S.L.[Sheng-Ling],
Remote Sensing Image Captioning Using Hire-MLP,
CVIDL23(109-112)
IEEE DOI 2403
Measurement, Deep learning, Visualization, Image recognition, Computational modeling, Neural networks, Feature extraction BibRef

Fan, J.[Jiashuo], Liang, Y.[Yaoyuan], Liu, L.[Leyao], Huang, S.[Shaolun], Zhang, L.[Lei],
RCA-NOC: Relative Contrastive Alignment for Novel Object Captioning,
ICCV23(15464-15474)
IEEE DOI 2401
BibRef

Li, R.[Runjia], Sun, S.Y.[Shu-Yang], Elhoseiny, M.[Mohamed], Torr, P.[Philip],
OxfordTVG-HIC: Can Machine Make Humorous Captions from Images?,
ICCV23(20236-20246)
IEEE DOI 2401
BibRef

Hu, A.[Anwen], Chen, S.Z.[Shi-Zhe], Zhang, L.[Liang], Jin, Q.[Qin],
Explore and Tell: Embodied Visual Captioning in 3D Environments,
ICCV23(2482-2491)
IEEE DOI Code:
WWW Link. 2401
BibRef

Kang, W.[Wooyoung], Mun, J.[Jonghwan], Lee, S.J.[Sung-Jun], Roh, B.[Byungseok],
Noise-aware Learning from Web-crawled Image-Text Data for Image Captioning,
ICCV23(2930-2940)
IEEE DOI Code:
WWW Link. 2401
BibRef

Fei, J.J.[Jun-Jie], Wang, T.[Teng], Zhang, J.[Jinrui], He, Z.Y.[Zhen-Yu], Wang, C.J.[Cheng-Jie], Zheng, F.[Feng],
Transferable Decoding with Visual Entities for Zero-Shot Image Captioning,
ICCV23(3113-3123)
IEEE DOI Code:
WWW Link. 2401
BibRef

Kornblith, S.[Simon], Li, L.[Lala], Wang, Z.[Zirui], Nguyen, T.[Thao],
Guiding image captioning models toward more specific captions,
ICCV23(15213-15223)
IEEE DOI 2401
BibRef

Kim, Y.[Yeonju], Kim, J.[Junho], Lee, B.K.[Byung-Kwan], Shin, S.[Sebin], Ro, Y.M.[Yong Man],
Mitigating Dataset Bias in Image Captioning Through Clip Confounder-Free Captioning Network,
ICIP23(1720-1724)
IEEE DOI Code:
WWW Link. 2312
BibRef

Dessì, R.[Roberto], Bevilacqua, M.[Michele], Gualdoni, E.[Eleonora], Rakotonirina, N.C.[Nathanaël Carraz], Franzon, F.[Francesca], Baroni, M.[Marco],
Cross-Domain Image Captioning with Discriminative Finetuning,
CVPR23(6935-6944)
IEEE DOI 2309
BibRef

Vo, D.M.[Duc Minh], Luong, Q.A.[Quoc-An], Sugimoto, A.[Akihiro], Nakayama, H.[Hideki],
A-CAP: Anticipation Captioning with Commonsense Knowledge,
CVPR23(10824-10833)
IEEE DOI 2309
BibRef

Kuo, C.W.[Chia-Wen], Kira, Z.[Zsolt],
HAAV: Hierarchical Aggregation of Augmented Views for Image Captioning,
CVPR23(11039-11049)
IEEE DOI 2309
BibRef

Ramos, R.[Rita], Martins, B.[Bruno], Elliott, D.[Desmond], Kementchedjhieva, Y.[Yova],
Smallcap: Lightweight Image Captioning Prompted with Retrieval Augmentation,
CVPR23(2840-2849)
IEEE DOI 2309
BibRef

Hirota, Y.[Yusuke], Nakashima, Y.[Yuta], Garcia, N.[Noa],
Model-Agnostic Gender Debiased Image Captioning,
CVPR23(15191-15200)
IEEE DOI 2309
BibRef

Tran, H.T.T.[Huyen Thi Thanh], Okatani, T.[Takayuki],
Bright as the Sun: In-depth Analysis of Imagination-driven Image Captioning,
ACCV22(IV:675-691).
Springer DOI 2307
BibRef

Phueaksri, I.[Itthisak], Kastner, M.A.[Marc A.], Kawanishi, Y.[Yasutomo], Komamizu, T.[Takahiro], Ide, I.[Ichiro],
Towards Captioning an Image Collection from a Combined Scene Graph Representation Approach,
MMMod23(I: 178-190).
Springer DOI 2304
BibRef

Zhang, Y.[Youyuan], Wang, J.[Jiuniu], Wu, H.[Hao], Xu, W.J.[Wen-Jia],
Distinctive Image Captioning via Clip Guided Group Optimization,
CMHRI22(223-238).
Springer DOI 2304
BibRef
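
CLIP-guided and zero-shot captioning entries in this section lean on CLIP's image-text similarity as a guidance or reranking signal. Below is a minimal scoring sketch; it assumes the Hugging Face transformers and Pillow packages and the public openai/clip-vit-base-patch32 checkpoint, with a placeholder image and made-up captions.

```python
# CLIP image-caption similarity scoring (assumes transformers + Pillow and
# network access to download the public checkpoint).
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.new("RGB", (224, 224))  # placeholder for a real photo
captions = ["a dog running on the beach", "a plate of food on a table"]

inputs = processor(text=captions, images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    sims = model(**inputs).logits_per_image  # (1, num_captions) similarity logits
print(sims.softmax(dim=-1))                  # relative caption-image agreement
```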

Qiu, Y.[Yue], Yamamoto, S.[Shintaro], Yamada, R.[Ryosuke], Suzuki, R.[Ryota], Kataoka, H.[Hirokatsu], Iwata, K.[Kenji], Satoh, Y.[Yutaka],
3D Change Localization and Captioning from Dynamic Scans of Indoor Scenes,
WACV23(1176-1185)
IEEE DOI 2302
Location awareness, Point cloud compression, Image recognition, Limiting, Detectors, Benchmark testing, 3D computer vision BibRef

Honda, U.[Ukyo], Watanabe, T.[Taro], Matsumoto, Y.[Yuji],
Switching to Discriminative Image Captioning by Relieving a Bottleneck of Reinforcement Learning,
WACV23(1124-1134)
IEEE DOI 2302
Vocabulary, Limiting, Computational modeling, Switches, Reinforcement learning, Control systems, visual reasoning BibRef

Sui, J.H.[Jia-Hong], Yu, H.M.[Hui-Min], Liang, X.Y.[Xin-Yue], Ping, P.[Ping],
Image Caption Method Based on Graph Attention Network with Global Context,
ICIVC22(480-487)
IEEE DOI 2301
Deep learning, Visualization, Image coding, Semantics, Neural networks, Image representation, Feature extraction, global feature BibRef

Arguello, P.[Paula], Lopez, J.[Jhon], Hinojosa, C.[Carlos], Arguello, H.[Henry],
Optics Lens Design for Privacy-Preserving Scene Captioning,
ICIP22(3551-3555)
IEEE DOI 2211
Integrated optics, Privacy, Optical design, Optical distortion, Optical detectors, Optical imaging, Feature extraction, Computational Optics BibRef

Meng, Z.H.[Zi-Hang], Yang, D.[David], Cao, X.F.[Xue-Fei], Shah, A.[Ashish], Lim, S.N.[Ser-Nam],
Object-Centric Unsupervised Image Captioning,
ECCV22(XXXVI:219-235).
Springer DOI 2211
BibRef

Wang, Z.[Zhen], Chen, L.[Long], Ma, W.B.[Wen-Bo], Han, G.X.[Guang-Xing], Niu, Y.[Yulei], Shao, J.[Jian], Xiao, J.[Jun],
Explicit Image Caption Editing,
ECCV22(XXXVI:113-129).
Springer DOI 2211
BibRef

Jiao, Y.[Yang], Chen, S.X.[Shao-Xiang], Jie, Z.Q.[Ze-Qun], Chen, J.J.[Jing-Jing], Ma, L.[Lin], Jiang, Y.G.[Yu-Gang],
MORE: Multi-Order RElation Mining for Dense Captioning in 3D Scenes,
ECCV22(XXXV:528-545).
Springer DOI 2211
BibRef

Nagrani, A.[Arsha], Seo, P.H.[Paul Hongsuck], Seybold, B.[Bryan], Hauth, A.[Anja], Manen, S.[Santiago], Sun, C.[Chen], Schmid, C.[Cordelia],
Learning Audio-Video Modalities from Image Captions,
ECCV22(XIV:407-426).
Springer DOI 2211
BibRef

Tewel, Y.[Yoad], Shalev, Y.[Yoav], Schwartz, I.[Idan], Wolf, L.B.[Lior B.],
ZeroCap: Zero-Shot Image-to-Text Generation for Visual-Semantic Arithmetic,
CVPR22(17897-17907)
IEEE DOI 2210
Knowledge engineering, Training, Measurement, Visualization, Text recognition, Semantics, Magnetic heads, Vision+language, Transfer/low-shot/long-tail learning BibRef

Truong, P.[Prune], Danelljan, M.[Martin], Yu, F.[Fisher], Van Gool, L.J.[Luc J.],
Probabilistic Warp Consistency for Weakly-Supervised Semantic Correspondences,
CVPR22(8698-8708)
IEEE DOI 2210
Image resolution, Costs, Semantics, Computer architecture, Benchmark testing, Probabilistic logic, Motion and tracking, retrieval BibRef

Chan, D.M.[David M.], Myers, A.[Austin], Vijayanarasimhan, S.[Sudheendra], Ross, D.A.[David A.], Seybold, B.[Bryan], Canny, J.F.[John F.],
What's in a Caption? Dataset-Specific Linguistic Diversity and Its Effect on Visual Description Models and Metrics,
VDU22(4739-4748)
IEEE DOI 2210
Measurement, Visualization, Analytical models, Video description, Computational modeling, Training data, Linguistics BibRef

Popattia, M.[Murad], Rafi, M.[Muhammad], Qureshi, R.[Rizwan], Nawaz, S.[Shah],
Guiding Attention using Partial-Order Relationships for Image Captioning,
MULA22(4670-4679)
IEEE DOI 2210
Training, Measurement, Visualization, Semantics, Computer architecture BibRef

Mohamed, Y.[Youssef], Khan, F.F.[Faizan Farooq], Haydarov, K.[Kilichbek], Elhoseiny, M.[Mohamed],
It is Okay to Not Be Okay: Overcoming Emotional Bias in Affective Image Captioning by Contrastive Data Collection,
CVPR22(21231-21240)
IEEE DOI 2210
Measurement, Codes, Human intelligence, Data collection, Data models, Datasets and evaluation, Others, Vision + language BibRef

Chen, J.[Jun], Guo, H.[Han], Yi, K.[Kai], Li, B.Y.[Bo-Yang], Elhoseiny, M.[Mohamed],
VisualGPT: Data-efficient Adaptation of Pretrained Language Models for Image Captioning,
CVPR22(18009-18019)
IEEE DOI 2210
Training, Representation learning, Adaptation models, Visualization, Computational modeling, Semantics, Linguistics, Transfer/low-shot/long-tail learning BibRef

Chen, S.[Simin], Song, Z.H.[Zi-He], Haque, M.[Mirazul], Liu, C.[Cong], Yang, W.[Wei],
NICGSlowDown: Evaluating the Efficiency Robustness of Neural Image Caption Generation Models,
CVPR22(15344-15353)
IEEE DOI 2210
Visualization, Computational modeling, Perturbation methods, Robustness, Real-time systems, Efficient learning and inferences BibRef

Hirota, Y.[Yusuke], Nakashima, Y.[Yuta], Garcia, N.[Noa],
Quantifying Societal Bias Amplification in Image Captioning,
CVPR22(13440-13449)
IEEE DOI 2210
Measurement, Equalizers, Computational modeling, Focusing, Predictive models, Skin, Transparency, fairness, accountability, Vision + language BibRef

Beddiar, D.[Djamila], Oussalah, M.[Mourad], Seppänen, T.[Tapio],
Explainability for Medical Image Captioning,
IPTA22(1-6)
IEEE DOI 2206
Visualization, Computational modeling, Semantics, Feature extraction, Decoding, Convolutional neural networks, Artificial Intelligence Explainability BibRef

Bounab, Y.[Yazid], Oussalah, M.[Mourad], Ferdenache, A.[Ahlam],
Reconciling Image Captioning and User's Comments for Urban Tourism,
IPTA20(1-6)
IEEE DOI 2206
Visualization, Databases, Tourism industry, Pipelines, Tools, Internet, Planning, Image captioning, social media, image description, google vision API BibRef

Zha, Z.W.[Zhi-Wei], Zhou, P.F.[Peng-Fei], Bai, C.[Cong],
Exploring Implicit and Explicit Relations with the Dual Relation-Aware Network for Image Captioning,
MMMod22(II:97-108).
Springer DOI 2203
BibRef

Ruta, D.[Dan], Motiian, S.[Saeid], Faieta, B.[Baldo], Lin, Z.[Zhe], Jin, H.L.[Hai-Lin], Filipkowski, A.[Alex], Gilbert, A.[Andrew], Collomosse, J.[John],
ALADIN: All Layer Adaptive Instance Normalization for Fine-grained Style Similarity,
ICCV21(11906-11915)
IEEE DOI 2203
Training, Representation learning, Visualization, Adaptation models, User-generated content, Neural generative models BibRef

Nguyen, K.[Kien], Tripathi, S.[Subarna], Du, B.[Bang], Guha, T.[Tanaya], Nguyen, T.Q.[Truong Q.],
In Defense of Scene Graphs for Image Captioning,
ICCV21(1387-1396)
IEEE DOI 2203
Convolutional codes, Visualization, Image coding, Semantics, Pipelines, Generators, Vision + language, Scene analysis and understanding BibRef

Shi, J.[Jiahe], Li, Y.[Yali], Wang, S.J.[Sheng-Jin],
Partial Off-policy Learning: Balance Accuracy and Diversity for Human-Oriented Image Captioning,
ICCV21(2167-2176)
IEEE DOI 2203
Correlation, Computational modeling, Reinforcement learning, Generative adversarial networks, Task analysis BibRef

Alahmadi, R.[Rehab], Hahn, J.[James],
Improve Image Captioning by Estimating the Gazing Patterns from the Caption,
WACV22(2453-2462)
IEEE DOI 2202
Visualization, Computational modeling, Neural networks, Feature extraction, Vision and Languages Scene Understanding BibRef

Biten, A.F.[Ali Furkan], Gómez, L.[Lluís], Karatzas, D.[Dimosthenis],
Let there be a clock on the beach: Reducing Object Hallucination in Image Captioning,
WACV22(2473-2482)
IEEE DOI 2202
Measurement, Training, Visualization, Analytical models, Computational modeling, Training data, Vision and Languages BibRef

Deb, T.[Tonmoay], Sadmanee, A.[Akib], Bhaumik, K.K.[Kishor Kumar], Ali, A.A.[Amin Ahsan], Amin, M.A.[M Ashraful], Rahman, A.K.M.M.[A.K.M. Mahbubur],
Variational Stacked Local Attention Networks for Diverse Video Captioning,
WACV22(2493-2502)
IEEE DOI 2202
Measurement, Visualization, Stacking, Redundancy, Natural languages, Streaming media, Syntactics, Vision and Languages Datasets, Analysis and Understanding BibRef

Sharif, N.[Naeha], White, L.[Lyndon], Bennamoun, M.[Mohammed], Liu, W.[Wei], Shah, S.A.A.[Syed Afaq Ali],
WEmbSim: A Simple yet Effective Metric for Image Captioning,
DICTA20(1-8)
IEEE DOI 2201
Measurement, Correlation, Databases, Digital images, Machine learning, SPICE, Task analysis, Image Captioning, Word Embeddings BibRef

Qiu, J.Y.[Jia-Yan], Yang, Y.D.[Yi-Ding], Wang, X.[Xinchao], Tao, D.C.[Da-Cheng],
Scene Essence,
CVPR21(8318-8329)
IEEE DOI 2111
Image recognition, Graph neural networks, Labeling, Lenses BibRef

Hosseinzadeh, M.[Mehrdad], Wang, Y.[Yang],
Image Change Captioning by Learning from an Auxiliary Task,
CVPR21(2724-2733)
IEEE DOI 2111
Training, Image color analysis, Image retrieval, Semantics, Benchmark testing BibRef

Chen, L.[Long], Jiang, Z.H.[Zhi-Hong], Xiao, J.[Jun], Liu, W.[Wei],
Human-like Controllable Image Captioning with Verb-specific Semantic Roles,
CVPR21(16841-16851)
IEEE DOI 2111
Visualization, Codes, Semantics, Benchmark testing, Controllability BibRef

Chen, D.Z.Y.[Dave Zhen-Yu], Gholami, A.[Ali], Nießner, M.[Matthias], Chang, A.X.[Angel X.],
Scan2Cap: Context-aware Dense Captioning in RGB-D Scans,
CVPR21(3192-3202)
IEEE DOI 2111
Location awareness, Message passing, Natural languages, Pipelines, Computer architecture, Object detection BibRef

Luong, Q.A.[Quoc-An], Vo, D.M.[Duc Minh], Sugimoto, A.[Akihiro],
Saliency based Subject Selection for Diverse Image Captioning,
MVA21(1-5)
DOI Link 2109
Measurement, Visualization, Diversity methods BibRef

Sharif, N.[Naeha], Bennamoun, M.[Mohammed], Liu, W.[Wei], Shah, S.A.A.[Syed Afaq Ali],
SubICap: Towards Subword-informed Image Captioning,
WACV21(3539-3540)
IEEE DOI 2106
Measurement, Training, Vocabulary, Image segmentation, Image color analysis, Computational modeling, Semantics BibRef

Umemura, K.[Kazuki], Kastner, M.A.[Marc A.], Ide, I.[Ichiro], Kawanishi, Y.[Yasutomo], Hirayama, T.[Takatsugu], Doman, K.[Keisuke], Deguchi, D.[Daisuke], Murase, H.[Hiroshi],
Tell as You Imagine: Sentence Imageability-aware Image Captioning,
MMMod21(II:62-73).
Springer DOI 2106
BibRef

Hallonquist, N.[Neil], Geman, D.[Donald], Younes, L.[Laurent],
Graph Discovery for Visual Test Generation,
ICPR21(7500-7507)
IEEE DOI 2105
Visualization, Vocabulary, Machine vision, Semantics, Image representation, Knowledge discovery, Probability distribution BibRef

Li, X.J.[Xin-Jie], Yang, C.[Chun], Chen, S.L.[Song-Lu], Zhu, C.[Chao], Yin, X.C.[Xu-Cheng],
Semantic Bilinear Pooling for Fine-Grained Recognition,
ICPR21(3660-3666)
IEEE DOI 2105
Training, Deep learning, Semantics, Birds, Testing, Semantic Information, Bilinear Pooling, Fine-Grained Recognition BibRef

Chavhan, R.[Ruchika], Banerjee, B.[Biplab], Zhu, X.X.[Xiao Xiang], Chaudhuri, S.[Subhasis],
A Novel Actor Dual-Critic Model for Remote Sensing Image Captioning,
ICPR21(4918-4925)
IEEE DOI 2105
Training, Image coding, Reinforcement learning, Gain measurement, Benchmark testing, Optical imaging, Data models BibRef

Kalimuthu, M.[Marimuthu], Mogadala, A.[Aditya], Mosbach, M.[Marius], Klakow, D.[Dietrich],
Fusion Models for Improved Image Captioning,
MMDLCA20(381-395).
Springer DOI 2103
BibRef

Cetinic, E.[Eva],
Iconographic Image Captioning for Artworks,
FAPER20(502-516).
Springer DOI 2103
BibRef

Huang, Y.Q.[Yi-Qing], Chen, J.S.[Jian-Sheng],
Show, Conceive and Tell: Image Captioning with Prospective Linguistic Information,
ACCV20(VI:478-494).
Springer DOI 2103
BibRef

Deng, C.R.[Chao-Rui], Ding, N.[Ning], Tan, M.K.[Ming-Kui], Wu, Q.[Qi],
Length-controllable Image Captioning,
ECCV20(XIII:712-729).
Springer DOI 2011
BibRef

Gurari, D.[Danna], Zhao, Y.N.[Yi-Nan], Zhang, M.[Meng], Bhattacharya, N.[Nilavra],
Captioning Images Taken by People Who Are Blind,
ECCV20(XVII:417-434).
Springer DOI 2011
BibRef

Zhong, Y.W.[Yi-Wu], Wang, L.W.[Li-Wei], Chen, J.S.[Jian-Shu], Yu, D.[Dong], Li, Y.[Yin],
Comprehensive Image Captioning via Scene Graph Decomposition,
ECCV20(XIV:211-229).
Springer DOI 2011
BibRef

Wang, Z.[Zeyu], Feng, B.[Berthy], Narasimhan, K.[Karthik], Russakovsky, O.[Olga],
Towards Unique and Informative Captioning of Images,
ECCV20(VII:629-644).
Springer DOI 2011
BibRef

Sidorov, O.[Oleksii], Hu, R.H.[Rong-Hang], Rohrbach, M.[Marcus], Singh, A.[Amanpreet],
Textcaps: A Dataset for Image Captioning with Reading Comprehension,
ECCV20(II:742-758).
Springer DOI 2011
BibRef

Durand, T.[Thibaut],
Learning User Representations for Open Vocabulary Image Hashtag Prediction,
CVPR20(9766-9775)
IEEE DOI 2008
Tagging, Twitter, Computational modeling, Vocabulary, Predictive models, History, Visualization BibRef

Prabhudesai, M.[Mihir], Tung, H.Y.F.[Hsiao-Yu Fish], Javed, S.A.[Syed Ashar], Sieb, M.[Maximilian], Harley, A.W.[Adam W.], Fragkiadaki, K.[Katerina],
Embodied Language Grounding With 3D Visual Feature Representations,
CVPR20(2217-2226)
IEEE DOI 2008
Associating language utterances to 3D visual abstractions. Visualization, Cameras, Feature extraction, Detectors, Solid modeling BibRef

Li, Z., Tran, Q., Mai, L., Lin, Z., Yuille, A.L.,
Context-Aware Group Captioning via Self-Attention and Contrastive Features,
CVPR20(3437-3447)
IEEE DOI 2008
Task analysis, Visualization, Context modeling, Training, Natural languages, Computational modeling BibRef

Zhou, Y., Wang, M., Liu, D., Hu, Z., Zhang, H.,
More Grounded Image Captioning by Distilling Image-Text Matching Model,
CVPR20(4776-4785)
IEEE DOI 2008
Visualization, Grounding, Task analysis, Training, Measurement, Computational modeling, Image edge detection BibRef

Sammani, F., Melas-Kyriazi, L.,
Show, Edit and Tell: A Framework for Editing Image Captions,
CVPR20(4807-4815)
IEEE DOI 2008
Decoding, Visualization, Task analysis, Logic gates, Natural languages, Adaptation models, Glass BibRef

Chen, S., Jin, Q., Wang, P., Wu, Q.,
Say As You Wish: Fine-Grained Control of Image Caption Generation With Abstract Scene Graphs,
CVPR20(9959-9968)
IEEE DOI 2008
Semantics, Decoding, Visualization, Feature extraction, Controllability, Task analysis, Measurement BibRef

Guo, L., Liu, J., Zhu, X., Yao, P., Lu, S., Lu, H.,
Normalized and Geometry-Aware Self-Attention Network for Image Captioning,
CVPR20(10324-10333)
IEEE DOI 2008
Geometry, Task analysis, Visualization, Decoding, Training, Feature extraction, Computer architecture BibRef

Chen, J., Jin, Q.,
Better Captioning With Sequence-Level Exploration,
CVPR20(10887-10896)
IEEE DOI 2008
Task analysis, Measurement, Training, Computational modeling, Computer architecture, Portable computers, Decoding BibRef

Pan, Y., Yao, T., Li, Y., Mei, T.,
X-Linear Attention Networks for Image Captioning,
CVPR20(10968-10977)
IEEE DOI 2008
Visualization, Decoding, Cognition, Knowledge discovery, Task analysis, Aggregates, Weight measurement BibRef

Park, G.[Geondo], Han, C.[Chihye], Kim, D.[Daeshik], Yoon, W.J.[Won-Jun],
MHSAN: Multi-Head Self-Attention Network for Visual Semantic Embedding,
WACV20(1507-1515)
IEEE DOI 2006
Feature extraction, Visualization, Semantics, Task analysis, Recurrent neural networks, Image representation, Image coding BibRef

Chen, C., Zhang, R., Koh, E., Kim, S., Cohen, S., Rossi, R.,
Figure Captioning with Relation Maps for Reasoning,
WACV20(1526-1534)
IEEE DOI 2006
Bars, Training, Visualization, Decoding, Computational modeling, Task analysis, Portable document format BibRef

He, S., Tavakoli, H.R., Borji, A., Pugeault, N.,
Human Attention in Image Captioning: Dataset and Analysis,
ICCV19(8528-8537)
IEEE DOI 2004
Code, Captioning.
WWW Link. convolutional neural nets, image segmentation, natural language processing, object detection, visual perception, Adaptation models BibRef

Huang, L., Wang, W., Chen, J., Wei, X.,
Attention on Attention for Image Captioning,
ICCV19(4633-4642)
IEEE DOI 2004
Code, Captioning.
WWW Link. decoding, encoding, image processing, natural language processing, element-wise multiplication, image captioning, weighted average, Testing BibRef

Yao, T., Pan, Y., Li, Y., Mei, T.,
Hierarchy Parsing for Image Captioning,
ICCV19(2621-2629)
IEEE DOI 2004
convolutional neural nets, feature extraction, image coding, image representation, image segmentation BibRef

Liu, L., Tang, J., Wan, X., Guo, Z.,
Generating Diverse and Descriptive Image Captions Using Visual Paraphrases,
ICCV19(4239-4248)
IEEE DOI 2004
image classification, learning (artificial intelligence), Machine learning BibRef

Ke, L., Pei, W., Li, R., Shen, X., Tai, Y.,
Reflective Decoding Network for Image Captioning,
ICCV19(8887-8896)
IEEE DOI 2004
decoding, encoding, feature extraction, learning (artificial intelligence), Random access memory BibRef

Vered, G., Oren, G., Atzmon, Y., Chechik, G.,
Joint Optimization for Cooperative Image Captioning,
ICCV19(8897-8906)
IEEE DOI 2004
gradient methods, image sampling, natural language processing, stochastic programming, text analysis, Loss measurement BibRef

Ge, H., Yan, Z., Zhang, K., Zhao, M., Sun, L.,
Exploring Overall Contextual Information for Image Captioning in Human-Like Cognitive Style,
ICCV19(1754-1763)
IEEE DOI 2004
cognition, computational linguistics, learning (artificial intelligence) BibRef

Agrawal, H., Desai, K., Wang, Y., Chen, X., Jain, R., Johnson, M., Batra, D., Parikh, D., Lee, S., Anderson, P.,
nocaps: novel object captioning at scale,
ICCV19(8947-8956)
IEEE DOI 2004
feature extraction, learning (artificial intelligence), object detection, Vegetation BibRef

Nguyen, A., Tran, Q.D., Do, T., Reid, I., Caldwell, D.G., Tsagarakis, N.G.,
Object Captioning and Retrieval with Natural Language,
ACVR19(2584-2592)
IEEE DOI 2004
convolutional neural nets, image retrieval, learning (artificial intelligence), vision and language BibRef

Gu, J., Joty, S., Cai, J., Zhao, H., Yang, X., Wang, G.,
Unpaired Image Captioning via Scene Graph Alignments,
ICCV19(10322-10331)
IEEE DOI 2004
graph theory, image representation, image retrieval, natural language processing, text analysis, Encoding BibRef

Shen, T., Kar, A., Fidler, S.,
Learning to Caption Images Through a Lifetime by Asking Questions,
ICCV19(10392-10401)
IEEE DOI 2004
image retrieval, multi-agent systems, natural language processing, Automobiles BibRef

Aneja, J.[Jyoti], Agrawal, H.[Harsh], Batra, D.[Dhruv], Schwing, A.G.[Alexander G.],
Sequential Latent Spaces for Modeling the Intention During Diverse Image Captioning,
ICCV19(4260-4269)
IEEE DOI 2004
image retrieval, image segmentation, learning (artificial intelligence), recurrent neural nets, Controllability BibRef

Deshpande, A.[Aditya], Aneja, J.[Jyoti], Wang, L.W.[Li-Wei], Schwing, A.G.[Alexander G.], Forsyth, D.A.[David A.],
Fast, Diverse and Accurate Image Captioning Guided by Part-Of-Speech,
CVPR19(10687-10696).
IEEE DOI 2002
BibRef

Wei, H.Y.[Hai-Yang], Li, Z.X.[Zhi-Xin], Zhang, C.L.[Can-Long],
Image Captioning Based on Visual and Semantic Attention,
MMMod20(I:151-162).
Springer DOI 2003
BibRef

Dognin, P.[Pierre], Melnyk, I.[Igor], Mroueh, Y.[Youssef], Ross, J.[Jerret], Sercu, T.[Tom],
Adversarial Semantic Alignment for Improved Image Captions,
CVPR19(10455-10463).
IEEE DOI 2002
BibRef

Fukui, H.[Hiroshi], Hirakawa, T.[Tsubasa], Yamashita, T.[Takayoshi], Fujiyoshi, H.[Hironobu],
Attention Branch Network: Learning of Attention Mechanism for Visual Explanation,
CVPR19(10697-10706).
IEEE DOI 2002
BibRef

Biten, A.F.[Ali Furkan], Gomez, L.[Lluis], Rusinol, M.[Marcal], Karatzas, D.[Dimosthenis],
Good News, Everyone! Context Driven Entity-Aware Captioning for News Images,
CVPR19(12458-12467).
IEEE DOI 2002
BibRef

Surís, D.[Dídac], Epstein, D.[Dave], Ji, H.[Heng], Chang, S.F.[Shih-Fu], Vondrick, C.[Carl],
Learning to Learn Words from Visual Scenes,
ECCV20(XXIX: 434-452).
Springer DOI 2010
BibRef

Shuster, K.[Kurt], Humeau, S.[Samuel], Hu, H.[Hexiang], Bordes, A.[Antoine], Weston, J.[Jason],
Engaging Image Captioning via Personality,
CVPR19(12508-12518).
IEEE DOI 2002
BibRef

Feng, Y.[Yang], Ma, L.[Lin], Liu, W.[Wei], Luo, J.B.[Jie-Bo],
Unsupervised Image Captioning,
CVPR19(4120-4129).
IEEE DOI 2002
BibRef

Xu, Y.[Yan], Wu, B.Y.[Bao-Yuan], Shen, F.M.[Fu-Min], Fan, Y.B.[Yan-Bo], Zhang, Y.[Yong], Shen, H.T.[Heng Tao], Liu, W.[Wei],
Exact Adversarial Attack to Image Captioning via Structured Output Learning With Latent Variables,
CVPR19(4130-4139).
IEEE DOI 2002
BibRef

Wang, Q.Z.[Qing-Zhong], Chan, A.B.[Antoni B.],
Describing Like Humans: On Diversity in Image Captioning,
CVPR19(4190-4198).
IEEE DOI 2002
BibRef

Guo, L.T.[Long-Teng], Liu, J.[Jing], Yao, P.[Peng], Li, J.W.[Jiang-Wei], Lu, H.Q.[Han-Qing],
MSCap: Multi-Style Image Captioning With Unpaired Stylized Text,
CVPR19(4199-4208).
IEEE DOI 2002
BibRef

Zhang, L.[Lu], Zhang, J.M.[Jian-Ming], Lin, Z.[Zhe], Lu, H.C.[Hu-Chuan], He, Y.[You],
CapSal: Leveraging Captioning to Boost Semantics for Salient Object Detection,
CVPR19(6017-6026).
IEEE DOI 2002
BibRef

Yin, G.J.[Guo-Jun], Sheng, L.[Lu], Liu, B.[Bin], Yu, N.H.[Neng-Hai], Wang, X.G.[Xiao-Gang], Shao, J.[Jing],
Context and Attribute Grounded Dense Captioning,
CVPR19(6234-6243).
IEEE DOI 2002
BibRef

Gao, J.L.[Jun-Long], Wang, S.Q.[Shi-Qi], Wang, S.S.[Shan-She], Ma, S.W.[Si-Wei], Gao, W.[Wen],
Self-Critical N-Step Training for Image Captioning,
CVPR19(6293-6301).
IEEE DOI 2002
BibRef

Qin, Y.[Yu], Du, J.J.[Jia-Jun], Zhang, Y.H.[Yong-Hua], Lu, H.T.[Hong-Tao],
Look Back and Predict Forward in Image Captioning,
CVPR19(8359-8367).
IEEE DOI 2002
BibRef

Zheng, Y.[Yue], Li, Y.[Yali], Wang, S.J.[Sheng-Jin],
Intention Oriented Image Captions With Guiding Objects,
CVPR19(8387-8396).
IEEE DOI 2002
BibRef

Huang, Y., Li, C., Li, T., Wan, W., Chen, J.,
Image Captioning with Attribute Refinement,
ICIP19(1820-1824)
IEEE DOI 1910
Image captioning, attribute recognition, Semantic attention, Deep Neural Network, Conditional Random Field BibRef

Lee, J., Lee, Y., Seong, S., Kim, K., Kim, S., Kim, J.,
Capturing Long-Range Dependencies in Video Captioning,
ICIP19(1880-1884)
IEEE DOI 1910
Video captioning, non-local block, long short-term memory, long-range dependency, video representation BibRef

Shi, J., Li, Y., Wang, S.,
Cascade Attention: Multiple Feature Based Learning for Image Captioning,
ICIP19(1970-1974)
IEEE DOI 1910
Image Captioning, Attention Mechanism, Cascade Attention BibRef

Wang, Y., Shen, Y., Xiong, H., Lin, W.,
Adaptive Hard Example Mining for Image Captioning,
ICIP19(3342-3346)
IEEE DOI 1910
Reinforcement Learning, Image Captioning BibRef

Xiao, H., Shi, J.,
A Novel Attribute Selection Mechanism for Video Captioning,
ICIP19(619-623)
IEEE DOI 1910
Attributes, Video captioning, Attention, Reinforcement learning BibRef

Lim, J.H., Chan, C.S.,
Mask Captioning Network,
ICIP19(1-5)
IEEE DOI 1910
Image captioning, Deep learning, Scene understanding BibRef

Wang, Q.Z.[Qing-Zhong], Chan, A.B.[Antoni B.],
Gated Hierarchical Attention for Image Captioning,
ACCV18(IV:21-37).
Springer DOI 1906
BibRef

Wang, W.X.[Wei-Xuan], Chen, Z.H.[Zhi-Hong], Hu, H.F.[Hai-Feng],
Multivariate Attention Network for Image Captioning,
ACCV18(VI:587-602).
Springer DOI 1906
BibRef

Ghanimifard, M.[Mehdi], Dobnik, S.[Simon],
Knowing When to Look for What and Where: Evaluating Generation of Spatial Descriptions with Adaptive Attention,
VL18(IV:153-161).
Springer DOI 1905

See also Knowing When to Look: Adaptive Attention via a Visual Sentinel for Image Captioning. BibRef

Kim, B.[Boeun], Lee, Y.H.[Young Han], Jung, H.[Hyedong], Cho, C.[Choongsang],
Distinctive-Attribute Extraction for Image Captioning,
VL18(IV:133-144).
Springer DOI 1905
BibRef

Tanti, M.[Marc], Gatt, A.[Albert], Muscat, A.[Adrian],
Pre-gen Metrics: Predicting Caption Quality Metrics Without Generating Captions,
VL18(IV:114-123).
Springer DOI 1905
BibRef

Tanti, M.[Marc], Gatt, A.[Albert], Camilleri, K.P.[Kenneth P.],
Quantifying the Amount of Visual Information Used by Neural Caption Generators,
VL18(IV:124-132).
Springer DOI 1905
BibRef

Ren, L., Qi, G., Hua, K.,
Improving Diversity of Image Captioning Through Variational Autoencoders and Adversarial Learning,
WACV19(263-272)
IEEE DOI 1904
image classification, image coding, image segmentation, learning (artificial intelligence), Maximum likelihood estimation BibRef

Zhou, Y., Sun, Y., Honavar, V.,
Improving Image Captioning by Leveraging Knowledge Graphs,
WACV19(283-293)
IEEE DOI 1904
graph theory, image capture, image retrieval, performance measure, image captioning systems, knowledge graphs, Generators BibRef

Lu, J.S.[Jia-Sen], Yang, J.W.[Jian-Wei], Batra, D.[Dhruv], Parikh, D.[Devi],
Neural Baby Talk,
CVPR18(7219-7228)
IEEE DOI 1812
Detectors, Visualization, Grounding, Pediatrics, Natural languages, Dogs, Task analysis BibRef

Khademi, M., Schulte, O.,
Image Caption Generation with Hierarchical Contextual Visual Spatial Attention,
Cognitive18(2024-20248)
IEEE DOI 1812
Feature extraction, Visualization, Logic gates, Computer architecture, Task analysis, Context modeling, Computational modeling BibRef

Yan, S., Wu, F., Smith, J.S., Lu, W., Zhang, B.,
Image Captioning using Adversarial Networks and Reinforcement Learning,
ICPR18(248-253)
IEEE DOI 1812
Generators, Generative adversarial networks, Monte Carlo methods, Maximum likelihood estimation, Task analysis BibRef

Wang, F., Gong, X., Huang, L.,
Time-Dependent Pre-attention Model for Image Captioning,
ICPR18(3297-3302)
IEEE DOI 1812
Decoding, Task analysis, Semantics, Visualization, Feature extraction, Computational modeling, Computer science BibRef

Luo, R., Shakhnarovich, G., Cohen, S., Price, B.,
Discriminability Objective for Training Descriptive Captions,
CVPR18(6964-6974)
IEEE DOI 1812
Training, Task analysis, Visualization, Measurement, Computational modeling, Generators, Airplanes BibRef

Cui, Y., Yang, G., Veit, A., Huang, X., Belongie, S.,
Learning to Evaluate Image Captioning,
CVPR18(5804-5812)
IEEE DOI 1812
Measurement, Pathology, Training, Correlation, SPICE, Robustness, Task analysis BibRef

Aneja, J., Deshpande, A., Schwing, A.G.,
Convolutional Image Captioning,
CVPR18(5561-5570)
IEEE DOI 1812
Training, Computer architecture, Task analysis, Hidden Markov models, Microprocessors, Computational modeling, Indexing BibRef

Chen, F., Ji, R., Sun, X., Wu, Y., Su, J.,
GroupCap: Group-Based Image Captioning with Structured Relevance and Diversity Constraints,
CVPR18(1345-1353)
IEEE DOI 1812
Visualization, Correlation, Semantics, Feature extraction, Training, Adaptation models, Task analysis BibRef

Chen, X., Ma, L., Jiang, W., Yao, J., Liu, W.,
Regularizing RNNs for Caption Generation by Reconstructing the Past with the Present,
CVPR18(7995-8003)
IEEE DOI 1812
Pattern recognition BibRef

Yao, T.[Ting], Pan, Y.W.[Ying-Wei], Li, Y.[Yehao], Mei, T.[Tao],
Exploring Visual Relationship for Image Captioning,
ECCV18(XIV: 711-727).
Springer DOI 1810
BibRef

Shah, S.A.A.[Syed Afaq Ali],
NNEval: Neural Network Based Evaluation Metric for Image Captioning,
ECCV18(VIII: 39-55).
Springer DOI 1810
BibRef

Jiang, W.H.[Wen-Hao], Ma, L.[Lin], Jiang, Y.G.[Yu-Gang], Liu, W.[Wei], Zhang, T.[Tong],
Recurrent Fusion Network for Image Captioning,
ECCV18(II: 510-526).
Springer DOI 1810
BibRef

Chatterjee, M.[Moitreya], Schwing, A.G.[Alexander G.],
Diverse and Coherent Paragraph Generation from Images,
ECCV18(II: 747-763).
Springer DOI 1810
BibRef

Chen, S.[Shi], Zhao, Q.[Qi],
Boosted Attention: Leveraging Human Attention for Image Captioning,
ECCV18(XI: 72-88).
Springer DOI 1810
BibRef

Dai, B.[Bo], Ye, D.[Deming], Lin, D.[Dahua],
Rethinking the Form of Latent States in Image Captioning,
ECCV18(VI: 294-310).
Springer DOI 1810
BibRef

Liu, X.H.[Xi-Hui], Li, H.S.[Hong-Sheng], Shao, J.[Jing], Chen, D.P.[Da-Peng], Wang, X.G.[Xiao-Gang],
Show, Tell and Discriminate: Image Captioning by Self-retrieval with Partially Labeled Data,
ECCV18(XV: 353-369).
Springer DOI 1810
BibRef

Fang, F., Wang, H., Tang, P.,
Image Captioning with Word Level Attention,
ICIP18(1278-1282)
IEEE DOI 1809
Visualization, Feature extraction, Task analysis, Training, Recurrent neural networks, Semantics, Computational modeling, bidirectional spatial embedding BibRef

Zhu, Z., Xue, Z., Yuan, Z.,
Topic-Guided Attention for Image Captioning,
ICIP18(2615-2619)
IEEE DOI 1809
Visualization, Semantics, Feature extraction, Training, Decoding, Generators, Measurement, Image captioning, Attention, Topic, Attribute, Deep Neural Network BibRef

Gomez-Garay, A.[Alejandro], Raducanu, B.[Bogdan], Salas, J.[Joaquín],
Dense Captioning of Natural Scenes in Spanish,
MCPR18(145-154).
Springer DOI 1807
BibRef

Yao, L.[Li], Ballas, N.[Nicolas], Cho, K.[Kyunghyun], Smith, J.[John], Bengio, Y.[Yoshua],
Oracle Performance for Visual Captioning,
BMVC16(xx-yy).
HTML Version. 1805
BibRef

Dong, H.[Hao], Zhang, J.Q.[Jing-Qing], McIlwraith, D.[Douglas], Guo, Y.[Yike],
I2T2I: Learning text to image synthesis with textual data augmentation,
ICIP17(2015-2019)
IEEE DOI 1803
Birds, Generators, Image generation, Recurrent neural networks, Shape, Training, Deep learning, GAN, Image Synthesis BibRef

Jia, Y.H.[Yu-Hua], Bai, L.[Liang], Wang, P.[Peng], Guo, J.L.[Jin-Lin], Xie, Y.X.[Yu-Xiang],
Deep Convolutional Neural Network for Correlating Images and Sentences,
MMMod18(I:154-165).
Springer DOI 1802
BibRef

Liu, J.Y.[Jing-Yu], Wang, L.[Liang], Yang, M.H.[Ming-Hsuan],
Referring Expression Generation and Comprehension via Attributes,
ICCV17(4866-4874)
IEEE DOI 1802
Language Descriptions for objects. learning (artificial intelligence), object detection, RefCOCO, RefCOCO+, RefCOCOg, attribute learning model, common space model, Visualization BibRef

Dai, B., Fidler, S., Urtasun, R., Lin, D.,
Towards Diverse and Natural Image Descriptions via a Conditional GAN,
ICCV17(2989-2998)
IEEE DOI 1802
image retrieval, image sequences, inference mechanisms, learning (artificial intelligence), Visualization BibRef

Liang, X., Hu, Z., Zhang, H., Gan, C., Xing, E.P.,
Recurrent Topic-Transition GAN for Visual Paragraph Generation,
ICCV17(3382-3391)
IEEE DOI 1802
document image processing, inference mechanisms, natural scenes, recurrent neural nets, text analysis, RTT-GAN, Visualization BibRef

Shetty, R., Rohrbach, M., Hendricks, L.A., Fritz, M., Schiele, B.,
Speaking the Same Language: Matching Machine to Human Captions by Adversarial Training,
ICCV17(4155-4164)
IEEE DOI 1802
image matching, learning (artificial intelligence), sampling methods, vocabulary, adversarial training, Visualization BibRef

Liu, S., Zhu, Z., Ye, N., Guadarrama, S., Murphy, K.,
Improved Image Captioning via Policy Gradient optimization of SPIDEr,
ICCV17(873-881)
IEEE DOI 1802
Maximum likelihood estimation, Measurement, Mixers, Robustness, SPICE, Training BibRef

Gu, J.X.[Jiu-Xiang], Joty, S.[Shafiq], Cai, J.F.[Jian-Fei], Wang, G.[Gang],
Unpaired Image Captioning by Language Pivoting,
ECCV18(I: 519-535).
Springer DOI 1810
BibRef

Gu, J.X.[Jiu-Xiang], Wang, G.[Gang], Cai, J.F.[Jian-Fei], Chen, T.H.[Tsu-Han],
An Empirical Study of Language CNN for Image Captioning,
ICCV17(1231-1240)
IEEE DOI 1802
convolution, learning (artificial intelligence), natural language processing, recurrent neural nets, Recurrent neural networks BibRef

Pedersoli, M., Lucas, T., Schmid, C., Verbeek, J.,
Areas of Attention for Image Captioning,
ICCV17(1251-1259)
IEEE DOI 1802
image segmentation, inference mechanisms, natural language processing, object detection, Visualization BibRef

Zhang, Z., Wu, J.J., Li, Q., Huang, Z., Traer, J., McDermott, J.H., Tenenbaum, J.B., Freeman, W.T.,
Generative Modeling of Audible Shapes for Object Perception,
ICCV17(1260-1269)
IEEE DOI 1802
audio recording, audio signal processing, audio-visual systems, feature extraction, inference mechanisms, interactive systems, Visualization BibRef

Liu, Z.J.[Zhi-Jian], Freeman, W.T.[William T.], Tenenbaum, J.B.[Joshua B.], Wu, J.J.[Jia-Jun],
Physical Primitive Decomposition,
ECCV18(XII: 3-20).
Springer DOI 1810
BibRef

Wu, J.J.[Jia-Jun], Lim, J.[Joseph], Zhang, H.Y.[Hong-Yi], Tenenbaum, J.B.[Joshua B.], Freeman, W.T.[William T.],
Physics 101: Learning Physical Object Properties from Unlabeled Videos,
BMVC16(xx-yy).
HTML Version. 1805
BibRef

Tavakoli, H.R., Shetty, R., Borji, A., Laaksonen, J.,
Paying Attention to Descriptions Generated by Image Captioning Models,
ICCV17(2506-2515)
IEEE DOI 1802
feature extraction, image processing, human descriptions, human-written descriptions, image captioning model, Visualization BibRef

Krause, J.[Jonathan], Johnson, J.[Justin], Krishna, R.[Ranjay], Fei-Fei, L.[Li],
A Hierarchical Approach for Generating Descriptive Image Paragraphs,
CVPR17(3337-3345)
IEEE DOI 1711
Feature extraction, Natural languages, Pragmatics, Recurrent neural networks, Speech, Visualization BibRef

Vedantam, R., Bengio, S., Murphy, K., Parikh, D., Chechik, G.,
Context-Aware Captions from Context-Agnostic Supervision,
CVPR17(1070-1079)
IEEE DOI 1711
Birds, Cats, Cognition, Context modeling, Pragmatics, Training BibRef

Gan, Z., Gan, C., He, X., Pu, Y., Tran, K., Gao, J., Carin, L., Deng, L.,
Semantic Compositional Networks for Visual Captioning,
CVPR17(1141-1150)
IEEE DOI 1711
Feature extraction, Mouth, Pediatrics, Semantics, Tensile stress, Training, Visualization BibRef

Ren, Z., Wang, X., Zhang, N., Lv, X., Li, L.J.,
Deep Reinforcement Learning-Based Image Captioning with Embedding Reward,
CVPR17(1151-1159)
IEEE DOI 1711
Decision making, Learning (artificial intelligence), Measurement, Neural networks, Training, Visualization BibRef

Rennie, S.J., Marcheret, E., Mroueh, Y., Ross, J., Goel, V.,
Self-Critical Sequence Training for Image Captioning,
CVPR17(1179-1195)
IEEE DOI 1711
Inference algorithms, Learning (artificial intelligence), Logic gates, Measurement, Predictive models, Training BibRef

Yang, L., Tang, K., Yang, J., Li, L.J.,
Dense Captioning with Joint Inference and Visual Context,
CVPR17(1978-1987)
IEEE DOI 1711
Bioinformatics, Genomics, Object detection, Proposals, Semantics, Training, Visualization BibRef

Lu, J., Xiong, C., Parikh, D., Socher, R.,
Knowing When to Look: Adaptive Attention via a Visual Sentinel for Image Captioning,
CVPR17(3242-3250)
IEEE DOI 1711
Adaptation models, Computational modeling, Context modeling, Decoding, Logic gates, Mathematical model, Visualization BibRef

Yao, T., Pan, Y., Li, Y., Mei, T.,
Incorporating Copying Mechanism in Image Captioning for Learning Novel Objects,
CVPR17(5263-5271)
IEEE DOI 1711
Decoding, Hidden Markov models, Object recognition, Recurrent neural networks, Standards, Training, Visualization BibRef

Chen, L., Zhang, H., Xiao, J., Nie, L., Shao, J., Liu, W., Chua, T.S.,
SCA-CNN: Spatial and Channel-Wise Attention in Convolutional Networks for Image Captioning,
CVPR17(6298-6306)
IEEE DOI 1711
Detectors, Feature extraction, Image coding, Neural networks, Semantics, Visualization BibRef

Sun, Q., Lee, S., Batra, D.,
Bidirectional Beam Search: Forward-Backward Inference in Neural Sequence Models for Fill-in-the-Blank Image Captioning,
CVPR17(7215-7223)
IEEE DOI 1711
Approximation algorithms, Computational modeling, Decoding, History, Inference algorithms, Recurrent, neural, networks BibRef

Wang, Y., Lin, Z., Shen, X., Cohen, S., Cottrell, G.W.,
Skeleton Key: Image Captioning by Skeleton-Attribute Decomposition,
CVPR17(7378-7387)
IEEE DOI 1711
Measurement, Recurrent neural networks, SPICE, Semantics, Skeleton, Training BibRef

Zanfir, M.[Mihai], Marinoiu, E.[Elisabeta], Sminchisescu, C.[Cristian],
Spatio-Temporal Attention Models for Grounded Video Captioning,
ACCV16(IV: 104-119).
Springer DOI 1704
BibRef

Chen, T.H.[Tseng-Hung], Zeng, K.H.[Kuo-Hao], Hsu, W.T.[Wan-Ting], Sun, M.[Min],
Video Captioning via Sentence Augmentation and Spatio-Temporal Attention,
Assist16(I: 269-286).
Springer DOI 1704
BibRef

Weiland, L.[Lydia], Hulpus, I.[Ioana], Ponzetto, S.P.[Simone Paolo], Dietz, L.[Laura],
Using Object Detection, NLP, and Knowledge Bases to Understand the Message of Images,
MMMod17(II: 405-418).
Springer DOI 1701
BibRef

Liu, Y.[Yu], Guo, Y.M.[Yan-Ming], Lew, M.S.[Michael S.],
What Convnets Make for Image Captioning?,
MMMod17(I: 416-428).
Springer DOI 1701
BibRef

Tran, K., He, X., Zhang, L., Sun, J.,
Rich Image Captioning in the Wild,
DeepLearn-C16(434-441)
IEEE DOI 1612
BibRef

Wang, Y.L.[Yi-Lin], Wang, S.H.[Su-Hang], Tang, J.L.[Ji-Liang], Liu, H.[Huan], Li, B.X.[Bao-Xin],
PPP: Joint Pointwise and Pairwise Image Label Prediction,
CVPR16(6005-6013)
IEEE DOI 1612
BibRef

Yatskar, M.[Mark], Ordonez, V.[Vicente], Zettlemoyer, L.[Luke], Farhadi, A.[Ali],
Commonly Uncommon: Semantic Sparsity in Situation Recognition,
CVPR17(6335-6344)
IEEE DOI 1711
BibRef
Earlier: A1, A3, A4, Only:
Situation Recognition: Visual Semantic Role Labeling for Image Understanding,
CVPR16(5534-5542)
IEEE DOI 1612
Image recognition, Image representation, Predictive models, Semantics, Tensile stress, Training BibRef

Sadhu, A.[Arka], Gupta, T.[Tanmay], Yatskar, M.[Mark], Nevatia, R.[Ram], Kembhavi, A.[Aniruddha],
Visual Semantic Role Labeling for Video Understanding,
CVPR21(5585-5596)
IEEE DOI 2111
Visualization, Annotations, Semantics, Benchmark testing, Motion pictures BibRef

Kottur, S.[Satwik], Vedantam, R.[Ramakrishna], Moura, J.M.F.[José M. F.], Parikh, D.[Devi],
VisualWord2Vec (Vis-W2V): Learning Visually Grounded Word Embeddings Using Abstract Scenes,
CVPR16(4985-4994)
IEEE DOI 1612
BibRef

Zhu, Y., Groth, O., Bernstein, M., Fei-Fei, L.,
Visual7W: Grounded Question Answering in Images,
CVPR16(4995-5004)
IEEE DOI 1612
BibRef

Zhang, P., Goyal, Y., Summers-Stay, D., Batra, D., Parikh, D.,
Yin and Yang: Balancing and Answering Binary Visual Questions,
CVPR16(5014-5022)
IEEE DOI 1612
BibRef

Park, D.H., Darrell, T.J., Rohrbach, A.,
Robust Change Captioning,
ICCV19(4623-4632)
IEEE DOI 2004
feature extraction, learning (artificial intelligence), natural language processing, object-oriented programming, Predictive models BibRef

Venugopalan, S.[Subhashini], Hendricks, L.A.[Lisa Anne], Rohrbach, M.[Marcus], Mooney, R.[Raymond], Darrell, T.J.[Trevor J.], Saenko, K.[Kate],
Captioning Images with Diverse Objects,
CVPR17(1170-1178)
IEEE DOI 1711
BibRef
Earlier: A2, A1, A3, A4, A6, A5:
Deep Compositional Captioning: Describing Novel Object Categories without Paired Training Data,
CVPR16(1-10)
IEEE DOI 1612
Data models, Image recognition, Predictive models, Semantics, Training, Visualization. Novel objects not in training data. BibRef

Johnson, J.[Justin], Karpathy, A.[Andrej], Fei-Fei, L.[Li],
DenseCap: Fully Convolutional Localization Networks for Dense Captioning,
CVPR16(4565-4574)
IEEE DOI 1612
Both localize and describe salient regions in images in natural language. BibRef

Lin, X.[Xiao], Parikh, D.[Devi],
Leveraging Visual Question Answering for Image-Caption Ranking,
ECCV16(II: 261-277).
Springer DOI 1611
BibRef
Earlier:
Don't just listen, use your imagination: Leveraging visual common sense for non-visual tasks,
CVPR15(2984-2993)
IEEE DOI 1510
BibRef

Chen, T.L.[Tian-Lang], Zhang, Z.P.[Zhong-Ping], You, Q.Z.[Quan-Zeng], Fang, C.[Chen], Wang, Z.W.[Zhao-Wen], Jin, H.L.[Hai-Lin], Luo, J.B.[Jie-Bo],
'Factual' or 'Emotional': Stylized Image Captioning with Adaptive Learning and Attention,
ECCV18(X: 527-543).
Springer DOI 1810
BibRef

You, Q.Z.[Quan-Zeng], Jin, H.L.[Hai-Lin], Wang, Z.W.[Zhao-Wen], Fang, C.[Chen], Luo, J.B.[Jie-Bo],
Image Captioning with Semantic Attention,
CVPR16(4651-4659)
IEEE DOI 1612
BibRef

Jia, X.[Xu], Gavves, E.[Efstratios], Fernando, B.[Basura], Tuytelaars, T.[Tinne],
Guiding the Long-Short Term Memory Model for Image Caption Generation,
ICCV15(2407-2415)
IEEE DOI 1602
Computer architecture BibRef

Chen, X.L.[Xin-Lei], Zitnick, C.L.[C. Lawrence],
Mind's eye: A recurrent visual representation for image caption generation,
CVPR15(2422-2431)
IEEE DOI 1510
BibRef

Vedantam, R.[Ramakrishna], Zitnick, C.L.[C. Lawrence], Parikh, D.[Devi],
CIDEr: Consensus-based image description evaluation,
CVPR15(4566-4575)
IEEE DOI 1510
BibRef

Fang, H.[Hao], Gupta, S.[Saurabh], Iandola, F.[Forrest], Srivastava, R.K.[Rupesh K.], Deng, L.[Li], Dollar, P.[Piotr], Gao, J.F.[Jian-Feng], He, X.D.[Xiao-Dong], Mitchell, M.[Margaret], Platt, J.C.[John C.], Zitnick, C.L.[C. Lawrence], Zweig, G.[Geoffrey],
From captions to visual concepts and back,
CVPR15(1473-1482)
IEEE DOI 1510
BibRef

Ramnath, K.[Krishnan], Baker, S.[Simon], Vanderwende, L.[Lucy], El-Saban, M.[Motaz], Sinha, S.N.[Sudipta N.], Kannan, A.[Anitha], Hassan, N.[Noran], Galley, M.[Michel], Yang, Y.[Yi], Ramanan, D.[Deva], Bergamo, A.[Alessandro], Torresani, L.[Lorenzo],
AutoCaption: Automatic caption generation for personal photos,
WACV14(1050-1057)
IEEE DOI 1406
Clouds BibRef

Chapter on Matching and Recognition Using Volumes, High Level Vision Techniques, Invariants continues in
Image Annotation.


Last update: Nov 26, 2024 at 16:40:19