13.6.11 Transformer for Captioning, Image Captioning

Image Captioning. Captioning. Transformers.

Yu, J., Li, J., Yu, Z., Huang, Q.,
Multimodal Transformer With Multi-View Visual Representation for Image Captioning,
CirSysVideo(30), No. 12, December 2020, pp. 4467-4480.
IEEE DOI 2012
Visualization, Feature extraction, Hidden Markov models, Adaptation models, Task analysis, Decoding, Computational modeling, deep learning BibRef

Zhang, Y.[Yu], Shi, X.Y.[Xin-Yu], Mi, S.[Siya], Yang, X.[Xu],
Image captioning with transformer and knowledge graph,
PRL(143), 2021, pp. 43-49.
Elsevier DOI 2102
Image captioning, Transformer, Knowledge graph BibRef

Yan, C.G.[Cheng-Gang], Hao, Y.M.[Yi-Ming], Li, L.[Liang], Yin, J.[Jian], Liu, A.[Anan], Mao, Z.[Zhendong], Chen, Z.Y.[Zhen-Yu], Gao, X.Y.[Xing-Yu],
Task-Adaptive Attention for Image Captioning,
CirSysVideo(32), No. 1, January 2022, pp. 43-51.
IEEE DOI 2201
Task analysis, Visualization, Feature extraction, Decoding, Computational modeling, Adaptation models, Feeds, Image captioning, transformer BibRef

Yuan, J.[Jin], Zhu, S.[Shuai], Huang, S.Y.[Shu-Yin], Zhang, H.W.[Han-Wang], Xiao, Y.Q.[Yao-Qiang], Li, Z.Y.[Zhi-Yong], Wang, M.[Meng],
Discriminative Style Learning for Cross-Domain Image Captioning,
IP(31), 2022, pp. 1723-1736.
IEEE DOI 2202
Decoding, Visualization, Syntactics, Semantics, Training, Logic gates, Birds, Cross-domain, image captioning, style, instruction-based LSTM BibRef

Zhou, Y.[Yuanen], Zhang, Y.[Yong], Hu, Z.Z.[Zhen-Zhen], Wang, M.[Meng],
Semi-Autoregressive Transformer for Image Captioning,
CLVL21(3132-3136)
IEEE DOI 2112
Training, Degradation, Codes, Benchmark testing, Transformers BibRef

Ren, Z.H.[Zi-Hao], Gou, S.P.[Shui-Ping], Guo, Z.[Zhang], Mao, S.S.[Sha-Sha], Li, R.M.[Rui-Min],
A Mask-Guided Transformer Network with Topic Token for Remote Sensing Image Captioning,
RS(14), No. 12, 2022, pp. xx-yy.
DOI Link 2206
BibRef

Ji, J.Y.[Jia-Yi], Ma, Y.[Yiwei], Sun, X.S.[Xiao-Shuai], Zhou, Y.[Yiyi], Wu, Y.J.[Yong-Jian], Ji, R.R.[Rong-Rong],
Knowing What to Learn: A Metric-Oriented Focal Mechanism for Image Captioning,
IP(31), 2022, pp. 4321-4335.
IEEE DOI 2207
Integrated circuit modeling, Visualization, Training, Task analysis, Measurement, Transformers, Computational modeling, Effective CIDEr BibRef

Li, X.[Xuan], Zhang, W.K.[Wen-Kai], Sun, X.[Xian], Gao, X.[Xin],
Semantic-meshed and content-guided transformer for image captioning,
IET-CV(16), No. 5, 2022, pp. 431-444.
DOI Link 2207
computer vision, image annotation, natural language processing BibRef

Xian, T.T.[Tian-Tao], Li, Z.X.[Zhi-Xin], Tang, Z.J.[Zhen-Jun], Ma, H.F.[Hui-Fang],
Adaptive Path Selection for Dynamic Image Captioning,
CirSysVideo(32), No. 9, September 2022, pp. 5762-5775.
IEEE DOI 2209
Visualization, Feature extraction, Transformers, Semantics, Computational modeling, Adaptation models, Computer architecture, dynamic routing mechanism BibRef


Vo, D.M.[Duc Minh], Chen, H.[Hong], Sugimoto, A.[Akihiro], Nakayama, H.[Hideki],
NOC-REK: Novel Object Captioning with Retrieved Vocabulary from External Knowledge,
CVPR22(17979-17987)
IEEE DOI 2210
Training, Vocabulary, Pipelines, Training data, Object detection, Transformers, Vision + language BibRef

Yuan, Z.H.[Zhi-Hao], Yan, X.[Xu], Liao, Y.H.[Ying-Hong], Guo, Y.[Yao], Li, G.B.[Guan-Bin], Cui, S.G.[Shu-Guang], Li, Z.[Zhen],
X-Trans2Cap: Cross-Modal Knowledge Transfer using Transformer for 3D Dense Captioning,
CVPR22(8553-8563)
IEEE DOI 2210
Point cloud compression, Training, Visualization, Natural languages, Network architecture, Transformers, Visual reasoning BibRef

Liu, B.[Bing], Wang, D.[Dong], Yang, X.[Xu], Zhou, Y.[Yong], Yao, R.[Rui], Shao, Z.W.[Zhi-Wen], Zhao, J.Q.[Jia-Qi],
Show, Deconfound and Tell: Image Captioning with Causal Inference,
CVPR22(18020-18029)
IEEE DOI 2210
Training, Visualization, Correlation, Object detection, Linguistics, Transformers, Encoding, Vision + language, Computer vision theory, Visual reasoning BibRef

Fang, Z.Y.[Zhi-Yuan], Wang, J.F.[Jian-Feng], Hu, X.W.[Xiao-Wei], Liang, L.[Lin], Gan, Z.[Zhe], Wang, L.J.[Li-Juan], Yang, Y.Z.[Ye-Zhou], Liu, Z.C.[Zi-Cheng],
Injecting Semantic Concepts into End-to-End Image Captioning,
CVPR22(17988-17998)
IEEE DOI 2210
Training, Computational modeling, Semantics, Computer architecture, Feature extraction, Transformers, Market research, Vision applications and systems BibRef

Li, Y.[Yehao], Pan, Y.[Yingwei], Yao, T.[Ting], Mei, T.[Tao],
Comprehending and Ordering Semantics for Image Captioning,
CVPR22(17969-17978)
IEEE DOI 2210
Visualization, Codes, Semantics, Computer architecture, Linguistics, Transformers, Vision + language BibRef

Hu, X.W.[Xiao-Wei], Gan, Z.[Zhe], Wang, J.F.[Jian-Feng], Yang, Z.Y.[Zheng-Yuan], Liu, Z.C.[Zi-Cheng], Lu, Y.[Yumao], Wang, L.J.[Li-Juan],
Scaling Up Vision-Language Pretraining for Image Captioning,
CVPR22(17959-17968)
IEEE DOI 2210
Training, Visualization, Computational modeling, Training data, Benchmark testing, Transformers, Feature extraction, Vision + language BibRef

Fei, Z.C.[Zheng-Cong], Yan, X.[Xu], Wang, S.H.[Shu-Hui], Tian, Q.[Qi],
DeeCap: Dynamic Early Exiting for Efficient Image Captioning,
CVPR22(12206-12216)
IEEE DOI 2210
Learning systems, Computational modeling, Semantics, Merging, Predictive models, Transformers, Decoding, Vision + language BibRef

Wu, M.R.[Ming-Rui], Zhang, X.Y.[Xu-Ying], Sun, X.S.[Xiao-Shuai], Zhou, Y.[Yiyi], Chen, C.[Chao], Gu, J.X.[Jia-Xin], Sun, X.[Xing], Ji, R.R.[Rong-Rong],
DIFNet: Boosting Visual Information Flow for Image Captioning,
CVPR22(17999-18008)
IEEE DOI 2210
Integrated circuits, Visualization, Image segmentation, Feature extraction, Boosting, Transformers, Decoding, Vision + language BibRef

Rio-Torto, I.[Isabel], Cardoso, J.S.[Jaime S.], Teixeira, L.F.[Luís F.],
From Captions to Explanations: A Multimodal Transformer-based Architecture for Natural Language Explanation Generation,
IbPRIA22(54-65).
Springer DOI 2205
BibRef

Chen, H.S.[Hai-Shun], Wang, Y.[Ying], Yang, X.[Xin], Li, J.[Jie],
Captioning Transformer With Scene Graph Guiding,
ICIP21(2538-2542)
IEEE DOI 2201
Measurement, Visualization, Image processing, Semantics, Neural networks, Decoding, Image captioning, Scene graph, Attention, Deep Neural Network BibRef

Zhang, P.C.[Peng-Chuan], Li, X.J.[Xiu-Jun], Hu, X.W.[Xiao-Wei], Yang, J.W.[Jian-Wei], Zhang, L.[Lei], Wang, L.J.[Li-Juan], Choi, Y.J.[Ye-Jin], Gao, J.F.[Jian-Feng],
VinVL: Revisiting Visual Representations in Vision-Language Models,
CVPR21(5575-5584)
IEEE DOI 2111
Training, Visualization, Computational modeling, Object detection, Benchmark testing, Feature extraction, Transformers BibRef

Zhang, X.Y.[Xu-Ying], Sun, X.S.[Xiao-Shuai], Luo, Y.P.[Yun-Peng], Ji, J.Y.[Jia-Yi], Zhou, Y.[Yiyi], Wu, Y.J.[Yong-Jian], Huang, F.Y.[Fei-Yue], Ji, R.R.[Rong-Rong],
RSTNet: Captioning with Adaptive Attention on Visual and Non-Visual Words,
CVPR21(15460-15469)
IEEE DOI 2111
Geometry, Visualization, Adaptation models, Predictive models, Transformers, Time measurement, Servers BibRef

He, S.[Sen], Liao, W.T.[Wen-Tong], Tavakoli, H.R.[Hamed R.], Yang, M.[Michael], Rosenhahn, B.[Bodo], Pugeault, N.[Nicolas],
Image Captioning Through Image Transformer,
ACCV20(IV:153-169).
Springer DOI 2103
BibRef

Cornia, M., Stefanini, M., Baraldi, L., Cucchiara, R.,
Meshed-Memory Transformer for Image Captioning,
CVPR20(10575-10584)
IEEE DOI 2008
Decoding, Encoding, Visualization, Image coding, Computer architecture, Proposals, Task analysis BibRef

Li, G., Zhu, L., Liu, P., Yang, Y.,
Entangled Transformer for Image Captioning,
ICCV19(8927-8936)
IEEE DOI 2004
image retrieval, learning (artificial intelligence), natural language processing, recurrent neural nets, robot vision, Proposals BibRef

Chapter on Matching and Recognition Using Volumes, High Level Vision Techniques, Invariants continues in
Semantic Correspondence, Semantic Alignment.


Last update: May 22, 2023 at 22:32:27