Yu, J.,
Li, J.,
Yu, Z.,
Huang, Q.,
Multimodal Transformer With Multi-View Visual Representation for
Image Captioning,
CirSysVideo(30), No. 12, December 2020, pp. 4467-4480.
IEEE DOI
2012
Visualization, Feature extraction, Hidden Markov models,
Adaptation models, Task analysis, Decoding, Computational modeling,
deep learning
BibRef
Zhang, Y.[Yu],
Shi, X.Y.[Xin-Yu],
Mi, S.[Siya],
Yang, X.[Xu],
Image captioning with transformer and knowledge graph,
PRL(143), 2021, pp. 43-49.
Elsevier DOI
2102
Image captioning, Transformer, Knowledge graph
BibRef
Yan, C.G.[Cheng-Gang],
Hao, Y.M.[Yi-Ming],
Li, L.[Liang],
Yin, J.[Jian],
Liu, A.[Anan],
Mao, Z.[Zhendong],
Chen, Z.Y.[Zhen-Yu],
Gao, X.Y.[Xing-Yu],
Task-Adaptive Attention for Image Captioning,
CirSysVideo(32), No. 1, January 2022, pp. 43-51.
IEEE DOI
2201
Task analysis, Visualization, Feature extraction, Decoding,
Computational modeling, Adaptation models, Feeds, Image captioning,
transformer
BibRef
Ren, Z.H.[Zi-Hao],
Gou, S.P.[Shui-Ping],
Guo, Z.[Zhang],
Mao, S.S.[Sha-Sha],
Li, R.M.[Rui-Min],
A Mask-Guided Transformer Network with Topic Token for Remote Sensing
Image Captioning,
RS(14), No. 12, 2022, pp. xx-yy.
DOI Link
2206
BibRef
Ji, J.Y.[Jia-Yi],
Ma, Y.W.[Yi-Wei],
Sun, X.S.[Xiao-Shuai],
Zhou, Y.[Yiyi],
Wu, Y.J.[Yong-Jian],
Ji, R.R.[Rong-Rong],
Knowing What to Learn: A Metric-Oriented Focal Mechanism for Image
Captioning,
IP(31), 2022, pp. 4321-4335.
IEEE DOI
2207
Integrated circuit modeling, Visualization, Training,
Task analysis, Measurement, Transformers, Computational modeling,
Effective CIDEr
BibRef
Li, X.[Xuan],
Zhang, W.K.[Wen-Kai],
Sun, X.[Xian],
Gao, X.[Xin],
Semantic-meshed and content-guided transformer for image captioning,
IET-CV(16), No. 5, 2022, pp. 431-444.
DOI Link
2207
computer vision, image annotation, natural language processing
BibRef
Xian, T.T.[Tian-Tao],
Li, Z.X.[Zhi-Xin],
Tang, Z.J.[Zhen-Jun],
Ma, H.F.[Hui-Fang],
Adaptive Path Selection for Dynamic Image Captioning,
CirSysVideo(32), No. 9, September 2022, pp. 5762-5775.
IEEE DOI
2209
Visualization, Feature extraction, Transformers, Semantics,
Computational modeling, Adaptation models, Computer architecture,
dynamic routing mechanism
BibRef
Cao, S.[Shan],
An, G.[Gaoyun],
Zheng, Z.X.[Zhen-Xing],
Wang, Z.Y.[Zhi-Yong],
Vision-Enhanced and Consensus-Aware Transformer for Image Captioning,
CirSysVideo(32), No. 10, October 2022, pp. 7005-7018.
IEEE DOI
2210
Transformers, Visualization, Decoding, Semantics, Task analysis,
Convolution, Visual perception, Image captioning,
consensus knowledge
BibRef
Jiang, W.T.[Wei-Tao],
Zhou, W.[Wei],
Hu, H.F.[Hai-Feng],
Double-Stream Position Learning Transformer Network for Image
Captioning,
CirSysVideo(32), No. 11, November 2022, pp. 7706-7718.
IEEE DOI
2211
Transformers, Feature extraction, Visualization, Decoding,
Convolutional neural networks, Task analysis, Semantics, attention mechanism
BibRef
Hu, J.T.[Jun-Tao],
Yang, Y.[You],
Yao, L.[Lu],
An, Y.Z.[Yong-Zhi],
Pan, L.[Longyue],
Position-guided transformer for image captioning,
IVC(128), 2022, pp. 104575.
Elsevier DOI
2212
Image captioning, Bi-positional attention, Position encoding,
Group normalization, Transformer, Self-attention
BibRef
Wang, Z.A.[Zhong-An],
Shi, S.[Shuai],
Zhai, Z.R.[Zi-Rong],
Wu, Y.[Yingna],
Yang, R.[Rui],
ArCo: Attention-reinforced transformer with contrastive learning for
image captioning,
IVC(128), 2022, pp. 104570.
Elsevier DOI
2212
Image captioning, Visual attention, Transformer, Contrastive learning
BibRef
Li, Z.X.[Zhi-Xin],
Wei, J.[Jiahui],
Huang, F.C.[Fei-Cheng],
Ma, H.F.[Hui-Fang],
Modeling graph-structured contexts for image captioning,
IVC(129), 2023, pp. 104591.
Elsevier DOI
2301
Image captioning, Transformer, Scene graph,
Reinforcement learning, Attention mechanism
BibRef
Zhang, J.[Jing],
Xie, Y.S.[Ying-Shuai],
Ding, W.C.[Wei-Chao],
Wang, Z.[Zhe],
Cross on Cross Attention: Deep Fusion Transformer for Image
Captioning,
CirSysVideo(33), No. 8, August 2023, pp. 4257-4268.
IEEE DOI
2308
Visualization, Decoding, Feature extraction, Semantics, Transformers,
Encoding, Cognition, Image captioning, deep fusion transformer,
cross on cross attention
BibRef
Lim, J.H.[Jian Han],
Chan, C.S.[Chee Seng],
Mask-guided network for image captioning,
PRL(173), 2023, pp. 79-86.
Elsevier DOI
2310
Image captioning, Deep learning, Scene understanding, Mask RCNN, Transformer
BibRef
Li, Z.X.[Zhi-Xin],
Su, Q.[Qiang],
Chen, T.Y.[Tian-Yu],
External knowledge-assisted Transformer for image captioning,
IVC(140), 2023, pp. 104864.
Elsevier DOI
2312
Image captioning, Knowledge reasoning, Object relation, Visual Transformer
BibRef
Chen, J.Q.[Jing-Qiang],
Transform, contrast and tell:
Coherent entity-aware multi-image captioning,
CVIU(238), 2024, pp. 103878.
Elsevier DOI
2312
Entity-aware image captioning, Coherence mechanisms,
Transformer, Contrastive learning
BibRef
Lou, L.S.[Liang-Shan],
Lu, K.[Ke],
Xue, J.[Jian],
Improved Transformer with Parallel Encoders for Image Captioning,
ICPR22(4072-4075)
IEEE DOI
2212
Measurement, Fuses, Transformers, Decoding, Task analysis
BibRef
Wang, Y.H.[Ye-Huan],
Shang, L.[Lin],
Generating Spatial-aware Captions for TextCaps,
ICPR22(379-385)
IEEE DOI
2212
Visualization, Analytical models, Head, Optical character recognition,
Transformer cores, Transformers
BibRef
Feng, Y.[Yuhu],
Maeda, K.[Keisuke],
Ogawa, T.[Takahiro],
Haseyama, M.[Miki],
Human-Centric Image Retrieval with Gaze-Based Image Captioning,
ICIP22(3828-3832)
IEEE DOI
2211
Image retrieval, Semantics, Focusing, Transformers, Gaze trace,
transformer, human-centric, cross-modal retrieval, image captioning
BibRef
Yang, X.[Xin],
Wang, Y.[Ying],
Chen, H.[Haishun],
Li, J.[Jie],
CSTNET: Enhancing Global-To-Local Interactions for Image Captioning,
ICIP22(1861-1865)
IEEE DOI
2211
Neural networks, Transformers, Task analysis, Context modeling,
Image captioning, Gate mechanism, Vision transformer, Deep Neural Network
BibRef
Nguyen, V.Q.[Van-Quang],
Suganuma, M.[Masanori],
Okatani, T.[Takayuki],
GRIT: Faster and Better Image Captioning Transformer Using Dual Visual
Features,
ECCV22(XXXVI:167-184).
Springer DOI
2211
BibRef
Vo, D.M.[Duc Minh],
Chen, H.[Hong],
Sugimoto, A.[Akihiro],
Nakayama, H.[Hideki],
NOC-REK: Novel Object Captioning with Retrieved Vocabulary from
External Knowledge,
CVPR22(17979-17987)
IEEE DOI
2210
Training, Vocabulary, Pipelines, Training data, Object detection,
Transformers, Vision + language
BibRef
Yuan, Z.H.[Zhi-Hao],
Yan, X.[Xu],
Liao, Y.H.[Ying-Hong],
Guo, Y.[Yao],
Li, G.B.[Guan-Bin],
Cui, S.G.[Shu-Guang],
Li, Z.[Zhen],
X-Trans2Cap:
Cross-Modal Knowledge Transfer using Transformer for 3D Dense Captioning,
CVPR22(8553-8563)
IEEE DOI
2210
Point cloud compression, Training, Visualization,
Natural languages, Network architecture, Transformers, Visual reasoning
BibRef
Liu, B.[Bing],
Wang, D.[Dong],
Yang, X.[Xu],
Zhou, Y.[Yong],
Yao, R.[Rui],
Shao, Z.W.[Zhi-Wen],
Zhao, J.Q.[Jia-Qi],
Show, Deconfound and Tell: Image Captioning with Causal Inference,
CVPR22(18020-18029)
IEEE DOI
2210
Training, Visualization, Correlation, Object detection, Linguistics,
Transformers, Encoding, Vision + language, Computer vision theory,
Visual reasoning
BibRef
Fang, Z.Y.[Zhi-Yuan],
Wang, J.F.[Jian-Feng],
Hu, X.W.[Xiao-Wei],
Liang, L.[Lin],
Gan, Z.[Zhe],
Wang, L.J.[Li-Juan],
Yang, Y.Z.[Ye-Zhou],
Liu, Z.C.[Zi-Cheng],
Injecting Semantic Concepts into End-to-End Image Captioning,
CVPR22(17988-17998)
IEEE DOI
2210
Training, Computational modeling, Semantics, Computer architecture,
Feature extraction, Transformers, Market research,
Vision applications and systems
BibRef
Li, Y.[Yehao],
Pan, Y.W.[Ying-Wei],
Yao, T.[Ting],
Mei, T.[Tao],
Comprehending and Ordering Semantics for Image Captioning,
CVPR22(17969-17978)
IEEE DOI
2210
Visualization, Codes, Semantics, Computer architecture, Linguistics,
Transformers, Vision + language
BibRef
Fei, Z.C.[Zheng-Cong],
Yan, X.[Xu],
Wang, S.H.[Shu-Hui],
Tian, Q.[Qi],
DeeCap: Dynamic Early Exiting for Efficient Image Captioning,
CVPR22(12206-12216)
IEEE DOI
2210
Learning systems, Computational modeling, Semantics, Merging,
Predictive models, Transformers, Decoding,
Vision + language
BibRef
Wu, M.R.[Ming-Rui],
Zhang, X.Y.[Xu-Ying],
Sun, X.S.[Xiao-Shuai],
Zhou, Y.[Yiyi],
Chen, C.[Chao],
Gu, J.X.[Jia-Xin],
Sun, X.[Xing],
Ji, R.R.[Rong-Rong],
DIFNet: Boosting Visual Information Flow for Image Captioning,
CVPR22(17999-18008)
IEEE DOI
2210
Integrated circuits, Visualization, Image segmentation,
Feature extraction, Boosting, Transformers, Decoding, Vision + language
BibRef
Rio-Torto, I.[Isabel],
Cardoso, J.S.[Jaime S.],
Teixeira, L.F.[Luís F.],
From Captions to Explanations: A Multimodal Transformer-based
Architecture for Natural Language Explanation Generation,
IbPRIA22(54-65).
Springer DOI
2205
BibRef
Chen, H.S.[Hai-Shun],
Wang, Y.[Ying],
Yang, X.[Xin],
Li, J.[Jie],
Captioning Transformer With Scene Graph Guiding,
ICIP21(2538-2542)
IEEE DOI
2201
Measurement, Visualization, Image processing, Semantics,
Neural networks, Decoding, Image captioning, Scene graph, Attention,
Deep Neural Network
BibRef
Zhang, X.Y.[Xu-Ying],
Sun, X.S.[Xiao-Shuai],
Luo, Y.P.[Yun-Peng],
Ji, J.Y.[Jia-Yi],
Zhou, Y.[Yiyi],
Wu, Y.J.[Yong-Jian],
Huang, F.Y.[Fei-Yue],
Ji, R.R.[Rong-Rong],
RSTNet:
Captioning with Adaptive Attention on Visual and Non-Visual Words,
CVPR21(15460-15469)
IEEE DOI
2111
Geometry, Visualization, Adaptation models, Predictive models,
Transformers, Time measurement, Servers
BibRef
He, S.[Sen],
Liao, W.T.[Wen-Tong],
Tavakoli, H.R.[Hamed R.],
Yang, M.[Michael],
Rosenhahn, B.[Bodo],
Pugeault, N.[Nicolas],
Image Captioning Through Image Transformer,
ACCV20(IV:153-169).
Springer DOI
2103
BibRef
Cornia, M.,
Stefanini, M.,
Baraldi, L.,
Cucchiara, R.,
Meshed-Memory Transformer for Image Captioning,
CVPR20(10575-10584)
IEEE DOI
2008
Decoding, Encoding, Visualization, Image coding,
Computer architecture, Proposals, Task analysis
BibRef
Tran, A.,
Mathews, A.,
Xie, L.,
Transform and Tell: Entity-Aware News Image Captioning,
CVPR20(13032-13042)
IEEE DOI
2008
Decoding, Vocabulary, Transforms, Linguistics, Performance gain,
Neural networks, Training
BibRef
Li, G.,
Zhu, L.,
Liu, P.,
Yang, Y.,
Entangled Transformer for Image Captioning,
ICCV19(8927-8936)
IEEE DOI
2004
image retrieval, learning (artificial intelligence),
natural language processing, recurrent neural nets, robot vision, Proposals
BibRef