Agrawal, A.[Aishwarya],
Lu, J.[Jiasen],
Antol, S.[Stanislaw],
Mitchell, M.[Margaret],
Zitnick, C.L.[C. Lawrence],
Parikh, D.[Devi],
Batra, D.[Dhruv],
VQA: Visual Question Answering,
IJCV(123), No. 1, May 2017, pp. 4-31.
Springer DOI
1705
BibRef
Lioutas, V.[Vasileios],
Passalis, N.[Nikolaos],
Tefas, A.[Anastasios],
Explicit ensemble attention learning for improving visual question
answering,
PRL(111), 2018, pp. 51-57.
Elsevier DOI
1808
Visual question answering, Explicit attention, Pictorial superiority effect
BibRef
Garg, S.[Shivam],
Srivastava, R.[Rajeev],
Object sequences: encoding categorical and spatial information for a
yes/no visual question answering task,
IET-CV(12), No. 8, December 2018, pp. 1141-1150.
DOI Link
1812
BibRef
Goyal, Y.[Yash],
Khot, T.[Tejas],
Agrawal, A.[Aishwarya],
Summers-Stay, D.[Douglas],
Batra, D.[Dhruv],
Parikh, D.[Devi],
Making the V in VQA Matter: Elevating the Role of Image Understanding
in Visual Question Answering,
IJCV(127), No. 4, April 2019, pp. 398-414.
Springer DOI
1903
BibRef
Earlier: A1, A2, A4, A5, A6, Only:
CVPR17(6325-6334)
IEEE DOI
1711
Benchmark testing, Data collection, Data models,
Knowledge discovery, Protocols, Visualization
BibRef
Fang, Z.W.[Zhi-Wei],
Liu, J.[Jing],
Li, Y.[Yong],
Qiao, Y.Y.[Yan-Yuan],
Lu, H.Q.[Han-Qing],
Improving visual question answering using dropout and enhanced
question encoder,
PR(90), 2019, pp. 404-414.
Elsevier DOI
1903
Visual question answering, Coherent dropout, Siamese dropout,
Enhanced question encoder
BibRef
Osman, A.[Ahmed],
Samek, W.[Wojciech],
DRAU: Dual Recurrent Attention Units for Visual Question Answering,
CVIU(185), 2019, pp. 24-30.
Elsevier DOI
1906
Visual Question Answering, Attention Mechanisms,
Multi-modal Learning, Machine Vision, Natural Language Processing
BibRef
Toor, A.S.[Andeep S.],
Wechsler, H.[Harry],
Nappi, M.[Michele],
Biometric surveillance using visual question answering,
PRL(126), 2019, pp. 111-118.
Elsevier DOI
1909
Biometrics, Forensics, Visual question answering,
Question relevance, Surveillance, Deep learning, Visual turing test
BibRef
Li, W.W.[Wen-Wen],
Song, M.M.[Miao-Miao],
Tian, Y.Y.[Yuan-Yuan],
An Ontology-Driven Cyberinfrastructure for Intelligent Spatiotemporal
Question Answering and Open Knowledge Discovery,
IJGI(8), No. 11, 2019, pp. xx-yy.
DOI Link
1912
BibRef
Xi, Y.L.[Yu-Ling],
Zhang, Y.N.[Yan-Ning],
Ding, S.T.[Song-Tao],
Wan, S.H.[Shao-Hua],
Visual Question Answering Model Based on Visual Relationship
Detection,
SP:IC(80), 2020, pp. 115648.
Elsevier DOI
1912
Visual question answering, Appearance features,
Relationship predicate, Word vector similarity
BibRef
Wu, Y.,
Jiang, L.,
Yang, Y.,
Revisiting EmbodiedQA: A Simple Baseline and Beyond,
IP(29), 2020, pp. 3984-3992.
IEEE DOI
2002
Embodied question answering, vision and language, visual question answering
BibRef
Huang, C.R.[Chao-Ran],
Yao, L.[Lina],
Wang, X.Z.[Xian-Zhi],
Benatallah, B.[Boualem],
Zhang, X.[Xiang],
Software expert discovery via knowledge domain embeddings in a
collaborative network,
PRL(130), 2020, pp. 46-53.
Elsevier DOI
2002
Knowledge discovery, Stack overflow, Expertise finding,
Question answering, Expert as a Service
BibRef
Li, W.[Wei],
Sun, J.H.[Jian-Hui],
Liu, G.[Ge],
Zhao, L.[Linglan],
Fang, X.Z.[Xiang-Zhong],
Visual question answering with attention transfer and a cross-modal
gating mechanism,
PRL(133), 2020, pp. 334-340.
Elsevier DOI
2005
Attention, Visual question answering, Gating
BibRef
Messina, N.[Nicola],
Amato, G.[Giuseppe],
Carrara, F.[Fabio],
Falchi, F.[Fabrizio],
Gennaro, C.[Claudio],
Learning visual features for relational CBIR,
MultInfoRetr(9), No. 2, June 2020, pp. 113-124.
Springer DOI
2005
BibRef
Earlier:
Learning Relationship-Aware Visual Features,
CEFR-LCV18(IV:486-501).
Springer DOI
1905
BibRef
Methani, N.,
Ganguly, P.,
Khapra, M.M.,
Kumar, P.,
PlotQA: Reasoning over Scientific Plots,
WACV20(1516-1525)
IEEE DOI
2006
Vocabulary, Cognition, Bars, Numerical models,
Optical character recognition software, Data mining, Image color analysis
BibRef
Yu, J.[Jing],
Zhu, Z.H.[Zi-Hao],
Wang, Y.J.[Yu-Jing],
Zhang, W.F.[Wei-Feng],
Hu, Y.[Yue],
Tan, J.L.[Jian-Long],
Cross-modal knowledge reasoning for knowledge-based visual question
answering,
PR(108), 2020, pp. 107563.
Elsevier DOI
2008
Cross-modal knowledge reasoning, Multimodal knowledge graphs,
Compositional reasoning module, Explainable reasoning
BibRef
Yang, Z.Q.[Zhuo-Qian],
Qin, Z.C.[Zeng-Chang],
Yu, J.[Jing],
Wan, T.[Tao],
Prior Visual Relationship Reasoning For Visual Question Answering,
ICIP20(1411-1415)
IEEE DOI
2011
Visualization, Semantics, Convolution, Cognition,
Knowledge discovery, Benchmark testing, Measurement, VQA,
GCN, Attention Mechanism
BibRef
Farazi, M.R.[Moshiur R.],
Khan, S.H.[Salman H.],
Barnes, N.M.[Nick M.],
From known to the unknown: Transferring knowledge to answer questions
about novel visual and semantic concepts,
IVC(103), 2020, pp. 103985.
Elsevier DOI
2011
Visual Question Answering, Deep learning,
Natural language processing, Dataset bias
BibRef
Terao, K.[Kento],
Tamaki, T.[Toru],
Raytchev, B.[Bisser],
Kaneda, K.[Kazufumi],
Satoh, S.[Shin'ichi],
Rephrasing Visual Questions by Specifying the Entropy of the Answer
Distribution,
IEICE(E103-D), No. 11, November 2020, pp. 2362-2370.
WWW Link.
2011
BibRef
Yu, J.[Jing],
Zhang, W.F.[Wei-Feng],
Lu, Y.H.[Yu-Hang],
Qin, Z.C.[Zeng-Chang],
Hu, Y.[Yue],
Tan, J.L.[Jian-Long],
Wu, Q.[Qi],
Reasoning on the Relation: Enhancing Visual Representation for Visual
Question Answering and Cross-Modal Retrieval,
MultMed(22), No. 12, December 2020, pp. 3196-3209.
IEEE DOI
2011
Visualization, Cognition, Task analysis, Knowledge discovery,
Semantics, Correlation, Information retrieval,
cross-modal information retrieval
BibRef
Lobry, S.,
Marcos, D.,
Murray, J.,
Tuia, D.,
RSVQA: Visual Question Answering for Remote Sensing Data,
GeoRS(58), No. 12, December 2020, pp. 8555-8566.
IEEE DOI
2012
Remote sensing, Task analysis, Visualization, Data models,
Feature extraction, Knowledge discovery,
visual question answering (VQA)
BibRef
Faure, M.[Maxime],
Lobry, S.[Sylvain],
Kurtz, C.[Camille],
Wendling, L.[Laurent],
Embedding Spatial Relations in Visual Question Answering for Remote
Sensing,
ICPR22(310-316)
IEEE DOI
2212
Training, Visualization, Histograms, Feature extraction,
Question answering (information retrieval), Spatial databases
BibRef
Chappuis, C.[Christel],
Zermatten, V.[Valérie],
Lobry, S.[Sylvain],
Le Saux, B.[Bertrand],
Tuia, D.[Devis],
Prompt-RSVQA: Prompting visual context to a language model for Remote
Sensing Visual Question Answering,
EarthVision22(1371-1380)
IEEE DOI
2210
Training, Visualization, Natural languages, Feature extraction,
Transformers, Question answering (information retrieval), Data mining
BibRef
Sun, B.[Bo],
Yao, Z.[Zeng],
Zhang, Y.H.[Ying-Hui],
Yu, L.J.[Le-Jun],
Local relation network with multilevel attention for visual question
answering,
JVCIR(73), 2020, pp. 102762.
Elsevier DOI
2012
Visual question answering, Relation network, Attention mechanism
BibRef
Li, X.,
Yuan, A.,
Lu, X.,
Vision-to-Language Tasks Based on Attributes and Attention Mechanism,
Cyber(51), No. 2, February 2021, pp. 913-926.
IEEE DOI
2101
Semantics, Task analysis, Visualization, Cats, Natural languages,
Knowledge discovery, Feature extraction, Deep learning,
visual question answering (VQA)
BibRef
Shao, Y.[Yinan],
Lin, J.C.W.[Jerry Chun-Wei],
Srivastava, G.[Gautam],
Jolfaei, A.[Alireza],
Guo, D.D.[Dong-Dong],
Hu, Y.[Yi],
Self-attention-based conditional random fields latent variables model
for sequence labeling,
PRL(145), 2021, pp. 157-164.
Elsevier DOI
2104
Latent CRF, Sequence labeling, Encoding schema,
Natural language processing, VQA, Big data
BibRef
Wu, Y.[Yirui],
Ma, Y.T.[Yun-Tao],
Wan, S.H.[Shao-Hua],
Multi-scale relation reasoning for multi-modal Visual Question
Answering,
SP:IC(96), 2021, pp. 116319.
Elsevier DOI
2106
Multi-modal data, Visual Question Answering,
Multi-scale relation reasoning, Attention model
BibRef
Ma, Y.T.[Yun-Tao],
Lu, T.[Tong],
Wu, Y.[Yirui],
Multi-scale Relational Reasoning with Regional Attention for Visual
Question Answering,
ICPR21(5642-5649)
IEEE DOI
2105
Visualization, Neural networks, Knowledge discovery, Cognition,
Robustness, Data mining, Visual question learning, Attention,
Multi-scale relational reasoning
BibRef
dos S-Silva, F.H.[Francisco H.],
Bezerra, G.M.[Gabriel M.],
Holanda, G.B.[Gabriel B.],
de Souza, J.W.M.[J. Wellington M.],
Rego, P.A.L.[Paulo A.L.],
Lira Neto, A.V.[Aloísio V.],
de Albuquerque, V.H.C.[Victor Hugo C.],
Rebouças Filho, P.P.[Pedro P.],
A novel feature extractor for human action recognition in visual
question answering,
PRL(147), 2021, pp. 41-47.
Elsevier DOI
2106
BibRef
Guo, W.[Wenya],
Zhang, Y.[Ying],
Yang, J.F.[Ju-Feng],
Yuan, X.J.[Xiao-Jie],
Re-Attention for Visual Question Answering,
IP(30), 2021, pp. 6730-6743.
IEEE DOI
2108
Visualization, Tires, Task analysis, Feature extraction, Training,
Knowledge discovery, Image reconstruction, gating mechanism
BibRef
Hu, J.[Jun],
Qian, S.S.[Sheng-Sheng],
Fang, Q.[Quan],
Xu, C.S.[Chang-Sheng],
Heterogeneous Community Question Answering via Social-Aware
Multi-Modal Co-Attention Convolutional Matching,
MultMed(23), 2021, pp. 2321-2334.
IEEE DOI
2108
Visualization, Semantics, Knowledge discovery, Context modeling,
Portable computers, Task analysis, Object detection, social multimedia
BibRef
Zhang, X.[Xi],
Zhang, F.F.[Fei-Fei],
Xu, C.S.[Chang-Sheng],
NExT-OOD: Overcoming Dual Multiple-Choice VQA Biases,
PAMI(46), No. 4, April 2024, pp. 1913-1931.
IEEE DOI
2403
Visualization, Feature extraction, Benchmark testing, Correlation,
Predictive models, Cognition, Training, Benchmark, bias, multiple-choice VQA
BibRef
Farazi, M.[Moshiur],
Khan, S.[Salman],
Barnes, N.M.[Nick M.],
Accuracy vs. complexity: A trade-off in visual question answering
models,
PR(120), 2021, pp. 108106.
Elsevier DOI
2109
Visual question answering, Visual feature extraction,
Language features, Multi-modal fusion, Speed-accuracy trade-off
BibRef
Barra, S.[Silvio],
Bisogni, C.[Carmen],
de Marsico, M.[Maria],
Ricciardi, S.[Stefano],
Visual question answering: Which investigated applications?,
PRL(151), 2021, pp. 325-331.
Elsevier DOI
2110
Visual question answering, Real-world VQA,
VQA for medical applications, VQA for assistive applications,
VQA in cultural heritage and education
BibRef
Manmadhan, S.[Sruthy],
Kovoor, B.C.[Binsu C.],
Multi-Tier Attention Network using Term-weighted Question Features
for Visual Question Answering,
IVC(115), 2021, pp. 104291.
Elsevier DOI
2110
Attention mechanism, Deep learning, Semantic similarity,
Supervised term weighting, Visual Question Answering
BibRef
Liu, A.A.[An-An],
Lu, Z.[Zimu],
Xu, N.[Ning],
Nie, W.Z.[Wei-Zhi],
Li, W.H.[Wen-Hui],
Multi-type decision fusion network for visual Q&A,
IVC(115), 2021, pp. 104281.
Elsevier DOI
2110
Visual question answering, Multi-type question, Scene graph
BibRef
Patro, B.N.[Badri N.],
Kurmi, V.K.[Vinod K.],
Kumar, S.[Sandeep],
Namboodiri, V.P.[Vinay P.],
MUMC: Minimizing uncertainty of mixture of cues,
IVC(115), 2021, pp. 104280.
Elsevier DOI
2110
Uncertainty estimation, Mixture of cues,
Visual Question Answering, Paraphrase, Encoder-decoder
BibRef
Liu, F.[Fei],
Liu, J.[Jing],
Fang, Z.W.[Zhi-Wei],
Hong, R.C.[Ri-Chang],
Lu, H.Q.[Han-Qing],
Visual Question Answering With Dense Inter- and Intra-Modality
Interactions,
MultMed(23), 2021, pp. 3518-3529.
IEEE DOI
2110
Visualization, Knowledge discovery, Connectors, Encoding,
Task analysis, Image coding, Stacking, Visual question answering,
dense interactions
BibRef
Wu, J.J.[Jia-Jia],
Du, J.[Jun],
Wang, F.[Fengren],
Yang, C.[Chen],
Jiang, X.Z.[Xin-Zhe],
Hu, J.[Jinshui],
Yin, B.[Bing],
Zhang, J.S.[Jian-Shu],
Dai, L.R.[Li-Rong],
A multimodal attention fusion network with a dynamic vocabulary for
TextVQA,
PR(122), 2022, pp. 108214.
Elsevier DOI
2112
Dynamic vocabulary, Attention map, Multimodal fusion, ST-VQA
BibRef
Narayanan, A.[Abhishek],
Rao, A.[Abijna],
Prasad, A.[Abhishek],
Natarajan, S.,
VQA as a factoid question answering problem: A novel approach for
knowledge-aware and explainable visual question answering,
IVC(116), 2021, pp. 104328.
Elsevier DOI
2112
Visual question answering, Factoid question answering,
Knowledge based reasoning, Explainable VQA
BibRef
Guo, Y.Y.[Yang-Yang],
Nie, L.Q.[Li-Qiang],
Cheng, Z.Y.[Zhi-Yong],
Tian, Q.[Qi],
Zhang, M.[Min],
Loss Re-Scaling VQA: Revisiting the Language Prior Problem From a
Class-Imbalance View,
IP(31), 2022, pp. 227-238.
IEEE DOI
2112
Visualization, Training, Computational modeling, Benchmark testing,
Predictive models, Cognition, Task analysis,
loss re-scaling
BibRef
Peng, L.[Liang],
Yang, Y.[Yang],
Wang, Z.[Zheng],
Huang, Z.[Zi],
Shen, H.T.[Heng Tao],
MRA-Net: Improving VQA Via Multi-Modal Relation Attention Network,
PAMI(44), No. 1, January 2022, pp. 318-329.
IEEE DOI
2112
Visualization, Feature extraction, Semantics, Knowledge discovery,
Cognition, Task analysis, Natural languages,
relation attention
BibRef
Manogaran, G.[Gunasekaran],
Shakeel, P.M.[P. Mohamed],
Burhanuddin, M.A.,
Baskar, S.,
Saravanan, V.[Vijayalakshmi],
Crespo, R.G.[Rubén González],
Martínez, O.S.[Oscar Sanjuán],
ADCCF: Adaptive deep concatenation coder framework for visual
question answering,
PRL(152), 2021, pp. 348-355.
Elsevier DOI
2112
BibRef
Zhou, Y.[Yiyi],
Ji, R.R.[Rong-Rong],
Sun, X.S.[Xiao-Shuai],
Su, J.S.[Jin-Song],
Meng, D.Y.[De-Yu],
Gao, Y.[Yue],
Shen, C.H.[Chun-Hua],
Plenty is Plague: Fine-Grained Learning for Visual Question Answering,
PAMI(44), No. 2, February 2022, pp. 697-709.
IEEE DOI
2201
Training, Visualization, Knowledge discovery, Redundancy,
Data models, Feature extraction, Training data,
visual question answering
BibRef
E, W.N.[Wei-Nan],
Zhou, Y.J.[Ya-Jun],
A Mathematical Model for Universal Semantics,
PAMI(44), No. 3, March 2022, pp. 1124-1132.
IEEE DOI
2202
Semantics, Numerical models, Pattern analysis, Markov processes,
Statistical analysis, Exponential distribution,
question answering
BibRef
Li, X.P.[Xiang-Peng],
Wu, B.[Bo],
Song, J.K.[Jing-Kuan],
Gao, L.L.[Lian-Li],
Zeng, P.P.[Peng-Peng],
Gan, C.[Chuang],
Text-instance graph: Exploring the relational semantics for
text-based visual question answering,
PR(124), 2022, pp. 108455.
Elsevier DOI
2203
Text-based visual question answering, Spatial overlapping,
Text-Instance graph, Copy mechanism
BibRef
Shao, X.J.[Xiang-Jun],
Xiang, Z.L.[Zheng-Long],
Li, Y.X.[Yuan-Xiang],
Visual question answering with gated relation-aware auxiliary,
IET-IPR(16), No. 5, 2022, pp. 1424-1432.
DOI Link
2203
BibRef
Liu, Y.[Yun],
Zhang, X.M.[Xiao-Ming],
Zhao, Z.Y.[Zhi-Yun],
Zhang, B.[Bo],
Cheng, L.[Lei],
Li, Z.J.[Zhou-Jun],
ALSA: Adversarial Learning of Supervised Attentions for Visual
Question Answering,
Cyber(52), No. 6, June 2022, pp. 4520-4533.
IEEE DOI
2207
Visualization, Correlation, Generators, Feature extraction,
Task analysis, Knowledge discovery, Fuses, Adversarial learning,
visual question answering (VQA)
BibRef
Ouyang, N.L.[Ning-Lin],
Huang, Q.B.[Qing-Bao],
Li, P.J.[Pi-Jian],
Cai, Y.[Yi],
Liu, B.[Bin],
Leung, H.F.[Ho-Fung],
Li, Q.[Qing],
Suppressing Biased Samples for Robust VQA,
MultMed(24), 2022, pp. 3405-3415.
IEEE DOI
2207
Training, Visualization, Training data, Image color analysis, Sports,
Knowledge discovery, Annotations, Visual Question Answering,
Robust VQA
BibRef
Shuang, K.[Kai],
Guo, J.[Jinyu],
Wang, Z.H.[Zi-Han],
Comprehensive-perception dynamic reasoning for visual question
answering,
PR(131), 2022, pp. 108878.
Elsevier DOI
2208
Cross-modal information fusion, Visual question answering,
Comprehensive perception, Relational reasoning
BibRef
Gouthaman, K.V.,
Mittal, A.[Anurag],
On the role of question encoder sequence model in robust visual
question answering,
PR(131), 2022, pp. 108883.
Elsevier DOI
2208
Visual question answering, Out-of-distribution performance,
Gated recurrent unit, Transformer, Graph attention network
BibRef
Chen, C.Q.[Chong-Qing],
Han, D.Z.[De-Zhi],
Chang, C.C.[Chin-Chen],
CAAN: Context-Aware attention network for visual question answering,
PR(132), 2022, pp. 108980.
Elsevier DOI
2209
Visual question answering, Attention mechanism,
Understanding bias, Absolute position, Contextual information
BibRef
Xie, J.Y.[Jia-Yuan],
Fang, W.H.[Wen-Hao],
Cai, Y.[Yi],
Huang, Q.B.[Qing-Bao],
Li, Q.[Qing],
Knowledge-Based Visual Question Generation,
CirSysVideo(32), No. 11, November 2022, pp. 7547-7558.
IEEE DOI
2211
Visualization, Feature extraction, Task analysis,
Knowledge based systems, Knowledge representation, Decoding, multimodal
BibRef
Gao, C.Y.[Chen-Yu],
Zhu, Q.[Qi],
Wang, P.[Peng],
Li, H.[Hui],
Liu, Y.L.[Yu-Liang],
van den Hengel, A.J.[Anton J.],
Wu, Q.[Qi],
Structured Multimodal Attentions for TextVQA,
PAMI(44), No. 12, December 2022, pp. 9603-9614.
IEEE DOI
2212
Optical character recognition software, Cognition, Visualization,
Text recognition, Task analysis, Knowledge discovery, Annotations, transformer
BibRef
Jin, Z.X.[Zan-Xia],
Wu, H.[Heran],
Yang, C.[Chun],
Zhou, F.[Fang],
Qin, J.Y.[Jing-Yan],
Xiao, L.[Lei],
Yin, X.C.[Xu-Cheng],
RUArt: A Novel Text-Centered Solution for Text-Based Visual Question
Answering,
MultMed(25), 2023, pp. 1-12.
IEEE DOI
2301
Optical character recognition software, Semantics, Visualization,
Cognition, Knowledge discovery, Task analysis, Attention mechanism,
visual question answering
BibRef
Beckham, C.[Christopher],
Weiss, M.[Martin],
Golemo, F.[Florian],
Honari, S.[Sina],
Nowrouzezahrai, D.[Derek],
Pal, C.[Christopher],
Visual question answering from another perspective: CLEVR mental
rotation tests,
PR(136), 2023, pp. 109209.
Elsevier DOI
2301
Deep learning, Computer vision, Visual question answering,
Contrastive learning, CLEVR
BibRef
Zhang, H.N.[Hao-Nan],
Zeng, P.P.[Peng-Peng],
Hu, Y.X.[Yu-Xuan],
Qian, J.[Jin],
Song, J.K.[Jing-Kuan],
Gao, L.[Lianli],
Learning visual question answering on controlled semantic noisy
labels,
PR(138), 2023, pp. 109339.
Elsevier DOI
2303
Visual question answering, Noisy datasets, Semantic labels, Contrastive learning
BibRef
Zeng, G.[Gangyan],
Zhang, Y.[Yuan],
Zhou, Y.[Yu],
Yang, X.M.[Xiao-Meng],
Jiang, N.[Ning],
Zhao, G.Q.[Guo-Qing],
Wang, W.P.[Wei-Ping],
Yin, X.C.[Xu-Cheng],
Beyond OCR + VQA: Towards end-to-end reading and reasoning for robust
and accurate TextVQA,
PR(138), 2023, pp. 109337.
Elsevier DOI
2303
TextVQA, End-to-end, Scene text reading, Scene text reasoning
BibRef
Gao, D.F.[Di-Fei],
Wang, R.P.[Rui-Ping],
Shan, S.G.[Shi-Guang],
Chen, X.L.[Xi-Lin],
CRIC: A VQA Dataset for Compositional Reasoning on Vision and
Commonsense,
PAMI(45), No. 5, May 2023, pp. 5561-5578.
IEEE DOI
2304
Visualization, Task analysis, Tail, Head, Annotations, Magnetic heads,
Mouth, Visual question answering, compositional reasoning,
dataset construction
BibRef
Xu, F.Z.[Fang-Zhi],
Lin, Q.[Qika],
Liu, J.[Jun],
Zhang, L.L.[Ling-Ling],
Zhao, T.Z.[Tian-Zhe],
Chai, Q.[Qi],
Pan, Y.[Yudai],
Huang, Y.[Yi],
Wang, Q.[Qianying],
MoCA: Incorporating domain pretraining and cross attention for
textbook question answering,
PR(140), 2023, pp. 109588.
Elsevier DOI
2305
Textbook question answering, Multimodal, Pretraining, Attention
BibRef
Li, P.[Pengju],
Tan, Z.[Zhiyi],
Bao, B.K.[Bing-Kun],
Multiview Language Bias Reduction for Visual Question Answering,
MultMedMag(30), No. 1, January 2023, pp. 91-99.
IEEE DOI
2305
Visualization, Training, Image color analysis, Predictive models,
Task analysis, inter-question type bias
BibRef
Li, H.M.[Hui-Min],
Han, D.Z.[De-Zhi],
Chen, C.Q.[Chong-Qing],
Chang, C.C.[Chin-Chen],
Li, K.C.[Kuan-Ching],
Li, D.[Dun],
A Visual Question Answering Network Merging High- and Low-Level
Semantic Information,
IEICE(E106-D), No. 5, May 2023, pp. 581-589.
WWW Link.
2305
BibRef
Liu, B.[Bo],
Zhan, L.M.[Li-Ming],
Xu, L.[Li],
Wu, X.M.[Xiao-Ming],
Medical Visual Question Answering via Conditional Reasoning and
Contrastive Learning,
MedImg(42), No. 5, May 2023, pp. 1532-1545.
IEEE DOI
2305
Task analysis, Feature extraction, Visualization, Cognition,
Question answering (information retrieval), Training, Radiology,
contrastive learning
BibRef
Wu, J.M.[Jin-Meng],
Ge, F.[Fulin],
Hong, H.Y.[Han-Yu],
Shi, Y.[Yu],
Hao, Y.B.[Yan-Bin],
Ma, L.[Lei],
Question-aware dynamic scene graph of local semantic representation
learning for visual question answering,
PRL(170), 2023, pp. 93-99.
Elsevier DOI
2306
Interactive semantic representation, Dynamic scene graph,
Local feature detection, Attention mechanism
BibRef
Li, H.[Hao],
Huang, J.[Jinfa],
Jin, P.[Peng],
Song, G.[Guoli],
Wu, Q.[Qi],
Chen, J.[Jie],
Weakly-Supervised 3D Spatial Reasoning for Text-Based Visual Question
Answering,
IP(32), 2023, pp. 3367-3382.
IEEE DOI
2307
Three-dimensional displays, Cognition, Solid modeling,
Visualization, Optical character recognition, Task analysis, transformer
BibRef
Li, Z.Y.[Zhen-Yang],
Guo, Y.Y.[Yang-Yang],
Wang, K.[Kejie],
Wei, Y.W.[Yin-Wei],
Nie, L.Q.[Li-Qiang],
Kankanhalli, M.[Mohan],
Joint Answering and Explanation for Visual Commonsense Reasoning,
IP(32), 2023, pp. 3836-3846.
IEEE DOI
2307
Video recording, Visualization,
Question answering (information retrieval), Task analysis,
knowledge distillation
BibRef
Yang, X.F.[Xiao-Feng],
Lv, F.[Fengmao],
Liu, F.[Fayao],
Lin, G.S.[Guo-Sheng],
Self-Training Vision Language BERTs With a Unified Conditional Model,
CirSysVideo(33), No. 8, August 2023, pp. 3560-3569.
IEEE DOI
2308
Data models, Task analysis, Bidirectional control, Training, Predictive models,
Bit error rate, Visualization, semi-supervised learning
BibRef
Chen, L.[Long],
Zheng, Y.H.[Yu-Hang],
Niu, Y.[Yulei],
Zhang, H.W.[Han-Wang],
Xiao, J.[Jun],
Counterfactual Samples Synthesizing and Training for Robust Visual
Question Answering,
PAMI(45), No. 11, November 2023, pp. 13218-13234.
IEEE DOI
2310
BibRef
Chen, L.[Long],
Yan, X.,
Xiao, J.[Jun],
Zhang, H.W.[Han-Wang],
Pu, S.,
Zhuang, Y.,
Counterfactual Samples Synthesizing for Robust Visual Question
Answering,
CVPR20(10797-10806)
IEEE DOI
2008
Training, Cascading style sheets, Predictive models, Visualization,
Image color analysis, Linguistics, Computational modeling
BibRef
Wang, B.Y.[Bo-Yue],
Ma, Y.J.[Yu-Jian],
Li, X.Y.[Xiao-Yan],
Liu, H.[Heng],
Hu, Y.L.[Yong-Li],
Yin, B.C.[Bao-Cai],
DSGEM: Dual scene graph enhancement module-based visual question
answering,
IET-CV(17), No. 6, 2023, pp. 638-651.
DOI Link
2310
image representation, question answering (information retrieval)
BibRef
Bi, Y.D.[Yan-Dong],
Jiang, H.[Huajie],
Zhang, H.[Hanfu],
Hu, Y.L.[Yong-Li],
Yin, B.C.[Bao-Cai],
Self-supervised knowledge distillation in counterfactual learning for
VQA,
PRL(177), 2024, pp. 33-39.
Elsevier DOI
2401
Visual question answering, Counterfactual learning,
Self-supervised learning, Language bias
BibRef
Tan, S.[Sinan],
Ge, M.M.[Meng-Meng],
Guo, D.[Di],
Liu, H.P.[Hua-Ping],
Sun, F.C.[Fu-Chun],
Knowledge-Based Embodied Question Answering,
PAMI(45), No. 10, October 2023, pp. 11948-11960.
IEEE DOI
2310
BibRef
Tan, S.[Sinan],
Xiang, W.L.[Wei-Lai],
Liu, H.P.[Hua-Ping],
Guo, D.[Di],
Sun, F.C.[Fu-Chun],
Multi-agent Embodied Question Answering in Interactive Environments,
ECCV20(XIII:663-678).
Springer DOI
2011
BibRef
Mohamud, S.A.M.[Safaa Abdullahi Moallim],
Jalali, A.[Amin],
Lee, M.H.[Min-Ho],
Encoder-decoder cycle for visual question answering based on
perception-action cycle,
PR(144), 2023, pp. 109848.
Elsevier DOI
2310
Visual question answering, Vision language tasks,
Multi-modality fusion, Attention, Bilinear fusion, Brain-inspired frameworks
BibRef
Tito, R.[Rubèn],
Karatzas, D.[Dimosthenis],
Valveny, E.[Ernest],
Hierarchical multimodal transformers for Multipage DocVQA,
PR(144), 2023, pp. 109834.
Elsevier DOI
2310
Multipage document Visual Question Answering,
Document Visual Question Answering, Multipage documents, Document Intelligence
BibRef
Wang, Y.X.[Ya-Xian],
Wei, B.[Bifan],
Liu, J.[Jun],
Zhang, L.L.[Ling-Ling],
Wang, J.X.[Jia-Xin],
Wang, Q.Y.[Qian-Ying],
DisAVR: Disentangled Adaptive Visual Reasoning Network for Diagram
Question Answering,
IP(32), 2023, pp. 4812-4827.
IEEE DOI
2310
BibRef
Han, Y.D.[Yu-Dong],
Yin, J.H.[Jian-Hua],
Wu, J.L.[Jian-Long],
Wei, Y.W.[Yin-Wei],
Nie, L.Q.[Li-Qiang],
Semantic-Aware Modular Capsule Routing for Visual Question Answering,
IP(32), 2023, pp. 5537-5549.
IEEE DOI
2310
BibRef
Qian, T.W.[Tian-Wen],
Chen, J.J.[Jing-Jing],
Chen, S.X.[Shao-Xiang],
Wu, B.[Bo],
Jiang, Y.G.[Yu-Gang],
Scene Graph Refinement Network for Visual Question Answering,
MultMed(25), 2023, pp. 3950-3961.
IEEE DOI
2310
BibRef
Qin, B.S.[Bo-Sheng],
Hu, H.J.[Hao-Ji],
Zhuang, Y.T.[Yue-Ting],
Deep Residual Weight-Sharing Attention Network With Low-Rank
Attention for Visual Question Answering,
MultMed(25), 2023, pp. 4282-4295.
IEEE DOI Code:
WWW Link.
2310
BibRef
Zhou, S.[Sheng],
Guo, D.[Dan],
Li, J.[Jia],
Yang, X.[Xun],
Wang, M.[Meng],
Exploring Sparse Spatial Relation in Graph Inference for Text-Based
VQA,
IP(32), 2023, pp. 5060-5074.
IEEE DOI
2310
BibRef
Biswas, K.[Kunal],
Shivakumara, P.[Palaiahnakote],
Pal, U.[Umapada],
Liu, C.L.[Cheng-Lin],
Lu, Y.[Yue],
VQAPT: A New visual question answering model for personality traits
in social media images,
PRL(175), 2023, pp. 66-73.
Elsevier DOI
2311
Personality trait images, Multimodal concept, Text recognition,
Social media images, Natural language processing, Visual question answering
BibRef
Cho, J.W.[Jae Won],
Argaw, D.M.[Dawit Mureja],
Oh, Y.[Youngtaek],
Kim, D.J.[Dong-Jin],
Kweon, I.S.[In So],
Empirical study on using adapters for debiased Visual Question
Answering,
CVIU(237), 2023, pp. 103842.
Elsevier DOI
2311
Visual Question Answering, Model Robustness, Biased Data, Adapters
BibRef
Cho, J.W.[Jae Won],
Kim, D.J.[Dong-Jin],
Choi, J.[Jinsoo],
Jung, Y.[Yunjae],
Kweon, I.S.[In So],
Dealing with Missing Modalities in the Visual Question
Answer-Difference Prediction Task through Knowledge Distillation,
MULA21(1592-1601)
IEEE DOI
2109
Visualization, Knowledge discovery,
Pattern recognition, Task analysis, Bars
BibRef
Cho, J.W.[Jae Won],
Kim, D.J.[Dong-Jin],
Ryu, H.[Hyeonggon],
Kweon, I.S.[In So],
Generative Bias for Robust Visual Question Answering,
CVPR23(11681-11690)
IEEE DOI
2309
BibRef
Liu, Y.H.[Yu-Hang],
Wei, W.[Wei],
Peng, D.[Daowan],
Mao, X.L.[Xian-Ling],
He, Z.Y.[Zhi-Yong],
Zhou, P.[Pan],
Depth-Aware and Semantic Guided Relational Attention Network for
Visual Question Answering,
MultMed(25), 2023, pp. 5344-5357.
IEEE DOI
2311
BibRef
Mao, A.[Aihua],
Yang, Z.[Zhi],
Lin, K.[Ken],
Xuan, J.[Jun],
Liu, Y.J.[Yong-Jin],
Positional Attention Guided Transformer-Like Architecture for Visual
Question Answering,
MultMed(25), 2023, pp. 6997-7009.
IEEE DOI
2311
BibRef
Sun, H.[Hao],
Wang, S.[Shu],
Zhu, Y.Q.[Yun-Qiang],
Yuan, W.[Wen],
Zou, Z.Q.[Zhi-Qiang],
Question Classification for Intelligent Question Answering:
A Comprehensive Survey,
IJGI(12), No. 10, 2023, pp. 415.
DOI Link
2311
BibRef
Cao, B.W.[Bi-Wei],
Cao, J.X.[Jiu-Xin],
Gui, J.[Jie],
Shen, J.[Jiayun],
Liu, B.[Bo],
He, L.[Lei],
Tang, Y.Y.[Yuan Yan],
Kwok, J.T.Y.[James Tin-Yau],
AlignVE: Visual Entailment Recognition Based on Alignment Relations,
MultMed(25), 2023, pp. 7378-7387.
IEEE DOI
2311
Recognize whether the semantics of a hypothesis text can be inferred
from the given premise image.
BibRef
Mashrur, A.[Akib],
Luo, W.[Wei],
Zaidi, N.A.[Nayyar A.],
Robles-Kelly, A.[Antonio],
Robust visual question answering via semantic cross modal
augmentation,
CVIU(238), 2024, pp. 103862.
Elsevier DOI
2312
Visual question answering, Transformers, Multimodal learning,
Model Robustness, Data augmentation
BibRef
Yu, Z.[Zhou],
Jin, Z.[Zitian],
Yu, J.[Jun],
Xu, M.L.[Ming-Liang],
Wang, H.B.[Hong-Bo],
Fan, J.P.[Jian-Ping],
Bilaterally Slimmable Transformer for Elastic and Efficient Visual
Question Answering,
MultMed(25), 2023, pp. 9543-9556.
IEEE DOI
2312
BibRef
Yao, H.B.[Hai-Bo],
Wang, L.P.[Li-Peng],
Cai, C.T.[Cheng-Tao],
Sun, Y.X.[Yu-Xin],
Zhang, Z.[Zhi],
Luo, Y.K.[Yong-Kang],
Multi-modal spatial relational attention networks for visual question
answering,
IVC(140), 2023, pp. 104840.
Elsevier DOI
2312
Visual question answering, Spatial relation,
Attention mechanism, Pre-training strategy
BibRef
Huang, X.F.[Xiao-Fei],
Gong, H.F.[Hong-Fang],
A Dual-Attention Learning Network With Word and Sentence Embedding
for Medical Visual Question Answering,
MedImg(43), No. 2, February 2024, pp. 832-845.
IEEE DOI
2402
Feature extraction, Visualization, Medical diagnostic imaging,
Data mining, Question answering (information retrieval), visual reasoning
BibRef
Zheng, W.B.[Wen-Bo],
Yan, L.[Lan],
Wang, F.Y.[Fei-Yue],
So Many Heads, So Many Wits: Multimodal Graph Reasoning for
Text-Based Visual Question Answering,
SMCS(54), No. 2, February 2024, pp. 854-865.
IEEE DOI
2402
Visualization, Cognition,
Question answering (information retrieval), Feature extraction,
text-based visual question answering
BibRef
Bi, Y.D.[Yan-Dong],
Jiang, H.[Huajie],
Hu, Y.L.[Yong-Li],
Sun, Y.F.[Yan-Feng],
Yin, B.C.[Bao-Cai],
See and Learn More: Dense Caption-Aware Representation for Visual
Question Answering,
CirSysVideo(34), No. 2, February 2024, pp. 1135-1146.
IEEE DOI
2402
Visualization, Cognition, Question answering (information retrieval),
Feature extraction, cross-modal fusion
BibRef
Song, Y.[Yaguang],
Yang, X.S.[Xiao-Shan],
Wang, Y.[Yaowei],
Xu, C.S.[Chang-Sheng],
Recovering Generalization via Pre-Training-Like Knowledge
Distillation for Out-of-Distribution Visual Question Answering,
MultMed(26), 2024, pp. 837-851.
IEEE DOI
2402
Data models, Training, Task analysis, Training data, Robustness,
Visualization, Question answering (information retrieval),
Knowledge Distillation
BibRef
Wu, S.[Sen],
Zhao, G.[Guoshuai],
Qian, X.M.[Xue-Ming],
Resolving Zero-Shot and Fact-Based Visual Question Answering via
Enhanced Fact Retrieval,
MultMed(26), 2024, pp. 1790-1800.
IEEE DOI
2402
Visualization, Task analysis, Knowledge based systems,
Question answering (information retrieval), Predictive models,
knowledge graph
BibRef
Wen, Z.Q.[Zhi-Quan],
Niu, S.C.[Shuai-Cheng],
Li, G.[Ge],
Wu, Q.Y.[Qing-Yao],
Tan, M.K.[Ming-Kui],
Wu, Q.[Qi],
Test-Time Model Adaptation for Visual Question Answering With
Debiased Self-Supervisions,
MultMed(26), 2024, pp. 2137-2147.
IEEE DOI
2402
Adaptation models, Training, Visualization, Entropy, Task analysis,
Question answering (information retrieval), Data models,
test-time debiased self-supervised
BibRef
Huai, T.Y.[Tian-Yu],
Yang, S.W.[Shu-Wen],
Zhang, J.H.[Jun-Hang],
Zhao, J.B.[Jia-Bao],
He, L.[Liang],
Debiased Visual Question Answering via the perspective of question
types,
PRL(178), 2024, pp. 181-187.
Elsevier DOI
2402
Visual Question Answering, De-biasing, Self-supervised
BibRef
Jiang, J.J.[Jing-Jing],
Liu, Z.Y.[Zi-Yi],
Zheng, N.N.[Nan-Ning],
Correlation Information Bottleneck: Towards Adapting Pretrained
Multimodal Models for Robust Visual Question Answering,
IJCV(132), No. 1, January 2024, pp. 185-207.
Springer DOI
2402
BibRef
Xu, N.[Ning],
Lu, Z.[Zimu],
Tian, H.[Hongshuo],
Kang, R.[Rongbao],
Cao, J.[Jinbo],
Zhang, Y.D.[Yong-Dong],
Liu, A.A.[An-An],
Learning to Supervise Knowledge Retrieval Over a Tree Structure for
Visual Question Answering,
MultMed(26), 2024, pp. 6689-6700.
IEEE DOI
2404
Knowledge based systems, Task analysis, Uncertainty, Visualization,
Knowledge engineering, History, supervised knowledge retrieval
BibRef
Pan, Y.H.[Yong-Hua],
Liu, J.[Jing],
Jin, L.[Lu],
Li, Z.C.[Ze-Chao],
Unbiased Visual Question Answering by Leveraging Instrumental
Variable,
MultMed(26), 2024, pp. 6648-6662.
IEEE DOI
2404
Visualization, Correlation, Instruments, Training, Predictive models,
Color, Generators, Visual question answering, out of distribution
BibRef
Zhang, S.[Siyu],
Chen, Y.[Yeming],
Sun, Y.[Yaoru],
Wang, F.[Fang],
Shi, H.B.[Hai-Bo],
Wang, H.R.[Hao-Ran],
LOIS: Looking Out of Instance Semantics for Visual Question Answering,
MultMed(26), 2024, pp. 6202-6214.
IEEE DOI
2404
Visualization, Semantics, Task analysis, Feature extraction,
Question answering (information retrieval), Cognition, Detectors,
multimodal relation attention
BibRef
Xie, J.Y.[Jia-Yuan],
Cai, Y.[Yi],
Chen, J.L.[Jia-Li],
Xu, R.[Ruohang],
Wang, J.[Jiexin],
Li, Q.[Qing],
Knowledge-Augmented Visual Question Answering With Natural Language
Explanation,
IP(33), 2024, pp. 2652-2664.
IEEE DOI Code:
WWW Link.
2404
Task analysis, Visualization, Feature extraction,
Question answering (information retrieval), Iterative methods,
multimodal
BibRef
Hu, Z.J.[Zhong-Jian],
Yang, P.[Peng],
Jiang, Y.S.[Yuan-Shuang],
Bai, Z.J.[Zi-Jian],
Prompting large language model with context and pre-answer for
knowledge-based VQA,
PR(151), 2024, pp. 110399.
Elsevier DOI
2404
Visual question answering, Large language model,
Knowledge-based VQA, Fine-tuning, In-context learning
BibRef
Wang, Q.[Qunbo],
Liu, J.[Jing],
Wu, W.J.[Wen-Jun],
Coordinating explicit and implicit knowledge for knowledge-based VQA,
PR(151), 2024, pp. 110368.
Elsevier DOI
2404
Pre-trained model, Knowledge-based VQA, Knowledge retrieval
BibRef
Wei, M.[Meng],
Chen, L.[Long],
Ji, W.[Wei],
Yue, X.Y.[Xiao-Yu],
Zimmermann, R.[Roger],
In Defense of Clip-Based Video Relation Detection,
IP(33), 2024, pp. 2759-2769.
IEEE DOI
2404
Context modeling, Visualization, Proposals, Image coding,
Trajectory, Training, hierarchical context modeling
BibRef
Ma, J.[Jie],
Liu, J.[Jun],
Chai, Q.[Qi],
Wang, P.H.[Ping-Hui],
Tao, J.[Jing],
Diagram Perception Networks for Textbook Question Answering via Joint
Optimization,
IJCV(132), No. 5, May 2024, pp. 1578-1591.
Springer DOI
2405
BibRef
Wang, J.[Junjue],
Ma, A.[Ailong],
Chen, Z.H.[Zi-Hang],
Zheng, Z.[Zhuo],
Wan, Y.T.[Yu-Ting],
Zhang, L.P.[Liang-Pei],
Zhong, Y.F.[Yan-Fei],
EarthVQANet: Multi-task visual question answering for remote sensing
image understanding,
PandRS(212), 2024, pp. 422-439.
Elsevier DOI Code:
HTML Version.
2406
Visual question answering, Semantic segmentation,
Multi-modal fusion, Multi-task learning, Knowledge reasoning
BibRef
Uehara, K.[Kohei],
Harada, T.[Tatsuya],
Learning by Asking Questions for Knowledge-Based Novel Object
Recognition,
IJCV(132), No. 6, June 2024, pp. 2290-2309.
Springer DOI
2406
BibRef
Earlier:
K-VQG: Knowledge-aware Visual Question Generation for Common-sense
Acquisition,
WACV23(4390-4398)
IEEE DOI
2302
Recognize novel objects.
Learning systems, Visualization, Knowledge acquisition,
Benchmark testing, Task analysis, visual reasoning
BibRef
Uehara, K.[Kohei],
Duan, N.[Nan],
Harada, T.[Tatsuya],
Learning to Ask Informative Sub-Questions for Visual Question
Answering,
MULA22(4680-4689)
IEEE DOI
2210
Training, Visualization, Computational modeling,
Reinforcement learning, Predictive models
BibRef
Li, Y.K.[Yi-Kang],
Duan, N.[Nan],
Zhou, B.L.[Bo-Lei],
Chu, X.[Xiao],
Ouyang, W.L.[Wan-Li],
Wang, X.G.[Xiao-Gang],
Zhou, M.[Ming],
Visual Question Generation as Dual Task of Visual Question Answering,
CVPR18(6116-6124)
IEEE DOI
1812
Task analysis, Visualization, Knowledge discovery, Training,
Computational modeling
BibRef
Gao, P.[Peng],
Li, H.S.[Hong-Sheng],
Li, S.[Shuang],
Lu, P.[Pan],
Li, Y.K.[Yi-Kang],
Hoi, S.C.H.[Steven C. H.],
Wang, X.G.[Xiao-Gang],
Question-Guided Hybrid Convolution for Visual Question Answering,
ECCV18(I: 485-501).
Springer DOI
1810
BibRef
Gao, P.[Peng],
Jiang, Z.K.[Zheng-Kai],
You, H.X.[Hao-Xuan],
Lu, P.[Pan],
Hoi, S.C.H.[Steven C. H.],
Wang, X.G.[Xiao-Gang],
Li, H.S.[Hong-Sheng],
Dynamic Fusion With Intra- and Inter-Modality Attention Flow for Visual
Question Answering,
CVPR19(6632-6641).
IEEE DOI
2002
BibRef
Wang, J.[Jialou],
Zhu, M.[Manli],
Li, Y.[Yulei],
Li, H.L.[Hong-Lei],
Yang, L.Z.[Long-Zhi],
Woo, W.L.[Wai Lok],
Detect2Interact: Localizing Object Key Field in Visual Question
Answering with LLMs,
IEEE_Int_Sys(39), No. 3, May 2024, pp. 35-44.
IEEE DOI
2407
Visualization, Semantics, Object detection, Image segmentation,
Task analysis, Computational modeling, Chatbots, Spatial resolution
BibRef
Qian, S.[Shun],
Liu, B.Q.[Bing-Quan],
Sun, C.J.[Cheng-Jie],
Xu, Z.[Zhen],
Ma, L.[Lin],
Wang, B.[Baoxun],
CroMIC-QA: The Cross-Modal Information Complementation Based Question
Answering,
MultMed(26), 2024, pp. 8348-8359.
IEEE DOI
2408
Task analysis, Visualization, Semantics, Crops,
Question answering (information retrieval), Diseases, multi-modal tasks
BibRef
Li, L.J.[Lin-Jun],
Jin, T.[Tao],
Lin, W.[Wang],
Jiang, H.[Hao],
Pan, W.W.[Wen-Wen],
Wang, J.[Jian],
Xiao, S.W.[Shu-Wen],
Xia, Y.[Yan],
Jiang, W.H.[Wei-Hao],
Zhao, Z.[Zhou],
Multi-Granularity Relational Attention Network for Audio-Visual
Question Answering,
CirSysVideo(34), No. 8, August 2024, pp. 7080-7094.
IEEE DOI
2408
Visualization, Question answering (information retrieval),
Labeling, Manuals, Electronic commerce, Task analysis, Cognition,
e-commerce dataset
BibRef
Vosoughi, A.[Ali],
Deng, S.J.[Shi-Jian],
Zhang, S.Y.[Song-Yang],
Tian, Y.[Yapeng],
Xu, C.L.[Chen-Liang],
Luo, J.B.[Jie-Bo],
Cross Modality Bias in Visual Question Answering:
A Causal View With Possible Worlds VQA,
MultMed(26), 2024, pp. 8609-8624.
IEEE DOI
2408
Visualization, Faces, Training, Linguistics, Cultural differences,
Question answering (information retrieval), Cognition,
visual question answering (VQA)
BibRef
Shi, X.X.[Xiang-Xi],
Lee, S.[Stefan],
Benchmarking Out-of-Distribution Detection in Visual Question
Answering,
WACV24(5473-5483)
IEEE DOI
2404
Visualization, Computational modeling, Estimation,
Benchmark testing, Predictive models, Feature extraction,
Vision + language and/or other modalities
BibRef
Venkataraman, S.R.[Sai Raam],
Rao, R.S.[Rishi Sridhar],
Balasubramanian, S.,
Sarma, R.R.[R. Raghunatha],
Vorugunti, C.S.[Chandra Sekhar],
Can you even tell left from right? Presenting a new challenge for VQA,
WACV24(4486-4495)
IEEE DOI
2404
Training, Visualization, Games, Benchmark testing, Cognition,
Question answering (information retrieval), Algorithms,
Vision + language and/or other modalities
BibRef
Sahu, P.P.[Pragya Paramita],
Raut, A.[Abhishek],
Samant, J.S.[Jagdish Singh],
Gorijala, M.[Mahesh],
Lakshminarayanan, V.[Vignesh],
Bhaskar, P.[Pinaki],
POP-VQA: Privacy preserving, On-device, Personalized Visual Question
Answering,
WACV24(8455-8464)
IEEE DOI
2404
Training, Visualization, Privacy, Biological system modeling,
Computational modeling, System performance,
Vision + language and/or other modalities
BibRef
Li, J.P.[Jia-Peng],
Wei, P.[Ping],
Han, W.J.[Wen-Juan],
Fan, L.F.[Li-Feng],
IntentQA: Context-aware Video Intent Reasoning,
ICCV23(11929-11940)
IEEE DOI Code:
WWW Link.
2401
BibRef
Hu, Y.S.[Yu-Shi],
Hua, H.[Hang],
Yang, Z.Y.[Zheng-Yuan],
Shi, W.J.[Wei-Jia],
Smith, N.A.[Noah A.],
Luo, J.B.[Jie-Bo],
PromptCap: Prompt-Guided Image Captioning for VQA with GPT-3,
ICCV23(2951-2963)
IEEE DOI
2401
BibRef
Reichman, B.[Benjamin],
Heck, L.[Larry],
Cross-Modal Dense Passage Retrieval for Outside Knowledge Visual
Question Answering,
CLVL23(2829-2834)
IEEE DOI
2401
BibRef
Naik, N.[Nandita],
Potts, C.[Christopher],
Kreiss, E.[Elisa],
Context-VQA: Towards Context-Aware and Purposeful Visual Question
Answering,
CLVL23(2813-2817)
IEEE DOI
2401
BibRef
Hu, Y.S.[Yu-Shi],
Liu, B.[Benlin],
Kasai, J.[Jungo],
Wang, Y.Z.[Yi-Zhong],
Ostendorf, M.[Mari],
Krishna, R.[Ranjay],
Smith, N.A.[Noah A.],
TIFA: Accurate and Interpretable Text-to-Image Faithfulness
Evaluation with Question Answering,
ICCV23(20349-20360)
IEEE DOI
2401
BibRef
Zhang, Y.W.[Yu-Wei],
Ho, C.H.[Chih-Hui],
Vasconcelos, N.M.[Nuno M.],
Toward Unsupervised Realistic Visual Question Answering,
ICCV23(15567-15578)
IEEE DOI Code:
WWW Link.
2401
BibRef
Liang, K.[Kaiqu],
Albanie, S.[Samuel],
Simple Baselines for Interactive Video Retrieval with Questions and
Answers,
ICCV23(11057-11067)
IEEE DOI Code:
WWW Link.
2401
BibRef
Mensink, T.[Thomas],
Uijlings, J.[Jasper],
Castrejon, L.[Lluis],
Goel, A.[Arushi],
Cadar, F.[Felipe],
Zhou, H.[Howard],
Sha, F.[Fei],
Araujo, A.[André],
Ferrari, V.[Vittorio],
Encyclopedic VQA: Visual questions about detailed properties of
fine-grained categories,
ICCV23(3090-3101)
IEEE DOI
2401
BibRef
Qian, Z.[Zi],
Wang, X.[Xin],
Duan, X.G.[Xu-Guang],
Qin, P.[Pengda],
Li, Y.H.[Yu-Hong],
Zhu, W.W.[Wen-Wu],
Decouple Before Interact: Multi-Modal Prompt Learning for Continual
Visual Question Answering,
ICCV23(2941-2950)
IEEE DOI
2401
BibRef
Xue, D.[Dizhan],
Qian, S.S.[Sheng-Sheng],
Xu, C.S.[Chang-Sheng],
Variational Causal Inference Network for Explanatory Visual Question
Answering,
ICCV23(2515-2525)
IEEE DOI
2401
BibRef
Bruni, P.[Pierfrancesco],
Falcon, A.[Alex],
Radeva, P.[Petia],
Time-aware Circulant Matrices for Question-based Temporal Localization,
CIAP23(II:182-195).
Springer DOI
2312
BibRef
Ferreira, B.C.L.[Bruno Carlos Luís],
Oliveira, H.G.[Hugo Gonçalo],
Silva, C.[Catarina],
Leveraging Question Answering for Domain-Agnostic Information
Extraction,
CIARP23(I:244-256).
Springer DOI
2312
BibRef
Wu, Z.H.[Zi-Heng],
Shu, X.Y.[Xin-Yao],
Yan, S.Y.[Shi-Yang],
Lu, Z.Y.[Zhen-Yu],
FGCVQA: Fine-Grained Cross-Attention for Medical VQA,
ICIP23(975-979)
IEEE DOI Code:
WWW Link.
2312
BibRef
Zhu, H.[He],
Togo, R.[Ren],
Ogawa, T.[Takahiro],
Haseyama, M.[Miki],
Interpretable Visual Question Answering Referring to Outside
Knowledge,
ICIP23(2140-2144)
IEEE DOI
2312
BibRef
Parelli, M.[Maria],
Mallis, D.[Dimitrios],
Diomataris, M.[Markos],
Pitsikalis, V.[Vassilis],
Interpretable Visual Question Answering Via Reasoning Supervision,
ICIP23(2525-2529)
IEEE DOI
2312
BibRef
Hegde, S.[Shamanthak],
Jahagirdar, S.[Soumya],
Gangisetty, S.[Shankar],
Making the V in Text-VQA Matter,
ODRUM23(5580-5588)
IEEE DOI
2309
BibRef
Suo, W.[Wei],
Sun, M.Y.[Meng-Yang],
Liu, W.S.[Wei-Song],
Gao, Y.Q.[Yi-Qi],
Wang, P.[Peng],
Zhang, Y.N.[Yan-Ning],
Wu, Q.[Qi],
S3C: Semi-Supervised VQA Natural Language Explanation via
Self-Critical Learning,
CVPR23(2646-2656)
IEEE DOI
2309
BibRef
Alampalle, C.[Charani],
Hegde, S.[Shamanthak],
Jahagirdar, S.[Soumya],
Gangisetty, S.[Shankar],
Weakly Supervised Visual Question Answer Generation,
ODRUM23(5589-5597)
IEEE DOI
2309
BibRef
Jiang, J.J.[Jing-Jing],
Zheng, N.N.[Nan-Ning],
MixPHM: Redundancy-Aware Parameter-Efficient Tuning for Low-Resource
Visual Question Answering,
CVPR23(24203-24213)
IEEE DOI
2309
BibRef
Wang, Y.[Ying],
Pfeiffer, J.[Jonas],
Carion, N.[Nicolas],
Le Cun, Y.L.[Yann L.],
Kamath, A.[Aishwarya],
Adapting Grounded Visual Question Answering Models to Low Resource
Languages,
MULA23(2596-2605)
IEEE DOI
2309
BibRef
Wang, M.[Min],
Mahjoubfar, A.[Ata],
Joshi, A.[Anupama],
FashionVQA: A Domain-Specific Visual Question Answering System,
CVFAD23(3514-3519)
IEEE DOI
2309
BibRef
Shao, Z.W.[Zhen-Wei],
Yu, Z.[Zhou],
Wang, M.[Meng],
Yu, J.[Jun],
Prompting Large Language Models with Answer Heuristics for
Knowledge-Based Visual Question Answering,
CVPR23(14974-14983)
IEEE DOI
2309
BibRef
Tascon-Morales, S.[Sergio],
Márquez-Neila, P.[Pablo],
Sznitman, R.[Raphael],
Logical Implications for Visual Question Answering Consistency,
CVPR23(6725-6735)
IEEE DOI
2309
BibRef
Chen, S.[Shi],
Zhao, Q.[Qi],
Divide and Conquer: Answering Questions with Object Factorization and
Compositional Reasoning,
CVPR23(6736-6745)
IEEE DOI
2309
BibRef
Guo, J.X.[Jia-Xian],
Li, J.[Junnan],
Li, D.X.[Dong-Xu],
Tiong, A.M.H.[Anthony Meng Huat],
Li, B.Y.[Bo-Yang],
Tao, D.C.[Da-Cheng],
Hoi, S.[Steven],
From Images to Textual Prompts: Zero-shot Visual Question Answering
with Frozen Large Language Models,
CVPR23(10867-10877)
IEEE DOI
2309
BibRef
Basu, A.[Abhipsa],
Addepalli, S.[Sravanti],
Babu, R.V.[R. Venkatesh],
RMLVQA: A Margin Loss Approach For Visual Question Answering with
Language Biases,
CVPR23(11671-11680)
IEEE DOI
2309
BibRef
Li, B.J.[Bing-Jia],
Wang, J.[Jie],
Zhao, M.[Minyi],
Zhou, S.[Shuigeng],
Two-stage Multimodality Fusion for High-performance Text-based Visual
Question Answering,
ACCV22(IV:658-674).
Springer DOI
2307
BibRef
Vivoli, E.[Emanuele],
Biten, A.F.[Ali Furkan],
Mafla, A.[Andres],
Karatzas, D.[Dimosthenis],
Gomez, L.[Lluis],
MUST-VQA: Multilingual Scene-Text VQA,
TextEvery22(345-358).
Springer DOI
2304
BibRef
Chai, Z.[Zi],
Wan, X.J.[Xiao-Jun],
Han, S.C.[Soyeon Caren],
Poon, J.[Josiah],
Visual Question Generation Under Multi-granularity Cross-Modal
Interaction,
MMMod23(I: 255-266).
Springer DOI
2304
BibRef
Wang, J.H.[Jiang-Hai],
Hu, M.H.[Meng-Hao],
Song, Y.G.[Ya-Guang],
Yang, X.S.[Xiao-Shan],
Health-Oriented Multimodal Food Question Answering,
MMMod23(I: 191-203).
Springer DOI
2304
BibRef
Bongini, P.[Pietro],
Becattini, F.[Federico],
del Bimbo, A.[Alberto],
Is GPT-3 All You Need for Visual Question Answering in Cultural
Heritage?,
VisArt22(268-281).
Springer DOI
2304
BibRef
Jha, A.[Abhishek],
Patro, B.[Badri],
Van Gool, L.J.[Luc J.],
Tuytelaars, T.[Tinne],
Barlow constrained optimization for Visual Question Answering,
WACV23(1084-1093)
IEEE DOI
2302
Training, Visualization, Computational modeling, Redundancy,
Semantics, Minimization, visual reasoning
BibRef
Ravi, S.[Sahithya],
Chinchure, A.[Aditya],
Sigal, L.[Leonid],
Liao, R.J.[Ren-Jie],
Shwartz, V.[Vered],
VLC-BERT: Visual Question Answering with Contextualized Commonsense
Knowledge,
WACV23(1155-1165)
IEEE DOI
2302
Comets, Visualization, Analytical models, Knowledge based systems,
Linguistics, Transformers, visual reasoning
BibRef
Etesam, Y.[Yasaman],
Kochiev, L.[Leon],
Chang, A.X.[Angel X.],
3DVQA: Visual Question Answering for 3D Environments,
CRV22(233-240)
IEEE DOI
2301
Point cloud compression, Surface reconstruction, Lighting,
Question answering (information retrieval), Noise measurement, 3D
BibRef
Ramamurthy, P.[Priyadharsini],
Aakur, S.N.[Sathyanarayanan N.],
ISD-QA: Iterative Distillation of Commonsense Knowledge from General
Language Models for Unsupervised Question Answering,
ICPR22(1229-1235)
IEEE DOI
2212
Transfer learning, Training data,
Question answering (information retrieval), Data models, Iterative methods
BibRef
Zhang, H.T.[Hao-Tian],
Wu, W.[Wei],
CAT: Re-Conv Attention in Transformer for Visual Question Answering,
ICPR22(1471-1477)
IEEE DOI
2212
Representation learning, Visualization, Predictive models,
Performance gain, Transformers, Feature extraction, Multi-modal task
BibRef
Liu, L.[Lei],
Su, X.D.[Xiang-Dong],
Guo, H.[Hui],
Zhu, D.[Daobin],
A Transformer-based Medical Visual Question Answering Model,
ICPR22(1712-1718)
IEEE DOI
2212
Training, Visualization, Transformers, Feature extraction,
Question answering (information retrieval), Stability analysis, Data mining
BibRef
Wu, X.Y.[Xiang-Yu],
Lu, J.F.[Jian-Feng],
Li, Z.F.[Zhuan-Feng],
Xiong, F.C.[Feng-Chao],
Ques-to-Visual Guided Visual Question Answering,
ICIP22(4193-4197)
IEEE DOI
2211
Location awareness, Visualization, Fuses, Semantics,
Benchmark testing, Question answering (information retrieval), channel attention
BibRef
Sarkar, A.[Argho],
Rahnemoonfar, M.[Maryam],
Grad-Cam Aware Supervised Attention for Visual Question Answering for
Post-Disaster Damage Assessment,
ICIP22(3783-3787)
IEEE DOI
2211
Training, Visualization, Annotations, Pipelines,
Question answering (information retrieval), Hurricanes, Grad-Cam
BibRef
Whitehead, S.[Spencer],
Petryk, S.[Suzanne],
Shakib, V.[Vedaad],
Gonzalez, J.[Joseph],
Darrell, T.J.[Trevor J.],
Rohrbach, A.[Anna],
Rohrbach, M.[Marcus],
Reliable Visual Question Answering: Abstain Rather Than Answer
Incorrectly,
ECCV22(XXXVI:148-166).
Springer DOI
2211
BibRef
Chen, L.[Long],
Zheng, Y.H.[Yu-Hang],
Xiao, J.[Jun],
Rethinking Data Augmentation for Robust Visual Question Answering,
ECCV22(XXXVI:95-112).
Springer DOI
2211
BibRef
Zhang, H.T.[Hao-Tian],
Wu, W.[Wei],
Context Relation Fusion Model for Visual Question Answering,
ICIP22(2112-2116)
IEEE DOI
2211
Visualization, Question answering (information retrieval),
Task analysis, Context modeling, Visual question answering, language bias
BibRef
Biten, A.F.[Ali Furkan],
Litman, R.[Ron],
Xie, Y.S.[Yu-Sheng],
Appalaraju, S.[Srikar],
Manmatha, R.,
LaTr: Layout-Aware Transformer for Scene-Text VQA,
CVPR22(16527-16537)
IEEE DOI
2210
Training, Symbiosis, Visualization, Vocabulary, Layout, Transformers,
Feature extraction, Vision + language,
Scene analysis and understanding
BibRef
Lu, J.Y.[Jia-Ying],
Ye, X.[Xin],
Ren, Y.[Yi],
Yang, Y.Z.[Ye-Zhou],
Good, Better, Best: Textual Distractors Generation for
Multiple-Choice Visual Question Answering via Reinforcement Learning,
ODRUM22(4917-4926)
IEEE DOI
2210
Training, Visualization, Computational modeling,
Knowledge based systems, Training data, Reinforcement learning, Data models
BibRef
Ding, Y.H.[Yi-Hao],
Huang, Z.[Zhe],
Wang, R.[Runlin],
Zhang, Y.H.[Yan-Hang],
Chen, X.[Xianru],
Ma, Y.Z.[Yu-Zhong],
Chung, H.[Hyunsuk],
Han, S.C.[Soyeon Caren],
V-Doc: Visual questions answers with Documents,
CVPR22(21460-21466)
IEEE DOI
2210
Deep learning, Visualization, Computational modeling,
Predictive models, Portable document format,
Question answering (information retrieval)
BibRef
Azuma, D.[Daichi],
Miyanishi, T.[Taiki],
Kurita, S.H.[Shu-Hei],
Kawanabe, M.[Motoaki],
ScanQA: 3D Question Answering for Spatial Scene Understanding,
CVPR22(19107-19117)
IEEE DOI
2210
Location awareness, Measurement, Solid modeling, Visualization,
Question answering (information retrieval), Vision + language,
Scene analysis and understanding
BibRef
Li, G.Y.[Guang-Yao],
Wei, Y.[Yake],
Tian, Y.[Yapeng],
Xu, C.L.[Chen-Liang],
Wen, J.R.[Ji-Rong],
Hu, D.[Di],
Learning to Answer Questions in Dynamic Audio-Visual Scenarios,
CVPR22(19086-19096)
IEEE DOI
2210
Visualization, Image analysis, Codes, Computational modeling,
Cognition, Question answering (information retrieval),
Vision + language
BibRef
Chen, C.[Chongyan],
Anjum, S.[Samreen],
Gurari, D.[Danna],
Grounding Answers for Visual Questions Asked by Visually Impaired
People,
CVPR22(19076-19085)
IEEE DOI
2210
Visualization, Correlation, Grounding, Text recognition,
Computational modeling, Visual impairment,
Vision + language
BibRef
Jing, C.C.[Chen-Chen],
Jia, Y.D.[Yun-De],
Wu, Y.W.[Yu-Wei],
Liu, X.Y.[Xin-Yu],
Wu, Q.[Qi],
Maintaining Reasoning Consistency in Compositional Visual Question
Answering,
CVPR22(5089-5098)
IEEE DOI
2210
Visualization, Birds, Cognition,
Question answering (information retrieval), Visual reasoning
BibRef
Cascante-Bonilla, P.[Paola],
Wu, H.[Hui],
Wang, L.[Letao],
Feris, R.S.[Rogerio S.],
Ordonez, V.[Vicente],
SimVQA: Exploring Simulated Environments for Visual Question
Answering,
CVPR22(5046-5056)
IEEE DOI
2210
Training, Visualization, Solid modeling, Computational modeling,
Pipelines, Switches, Vision + language, Visual reasoning
BibRef
Gupta, V.[Vipul],
Li, Z.W.[Zhuo-Wan],
Kortylewski, A.[Adam],
Zhang, C.Y.[Chen-Yu],
Li, Y.W.[Ying-Wei],
Yuille, A.L.[Alan L.],
SwapMix: Diagnosing and Regularizing the Over-Reliance on Visual
Context in Visual Question Answering,
CVPR22(5068-5078)
IEEE DOI
2210
Training, Visualization, Perturbation methods,
Computational modeling, Predictive models, Robustness,
Visual reasoning
BibRef
Burghouts, G.J.[Gertjan J.],
Huizinga, W.[Wyke],
Coarse-to-Fine Visual Question Answering by Iterative, Conditional
Refinement,
CIAP22(II:418-428).
Springer DOI
2205
BibRef
Kant, Y.[Yash],
Moudgil, A.[Abhinav],
Batra, D.[Dhruv],
Parikh, D.[Devi],
Agrawal, H.[Harsh],
Contrast and Classify: Training Robust VQA Models,
ICCV21(1584-1593)
IEEE DOI
2203
Training, Visualization, Perturbation methods, Linguistics,
Benchmark testing, Boosting, Vision + language
BibRef
Han, X.Z.[Xin-Zhe],
Wang, S.H.[Shu-Hui],
Su, C.[Chi],
Huang, Q.M.[Qing-Ming],
Tian, Q.[Qi],
Greedy Gradient Ensemble for Robust Visual Question Answering,
ICCV21(1564-1573)
IEEE DOI
2203
Visualization, Analytical models, Annotations,
Computational modeling, Feature extraction, Data models
BibRef
Dancette, C.[Corentin],
Cadène, R.[Rémi],
Teney, D.[Damien],
Cord, M.[Matthieu],
Beyond Question-Based Biases:
Assessing Multimodal Shortcut Learning in Visual Question Answering,
ICCV21(1554-1563)
IEEE DOI
2203
Training, Visualization, Protocols, Codes, Image color analysis,
Computational modeling, Vision + language, Explainable AI,
Visual reasoning and logical representation
BibRef
Zhou, Y.[Yiyi],
Ren, T.[Tianhe],
Zhu, C.Y.[Chao-Yang],
Sun, X.S.[Xiao-Shuai],
Liu, J.Z.[Jian-Zhuang],
Ding, X.H.[Xing-Hao],
Xu, M.L.[Ming-Liang],
Ji, R.R.[Rong-Rong],
TRAR: Routing the Attention Spans in Transformer for Visual Question
Answering,
ICCV21(2054-2064)
IEEE DOI
2203
Visualization, Schedules, Computational modeling, Transforms,
Benchmark testing, Performance gain, Transformers
BibRef
Yang, X.[Xu],
Gao, C.Y.[Chong-Yang],
Zhang, H.W.[Han-Wang],
Cai, J.F.[Jian-Fei],
Auto-Parsing Network for Image Captioning and Visual Question
Answering,
ICCV21(2177-2187)
IEEE DOI
2203
Training, Visualization, Graphical models, Stacking, Probability,
Transformers, Vision + language
BibRef
Banerjee, P.[Pratyay],
Gokhale, T.[Tejas],
Yang, Y.Z.[Ye-Zhou],
Baral, C.[Chitta],
Weakly Supervised Relative Spatial Reasoning for Visual Question
Answering,
ICCV21(1888-1898)
IEEE DOI
2203
Geometry, Visualization, Grounding, Semantics, Estimation,
Predictive models, Vision + language,
Visual reasoning and logical representation
BibRef
Li, L.J.[Lin-Jie],
Lei, J.[Jie],
Gan, Z.[Zhe],
Liu, J.J.[Jing-Jing],
Adversarial VQA:
A New Benchmark for Evaluating the Robustness of VQA Models,
ICCV21(2022-2031)
IEEE DOI
2203
Training, Visualization, Analytical models, Computational modeling,
Benchmark testing, Robustness, Vision + language
BibRef
Askarian, N.[Narjes],
Abbasnejad, E.[Ehsan],
Zukerman, I.[Ingrid],
Buntine, W.[Wray],
Haffari, G.[Gholamreza],
Inductive Biases for Low Data VQA: A Data Augmentation Approach,
Novelty22(231-240)
IEEE DOI
2202
Training, Visualization, Conferences,
Natural languages, Image annotation, Data models
BibRef
Mathew, M.[Minesh],
Bagal, V.[Viraj],
Tito, R.[Rubèn],
Karatzas, D.[Dimosthenis],
Valveny, E.[Ernest],
Jawahar, C.V.,
InfographicVQA,
WACV22(2582-2591)
IEEE DOI
2202
Visualization, Computational modeling, Layout,
Data visualization, Benchmark testing, Brain modeling,
Vision and Languages
BibRef
Kumar, S.[Sumit],
Patro, B.N.[Badri N.],
Namboodiri, V.P.[Vinay P.],
Auto QA: The Question Is Not Only What, but Also Where,
Novelty22(272-281)
IEEE DOI
2202
Location awareness, Visualization, Laser radar,
Conferences, Semantics, Sensor systems
BibRef
Kolling, C.[Camila],
More, M.[Martin],
Gavenski, N.[Nathan],
Pooch, E.[Eduardo],
Parraga, O.[Otávio],
Barros, R.C.[Rodrigo C.],
Efficient Counterfactual Debiasing for Visual Question Answering,
WACV22(2572-2581)
IEEE DOI
2202
Training, Visualization, Frequency synthesizers, Correlation,
Grounding, Computational modeling, Synthesizers,
Analysis and Understanding
BibRef
Jung, S.J.[Seung-Jun],
Byun, J.Y.[Jun-Young],
Shim, K.[Kyujin],
Hwang, S.Y.[Sangh-Yun],
Kim, C.[Changick],
Understanding VQA for Negative Answers Through Visual and Linguistic
Inference,
ICIP21(2873-2877)
IEEE DOI
2201
Visualization, Image processing, Linguistics, Knowledge discovery,
Inference algorithms, Reliability, Image Captioning,
Constrained Beam Search
BibRef
Felix, R.[Rafael],
Repasky, B.[Boris],
Hodge, S.[Samuel],
Zolfaghari, R.[Reza],
Abbasnejad, E.[Ehsan],
Sherrah, J.[Jamie],
Cross-Modal Visual Question Answering for Remote Sensing Data,
DICTA21(1-9)
IEEE DOI
2201
Earth, Visualization, Satellites, Digital images, Natural languages,
Machine learning, Transformers, Visual Question Answering,
OpenStreetMap
BibRef
Le, T.[Tung],
Nguyen, H.T.[Huy Tien],
Nguyen, M.L.[Minh Le],
Vision and Text Transformer for Predicting Answerability on Visual
Question Answering,
ICIP21(934-938)
IEEE DOI
2201
Visualization, Image processing, Predictive models,
Knowledge discovery, Robustness, Task analysis, Answerability,
Multi-head Attention
BibRef
Huang, Z.Q.[Zi-Qi],
Zhu, H.Y.[Hong-Yuan],
Sun, Y.[Ying],
Choi, D.[Dongkyu],
Tan, C.[Cheston],
Lim, J.H.[Joo-Hwee],
A Diagnostic Study of Visual Question Answering With Analogical
Reasoning,
ICIP21(2463-2467)
IEEE DOI
2201
Location awareness, Visualization, Image processing,
Natural languages, Benchmark testing, Tools, Knowledge discovery,
benchmark
BibRef
Chen, H.Y.[Hong-Yu],
Liu, R.F.[Rui-Fang],
Peng, B.[Bo],
Cross-modal Relational Reasoning Network for Visual Question
Answering,
MAIR2-21(3939-3948)
IEEE DOI
2112
Bridges, Visualization, Semantics,
Knowledge discovery, Linear programming
BibRef
Wang, Z.X.[Zi-Xu],
Miao, Y.[Yishu],
Specia, L.[Lucia],
Latent Variable Models for Visual Question Answering,
CLVL21(3137-3141)
IEEE DOI
2112
Training, Visualization,
Computer aided instruction, Benchmark testing, Knowledge discovery
BibRef
Hirota, Y.[Yusuke],
Garcia, N.[Noa],
Otani, M.[Mayu],
Chu, C.[Chenhui],
Nakashima, Y.[Yuta],
Taniguchi, I.[Ittetsu],
Onoye, T.[Takao],
Visual Question Answering with Textual Representations for Images,
CLVL21(3147-3150)
IEEE DOI
2112
Visualization,
Computational modeling, Knowledge discovery, Feature extraction,
Object recognition
BibRef
Ye, K.[Keren],
Kovashka, A.[Adriana],
Linguistic Structures as Weak Supervision for Visual Scene Graph
Generation,
CVPR21(8285-8295)
IEEE DOI
2111
Location awareness, Visualization, Blogs,
Linguistics, Pattern recognition, Noise measurement
BibRef
Xiao, J.B.[Jun-Bin],
Shang, X.[Xindi],
Yao, A.[Angela],
Chua, T.S.[Tat-Seng],
NExT-QA: Next Phase of Question-Answering to Explaining Temporal
Actions,
CVPR21(9772-9781)
IEEE DOI
2111
Adaptation models, Benchmark testing,
Knowledge discovery, Cognition, Pattern recognition, Task analysis
BibRef
Chen, X.Y.[Xian-Yu],
Jiang, M.[Ming],
Zhao, Q.[Qi],
Predicting Human Scanpaths in Visual Question Answering,
CVPR21(10871-10880)
IEEE DOI
2111
Training, Visualization, Reinforcement learning,
Predictive models, Tools, Knowledge discovery
BibRef
Qi, Y.G.[Yong-Gang],
Zhang, K.[Kai],
Sain, A.[Aneeshan],
Song, Y.Z.[Yi-Zhe],
PQA: Perceptual Question Answering,
CVPR21(12051-12059)
IEEE DOI
2111
Visualization, Training data, Psychology, Organizations,
Visual systems, Knowledge discovery, Data models
BibRef
Yuan, Y.Y.[Yuan-Yuan],
Wang, S.[Shuai],
Jiang, M.Y.[Ming-Yue],
Chen, T.Y.[Tsong Yueh],
Perception Matters: Detecting Perception Failures of VQA Models Using
Metamorphic Testing,
CVPR21(16903-16912)
IEEE DOI
2111
Visualization, Computational modeling, Transforms,
Benchmark testing, Knowledge discovery, Cognition
BibRef
Marino, K.[Kenneth],
Chen, X.L.[Xin-Lei],
Parikh, D.[Devi],
Gupta, A.[Abhinav],
Rohrbach, M.[Marcus],
KRISP: Integrating Implicit and Symbolic Knowledge for Open-Domain
Knowledge-Based VQA,
CVPR21(14106-14116)
IEEE DOI
2111
Training, Vocabulary, Knowledge based systems, Semantics,
Training data, Knowledge representation, Predictive models
BibRef
Niu, Y.[Yulei],
Tang, K.[Kaihua],
Zhang, H.W.[Han-Wang],
Lu, Z.W.[Zhi-Wu],
Hua, X.S.[Xian-Sheng],
Wen, J.R.[Ji-Rong],
Counterfactual VQA: A Cause-Effect Look at Language Bias,
CVPR21(12695-12705)
IEEE DOI
2111
Codes, Linguistics, Robustness, Cognition, Pattern recognition
BibRef
Yang, Z.Y.[Zheng-Yuan],
Lu, Y.J.[Yi-Juan],
Wang, J.F.[Jian-Feng],
Yin, X.[Xi],
Florencio, D.[Dinei],
Wang, L.J.[Li-Juan],
Zhang, C.[Cha],
Zhang, L.[Lei],
Luo, J.B.[Jie-Bo],
TAP: Text-Aware Pre-training for Text-VQA and Text-Caption,
CVPR21(8747-8757)
IEEE DOI
2111
Visualization, Training data, Predictive models,
Knowledge discovery, Pattern recognition, Optical character recognition software
BibRef
Kervadec, C.[Corentin],
Jaunet, T.[Théo],
Antipov, G.[Grigory],
Baccouche, M.[Moez],
Vuillemot, R.[Romain],
Wolf, C.[Christian],
How Transferable are Reasoning Patterns in VQA?,
CVPR21(4205-4214)
IEEE DOI
2111
Visualization, Analytical models, Data visualization, Tools,
Transformers, Cognition, Data models
BibRef
Kervadec, C.[Corentin],
Antipov, G.[Grigory],
Baccouche, M.[Moez],
Wolf, C.[Christian],
Roses are Red, Violets are Blue… But Should VQA expect Them To?,
CVPR21(2775-2784)
IEEE DOI
2111
Training, Measurement, Visualization,
Computational modeling, Benchmark testing, Knowledge discovery
BibRef
Dua, R.[Radhika],
Kancheti, S.S.[Sai Srinivas],
Balasubramanian, V.N.[Vineeth N],
Beyond VQA: Generating Multi-word Answers and Rationales to Visual
Questions,
MULA21(1623-1632)
IEEE DOI
2109
Deep learning, Visualization, Vocabulary, Computational modeling,
Knowledge discovery
BibRef
Rahman, T.[Tanzila],
Chou, S.H.[Shih-Han],
Sigal, L.[Leonid],
Carenini, G.[Giuseppe],
An Improved Attention for Visual Question Answering,
MULA21(1653-1662)
IEEE DOI
2109
Visualization,
Computational modeling, Natural languages, Logic gates
BibRef
Jolly, S.[Shailza],
Palacio, S.[Sebastian],
Folz, J.[Joachim],
Raue, F.[Federico],
Hees, J.[Jörn],
Dengel, A.[Andreas],
P ≈ NP, at least in Visual Question Answering,
ICPR21(2748-2754)
IEEE DOI
2105
Training, Visualization, Upper bound, Knowledge discovery, Pattern recognition
BibRef
Farazi, M.[Moshiur],
Khan, S.[Salman],
Barnes, N.M.[Nick M.],
Question-Agnostic Attention for Visual Question Answering,
ICPR21(3542-3549)
IEEE DOI
2105
Training, Visualization, Image resolution, Preforms,
Computational modeling, Semantics, Focusing,
Multimodal Fusion
BibRef
Li, Y.[Yanan],
Lin, Y.[Yuetan],
Zhao, H.H.[Hong-Hui],
Wang, D.H.[Dong-Hui],
Dual Path Multi-Modal High-Order Features for Textual Content based
Visual Question Answering,
ICPR21(4324-4331)
IEEE DOI
2105
Visualization, Image recognition, Image coding, Correlation,
Text recognition, Fuses, Semantics
BibRef
Mishra, A.[Aakansha],
Anand, A.[Ashish],
Guha, P.[Prithwijit],
Multi-stage Attention based Visual Question Answering,
ICPR21(9407-9414)
IEEE DOI
2105
Visualization, Analytical models, Bidirectional control,
Benchmark testing, Knowledge discovery, Pattern recognition,
Attention Network
BibRef
Bozinis, T.[Theodoros],
Passalis, N.[Nikolaos],
Tefas, A.[Anastasios],
Improving Visual Question Answering using Active Perception on Static
Images,
ICPR21(879-884)
IEEE DOI
2105
Deep learning, Visualization, Analytical models, Image resolution,
Active perception, Reinforcement learning, Knowledge discovery
BibRef
Huang, H.T.[Han-Tao],
Han, T.[Tao],
Han, W.[Wei],
Yap, D.[Deep],
Chiang, C.M.[Cheng-Ming],
Answer-checking in Context:
A Multi-modal Fully Attention Network for Visual Question Answering,
ICPR21(1173-1180)
IEEE DOI
2105
Visualization, Bit error rate, Image representation,
Knowledge discovery, Pattern recognition
BibRef
Sun, Q.[Qiang],
Xie, B.H.[Bing-Hui],
Fu, Y.W.[Yan-Wei],
Second Order Enhanced Multi-Glimpse Attention in Visual Question
Answering,
ACCV20(IV:87-103).
Springer DOI
2103
BibRef
Goel, V.[Vatsal],
Chandak, M.[Mohit],
Anand, A.[Ashish],
Guha, P.[Prithwijit],
IQ-VQA: Intelligent Visual Question Answering,
VTIUR20(357-370).
Springer DOI
2103
BibRef
Qiao, Y.,
Yu, Z.,
Liu, J.,
VC-VQA: Visual Calibration Mechanism For Visual Question Answering,
ICIP20(1481-1485)
IEEE DOI
2011
Visualization, Image reconstruction, Calibration, Task analysis,
Predictive models, Feature extraction, Knowledge discovery,
Feature Reconstruction
BibRef
Tang, R.X.[Rui-Xue],
Ma, C.[Chao],
Zhang, W.E.[Wei Emma],
Wu, Q.[Qi],
Yang, X.K.[Xiao-Kang],
Semantic Equivalent Adversarial Data Augmentation for Visual Question
Answering,
ECCV20(XIX:437-453).
Springer DOI
2011
BibRef
Gokhale, T.[Tejas],
Banerjee, P.[Pratyay],
Baral, C.[Chitta],
Yang, Y.Z.[Ye-Zhou],
VQA-LOL: Visual Question Answering Under the Lens of Logic,
ECCV20(XXI:379-396).
Springer DOI
2011
BibRef
Yang, X.F.[Xiao-Feng],
Lin, G.S.[Guo-Sheng],
Lv, F.M.[Feng-Mao],
Liu, F.Y.[Fa-Yao],
TRRNET:
Tiered Relation Reasoning for Compositional Visual Question Answering,
ECCV20(XXI:414-430).
Springer DOI
2011
BibRef
Bansal, A.[Ankan],
Zhang, Y.[Yuting],
Chellappa, R.[Rama],
Visual Question Answering on Image Sets,
ECCV20(XXI:51-67).
Springer DOI
2011
BibRef
Han, X.Z.[Xin-Zhe],
Wang, S.H.[Shu-Hui],
Su, C.[Chi],
Zhang, W.G.[Wei-Gang],
Huang, Q.M.[Qing-Ming],
Tian, Q.[Qi],
Interpretable Visual Reasoning via Probabilistic Formulation Under
Natural Supervision,
ECCV20(IX:553-570).
Springer DOI
2011
BibRef
Kant, Y.[Yash],
Batra, D.[Dhruv],
Anderson, P.[Peter],
Schwing, A.[Alexander],
Parikh, D.[Devi],
Lu, J.[Jiasen],
Agrawal, H.[Harsh],
Spatially Aware Multimodal Transformers for TextVQA,
ECCV20(IX:715-732).
Springer DOI
2011
BibRef
Li, Q.[Qing],
Huang, S.Y.[Si-Yuan],
Hong, Y.[Yining],
Zhu, S.C.[Song-Chun],
A Competence-aware Curriculum for Visual Concepts Learning via Question
Answering,
ECCV20(II:141-157).
Springer DOI
2011
BibRef
Bajaj, G.,
Bandyopadhyay, B.,
Schmidt, D.,
Maneriker, P.,
Myers, C.,
Parthasarathy, S.,
Understanding Knowledge Gaps in Visual Question Answering:
Implications for Gap Identification and Testing,
MVM20(1563-1566)
IEEE DOI
2008
Cognition, Training, Task analysis, Artificial intelligence,
Global communication, Taxonomy, Semantics
BibRef
Vatashsky, B.,
Ullman, S.,
VQA With No Questions-Answers Training,
CVPR20(10373-10383)
IEEE DOI
2008
Visualization, Training, Image color analysis, Knowledge discovery,
Boats, Image analysis, Task analysis
BibRef
Jiang, H.,
Misra, I.,
Rohrbach, M.,
Learned-Miller, E.G.,
Chen, X.,
In Defense of Grid Features for Visual Question Answering,
CVPR20(10264-10273)
IEEE DOI
2008
Feature extraction, Visualization, Task analysis, Detectors,
Object detection, Training, Pipelines
BibRef
Wang, X.,
Liu, Y.,
Shen, C.,
Ng, C.C.,
Luo, C.,
Jin, L.,
Chan, C.S.,
van den Hengel, A.,
Wang, L.,
On the General Value of Evidence, and Bilingual Scene-Text Visual
Question Answering,
CVPR20(10123-10132)
IEEE DOI
2008
Measurement, Cognition, Knowledge discovery, Correlation,
Task analysis, Visualization, Optical character recognition software
BibRef
Xiong, P.,
Wu, Y.,
TA-Student VQA: Multi-Agents Training by Self-Questioning,
CVPR20(10062-10072)
IEEE DOI
2008
Visualization, Training, Knowledge discovery, Standards,
Task analysis, Boosting
BibRef
Agarwal, V.,
Shetty, R.,
Fritz, M.,
Towards Causal VQA: Revealing and Reducing Spurious Correlations by
Invariant and Covariant Semantic Editing,
CVPR20(9687-9695)
IEEE DOI
2008
Data models, Robustness, Predictive models, Semantics, Correlation,
Vocabulary, Visualization
BibRef
Hu, R.,
Singh, A.,
Darrell, T.J.,
Rohrbach, M.,
Iterative Answer Prediction With Pointer-Augmented Multimodal
Transformers for TextVQA,
CVPR20(9989-9999)
IEEE DOI
2008
Optical character recognition software, Task analysis,
Feature extraction, Visualization, Iterative decoding, Vocabulary,
Predictive models
BibRef
Kafle, K.,
Shrestha, R.,
Price, B.,
Cohen, S.,
Kanan, C.,
Answering Questions about Data Visualizations using Efficient Bimodal
Fusion,
WACV20(1487-1496)
IEEE DOI
2006
Bars, Data visualization, Image color analysis, Visualization,
Task analysis, Optical character recognition software, Training
BibRef
Patro, B.N.,
Patel, S.,
Namboodiri, V.P.,
Robust Explanations for Visual Question Answering,
WACV20(1566-1575)
IEEE DOI
2006
Visualization, Robustness, Perturbation methods,
Knowledge discovery, Collaboration, Task analysis, Coherence
BibRef
Chou, S.,
Chao, W.,
Lai, W.,
Sun, M.,
Yang, M.,
Visual Question Answering on 360° Images,
WACV20(1596-1605)
IEEE DOI
2006
Visualization, Task analysis, Feature extraction, Distortion,
Cognition, Image color analysis, Spatial resolution
BibRef
Chaudhry, R.,
Shekhar, S.,
Gupta, U.,
Maneriker, P.,
Bansal, P.,
Joshi, A.,
LEAF-QA: Locate, Encode & Attend for Figure Question Answering,
WACV20(3501-3510)
IEEE DOI
2006
Bars, Knowledge discovery, Image color analysis, Training,
Vocabulary, Data mining, Data visualization
BibRef
Liang, Y.Z.[Yuan-Zhi],
Bai, Y.L.[Ya-Long],
Zhang, W.[Wei],
Qian, X.M.[Xue-Ming],
Zhu, L.[Li],
Mei, T.[Tao],
VrR-VG: Refocusing Visually-Relevant Relationships,
ICCV19(10402-10411)
IEEE DOI
2004
bioinformatics, data mining, data visualisation,
feature extraction, genomics, graph theory, image annotation, Cognition
BibRef
Bhattacharya, N.,
Li, Q.,
Gurari, D.,
Why Does a Visual Question Have Different Answers?,
ICCV19(4270-4279)
IEEE DOI
2004
Code, Visual Q-A.
WWW Link. question answering (information retrieval),
visual question answering, Visualization, Powders, Task analysis,
Computer vision
BibRef
Li, L.J.[Lin-Jie],
Gan, Z.[Zhe],
Cheng, Y.[Yu],
Liu, J.J.[Jing-Jing],
Relation-Aware Graph Attention Network for Visual Question Answering,
ICCV19(10312-10321)
IEEE DOI
2004
data visualisation, graph theory,
learning (artificial intelligence), object detection, Computational modeling
BibRef
Peng, G.[Gao],
You, H.X.[Hao-Xuan],
Zhang, Z.P.[Zhan-Peng],
Wang, X.G.[Xiao-Gang],
Li, H.S.[Hong-Sheng],
Multi-Modality Latent Interaction Network for Visual Question
Answering,
ICCV19(5824-5834)
IEEE DOI
2004
data visualisation, image representation, image retrieval,
learning (artificial intelligence),
Object detection
BibRef
Do, T.,
Tran, H.,
Do, T.,
Tjiputra, E.,
Tran, Q.,
Compact Trilinear Interaction for Visual Question Answering,
ICCV19(392-401)
IEEE DOI
2004
learning (artificial intelligence),
matrix decomposition,
Correlation
BibRef
Schwartz, I.[Idan],
Yu, S.[Seunghak],
Hazan, T.[Tamir],
Schwing, A.G.[Alexander G.],
Factor Graph Attention,
CVPR19(2039-2048).
IEEE DOI
2002
BibRef
Kolesnikov, A.[Alexander],
Beyer, L.[Lucas],
Zhai, X.H.[Xiao-Hua],
Puigcerver, J.[Joan],
Yung, J.[Jessica],
Gelly, S.[Sylvain],
Houlsby, N.[Neil],
Big Transfer (BiT): General Visual Representation Learning,
ECCV20(V:491-507).
Springer DOI
2011
BibRef
Kolesnikov, A.[Alexander],
Zhai, X.H.[Xiao-Hua],
Beyer, L.[Lucas],
Revisiting Self-Supervised Visual Representation Learning,
CVPR19(1920-1929).
IEEE DOI
2002
BibRef
Xiong, P.X.[Pei-Xi],
Zhan, H.Y.[Hua-Yi],
Wang, X.[Xin],
Sinha, B.[Baivab],
Wu, Y.[Ying],
Visual Query Answering by Entity-Attribute Graph Matching and Reasoning,
CVPR19(8349-8358).
IEEE DOI
2002
BibRef
Singh, A.[Amanpreet],
Natarajan, V.[Vivek],
Shah, M.[Meet],
Jiang, Y.[Yu],
Chen, X.L.[Xin-Lei],
Batra, D.[Dhruv],
Parikh, D.[Devi],
Rohrbach, M.[Marcus],
Towards VQA Models That Can Read,
CVPR19(8309-8318).
IEEE DOI
2002
BibRef
Manjunatha, V.[Varun],
Saini, N.[Nirat],
Davis, L.S.[Larry S.],
Explicit Bias Discovery in Visual Question Answering Models,
CVPR19(9554-9563).
IEEE DOI
2002
BibRef
Shrestha, R.[Robik],
Kafle, K.[Kushal],
Kanan, C.[Christopher],
Answer Them All! Toward Universal Visual Question Answering Models,
CVPR19(10464-10473).
IEEE DOI
2002
BibRef
Noh, H.[Hyeonwoo],
Kim, T.[Taehoon],
Mun, J.[Jonghwan],
Han, B.H.[Bo-Hyung],
Transfer Learning via Unsupervised Task Discovery for Visual Question
Answering,
CVPR19(8377-8386).
IEEE DOI
2002
BibRef
Wijmans, E.[Erik],
Datta, S.[Samyak],
Maksymets, O.[Oleksandr],
Das, A.[Abhishek],
Gkioxari, G.[Georgia],
Lee, S.[Stefan],
Essa, I.[Irfan],
Parikh, D.[Devi],
Batra, D.[Dhruv],
Embodied Question Answering in Photorealistic Environments With Point
Cloud Perception,
CVPR19(6652-6661).
IEEE DOI
2002
BibRef
Shah, M.[Meet],
Chen, X.L.[Xin-Lei],
Rohrbach, M.[Marcus],
Parikh, D.[Devi],
Cycle-Consistency for Robust Visual Question Answering,
CVPR19(6642-6651).
IEEE DOI
2002
BibRef
Li, H.[Hui],
Wang, P.[Peng],
Shen, C.H.[Chun-Hua],
van den Hengel, A.[Anton],
Visual Question Answering as Reading Comprehension,
CVPR19(6312-6321).
IEEE DOI
2002
BibRef
Yu, L.C.[Li-Cheng],
Chen, X.L.[Xin-Lei],
Gkioxari, G.[Georgia],
Bansal, M.[Mohit],
Berg, T.L.[Tamara L.],
Batra, D.[Dhruv],
Multi-Target Embodied Question Answering,
CVPR19(6302-6311).
IEEE DOI
2002
BibRef
Yu, Z.[Zhou],
Yu, J.[Jun],
Cui, Y.H.[Yu-Hao],
Tao, D.C.[Da-Cheng],
Tian, Q.[Qi],
Deep Modular Co-Attention Networks for Visual Question Answering,
CVPR19(6274-6283).
IEEE DOI
2002
BibRef
Abbasnejad, E.[Ehsan],
Wu, Q.[Qi],
Shi, Q.F.[Qin-Feng],
van den Hengel, A.[Anton],
What's to Know? Uncertainty as a Guide to Asking Goal-Oriented
Questions,
CVPR19(4150-4159).
IEEE DOI
2002
BibRef
Schwenk, D.[Dustin],
Khandelwal, A.[Apoorv],
Clark, C.[Christopher],
Marino, K.[Kenneth],
Mottaghi, R.[Roozbeh],
A-OKVQA: A Benchmark for Visual Question Answering Using World
Knowledge,
ECCV22(VIII:146-162).
Springer DOI
2211
BibRef
Marino, K.[Kenneth],
Rastegari, M.[Mohammad],
Farhadi, A.[Ali],
Mottaghi, R.[Roozbeh],
OK-VQA: A Visual Question Answering Benchmark Requiring External
Knowledge,
CVPR19(3190-3199).
IEEE DOI
2002
BibRef
Krishna, R.[Ranjay],
Bernstein, M.[Michael],
Fei-Fei, L.[Li],
Information Maximizing Visual Question Generation,
CVPR19(2008-2018).
IEEE DOI
2002
BibRef
Cadene, R.[Remi],
Ben-younes, H.[Hedi],
Cord, M.[Matthieu],
Thome, N.[Nicolas],
MUREL: Multimodal Relational Reasoning for Visual Question Answering,
CVPR19(1989-1998).
IEEE DOI
2002
BibRef
Haurilet, M.[Monica],
Roitberg, A.[Alina],
Stiefelhagen, R.[Rainer],
It's Not About the Journey; It's About the Destination: Following Soft
Paths Under Question-Guidance for Visual Reasoning,
CVPR19(1930-1939).
IEEE DOI
2002
BibRef
Qiu, Y.,
Satoh, Y.,
Suzuki, R.,
Kataoka, H.,
Incorporating 3D Information Into Visual Question Answering,
3DV19(756-765)
IEEE DOI
1911
Feature extraction, Task analysis,
Visualization, Natural language processing, Cognition,
Human computer interaction
BibRef
Haurilet, M.[Monica],
Al-Halah, Z.[Ziad],
Stiefelhagen, R.[Rainer],
DynGraph: Visual Question Answering via Dynamic Scene Graphs,
GCPR19(428-441).
Springer DOI
1911
BibRef
Earlier:
MoQA: A Multi-modal Question Answering Architecture,
VL18(IV:106-113).
Springer DOI
1905
BibRef
Liu, F.,
Liu, J.,
Fang, Z.,
Lu, H.,
Language and Visual Relations Encoding for Visual Question Answering,
ICIP19(3307-3311)
IEEE DOI
1910
Visual question answering, Relations, Attention
BibRef
Fang, Z.W.[Zhi-Wei],
Liu, J.[Jing],
Tang, Q.[Qu],
Li, Y.[Yong],
Lu, H.Q.[Han-Qing],
Answer Distillation for Visual Question Answering,
ACCV18(I:72-87).
Springer DOI
1906
BibRef
Kuhnle, A.[Alexander],
Xie, H.Y.[Hui-Yuan],
Copestake, A.[Ann],
How Clever Is the FiLM Model, and How Clever Can it Be?,
VL18(IV:162-172).
Springer DOI
1905
BibRef
Li, W.[Wei],
Yuan, Z.H.[Ze-Huan],
Fang, X.Z.[Xiang-Zhong],
Wang, C.[Changhu],
Knowing Where to Look? Analysis on Attention of Visual Question
Answering System,
VL18(IV:145-152).
Springer DOI
1905
BibRef
Wagner, M.[Misha],
Basevi, H.[Hector],
Shetty, R.[Rakshith],
Li, W.B.[Wen-Bin],
Malinowski, M.[Mateusz],
Fritz, M.[Mario],
Leonardis, A.[Aleš],
Answering Visual What-If Questions: From Actions to Predicted Scene
Descriptions,
VLEASE18(I:521-537).
Springer DOI
1905
BibRef
Duke, B.,
Taylor, G.W.,
Generalized Hadamard-Product Fusion Operators for Visual Question
Answering,
CRV18(39-46)
IEEE DOI
1812
Feature extraction, Visualization, Task analysis, Data models,
Mathematical model, Natural languages, Model Selection,
Visual Question-Answering
BibRef
Das, A.,
Datta, S.,
Gkioxari, G.,
Lee, S.,
Parikh, D.,
Batra, D.,
Embodied Question Answering,
CVPR18(1-10)
IEEE DOI
1812
Navigation, Visualization, Task analysis, Automobiles,
Knowledge discovery
BibRef
Misra, I.,
Girshick, R.,
Fergus, R.,
Hebert, M.,
Gupta, A.,
van der Maaten, L.[Laurens],
Learning by Asking Questions,
CVPR18(11-20)
IEEE DOI
1812
Training, Proposals, Visualization, Knowledge discovery, Standards,
Task analysis, Data models
BibRef
Gurari, D.,
Li, Q.,
Stangl, A.J.,
Guo, A.,
Lin, C.,
Grauman, K.,
Luo, J.,
Bigham, J.P.,
VizWiz Grand Challenge: Answering Visual Questions from Blind People,
CVPR18(3608-3617)
IEEE DOI
1812
Visualization, Blindness, Prediction algorithms, Lighting,
Mobile handsets, Shape
BibRef
Li, J.,
Su, H.,
Zhu, J.,
Wang, S.,
Zhang, B.,
Textbook Question Answering Under Instructor Guidance with Memory
Networks,
CVPR18(3655-3663)
IEEE DOI
1812
Task analysis, Cognition, Visualization, Feature extraction,
Semantics, Knowledge discovery, Drugs
BibRef
Gordon, D.,
Kembhavi, A.,
Rastegari, M.,
Redmon, J.,
Fox, D.,
Farhadi, A.,
IQA: Visual Question Answering in Interactive Environments,
CVPR18(4089-4098)
IEEE DOI
1812
Task analysis, Navigation, Visualization, Knowledge discovery,
Semantics, Planning
BibRef
Agrawal, A.,
Batra, D.,
Parikh, D.,
Kembhavi, A.,
Don't Just Assume; Look and Answer: Overcoming Priors for Visual
Question Answering,
CVPR18(4971-4980)
IEEE DOI
1812
Image color analysis, Visualization, Data models, Training data,
Training, Knowledge discovery, Dogs
BibRef
Sha, F.,
Chao, W.,
Hu, H.,
Learning Answer Embeddings for Visual Question Answering,
CVPR18(5428-5436)
IEEE DOI
1812
Visualization, Semantics, Probabilistic logic,
Computational modeling, Task analysis, Training, Adaptation models
BibRef
Kafle, K.,
Price, B.,
Cohen, S.,
Kanan, C.,
DVQA: Understanding Data Visualizations via Question Answering,
CVPR18(5648-5656)
IEEE DOI
1812
Bars, Cognition, Image color analysis, Visualization,
Data visualization, Data mining, Knowledge discovery
BibRef
Sha, F.,
Hu, H.,
Chao, W.,
Cross-Dataset Adaptation for Visual Question Answering,
CVPR18(5716-5725)
IEEE DOI
1812
Visualization, Task analysis, Adaptation models,
Knowledge discovery, Games, Training, Target recognition
BibRef
Anderson, P.,
He, X.,
Buehler, C.,
Teney, D.,
Johnson, M.,
Gould, S.,
Zhang, L.,
Bottom-Up and Top-Down Attention for Image Captioning and Visual
Question Answering,
CVPR18(6077-6086)
IEEE DOI
1812
Visualization, Task analysis, Proposals, Mathematical model, Servers,
Context modeling, Object detection
BibRef
Nguyen, D.,
Okatani, T.,
Improved Fusion of Visual and Language Representations by Dense
Symmetric Co-attention for Visual Question Answering,
CVPR18(6087-6096)
IEEE DOI
1812
Feature extraction, Visualization, Fuses,
Knowledge discovery, Bidirectional control
BibRef
Patro, B.,
Namboodiri, V.P.,
Differential Attention for Visual Question Answering,
CVPR18(7680-7688)
IEEE DOI
1812
Semantics, Task analysis, Visualization, Knowledge discovery,
Correlation, Measurement, Training
BibRef
Su, Z.[Zhou],
Zhu, C.[Chen],
Dong, Y.P.[Yin-Peng],
Cai, D.Q.[Dong-Qi],
Chen, Y.R.[Yu-Rong],
Li, J.G.[Jian-Guo],
Learning Visual Knowledge Memory Networks for Visual Question
Answering,
CVPR18(7736-7745)
IEEE DOI
1812
Visualization, Knowledge based systems, Task analysis,
Knowledge discovery, Cognition, Ovens
BibRef
Das, A.,
Datta, S.,
Gkioxari, G.,
Lee, S.,
Parikh, D.,
Batra, D.,
Embodied Question Answering,
DeepLearnRV18(2135-213509)
IEEE DOI
1812
Navigation, Visualization, Task analysis, Automobiles,
Knowledge discovery
BibRef
Cheng, W.,
Huang, Y.,
Wang, L.,
Towards Unconstrained Pointing Problem of Visual Question Answering:
A Retrieval-based Method,
ICPR18(3303-3308)
IEEE DOI
1812
Visualization, Task analysis, Feature extraction, Training,
Knowledge discovery, Proposals, Semantics
BibRef
Zhou, B.[Bolei],
Sun, Y.[Yiyou],
Bau, D.[David],
Torralba, A.B.[Antonio B.],
Interpretable Basis Decomposition for Visual Explanation,
ECCV18(VIII: 122-138).
Springer DOI
1810
BibRef
Shi, Y.[Yang],
Furlanello, T.[Tommaso],
Zha, S.[Sheng],
Anandkumar, A.[Animashree],
Question Type Guided Attention in Visual Question Answering,
ECCV18(II: 158-175).
Springer DOI
1810
BibRef
Narasimhan, M.[Medhini],
Schwing, A.G.[Alexander G.],
Straight to the Facts: Learning Knowledge Base Retrieval for Factual
Visual Question Answering,
ECCV18(VIII: 460-477).
Springer DOI
1810
BibRef
Malinowski, M.[Mateusz],
Doersch, C.[Carl],
Santoro, A.[Adam],
Battaglia, P.[Peter],
Learning Visual Question Answering by Bootstrapping Hard Attention,
ECCV18(VI: 3-20).
Springer DOI
1810
BibRef
Gu, J.X.[Jiu-Xiang],
Cai, J.F.[Jian-Fei],
Joty, S.[Shafiq],
Niu, L.[Li],
Wang, G.[Gang],
Look, Imagine and Match: Improving Textual-Visual Cross-Modal
Retrieval with Generative Models,
CVPR18(7181-7189)
IEEE DOI
1812
Visualization, Training, Decoding, Semantics, Measurement
BibRef
Li, Q.[Qing],
Tao, Q.Y.[Qing-Yi],
Joty, S.[Shafiq],
Cai, J.F.[Jian-Fei],
Luo, J.B.[Jie-Bo],
VQA-E: Explaining, Elaborating, and Enhancing Your Answers for Visual
Questions,
ECCV18(VII: 570-586).
Springer DOI
1810
BibRef
Yu, D.,
Gao, X.,
Xiong, H.,
Structured Semantic Representation for Visual Question Answering,
ICIP18(2286-2290)
IEEE DOI
1809
Semantics, Training, Cognition, Visualization, Task analysis,
Linguistics, Computational modeling,
Visual question answering
BibRef
Huang, L.,
Kulkarni, K.,
Jha, A.,
Lohit, S.,
Jayasuriya, S.,
Turaga, P.K.,
CS-VQA: Visual Question Answering with Compressively Sensed Images,
ICIP18(1283-1287)
IEEE DOI
1809
Visualization, Image reconstruction, Image coding, Task analysis,
Feature extraction, Training, Multiplexing,
image reconstruction
BibRef
Desta, M.T.,
Chen, L.,
Kornuta, T.,
Object-Based Reasoning in VQA,
WACV18(1814-1823)
IEEE DOI
1806
data visualisation, inference mechanisms,
natural language processing, object detection, Visualization
BibRef
Zhao, H.,
Fan, Q.,
Gutfreund, D.,
Fu, Y.,
Semantically Guided Visual Question Answering,
WACV18(1852-1860)
IEEE DOI
1806
data visualisation, image colour analysis, image representation,
learning (artificial intelligence),
Visualization
BibRef
Wang, Z.,
Liu, X.,
Wang, L.,
Qiao, Y.,
Xie, X.,
Fowlkes, C.C.[Charless C.],
Structured Triplet Learning with POS-Tag Guided Attention for Visual
Question Answering,
WACV18(1888-1896)
IEEE DOI
1806
convolution, data visualisation,
learning (artificial intelligence),
Visualization
BibRef
Chowdhury, I.,
Nguyen, K.,
Fookes, C.,
Sridharan, S.,
A cascaded long short-term memory (LSTM) driven generic visual
question answering (VQA),
ICIP17(1842-1846)
IEEE DOI
1803
Feature extraction, Mathematical model, Natural languages,
Principal component analysis, Task analysis, Training,
scene understanding
BibRef
Sheng, S.[Shurong],
Venkitasubramanian, A.N.[Aparna Nurani],
Moens, M.F.[Marie-Francine],
A Markov Network Based Passage Retrieval Method for Multimodal Question
Answering in the Cultural Heritage Domain,
MMMod18(I:3-15).
Springer DOI
1802
BibRef
Yu, Z.,
Yu, J.,
Fan, J.,
Tao, D.,
Multi-modal Factorized Bilinear Pooling with Co-attention Learning
for Visual Question Answering,
ICCV17(1839-1848)
IEEE DOI
1802
computational complexity, feature extraction, image fusion,
learning (artificial intelligence), Visualization
BibRef
Ben-younes, H.,
Cadene, R.,
Cord, M.,
Thome, N.,
MUTAN: Multimodal Tucker Fusion for Visual Question Answering,
ICCV17(2631-2639)
IEEE DOI
1802
image fusion, image representation,
question answering (information retrieval), tensors, (VQA) tasks,
Visualization
BibRef
Jain, U.[Unnat],
Zhang, Z.Y.[Zi-Yu],
Schwing, A.[Alexander],
Creativity: Generating Diverse Questions Using Variational
Autoencoders,
CVPR17(5415-5424)
IEEE DOI
1711
Artificial intelligence, Creativity, Hidden Markov models,
Training, Transforms, Visualization
BibRef
Zhu, Y.,
Lim, J.J.,
Fei-Fei, L.[Li],
Knowledge Acquisition for Visual Question Answering via Iterative
Querying,
CVPR17(6146-6155)
IEEE DOI
1711
Computational modeling, Data models, Generators,
Knowledge discovery, Standards, Visualization
BibRef
Lin, Y.T.[Yue-Tan],
Pang, Z.Y.[Zhang-Yang],
Li, Y.[Yanan],
Wang, D.H.[Dong-Hui],
Simple and effective visual question answering in a single modality,
ICIP16(2276-2280)
IEEE DOI
1610
Benchmark testing. Not just add text to image questions.
BibRef
Kembhavi, A.,
Seo, M.,
Schwenk, D.,
Choi, J.,
Farhadi, A.,
Hajishirzi, H.,
Are You Smarter Than a Sixth Grader? Textbook Question Answering for
Multimodal Machine Comprehension,
CVPR17(5376-5384)
IEEE DOI
1711
Cognition, Knowledge discovery, Natural languages,
Training, Visualization
BibRef
Ganju, S.,
Russakovsky, O.,
Gupta, A.,
What's in a Question:
Using Visual Questions as a Form of Supervision,
CVPR17(6422-6431)
IEEE DOI
1711
Artificial intelligence, Computational modeling,
Dogs, Image color analysis, SPICE, Visualization
BibRef
Xu, H.J.[Hui-Juan],
Saenko, K.[Kate],
Ask, Attend and Answer:
Exploring Question-Guided Spatial Attention for Visual Question Answering,
ECCV16(VII: 451-466).
Springer DOI
1611
Visual Question Answering.
BibRef
Jabri, A.[Allan],
Joulin, A.[Armand],
van der Maaten, L.[Laurens],
Revisiting Visual Question Answering Baselines,
ECCV16(VIII: 727-739).
Springer DOI
1611
BibRef
Yang, Z.C.[Zi-Chao],
He, X.D.[Xiao-Dong],
Gao, J.F.[Jian-Feng],
Deng, L.[Li],
Smola, A.[Alex],
Stacked Attention Networks for Image Question Answering,
CVPR16(21-29)
IEEE DOI
1612
BibRef
Sadeghi, F.[Fereshteh],
Divvala, S.K.[Santosh K.],
Farhadi, A.[Ali],
VisKE: Visual knowledge extraction and question answering by visual
verification of relation phrases,
CVPR15(1456-1464)
IEEE DOI
1510
Visual verification of text relationships.
BibRef
Liu, Y.[Yang],
Liu, J.[Jie],
Wang, D.[Dong],
Cheng, J.[Jian],
A robust multivariate reranking algorithm for Question Answering
enrichment,
ICIP12(1917-1920).
IEEE DOI
1302
BibRef
Varekamp, C.[Chris],
van de Walle, P.[Patrick],
de Putter, M.[Marc],
Question interface for 3D picture creation on an autostereoscopic
digital picture frame,
3DTV09(1-4).
IEEE DOI
0905
BibRef
Chapter on Implementations and Applications, Databases, QBIC, Video Analysis, Hardware and Software, Inspection continues in
VQA, Visual Question Answering, Neural Networks.