Wu, J.X.[Jian-Xiong],
Chan, C.[Chorkin],
Recognition of phonetic labels of the TIMIT speech corpus by means of
an artificial neural network,
PR(24), No. 11, 1991, pp. 1085-1091.
Elsevier DOI
0401
BibRef
Wu, J.T.[Jian-Tong],
Tamura, S.[Shinichi],
Mitsumoto, H.[Hiroshi],
Kawai, H.[Hideo],
Kurosu, K.[Kenji],
Okazaki, K.[Kozo],
Neural network vowel-recognition jointly using voice features and mouth
shape image,
PR(24), No. 10, 1991, pp. 921-927.
Elsevier DOI
0401
BibRef
Lavagetto, F.,
Time-Delay Neural Networks for Estimating Lip Movements from
Speech Analysis:
A Useful Tool in Audio Video Synchronization,
CirSysVideo(7), No. 5, October 1997, pp. 786-800.
IEEE Top Reference.
9710
BibRef
Movellan, J.R.,
Mineiro, P.,
Robust Sensor Fusion:
Analysis and Application to Audio-Visual Speech Recognition,
MachLearn(32), No. 2, August 1998, pp. 85-100.
9810
BibRef
Wachsmuth, S.[Sven],
Socher, G.[Gudrun],
Brandt-Pook, H.[Hans],
Kummert, F.[Franz],
Sagerer, G.F.[Gerhard F.],
Integration of Vision and Speech Understanding Using Bayesian Networks,
Videre(1), No. 4, Winter 2000, pp. xx-yy.
0005
BibRef
Earlier: A1, A3, A2, A4, A5:
Multilevel Integration of Vision and Speech Understanding Using
Bayesian Networks,
CVS99(231 ff.).
Springer DOI
0209
BibRef
Chien, J.T.,
Lin, M.S.,
Frame-synchronous noise compensation for hands-free speech recognition
in car environments,
VISP(147), No. 6, December 2000, pp. 508-515.
0101
BibRef
Patel, D.,
Turner, L.F.,
Effects of ATM network impairments on audio-visual broadcast
applications,
VISP(147), No. 5, October 2000, pp. 436-444.
0101
BibRef
Aleksic, P.S.[Petar S.],
Williams, J.J.[Jay J.],
Wu, Z.L.[Zhi-Lin],
Katsaggelos, A.K.[Aggelos K.],
Audio-Visual Speech Recognition Using MPEG-4 Compliant Visual Features,
JASP(2002), No. 11, November 2002, pp. 1213.
WWW Link.
0304
BibRef
Earlier:
Audio-visual continuous speech recognition using MPEG-4 compliant
visual features,
ICIP02(I: 960-963).
IEEE DOI
0210
BibRef
Aleksic, P.S.[Petar S.],
Katsaggelos, A.K.[Aggelos K.],
Audio-Visual Biometrics,
PIEEE(94), No. 11, November 2006, pp. 2025-2044.
IEEE DOI
0611
BibRef
Aleksic, P.S.[Petar S.],
Katsaggelos, A.K.[Aggelos K.],
Speech-to-video synthesis using MPEG-4 compliant visual features,
CirSysVideo(14), No. 5, May 2004, pp. 682-692.
IEEE Abstract.
0407
BibRef
And:
Comparison of MPEG-4 Facial Animation Parameter Groups with Respect to
Audio-Visual Speech Recognition Performance,
ICIP05(III: 501-504).
IEEE DOI
0512
BibRef
Jiang, J.T.[Jin-Tao],
Alwan, A.[Abeer],
Keating, P.A.[Patricia A.],
Auer Jr., E.T.[Edward T.],
Bernstein, L.E.[Lynne E.],
On the Relationship between Face Movements, Tongue Movements, and
Speech Acoustics,
JASP(2002), No. 11, November 2002, pp. 1174.
WWW Link.
0304
BibRef
Sodoyer, D.[David],
Schwartz, J.L.[Jean-Luc],
Girin, L.[Laurent],
Klinkisch, J.[Jacob],
Jutten, C.[Christian],
Separation of Audio-Visual Speech Sources: A New Approach Exploiting
the Audio-Visual Coherence of Speech Stimuli,
JASP(2002), No. 11, November 2002, pp. 1165.
WWW Link.
0304
BibRef
Heckmann, M.[Martin],
Berthommier, F.[Frédéric],
Kroschel, K.[Kristian],
Noise Adaptive Stream Weighting in Audio-Visual Speech Recognition,
JASP(2002), No. 11, November 2002, pp. 1260.
WWW Link.
0304
BibRef
Nefian, A.V.[Ara V.],
Liang, L.H.[Lu-Hong],
Pi, X.B.[Xiao-Bo],
Liu, X.X.[Xiao-Xing],
Murphy, K.P.[Kevin P.],
Dynamic Bayesian Networks for Audio-Visual Speech Recognition,
JASP(2002), No. 11, November 2002, pp. 1274.
WWW Link.
0304
BibRef
Nefian, A.V.[Ara V.],
Liang, L.H.[Lu Hong],
Fu, T.Y.[Tie-Yan],
Liu, X.X.[Xiao Xing],
A Bayesian Approach to Audio-Visual Speaker Identification,
AVBPA03(761-769).
Springer DOI
0310
BibRef
Patterson, E.K.[Eric K.],
Gurbuz, S.[Sabri],
Tufekci, Z.[Zekeriya],
Gowdy, J.N.[John N.],
Moving-Talker, Speaker-Independent Feature Study, and Baseline Results
Using the CUAVE Multimodal Speech Corpus,
JASP(2002), No. 11, November 2002, pp. 1189.
WWW Link.
0304
BibRef
Gurbuz, S.[Sabri],
Patterson, E.K.[Eric K.],
Tufekci, Z.[Zekeriya],
Gowdy, J.N.[John N.],
Affine-Invariant Visual Features Contain Supplementary Information to
Enhance Speech Recognition,
AVBPA01(175).
Springer DOI
0310
BibRef
Kalberer, G.A.[Gregor A.],
Müller, P.[Pascal],
Van Gool, L.J.[Luc J.],
Visual speech, a trajectory in viseme space,
IJIST(13), No. 1, 2003, pp. 74-84.
DOI Link
0308
BibRef
Sharma, R.,
Yeasin, M.,
Krahnstoever, N.,
Rauschert, I.,
Cai, G.,
Brewer, I.,
MacEachren, A.M.,
Sengupta, K.,
Speech-gesture driven multimodal interfaces for crisis management,
PIEEE(91), No. 9, September 2003, pp. 1327-1354.
IEEE DOI
0309
BibRef
Potamianos, G.,
Neti, C.,
Gravier, G.,
Garg, A.,
Senior, A.W.,
Recent advances in the automatic recognition of audiovisual speech,
PIEEE(91), No. 9, September 2003, pp. 1306-1326.
IEEE DOI
0309
BibRef
Kaynak, M.N.,
Zhi, Q.,
Cheok, A.D.,
Sengupta, K.,
Jian, Z.,
Chung, K.C.,
Analysis of Lip Geometric Features for Audio-Visual Speech Recognition,
SMC-A(34), No. 4, July 2004, pp. 564-570.
IEEE Abstract.
0407
BibRef
Foo, S.W.[Say Wei],
Lian, Y.[Yong],
Dong, L.[Liang],
Recognition of visual speech elements using adaptively boosted hidden
Markov models,
CirSysVideo(14), No. 5, May 2004, pp. 693-705.
IEEE Abstract.
0407
BibRef
Albiol, A.[Alberto],
Torres, L.[Luis],
Delp, E.J.[Edward J.],
Fully automatic face recognition system using a combined audio-visual
approach,
VISP(152), No. 3, June 2005, pp. 318-326.
DOI Link
0510
BibRef
Earlier:
A Fast Anchor Person Searching Scheme in News Sequences,
AVBPA01(366).
Springer DOI
0310
BibRef
And:
An Unsupervised Color Image Segmentation Algorithm for Face Detection
Applications,
ICIP01(II: 681-684).
IEEE DOI
0108
BibRef
Earlier:
Optimum Color Spaces for Skin Detection,
ICIP01(I: 122-124).
IEEE DOI
0108
BibRef
Kleindienst, J.[Jan],
Macek, T.[Tomáš],
Serédi, L.[Ladislav],
Šedivý, J.[Jan],
Interaction framework for home environment using speech and vision,
IVC(25), No. 12, 3 December 2007, pp. 1836-1847.
Elsevier DOI
0710
BibRef
Earlier:
Djinn: Interaction Framework for Home Environment Using Speech and
Vision,
CVHCI04(153-164).
Springer DOI
0505
Multi-modal; Computer-vision; Context-aware; Speech recognition
BibRef
Palanivel, S.,
Yegnanarayana, B.,
Multimodal person authentication using speech, face and visual speech,
CVIU(109), No. 1, January 2008, pp. 44-55.
Elsevier DOI
0801
Multimodal person authentication; Face tracking; Eye location;
Visual speech; Multiscale morphological dilation and erosion;
Autoassociative neural network
BibRef
Chetty, G.[Girija],
Wagner, M.[Michael],
Robust face-voice based speaker identity verification using multilevel
fusion,
IVC(26), No. 9, 1 September 2008, pp. 1249-1260.
Elsevier DOI
0806
BibRef
Earlier:
Audio Visual Speaker Verification Based on Hybrid Fusion of Cross Modal
Features,
PReMI07(469-478).
Springer DOI
0712
BibRef
Earlier:
Face-Voice Authentication Based on 3D Face Models,
ACCV06(I:559-568).
Springer DOI
0601
Lip; 3D Face; Voice; Biometric; Identity verification; Robust;
Multilevel fusion
BibRef
Delakis, M.[Manolis],
Gravier, G.[Guillaume],
Gros, P.[Patrick],
Audiovisual integration with Segment Models for tennis video parsing,
CVIU(111), No. 2, August 2008, pp. 142-154.
Elsevier DOI
0808
Hidden Markov Models; Segment Models; Multimodal fusion;
Video indexing; Video summarization
BibRef
Gravier, G.[Guillaume],
Guinaudeau, C.[Camille],
Lecorvé, G.[Gwénolé],
Sébillot, P.[Pascale],
Exploiting Speech for Automatic TV Delinearization:
From Streams to Cross-Media Semantic Navigation,
JIVP(2011), No. 2011, pp. xx-yy.
DOI Link
1104
BibRef
Vajaria, H.[Himanshu],
Sankar, R.[Ravi],
Kasturi, R.[Ranga],
Exploring Co-Occurrence Between Speech and Body Movement for
Audio-Guided Video Localization,
CirSysVideo(18), No. 11, November 2008, pp. 1608-1617.
IEEE DOI
0811
BibRef
Vajaria, H.[Himanshu],
Islam, T.[Tanmoy],
Sarkar, S.[Sudeep],
Sankar, R.[Ravi],
Kasturi, R.[Ranga],
Audio Segmentation and Speaker Localization in Meeting Videos,
ICPR06(II: 1150-1153).
IEEE DOI
0609
BibRef
Hospedales, T.M.[Timothy M.],
Vijayakumar, S.[Sethu],
Structure Inference for Bayesian Multisensory Scene Understanding,
PAMI(30), No. 12, December 2008, pp. 2140-2157.
IEEE DOI
0811
Audio-visual inputs. speakers in meetings.
BibRef
Liu, Z.C.[Zi-Cheng],
Cohen, M.,
Bhatnagar, D.,
Cutler, R.,
Zhang, Z.Y.[Zheng-You],
Head-Size Equalization for Improved Visual Perception in Video
Conferencing,
MultMed(9), No. 7, November 2007, pp. 1520-1527.
IEEE DOI
0905
BibRef
Liu, Z.C.[Zi-Cheng],
Cutler, R.[Ross],
Cohen, M.[Michael],
Zhang, Z.Y.[Zheng-You],
System and method for head size equalization in 360
degree panoramic images,
US_Patent7,184,609, Feb 27, 2007
WWW Link.
BibRef
0702
Cutler, R.[Ross],
User interface for a system and method for head size
equalization in 360 degree panoramic images,
US_Patent7,149,367, Dec 12, 2006
WWW Link.
BibRef
0612
Cutler, R.[Ross],
Kapoor, A.[Ashish],
System and method for audio/video speaker detection,
US_Patent7,343,289, Mar 11, 2008
WWW Link.
BibRef
0803
Heracleous, P.,
Aboutabit, N.,
Beautemps, D.,
Lip Shape and Hand Position Fusion for Automatic Vowel Recognition in
Cued Speech for French,
SPLetters(16), No. 5, May 2009, pp. 339-342.
IEEE DOI
0903
BibRef
Zhang, C.[Cha],
Yin, P.[Pei],
Rui, Y.[Yong],
Cutler, R.,
Viola, P.,
Sun, X.D.[Xin-Ding],
Pinto, N.,
Zhang, Z.Y.[Zheng-You],
Boosting-Based Multimodal Speaker Detection for Distributed Meeting
Videos,
MultMed(10), No. 8, December 2008, pp. 1541-1552.
IEEE DOI
0905
BibRef
Lee, J.S.[Jong-Seok],
Park, C.H.[Cheol Hoon],
Robust Audio-Visual Speech Recognition Based on Late Integration,
MultMed(10), No. 5, August 2008, pp. 767-779.
IEEE DOI
0905
BibRef
Saenko, K.[Kate],
Livescu, K.[Karen],
Glass, J.[James],
Darrell, T.J.[Trevor J.],
Multistream Articulatory Feature-Based Models for Visual Speech
Recognition,
PAMI(31), No. 9, September 2009, pp. 1700-1707.
IEEE DOI
0907
Lip opening, lip rounding features.
BibRef
Saenko, K.[Kate],
Livescu, K.[Karen],
Siracusa, M.[Michael],
Wilson, K.[Kevin],
Glass, J.[James],
Darrell, T.J.[Trevor J.],
Visual Speech Recognition with Loosely Synchronized Feature Streams,
ICCV05(II: 1424-1431).
IEEE DOI
0510
BibRef
Schuller, B.[Björn],
Müller, R.[Ronald],
Eyben, F.[Florian],
Gast, J.[Jürgen],
Hörnler, B.[Benedikt],
Wöllmer, M.[Martin],
Rigoll, G.[Gerhard],
Höthker, A.[Anja],
Konosu, H.[Hitoshi],
Being bored? Recognising natural interest by extensive audiovisual
integration for real-life application,
IVC(27), No. 12, November 2009, pp. 1760-1774.
Elsevier DOI
0910
Interest recognition; Affective computing; Audiovisual processing
BibRef
Eyben, F.[Florian],
Wöllmer, M.[Martin],
Valstar, M.F.[Michel F.],
Gunes, H.[Hatice],
Schuller, B.[Björn],
Pantic, M.[Maja],
String-based audiovisual fusion of behavioural events for the
assessment of dimensional affect,
FG11(322-329).
IEEE DOI
1103
BibRef
Althoff, F.[Frank],
McGlaun, G.[Gregor],
Lang, M.K.[Manfred K.],
Rigoll, G.[Gerhard],
Evaluating Multimodal Interaction Patterns in Various Application
Scenarios,
GW03(421-435).
Springer DOI
0405
BibRef
Casanovas, A.L.[Anna Llagostera],
Monaci, G.[Gianluca],
Vandergheynst, P.[Pierre],
Gribonval, R.,
Blind Audiovisual Source Separation Based on Sparse Redundant
Representations,
MultMed(12), No. 5, 2010, pp. 358-371.
IEEE DOI
1008
BibRef
Earlier: A1, A2, A3, Only:
Blind Audiovisual Source Separation using Sparse Representations,
ICIP07(III: 301-304).
IEEE DOI
0709
BibRef
Esch, J.,
Audiovisual Information Fusion in Human-Computer Interfaces and
Intelligent Environments: A Survey,
PIEEE(98), No. 10, October 2010, pp. 1690-1691.
IEEE DOI
1003
Article intro.
BibRef
Shivappa, S.T.,
Trivedi, M.M.,
Rao, B.D.,
Audiovisual Information Fusion in Human-Computer Interfaces and
Intelligent Environments: A Survey,
PIEEE(98), No. 10, October 2010, pp. 1692-1715.
IEEE DOI
1003
Survey, Audio-Visual Fusion.
BibRef
Claussen, H.[Heiko],
Rosca, J.[Justinian],
Damper, R.I.[Robert I.],
Signature extraction using mutual interdependencies,
PR(44), No. 3, March 2011, pp. 650-661.
Elsevier DOI
1011
Algorithms; Signal processing; Pattern classification; Signal
analysis; Speaker recognition; Face recognition.
Mutual interdependence analysis for extracting face signatures or speech
signatures.
BibRef
Higgins, J.E.,
Damper, R.I.,
An HMM-Based Subband Processing Approach to Speaker Identification,
AVBPA01(169).
Springer DOI
0310
BibRef
El-Sallam, A.A.[Amar A.],
Mian, A.S.[Ajmal S.],
Correlation based speech-video synchronization,
PRL(32), No. 6, 15 April 2011, pp. 780-786.
Elsevier DOI
1103
BibRef
Earlier:
Speech-Video Synchronization Using Lips Movements and Speech Envelope
Correlation,
ICIAR09(397-407).
Springer DOI
0907
Correlation; Lip sync; Formants; Estimation; AM,FM
BibRef
Petridis, S.[Stavros],
Pantic, M.[Maja],
Audiovisual Discrimination Between Speech and Laughter:
Why and When Visual Information Might Help,
MultMed(13), No. 2, 2011, pp. 216-234.
IEEE DOI
1103
BibRef
Petridis, S.[Stavros],
Pantic, M.[Maja],
Prediction-Based Audiovisual Fusion for Classification of
Non-Linguistic Vocalisations,
AffCom(7), No. 1, January 2016, pp. 45-58.
IEEE DOI
1603
BibRef
Earlier:
Fusion of audio and visual cues for laughter detection,
CIVR08(329-338).
0807
Brain models
BibRef
Petridis, S.[Stavros],
Pantic, M.[Maja],
Cohn, J.F.[Jeffrey F.],
Prediction-based classification for audiovisual discrimination between
laughter and speech,
FG11(619-626).
IEEE DOI
1103
BibRef
Moustakas, K.[Konstantinos],
Tzovaras, D.[Dimitrios],
Dybkjaer, L.[Laila],
Bernsen, N.[Niels],
Aran, O.[Oya],
Using Modality Replacement to Facilitate Communication between Visually
and Hearing-Impaired People,
MultMedMag(18), No. 2, April-June 2011, pp. 26-37.
IEEE DOI
1105
BibRef
Tariquzzaman, M.,
Kim, J.Y.[Jin Young],
Na, S.Y.[Seung You],
Kim, H.G.[Hyoung-Gook],
Har, D.S.[Dong-Soo],
A Visual Signal Reliability for Robust Audio-Visual Speaker
Identification,
IEICE(E94-D), No. 10, October 2011, pp. 2052-2055.
WWW Link.
1110
BibRef
Lee, J.S.[Jong-Seok],
de Simone, F.[Francesca],
Ebrahimi, T.[Touradj],
Efficient video coding based on audio-visual focus of attention,
JVCIR(22), No. 8, November 2011, pp. 704-711.
Elsevier DOI
1110
Video coding; Audio-visual focus of attention; Quality of experience;
Audio-visual source localization; H.264/AVC; Flexible macroblock
ordering (FMO); Canonical correlation analysis; Subjective quality
assessment
BibRef
Tiawongsombat, P.,
Jeong, M.H.[Mun-Ho],
Yun, J.S.[Joo-Seop],
You, B.J.[Bum-Jae],
Oh, S.R.[Sang-Rok],
Robust visual speakingness detection using bi-level HMM,
PR(45), No. 2, February 2012, pp. 783-793.
Elsevier DOI
1110
Visual voice activity detection; Mouth image energy; Speakingness
detection; Bi-level HMM
BibRef
Noulas, A.[Athanasios],
Englebienne, G.[Gwenn],
Kröse, B.J.A.[Ben J.A.],
Multimodal Speaker Diarization,
PAMI(34), No. 1, January 2012, pp. 79-93.
IEEE DOI
1112
Fuse audio and video. Meetings, news video.
BibRef
Blauth, D.A.[Dante A.],
Minotto, V.P.[Vicente P.],
Jung, C.R.[Claudio R.],
Lee, B.[Bowon],
Kalker, T.[Ton],
Voice activity detection and speaker localization using audiovisual
cues,
PRL(33), No. 4, March 2012, pp. 373-380.
Elsevier DOI
1201
User interfaces; Voice activity detection; Speaker localization;
Multimodal analysis; Hidden Markov Models
BibRef
Montazzolli, S.,
Jung, C.R.,
Gelb, D.[Dan],
Audiovisual voice activity detection using off-the-shelf cameras,
ICIP15(3886-3890)
IEEE DOI
1512
Lip Movement
BibRef
Minotto, V.P.[V. Peruffo],
Jung, C.R.[C. Rosito],
Lee, B.[Bowon],
Simultaneous-Speaker Voice Activity Detection and Localization Using
Mid-Fusion of SVM and HMMs,
MultMed(16), No. 4, June 2014, pp. 1032-1044.
IEEE DOI
1407
Accuracy
BibRef
Minotto, V.P.[V. Peruffo],
Jung, C.R.[C. Rosito],
Lee, B.[Bowon],
Multimodal Multi-Channel On-Line Speaker Diarization Using Sensor
Fusion Through SVM,
MultMed(17), No. 10, October 2015, pp. 1694-1705.
IEEE DOI
1511
audio streaming
BibRef
Nicolaou, M.A.[Mihalis A.],
Gunes, H.[Hatice],
Pantic, M.[Maja],
Output-associative RVM regression for dimensional and continuous
emotion prediction,
IVC(30), No. 3, March 2012, pp. 186-196.
Elsevier DOI
1204
BibRef
And:
FG11(16-23).
IEEE DOI
1103
BibRef
And:
Designing frameworks for automatic affect prediction and classification
in dimensional space,
Gesture11(20-26).
IEEE DOI
1106
Dimensional and continuous emotion prediction; Facial expressions;
Shoulder movements; Audio cues; Output-associative RVM regression
BibRef
Nicolaou, M.A.[Mihalis A.],
Gunes, H.[Hatice],
Pantic, M.[Maja],
Continuous Prediction of Spontaneous Affect from Multiple Cues and
Modalities in Valence-Arousal Space,
AffCom(2), No. 2, 2011, pp. 92-105.
IEEE DOI
1202
BibRef
Earlier:
Audio-Visual Classification and Fusion of Spontaneous Affective Data in
Likelihood Space,
ICPR10(3695-3699).
IEEE DOI
1008
BibRef
Nicolaou, M.A.[Mihalis A.],
Pavlovic, V.[Vladimir],
Pantic, M.[Maja],
Dynamic Probabilistic CCA for Analysis of Affective Behavior and
Fusion of Continuous Annotations,
PAMI(36), No. 7, July 2014, pp. 1299-1311.
IEEE DOI
1407
BibRef
Earlier:
Dynamic Probabilistic CCA for Analysis of Affective Behaviour,
ECCV12(VII: 98-111).
Springer DOI
1210
Bismuth
BibRef
Wang, L.J.[Li-Juan],
Qian, Y.[Yao],
Scott, M.R.,
Chen, G.[Gang],
Soong, F.K.,
Computer-Assisted Audiovisual Language Learning,
Computer(45), No. 6, June 2012, pp. 38-47.
IEEE DOI
1208
BibRef
Wu, Q.X.[Qiu-Xia],
Wang, Z.Y.[Zhi-Yong],
Deng, F.Q.[Fei-Qi],
Chi, Z.,
Feng, D.D.[David Dagan],
Realistic Human Action Recognition with
Multimodal Feature Selection and Fusion,
SMCS(43), No. 4, 2013, pp. 875-885.
IEEE DOI
1307
multimodal fusion; realistic human action recognition
BibRef
Wu, Q.X.[Qiu-Xia],
Wang, Z.Y.[Zhi-Yong],
Deng, F.Q.[Fei-Qi],
Xia, Y.[Yong],
Kang, W.X.[Wen-Xiong],
Feng, D.D.[David Dagan],
Discriminative two-level feature selection for realistic human action
recognition,
JVCIR(24), No. 7, 2013, pp. 1064-1074.
Elsevier DOI
1309
Realistic human action recognition
BibRef
Wu, Q.X.[Qiu-Xia],
Wang, Z.Y.[Zhi-Yong],
Deng, F.Q.[Fei-Qi],
Feng, D.D.[David Dagan],
Realistic Human Action Recognition with Audio Context,
DICTA10(288-293).
IEEE DOI
1012
BibRef
Wu, Q.X.[Qiu-Xia],
Lu, S.Y.[Shi-Yang],
Wang, Z.Y.[Zhi-Yong],
Deng, F.Q.[Fei-Qi],
Kang, W.X.[Wen-Xiong],
Feng, D.D.[David Dagan],
Structure Context of Local Features in Realistic Human Action
Recognition,
VECTaR11(1496-1501).
IEEE DOI
1201
BibRef
Mirzaei, M.R.[Mohammad Reza],
Ghorshi, S.[Seyed],
Mortazavi, M.[Mohammad],
Audio-visual speech recognition techniques in augmented reality
environments,
VC(30), No. 3, March 2014, pp. 245-257.
WWW Link.
1403
BibRef
Bredin, H.[Hervé],
Roy, A.[Anindya],
Le, V.B.[Viet-Bac],
Barras, C.[Claude],
Person instance graphs for mono-, cross- and multi-modal person
recognition in multimedia data: application to speaker identification
in TV broadcast,
MultInfoRetr(3), No. 3, September 2014, pp. 161-175.
Springer DOI
1408
BibRef
Ozasa, Y.[Yuko],
Nakano, M.[Mikio],
Ariki, Y.[Yasuo],
Iwahashi, N.[Naoto],
Discriminating Unknown Objects from Known Objects Using Image and
Speech Information,
IEICE(E98-D), No. 3, March 2015, pp. 704-711.
WWW Link.
1504
BibRef
Earlier: A1, A3, A2, A4:
Disambiguation in Unknown Object Detection by Integrating Image and
Speech Recognition Confidences,
ACCV12(I:85-96).
Springer DOI
1304
BibRef
Nishimura, H.[Hitoshi],
Ozasa, Y.[Yuko],
Ariki, Y.[Yasuo],
Nakano, M.[Mikio],
Selection of Unknown Objects Specified by Speech Using Models
Constructed from Web Images,
ICPR14(477-482)
IEEE DOI
1412
BibRef
Earlier:
Object Recognition by Integrated Information Using Web Images,
ACPR13(657-661)
IEEE DOI
1408
Accuracy.
acoustic signal processing
BibRef
Ozasa, Y.[Yuko],
Enami, N.,
Ariki, Y.[Yasuo],
Color saliency for object identification,
FCV15(1-5)
IEEE DOI
1506
image colour analysis
BibRef
Harte, N.,
Gillen, E.,
TCD-TIMIT: An Audio-Visual Corpus of Continuous Speech,
MultMed(17), No. 5, May 2015, pp. 603-615.
IEEE DOI
1505
Cameras
BibRef
Katsaggelos, A.K.,
Bahaadini, S.,
Molina, R.,
Audiovisual Fusion: Challenges and New Approaches,
PIEEE(103), No. 9, September 2015, pp. 1635-1653.
IEEE DOI
1509
Data integration
BibRef
Mezai, L.,
Hachouf, F.,
Score-Level Fusion of Face and Voice Using Particle Swarm
Optimization and Belief Functions,
HMS(45), No. 6, December 2015, pp. 761-772.
IEEE DOI
1512
Bayes methods
BibRef
Wu, P.,
Liu, H.,
Li, X.,
Fan, T.,
Zhang, X.,
A Novel Lip Descriptor for Audio-Visual Keyword Spotting Based on
Adaptive Decision Fusion,
MultMed(18), No. 3, March 2016, pp. 326-338.
IEEE DOI
1603
Acoustics
BibRef
Dilpazir, H.[Hammad],
Muhammad, Z.[Zia],
Minhas, Q.[Qurratulain],
Ahmed, F.[Faheem],
Malik, H.[Hafiz],
Mahmood, H.[Hasan],
Multivariate mutual information for audio video fusion,
SIViP(10), No. 7, October 2016, pp. 1265-1272.
Springer DOI
1609
BibRef
Beyan, C.,
Capozzi, F.,
Becchio, C.,
Murino, V.,
Prediction of the Leadership Style of an Emergent Leader Using Audio
and Visual Nonverbal Features,
MultMed(20), No. 2, February 2018, pp. 441-456.
IEEE DOI
1801
Correlation, Feature extraction, Kernel, Learning systems,
Organizations, Psychology, Visualization, Emergent leader,
social signal processing
BibRef
Fernandez-Lopez, A.[Adriana],
Sukno, F.M.[Federico M.],
Survey on automatic lip-reading in the era of deep learning,
IVC(78), 2018, pp. 53-72.
Elsevier DOI
1809
Survey, Lip Reading. Automatic lip-reading, Audio-visual corpora,
Visual speech decoding, Deep learning systems, Multi-view lip-reading
BibRef
Stafylakis, T.[Themos],
Khan, M.H.[Muhammad Haris],
Tzimiropoulos, G.[Georgios],
Pushing the boundaries of audiovisual word recognition using Residual
Networks and LSTMs,
CVIU(176-177), 2018, pp. 22-32.
Elsevier DOI
1812
BibRef
Earlier: A1, A3, Only:
Zero-Shot Keyword Spotting for Visual Speech Recognition In-the-wild,
ECCV18(II: 536-552).
Springer DOI
1810
Audiovisual speech recognition, Lipreading, Deep learning
BibRef
Liu, X.[Xin],
Geng, J.J.[Jia-Jia],
Ling, H.B.[Hai-Bin],
Cheung, Y.M.[Yiu-Ming],
Attention guided deep audio-face fusion for efficient speaker naming,
PR(88), 2019, pp. 557-568.
Elsevier DOI
1901
Speaker naming, Deep audio-face fusion, Common attention model,
Factorized bilinear model
BibRef
Tsiami, A.[Antigoni],
Koutras, P.[Petros],
Katsamanis, A.[Athanasios],
Vatakis, A.[Argiro],
Maragos, P.[Petros],
A behaviorally inspired fusion approach for computational audiovisual
saliency modeling,
SP:IC(76), 2019, pp. 186-200.
Elsevier DOI
1906
Audiovisual saliency, Attention, Fusion, Eye-tracking
BibRef
Hsiao, S.,
Sun, H.,
Hsieh, M.,
Tsai, M.,
Tsao, Y.,
Lee, C.,
Toward Automating Oral Presentation Scoring During Principal
Certification Program Using Audio-Video Low-Level Behavior Profiles,
AffCom(10), No. 4, October 2019, pp. 552-567.
IEEE DOI
1912
Signal processing, Public speaking, Signal processing algorithms,
Emotion recognition, Speech recognition,
educational research
BibRef
Ma, Y.[Yue],
Hong, H.[Hong],
Li, H.[Hui],
Zhao, H.[Heng],
Li, Y.S.[Yu-Sheng],
Sun, L.[Li],
Gu, C.[Chen],
Zhu, X.H.[Xiao-Hua],
Non-Contact Speech Recovery Technology Using a 24 GHz Portable
Auditory Radar and Webcam,
RS(12), No. 4, 2020, pp. xx-yy.
DOI Link
2003
BibRef
Xu, B.,
Wang, J.,
Lu, C.,
Guo, Y.,
Watch to Listen Clearly: Visual Speech Enhancement Driven
Multi-modality Speech Recognition,
WACV20(1626-1635)
IEEE DOI
2006
Visualization, Speech recognition, Feature extraction,
Speech enhancement, Noise measurement, Lips, Convolution
BibRef
Pu, J.,
Panagakis, Y.,
Pantic, M.,
Active Speaker Detection and Localization in Videos Using Low-Rank
and Kernelized Sparsity,
SPLetters(27), 2020, pp. 865-869.
IEEE DOI
2006
Sparse matrices, Kernel, Visualization, Matrix decomposition, Videos,
Correlation, Spectrogram, Active speaker localization,
kernels
BibRef
Tao, F.,
Busso, C.,
End-to-End Audiovisual Speech Recognition System With Multitask
Learning,
MultMed(23), 2021, pp. 1-11.
IEEE DOI
2012
Task analysis, Visualization, Feature extraction,
Speech processing, Acoustics, Robustness, Timing,
end-to-end speech systems
BibRef
Liu, L.,
Feng, G.,
Beautemps, D.,
Zhang, X.P.,
Re-Synchronization Using the Hand Preceding Model for Multi-Modal
Fusion in Automatic Continuous Cued Speech Recognition,
MultMed(23), 2021, pp. 292-305.
IEEE DOI
2012
Lips, Shape, Feature extraction, Hidden Markov models,
Speech recognition, Organizations, Encoding, Cued speech, MSHMM
BibRef
Beyan, C.[Cigdem],
Shahid, M.[Muhammad],
Murino, V.[Vittorio],
RealVAD: A Real-World Dataset and A Method for Voice Activity
Detection by Body Motion Analysis,
MultMed(23), 2021, pp. 2071-2085.
IEEE DOI
2107
Feature extraction, Visualization, Lips, Voice activity detection,
Task analysis, Benchmark testing, Synchronization,
unsupervised domain adaptation
BibRef
Qian, X.Y.[Xin-Yuan],
Liu, Q.[Qi],
Wang, J.D.[Jia-Dong],
Li, H.Z.[Hai-Zhou],
Three-Dimensional Speaker Localization: Audio-Refined Visual Scaling
Factor Estimation,
SPLetters(28), 2021, pp. 1405-1409.
IEEE DOI
2108
Location awareness, Visualization,
Cameras, Microphone arrays, Estimation, Adaptive arrays,
dynamic sensor weighting
BibRef
Zheng, A.[Aihua],
Hu, M.[Menglan],
Jiang, B.[Bo],
Huang, Y.[Yan],
Yan, Y.[Yan],
Luo, B.[Bin],
Adversarial-Metric Learning for Audio-Visual Cross-Modal Matching,
MultMed(24), 2022, pp. 338-351.
IEEE DOI
2202
Visualization, Task analysis, Measurement, Speech recognition,
Videos, Location awareness, Image recognition, metric learning
BibRef
Xu, J.H.[Jia-Hao],
Zhang, B.[Boyan],
Wang, Z.Y.[Zhi-Yong],
Wang, Y.[Yang],
Chen, F.[Fang],
Gao, J.B.[Jun-Bin],
Feng, D.D.[David Dagan],
Affective Audio Annotation of Public Speeches with Convolutional
Clustering Neural Network,
AffCom(13), No. 1, January 2022, pp. 238-249.
IEEE DOI
2203
Annotations, Tagging, Task analysis, Deep learning, Neural networks,
Public speaking, Videos, Affective annotation, public speech,
clustering
BibRef
Afouras, T.[Triantafyllos],
Chung, J.S.[Joon Son],
Senior, A.[Andrew],
Vinyals, O.[Oriol],
Zisserman, A.[Andrew],
Deep Audio-Visual Speech Recognition,
PAMI(44), No. 12, December 2022, pp. 8717-8727.
IEEE DOI
2212
Hidden Markov models, Lips, Speech recognition, Visualization,
Videos, Feeds, Training, Lip reading,
deep learning
BibRef
Rahimi, A.[Akam],
Afouras, T.[Triantafyllos],
Zisserman, A.[Andrew],
Reading to Listen at the Cocktail Party:
Multi-Modal Speech Separation,
CVPR22(10483-10492)
IEEE DOI
2210
Visualization, Fuses, Lips, Computer architecture,
Speech enhancement, Transformers, Robustness, Vision + X, Vision + language
BibRef
Narain, J.[Jaya],
Johnson, K.T.[Kristina T.],
Quatieri, T.F.[Thomas F.],
Picard, R.W.[Rosalind W.],
Maes, P.[Pattie],
Modeling Real-World Affective and Communicative Nonverbal
Vocalizations From Minimally Speaking Individuals,
AffCom(13), No. 4, October 2022, pp. 2238-2253.
IEEE DOI
2212
Statistics, Sociology, Pediatrics, Autism, Data collection,
Mel frequency cepstral coefficient, Laboratories,
speech analysis
BibRef
Gong, Y.[Yuan],
Liu, A.H.[Alexander H.],
Rouditchenko, A.[Andrew],
Glass, J.[James],
UAVM: Towards Unifying Audio and Visual Models,
SPLetters(29), 2022, pp. 2437-2441.
IEEE DOI
WWW Link.
2212
Visualization, Codes, Behavioral sciences, Audio-visual learning,
unified model
BibRef
Oya, T.[Takashi],
Iwase, S.[Shohei],
Morishima, S.[Shigeo],
The Sound of Bounding-Boxes,
ICPR22(9-15)
IEEE DOI
2212
Sound source separation.
Visualization, Source separation, Annotations, Detectors,
Object recognition, Task analysis
BibRef
Zhou, J.X.[Jin-Xing],
Guo, D.[Dan],
Wang, M.[Meng],
Contrastive Positive Sample Propagation Along the Audio-Visual Event
Line,
PAMI(45), No. 6, June 2023, pp. 7239-7257.
IEEE DOI
2305
Visualization, Task analysis, Image segmentation, Synchronization,
Roads, Aggregates, Representation learning, Audio-visual event,
positive sample propagation
BibRef
Zhou, J.X.[Jin-Xing],
Zheng, L.[Liang],
Zhong, Y.[Yiran],
Hao, S.J.[Shi-Jie],
Wang, M.[Meng],
Positive Sample Propagation along the Audio-Visual Event Line,
CVPR21(8432-8440)
IEEE DOI
2111
Location awareness, Visualization, Correlation,
Filtering, Feature extraction, Pattern recognition
BibRef
Sen, T.K.[Taylan K.],
Naven, G.[Gazi],
Gerstner, L.[Luke],
Bagley, D.[Daryl],
Baten, R.A.[Raiyan Abdul],
Rahman, W.[Wasifur],
Hasan, M.K.[Md Kamrul],
Haut, K.[Kurtis],
Mamun, A.A.[Abdullah Al],
Samrose, S.[Samiha],
Solbu, A.[Anne],
Barnes, R.E.[R. Eric],
Frank, M.G.[Mark G.],
Hoque, E.[Ehsan],
DBATES: Dataset for Discerning Benefits of Audio, Textual, and Facial
Expression Features in Competitive Debate Speeches,
AffCom(14), No. 2, April 2023, pp. 1028-1043.
IEEE DOI
2306
Government, Feature extraction, Visualization, Irrigation,
Video recording, Cameras, Annotations
BibRef
Sharma, G.[Garima],
Dhall, A.[Abhinav],
Cai, J.F.[Jian-Fei],
Audio-Visual Automatic Group Affect Analysis,
AffCom(14), No. 2, April 2023, pp. 1056-1069.
IEEE DOI
2306
Videos, Face recognition, Emotion recognition, Affective computing,
Speech recognition, Feature extraction, Cameras,
affective computing
BibRef
Cheng, W.L.[Wen-Long],
Tang, W.[Wei],
Huang, Y.[Yan],
Luo, Y.W.[Yi-Wen],
Wang, L.[Liang],
A Reconstruction-Based Visual-Acoustic-Semantic Embedding Method for
Speech-Image Retrieval,
MultMed(25), 2023, pp. 4067-4080.
IEEE DOI
2310
BibRef
Kefalas, T.[Triantafyllos],
Fotiadou, E.[Eftychia],
Georgopoulos, M.[Markos],
Panagakis, Y.[Yannis],
Ma, P.C.[Ping-Chuan],
Petridis, S.[Stavros],
Stafylakis, T.[Themos],
Pantic, M.[Maja],
KAN-AV dataset for audio-visual face and speech analysis in the wild,
IVC(140), 2023, pp. 104839.
Elsevier DOI
2312
KAN-AV, Speaker verification, Kinship verification,
Age-invariant, Cross-modal matching, Audio-visual
BibRef
Wang, X.M.[Xing-Mei],
Mi, J.C.[Jia-Chen],
Li, B.Q.[Bo-Quan],
Zhao, Y.X.[Yi-Xu],
Meng, J.X.[Jia-Xiang],
CATNet: Cross-modal fusion for audio-visual speech recognition,
PRL(178), 2024, pp. 216-222.
Elsevier DOI
2402
Audio-visual speech recognition, Cross-modal fusion,
Attention mechanism, Deep learning
BibRef
Zhu, D.D.[Dan-Dan],
Zhang, K.W.[Kai-Wei],
Zhang, N.[Nana],
Zhou, Q.Q.[Qiang-Qiang],
Min, X.K.[Xiong-Kuo],
Zhai, G.T.[Guang-Tao],
Yang, X.K.[Xiao-Kang],
Unified Audio-Visual Saliency Model for Omnidirectional Videos With
Spatial Audio,
MultMed(26), 2024, pp. 764-775.
IEEE DOI
2402
Visualization, Predictive models, Videos, Feature extraction,
Spatial audio, Deep learning, Adaptation models,
omnidirectional videos
BibRef
Qian, X.Y.[Xin-Yuan],
Xue, W.[Wei],
Zhang, Q.[Qiquan],
Tao, R.J.[Rui-Jie],
Li, H.Z.[Hai-Zhou],
Deep Cross-Modal Retrieval Between Spatial Image and Acoustic Speech,
MultMed(26), 2024, pp. 4480-4489.
IEEE DOI
2403
Task analysis, Visualization, Correlation, Bidirectional control,
Speech recognition, Recording, Feature extraction, reverberation
BibRef
Xie, J.W.[Jia-Wei],
Liu, Z.[Zhi],
Li, G.Y.[Gong-Yang],
Song, Y.J.[Ying-Jie],
Audio-visual saliency prediction with multisensory perception and
integration,
IVC(143), 2024, pp. 104955.
Elsevier DOI Code:
WWW Link.
2403
Audio-visual saliency prediction, Audio-visual fusion,
Image saliency prediction, Self-supervised learning
BibRef
Sun, X.[Xin],
Wang, X.[Xuan],
Liu, Q.[Qiong],
Zhou, X.[Xi],
Multi-Level Signal Fusion for Enhanced Weakly-Supervised Audio-Visual
Video Parsing,
SPLetters(31), 2024, pp. 1149-1153.
IEEE DOI
2405
Visualization, Proposals, Training, Task analysis,
Feature extraction, Noise, Self-supervised learning,
multi-level signal fusion
BibRef
Han, H.C.[Hao-Chen],
Zheng, Q.H.[Qing-Hua],
Luo, M.[Minnan],
Miao, K.[Kaiyao],
Tian, F.[Feng],
Chen, Y.[Yan],
Noise-Tolerant Learning for Audio-Visual Action Recognition,
MultMed(26), 2024, pp. 7761-7774.
IEEE DOI
2405
Noise measurement, Training, Visualization, Task analysis, Kinetic theory,
Correlation, Robustness, Action recognition, noisy correspondence
BibRef
Xiao, Y.W.[Ye-Wei],
Liu, X.M.[Xuan-Ming],
Zhu, A.[Aosu],
Huang, J.[Jian],
Relational-branchformer: Novel framework for audio-visual speech
recognition,
IVC(149), 2024, pp. 105182.
Elsevier DOI
2408
Audio-visual speech recognition, Branchformer, Relational, CTC,
Gated interlayer collaboration
BibRef
Li, W.R.[Wen-Rui],
Wang, P.[Penghong],
Xiong, R.Q.[Rui-Qin],
Fan, X.P.[Xiao-Peng],
Spiking Tucker Fusion Transformer for Audio-Visual Zero-Shot Learning,
IP(33), 2024, pp. 4840-4852.
IEEE DOI
2409
Transformers, Semantics, Visualization, Zero-shot learning, Neurons,
Training, Task analysis, Audio-visual zero-shot learning,
low-rank approximation
BibRef
Li, K.[Kai],
Xie, F.[Fenghua],
Chen, H.[Hang],
Yuan, K.[Kexin],
Hu, X.L.[Xiao-Lin],
An Audio-Visual Speech Separation Model Inspired by
Cortico-Thalamo-Cortical Circuits,
PAMI(46), No. 10, October 2024, pp. 6637-6651.
IEEE DOI
2409
Visualization, Thalamus, Feature extraction, Videos, Task analysis,
Speech processing, Time-domain analysis, Audio-visual learning,
speech separation
BibRef
Praveen, R.G.[R. Gnana],
Alam, J.[Jahangir],
Audio-Visual Person Verification Based on Recursive Fusion of Joint
Cross-Attention,
FG24(1-5)
IEEE DOI Code:
WWW Link.
2408
Training, Visualization, Codes, Gesture recognition, Faces
BibRef
Praveen, R.G.[R. Gnana],
Alam, J.[Jahangir],
Dynamic Cross Attention for Audio-Visual Person Verification,
FG24(1-5)
IEEE DOI Code:
WWW Link.
2408
Training, Visualization, Codes, Face recognition,
Gesture recognition, Robustness, Faces
BibRef
He, Y.H.[Yu-Hang],
Shin, S.[Sangyun],
Cherian, A.[Anoop],
Trigoni, N.[Niki],
Markham, A.[Andrew],
Sound3DVDet: 3D Sound Source Detection using Multiview Microphone
Array and RGB Images,
WACV24(5484-5495)
IEEE DOI Code:
WWW Link.
2404
Location awareness, Solid modeling, Predictive models,
Position measurement, Transformers, Motors, Algorithms
BibRef
Ghaleb, E.[Esam],
Burenko, I.[Ilya],
Rasenberg, M.[Marlou],
Pouw, W.[Wim],
Uhrig, P.[Peter],
Holler, J.[Judith],
Toni, I.[Ivan],
Özyürek, A.[Asli],
Fernández, R.[Raquel],
Co-Speech Gesture Detection through Multi-Phase Sequence Labeling,
WACV24(3995-4003)
IEEE DOI
2404
Focusing, Predictive models, Transformers,
Conditional random fields, Labeling, Proposals, Algorithms,
Datasets and evaluations
BibRef
Liu, J.X.[Jin-Xiang],
Wang, Y.[Yu],
Ju, C.[Chen],
Ma, C.F.[Chao-Fan],
Zhang, Y.[Ya],
Xie, W.[Weidi],
Annotation-free Audio-Visual Segmentation,
WACV24(5592-5602)
IEEE DOI Code:
WWW Link.
2404
Training, Adaptation models, Image segmentation, Visualization,
Computational modeling, Pipelines, Data models, Algorithms,
Image recognition and understanding
BibRef
Xu, Y.T.[Ya-Ting],
Hu, C.H.[Cong-Hui],
Lee, G.H.[Gim Hee],
Rethink Cross-Modal Fusion in Weakly-Supervised Audio-Visual Video
Parsing,
WACV24(5603-5612)
IEEE DOI
2404
Visualization, Interference, Transformers, Cameras,
Noise measurement, Microphones, Algorithms,
Video recognition and understanding
BibRef
Rachavarapu, K.K.[Kranthi Kumar],
Rajagopalan, A.N.,
Boosting Positive Segments for Weakly-Supervised Audio-Visual Video
Parsing,
ICCV23(10158-10168)
IEEE DOI
2401
BibRef
Chen, J.[Jinyu],
Wang, W.G.[Wen-Guan],
Liu, S.[Si],
Li, H.S.[Hong-Sheng],
Yang, Y.[Yi],
Omnidirectional Information Gathering for Knowledge Transfer-based
Audio-Visual Navigation,
ICCV23(10959-10969)
IEEE DOI
2401
BibRef
Cheng, X.[Xize],
Jin, T.[Tao],
Huang, R.J.[Rong-Jie],
Li, L.J.[Lin-Jun],
Lin, W.[Wang],
Wang, Z.[Zehan],
Wang, Y.[Ye],
Liu, H.[Huadai],
Yin, A.[Aoxiong],
Zhao, Z.[Zhou],
MixSpeech: Cross-Modality Self-Learning with Audio-Visual Stream
Mixup for Visual Speech Translation and Recognition,
ICCV23(15689-15699)
IEEE DOI
2401
BibRef
Georgescu, M.I.[Mariana-Iuliana],
Fonseca, E.[Eduardo],
Ionescu, R.T.[Radu Tudor],
Lucic, M.[Mario],
Schmid, C.[Cordelia],
Arnab, A.[Anurag],
Audiovisual Masked Autoencoders,
ICCV23(16098-16108)
IEEE DOI
2401
BibRef
Chen, M.F.[Ming-Fei],
Su, K.[Kun],
Shlizerman, E.[Eli],
Be Everywhere - Hear Everything (BEE): Audio Scene Reconstruction by
Sparse Audio-Visual Samples,
ICCV23(7819-7828)
IEEE DOI
2401
BibRef
Xie, H.X.[Hong-Xia],
Lee, M.X.[Ming-Xian],
Chen, T.J.[Tzu-Jui],
Chen, H.J.[Hung-Jen],
Liu, H.I.[Hou-I],
Shuai, H.H.[Hong-Han],
Cheng, W.H.[Wen-Huang],
Most Important Person-guided Dual-branch Cross-Patch Attention for
Group Affect Recognition,
ICCV23(20541-20551)
IEEE DOI
2401
BibRef
Djilali, Y.A.D.[Yasser Abdelaziz Dahou],
Narayan, S.[Sanath],
Boussaid, H.[Haithem],
Almazrouei, E.[Ebtessam],
Debbah, M.[Merouane],
Lip2Vec: Efficient and Robust Visual Speech Recognition via
Latent-to-Latent Visual to Audio Representation Mapping,
ICCV23(13744-13755)
IEEE DOI
2401
BibRef
Chen, G.Y.[Guang-Yu],
Zhang, D.[Deyuan],
Liu, T.[Tao],
Du, X.Y.[Xiao-Yong],
Local-Global Contrast for Learning Voice-Face Representations,
ICIP23(51-55)
IEEE DOI
2312
BibRef
Hong, J.[Joanna],
Kim, M.[Minsu],
Choi, J.[Jeongsoo],
Ro, Y.M.[Yong Man],
Watch or Listen: Robust Audio-Visual Speech Recognition with Visual
Corruption Modeling and Reliability Scoring,
CVPR23(18783-18794)
IEEE DOI
2309
BibRef
Gao, J.Y.[Jun-Yu],
Chen, M.Y.[Meng-Yuan],
Xu, C.S.[Chang-Sheng],
Collecting Cross-Modal Presence-Absence Evidence for
Weakly-Supervised Audio-Visual Event Perception,
CVPR23(18827-18836)
IEEE DOI
2309
BibRef
Porgali, B.[Bilal],
Albiero, V.[Vítor],
Ryda, J.[Jordan],
Ferrer, C.C.[Cristian Canton],
Hazirbas, C.[Caner],
The Casual Conversations v2 Dataset: A diverse, large benchmark for
measuring fairness and robustness in audio/vision/speech models,
FaDE-TCV23(10-17)
IEEE DOI
2309
BibRef
Xiong, J.W.[Jun-Wen],
Wang, G.[Ganglai],
Zhang, P.[Peng],
Huang, W.[Wei],
Zha, Y.F.[Yu-Fei],
Zhai, G.T.[Guang-Tao],
CASP-Net: Rethinking Video Saliency Prediction from an Audio-Visual
Consistency Perceptual Perspective,
CVPR23(6441-6450)
IEEE DOI
2309
BibRef
Huang, C.[Chao],
Tian, Y.[Yapeng],
Kumar, A.[Anurag],
Xu, C.L.[Chen-Liang],
Egocentric Audio-Visual Object Localization,
CVPR23(22910-22921)
IEEE DOI
2309
BibRef
Liao, J.H.[Jun-Hua],
Duan, H.H.[Hai-Han],
Feng, K.H.[Kang-Hui],
Zhao, W.B.[Wan-Bing],
Yang, Y.B.[Yan-Bing],
Chen, L.Y.[Liang-Yin],
A Light Weight Model for Active Speaker Detection,
CVPR23(22932-22941)
IEEE DOI
2309
BibRef
Seo, P.H.[Paul Hongsuck],
Nagrani, A.[Arsha],
Schmid, C.[Cordelia],
AVFormer: Injecting Vision into Frozen Speech Models for Zero-Shot
AV-ASR,
CVPR23(22922-22931)
IEEE DOI
2309
BibRef
Feng, D.[Dalu],
Yang, S.[Shuang],
Shan, S.G.[Shi-Guang],
Chen, X.L.[Xi-Lin],
Audio-Driven Deformation Flow for Effective Lip Reading,
ICPR22(274-280)
IEEE DOI
2212
Deformable models, Bridges, Visualization, Lips,
Computational modeling, Speech recognition, Acoustics
BibRef
Varshney, M.[Munender],
Yadav, R.[Ravindra],
Namboodiri, V.P.[Vinay P.],
Hegde, R.M.[Rajesh M.],
Learning Speaker-specific Lip-to-Speech Generation,
ICPR22(491-498)
IEEE DOI
2212
Measurement, Vocabulary, Visualization, Chemistry, Lips,
Speech recognition, Transformers
BibRef
Shi, C.[Cheng],
Yang, S.[Sibei],
Spatial and Visual Perspective-Taking via View Rotation and Relation
Reasoning for Embodied Reference Understanding,
ECCV22(XXXVI:201-218).
Springer DOI
2211
WWW Link. Locate object referred to by language and gesture.
BibRef
Hayes, T.[Thomas],
Zhang, S.Y.[Song-Yang],
Yin, X.[Xi],
Pang, G.[Guan],
Sheng, S.[Sasha],
Yang, H.[Harry],
Ge, S.W.[Song-Wei],
Hu, Q.Y.[Qi-Yuan],
Parikh, D.[Devi],
MUGEN: A Playground for Video-Audio-Text Multimodal Understanding and
GENeration,
ECCV22(VIII:431-449).
Springer DOI
2211
BibRef
van Horn, G.[Grant],
Qian, R.[Rui],
Wilber, K.[Kimberly],
Adam, H.[Hartwig],
Aodha, O.M.[Oisin Mac],
Belongie, S.[Serge],
Exploring Fine-Grained Audiovisual Categorization with the SSW60
Dataset,
ECCV22(VIII:271-289).
Springer DOI
2211
BibRef
Yu, S.[Samuel],
Wu, P.[Peter],
Liang, P.P.[Paul Pu],
Salakhutdinov, R.[Ruslan],
Morency, L.P.[Louis-Philippe],
PACS: A Dataset for Physical Audiovisual CommonSense Reasoning,
ECCV22(XXXVII:292-309).
Springer DOI
2211
BibRef
Cheng, H.Y.[Hao-Yue],
Liu, Z.Y.[Zhao-Yang],
Zhou, H.[Hang],
Qian, C.[Chen],
Wu, W.[Wayne],
Wang, L.M.[Li-Min],
Joint-Modal Label Denoising for Weakly-Supervised Audio-Visual Video
Parsing,
ECCV22(XXXIV:431-448).
Springer DOI
2211
BibRef
Zhang, Z.Q.[Zi-Qiang],
Zhang, J.[Jie],
Zhang, J.S.[Jian-Shu],
Wu, M.H.[Ming-Hui],
Fang, X.[Xin],
Dai, L.R.[Li-Rong],
Learning Contextually Fused Audio-Visual Representations for
Audio-Visual Speech Recognition,
ICIP22(1346-1350)
IEEE DOI
2211
Representation learning, Training, Visualization,
Speech recognition, Self-supervised learning, Transformers,
audiovisual speech recognition
BibRef
Mo, S.T.[Shen-Tong],
Morgado, P.[Pedro],
Localizing Visual Sounds the Easy Way,
ECCV22(XXXVII:218-234).
Springer DOI
2211
BibRef
Montesinos, J.F.[Juan F.],
Kadandale, V.S.[Venkatesh S.],
Haro, G.[Gloria],
VoViT: Low Latency Graph-Based Audio-Visual Voice Separation
Transformer,
ECCV22(XXXVII:310-326).
Springer DOI
2211
BibRef
Tzinis, E.[Efthymios],
Wisdom, S.[Scott],
Remez, T.[Tal],
Hershey, J.R.[John R.],
AudioScopeV2: Audio-Visual Attention Architectures for Calibrated
Open-Domain On-Screen Sound Separation,
ECCV22(XXXVII:368-385).
Springer DOI
2211
BibRef
Zhou, J.X.[Jin-Xing],
Wang, J.Y.[Jian-Yuan],
Zhang, J.Y.[Jia-Yi],
Sun, W.X.[Wei-Xuan],
Zhang, J.[Jing],
Birchfield, S.[Stan],
Guo, D.[Dan],
Kong, L.P.[Ling-Peng],
Wang, M.[Meng],
Zhong, Y.[Yiran],
Audio-Visual Segmentation,
ECCV22(XXXVII:386-403).
Springer DOI
2211
BibRef
Alcázar, J.L.[Juan León],
Cordes, M.[Moritz],
Zhao, C.[Chen],
Ghanem, B.[Bernard],
End-to-End Active Speaker Detection,
ECCV22(XXXVII:126-143).
Springer DOI
2211
BibRef
Chen, C.G.[Chan-Gan],
Gao, R.H.[Ruo-Han],
Calamia, P.[Paul],
Grauman, K.[Kristen],
Visual Acoustic Matching,
CVPR22(18836-18846)
IEEE DOI
2210
Training, Geometry, Visualization, Computational modeling,
Transformers, Acoustics, Vision+X, Scene analysis and understanding
BibRef
Lee, S.[Sangmin],
Kim, H.I.[Hyung-Il],
Ro, Y.M.[Yong Man],
Weakly Paired Associative Learning for Sound and Image
Representations via Bimodal Associative Memory,
CVPR22(10524-10533)
IEEE DOI
2210
Representation learning, Learning systems, Associative memory,
Annotations, Switches, Image representation, Vision + X
BibRef
Vasudevan, A.B.[Arun Balajee],
Dai, D.X.[Deng-Xin],
Van Gool, L.J.[Luc J.],
Sound and Visual Representation Learning with Multiple Pretraining
Tasks,
CVPR22(14596-14606)
IEEE DOI
2210
Representation learning, Visualization, Semantics,
Image representation, Predictive models, Market research,
Representation learning
BibRef
Xia, Y.[Yan],
Zhao, Z.[Zhou],
Cross-modal Background Suppression for Audio-Visual Event
Localization,
CVPR22(19957-19966)
IEEE DOI
2210
Location awareness, Visualization, Codes, Logic gates,
Feature extraction, Robustness, Action and event recognition,
Vision + X
BibRef
Jiang, H.[Hao],
Murdock, C.[Calvin],
Ithapu, V.K.[Vamsi Krishna],
Egocentric Deep Multi-Channel Audio-Visual Active Speaker
Localization,
CVPR22(10534-10542)
IEEE DOI
2210
Location awareness, Voice activity detection, Visualization,
Machine vision, Lighting, Real-time systems, Microphone arrays,
Vision applications and systems
BibRef
Ng, E.[Evonne],
Joo, H.[Hanbyul],
Hu, L.W.[Li-Wen],
Li, H.[Hao],
Darrell, T.J.[Trevor J.],
Kanazawa, A.[Angjoo],
Ginosar, S.[Shiry],
Learning to Listen: Modeling Non-Deterministic Dyadic Facial Motion,
CVPR22(20363-20373)
IEEE DOI
2210
Codes, Computational modeling, Oral communication, Transformers,
Data models, Face and gestures, Vision + graphics
BibRef
Mercea, O.B.[Otniel-Bogdan],
Hummel, T.[Thomas],
Koepke, A.S.[A. Sophia],
Akata, Z.[Zeynep],
Temporal and Cross-modal Attention for Audio-Visual Zero-Shot Learning,
ECCV22(XX:488-505).
Springer DOI
2211
BibRef
Mercea, O.B.[Otniel-Bogdan],
Riesch, L.[Lukas],
Koepke, A.S.[A. Sophia],
Akata, Z.[Zeynep],
Audiovisual Generalised Zero-shot Learning with Cross-modal Attention
and Language,
CVPR22(10543-10553)
IEEE DOI
2210
Training, Visualization, Codes, Computational modeling,
Training data, Focusing, Vision + X, Transfer/low-shot/long-tail learning
BibRef
Karas, V.[Vincent],
Tellamekala, M.K.[Mani Kumar],
Mallol-Ragolta, A.[Adria],
Valstar, M.[Michel],
Schuller, B.W.[Björn W.],
Time-Continuous Audiovisual Fusion with Recurrence vs Attention for
In-The-Wild Affect Recognition,
ABAW22(2381-2390)
IEEE DOI
2210
Training, Recurrent neural networks, Face recognition,
Computational modeling, Speech recognition, Network architecture, Data models
BibRef
Yang, K.[Karren],
Markovic, D.[Dejan],
Krenn, S.[Steven],
Agrawal, V.[Vasu],
Richard, A.[Alexander],
Audio-Visual Speech Codecs: Rethinking Audio-Visual Speech
Enhancement by Re-Synthesis,
CVPR22(8217-8227)
IEEE DOI
2210
Speech codecs, Vocabulary, Visualization, Codes, Acoustic distortion,
Telepresence, Vision + X, Vision + language
BibRef
Kim, M.[Minsu],
Hong, J.[Joanna],
Park, S.J.[Se Jin],
Ro, Y.M.[Yong Man],
Multi-modality Associative Bridging through Memory:
Speech Sound Recollected from Face Video,
ICCV21(296-306)
IEEE DOI
2203
Bridges, Visualization, Lips, Task analysis, Faces,
Vision + other modalities, Vision + language
BibRef
Li, J.[Jing],
Kang, D.[Di],
Pei, W.J.[Wen-Jie],
Zhe, X.F.[Xue-Fei],
Zhang, Y.[Ying],
He, Z.Y.[Zhen-Yu],
Bao, L.C.[Lin-Chao],
Audio2Gestures: Generating Diverse Gestures from Speech Audio with
Conditional Variational Autoencoders,
ICCV21(11273-11282)
IEEE DOI
2203
Training, Codes, Correlation, Speech coding, Bicycles,
Gestures and body pose, Action and behavior recognition,
Vision + other modalities
BibRef
Ye, M.[Muchao],
You, Q.Z.[Quan-Zeng],
Ma, F.L.[Feng-Long],
QUALIFIER: Question-Guided Self-Attentive Multimodal Fusion Network
for Audio Visual Scene-Aware Dialog,
WACV22(2503-2511)
IEEE DOI
2202
Measurement, Visualization, Semantics, Natural languages,
Network architecture, Feature extraction, Generators, Vision and Languages
BibRef
Yao, S.[Shunyu],
Min, X.K.[Xiong-Kuo],
Zhai, G.T.[Guang-Tao],
Deep Audio-Visual Fusion Neural Network for Saliency Estimation,
ICIP21(1604-1608)
IEEE DOI
2201
Visualization, Fuses, Neural networks, Estimation, Benchmark testing,
Feature extraction, Audio-visual fusion, saliency, database
BibRef
Krishnamurthy, S.[Sudha],
Learning Self-supervised Audio-Visual Representations for Sound
Recommendations,
ISVC21(II:124-138).
Springer DOI
2112
BibRef
Shi, W.J.[Wen-Jing],
Pattichis, M.S.[Marios S.],
Celedón-Pattichis, S.[Sylvia],
LópezLeiva, C.[Carlos],
Talking Detection in Collaborative Learning Environments,
CAIP21(II:242-251).
Springer DOI
2112
BibRef
Wang, G.[Guotao],
Chen, C.Z.[Chengli-Zhao],
Fan, D.P.[Deng-Ping],
Hao, A.[Aimin],
Qin, H.[Hong],
From Semantic Categories to Fixations: A Novel Weakly-supervised
Visual-auditory Saliency Detection Approach,
CVPR21(15114-15123)
IEEE DOI
2111
Training, Deep learning, Codes, Semantics,
Pattern recognition, Saliency detection
BibRef
Wen, P.S.[Pei-Song],
Xu, Q.Q.[Qian-Qian],
Jiang, Y.B.[Yang-Bangyan],
Yang, Z.Y.[Zhi-Yong],
He, Y.[Yuan],
Huang, Q.M.[Qing-Ming],
Seeking the Shape of Sound:
An Adaptive Framework for Learning Voice-Face Association,
CVPR21(16342-16351)
IEEE DOI
2111
Matched filters, Art, Shape, Filtering,
Face recognition, Pattern matching
BibRef
Monfort, M.[Mathew],
Jin, S.[SouYoung],
Liu, A.[Alexander],
Harwath, D.[David],
Feris, R.S.[Rogerio S.],
Glass, J.[James],
Oliva, A.[Aude],
Spoken Moments: Learning Joint Audio-Visual Representations from
Video Descriptions,
CVPR21(14866-14876)
IEEE DOI
2111
Adaptation models, Video description, Computational modeling,
Semantics, Benchmark testing, Observers, Pattern recognition
BibRef
Tian, Y.P.[Ya-Peng],
Xu, C.L.[Chen-Liang],
Can audio-visual integration strengthen robustness under multimodal
attacks?,
CVPR21(5597-5607)
IEEE DOI
2111
Location awareness, Visualization, Systematics,
Codes, Computational modeling, Robustness
BibRef
Morgado, P.[Pedro],
Vasconcelos, N.M.[Nuno M.],
Misra, I.[Ishan],
Audio-Visual Instance Discrimination with Cross-Modal Agreement,
CVPR21(12470-12481)
IEEE DOI
2111
Visualization, Extraterrestrial measurements,
Pattern recognition, Task analysis
BibRef
Morgado, P.[Pedro],
Misra, I.[Ishan],
Vasconcelos, N.M.[Nuno M.],
Robust Audio-Visual Instance Discrimination,
CVPR21(12929-12940)
IEEE DOI
2111
Training, Learning systems, Transfer learning,
Pattern recognition, Task analysis, Standards
BibRef
Chen, Y.B.[Yan-Bei],
Xian, Y.Q.[Yong-Qin],
Koepke, A.S.[A. Sophia],
Shan, Y.[Ying],
Akata, Z.[Zeynep],
Distilling Audio-Visual Knowledge by Compositional Contrastive
Learning,
CVPR21(7012-7021)
IEEE DOI
2111
Codes, Computational modeling, Semantics,
Benchmark testing, Pattern recognition, Task analysis
BibRef
Zhang, Z.M.[Zhi-Meng],
Li, L.C.[Lin-Cheng],
Ding, Y.[Yu],
Fan, C.J.[Chang-Jie],
Flow-guided One-shot Talking Face Generation with a High-resolution
Audio-visual Dataset,
CVPR21(3660-3669)
IEEE DOI
2111
Visualization, Solid modeling,
Image resolution, Face recognition, Mouth, Transforms
BibRef
Gao, R.[Ruohan],
Grauman, K.[Kristen],
VisualVoice: Audio-Visual Speech Separation with Cross-Modal
Consistency,
CVPR21(15490-15500)
IEEE DOI
2111
Location awareness, Face recognition, Lips,
Computational modeling, Speech recognition, Speech enhancement
BibRef
Lee, J.Y.[Ji-Young],
Chung, S.W.[Soo-Whan],
Kim, S.[Sunok],
Kang, H.G.[Hong-Goo],
Sohn, K.H.[Kwang-Hoon],
Looking into Your Speech: Learning Cross-modal Affinity for
Audio-visual Speech Separation,
CVPR21(1336-1345)
IEEE DOI
2111
Visualization, Stability criteria, Speech recognition, Jitter,
Delays, Synchronization, Data mining
BibRef
Mazumder, P.[Pratik],
Singh, P.[Pravendra],
Parida, K.K.[Kranti Kumar],
Namboodiri, V.P.[Vinay P.],
AVGZSLNet: Audio-Visual Generalized Zero-Shot Learning by
Reconstructing Label Features from Multi-Modal Embeddings,
WACV21(3089-3098)
IEEE DOI
2106
Training, Semantics, Decoding, Task analysis, Testing
BibRef
Ishikawa, R.[Reina],
Hachiuma, R.[Ryo],
Kurobe, A.[Akiyoshi],
Saito, H.[Hideo],
Single-modal Incremental Terrain Clustering from Self-Supervised
Audio-Visual Feature Learning,
ICPR21(9399-9406)
IEEE DOI
2105
Vibrations, Training, Visualization, Robot vision systems,
Predictive models, Feature extraction, Cameras
BibRef
Madrigal, F.[Francisco],
Lerasle, F.[Frédéric],
Pibre, L.[Lionel],
Ferrané, I.[Isabelle],
Audio-Video detection of the active speaker in meetings,
ICPR21(2536-2543)
IEEE DOI
2105
Visualization, Human-robot interaction, Benchmark testing,
Feature extraction, Cognition, Pattern recognition, Proposals,
Feature fusion
BibRef
Tellamekala, M.K.[Mani Kumar],
Valstar, M.[Michel],
Pound, M.[Michael],
Giesbrecht, T.[Timo],
Audio-Visual Predictive Coding for Self-Supervised Visual
Representation Learning,
ICPR21(9912-9919)
IEEE DOI
2105
Visualization, Correlation, Semantics, Speech recognition,
Predictive coding, Streaming media, Predictive models
BibRef
Liu, H.[Hong],
Wang, Y.[Yawei],
Yang, B.[Bing],
Mutual Alignment between Audiovisual Features for End-to-End
Audiovisual Speech Recognition,
ICPR21(5348-5353)
IEEE DOI
2105
Visualization, Lips, Speech recognition, Systems modeling, Acoustics,
Noise measurement, Iterative methods, multimodal alignment,
mutual iterative attention
BibRef
Liu, H.[Hong],
Xu, W.L.[Wan-Lu],
Yang, B.[Bing],
Audio-Visual Speech Recognition Using A Two-Step Feature Fusion
Strategy,
ICPR21(1896-1903)
IEEE DOI
2105
Visualization, Lips, Speech recognition, Streaming media,
Feature extraction, speech recognition, feature fusion, non-local
BibRef
Liu, H.[Hong],
Li, W.H.[Wen-Hao],
Yang, B.[Bing],
Robust Audio-Visual Speech Recognition Based on Hybrid Fusion,
ICPR21(7580-7586)
IEEE DOI
2105
Visualization, Correlation, Collaboration, Speech recognition,
Logic gates, Reliability engineering, Robustness,
Hybrid Fusion
BibRef
Chao, F.Y.,
Ozcinar, C.,
Zhang, L.,
Hamidouche, W.,
Deforges, O.,
Smolic, A.,
Towards Audio-Visual Saliency Prediction for Omnidirectional Video
with Spatial Audio,
VCIP20(355-358)
IEEE DOI
2102
Visualization, Feature extraction,
Solid modeling, Predictive models,
virtual reality (VR)
BibRef
Zhou, H.[Hang],
Xu, X.D.[Xu-Dong],
Lin, D.[Dahua],
Wang, X.G.[Xiao-Gang],
Liu, Z.W.[Zi-Wei],
Sep-stereo: Visually Guided Stereophonic Audio Generation by
Associating Source Separation,
ECCV20(XII: 52-69).
Springer DOI
2010
BibRef
Tian, Y.P.[Ya-Peng],
Li, D.Z.[Ding-Zeyu],
Xu, C.L.[Chen-Liang],
Unified Multisensory Perception: Weakly-supervised Audio-visual Video
Parsing,
ECCV20(III:436-454).
Springer DOI
2012
BibRef
Salman, A.N.,
Busso, C.,
Dynamic versus Static Facial Expressions in the Presence of Speech,
FG20(436-443)
IEEE DOI
2102
BibRef
Earlier:
Salman, A.N.,
Busso, C.,
Style Extractor For Facial Expression Recognition in the Presence of
Speech,
ICIP20(1806-1810)
IEEE DOI
2011
Videos, Face recognition, Annotations, Emotion recognition,
Out of order, Training, Reliability, affective computing,
video emotion recognition.
Feature extraction, Speech recognition, Databases,
Phonetics, Agricultural machinery, Affective computing,
factor analysis
BibRef
Liu, Y.F.[Yu-Fan],
Qiao, M.L.[Ming-Lang],
Xu, M.[Mai],
Li, B.[Bing],
Hu, W.M.[Wei-Ming],
Borji, A.[Ali],
Learning to Predict Salient Faces: A Novel Visual-Audio Saliency Model,
ECCV20(XX:413-429).
Springer DOI
2011
BibRef
Yang, K.[Karren],
Russell, B.[Bryan],
Salamon, J.[Justin],
Telling Left From Right:
Learning Spatial Correspondence of Sight and Sound,
CVPR20(9929-9938)
IEEE DOI
2008
Visualization, Task analysis, Streaming media, Training, Semantics,
Spatial databases
BibRef
Gao, R.,
Oh, T.,
Grauman, K.,
Torresani, L.,
Listen to Look: Action Recognition by Previewing Audio,
CVPR20(10454-10464)
IEEE DOI
2008
Redundancy, Visualization, Buildings, Proposals, Image segmentation,
Image recognition, Spatiotemporal phenomena
BibRef
Zhang, X.,
Wu, X.,
Zhai, X.,
Ben, X.,
Tu, C.,
DAVD-Net: Deep Audio-Aided Video Decompression of Talking Heads,
CVPR20(12332-12341)
IEEE DOI
2008
Feature extraction, Image coding, Image restoration,
Video compression, Head, Image reconstruction, Standards
BibRef
Vaezi Joze, H.R.,
Shaban, A.,
Iuzzolino, M.L.,
Koishida, K.,
MMTM: Multimodal Transfer Module for CNN Fusion,
CVPR20(13286-13296)
IEEE DOI
2008
Gesture recognition, Speech enhancement,
Task analysis, Speech recognition,
Neural networks
BibRef
Alcázar, J.L.,
Caba, F.,
Mai, L.,
Perazzi, F.,
Lee, J.,
Arbeláez, P.,
Ghanem, B.,
Active Speakers in Context,
CVPR20(12462-12471)
IEEE DOI
2008
Context modeling, Task analysis, Face, Visualization,
Computational modeling, Computer architecture, Agriculture
BibRef
Huang, C.,
Koishida, K.,
Improved Active Speaker Detection based on Optical Flow,
MULWS20(4084-4090)
IEEE DOI
2008
Optical imaging, Visualization, Face, Nonlinear optics,
Adaptive optics, Optical filters, Lips
BibRef
Ma, X.J.[Xin-Jun],
Wu, C.C.[Chen-Chen],
Li, Y.Y.[Yuan-Yuan],
Zhong, Q.Y.[Qian-Yuan],
Speaker Identification System Based on Lip-Motion Feature,
CVS17(289-299).
Springer DOI
1711
BibRef
Xu, B.,
Lu, C.,
Guo, Y.,
Wang, J.,
Discriminative Multi-Modality Speech Recognition,
CVPR20(14421-14430)
IEEE DOI
2008
Visualization, Speech recognition, Lips, Noise measurement,
Convolution, Feature extraction, Noise reduction
BibRef
Wang, J.,
Fang, Z.,
Zhao, H.,
AlignNet: A Unifying Approach to Audio-Visual Alignment,
WACV20(3298-3306)
IEEE DOI
2006
Feature extraction, Synchronization, Visualization, Task analysis,
Training, Rhythm, Face
BibRef
Duan, B.[Bin],
Tang, H.[Hao],
Wang, W.[Wei],
Zong, Z.L.[Zi-Liang],
Yang, G.W.[Guo-Wei],
Yan, Y.[Yan],
Audio-Visual Event Localization via Recursive Fusion by Joint
Co-Attention,
WACV21(4012-4021)
IEEE DOI
2106
Location awareness, Visualization, Fuses,
Computer architecture, Task analysis
BibRef
Wu, Y.[Yu],
Zhu, L.C.[Lin-Chao],
Yan, Y.[Yan],
Yang, Y.[Yi],
Dual Attention Matching for Audio-Visual Event Localization,
ICCV19(6291-6299)
IEEE DOI
2004
feature extraction, image fusion,
video signal processing,
Video sequences
BibRef
Subedar, M.,
Krishnan, R.,
Meyer, P.L.,
Tickoo, O.,
Huang, J.,
Uncertainty-Aware Audiovisual Activity Recognition Using Deep
Bayesian Variational Inference,
ICCV19(6300-6309)
IEEE DOI
2004
audio-visual systems, Bayes methods, image recognition,
inference mechanisms, learning (artificial intelligence),
Neural networks
BibRef
Alamri, H.[Huda],
Cartillier, V.[Vincent],
Das, A.[Abhishek],
Wang, J.[Jue],
Cherian, A.[Anoop],
Essa, I.[Irfan],
Batra, D.[Dhruv],
Marks, T.K.[Tim K.],
Hori, C.[Chiori],
Anderson, P.[Peter],
Lee, S.[Stefan],
Parikh, D.[Devi],
Audio Visual Scene-Aware Dialog,
CVPR19(7550-7559).
IEEE DOI
2002
BibRef
Niu, Y.L.[Yu-Lei],
Zhang, H.W.[Han-Wang],
Zhang, M.L.[Man-Li],
Zhang, J.H.[Jian-Hong],
Lu, Z.W.[Zhi-Wu],
Wen, J.R.[Ji-Rong],
Recursive Visual Attention in Visual Dialog,
CVPR19(6672-6681).
IEEE DOI
2002
BibRef
Schwartz, I.[Idan],
Schwing, A.G.[Alexander G.],
Hazan, T.[Tamir],
A Simple Baseline for Audio-Visual Scene-Aware Dialog,
CVPR19(12540-12550).
IEEE DOI
2002
BibRef
Lu, Y.,
Lee, H.,
Tseng, H.,
Yang, M.,
Self-Supervised Audio Spatialization with Correspondence Classifier,
ICIP19(3347-3351)
IEEE DOI
1910
Audio-visual, Spatial audio, Self-supervised
BibRef
Saidi, I.,
Zhang, L.,
Barriac, V.,
Déforges, O.,
Laboratory and Crowdsourcing Studies of Lip Sync Effect on the
Audio-Video Quality Assessment for Videoconferencing Application,
ICIP19(3207-3211)
IEEE DOI
1910
Subjective test, crowdsourcing, quality assessment,
audio-video synchronization, videoconferencing
BibRef
Meng, D.,
Peng, X.,
Wang, K.,
Qiao, Y.,
Frame Attention Networks for Facial Expression Recognition in Videos,
ICIP19(3866-3870)
IEEE DOI
1910
facial expression recognition, audio-video emotion recognition,
frame attention networks, CNN, AFEW
BibRef
Shahid, M.[Muhammad],
Beyan, C.[Cigdem],
Murino, V.[Vittorio],
Comparisons of Visual Activity Primitives for Voice Activity Detection,
CIAP19(I:48-59).
Springer DOI
1909
BibRef
Kim, C.I.[Chang-Il],
Shin, H.J.V.[Hi-Jung Valentina],
Oh, T.H.[Tae-Hyun],
Kaspar, A.[Alexandre],
Elgharib, M.[Mohamed],
Matusik, W.[Wojciech],
On Learning Associations of Faces and Voices,
ACCV18(V:276-292).
Springer DOI
1906
BibRef
Schindler, A.[Alexander],
Boyer, M.[Martin],
Lindley, A.[Andrew],
Schreiber, D.[David],
Philipp, T.[Thomas],
Large Scale Audio-Visual Video Analytics Platform for Forensic
Investigations of Terroristic Attacks,
MMMod19(II:106-119).
Springer DOI
1901
BibRef
Oliveira, D.A.B.,
Mattos, A.B.,
da Silva Morais, E.,
Improving Viseme Recognition Using GAN-Based Frontal View Mapping,
AMFG18(2229-22297)
IEEE DOI
1812
Speech recognition, Face, Mouth,
Hidden Markov models, Visualization, Task analysis
BibRef
Yang, X.,
Molchanov, P.,
Kautz, J.,
Making Convolutional Networks Recurrent for Visual Sequence Learning,
CVPR18(6469-6478)
IEEE DOI
1812
Visualization, Logic gates, Recurrent neural networks,
Task analysis, Convolution, Speech recognition, Face recognition
BibRef
Zhang, J.,
Richmond, K.,
Fisher, R.B.,
Dual-modality Talking-metrics: 3D Visual-Audio Integrated
Behaviometric Cues from Speakers,
ICPR18(3144-3149)
IEEE DOI
1812
Face, Feature extraction, Visualization, Streaming media
BibRef
Chowdhury, A.,
Atoum, Y.,
Tran, L.,
Liu, X.,
Ross, A.,
MSU-AVIS dataset: Fusing Face and Voice Modalities for Biometric
Recognition in Indoor Surveillance Videos,
ICPR18(3567-3573)
IEEE DOI
1812
Face, Face recognition, Videos, Surveillance, Feature extraction,
Speaker recognition, Cameras
BibRef
Nagrani, A.,
Albanie, S.,
Zisserman, A.,
Seeing Voices and Hearing Faces: Cross-Modal Biometric Matching,
CVPR18(8427-8436)
IEEE DOI
1812
Face, Task analysis, Streaming media, Face recognition, Testing,
Speech recognition, Lips
BibRef
Saitoh, T.,
Kubokawa, M.,
SSSD: Speech Scene database by Smart Device for Visual Speech
Recognition,
ICPR18(3228-3232)
IEEE DOI
1812
Databases, Smart devices, Feature extraction, Lips, Face, Cameras,
Speech recognition
BibRef
Owens, A.[Andrew],
Efros, A.A.[Alexei A.],
Audio-Visual Scene Analysis with Self-Supervised Multisensory Features,
ECCV18(VI: 639-658).
Springer DOI
1810
BibRef
Berlin, A.A.,
Surati, R.,
Video Deconfounding: Hearing-Aid Inspired Video Enhancement,
IVMSP18(1-5)
IEEE DOI
1809
Additives, Windows, Cameras, Image color analysis,
Speech enhancement, Automobiles
BibRef
Ding, R.,
Pang, C.,
Liu, H.,
Audio-Visual Keyword Spotting Based on Multidimensional Convolutional
Neural Network,
ICIP18(4138-4142)
IEEE DOI
1809
Visualization, Kernel, Lips, Feature extraction, Databases, decision fusion
BibRef
Liao, J.,
Wang, S.,
Zhang, X.,
Liu, G.,
3D Convolutional Neural Networks Based Speaker Identification and
Authentication,
ICIP18(2042-2046)
IEEE DOI
1809
Lips, Authentication, Feature extraction, Visualization, Training,
Robustness,
Lip feature
BibRef
Savran, A.,
Tavarone, R.,
Higy, B.,
Badino, L.,
Bartolozzi, C.,
Energy and Computation Efficient Audio-Visual Voice Activity
Detection Driven by Event-Cameras,
FG18(333-340)
IEEE DOI
1806
Cameras, Cascading style sheets, Convolution, Kernel, Lips,
Visualization, Voice activity detection, audio visual,
voice activity detection
BibRef
Ephrat, A.,
Halperin, T.,
Peleg, S.,
Improved Speech Reconstruction from Silent Video,
CVAVM17(455-462)
IEEE DOI
1802
Face, Poles and towers, Predictive models, Spectrogram, Speech, Visualization
BibRef
Ahn, J.[Juhyun],
Kim, Y.J.[Yong-Joong],
Kim, D.J.[Dai-Jin],
Patch-based visual microphone for improving quality of sound,
ICPR16(3927-3932)
IEEE DOI
1705
Cameras, Microphones, Noise level, Signal to noise ratio, Speech,
Vibrations, Visualization
BibRef
Chung, J.S.[Joon Son],
Zisserman, A.[Andrew],
Out of Time: Automated Lip Sync in the Wild,
LipRead16(II: 251-263).
Springer DOI
1704
BibRef
Miao, C.L.[Chang-Long],
Feng, J.W.[Jian-Wei],
Ding, Y.[Yu],
Yang, Y.[Yu],
Chen, X.G.[Xiao-Gang],
Ji, X.Y.[Xiang-Yang],
Unsupervised person clustering in videos with cross-modal
communication,
VCIP16(1-4)
IEEE DOI
1701
Feature extraction. Audio-visual.
BibRef
Hu, D.[Di],
Li, X.L.[Xue-Long],
Lu, X.Q.[Xiao-Qiang],
Temporal Multimodal Learning in Audiovisual Speech Recognition,
CVPR16(3574-3582)
IEEE DOI
1612
BibRef
Liu, H.[Hong],
Fan, T.[Ting],
Wu, P.P.[Ping-Ping],
Audio-visual Keyword Spotting for Mandarin Based on Discriminative
Local Spatial-Temporal Descriptors,
ICPR14(785-790)
IEEE DOI
1412
Acoustics
BibRef
Ringeval, F.,
Sonderegger, A.,
Sauer, J.,
Lalanne, D.,
Introducing the RECOLA multimodal corpus of remote collaborative and
affective interactions,
FG13(1-8)
IEEE DOI
1309
natural languages.
Collaborative and affective interactions in French.
BibRef
Aubrey, A.J.[Andrew J.],
Cunningham, D.W.[Douglas W.],
Marshall, D.[David],
Rosin, P.L.[Paul L.],
Shin, A.[Ah-Young],
The Face Speaks:
Contextual and Temporal Sensitivity to Backchannel Responses,
FaceCVHum12(II:248-259).
Springer DOI
1304
BibRef
Tawari, A.[Ashish],
Trivedi, M.[Mohan],
Audio-visual data association for face expression analysis,
ICPR12(1120-1123).
WWW Link.
1302
BibRef
Taj, M.[Murtaza],
Cavallaro, A.[Andrea],
Interaction recognition in wide areas using audiovisual sensors,
ICIP12(1113-1116).
IEEE DOI
1302
BibRef
Giorgolo, G.[Gianluca],
Integration of Gesture and Verbal Language: A Formal Semantics Approach,
GW11(216-227).
Springer DOI
1211
BibRef
Le, Q.A.[Quoc Anh],
Pelachaud, C.[Catherine],
Generating Co-speech Gestures for the Humanoid Robot NAO through BML,
GW11(228-237).
Springer DOI
1211
BibRef
Navarathna, R.,
Dean, D.,
Sridharan, S.[Sridha],
Fookes, C.[Clinton],
Lucey, P.,
Visual Voice Activity Detection Using Frontal versus Profile Views,
DICTA11(134-139).
IEEE DOI
1205
BibRef
Komai, Y.[Yuto],
Ariki, Y.[Yasuo],
Takiguchi, T.[Tetsuya],
Audio-Visual Speech Recognition Based on AAM Parameter and Phoneme
Analysis of Visual Feature,
PSIVT11(I: 97-108).
Springer DOI
1111
BibRef
Zheng, H.M.[Hao-Mian],
Wang, M.[Meng],
Li, Z.[Zhu],
Audio-visual speaker identification with multi-view distance metric
learning,
ICIP10(4561-4564).
IEEE DOI
1009
BibRef
Krishnan, R.K.[Ravi-Kiran],
Sarkar, S.[Sudeep],
Similarity Measure between Two Gestures Using Triplets,
HAU3D13(506-513)
IEEE DOI
1309
BibRef
Krishnan, R.K.[Ravi-Kiran],
Sarkar, S.[Sudeep],
Detecting Group Turn Patterns in Conversations Using Audio-Video Change
Scale-Space,
ICPR10(137-140).
IEEE DOI
1008
BibRef
Aran, O.[Oya],
Gatica-Perez, D.[Daniel],
Fusing Audio-Visual Nonverbal Cues to Detect Dominant People in Group
Conversations,
ICPR10(3687-3690).
IEEE DOI
1008
BibRef
Niese, R.[Robert],
Al-Hamadi, A.[Ayoub],
Michaelis, B.[Bernd],
A New Multi-camera Based Facial Expression Analysis Concept,
ICIAR12(II: 64-71).
Springer DOI
1206
BibRef
Steer, M.A.[Michael Alan],
Al-Hamadi, A.[Ayoub],
Michaelis, B.[Bernd],
Audio-Visual Data Fusion Using a Particle Filter in the Application of
Face Recognition,
ICPR10(4392-4395).
IEEE DOI
1008
BibRef
Roy, A.[Anindya],
Marcel, S.[Sebastien],
Crossmodal Matching of Speakers Using Lip and Voice Features in
Temporally Non-overlapping Audio and Video Streams,
ICPR10(4504-4507).
IEEE DOI
1008
BibRef
Cour, T.[Timothee],
Sapp, B.[Benjamin],
Nagle, A.[Akash],
Taskar, B.[Ben],
Talking pictures:
Temporal grouping and dialog-supervised person recognition,
CVPR10(1014-1021).
IEEE DOI
1006
BibRef
Wu, G.Y.[Guan-Yong],
Zhu, J.[Jie],
Xu, H.H.[Hai-Hua],
A hybrid visual feature extraction method for audio-visual speech
recognition,
ICIP09(1829-1832).
IEEE DOI
0911
BibRef
Ceballos, A.[Alexánder],
Gómez, J.[Juan],
Prieto, F.[Flavio],
Redarce, T.[Tanneguy],
Robot Command Interface Using an Audio-Visual Speech Recognition System,
CIARP09(869-876).
Springer DOI
0911
BibRef
Cifani, S.[Simone],
Abel, A.[Andrew],
Hussain, A.[Amir],
Squartini, S.[Stefano],
Piazza, F.[Francesco],
An Investigation into Audiovisual Speech Correlation in Reverberant
Noisy Environments,
COST08(331-343).
Springer DOI
0810
BibRef
Fanelli, G.[Gabriele],
Gall, J.[Jürgen],
Van Gool, L.J.[Luc J.],
Hough transform-based mouth localization for audio-visual speech
recognition,
BMVC09(xx-yy).
PDF File.
0909
BibRef
Cadavid, S.[Steven],
Abdel-Mottaleb, M.[Mohamed],
Messinger, D.S.[Daniel S.],
Mahoor, M.H.[Mohammad H.],
Bahrick, L.E.[Lorraine E.],
Detecting local audio-visual synchrony in monologues utilizing vocal
pitch and facial landmark trajectories,
BMVC09(xx-yy).
PDF File.
0909
BibRef
Lee, J.S.[Jong-Seok],
Ebrahimi, T.[Touradj],
Two-Level Bimodal Association for Audio-Visual Speech Recognition,
ACIVS09(133-144).
Springer DOI
0909
BibRef
Marchegiani, M.L.[Maria Letizia],
Pirri, F.[Fiora],
Pizzoli, M.[Matia],
Multimodal Speaker Recognition in a Conversation Scenario,
CVS09(11-20).
Springer DOI
0910
BibRef
Kumar, K.[Kshitiz],
Navratil, J.[Jiri],
Marcheret, E.[Etienne],
Libal, V.[Vit],
Ramaswamy, G.[Ganesh],
Potamianos, G.[Gerasimos],
Audio-visual speech synchronization detection using a bimodal linear
prediction model,
Biometrics09(53-59).
IEEE DOI
0906
BibRef
Karam, W.[Walid],
Mokbel, C.[Chafic],
Greige, H.[Hanna],
Chollet, G.[Gérard],
Audio-Visual Identity Verification and Robustness to Imposture,
ICB09(796-805).
Springer DOI
0906
BibRef
Rebillat, M.[Marc],
Katz, B.F.G.[Brian F.G.],
Corteel, E.[Etienne],
SMART-I2: Spatial Multi-user Audio-visual Real-time interactive
interface, A broadcast application context,
3DTV09(1-4).
IEEE DOI
0905
BibRef
Eisenstein, J.[Jacob],
Gesture in Automatic Discourse Processing,
CSAIL-2008-027, May 2008.
0805
BibRef
Ph.D.Thesis, MIT, May 2008.
WWW Link.
BibRef
Das, A.[Amitava],
Manyam, O.K.[Ohil K.],
Tapaswi, M.[Makarand],
Audio-Visual Person Authentication with Multiple Visualized-Speech
Features and Multiple Face Profiles,
ICCVGIP08(39-46).
IEEE DOI
0812
BibRef
Cao, Y.[Yu],
Baang, S.[Sung],
Liu, S.H.[Shih-Hsi],
Li, M.[Ming],
Hu, S.Q.[San-Qing],
Audio-visual event classification via spatial-temporal-audio words,
ICPR08(1-5).
IEEE DOI
0812
BibRef
Terry, L.H.[Louis H.],
Shiell, D.J.[Derek J.],
Katsaggelos, A.K.[Aggelos K.],
Feature space video stream consistency estimation for dynamic stream
weighting in audio-visual speech recognition,
ICIP08(1316-1319).
IEEE DOI
0810
BibRef
Naseem, I.[Imran],
Mian, A.S.[Ajmal S.],
User Verification by Combining Speech and Face Biometrics in Video,
ISVC08(II: 482-492).
Springer DOI
0812
BibRef
Ettinger, E.[Evan],
Freund, Y.[Yoav],
Coordinate-free calibration of an acoustically driven camera pointing
system,
ICDSC08(1-9).
IEEE DOI
0809
BibRef
Hung, H.[Hayley],
Friedland, G.[Gerald],
Towards Audio-Visual On-line Diarization Of Participants In Group
Meetings,
M2SFA208(xx-yy).
0810
BibRef
Liu, Y.Y.[Yu-Yu],
Sato, Y.[Yoichi],
Finding Speaker Face Region by Audiovisual Correlation,
M2SFA208(xx-yy).
0810
BibRef
Kelly, D.[Damien],
Pitie, F.[Francois],
Kokaram, A.[Anil],
Boland, F.[Frank],
A Comparative Error Analysis of Audio-Visual Source Localization,
M2SFA208(xx-yy).
0810
BibRef
Pachoud, S.,
Gong, S.,
Cavallaro, A.,
Video Augmentation for Improving Audio Speech Recognition under Noise,
BMVC08(xx-yy).
PDF File.
0809
BibRef
Horii, Y.[Yu],
Kawashima, H.[Hiroaki],
Matsuyama, T.[Takashi],
Speaker detection using the timing structure of lip motion and sound,
CVPR4HB08(1-8).
IEEE DOI
0806
BibRef
Rúa, E.A.[Enrique Argones],
Castro, J.L.A.[José Luis Alba],
Mateo, C.G.[Carmen García],
Quality-Based Score Normalization for Audiovisual Person Authentication,
ICIAR08(xx-yy).
Springer DOI
0806
BibRef
Wang, L.[Lei],
Tjondronegoro, D.[Dian],
Liu, Y.[Yuee],
Clustering and Visualizing Audio-Visual Dataset on Mobile Devices in a
Topic-Oriented Manner,
Visual07(310-321).
Springer DOI
0706
BibRef
Zajdel, W.,
Krijnders, J.D.,
Andringa, T.,
Gavrila, D.M.,
CASSANDRA: audio-video sensor fusion for aggression detection,
AVSBS07(200-205).
IEEE DOI
0709
BibRef
Stødle, D.[Daniel],
Bjørndalen, J.M.[John Markus],
Anshus, O.J.[Otto J.],
A System for Hybrid Vision- and Sound-Based Interaction with Distal and
Proximal Targets on Wall-Sized, High-Resolution Tiled Displays,
CVHCI07(59-68).
Springer DOI
0710
BibRef
van Hengel, P.W.J.,
Andringa, T.C.,
Verbal aggression detection in complex social environments,
AVSBS07(15-20).
IEEE DOI
0709
BibRef
Ikeda, O.[Osamu],
Detection of a Speaker in Video by Combined Analysis of Speech Sound
and Mouth Movement,
ISVC07(II: 602-610).
Springer DOI
0711
BibRef
Das, A.[Amitava],
Audio Visual Person Authentication by Multiple Nearest Neighbor
Classifiers,
ICB07(1114-1123).
Springer DOI
0708
BibRef
Xin, L.[Le],
Tao, J.H.[Jian-Hua],
Tan, T.N.[Tie-Niu],
Dynamic Audio-Visual Mapping using Fused Hidden Markov Model Inversion
Method,
ICIP07(III: 293-296).
IEEE DOI
0709
BibRef
Barzelay, Z.[Zohar],
Schechner, Y.Y.[Yoav Y.],
Harmony in Motion,
CVPR07(1-8).
IEEE DOI
0706
Audio-visual analysis.
BibRef
O'Donovan, A.[Adam],
Duraiswami, R.[Ramani],
Neumann, J.[Jan],
Microphone Arrays as Generalized Cameras for Integrated Audio Visual
Processing,
CVPR07(1-8).
IEEE DOI
0706
BibRef
Abbas, J.[Jehanzeb],
Dagli, C.K.[Charlie K.],
Huang, T.S.[Thomas S.],
A Multimodality Framework for Creating Speaker/Non-Speaker Profile
Databases for Real-World Video,
SLAM07(1-8).
IEEE DOI
0706
BibRef
Kushal, A.[Akash],
Rahurkar, M.[Mandar],
Fei-Fei, L.[Li],
Ponce, J.[Jean],
Huang, T.[Thomas],
Audio-Visual Speaker Localization Using Graphical Models,
ICPR06(I: 291-294).
IEEE DOI
0609
BibRef
Tsuji, T.[Tokuo],
Yamamoto, K.[Kenkichi],
Ishii, I.[Idaku],
Real-time Sound Source Localization Based on Audiovisual Frequency
Integration,
ICPR06(IV: 322-325).
IEEE DOI
0609
BibRef
Monaci, G.[Gianluca],
Vandergheynst, P.[Pierre],
Audiovisual Gestalts,
PercOrg06(200).
IEEE DOI
0609
BibRef
Zhu, Z.G.[Zhi-Gang],
Li, W.H.[Wei-Hong],
Molina, E.[Edgardo],
Wolberg, G.[George],
LDV Sensing and Processing for Remote Hearing in a Multimodal
Surveillance System,
MSCSAS07(1-2).
IEEE DOI
0706
BibRef
Zhu, Z.G.[Zhi-Gang],
Li, W.H.[Wei-Hong],
Wolberg, G.,
Integrating LDV Audio and IR Video for Remote Multimodal Surveillance,
OTCBVS05(III: 10-10).
IEEE DOI
0507
BibRef
Wu, Z.Y.[Zhi-Yong],
Cai, L.H.[Lian-Hong],
Meng, H.[Helen],
Multi-level Fusion of Audio and Visual Features for Speaker
Identification,
ICB06(493-499).
Springer DOI
0601
BibRef
Yang, P.[Pu],
Yang, Y.C.[Ying-Chun],
Wu, Z.H.[Zhao-Hui],
Exploiting Glottal Information in Speaker Recognition Using Parallel
GMMs,
AVBPA05(804).
Springer DOI
0509
BibRef
Lei, Z.C.[Zhen-Chun],
Combining the Likelihood and the Kullback-Leibler Distance in
Estimating the Universal Background Model for Speaker Verification
Using SVM,
ICPR10(4553-4556).
IEEE DOI
1008
BibRef
Lei, Z.C.[Zhen-Chun],
Yang, Y.C.[Ying-Chun],
Wu, Z.H.[Zhao-Hui],
An UBM-Based Reference Space for Speaker Recognition,
ICPR06(IV: 318-321).
IEEE DOI
0609
BibRef
Earlier:
Constructing the Discriminative Kernels Using GMM for Text-Independent
Speaker Identification,
IWBRS05(165).
Springer DOI
0601
BibRef
And:
Speaker Identification Using the VQ-Based Discriminative Kernels,
AVBPA05(797).
Springer DOI
0509
BibRef
Li, D.D.[Dong-Dong],
Yang, Y.C.[Ying-Chun],
Wu, Z.H.[Zhao-Hui],
Dynamic Bayesian Networks for Audio-Visual Speaker Recognition,
ICB06(539-545).
Springer DOI
0601
BibRef
Fox, N.A.[Niall A.],
O'Mullane, B.A.[Brian A.],
Reilly, R.B.[Richard B.],
VALID:
A New Practical Audio-Visual Database, and Comparative Results,
AVBPA05(777).
Springer DOI
WWW Link.
0509
Dataset, Faces.
BibRef
Sharma, P.[Prag],
Reilly, R.B.[Richard B.],
The UCD Colour Face Image Database for Face Detection,
Online1998.
WWW Link.
9800
Dataset, Faces.
BibRef
Fox, N.A.[Niall A.],
O'Mullane, B.A.[Brian A.],
Reilly, R.B.[Richard B.],
Audio-Visual Speaker Identification via Adaptive Fusion Using
Reliability Estimates of Both Modalities,
AVBPA05(787).
Springer DOI
0509
BibRef
Zhang, D.,
Ghobakhlou, A.,
Kasabov, N.,
An adaptive model of person identification combining speech and image
information,
ICARCV04(I: 413-418).
IEEE DOI
0412
BibRef
Kratt, J.[Jan],
Metze, F.[Florian],
Stiefelhagen, R.[Rainer],
Waibel, A.[Alex],
Large Vocabulary Audio-Visual Speech Recognition Using the Janus Speech
Recognition Toolkit,
DAGM04(488-495).
Springer DOI
0505
BibRef
Hanafiah, Z.M.,
Yamazaki, C.,
Nakamura, A.,
Kuno, Y.,
Understanding inexplicit utterances using vision for helper robots,
ICPR04(IV: 925-928).
IEEE DOI
0409
BibRef
Hermann, T.[Thomas],
Henning, T.[Thomas],
Ritter, H.[Helge],
Gesture Desk: An Integrated Multi-modal Gestural Workplace
for Sonification,
GW03(369-379).
Springer DOI
0405
BibRef
Merola, G.[Giorgio],
The Effects of the Gesture Viewpoint on the Students' Memory of Words
and Stories,
GW07(272-281).
Springer DOI
0705
BibRef
Merola, G.[Giorgio],
Poggi, I.[Isabella],
Multimodality and Gestures in the Teacher's Communication,
GW03(101-111).
Springer DOI
0405
BibRef
Kranstedt, A.[Alfred],
Kühnlein, P.[Peter],
Wachsmuth, I.[Ipke],
Deixis in Multimodal Human Computer Interaction:
An Interdisciplinary Approach,
GW03(112-123).
Springer DOI
0405
BibRef
Saeed, K.[Khalid],
Kozlowski, M.[Marcin],
An Image-Based System for Spoken-Letter Recognition,
CAIP03(494-502).
Springer DOI
0311
BibRef
Ho, P.[Purdy],
Armington, J.[John],
A Dual-Factor Authentication System Featuring Speaker Verification and
Token Technology,
AVBPA03(128-136).
Springer DOI
0310
BibRef
Fox, N.A.[Niall A.],
Reilly, R.B.[Richard B.],
Audio-Visual Speaker Identification Based on the Use of Dynamic Audio
and Visual Features,
AVBPA03(743-751).
Springer DOI
0310
BibRef
Czyz, J.[Jacek],
Bengio, S.[Samy],
Marcel, C.[Christine],
Vandendorpe, L.[Luc],
Scalability Analysis of Audio-Visual Person Identity Verification,
AVBPA03(752-760).
Springer DOI
0310
BibRef
Bengio, S.[Samy],
Multimodal Authentication Using Asynchronous HMMs,
AVBPA03(770-777).
Springer DOI
0310
BibRef
Lucey, S.[Simon],
Chen, T.H.[Tsu-Han],
Improved Audio-Visual Speaker Recognition via the Use of a Hybrid
Combination Strategy,
AVBPA03(929-936).
Springer DOI
0310
BibRef
Krahnstoever, N.,
Schapira, E.,
Kettebekov, S.,
Sharma, R.,
Multimodal human-computer interaction for crisis management systems,
WACV02(203-207).
IEEE DOI
0303
BibRef
Kettebekov, S.,
Yeasin, M.,
Sharma, R.,
Improving continuous gesture recognition with spoken prosody,
CVPR03(I: 565-570).
IEEE DOI
0307
BibRef
Poh, N.[Norman],
Korczak, J.[Jerzy],
Hybrid Biometric Person Authentication Using Face and Voice Features,
AVBPA01(348).
Springer DOI
0310
BibRef
Nakamura, S.[Satoshi],
Fusion of Audio-Visual Information for Integrated Speech Processing,
AVBPA01(127).
Springer DOI
0310
BibRef
Sullivan, K.P.H.[Kirk P.H.],
Pelecanos, J.[Jason],
Revisiting Carl Bildt's Impostor: Would a Speaker Verification System
Foil Him?,
AVBPA01(144).
Springer DOI
0310
BibRef
Geiger, G.[Gadi],
Ezzat, T.[Tony],
Poggio, T.[Tomaso],
Perceptual Evaluation of Video-Realistic Speech,
MIT AI Memo AIM-2003-003, February 28, 2003.
WWW Link.
Describes a perceptual evaluation scheme and its application to a new
video-realistic (potentially indistinguishable from real recorded video)
visual-speech animation system called Mary 101.
0306
BibRef
Zhang, X.Z.[Xiao-Zheng],
Mersereau, R.M.,
Clements, M.,
Bimodal fusion in audio-visual speech recognition,
ICIP02(I: 964-967).
IEEE DOI
0210
BibRef
Graf, H.P.,
Cosatto, E.,
Strom, V.,
Huang, F.J.[Fu Jie],
Visual prosody: facial movements accompanying speech,
AFGR02(381-386).
IEEE DOI
0206
BibRef
Qi, Y.[Yuan],
Learning Algorithms for Audio and Video Processing:
Independent Component Analysis and Support Vector Machine Based Approaches,
UMD--TR4174, August 2000.
WWW Link.
0008
BibRef
Nankaku, Y.,
Tokuda, K.[Keiichi],
Kitamura, T.[Tadashi],
Normalized Training for HMM-based Visual Speech Recognition,
ICIP00(Vol III: 234-237).
IEEE DOI
0008
BibRef
Zhang, Y.[You],
Levinson, S.[Stephen],
Huang, T.S.[Thomas S.],
Speaker Independent Audio-Visual Speech Recognition,
ICME00(TP8).
0007
BibRef
Pan, H.[Hao],
Huang, T.S.[Thomas S.],
A New Approach to Integrate Audio and Visual Features of Speech,
ICME00(TP8).
0007
BibRef
Potamianos, G.[Gerasimos],
Verma, A.[Ashish],
Neti, C.[Chalapathy],
Iyengar, G.[Giri],
Basu, S.[Sankar],
A Cascade Image Transform for Speaker Independent Automatic Speech
Reading,
ICME00(TP8).
0007
BibRef
Pan, H.,
Liang, Z.P.,
Huang, T.S.,
Fusing Audio and Visual Features of Speech,
ICIP00(Vol III: 214-217).
IEEE DOI
0008
BibRef
Faruquie, T.A.,
Majumdar, A.,
Rajput, N.,
Subramaniam, L.V.,
Large Vocabulary Audio-visual Speech Recognition Using Active Shape
Models,
ICPR00(Vol III: 106-109).
IEEE DOI
0009
BibRef
Yu, K.,
Jiang, X.,
Bunke, H.,
Combining Acoustic and Visual Classifiers for the Recognition of Spoken
Sentences,
ICPR00(Vol II: 491-494).
IEEE DOI
0009
BibRef
Nam, J.,
Alghoniemy, M.,
Tewfik, A.H.[Ahmed H.],
Audio-visual content-based violent scene characterization,
ICIP98(I: 353-357).
IEEE DOI
9810
BibRef
Luettin, J.[Juergen],
Dupont, S.[Stéphane],
Continuous Audio-Visual Speech Recognition,
ECCV98(II: 657).
Springer DOI
9800
BibRef
Yang, J.[Jie],
Xiao, J.[Jing],
Ritter, M.[Max],
Automatic Selection of Visemes for Image-based Visual Speech Synthesis,
ICME00(TP8).
0007
BibRef
Sharma, R.[Rajeev],
Cai, J.Y.[Jiong-Yu],
Chakravarthy, S.[Srivatsan],
Poddar, I.[Indrajit],
Sethi, Y.[Yogesh],
Exploiting Speech/Gesture Co-occurrence for Improving Continuous
Gesture Recognition in Weather Narration,
AFGR00(422-427).
IEEE DOI
0003
BibRef
Yamamoto, E.,
Nakamura, S.,
Shikano, K.,
Lip Movement Synthesis from Speech Based on Hidden Markov Models,
AFGR98(154-159).
IEEE DOI
9800
BibRef
Roy, D.,
Pentland, A.P.,
Automatic spoken affect classification and analysis,
AFGR96(363-367).
IEEE DOI
9610
BibRef
Petajan, E.D.[Eric D.],
An Architecture for Automatic Lipreading to Enhance Speech Recognition,
CVPR85(40-47).
(AT&T Bell Labs)
8500
Application, Lipreading.
A hardware implementation of a system that tracks the nostrils and
mouth; it improves on recognition from the acoustic data alone.
BibRef
Chapter on Face Recognition, Detection, Tracking, Gesture Recognition, Fingerprints, Biometrics continues in
Combined Audio Visual Speaker Tracking.