Please use this identifier to cite or link to this item: http://hdl.handle.net/1893/31059
Full metadata record
DC Field | Value | Language
dc.contributor.author | Gogate, Mandar | en_UK
dc.contributor.author | Dashtipour, Kia | en_UK
dc.contributor.author | Adeel, Ahsan | en_UK
dc.contributor.author | Hussain, Amir | en_UK
dc.date.accessioned | 2020-04-28T00:03:02Z | -
dc.date.available | 2020-04-28T00:03:02Z | -
dc.date.issued | 2020-11 | en_UK
dc.identifier.uri | http://hdl.handle.net/1893/31059 | -
dc.description.abstract | Noisy situations cause huge problems for sufferers of hearing loss, as hearing aids often make speech more audible but do not always restore intelligibility. In noisy settings, humans routinely exploit the audio-visual (AV) nature of speech to selectively suppress the background noise and focus on the target speaker. In this paper, we present a language-, noise- and speaker-independent AV deep neural network (DNN) architecture for causal or real-time speech enhancement (SE). The model jointly exploits noisy acoustic cues and noise-robust visual cues to focus on the desired speaker and improve speech intelligibility. The proposed SE framework is evaluated using a first-of-its-kind AV binaural speech corpus, called ASPIRE, recorded in real noisy environments including cafeterias and restaurants. We demonstrate superior performance of our approach, in terms of objective measures and subjective listening tests, over state-of-the-art SE approaches as well as recent DNN-based SE models. In addition, our work challenges the popular belief that the scarcity of multi-language, large-vocabulary AV corpora and of a wide variety of noises is a major bottleneck to building robust language-, speaker- and noise-independent SE systems. We show that a model trained on synthetic mixtures of the Grid corpus (with 33 speakers and a small English vocabulary) and CHiME-3 noises (consisting of bus, pedestrian, cafeteria and street noises) generalises well not only to large-vocabulary corpora and a wide variety of speakers/noises but also to a completely unrelated language (such as Mandarin). | en_UK
dc.language.iso | en | en_UK
dc.publisher | Elsevier BV | en_UK
dc.relation | Gogate M, Dashtipour K, Adeel A & Hussain A (2020) CochleaNet: A Robust Language-independent Audio-Visual Model for Speech Enhancement. Information Fusion, 63, pp. 273-285. https://doi.org/10.1016/j.inffus.2020.04.001 | en_UK
dc.rights | This item has been embargoed for a period. During the embargo please use the Request a Copy feature at the foot of the Repository record to request a copy directly from the author. You can only request a copy if you wish to use this work for your own research or private study. Accepted refereed manuscript of: Gogate M, Dashtipour K, Adeel A & Hussain A (2020) CochleaNet: A Robust Language-independent Audio-Visual Model for Speech Enhancement. Information Fusion, 63, pp. 273-285. https://doi.org/10.1016/j.inffus.2020.04.001 © 2020, Elsevier. Licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International http://creativecommons.org/licenses/by-nc-nd/4.0/ | en_UK
dc.rights.uri | http://creativecommons.org/licenses/by-nc-nd/4.0/ | en_UK
dc.subject | Audio-Visual | en_UK
dc.subject | Speech Enhancement | en_UK
dc.subject | Speech Separation | en_UK
dc.subject | Deep Learning | en_UK
dc.subject | Real Noisy Audio-Visual Corpus | en_UK
dc.subject | Speaker Independent | en_UK
dc.subject | Causal | en_UK
dc.title | CochleaNet: A Robust Language-independent Audio-Visual Model for Speech Enhancement | en_UK
dc.type | Journal Article | en_UK
dc.rights.embargodate | 2021-10-22 | en_UK
dc.rights.embargoreason | [Gogate-Etal-2020-CochleaNet.pdf] Publisher requires embargo of 18 months after formal publication. | en_UK
dc.identifier.doi | 10.1016/j.inffus.2020.04.001 | en_UK
dc.citation.jtitle | Information Fusion | en_UK
dc.citation.issn | 1566-2535 | en_UK
dc.citation.volume | 63 | en_UK
dc.citation.spage | 273 | en_UK
dc.citation.epage | 285 | en_UK
dc.citation.publicationstatus | Published | en_UK
dc.citation.peerreviewed | Refereed | en_UK
dc.type.status | AM - Accepted Manuscript | en_UK
dc.contributor.funder | EPSRC Engineering and Physical Sciences Research Council | en_UK
dc.author.email | kia.dashtipour@stir.ac.uk | en_UK
dc.citation.date | 21/04/2020 | en_UK
dc.contributor.affiliation | Edinburgh Napier University | en_UK
dc.contributor.affiliation | Computing Science | en_UK
dc.contributor.affiliation | University of Wolverhampton | en_UK
dc.contributor.affiliation | Edinburgh Napier University | en_UK
dc.identifier.isi | WOS:000572142800004 | en_UK
dc.identifier.scopusid | 2-s2.0-85088642963 | en_UK
dc.identifier.wtid | 1608303 | en_UK
dc.contributor.orcid | 0000-0001-8651-5117 | en_UK
dc.date.accepted | 2020-04-11 | en_UK
dc.date.filedepositdate | 2020-04-27 | en_UK
dc.relation.funderproject | Towards visually-driven speech enhancement for cognitively-inspired multi-modal hearing-aid devices | en_UK
dc.relation.funderref | EP/M026981/1 | en_UK
Appears in Collections: Computing Science and Mathematics Journal Articles

Files in This Item:
File | Description | Size | Format
Gogate-Etal-2020-CochleaNet.pdf | Fulltext - Accepted Version | 9.79 MB | Adobe PDF | Under Embargo until 2021-10-22


This item is protected by original copyright



A file in this item is licensed under a Creative Commons License.

Items in the Repository are protected by copyright, with all rights reserved, unless otherwise indicated.

If you believe that any material held in STORRE infringes copyright, please contact library@stir.ac.uk providing details and we will remove the Work from public display in STORRE and investigate your claim.