Please use this identifier to cite or link to this item:
http://hdl.handle.net/1893/31059
Full metadata record
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Gogate, Mandar | en_UK |
dc.contributor.author | Dashtipour, Kia | en_UK |
dc.contributor.author | Adeel, Ahsan | en_UK |
dc.contributor.author | Hussain, Amir | en_UK |
dc.date.accessioned | 2020-04-28T00:03:02Z | - |
dc.date.available | 2020-04-28T00:03:02Z | - |
dc.date.issued | 2020-11 | en_UK |
dc.identifier.uri | http://hdl.handle.net/1893/31059 | - |
dc.description.abstract | Noisy situations cause huge problems for sufferers of hearing loss, as hearing aids often make speech more audible but do not always restore intelligibility. In noisy settings, humans routinely exploit the audio-visual (AV) nature of speech to selectively suppress background noise and focus on the target speaker. In this paper, we present a language-, noise- and speaker-independent AV deep neural network (DNN) architecture for causal or real-time speech enhancement (SE). The model jointly exploits noisy acoustic cues and noise-robust visual cues to focus on the desired speaker and improve speech intelligibility. The proposed SE framework is evaluated using a first-of-its-kind AV binaural speech corpus, called ASPIRE, recorded in real noisy environments including a cafeteria and a restaurant. We demonstrate superior performance of our approach in terms of objective measures and subjective listening tests over state-of-the-art SE approaches as well as recent DNN-based SE models. In addition, our work challenges a popular belief that the scarcity of multi-language, large-vocabulary AV corpora and of a wide variety of noises is a major bottleneck to building robust language-, speaker- and noise-independent SE systems. We show that a model trained on a synthetic mixture of the Grid corpus (with 33 speakers and a small English vocabulary) and CHiME 3 noises (consisting of bus, pedestrian, cafeteria, and street noises) generalises well not only to large-vocabulary corpora and a wide variety of speakers and noises but also to a completely unrelated language (such as Mandarin). | en_UK |
dc.language.iso | en | en_UK |
dc.publisher | Elsevier BV | en_UK |
dc.relation | Gogate M, Dashtipour K, Adeel A & Hussain A (2020) CochleaNet: A Robust Language-independent Audio-Visual Model for Speech Enhancement. Information Fusion, 63, pp. 273-285. https://doi.org/10.1016/j.inffus.2020.04.001 | en_UK |
dc.rights | This item has been embargoed for a period. During the embargo please use the Request a Copy feature at the foot of the Repository record to request a copy directly from the author. You can only request a copy if you wish to use this work for your own research or private study. Accepted refereed manuscript of: Gogate M, Dashtipour K, Adeel A & Hussain A (2020) CochleaNet: A Robust Language-independent Audio-Visual Model for Speech Enhancement. Information Fusion, 63, pp. 273-285. https://doi.org/10.1016/j.inffus.2020.04.001 © 2020, Elsevier. Licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International http://creativecommons.org/licenses/by-nc-nd/4.0/ | en_UK |
dc.rights.uri | http://creativecommons.org/licenses/by-nc-nd/4.0/ | en_UK |
dc.subject | Audio-Visual | en_UK |
dc.subject | Speech Enhancement | en_UK |
dc.subject | Speech Separation | en_UK |
dc.subject | Deep Learning | en_UK |
dc.subject | Real Noisy Audio-Visual Corpus | en_UK |
dc.subject | Speaker Independent | en_UK |
dc.subject | Causal | en_UK |
dc.title | CochleaNet: A Robust Language-independent Audio-Visual Model for Speech Enhancement | en_UK |
dc.type | Journal Article | en_UK |
dc.rights.embargodate | 2021-10-22 | en_UK |
dc.rights.embargoreason | [CochleaNet_2020.pdf] Publisher requires embargo of 18 months after formal publication. | en_UK |
dc.identifier.doi | 10.1016/j.inffus.2020.04.001 | en_UK |
dc.citation.jtitle | Information Fusion | en_UK |
dc.citation.issn | 1566-2535 | en_UK |
dc.citation.volume | 63 | en_UK |
dc.citation.spage | 273 | en_UK |
dc.citation.epage | 285 | en_UK |
dc.citation.publicationstatus | Published | en_UK |
dc.citation.peerreviewed | Refereed | en_UK |
dc.type.status | AM - Accepted Manuscript | en_UK |
dc.contributor.funder | EPSRC Engineering and Physical Sciences Research Council | en_UK |
dc.author.email | kia.dashtipour@stir.ac.uk | en_UK |
dc.citation.date | 21/04/2020 | en_UK |
dc.contributor.affiliation | Edinburgh Napier University | en_UK |
dc.contributor.affiliation | Computing Science | en_UK |
dc.contributor.affiliation | University of Wolverhampton | en_UK |
dc.contributor.affiliation | Edinburgh Napier University | en_UK |
dc.identifier.isi | WOS:000572142800004 | en_UK |
dc.identifier.scopusid | 2-s2.0-85088642963 | en_UK |
dc.identifier.wtid | 1608303 | en_UK |
dc.contributor.orcid | 0000-0001-8651-5117 | en_UK |
dc.date.accepted | 2020-04-11 | en_UK |
dcterms.dateAccepted | 2020-04-11 | en_UK |
dc.date.filedepositdate | 2020-04-27 | en_UK |
dc.relation.funderproject | Towards visually-driven speech enhancement for cognitively-inspired multi-modal hearing-aid devices | en_UK |
dc.relation.funderref | EP/M026981/1 | en_UK |
rioxxterms.apc | not required | en_UK |
rioxxterms.type | Journal Article/Review | en_UK |
rioxxterms.version | AM | en_UK |
local.rioxx.author | Gogate, Mandar| | en_UK |
local.rioxx.author | Dashtipour, Kia|0000-0001-8651-5117 | en_UK |
local.rioxx.author | Adeel, Ahsan| | en_UK |
local.rioxx.author | Hussain, Amir| | en_UK |
local.rioxx.project | EP/M026981/1|Engineering and Physical Sciences Research Council|http://dx.doi.org/10.13039/501100000266 | en_UK |
local.rioxx.freetoreaddate | 2021-10-22 | en_UK |
local.rioxx.licence | http://www.rioxx.net/licenses/under-embargo-all-rights-reserved||2021-10-21 | en_UK |
local.rioxx.licence | http://creativecommons.org/licenses/by-nc-nd/4.0/|2021-10-22| | en_UK |
local.rioxx.filename | CochleaNet_2020.pdf | en_UK |
local.rioxx.filecount | 1 | en_UK |
local.rioxx.source | 1566-2535 | en_UK |
Appears in Collections: | Computing Science and Mathematics Journal Articles |
Files in This Item:
File | Description | Size | Format | |
---|---|---|---|---|
CochleaNet_2020.pdf | Fulltext - Accepted Version | 9.64 MB | Adobe PDF | View/Open |
This item is protected by original copyright
A file in this item is licensed under a Creative Commons License
Items in the Repository are protected by copyright, with all rights reserved, unless otherwise indicated.
The metadata of the records in the Repository are available under the CC0 public domain dedication: No Rights Reserved https://creativecommons.org/publicdomain/zero/1.0/
If you believe that any material held in STORRE infringes copyright, please contact library@stir.ac.uk providing details and we will remove the Work from public display in STORRE and investigate your claim.