Please use this identifier to cite or link to this item: http://hdl.handle.net/1893/28200
Full metadata record
DC Field | Value | Language
dc.contributor.author | Gogate, Mandar | en_UK
dc.contributor.author | Adeel, Ahsan | en_UK
dc.contributor.author | Marxer, Ricard | en_UK
dc.contributor.author | Barker, Jon | en_UK
dc.contributor.author | Hussain, Amir | en_UK
dc.date.accessioned | 2018-11-10T01:01:11Z | -
dc.date.available | 2018-11-10T01:01:11Z | -
dc.date.issued | 2018-09-02 | en_UK
dc.identifier.uri | http://hdl.handle.net/1893/28200 | -
dc.description.abstract | The human auditory cortex excels at selectively suppressing background noise to focus on a target speaker. The process of selective attention in the brain is known to contextually exploit the available audio and visual cues to better focus on the target speaker while filtering out other noises. In this study, we propose a novel deep neural network (DNN) based audiovisual (AV) mask estimation model. The proposed AV mask estimation model contextually integrates the temporal dynamics of both audio and noise-immune visual features for improved mask estimation and speech separation. For optimal AV feature extraction and ideal binary mask (IBM) estimation, a hybrid DNN architecture is exploited to leverage the complementary strengths of a stacked long short-term memory (LSTM) network and a convolution LSTM network. The comparative simulation results in terms of speech quality and intelligibility demonstrate a significant performance improvement of our proposed AV mask estimation model compared to audio-only and visual-only mask estimation approaches in both speaker-dependent and speaker-independent scenarios. | en_UK
dc.language.iso | en | en_UK
dc.publisher | ISCA | en_UK
dc.relation | Gogate M, Adeel A, Marxer R, Barker J & Hussain A (2018) DNN Driven Speaker Independent Audio-Visual Mask Estimation for Speech Separation. In: Proceedings of the Annual Conference of the International Speech Communication Association. Interspeech 2018, 02.09.2018-06.09.2018. Baixas, France: ISCA, pp. 2723-2727. https://doi.org/10.21437/Interspeech.2018-2516 | en_UK
dc.rights | Publisher policy allows this work to be made available in this repository. Published in Proceedings of Interspeech 2018 by ISCA. The original publication is available at: https://doi.org/10.21437/Interspeech.2018-2516. | en_UK
dc.subject | Speech Separation | en_UK
dc.subject | Binary Mask Estimation | en_UK
dc.subject | Deep Neural Network | en_UK
dc.subject | Speech Enhancement | en_UK
dc.title | DNN Driven Speaker Independent Audio-Visual Mask Estimation for Speech Separation | en_UK
dc.type | Conference Paper | en_UK
dc.identifier.doi | 10.21437/Interspeech.2018-2516 | en_UK
dc.citation.issn | 2308-457X | en_UK
dc.citation.spage | 2723 | en_UK
dc.citation.epage | 2727 | en_UK
dc.citation.publicationstatus | Published | en_UK
dc.type.status | VoR - Version of Record | en_UK
dc.contributor.funder | Engineering and Physical Sciences Research Council | en_UK
dc.citation.btitle | Proceedings of the Annual Conference of the International Speech Communication Association | en_UK
dc.citation.conferencedates | 2018-09-02 - 2018-09-06 | en_UK
dc.citation.conferencename | Interspeech 2018 | en_UK
dc.citation.date | 02/09/2018 | en_UK
dc.publisher.address | Baixas, France | en_UK
dc.contributor.affiliation | Computing Science | en_UK
dc.contributor.affiliation | Computing Science | en_UK
dc.contributor.affiliation | Aix-Marseille University | en_UK
dc.contributor.affiliation | Computing Science | en_UK
dc.identifier.scopusid | 2-s2.0-85054957432 | en_UK
dc.identifier.wtid | 1044899 | en_UK
dc.contributor.orcid | 0000-0003-1712-9014 | en_UK
dc.contributor.orcid | 0000-0002-8080-082X | en_UK
dc.date.accepted | 2018-06-03 | en_UK
dcterms.dateAccepted | 2018-06-03 | en_UK
dc.date.filedepositdate | 2018-11-09 | en_UK
dc.relation.funderproject | Towards visually-driven speech enhancement for cognitively-inspired multi-modal hearing-aid devices | en_UK
dc.relation.funderref | EP/M026981/1 | en_UK
rioxxterms.apc | not required | en_UK
rioxxterms.type | Conference Paper/Proceeding/Abstract | en_UK
rioxxterms.version | VoR | en_UK
local.rioxx.author | Gogate, Mandar|0000-0003-1712-9014 | en_UK
local.rioxx.author | Adeel, Ahsan| | en_UK
local.rioxx.author | Marxer, Ricard| | en_UK
local.rioxx.author | Barker, Jon| | en_UK
local.rioxx.author | Hussain, Amir|0000-0002-8080-082X | en_UK
local.rioxx.project | EP/M026981/1|Engineering and Physical Sciences Research Council|http://dx.doi.org/10.13039/501100000266 | en_UK
local.rioxx.freetoreaddate | 2018-11-09 | en_UK
local.rioxx.licence | http://www.rioxx.net/licenses/all-rights-reserved|2018-11-09| | en_UK
local.rioxx.filename | 2516.pdf | en_UK
local.rioxx.filecount | 1 | en_UK
local.rioxx.source | 2308-457X | en_UK
Appears in Collections: Computing Science and Mathematics Conference Papers and Proceedings

Files in This Item:
File | Description | Size | Format
2516.pdf | Fulltext - Published Version | 556.07 kB | Adobe PDF

This item is protected by original copyright

Items in the Repository are protected by copyright, with all rights reserved, unless otherwise indicated.

The metadata of the records in the Repository are available under the CC0 public domain dedication: No Rights Reserved https://creativecommons.org/publicdomain/zero/1.0/

If you believe that any material held in STORRE infringes copyright, please contact library@stir.ac.uk providing details and we will remove the Work from public display in STORRE and investigate your claim.