Please use this identifier to cite or link to this item: http://hdl.handle.net/1893/32078
Full metadata record
DC Field | Value | Language
dc.contributor.author | Zhang, Xuejie | en_UK
dc.contributor.author | Xu, Yan | en_UK
dc.contributor.author | Abel, Andrew K | en_UK
dc.contributor.author | Smith, Leslie S | en_UK
dc.contributor.author | Watt, Roger | en_UK
dc.contributor.author | Hussain, Amir | en_UK
dc.contributor.author | Gao, Chengxiang | en_UK
dc.date.accessioned | 2020-12-12T01:05:39Z | -
dc.date.available | 2020-12-12T01:05:39Z | -
dc.date.issued | 2020-12 | en_UK
dc.identifier.other | 1367 | en_UK
dc.identifier.uri | http://hdl.handle.net/1893/32078 | -
dc.description.abstract | Extraction of relevant lip features is of continuing interest in the visual speech domain. Using end-to-end feature extraction can produce good results, but at the cost of the results being difficult for humans to comprehend and relate to. We present a new, lightweight feature extraction approach, motivated by human-centric glimpse-based psychological research into facial barcodes, and demonstrate that these simple, easy-to-extract 3D geometric features (produced using Gabor-based image patches) can successfully be used for speech recognition with LSTM-based machine learning. This approach can successfully extract low-dimensionality lip parameters with a minimum of processing. One key difference between these Gabor-based features and other features, such as traditional DCT or the currently fashionable CNN features, is that these are human-centric features that can be visualised and analysed by humans. This means that it is easier to explain and visualise the results. They can also be used for reliable speech recognition, as demonstrated using the Grid corpus. Results for overlapping speakers using our lightweight system gave a recognition rate of over 82%, which compares well to less explainable features in the literature. | en_UK
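The abstract's Gabor-based patch features can be illustrated with a minimal, self-contained sketch: apply a bank of oriented Gabor filters to a lip-region image patch and keep one response energy per orientation as a low-dimensional, human-interpretable feature. All parameters here (kernel size, wavelength, sigma, orientations) are illustrative assumptions, not the pipeline used in the paper.

```python
import numpy as np

def gabor_kernel(size=31, wavelength=10.0, theta=0.0, sigma=5.0, gamma=0.5):
    """Real part of a 2D Gabor filter (illustrative parameters only)."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    # Rotate coordinates so the carrier oscillates along direction theta
    xr = x * np.cos(theta) + y * np.sin(theta)
    yr = -x * np.sin(theta) + y * np.cos(theta)
    envelope = np.exp(-(xr**2 + (gamma * yr)**2) / (2 * sigma**2))
    carrier = np.cos(2 * np.pi * xr / wavelength)
    return envelope * carrier

def patch_response(patch, orientations=(0, np.pi / 4, np.pi / 2, 3 * np.pi / 4)):
    """Low-dimensional feature: one Gabor response energy per orientation."""
    return np.array([
        np.abs(np.sum(patch * gabor_kernel(size=patch.shape[0], theta=t)))
        for t in orientations
    ])

# Toy 31x31 "lip" patch: a horizontal bright bar, mimicking a mouth edge
patch = np.zeros((31, 31))
patch[14:17, :] = 1.0
feats = patch_response(patch)
```

On this toy patch the horizontally oriented structure drives the strongest response in the filter whose carrier varies vertically, so the 4-element feature vector itself reads as "which edge orientations are present", which is the kind of visualisable, explainable feature the abstract contrasts with CNN activations.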
dc.language.iso | en | en_UK
dc.publisher | MDPI | en_UK
dc.relation | Zhang X, Xu Y, Abel AK, Smith LS, Watt R, Hussain A & Gao C (2020) Visual Speech Recognition with Lightweight Psychologically Motivated Gabor Features. Entropy, 22 (12), Art. No.: 1367. https://doi.org/10.3390/e22121367 | en_UK
dc.rights | Copyright 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/). | en_UK
dc.rights.uri | http://creativecommons.org/licenses/by/4.0/ | en_UK
dc.subject | Speech Recognition | en_UK
dc.subject | Image Processing | en_UK
dc.subject | Gabor Features | en_UK
dc.subject | Lip Reading | en_UK
dc.subject | Explainable | en_UK
dc.title | Visual Speech Recognition with Lightweight Psychologically Motivated Gabor Features | en_UK
dc.type | Journal Article | en_UK
dc.rights.embargodate | 2020-12-11 | en_UK
dc.identifier.doi | 10.3390/e22121367 | en_UK
dc.identifier.pmid | 33279914 | en_UK
dc.citation.jtitle | Entropy | en_UK
dc.citation.issn | 1099-4300 | en_UK
dc.citation.volume | 22 | en_UK
dc.citation.issue | 12 | en_UK
dc.citation.publicationstatus | Published | en_UK
dc.citation.peerreviewed | Refereed | en_UK
dc.type.status | VoR - Version of Record | en_UK
dc.contributor.funder | EPSRC Engineering and Physical Sciences Research Council | en_UK
dc.citation.date | 03/12/2020 | en_UK
dc.contributor.affiliation | Xi'an Jiaotong-Liverpool University, China | en_UK
dc.contributor.affiliation | Xi'an Jiaotong-Liverpool University, China | en_UK
dc.contributor.affiliation | Xi'an Jiaotong-Liverpool University, China | en_UK
dc.contributor.affiliation | Computing Science | en_UK
dc.contributor.affiliation | Psychology | en_UK
dc.contributor.affiliation | Edinburgh Napier University | en_UK
dc.contributor.affiliation | Xi'an Jiaotong-Liverpool University, China | en_UK
dc.identifier.isi | WOS:000602727500001 | en_UK
dc.identifier.scopusid | 2-s2.0-85097060874 | en_UK
dc.identifier.wtid | 1683831 | en_UK
dc.contributor.orcid | 0000-0002-3716-8013 | en_UK
dc.contributor.orcid | 0000-0001-8660-1875 | en_UK
dc.contributor.orcid | 0000-0002-3716-8013 | en_UK
dc.contributor.orcid | 0000-0002-3716-8013 | en_UK
dc.contributor.orcid | 0000-0001-8660-1875 | en_UK
dc.contributor.orcid | 0000-0002-3716-8013 | en_UK
dc.contributor.orcid | 0000-0001-8660-1875 | en_UK
dc.date.accepted | 2020-11-23 | en_UK
dcterms.dateAccepted | 2020-11-23 | en_UK
dc.date.filedepositdate | 2020-12-11 | en_UK
dc.relation.funderproject | Towards visually-driven speech enhancement for cognitively-inspired multi-modal hearing-aid devices | en_UK
dc.relation.funderref | EP/M026981/1 | en_UK
dc.subject.tag | Hearing Aids | en_UK
rioxxterms.apc | not required | en_UK
rioxxterms.type | Journal Article/Review | en_UK
rioxxterms.version | VoR | en_UK
local.rioxx.author | Zhang, Xuejie|0000-0002-3716-8013 | en_UK
local.rioxx.author | Xu, Yan|0000-0001-8660-1875 | en_UK
local.rioxx.author | Abel, Andrew K|0000-0002-3716-8013 | en_UK
local.rioxx.author | Smith, Leslie S|0000-0002-3716-8013 | en_UK
local.rioxx.author | Watt, Roger|0000-0001-8660-1875 | en_UK
local.rioxx.author | Hussain, Amir|0000-0002-3716-8013 | en_UK
local.rioxx.author | Gao, Chengxiang|0000-0001-8660-1875 | en_UK
local.rioxx.project | EP/M026981/1|Engineering and Physical Sciences Research Council|http://dx.doi.org/10.13039/501100000266 | en_UK
local.rioxx.freetoreaddate | 2020-12-11 | en_UK
local.rioxx.licence | http://creativecommons.org/licenses/by/4.0/|2020-12-11| | en_UK
local.rioxx.filename | entropy-22-01367.pdf | en_UK
local.rioxx.filecount | 1 | en_UK
local.rioxx.source | 1099-4300 | en_UK
Appears in Collections:Computing Science and Mathematics Journal Articles

Files in This Item:
File | Description | Size | Format
entropy-22-01367.pdf | Fulltext - Published Version | 2.47 MB | Adobe PDF


This item is protected by original copyright



A file in this item is licensed under a Creative Commons License.

Items in the Repository are protected by copyright, with all rights reserved, unless otherwise indicated.

The metadata of the records in the Repository are available under the CC0 public domain dedication: No Rights Reserved https://creativecommons.org/publicdomain/zero/1.0/

If you believe that any material held in STORRE infringes copyright, please contact library@stir.ac.uk providing details and we will remove the Work from public display in STORRE and investigate your claim.