Please use this identifier to cite or link to this item:
http://hdl.handle.net/1893/15989
Full metadata record
DC Field | Value | Language |
---|---|---|
dc.contributor.advisor | Hussain, Amir | - |
dc.contributor.advisor | Smith, Leslie | - |
dc.contributor.author | Abel, Andrew | - |
dc.date.accessioned | 2013-07-26T10:31:33Z | - |
dc.date.available | 2013-07-26T10:31:33Z | - |
dc.date.issued | 2013 | - |
dc.identifier.citation | A. Abel, A. Hussain, Q.D. Nguyen, F. Ringeval, M. Chetouani, and M. Milgram. Maximising audiovisual correlation with automatic lip tracking and vowel based segmentation. In Biometric ID Management and Multimodal Communication: Joint COST 2101 and 2102 International Conference, BioID_MultiComm 2009, Madrid, Spain, September 16-18, 2009, Proceedings, volume 5707, pages 65--72. Springer-Verlag, 2009. | en_GB |
dc.identifier.citation | S. Cifani, A. Abel, A. Hussain, S. Squartini, and F. Piazza. An investigation into audiovisual speech correlation in reverberant noisy environments. In Cross-Modal Analysis of Speech, Gestures, Gaze and Facial Expressions: COST Action 2102 International Conference Prague, Czech Republic, October 15-18, 2008 Revised Selected and Invited Papers, volume 5641, pages 331--343. Springer-Verlag, 2009. | en_GB |
dc.identifier.uri | http://hdl.handle.net/1893/15989 | - |
dc.description.abstract | This thesis presents a novel two-stage multimodal speech enhancement system that makes use of both visual and audio information to filter speech, and explores the extension of this system with fuzzy logic as a proof of concept for an envisaged autonomous, adaptive, and context-aware multimodal system. The proposed cognitively inspired framework is designed to be scalable: the techniques used in individual parts of the system can be upgraded, and there is scope for the initial framework presented here to be expanded. In the proposed system, the concept of single-modality two-stage filtering is extended to include the visual modality. Noisy speech received by a microphone array is first pre-processed by visually derived Wiener filtering, employing the novel use of the Gaussian Mixture Regression (GMR) technique on associated visual speech information extracted with a state-of-the-art Semi Adaptive Appearance Models (SAAM) based lip tracking approach (see the illustrative sketch following this metadata table). The pre-processed speech is then enhanced further by audio-only beamforming using a state-of-the-art Transfer Function Generalised Sidelobe Canceller (TFGSC) approach. The resulting system is designed to function in challenging noisy speech environments, evaluated using speech sentences from different speakers in the GRID corpus mixed with a range of noise recordings. Both objective and subjective test results, employing the widely used Perceptual Evaluation of Speech Quality (PESQ) measure, a composite objective measure, and subjective listening tests, show that this initial system delivers very encouraging results when filtering speech mixtures in difficult reverberant environments. Some limitations of this initial framework are identified, and the extension of the multimodal system is explored through the development of a fuzzy logic based framework and a proof-of-concept demonstration. Results show that the proposed autonomous, adaptive, and context-aware multimodal framework delivers very positive results in difficult noisy speech environments, making cognitively inspired use of audio and visual information depending on environmental conditions. Finally, concluding remarks are made along with proposals for future work. | en_GB |
dc.language.iso | en | en_GB |
dc.publisher | University of Stirling | en_GB |
dc.subject | audiovisual | en_GB |
dc.subject | speech | en_GB |
dc.subject | filtering | en_GB |
dc.subject | multimodal | en_GB |
dc.subject | fuzzy logic | en_GB |
dc.subject.lcsh | Computational linguistics | en_GB |
dc.subject.lcsh | Human-computer interaction | en_GB |
dc.subject.lcsh | Human-machine systems | en_GB |
dc.title | Towards An Intelligent Fuzzy Based Multimodal Two Stage Speech Enhancement System | en_GB |
dc.type | Thesis or Dissertation | en_GB |
dc.type.qualificationlevel | Doctoral | en_GB |
dc.type.qualificationname | Doctor of Philosophy | en_GB |
dc.rights.embargodate | 2014-08-01 | - |
dc.rights.embargoreason | I intend to write another journal paper based on one of my thesis chapters. | en_GB |
dc.author.email | aka@cs.stir.ac.uk | en_GB |
dc.contributor.affiliation | School of Natural Sciences | en_GB |
dc.contributor.affiliation | Computing Science and Mathematics | en_GB |
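As a companion to the abstract above, here is a minimal, hedged Python sketch of the first enhancement stage it describes: Gaussian Mixture Regression (GMR) mapping visual lip features to an estimate of the clean-speech spectrum, which then parameterises a per-bin Wiener gain. This is an illustrative sketch, not the thesis implementation: the function names (`gmr_estimate`, `wiener_gain`), all dimensions, and the toy GMM parameters are assumptions invented for demonstration.

```python
# Illustrative sketch of visually derived GMR-based Wiener filtering.
# NOT the thesis code: names, shapes, and parameters are assumed.
import numpy as np

def gmr_estimate(v, priors, mu_v, mu_a, cov_vv, cov_av):
    """Conditional mean E[a | v] under a joint GMM over stacked (visual, audio) vectors."""
    K, Da = mu_a.shape
    resp = np.empty(K)
    cond = np.empty((K, Da))
    for k in range(K):
        diff = v - mu_v[k]
        inv = np.linalg.inv(cov_vv[k])
        # Unnormalised Gaussian density of v under component k (shared constants cancel)
        resp[k] = priors[k] * np.exp(-0.5 * diff @ inv @ diff) / np.sqrt(np.linalg.det(cov_vv[k]))
        # Conditional mean of the audio part given v, for component k
        cond[k] = mu_a[k] + cov_av[k] @ inv @ diff
    resp /= resp.sum()              # posterior component responsibilities w_k(v)
    return resp @ cond              # responsibility-weighted mix of conditional means

def wiener_gain(clean_power, noise_power, floor=1e-3):
    """Classic per-bin Wiener gain S / (S + N), floored to limit distortion."""
    return np.maximum(clean_power / (clean_power + noise_power), floor)

# Toy usage with random (assumed) GMM parameters: Dv visual dims, Da spectral bins.
rng = np.random.default_rng(0)
K, Dv, Da = 4, 6, 64
priors = np.full(K, 1.0 / K)
mu_v = rng.normal(size=(K, Dv))
mu_a = rng.uniform(0.5, 2.0, size=(K, Da))          # positive "clean power" means
cov_vv = np.stack([np.eye(Dv)] * K)
cov_av = rng.normal(scale=0.1, size=(K, Da, Dv))    # audio-visual cross-covariances
v_frame = rng.normal(size=Dv)                       # lip features for one frame
s_hat = np.abs(gmr_estimate(v_frame, priors, mu_v, mu_a, cov_vv, cov_av))
gain = wiener_gain(s_hat, noise_power=np.ones(Da))  # crude fixed noise-power estimate
```

In the pipeline the abstract describes, such a gain would be applied to the noisy spectrum before the second, audio-only TFGSC beamforming stage; the sketch deliberately omits the STFT framing, the SAAM lip tracker, and the beamformer itself.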
Appears in Collections: Computing Science and Mathematics eTheses
Files in This Item:
File | Description | Size | Format |
---|---|---|---|
LowColThesisNoDed.pdf | | 5.55 MB | Adobe PDF |
This item is protected by original copyright
Items in the Repository are protected by copyright, with all rights reserved, unless otherwise indicated.
The metadata of the records in the Repository are available under the CC0 public domain dedication (No Rights Reserved): https://creativecommons.org/publicdomain/zero/1.0/
If you believe that any material held in STORRE infringes copyright, please contact library@stir.ac.uk providing details and we will remove the Work from public display in STORRE and investigate your claim.