Please use this identifier to cite or link to this item: http://hdl.handle.net/1893/25345
Full metadata record
DC Field | Value | Language
dc.contributor.advisor | Hussain, Amir | -
dc.contributor.author | Minhas, Saliha Z | -
dc.date.accessioned | 2017-05-16T13:24:49Z | -
dc.date.issued | 2016-11-18 | -
dc.identifier.uri | http://hdl.handle.net/1893/25345 | -
dc.description.abstract | Financial fraud rampages onwards, seemingly uncontained. The annual cost of fraud in the UK is estimated to be as high as £193bn [1]. Taking a data science perspective that has hitherto been less explored, this thesis demonstrates how the use of linguistic features to drive data mining algorithms can aid in unravelling fraud. To this end, the spotlight is turned on Financial Statement Fraud (FSF), known to be the costliest type of fraud [2]. A new corpus of 6.3 million words is composed of 102 annual reports/10-Ks (narrative sections) from firms formally indicted for FSF, juxtaposed with 306 non-fraud firms of similar size and industrial grouping. Unlike other similar studies, this thesis takes a wide-angled view and extracts a range of features of different categories from the corpus. These linguistic correlates of deception are uncovered using a variety of techniques and tools. Corpus linguistics methodology is applied to extract keywords and to examine linguistic structure. N-grams are extracted to draw out collocations. Readability measurement in financial text is advanced through the extraction of new indices that probe the text at a deeper level. Cognitive and perceptual processes are also picked out. Tone, intention and liquidity are gauged using customised word lists. Linguistic ratios are derived from grammatical constructs and word categories. An attempt is also made to determine ‘what’ was said as opposed to ‘how’. Further, a new module is developed to condense synonyms into concepts. Lastly, frequency counts of keywords unearthed in a previous content analysis study on financial narrative are also used. These features are then used to drive machine learning based classification and clustering algorithms to determine whether they aid in discriminating a fraud firm from a non-fraud firm. The models built typically exceed a classification accuracy of 70%. The above process is amalgamated into a framework. The process outlined, driven by empirical data, demonstrates in a practical way how linguistic analysis could aid in fraud detection, and constitutes a unique contribution to deception detection studies. | en_GB
dc.language.iso | en | en_GB
dc.publisher | University of Stirling | en_GB
dc.subject | Machine Learning | en_GB
dc.subject | Financial Statement Fraud | en_GB
dc.subject | Classification | en_GB
dc.subject | Clustering | en_GB
dc.subject | Language | en_GB
dc.subject | Readability | en_GB
dc.subject | Corpus Linguistics | en_GB
dc.subject | Financial Fraud | en_GB
dc.subject | Deception Detection | en_GB
dc.subject | Unstructured Text | en_GB
dc.subject.lcsh | Language and computers Data processing | en_GB
dc.subject.lcsh | Corpora (Linguistics) Data processing | en_GB
dc.subject.lcsh | Misleading financial statements | en_GB
dc.subject.lcsh | Fraud | en_GB
dc.title | A Corpus Driven Computational Intelligence Framework for Deception Detection in Financial Text | en_GB
dc.type | Thesis or Dissertation | en_GB
dc.type.qualificationlevel | Doctoral | en_GB
dc.type.qualificationname | Doctor of Philosophy | en_GB
dc.rights.embargodate | 2017-09-01 | -
dc.rights.embargoreason | I would like to publish one last paper on the thesis. | en_GB
dc.author.email | saliha.minhas@gmail.com | en_GB
dc.rights.embargoterms | 2017-09-02 | en_GB
dc.rights.embargoliftdate | 2017-09-02 | -
Appears in Collections: Computing Science and Mathematics eTheses
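
The abstract describes extracting n-grams and word-list based measures from report narratives before classification. As a rough illustration only, the sketch below shows what two such features might look like in Python. It is not the thesis's implementation: the function names and the tiny `NEGATIVE_TONE` list are hypothetical, and the customised word lists used in the thesis are not reproduced here.

```python
import re
from collections import Counter

# Hypothetical mini word list for illustration; not taken from the thesis.
NEGATIVE_TONE = {"loss", "decline", "impairment", "litigation"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def bigram_counts(tokens):
    """Count adjacent word pairs (2-grams), a simple proxy for collocations."""
    return Counter(zip(tokens, tokens[1:]))


def wordlist_ratio(tokens, wordlist):
    """Share of tokens that appear in the word list (0.0 for empty input)."""
    if not tokens:
        return 0.0
    return sum(1 for t in tokens if t in wordlist) / len(tokens)


def extract_features(text):
    """Build a small feature dictionary for one narrative section."""
    tokens = tokenize(text)
    return {
        "n_tokens": len(tokens),
        "negative_tone_ratio": wordlist_ratio(tokens, NEGATIVE_TONE),
        "top_bigrams": bigram_counts(tokens).most_common(3),
    }
```

In a pipeline of the kind the abstract outlines, feature dictionaries like this would be computed per report and then passed to a classifier or clustering algorithm; that step is omitted here.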

Files in This Item:
File | Description | Size | Format
FINAL- MAIN.pdf | The Main Thesis - Vol 1 | 5.25 MB | Adobe PDF
FINAL - APPENDICES.pdf | Appendices - Vol 2 | 13.59 MB | Adobe PDF


This item is protected by original copyright



Items in the Repository are protected by copyright, with all rights reserved, unless otherwise indicated.

The metadata of the records in the Repository are available under the CC0 public domain dedication: No Rights Reserved https://creativecommons.org/publicdomain/zero/1.0/

If you believe that any material held in STORRE infringes copyright, please contact library@stir.ac.uk providing details and we will remove the Work from public display in STORRE and investigate your claim.