Please use this identifier to cite or link to this item: http://hdl.handle.net/1893/252
Full metadata record
DC Field | Value | Language
dc.contributor.advisor | Hussain, Amir | -
dc.contributor.author | Abdullah, Ahsan | -
dc.date.accessioned | 2007-12-19T11:05:19Z | -
dc.date.available | 2007-12-19T11:05:19Z | -
dc.date.issued | 2007 | -
dc.identifier.citation | A. Abdullah & A. Hussain, Using biclustering for automatic Attribute Selection to Enhance Global Visualization, Springer Verlag Lecture Notes in Computer Science, 4370 (2007), 35-47. | en
dc.identifier.uri | http://hdl.handle.net/1893/252 | -
dc.description.abstract | Our ability and capacity to generate, record and store multi-dimensional, apparently unstructured data is increasing rapidly, while the cost of data storage is going down. The recorded data are not perfect, because noise is introduced from different sources; basic forms of noise include incorrectly recorded values and missing values. The formal study of discovering useful hidden information in data is called Data Mining. Because of the size and complexity of the problem, practical data mining problems are best attempted using automatic means. Data mining can be categorized into two types: supervised learning (classification) and unsupervised learning (clustering). Clustering only the records in a database (or data matrix) gives a global view of the data and is called one-way clustering. A detailed analysis or local view requires biclustering (also called co-clustering or two-way clustering), which clusters the records and the attributes simultaneously. In this dissertation, a novel, fast and white-noise-tolerant data mining solution based on the Crossing Minimization (CM) paradigm is proposed; the solution works for one-way clustering as well as for two-way clustering, where it discovers overlapping biclusters. For decades, the CM paradigm has been used in graph drawing and in VLSI (Very Large Scale Integration) circuit design to reduce wire length and congestion. The utility of the proposed technique is demonstrated by comparing it with other biclustering techniques on simulated noisy data as well as real data from agriculture, biology and other domains. Two other interesting and hard problems addressed in this dissertation are (i) the Minimum Attribute Subset Selection (MASS) problem and (ii) the Bandwidth Minimization (BWM) problem for sparse matrices. On real public-domain data, the proposed CM technique is shown to give convincing results for both problems.
Pakistan is the fourth-largest supplier of cotton in the world. During 1989-97, an apparent anomaly was observed in Pakistan between cotton yield and pesticide consumption, with unexpected periods of negative correlation. By applying the CM technique developed in this thesis for one-way clustering to real Agro-Met data (2001-2002), the thesis presents a possible explanation of the anomaly. | en
dc.language.iso | en | en
dc.publisher | University of Stirling | en
dc.subject | data mining | en
dc.subject | crossing minimization | en
dc.subject | biclustering | en
dc.subject | agriculture | en
dc.subject.lcsh | Data mining | en
dc.subject.lcsh | Information retrieval | en
dc.subject.lcsh | Cotton trade Pakistan | en
dc.title | Data Mining Using the Crossing Minimization Paradigm | en
dc.type | Thesis or Dissertation | en
dc.type.qualificationlevel | Doctoral | en
dc.type.qualificationname | Doctor of Philosophy | en
dc.contributor.affiliation | School of Natural Sciences | -
dc.contributor.affiliation | Computing Science and Mathematics | -
Appears in Collections: Computing Science and Mathematics eTheses
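The abstract names the Crossing Minimization (CM) paradigm from graph drawing but does not describe a specific algorithm. As a hedged illustration only, and not the method of the thesis, the sketch below applies the classic barycenter heuristic for two-layer crossing minimization to a binary data matrix: records form one layer, attributes form the other, and each 1 is an edge. Alternately sorting each layer by the mean position of its neighbours on the other layer tends to reduce edge crossings and, as a side effect, places records that share attributes next to each other, so block (bicluster-like) structure becomes visible. The function names, the 0/1 matrix encoding and the fixed number of sweeps are assumptions made for this example.

```python
from typing import List, Tuple

# Assumed encoding for this sketch: a 0/1 data matrix, rows = records, columns = attributes.
Matrix = List[List[int]]


def count_crossings(m: Matrix, row_order: List[int], col_order: List[int]) -> int:
    """Count crossings in the two-layer drawing of the bipartite graph of m.

    Rows sit on one layer and columns on the other, in the given orders; each
    1 at m[r][c] is an edge r--c. Two edges cross when their endpoints appear
    in opposite relative order on the two layers. O(E^2), fine for a sketch.
    """
    rpos = {r: i for i, r in enumerate(row_order)}
    cpos = {c: j for j, c in enumerate(col_order)}
    edges = [(rpos[r], cpos[c])
             for r in range(len(m)) for c in range(len(m[0])) if m[r][c]]
    crossings = 0
    for a in range(len(edges)):
        for b in range(a + 1, len(edges)):
            (r1, c1), (r2, c2) = edges[a], edges[b]
            if (r1 - r2) * (c1 - c2) < 0:
                crossings += 1
    return crossings


def reorder_layer(m: Matrix, moving: List[int], fixed: List[int],
                  moving_is_rows: bool) -> List[int]:
    """Sort the moving layer by the barycenter (mean position) of its neighbours
    on the fixed layer. A node without neighbours uses its current position as
    its barycenter; ties are broken by current position for a deterministic sort."""
    fpos = {v: i for i, v in enumerate(fixed)}

    def key(item: Tuple[int, int]) -> Tuple[float, int]:
        i, v = item
        if moving_is_rows:
            nb = [fpos[c] for c in fixed if m[v][c]]
        else:
            nb = [fpos[r] for r in fixed if m[r][v]]
        return (sum(nb) / len(nb) if nb else float(i), i)

    return [v for _, v in sorted(enumerate(moving), key=key)]


def barycenter_heuristic(m: Matrix, sweeps: int = 4) -> Tuple[List[int], List[int]]:
    """Alternate row and column barycenter sorts for a fixed number of sweeps."""
    rows = list(range(len(m)))
    cols = list(range(len(m[0])))
    for _ in range(sweeps):
        rows = reorder_layer(m, rows, cols, moving_is_rows=True)
        cols = reorder_layer(m, cols, rows, moving_is_rows=False)
    return rows, cols


if __name__ == "__main__":
    # Two interleaved groups: records 0 and 2 share attributes 0 and 2,
    # records 1 and 3 share attributes 1 and 3.
    m = [[1, 0, 1, 0],
         [0, 1, 0, 1],
         [1, 0, 1, 0],
         [0, 1, 0, 1]]
    print(count_crossings(m, [0, 1, 2, 3], [0, 1, 2, 3]))  # 8
    rows, cols = barycenter_heuristic(m)
    print(rows, cols)                      # [0, 2, 1, 3] [0, 2, 1, 3]
    print(count_crossings(m, rows, cols))  # 2
    for r in rows:
        print([m[r][c] for c in cols])     # two 2 x 2 blocks of ones
```

On this toy matrix the original order has 8 crossings; after reordering, rows and columns are both permuted to [0, 2, 1, 3], the matrix splits into two 2 × 2 blocks of ones, and only 2 crossings remain (one unavoidable crossing inside each block). Real CM-based biclustering would need further steps, such as extracting the blocks and handling noise, which this sketch does not attempt.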

Files in This Item:
File | Description | Size | Format
Thesis-AhsanAbdullah-2007.pdf | | 1.53 MB | Adobe PDF

This item is protected by original copyright

Items in the Repository are protected by copyright, with all rights reserved, unless otherwise indicated.

If you believe that any material held in STORRE infringes copyright, please contact library@stir.ac.uk providing details and we will remove the Work from public display in STORRE and investigate your claim.