Data Mining Using the Crossing Minimization Paradigm

Abdullah, Ahsan

Please use this identifier to cite or link to this item: http://hdl.handle.net/1893/252

Full metadata record

DC Field	Value	Language
dc.contributor.advisor	Hussain, Amir	-
dc.contributor.author	Abdullah, Ahsan	-
dc.date.accessioned	2007-12-19T11:05:19Z	-
dc.date.available	2007-12-19T11:05:19Z	-
dc.date.issued	2007	-
dc.identifier.citation	A. Abdullah & A. Hussain, Using biclustering for automatic Attribute Selection to Enhance Global Visualization, Springer Verlag Lecture Notes in Computer Science, 4370 (2007), 35-47.	en
dc.identifier.uri	http://hdl.handle.net/1893/252	-
dc.description.abstract	Our ability and capacity to generate, record and store multi-dimensional, apparently unstructured data is increasing rapidly, while the cost of data storage is going down. The recorded data is not perfect, as noise is introduced into it from different sources. Some of the basic forms of noise are incorrectly recorded values and missing values. The formal study of discovering useful hidden information in data is called Data Mining. Because of the size and complexity of the problem, practical data mining problems are best attempted using automatic means. Data Mining can be categorized into two types: supervised learning (classification) and unsupervised learning (clustering). Clustering only the records in a database (or data matrix) gives a global view of the data and is called one-way clustering. For a detailed analysis or a local view, biclustering (also called co-clustering or two-way clustering) is required, involving the simultaneous clustering of the records and the attributes. In this dissertation, a novel, fast and white-noise-tolerant data mining solution is proposed based on the Crossing Minimization (CM) paradigm; the solution works for one-way as well as two-way clustering for discovering overlapping biclusters. For decades, the CM paradigm has been used in graph drawing and VLSI (Very Large Scale Integration) circuit design to reduce wire length and congestion. The utility of the proposed technique is demonstrated by comparing it with other biclustering techniques using simulated noisy data as well as real data from Agriculture, Biology and other domains. Two other interesting and hard problems also addressed in this dissertation are (i) the Minimum Attribute Subset Selection (MASS) problem and (ii) the Bandwidth Minimization (BWM) problem of sparse matrices. The proposed CM technique is demonstrated to provide very convincing results on these problems using real public domain data.
Pakistan is the fourth largest supplier of cotton in the world. An apparent anomaly was observed during 1989-97 between cotton yield and pesticide consumption in Pakistan, showing unexpected periods of negative correlation. Applying the indigenous CM technique for one-way clustering to real Agro-Met data (2001-2002) yields a possible explanation of the anomaly, which is presented in this thesis.	en
dc.language.iso	en	en
dc.publisher	University of Stirling	en
dc.subject	data mining	en
dc.subject	crossing minimization	en
dc.subject	biclustering	en
dc.subject	agriculture	en
dc.subject.lcsh	Data mining	en
dc.subject.lcsh	Information retrieval	en
dc.subject.lcsh	Cotton trade Pakistan	en
dc.title	Data Mining Using the Crossing Minimization Paradigm	en
dc.type	Thesis or Dissertation	en
dc.type.qualificationlevel	Doctoral	en
dc.type.qualificationname	Doctor of Philosophy	en
dc.contributor.affiliation	School of Natural Sciences	-
dc.contributor.affiliation	Computing Science and Mathematics	-
Appears in Collections:	Computing Science and Mathematics eTheses

Files in This Item:

File	Description	Size	Format
Thesis-AhsanAbdullah-2007.pdf		1.53 MB	Adobe PDF

This item is protected by original copyright

Items in the Repository are protected by copyright, with all rights reserved, unless otherwise indicated.

The metadata of the records in the Repository are available under the CC0 public domain dedication: No Rights Reserved https://creativecommons.org/publicdomain/zero/1.0/

If you believe that any material held in STORRE infringes copyright, please contact library@stir.ac.uk providing details and we will remove the Work from public display in STORRE and investigate your claim.

STORRE: Stirling Online Research Repository