Towards On-line Domain-Independent Big Data Learning: Novel Theories and Applications

Malik, Zeeshan

Please use this identifier to cite or link to this item: http://hdl.handle.net/1893/22591

Appears in Collections:	Computing Science and Mathematics eTheses
Title:	Towards On-line Domain-Independent Big Data Learning: Novel Theories and Applications
Author(s):	Malik, Zeeshan
Supervisor(s):	Hussain, Amir; Cairns, David
Keywords:	Slow Feature Analysis; Canonical Correlation Analysis; Laplacian eigenmaps; Linear Discriminant Analysis; Invariance; Dimensionality Reduction; Feature Extraction; Domain Independent Learning; Incremental Learning; Concept Drifts
Issue Date:	19-Oct-2015
Publisher:	University of Stirling
Citation:	Malik, Zeeshan Khawar, Amir Hussain, and Jonathan Wu. "Novel biologically inspired approaches to extracting online information from temporal data." Cognitive Computation 6.3 (2014): 595-607.
	Malik, Zeeshan Khawar, Amir Hussain, and Jonathan Wu. "An online generalized eigenvalue version of Laplacian Eigenmaps for visual big data." Neurocomputing 173 (2016): 127-136.
Abstract:	Feature extraction is an important pre-processing step in pattern recognition and machine learning problems. This thesis investigates how features can best be extracted from data in an entirely online and purely adaptive manner. Solutions are given for both labeled and unlabeled datasets by presenting a number of novel on-line learning approaches. Specifically, the differential equation method for solving the generalized eigenvalue problem is used to derive a number of novel machine learning and feature extraction algorithms.
In the first key contribution, the incremental eigen-solution method is used to derive a novel incremental extension of linear discriminant analysis (LDA). The proposed incremental version is then combined with the extreme learning machine (ELM), which is used as a pre-processor before learning. The dynamic random expansion characteristic of ELM, combined with the proposed incremental LDA technique, is shown to offer a significant improvement in maximizing the discrimination between points in two different classes while minimizing the distance within each class, in comparison with other standard state-of-the-art incremental and batch techniques.
In the second contribution, the differential equation method for solving the generalized eigenvalue problem is used to derive a novel state-of-the-art purely incremental version of the slow feature analysis (SFA) algorithm, termed the generalized eigenvalue based slow feature analysis (GENEIGSFA) technique. The time series expansions of the echo state network (ESN) and radial basis functions (RBF) are used as a pre-processor before learning. In addition, higher order derivatives are used as a smoothing constraint on the output signal. Finally, an online extension of the generalized eigenvalue problem, derived from James Stone’s criterion, is tested, evaluated and compared with the standard batch version of the slow feature analysis technique to demonstrate its comparative effectiveness.
In the third contribution, light-weight extensions of the statistical technique known as canonical correlation analysis (CCA), for both twinned and multiple data streams, are derived using the same method of solving the generalized eigenvalue problem. The proposed method is further enhanced by maximizing the covariance between data streams while simultaneously maximizing the rate of change of variances within each data stream. The recurrent connections of an ESN are used as a pre-processor between the inputs and the canonical projections in order to capture shared temporal information in two or more data streams.
A solution to the problem of identifying a low dimensional manifold in a high dimensional data space is then presented in an incremental and adaptive manner. Finally, an online, locally optimized extension of Laplacian Eigenmaps is derived, termed the generalized incremental Laplacian eigenmaps technique (GENILE). In addition to the benefits of its incremental nature, the projections produced by this manifold based dimensionality reduction technique are shown, in most cases, to yield better classification accuracy than standard batch versions of these techniques, on both artificial and real datasets.
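Illustrative note (not part of the thesis record): the batch slow feature analysis that the abstract uses as a baseline can be posed as a generalized eigenvalue problem A w = λ B w, where A is the covariance of the signal's temporal differences and B is the covariance of the signal itself. The Python sketch below shows that standard batch formulation on a toy mixture of a slow and a fast sinusoid. It is not the thesis's online differential-equation method; the function names `generalized_eigh` and `batch_linear_sfa`, the toy signals, and the mixing matrix are all assumptions made for illustration.

```python
import numpy as np


def generalized_eigh(A, B):
    """Solve A w = lambda B w for symmetric A and symmetric positive-definite B.

    Reduces the problem to a standard symmetric one via the Cholesky
    factor of B. Eigenvalues are returned in ascending order and the
    eigenvectors (columns of W) satisfy W.T @ B @ W = I.
    """
    L = np.linalg.cholesky(B)          # B = L L^T
    L_inv = np.linalg.inv(L)
    C = L_inv @ A @ L_inv.T            # standard problem C v = lambda v
    C = (C + C.T) / 2.0                # remove round-off asymmetry
    eigvals, V = np.linalg.eigh(C)
    W = L_inv.T @ V                    # map back: w = L^{-T} v
    return eigvals, W


def batch_linear_sfa(X):
    """Batch linear SFA: rows of X are time steps, columns are channels.

    Returns eigenvalues (slowness, ascending) and projection vectors,
    slowest feature first.
    """
    Xc = X - X.mean(axis=0)
    dX = np.diff(Xc, axis=0)           # finite-difference time derivative
    A = dX.T @ dX / len(dX)            # covariance of the derivative
    B = Xc.T @ Xc / len(Xc)            # covariance of the signal
    return generalized_eigh(A, B)


# Toy data (hypothetical): two sources mixed linearly into two channels.
t = np.linspace(0.0, 2.0 * np.pi, 2000)
slow = np.sin(t)
fast = np.sin(20.0 * t)
sources = np.column_stack([slow, fast])
mixing = np.array([[1.0, 0.5],
                   [0.3, 1.0]])
X = sources @ mixing.T

eigvals, W = batch_linear_sfa(X)
y = (X - X.mean(axis=0)) @ W[:, 0]     # slowest extracted feature
corr = np.corrcoef(y, slow)[0, 1]
print("abs. correlation of slowest feature with slow source:", abs(corr))
```

The sign of an extracted feature is arbitrary, so the comparison uses the absolute correlation. An online variant would update the projection vectors sample by sample rather than forming A and B from the whole series.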
Type:	Thesis or Dissertation
URI:	http://hdl.handle.net/1893/22591

Files in This Item:

File	Description	Size	Format
thesis.pdf	Final Version of Thesis	3.68 MB	Adobe PDF

This item is protected by original copyright

Items in the Repository are protected by copyright, with all rights reserved, unless otherwise indicated.

The metadata of the records in the Repository are available under the CC0 public domain dedication: No Rights Reserved https://creativecommons.org/publicdomain/zero/1.0/

If you believe that any material held in STORRE infringes copyright, please contact library@stir.ac.uk providing details and we will remove the Work from public display in STORRE and investigate your claim.

STORRE: Stirling Online Research Repository