Comparing Oversampling Techniques to Handle the Class Imbalance Problem: A Customer Churn Prediction Case Study

Amin, Adnan; Anwar, Sajid; Adnan, Awais; Nawaz, Muhammad; Howard, Newton; Qadir, Junaid; Hawalah, Ahmad Y A; Hussain, Amir

doi:10.1109/access.2016.2619719

Please use this identifier to cite or link to this item: http://hdl.handle.net/1893/24917

Appears in Collections:	Computing Science and Mathematics Journal Articles
Peer Review Status:	Refereed
Title:	Comparing Oversampling Techniques to Handle the Class Imbalance Problem: A Customer Churn Prediction Case Study
Author(s):	Amin, Adnan; Anwar, Sajid; Adnan, Awais; Nawaz, Muhammad; Howard, Newton; Qadir, Junaid; Hawalah, Ahmad Y A; Hussain, Amir
Contact Email:	ahu@cs.stir.ac.uk
Issue Date:	26-Oct-2016
Date Deposited:	1-Feb-2017
Citation:	Amin A, Anwar S, Adnan A, Nawaz M, Howard N, Qadir J, Hawalah AYA & Hussain A (2016) Comparing Oversampling Techniques to Handle the Class Imbalance Problem: A Customer Churn Prediction Case Study. IEEE Access, 4, pp. 7940-7957. https://doi.org/10.1109/access.2016.2619719
Abstract:	Customer retention is a major issue for service-based organizations, particularly in the telecom industry, where predictive models of customer behavior are among the most useful instruments for retaining customers and inferring their future behavior. However, the performance of predictive models suffers greatly when the real-world data set is highly imbalanced. A data set is called imbalanced if the sample size of one class is much smaller or larger than that of the other classes. Over- and undersampling are the most commonly used techniques for handling the class-imbalance problem (CIP) in various domains. In this paper, we survey six well-known sampling techniques and compare their performance: mega-trend diffusion function (MTDF), synthetic minority oversampling technique, adaptive synthetic sampling approach, couples top-N reverse k-nearest neighbor, majority weighted minority oversampling technique, and immune centroids oversampling technique. This paper also evaluates four rules-generation algorithms (the learning from example module, version 2 (LEM2), covering, exhaustive, and genetic algorithms) using publicly available data sets. The empirical results demonstrate that MTDF and rules generation based on genetic algorithms achieved the best overall predictive performance among the evaluated oversampling methods and rules-generation algorithms.
DOI Link:	10.1109/access.2016.2619719
Rights:	Copyright 2016 IEEE. IEEE's Open Access Publishing Agreement allows: OA authors are assured that they are free to post the final, published version of their articles on their personal websites, their employers' sites, or their funding agency's sites. The OAPA allows users to copy the work, as well as to translate it or to reuse it for text/data mining, as long as the usage is for non-commercial purposes. IEEE authors who want to submit their manuscripts under an OA license are encouraged to use the IEEE OAPA. Available at: http://www.ieee.org/publications_standards/publications/rights/oa_author_choices.html

Files in This Item:

File	Description	Size	Format
07707454.pdf	Fulltext - Published Version	8.42 MB	Adobe PDF

This item is protected by original copyright

