Machine learning-based statistical analysis for early stage detection of cervical cancer
Cervical cancer (CC) is one of the most common cancers in women and remains a significant cause of mortality, particularly in less developed countries, although it can be treated effectively if detected at an early stage. This study aimed to identify efficient machine learning classification models for detecting early-stage CC from clinical data. We obtained a CC dataset from the Kaggle data repository that contains four class attributes: biopsy, cytology, Hinselmann, and Schiller. The dataset was split into four datasets, one for each class attribute. Three feature transformation methods (logarithmic, sine function, and Z-score) were applied to these datasets, and several supervised machine learning algorithms were assessed for classification performance. The Random Tree (RT) algorithm provided the best classification accuracy for the biopsy (98.33%) and cytology (98.65%) data, whereas Random Forest (RF) and Instance-Based K-nearest neighbor (IBk) performed best for the Hinselmann (99.16%) and Schiller (98.58%) data, respectively. Among the feature transformation methods, the logarithmic transformation gave the best performance for the biopsy dataset, whereas the sine function was superior for cytology. Both the logarithmic and sine function transformations performed best for the Hinselmann dataset, while Z-score was best for the Schiller dataset. Various feature selection techniques (FSTs) were applied to the transformed datasets to identify and prioritize important risk factors. The outcomes of this study indicate that, with appropriate system design and tuning, machine learning classification methods can detect CC accurately and efficiently in its early stages using clinical data.
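The three feature transformations named in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the exact formulas, so the details here are assumptions: the logarithm is taken as ln(1 + x) so that zero-valued features stay finite, the Z-score uses the population standard deviation, and the `ages` column is hypothetical example data, not taken from the study's dataset.

```python
import math
from statistics import mean, pstdev


def log_transform(values):
    """Logarithmic transformation, assumed here to be ln(1 + x) so zeros stay finite."""
    return [math.log1p(v) for v in values]


def sine_transform(values):
    """Sine function transformation, applied element-wise (input treated as radians)."""
    return [math.sin(v) for v in values]


def z_score(values):
    """Z-score standardization: subtract the mean, divide by the population std. dev."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A constant feature carries no spread; map every value to 0.0.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


# Hypothetical feature column (for illustration only).
ages = [18, 25, 34, 41, 52]

transformed = {
    "log": log_transform(ages),
    "sine": sine_transform(ages),
    "z_score": z_score(ages),
}

for name, column in transformed.items():
    print(name, [round(x, 3) for x in column])
```

Each transformed copy of the feature set could then be passed to the same classifiers, so that accuracy can be compared per transformation, in the spirit of the comparison the abstract reports.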
|ISSN||1879-0534 (Electronic); 0010-4825 (Linking)|
|Authors||Ali, M. M.; Ahmed, K.; Bui, F. M.; Paul, B. K.; Ibrahim, S. M.; Quinn, J. M. W.; Moni, M. A.|
|Publisher Name||COMPUTERS IN BIOLOGY AND MEDICINE|
|URL link to publisher's version||https://www.ncbi.nlm.nih.gov/pubmed/34735942|