Comparison of the performance of machine learning algorithms in breast cancer screening and detection: A protocol

  • Zakia Salod | zakia.salod@gmail.com Department of TeleHealth, University of KwaZulu-Natal, Durban, South Africa.
  • Yashik Singh Department of TeleHealth, University of KwaZulu-Natal, Durban, South Africa.

Abstract

Background: Breast Cancer (BC) is a known global crisis. The World Health Organization reports 2.09 million new cases and 627,000 deaths worldwide in 2018 relating to BC. The traditional BC screening method in developed countries is mammography, whilst developing countries employ breast self-examination and clinical breast examination. The prominent gold standard for BC detection is triple assessment: i) clinical examination; ii) mammography and/or ultrasonography; and iii) Fine Needle Aspirate Cytology. However, the introduction of cheaper, efficient and non-invasive methods of BC screening and detection would be beneficial.

Design and methods: We propose applying eight machine learning algorithms: i) Logistic Regression; ii) Support Vector Machine; iii) K-Nearest Neighbors; iv) Decision Tree; v) Random Forest; vi) Adaptive Boosting; vii) Gradient Boosting; and viii) eXtreme Gradient Boosting, to blood test results from the BC Coimbra Dataset (BCCD), obtained from the University of California Irvine online database, to create models for BC prediction. To ensure the models' robustness, we will employ: i) Stratified k-fold Cross-Validation; ii) Correlation-based Feature Selection (CFS); and iii) parameter tuning. The models will be validated on validation and test sets of BCCD, using both the full feature set and the reduced feature set, since feature reduction has an impact on algorithm performance. Seven metrics will be used for model evaluation, including accuracy.
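As an illustration of the stratified k-fold cross-validation step named above, the following is a minimal sketch in plain Python. It is not the protocol's implementation: the function name `stratified_k_fold`, the round-robin assignment, the seed, and the toy labels are all assumptions made for this example, and no BCCD data is used.

```python
import random
from collections import defaultdict


def stratified_k_fold(labels, k, seed=0):
    """Yield (train_idx, test_idx) pairs whose folds keep class proportions.

    Hypothetical helper for illustration only.
    """
    rng = random.Random(seed)

    # Group sample indices by class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    # Shuffle within each class, then deal indices round-robin into k folds
    # so that every fold receives a near-equal share of each class.
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        for position, idx in enumerate(indices):
            folds[position % k].append(idx)

    # Each fold serves once as the test set; the rest form the training set.
    for i in range(k):
        test_idx = sorted(folds[i])
        train_idx = sorted(
            idx for f, fold in enumerate(folds) if f != i for idx in fold
        )
        yield train_idx, test_idx


# Toy labels (hypothetical, not BCCD): 12 positives and 8 negatives.
labels = [1] * 12 + [0] * 8
splits = list(stratified_k_fold(labels, k=4))
for train_idx, test_idx in splits:
    positives = sum(labels[i] for i in test_idx)
    print(len(test_idx), positives)
```

With these toy labels, each of the four test folds holds five samples, three of them positive, which matches the 60/40 class ratio of the whole set. A real analysis would more likely rely on an established library implementation than on a hand-written splitter.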

Expected impact of the study for public health: The CFS, together with the highest performing model(s), can serve to identify important specific blood tests that point towards BC, which may serve as important BC biomarkers. The highest performing model(s) may eventually be used to create an Artificial Intelligence tool to assist clinicians in BC screening and detection.

Published
2019-12-04
Section
Study Protocols
Keywords:
Breast cancer, cancer screening, biomarkers, machine learning, blood tests

How to Cite
Salod, Z., & Singh, Y. (2019). Comparison of the performance of machine learning algorithms in breast cancer screening and detection: A protocol. Journal of Public Health Research, 8(3). https://doi.org/10.4081/jphr.2019.1677