Addressing Class Imbalance for Breast Cancer Prediction in Southern Libya: A Comparative Study of Sampling Techniques

اسمه اعجال
منصور الصغير
امال معيوف

Abstract

Class imbalance refers to a situation in which the minority class contains far fewer samples than the majority class, which makes classification difficult. This study addresses class imbalance in breast cancer prediction using a dataset from the Sebha Oncology Center in southern Libya. It examines the effect of eight different sampling techniques, including SMOTE, ADASYN, and NearMiss, when each is combined with a Random Forest classifier. The results show that combining SMOTE with Random Forest substantially outperforms the other model configurations, yielding a 21% increase in precision for predicting malignant samples and a peak recall of 96%. The study demonstrates the importance of addressing class imbalance in medical datasets to improve the effectiveness of breast cancer prediction models.
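To make the core idea concrete, here is a minimal, dependency-free sketch of SMOTE-style oversampling of the minority (malignant) class, plus per-class precision and recall, the two metrics the abstract reports. It is not the authors' code: the function names `smote`, `balance_with_smote`, and `precision_recall`, the binary-label assumption, and the toy data are all hypothetical, and the Random Forest step and the other seven sampling techniques are not reproduced. In practice, libraries such as imbalanced-learn (`SMOTE`, `ADASYN`, `NearMiss`) together with scikit-learn's `RandomForestClassifier` cover this, and resampling is normally applied to the training data only, so that synthetic points never reach the evaluation set.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def smote(minority, n_synthetic, k=5, seed=0):
    """Create n_synthetic points by SMOTE-style interpolation.

    Each new point lies on the segment between a randomly chosen minority
    sample and one of its k nearest neighbours within the minority class.
    """
    if n_synthetic <= 0:
        return []
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        idx = rng.randrange(len(minority))
        base = minority[idx]
        # Candidate neighbours: every other minority sample, nearest first.
        others = [p for j, p in enumerate(minority) if j != idx]
        neighbours = sorted(others, key=lambda p: euclidean(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


def balance_with_smote(X, y, minority_label, k=5, seed=0):
    """Oversample a binary dataset until both classes have equal counts.

    Returns new lists; the inputs are not modified.
    """
    minority = [x for x, label in zip(X, y) if label == minority_label]
    n_majority = sum(1 for label in y if label != minority_label)
    new_points = smote(minority, n_majority - len(minority), k=k, seed=seed)
    return list(X) + new_points, list(y) + [minority_label] * len(new_points)


def precision_recall(y_true, y_pred, positive):
    """Precision and recall for one class (e.g. 'malignant')."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


if __name__ == "__main__":
    # Toy data (made up): 6 benign vs 3 malignant two-feature samples.
    X = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [1.1, 1.3], [0.9, 0.7],
         [1.3, 1.0], [5.0, 5.0], [5.2, 4.8], [4.9, 5.3]]
    y = ["benign"] * 6 + ["malignant"] * 3
    X_bal, y_bal = balance_with_smote(X, y, "malignant", k=2)
    print(y_bal.count("benign"), y_bal.count("malignant"))  # 6 6
```

A classifier (for example a Random Forest) would then be fit on the balanced training lists, and `precision_recall` applied to its predictions on an untouched test split. Interpolating between nearby minority samples, rather than duplicating them, is what distinguishes SMOTE from plain random oversampling.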

Article details

How to cite
اعجال A., الصغير M., & معيوف A. (2024). Addressing Class Imbalance for Breast Cancer Prediction in Southern Libya: A Comparative Study of Sampling Techniques. Sebha University Conference Proceedings, 3(2), 416–422. https://doi.org/10.51984/sucp.v3i2.3357
Section
Conference paper