Comparative Analysis of Random Forest and XGBoost Algorithms for Credit Risk Classification

Norita Sinaga; Syaeful Machfud

doi:10.31572/inotera.Vol10.Iss2.2025.ID556

Norita Sinaga, Universitas Pamulang
Syaeful Machfud, Universitas Pamulang

Keywords: Credit Risk Classification, Machine Learning, GridSearchCV, Python, Random Forest, XGBoost

Abstract

Determining credit eligibility is an important process for financial institutions seeking to avoid the risk of default. Misclassifying applicants can cause significant losses, especially when high-risk customers are predicted to be creditworthy. To address this problem, this study applies machine learning algorithms to build an accurate credit risk classification system. Its purpose is to analyze and compare the performance of the Random Forest and XGBoost algorithms in predicting credit risk on the German Credit Data. The research was conducted in Python and comprised data preprocessing, a train-test split, model training, performance evaluation with the Accuracy, Precision, Recall, F1-score, and ROC-AUC metrics, and hyperparameter optimization through GridSearchCV with 5-fold cross-validation. The experimental results show that XGBoost outperformed Random Forest, achieving an Accuracy of 0.91, an F1-score of 0.89, and a ROC-AUC of 0.94, compared with an Accuracy of 0.88, an F1-score of 0.85, and a ROC-AUC of 0.90 for Random Forest. Because it produces fewer misclassifications, the XGBoost model is considered more effective for supporting an automated and efficient credit risk classification system.
