In today's data-driven financial and economic landscape, the evaluation of mortgage loan default risk has evolved significantly thanks to richer data and more sophisticated modelling techniques. Predictive modelling has risen to prominence because it can improve decision-making, streamline lending processes, and ultimately strengthen the transparency and risk management of financial institutions.

The European Banking Authority (“EBA”) follow-up report on Machine Learning for Internal Ratings-Based (“IRB”) models (“EBA report”), published in August 2023 [EBA 2023] as a follow-up to the EBA report of November 2021 [EBA], emphasises the prudent use of Machine Learning (“ML”) in IRB models, offers recommendations for this prudent use, and points out potential issues related to the General Data Protection Regulation and the Artificial Intelligence (“AI”) Act. The follow-up report summarises the main conclusions from the consultation on the Discussion Paper (“DP”) on ML in the context of IRB models. According to the report, ML can be applied in credit risk at various levels and for various purposes, including data preparation, risk differentiation, risk quantification, and internal validation. Its recommendations include avoiding excessive complexity and the inclusion of non-predictive drivers, and they emphasise proper interpretation, documentation, and the handling of potential biases in ML models.

The aim of the report is to establish supervisory expectations on how new, advanced Machine Learning models can coexist with and adhere to the Capital Requirements Regulation (“CRR”) in the context of IRB models used for calculating regulatory capital for credit risk. The follow-up report focuses specifically on models that are more complex than traditional techniques such as regression analysis or simple Decision Trees, and that are often less transparent and more challenging to understand. In the credit risk context, these Machine Learning models have the potential to enhance predictive power and are already being used in internal models for credit approval processes.

The study, conducted by the Grant Thornton Cyprus Quantitative Risk (“GT QR”) team in collaboration with SEKASA Technologies, compares the performance of various Machine Learning models in mortgage default prediction against Logistic Regression as a benchmark. The key points investigated are:

  • Predictive accuracy: A comparison of accuracy between Logistic Regression and advanced ML models in predicting mortgage defaults in the light of EBA recommendations.
  • Balancing complexity and explainability: The search for an optimal balance between model complexity and explainability, as stated in the EBA follow-up report.
  • EBA Recommendations: A detailed examination of how Logistic Regression and advanced ML models can be used to predict mortgage defaults, considering the recommendations of the EBA follow-up report.

It is important to emphasise that this study aims to investigate the benefits and challenges of ML models in financial institutions in the context of the recommendations stated in the EBA follow-up report, and not to identify the optimal fit for IRB modelling. For this reason, each model was run on the same data set without model-specific feature optimisation.

Several methodologies are assessed for predicting default, including Gradient Boosting Trees (“XGBoost”) and Neural Networks (“NN”), in comparison with Logistic Regression, which acts as the benchmark representing a traditional statistical method. As part of our analysis, we also investigated other Machine Learning models (Decision Trees, Random Forests, Support Vector Machines and Stochastic Gradient Descent Classifiers). These are not presented in the report, as their results are similar to those of the Logistic Regression model and the Gradient Boosting Trees algorithm.
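The like-for-like comparison described above, in which every model is scored on the same data without model-specific tuning, can be illustrated with a minimal sketch. It is not the study's code: the metric (a rank-based AUC), the feature names `ltv` and `fico`, the two scoring functions, and the six toy records are all assumptions made purely for illustration.

```python
from typing import Callable, Dict, List, Sequence

Row = Dict[str, float]


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Rank-based AUC: the share of (defaulter, non-defaulter) pairs in which
    the defaulter receives the higher score; ties count as half a pair."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one default and one non-default")
    wins = 0.0
    for p in pos:  # O(n^2) pairwise loop: fine for a sketch, slow at scale
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def compare_models(
    models: Dict[str, Callable[[Row], float]],
    rows: List[Row],
    labels: Sequence[int],
) -> Dict[str, float]:
    """Score every model on the same rows and labels, then report its AUC."""
    return {
        name: roc_auc(labels, [score(row) for row in rows])
        for name, score in models.items()
    }


# Toy records (made up): hypothetical loan-to-value and credit-score fields.
rows = [
    {"ltv": 95, "fico": 580},
    {"ltv": 60, "fico": 760},
    {"ltv": 85, "fico": 640},
    {"ltv": 70, "fico": 700},
    {"ltv": 90, "fico": 720},
    {"ltv": 75, "fico": 600},
]
labels = [1, 0, 1, 0, 0, 1]  # 1 = default, 0 = no default

# Hypothetical stand-ins for a fitted benchmark and a fitted challenger model.
models = {
    "benchmark": lambda r: r["ltv"] / 100,
    "challenger": lambda r: r["ltv"] / 100 + (700 - r["fico"]) / 200,
}

results = compare_models(models, rows, labels)
```

Because both scorers see identical rows and labels, any difference in `results` reflects the scoring functions alone, which is the property the study's design relies on. In practice the scorers would be fitted models evaluated on held-out data.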

This research provides insights that Financial Institutions (“FI”), and risk modellers in particular, can draw on when expanding the suite of methodologies already employed in risk modelling, in light of the recommendations presented in the EBA follow-up report. It also explores the trade-offs, mentioned in that report, between model complexity, interpretability, and predictive accuracy in credit risk assessment. Finally, it points out validation challenges and model peculiarities that become relevant when the models are applied to credit default forecasting.

The study utilises a comprehensive historical mortgage loan dataset. The dataset contains information on 50,000 U.S. residential mortgage borrowers across 60 timestamps. The target variable (“default_time”) is a binary default indicator, for which the models predict a probability of default.

The subsequent sections provide an in-depth exploration of the study's methodology and outcomes. Section 2, Data Preparation, outlines the loading, feature construction, and preprocessing steps. Section 3, Model Building, details the approach, incorporating Forward Stepwise Logistic Regression, Logistic Regression, XGBoost, and Artificial Neural Network models. Section 4, Results and Discussion, presents and analyses the study's outcomes. The study concludes in Section 5, Conclusion and Future Work, with a summary and suggestions for future research.


SEKASA Technologies Contacts:

Sebastian Niehaus

Chief Technology Officer


Katharina Brunkhorst

Chief Executive Officer