Hidden figures
What determines a bank’s health

Can a machine learning algorithm predict trouble at Ukrainian banks?


Dmytro Ostapchuk, Tymofii Brik

Machine learning algorithms have been around for decades, but their real potential was revealed only after the big data revolution and the sharp fall in the cost of computing power. The financial sector, banking in particular, is among the leaders in using machine learning methods for automated decision-making (ADM). For example, an ADM system can estimate the probability of a potential borrower’s default from the results of online testing. Similar systems are used in health care and jurisprudence, and they certainly make mistakes on occasion. Nevertheless, they represent a great step forward in data analysis.

The article “Invisible connection: who owns the banking system of Ukraine” presented an analysis of the network of owners of solvent Ukrainian banks. A year later, we study the banking system again, adding new data and using new analytical methods. In this article, we will show how the banking system was cleansed after January 1, 2014, and identify the financial indicators having the greatest impact on a bank’s “health”.

The big bank cleansing

In the past three years, the number of banks in Ukraine has halved; the branch network has shrunk by a similar proportion; and total assets in dollar terms have fallen by more than a factor of three. During that period, the banking system lost, on average, seven banks every three months. The reasons include the devaluation of the national currency, economic decline, the occupation of part of Ukraine’s territory, and a growing share of insolvent debtors.

The bank cleansing and the nationalization of Privatbank resolved a number of systemic problems. One can say that the banking sector is emerging from the crisis: the system has been profitable since the beginning of 2017, and net interest yields and net commission income have been rising. At the same time, however, the “cleansing” created long-term challenges, because the state became the owner of about half of the banking system’s assets, 62% of the population’s deposits, and four of the top ten banks.

How can a bank’s “health condition” be determined? Traditionally, financial indicators and their dynamics have been crucial for assessing the current state of the banking system as a whole and of individual banks. But the structure of the sector has been growing more complicated, its components are connected in non-linear ways, and the volume of data keeps growing. Machine learning methods can expose hidden patterns, especially in raw data, that traditional analytical instruments cannot reveal.

Bank as a patient

In 2015, a research group at Mount Sinai Hospital (New York) applied a deep learning algorithm to patient data. By analyzing hidden patterns within the data, the researchers were able to predict the right diagnosis with a high probability. The model, named Deep Patient, singled out groups of similar patients and identified the disease based on a number of characteristics: age, gender, temperature, red and white blood cell count, etc. The model proved much more successful at the task than its “algorithmic” predecessors, although the researchers acknowledge that they do not fully understand how it works.

A bank is like a patient, except that it is characterized not by temperature or white blood cell count but by financial indicators such as funds, subordinated debt, authorized capital, etc. Just like a patient, a bank can be healthy (solvent) or sick (insolvent). The regulator’s task is to tell healthy banks from sick ones; to that end, it is necessary to determine the characteristics that have the greatest impact on a bank’s “health condition.” Selecting and assessing these characteristics is the first step in the “diagnostic procedure.”

From financial reports presented by the NBU, 67 characteristics describing the state of a bank can be selected. In this article, we used financial reporting data as of January 1, 2015. The category of a bank (solvent / insolvent) was identified as of January 1, 2017. For example, if a bank was solvent in 2015, but became insolvent in 2016 or 2017, we assigned it to the “insolvent” category. This is of course a controversial way to identify bank categories; but it enabled us to balance the data. Eventually, 93 banks were found to be solvent and 64 insolvent.
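The labeling rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the bank names and insolvency dates are hypothetical, and the helper `label_bank` is not taken from the original analysis. The sketch assumes that every bank in the sample reported on January 1, 2015 and that a bank counts as insolvent if it was declared insolvent on or before the date on which the category was fixed.

```python
from datetime import date

# Hypothetical records (not real data): bank name -> date the bank was
# declared insolvent, or None if it was never declared insolvent.
insolvency_dates = {
    "Bank A": None,
    "Bank B": date(2015, 6, 1),
    "Bank C": date(2016, 11, 15),
}

# Date on which the category (solvent / insolvent) is fixed, as in the article.
LABEL_AS_OF = date(2017, 1, 1)


def label_bank(declared_insolvent_on):
    """Return the category of a bank as of LABEL_AS_OF (assumed rule)."""
    if declared_insolvent_on is not None and declared_insolvent_on <= LABEL_AS_OF:
        return "insolvent"
    return "solvent"


labels = {name: label_bank(d) for name, d in insolvency_dates.items()}
for name, category in labels.items():
    print(name, category)
```

Because the indicators are taken two years before the category is fixed, a model trained on such data is asked to anticipate insolvency rather than merely describe it.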

However, not all of the characteristics are equally informative. Moreover, some characteristics can be closely interrelated, which makes them superfluous for a predictive model. Therefore, stage one focused on selecting the most significant characteristics, and stage two on assessing the importance of each of them for classifying the banks.

To identify the most significant characteristics, we chose the classical Sequential Backward Selection algorithm. Without going into details, the idea is as follows: starting from the full set of characteristics, the algorithm removes them one at a time, searching for the subset that gives the best result for a model (e.g. the KNN classifier). The optimal number of characteristics providing the model’s best result lies between 13 and 30. We used the smallest possible number, 13, since we wanted to minimize data dimensionality for easier interpretation of the results.
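As a rough illustration of backward selection wrapped around a KNN classifier, here is a minimal Python sketch using scikit-learn's `SequentialFeatureSelector`. It is not the authors' code. The data are synthetic (generated with `make_classification`), and the number of features (12 reduced to 5), the number of neighbors, and the five-fold cross-validation are all assumptions chosen to keep the example fast.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the bank data (assumption): 157 rows, 12 features.
X, y = make_classification(
    n_samples=157, n_features=12, n_informative=4, n_redundant=2,
    random_state=0,
)

# KNN is distance-based, so the features are standardized first.
knn = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))

# Backward selection: start from all features and repeatedly drop the one
# whose removal hurts cross-validated accuracy the least.
selector = SequentialFeatureSelector(
    knn, n_features_to_select=5, direction="backward",
    scoring="accuracy", cv=5,
)
selector.fit(X, y)

selected = selector.get_support(indices=True)  # indices of the kept features
X_reduced = selector.transform(X)              # data restricted to them
print("Kept feature indices:", selected)
```

Note that this selector needs the target number of features up front. Finding a range of good subset sizes, such as the 13 to 30 mentioned above, would require repeating the procedure for several sizes and comparing the scores.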

After selecting the optimal number of characteristics, we assessed the relative importance of each of them using the random forest algorithm. A random forest is an “ensemble of decision trees” in which each tree “votes” for assigning an object to a certain category. A tree is a stepwise classification procedure: since a different decision can be taken at each step, the procedure branches. Numerous branches make up a tree, and numerous trees make up a forest. Since each individual tree contributes to predicting a specific result, the trees are referred to as “decision trees.” For more details on this algorithm, see Leo Breiman's article.
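The ranking step can be sketched with scikit-learn's `RandomForestClassifier`, whose `feature_importances_` attribute holds one non-negative weight per feature, normalized to sum to one. Again, this is a sketch on synthetic data rather than the authors' code: the feature names `indicator_0` ... `indicator_12` are placeholders, and the number of trees is an assumption.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in (assumption): 157 banks described by 13 indicators.
X, y = make_classification(
    n_samples=157, n_features=13, n_informative=5, n_redundant=2,
    random_state=0,
)
feature_names = [f"indicator_{i}" for i in range(X.shape[1])]  # placeholders

# Each tree in the ensemble votes on the class of a bank.
forest = RandomForestClassifier(n_estimators=200, random_state=0)
forest.fit(X, y)

# Impurity-based importances: one weight per feature, summing to 1.
importances = forest.feature_importances_

# Rank the features in descending order of importance.
ranking = sorted(zip(feature_names, importances),
                 key=lambda pair: pair[1], reverse=True)
for name, weight in ranking:
    print(f"{name}: {weight:.3f}")
```

Impurity-based importances of this kind describe how much each feature contributes to the splits in the trees; they do not say in which direction a feature pushes the prediction.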

At the output, a weight is assigned to each characteristic: a higher weight means a greater impact on identifying a bank’s category (solvent / insolvent). In the Table of Importance below, the characteristics are ranked in descending order.

Feature Importance

Category           Characteristic                               Importance
Profit and loss*   Allocations to loan impairment reserves**    0.12
Assets             Monetary funds and their equivalents         0.118
Liabilities        Corporate funds                              0.107
Assets             Loans and debts of natural persons           0.095
Assets             Fixed and intangible assets                  0.094
Assets             Corporate loans and debts                    0.088
Liabilities        Current profit tax liabilities               0.077
Liabilities        Banks’ resources                             0.074
Liabilities        Subordinated debt                            0.068
Assets             Noncurrent assets***                         0.06
Assets             Trading securities                           0.053
Assets             Bank’s emergency funds at the NBU            0.025
Profit and loss    Income / (expenditures)                      0.023

* and other aggregate income

** and funds in other banks

*** saleable and disposable group assets
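To see how the importance is distributed across the three reporting categories, the weights from the table above can be summed per category. The short Python sketch below uses only the figures from the table; the variable names are illustrative.

```python
from collections import defaultdict

# (category, characteristic, importance) rows copied from the table above.
table = [
    ("Profit and loss", "Allocations to loan impairment reserves", 0.12),
    ("Assets", "Monetary funds and their equivalents", 0.118),
    ("Liabilities", "Corporate funds", 0.107),
    ("Assets", "Loans and debts of natural persons", 0.095),
    ("Assets", "Fixed and intangible assets", 0.094),
    ("Assets", "Corporate loans and debts", 0.088),
    ("Liabilities", "Current profit tax liabilities", 0.077),
    ("Liabilities", "Banks' resources", 0.074),
    ("Liabilities", "Subordinated debt", 0.068),
    ("Assets", "Noncurrent assets", 0.06),
    ("Assets", "Trading securities", 0.053),
    ("Assets", "Bank's emergency funds at the NBU", 0.025),
    ("Profit and loss", "Income / (expenditures)", 0.023),
]

totals = defaultdict(float)  # summed importance per category
counts = defaultdict(int)    # number of characteristics per category
for category, _, weight in table:
    totals[category] += weight
    counts[category] += 1

for category in totals:
    print(f"{category}: {counts[category]} characteristics, "
          f"total importance {totals[category]:.3f}")
print(f"Sum of all weights: {sum(totals.values()):.3f}")
```

The seven asset characteristics together carry about 0.53 of the total weight, the four liability characteristics about 0.33, and the two profit-and-loss characteristics about 0.14. The weights add up to 1.002 rather than exactly 1 because the published values are rounded.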

We confined ourselves to identifying the importance of characteristics, without studying the specific way in which each of them impacts the identification of a bank’s category. Nevertheless, we would like to emphasize a number of points.

“Allocations to loan impairment reserves” rank first. When a bank grants a loan, there is always a probability that it will not be repaid; therefore, the bank forms a special reserve to cover the credit risk it has taken on. According to the NBU's financial stability report, the share of non-performing loans in April 2017 was as high as 57%, a serious burden for the banks. The restructuring of non-performing loans is a slow process; as a result, the risks persist.

Seven of the thirteen characteristics fall into the category of assets. Bank assets are a bank’s resources and funds, classified by type of investment and by how they are used to make a profit. Worthy of note are “Fixed and intangible assets,” which rank as high as fifth. Fixed assets are tangible ones, such as land, buildings, or computer hardware. Intangible assets include software, patents, and copyrights. The volume of fixed and intangible assets of solvent banks is, on average, three times that of insolvent banks. Most likely, the more cars, ATMs, and computer applications a bank owns, the lower the probability of it becoming insolvent.

Four of the thirteen characteristics fall into the category of liabilities. Notable among them are “Corporate funds” (third place) and “Banks’ resources” (eighth place). Both characteristics reflect the level of economic agents’ confidence in the financial institution: larger amounts deposited at the bank by other banks and companies are associated with a higher probability of the bank’s solvency.

The advantages of this approach are its relative simplicity and speed of application. It can be reduced to three steps: modeling starts with all of the characteristics; then the number of characteristics is reduced to give the prediction model optimal performance; finally, the importance of each characteristic is assessed and the priority ones are singled out. This sort of approach can be applied in parallel with existing ones or used as an add-on.

Along with the advantages, the algorithmic approach also has certain limitations. The choice of algorithm and the quality of the constructed model may always be called into question. Moreover, a bank’s problems can show up not only in its financial indicators but also in difficulties experienced by the bank’s shareholders in their non-banking business, in intensified PR activity by the financial institution, or in media attacks on the central bank.

Well aware of the weaknesses of the above analysis, we will nevertheless summarize the results obtained. In the past three years, the banking network has shrunk noticeably: about 50% of the banks have been removed from the market. On the one hand, such “cleansing” enabled the NBU to resolve a number of systemic problems and make the banking sector profitable; on the other hand, the state has significantly expanded its presence on the market, which is a long-term challenge for the system. The most important characteristic is allocations to loan impairment reserves. Most of the characteristics (financial indicators) having the strongest impact on a bank’s “health” fall into the category of assets. Two other important characteristics are “Corporate funds” and “Banks’ resources”: these reflect economic agents’ confidence in the financial institution.