Marek Gruszczynski Logit scoring in the banking practice and the problem of coincidence
A credit score is a composite index used to evaluate the risk associated with a customer's application for a loan or other financial product. The paper considers one of the statistical models used to devise application scoring in a bank: the binomial logit model, in which the dependent variable Y is binary (0 or 1) and the probability that Y = 1 is defined in terms of the logistic distribution. For an application, Y = 1 denotes that the loan is booked and Y = 0 that the loan is refused. The model's exogenous variables are the predictors of Y.
A feature of the logit model is that the sign of the parameter for a nonnegative exogenous variable X shows the direction of its effect on Y. If the parameter is positive, higher values of X are associated with higher chances that Y = 1; if it is negative, higher values of X correspond to a lower probability that Y = 1. This means that model specification may take into account the postulate of coincidence, i.e. the requirement that the sign of each exogenous variable's parameter equal the sign of the corresponding simple correlation coefficient.
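The sign property can be illustrated with a minimal sketch, assuming a single nonnegative predictor. The function name `logit_probability` and the values of the intercept `b0` and slope `b1` are hypothetical, chosen for illustration only; they are not estimates from the paper.

```python
import math

def logit_probability(x, b0, b1):
    """P(Y = 1 | x) under a one-predictor binomial logit model."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))

# Hypothetical parameters, used only to show the direction of the effect.
for b1 in (0.5, -0.5):
    probs = [logit_probability(x, b0=0.0, b1=b1) for x in (0, 1, 2, 3)]
    direction = "rises" if probs[-1] > probs[0] else "falls"
    rounded = [round(p, 3) for p in probs]
    print(f"b1 = {b1:+.1f}: P(Y = 1) {direction} as x grows: {rounded}")
```

With a positive slope the probability increases monotonically in x, and with a negative slope it decreases, which is the property the coincidence postulate builds on.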
The paper presents a numerical illustration (based on simulated data) of this approach to specifying the logit model for 1200 applications, of which 600 were accepted and 600 were rejected. The intended predictors (exogenous variables) include numerical variables measured on a ratio scale, such as the applicant's annual income or number of years in work, as well as dummy (binary) variables such as education level and marital status.
Firstly, the predictors are selected according to the following rules:
- Weak correlation among predictors,
- Strong correlation between predictors and the Y variable.
This is done by inspecting the matrix that shows the level and direction of association between all possible pairs of the variables X and Y. If at least one of the variables in a pair is binary, the simple correlation is replaced by a measure of association. Two such measures are proposed:
- Yule association coefficient, if both variables are binary,
- The outcome of the t-test for means, if one of the variables is binary and the other variable is measured on a ratio scale.
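The two measures can be sketched as follows. This is a rough illustration under stated assumptions: the Yule coefficient is taken to be Yule's Q computed from the 2x2 table, and the t-test is taken in its unequal-variance (Welch) form; the abstract does not specify either variant. The function names and the toy data are hypothetical.

```python
import math
from statistics import mean, variance

def yule_q(x, y):
    """Yule's Q for two 0/1 sequences: (ad - bc) / (ad + bc)."""
    pairs = list(zip(x, y))
    a = pairs.count((1, 1))
    b = pairs.count((1, 0))
    c = pairs.count((0, 1))
    d = pairs.count((0, 0))
    # Undefined (division by zero) when ad + bc == 0.
    return (a * d - b * c) / (a * d + b * c)

def welch_t(values, group):
    """Welch t statistic for mean(values | group = 1) - mean(values | group = 0)."""
    g1 = [v for v, g in zip(values, group) if g == 1]
    g0 = [v for v, g in zip(values, group) if g == 0]
    se = math.sqrt(variance(g1) / len(g1) + variance(g0) / len(g0))
    return (mean(g1) - mean(g0)) / se

# Toy data, invented for illustration (not the paper's simulated sample).
y = [1, 1, 1, 1, 0, 0, 0, 0]                 # 1 = booked, 0 = refused
married = [1, 1, 1, 0, 1, 0, 0, 0]           # binary predictor
income = [52, 61, 58, 47, 35, 40, 44, 38]    # ratio-scale predictor

print("Yule's Q (married, Y):", yule_q(married, y))
print("Welch t (income by Y):", round(welch_t(income, y), 3))
```

The sign of each measure gives the direction of association, and its magnitude can stand in for the correlation coefficient when the matrix of pairwise associations is inspected.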
Secondly, logit models for the selected sets of predictors are estimated and examined in terms of the coincidence property. The postulate of coincidence is not widely accepted in econometrics. However, when both Y and X are binary, the requirement of coincidence is justified. Consider two loan applications for which the values of all X variables are the same except for one X variable, which equals 0 for the first application and 1 for the second. If the association between this variable and Y is positive, then the probability that Y = 1 should be expected to be higher for the second application than for the first. This means that the parameter for this variable in the logit model should be positive, i.e. the model should have the property of coincidence. Therefore, the use of the coincidence property may improve the process of finding an optimal credit scoring rule.
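Checking an estimated model for coincidence amounts to comparing signs. The sketch below assumes the logit parameters and the pairwise association measures have already been computed; the function name `coincidence_violations`, the predictor names, and all numbers are hypothetical.

```python
def coincidence_violations(coefficients, associations):
    """Return the predictors whose logit parameter sign differs from
    the sign of their association with Y."""
    def sign(value):
        return (value > 0) - (value < 0)

    return [name for name, beta in coefficients.items()
            if sign(beta) != sign(associations[name])]

# Hypothetical estimates and association measures, for illustration only.
coefficients = {"income": 0.8, "married": -0.3, "years_in_work": 0.4}
associations = {"income": 0.45, "married": 0.25, "years_in_work": 0.30}

print(coincidence_violations(coefficients, associations))
```

In this invented example only `married` fails the check, so a specification search guided by coincidence would revisit the set of predictors that includes it.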