The Kolmogorov-Smirnov (KS) Metric
If you’ve never heard of the KS metric, you’re not alone. Outside of the scoring world (and a few other highly specialized statistical professions), few have. After reading this post, however, KS will hopefully be your best friend in the world of predictive analytics.
You’ve probably heard of the 80-20 rule, or Pareto Principle, wherein “20% of customers represent 80% of sales,” or “80% of complaints are generated by 20% of employees.” Business managers often swear by it because of its uncanny ability to describe the behavior of dramatically different situations and point the way to resource optimization opportunities. The obvious approach to increasing sales or reducing complaints would be to direct high-impact actions at the relevant 20%, rather than at the overall population. The goal: achieve most of the desired effect at a fraction of the cost. You don’t need to retrain all employees; retraining just 20% will address 80% of the complaints, provided you know which 20% to retrain!
The good news is, if you understand the 80-20 rule, you understand KS. While the mathematical definition is not particularly illuminating – KS is the maximum separation between the cumulative distribution functions of the scores for the positive and negative populations – the actual application is quite intuitive. For example, suppose we have a model predictive of mortgage defaults, and for some score threshold X, the population of all mortgages scoring below X includes 60% of all future defaults as well as 20% of all non-defaults; then the KS is 40% (for X at maximum separation). If a lender had instituted a policy of declining all otherwise-approvable mortgage applications scoring below X, they would have averted 60% of their defaults at the cost of also turning away 20% of their non-defaulting business. The person setting policy guidelines can make an informed decision about whether that is a net positive, and find the threshold X that best serves business objectives.
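The computation itself is only a few lines. The sketch below (plain Python; the function name and data are illustrative, not taken from any production system) sorts observations by score and takes the maximum gap between the two cumulative distributions:

```python
def ks_statistic(scores, labels):
    """Kolmogorov-Smirnov statistic: the maximum separation between
    the cumulative score distributions of the positive population
    (labels truthy, e.g. defaults) and the negative population."""
    pairs = sorted(zip(scores, labels))       # walk thresholds low to high
    n_pos = sum(1 for _, y in pairs if y)
    n_neg = len(pairs) - n_pos
    cum_pos = cum_neg = 0
    best = 0.0
    for _, y in pairs:
        if y:
            cum_pos += 1
        else:
            cum_neg += 1
        # Separation between the two cumulative fractions at this threshold.
        best = max(best, abs(cum_pos / n_pos - cum_neg / n_neg))
    return best
```

In the mortgage example above, the threshold X where 60% of defaults and 20% of non-defaults score below it yields a separation of 40%; the KS statistic is simply the largest such separation over all thresholds.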
Figure 5 (below) shows an actual scoring model developed by Betterview to predict natural peril losses based on property aerial images and no other information. While this particular model was built solely for illustrative purposes, and not optimized for any particular business objective, it was built with real data under realistic conditions. The data used for this example model included multi-structure residential properties (such as condos and HOAs). As with all correctly reported model results, structures used in model building were not used in this evaluation. As can be seen in figure 5, if, for example, the insurer were to take some remedial action (such as requiring that roof deficiencies be fixed) on 19% of the policies, they would have addressed 91% of the structures destined to suffer a loss – a KS of 72%, even better than 80-20!
Scoring in a Mature Credit Environment
KS is wonderful, but how do we know that our model captures high-severity losses? In other words, capturing, say, 91% of policies that generated losses is nice, but what we really want to capture is loss dollars, not loss counts. There is more good news on this front: scoring models that are trained to find rare events – such as policies that will generate natural peril losses – tend to be better at finding severe situations than borderline ones. In other words, a structure that is a disaster-waiting-to-happen will have more visible malady indicators than one that merely has some modest flaws; when a peril, such as a windstorm, hits, that first structure will have both a higher risk and a higher likely severity of losses. Figure 6 (below) illustrates this point. In this figure, we show the same model from figure 5, but refine the traditional KS metric into our new measure – the Dollar-KS – where rather than asking what percentage of claims is captured at any given score threshold, we ask what percentage of dollar losses is captured. As can be seen, the Dollar-KS, at 81%, is even higher than the KS.
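One way to realize this refinement in code is to weight the positive side of the KS curve by loss dollars instead of loss counts. The sketch below is our own reading of the Dollar-KS idea described above (the function name, and the choice to keep the no-loss curve count-based, are assumptions, not a documented specification):

```python
def dollar_ks(scores, loss_dollars):
    """Dollar-KS sketch: at each score threshold, compare the cumulative
    share of loss *dollars* captured against the cumulative share of
    no-loss policies, and return the maximum separation."""
    pairs = sorted(zip(scores, loss_dollars))   # low scores first
    total_loss = sum(loss for _, loss in pairs)
    n_clean = sum(1 for _, loss in pairs if loss == 0)
    cum_loss = cum_clean = 0.0
    best = 0.0
    for _, loss in pairs:
        if loss == 0:
            cum_clean += 1
        cum_loss += loss
        # Dollar-weighted positives vs. count-weighted negatives.
        best = max(best, abs(cum_loss / total_loss - cum_clean / n_clean))
    return best
```

A severity-skewed model scores well here even when its plain KS is unchanged, because a single large claim caught at a low score moves the dollar curve much further than one caught claim moves the count curve.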
And now for the bad news: How do we know that the predictive power is not already captured and priced by the existing actuarial process? This is very much our opening question – after thousands of years of evolutionary learning, is there really any predictive capacity that is not, in some substantial way, already accounted for?
To answer this question, we need to make another refinement to the KS metric – the Dollar-Dollar-KS – where we consider the dollar impact not just on the loss side, but also on the premium side. To restate our original usage example: if we take a remedial action affecting policies representing some X% of our dollar premiums, what percentage of our dollar losses will be addressed? The most extreme action, to be reserved only for the most egregious situations, would be to decline underwriting (or renewal) altogether. Alternatively, we could take action from the opposite direction: what percentage of properties is so low-risk that we can expedite the underwriting and renewal process and save time and money? Because we are now measuring our impact in terms of premiums affected, we know that all previously predictable losses, already captured in the premiums, are accounted for. Any predictive power shown on the Dollar-Dollar-KS metric would thus have to be strictly incremental.
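Carrying the code sketch one step further: both axes now become dollar-weighted, with premiums on the "cost" side and losses on the "benefit" side. Again, this is a hypothetical illustration of the metric as described, not a documented implementation:

```python
def dollar_dollar_ks(scores, premiums, losses):
    """Dollar-Dollar-KS sketch: sort policies by score, then take the
    maximum gap between the cumulative share of loss dollars and the
    cumulative share of premium dollars below each score threshold."""
    rows = sorted(zip(scores, premiums, losses))   # low scores first
    total_prem = sum(p for _, p, _ in rows)
    total_loss = sum(l for _, _, l in rows)
    cum_prem = cum_loss = 0.0
    best = 0.0
    for _, prem, loss in rows:
        cum_prem += prem
        cum_loss += loss
        # Loss dollars addressed vs. premium dollars actioned.
        best = max(best, abs(cum_loss / total_loss - cum_prem / total_prem))
    return best
```

Note the built-in fairness of this measure: if the score merely reproduces the risk already priced into premiums, the loss and premium curves accumulate in lockstep and the statistic goes to zero. Anything above zero is incremental signal.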
So, the fifty-billion-dollar question: Can aerial imagery economically predict natural peril losses above and beyond the existing mature actuarial process?
The answer, happily, is a resounding yes; but there is a big caveat: the scoring model must be specifically trained to maximize Dollar-Dollar-KS. If trained to just maximize predictive power, it will find the easiest predictive power available – which, incidentally, is the predictive power already captured by the mature actuarial process. Figure 7 (below) shows the same scoring model as in figures 5 and 6 – a model that was, indeed, trained to maximize Dollar-Dollar-KS – but this time showing what percentage of premiums needs to be actioned to address each percentage of losses. As can be seen, the stratospheric KS, as measured without considering premiums, has come far down, to 31%. However, and this is our ultimate punchline, this 31% KS is entirely incremental to risks already captured in the premiums. It is truly new money on the table!
(It should be noted that this example model was trained specifically to maximize Dollar-Dollar-KS, and not to match any specific use case. As it happened, the maximum occurred at the high end of the scale, meaning that if a use case were to be designed around this model, it would be an efficiency play – finding policies that should be fast-tracked – rather than a loss-reduction play. Of course, in actual customer products, models are optimized to meet use-case needs, not the other way around.)
Getting from a predictive model to a hard value proposition is never easy, nor is bridging the divide between what data scientists can model and optimize and what business managers can turn into provable bottom-line impact. This is particularly challenging in a mature environment, where much of the predictive power a model might find may already have been incorporated, through long-term evolutionary and competitive forces, into the prevailing state of the art. In this post we have shown how, for the prediction of natural peril claim losses from aerial imagery, it is indeed possible to find substantial, provable, incremental dollar-denominated value.