It’s time for machine learning to grow up.
Machine learning has proved valuable across technology and business. ML has a seat at the table, but for it to truly mature, we need to know we can trust it.
For ML to integrate more fully and contribute everything it can, people need to be able to understand its models. We need to see inside the black box of non-linear models.
But wait, you say: isn't deep learning inherently unexplainable? And what would it even mean to explain a model, anyway?
Why explainable ML?
Machine learning is amazing, right? So why bother explaining it?
Establishing trust
A radiologist is using an AI program to assist in diagnosing cancer patients.
The AI program is highly accurate, but the patient has an unusual combination of symptoms, and the radiologist is unsure whether the program accounts for them.
Can the AI show the doctor where on the radiograph it sees evidence for its diagnosis? Can the AI tell the doctor what other diagnoses are possible? How likely are they? Can it integrate other symptom reports and explain how?
These kinds of questions are critical for the overall success of machine learning.
Troubleshooting
Deep learning is powerful, but any specific deep learning model is imperfect.
Case in point:
Deep learning models can easily be fooled by image contexts they haven't seen before (see above). If a model can explain *how* it's being fooled, fixing it gets easier.
Ethics and regulatory compliance
There is a lot of justified skepticism about machine learning, especially in relation to fields like finance and criminal justice.
To ensure a model complies with our ethics and our laws, it needs to be able to explain itself well enough that we can trust it.
What does it mean to explain a model?
Linear models like linear and logistic regression largely explain themselves: the coefficients tell you how each feature contributes. It's the fiendishly complex models (SVMs, random forests, and most of all neural nets) that need to be explained.
If you feed an image to Inception and Inception says “toilet seat: 88%”, you might want to ask, “why, Inception, why?”.
Inception might respond by highlighting the relevant areas of the image.
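To make the classification half of that exchange concrete, here is a minimal sketch using Keras's pretrained InceptionV3. The file name "cat.jpg" is a placeholder; substitute any local image.

```python
# Minimal sketch: classify an image with a pretrained InceptionV3 (Keras).
import numpy as np
from tensorflow.keras.applications.inception_v3 import (
    InceptionV3, preprocess_input, decode_predictions)
from tensorflow.keras.preprocessing import image

model = InceptionV3(weights="imagenet")

img = image.load_img("cat.jpg", target_size=(299, 299))  # Inception expects 299x299
x = preprocess_input(np.expand_dims(image.img_to_array(img), axis=0))

preds = model.predict(x)
for _, label, prob in decode_predictions(preds, top=3)[0]:
    print(f"{label}: {prob:.0%}")   # e.g. "toilet_seat: 88%"
```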

Best approaches
The explanation above comes from LIME (Local Interpretable Model-agnostic Explanations), which works by fitting a linear approximation of the model around the original image. (Published in 2016 by Ribeiro et al.)
Listen: Data Skeptic podcast with the author of LIME
In LIME, to explain an image classification, the image is first split into superpixels using the quickshift algorithm. A random subset of superpixels is then sampled for inclusion; the discarded superpixels are set to gray, and the model makes a prediction on the resulting image, assigning a probability to each output class. The superpixels most associated with a given output are selected as the explanation.
The process is repeated many times (say, 1,000). A regression of the predicted class probabilities on the 1,000 sampled superpixel masks can then be performed. The resulting coefficients describe the choices the classifier makes and provide a local explanation. The results in the image above show the process working mostly as expected.
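Here is a rough, self-contained sketch of that procedure. The `predict_fn` is a hypothetical stand-in for the model being explained; it takes a batch of images and returns class probabilities.

```python
# Rough sketch of the LIME-for-images procedure described above.
# predict_fn(batch) -> (N, n_classes) class probabilities; img is a uint8 RGB image.
import numpy as np
from skimage.segmentation import quickshift
from sklearn.linear_model import Ridge

def lime_image_explanation(img, predict_fn, target_class,
                           n_samples=1000, top_k=5, seed=0):
    rng = np.random.default_rng(seed)
    segments = quickshift(img, kernel_size=4, max_dist=200, ratio=0.2)
    n_segments = segments.max() + 1

    # Sample random superpixel inclusion masks; discarded superpixels go gray.
    masks = rng.integers(0, 2, size=(n_samples, n_segments))
    preds = np.empty(n_samples)
    for i, mask in enumerate(masks):
        perturbed = img.copy()
        for seg in np.where(mask == 0)[0]:
            perturbed[segments == seg] = 127        # mid-gray
        preds[i] = predict_fn(perturbed[np.newaxis])[0, target_class]

    # Local surrogate: regress the model's outputs on the inclusion masks.
    # (Real LIME also weights samples by their similarity to the original image.)
    surrogate = Ridge(alpha=1.0).fit(masks, preds)
    top_superpixels = np.argsort(surrogate.coef_)[::-1][:top_k]
    return top_superpixels, segments
```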
LIME can also explain non-linear text classifiers like SVMs or neural nets. For text, the original passage is sampled for inclusion/exclusion of each word (or character), and a local linear model is fit to these samples. An explanation may look like the following:

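The `lime` package wraps this workflow. A minimal sketch, using a toy scikit-learn pipeline as the black-box classifier (the training examples here are made up for illustration):

```python
# Minimal sketch of LIME on a text classifier, using the `lime` package.
from lime.lime_text import LimeTextExplainer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

texts = ["great movie, loved it", "terrible plot, awful acting",
         "wonderful cast", "boring and awful"]          # toy examples
labels = [1, 0, 1, 0]                                   # 1 = positive, 0 = negative

clf = make_pipeline(TfidfVectorizer(), LogisticRegression())
clf.fit(texts, labels)

explainer = LimeTextExplainer(class_names=["negative", "positive"])
exp = explainer.explain_instance("awful movie but a wonderful cast",
                                 clf.predict_proba, num_features=4)
print(exp.as_list())   # [(word, weight), ...] -- the local explanation
```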
Another related method is SHapley Additive exPlanations (SHAP). Shapley explanations are based on re-evaluating (or retraining) the model without individual features and comparing predictions on the example in question.
SHAP is newer and more sophisticated than LIME, but harder to understand. It provided better explanations on handwritten-digit samples and tends to deliver performance gains. In short, a SHAP explanation takes the average of the difference between the model's prediction with and without a given feature, weighted across all subsets of the remaining features.
The SHAP paper draws on classic game-theory results to argue that it is the only additive feature-attribution method with desirable properties like local accuracy and consistency.
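To make that weighting concrete, here is a brute-force sketch that computes exact Shapley values for a single example by enumerating feature subsets. "Removing" a feature is approximated by substituting a background (e.g. training-set mean) value rather than retraining, which is one of the simplifications discussed in the SHAP paper; `model_fn` is a hypothetical stand-in for the model.

```python
# Brute-force Shapley values for one example; small feature counts only,
# since the number of subsets grows as 2^n.
from itertools import combinations
from math import factorial
import numpy as np

def shapley_values(model_fn, x, background):
    """model_fn: maps a (1, n_features) array to a scalar prediction.
    x: the instance to explain; background: e.g. the training-set mean."""
    n = len(x)

    def value(subset):
        z = background.copy()
        z[list(subset)] = x[list(subset)]      # features in the subset are "present"
        return model_fn(z[np.newaxis])[0]

    phi = np.zeros(n)
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):
            for subset in combinations(others, k):
                # Shapley weight: |S|! * (n - |S| - 1)! / n!
                w = factorial(k) * factorial(n - k - 1) / factorial(n)
                phi[i] += w * (value(subset + (i,)) - value(subset))
    return phi  # phi[i] is feature i's contribution to the prediction
```

In practice you would reach for the `shap` package, which approximates this weighted average by sampling instead of enumerating every subset.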
A great summary of the current state of the art (updated November 2018) was written by engineers at H2O.ai.
Not all efforts to create explainable AI succeed. The importance of the field has attracted many entrants, including some less successful ones.
One example is xNN (explainable neural nets). Developed by researchers at Wells Fargo, the method represents the model as a linear combination of sub-models. Ostensibly, the coefficients on each sub-model would provide sufficient explanation.
This fails, because the sub-models themselves need good explanations, and there is no guarantee that an xNN trained this way would yield explainable sub-models. One possible extension would be to restrict each sub-model to a particular subset of variables (sketched below), though such constrained models may perform worse than an unconstrained one.
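As a sketch of that restricted-subset idea (not the Wells Fargo implementation), here is a tiny PyTorch module in which each sub-network sees only a hand-chosen group of features and a final linear layer combines their outputs; that layer's weights are what would serve as the explanation coefficients.

```python
# Sketch of an additive "explainable" net with feature-restricted sub-networks.
import torch
import torch.nn as nn

class XNNSketch(nn.Module):
    """Each sub-network sees only one feature group; a final linear layer
    combines the sub-network outputs into the prediction."""
    def __init__(self, feature_groups, hidden=8):
        super().__init__()
        self.feature_groups = feature_groups            # e.g. [[0, 1], [2], [3, 4]]
        self.subnets = nn.ModuleList(
            nn.Sequential(nn.Linear(len(g), hidden), nn.ReLU(), nn.Linear(hidden, 1))
            for g in feature_groups
        )
        # The weights of this layer are the candidate "explanation" coefficients.
        self.combine = nn.Linear(len(feature_groups), 1)

    def forward(self, x):                               # x: (batch, n_features)
        parts = [net(x[:, g]) for net, g in zip(self.subnets, self.feature_groups)]
        return self.combine(torch.cat(parts, dim=1))

model = XNNSketch(feature_groups=[[0, 1], [2], [3, 4]])
print(model(torch.randn(4, 5)).shape)                   # torch.Size([4, 1])
```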
In general, the model-agnostic methods like Shapley and LIME are likely to be more useful in the long run, since they impose no restrictions on the model to be explained.
Where it’s being used and why
Explainable AI is now a marquee feature in the H2O.ai suite of products. The team at H2O has already implemented both k-LIME (a variant of the LIME method published in 2016) and Shapley explanations (published in 2017).

Explainable AI is important to industries like finance. Consider the following quote from a story about ML service provider ZestFinance and its customer, Prestige Auto Finance:
Like all lenders, Prestige worried that using artificial intelligence to make loan decisions could prompt regulators to claim it was using a black box, within which loan decisions couldn’t be adequately explained.
“We were skeptical at the onset,” Warnick said. “The black box is a major concern from a compliance level and an executive level because we want to understand what’s going on.”
Armstrong acknowledged that this comes up a lot with financial institutions.
“When you put a model in production, you have to make sure the model isn’t discriminating and there’s no disparate impact,” Armstrong said. “If a lender declines an applicant, it needs to flesh out what were the key variables that triggered the decline decision and furnish those adverse action notes to the consumer in the form of an adverse action letter.”
Zest Finance/American Banker
Is ML biased?
ML has been criticized for being too opaque in fields like finance and criminal justice. Without proper oversight, the argument goes, these black boxes could end up being biased against minorities or the poor.
On the other hand, if a more powerful model is able to identify credit-worthy individuals turned down by traditional lenders, or identify non-dangerous prisoners for early parole, that seems like a win-win.
This debate is likely to rage on. The concern about opaque risk assessment metrics may seem new, but even traditional forms of risk assessment are not fully transparent: the FICO scores reported by bureaus like Equifax are still calculated using a proprietary formula.
The verdict
ML explainability is an important next step toward broader adoption of ML in areas like medicine and finance.
“ML is a black box” is becoming less true by the day. Significant work and attention have gone into the development and deployment of these methods. It’s striking how quickly explainable AI has gone from research project (November 2017) to product feature (August 2018) to a feature executives insist on (September 2018). It’s the next frontier of machine learning.