Interpretability in Machine Learning

Since OpenAI released ChatGPT, its large language model (LLM) chatbot, machine learning and artificial intelligence have entered mainstream discourse. The reaction has been a mix of skepticism, trepidation, and panic as the public comes to terms with how this technology will shape our future. Many fail to realize that machine learning already shapes the present, and developers have been grappling with introducing this technology into products and services for years. Machine learning models are used to make increasingly important decisions – from aiding physicians in diagnosing serious health issues to making financial decisions for customers.

How it Works

I strongly dislike the term "artificial intelligence" because what the phrase describes is a mirage. There is no complex thought process at work – the model doesn't even understand the information it is processing. In a nutshell, OpenAI's model powering ChatGPT calculates the statistically most probable next word given the immediately surrounding context based on the enormous amount of information developers used to train the model.

A Model?

Let's say we compiled an accurate dataset containing the time it takes for an object to fall from specific heights:

Height (m)    Time (s)
100           4.51
200           6.39
300           7.82
400           9.03
500           10.10

What if we need to determine the time it takes for that object to fall from a distance we don't have data for? We build a model representing our data and either interpolate or extrapolate to find the answer:

t = \sqrt{\frac{2h}{g}}

Here h is the height and g ≈ 9.81 m/s² is the acceleration due to gravity.
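A minimal sketch of this idea, assuming standard gravity (g ≈ 9.81 m/s²) and ignoring air resistance: the model is a simple formula fit to the table, and we can query it for heights the table doesn't cover.

```python
import math

G = 9.81  # acceleration due to gravity, m/s^2

def fall_time(height_m: float) -> float:
    """Time in seconds for an object to fall from height_m (no air resistance)."""
    return math.sqrt(2 * height_m / G)

# Reproduce a row from our table, then interpolate a height we have no data for.
print(round(fall_time(100), 2))  # matches the 100 m row: 4.51
print(round(fall_time(250), 2))  # between the 200 m and 300 m rows
```

Because this model is a single closed-form equation, it is fully interpretable: we can explain any output by pointing at the formula.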

Models for more complex calculations are often created with neural networks, mathematical systems that learn skills by analyzing vast amounts of data. Each node in a vast collection evaluates a specific function and passes the result to the next node. Simple neural networks can be expressed as mathematical functions, but as the number of variables and nodes increases, the model can become opaque to human comprehension.

The Interpretability Problem

Unfortunately, for many complex models, it is impossible to open them up and provide a precise mathematical explanation for a given decision. In other words, models often lack human interpretability and accountability. We often can't say, mathematically speaking, exactly how the network makes the distinctions it does; we only know that its decisions align with those of a human. It doesn't require a keen imagination to see how this presents a problem in regulated, high-stakes decision-making.

Let's say John visits a lender and applies for a $37,000 small business loan. The lender needs to determine the probability that John will default on the loan, so they feed John's information into an algorithm, which computes a low score, causing a denial. By law, the lender must provide John with a statement of the specific reasons for the denial. In this scenario, what do we tell John? Today, we can reverse engineer the model and provide a detailed answer, but even the simple models of tomorrow will quickly test the limits of human understanding as computing resources become more powerful and less expensive. So how do we design accountable, transparent systems in the face of exponentially growing complexity?

Solutions?

Proponents of interpretable models suggest limiting the number of variables used in a model. The problem with this approach becomes apparent after considering how neural networks weigh variables. Models multiply results by coefficients that determine the relative importance of each variable or calculation before passing them to the next node. These coefficients are often 20 to 50 decimal places long and can be positive or negative. While understanding the data underpinning a decision is essential, that alone is not enough to produce a clear explanation. We can partially solve this problem by building tooling to abstract implementation details and provide a more intelligible overview of the model; however, this still only provides an approximation of the decision-making process.
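A hypothetical sketch of what such an abstraction tool might do: rank variables by the absolute size of their coefficients and hide the full-precision arithmetic. The feature names and weight values below are invented for illustration, and this kind of ranking only approximates how a real network combines its inputs.

```python
# Invented coefficients from a hypothetical loan-scoring model.
features = ["income", "debt_ratio", "credit_history_years"]
weights = [0.84213907445117382, -1.31198224466090011, 0.27665103920114898]

# Crude abstraction: rank features by coefficient magnitude and report
# only the sign and a rounded size, not the raw decimals.
ranked = sorted(zip(features, weights), key=lambda fw: abs(fw[1]), reverse=True)
for name, w in ranked:
    direction = "raises" if w >= 0 else "lowers"
    print(f"{name}: {direction} the score (|weight| ~ {abs(w):.2f})")
```

The rounded summary is easier to read than 17-digit coefficients, but it is exactly the kind of lossy overview the paragraph above warns about: an approximation, not the decision process itself.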

Other thought leaders in machine learning argue that the most viable long-term solutions may not involve futile attempts to explain the model but should instead focus on auditing and regulating performance. Do large volumes of test data reveal statistical trends of bias? Does analyzing the training data show any gaps or irregularities that could result in harm? Unfortunately, this does not solve the issue in my hypothetical scenario above. I can't conclusively prove that my current decision was correct by pointing to past performance.
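A performance audit of this kind can be sketched in a few lines. The decisions below are invented for illustration; the 80% threshold follows the "four-fifths rule," a common regulatory heuristic for flagging disparate impact in selection rates.

```python
# Hypothetical test-set decisions: (group label, approved?)
decisions = [
    ("group_a", True), ("group_a", True), ("group_a", False), ("group_a", True),
    ("group_b", True), ("group_b", False), ("group_b", False), ("group_b", False),
]

def approval_rate(group: str) -> float:
    """Fraction of applicants in a group that the model approved."""
    outcomes = [approved for g, approved in decisions if g == group]
    return sum(outcomes) / len(outcomes)

# Four-fifths rule: flag the model if one group's approval rate falls
# below 80% of the other group's.
ratio = approval_rate("group_b") / approval_rate("group_a")
flagged = ratio < 0.8
```

Note what this audit does and doesn't tell us: it can surface a statistical trend of bias across many decisions, but it cannot explain, or justify, any single decision.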

Technology is simply moving too rapidly to rely on regulations, which are, at best, a lagging remedy. We must pre-emptively work to build explainability into our models, but doing this in an understandable and actionable way will require rethinking our current AI architectures. We need forward-looking solutions that address bias at every stage of the development lifecycle with strong internal governance. Existing systems should undergo regular audits to ensure small changes haven't caused disparate impacts.

I can't help but feel very lucky to live in this transformative sliver of time, from the birth of the personal computer to the beginning of the internet age and the machine learning revolution. Today's developers and system architects have a massive responsibility to consider the impact of the technology they create. The future adoption of AI heavily depends on the trust we build in our systems today.