February 27, 2023

Introduction to Counterfactual Explanations in Machine Learning

As a potentially world-changing technology, AI is increasing its hold on most industries. However, the vast majority of Machine Learning models are black boxes, and this opaqueness often generates distrust because it conflicts with our nature as meaning-seeking creatures.

Just imagine having your credit application rejected with no further explanation given. Or having an insurance claim denied without a word about why.

The frustration is immense.

Adding a layer of explainability to the process is therefore required to implement an effective Machine Learning strategy. The benefits span a number of dimensions, including increased user trust and easier legal compliance.

In Machine Learning, explanations can take a number of forms, and one of the most valuable types consists of counterfactuals.

Understanding counterfactual explanations

A counterfactual explanation indicates the smallest change in feature values that leads to a different outcome. It shows us what would have to be different in an input instance in order to obtain a predefined output.

For example, if we roll back to the credit rejection scenario, the counterfactual explanation would consist of what should have been different in order to have the application accepted.

Would you need to:

  • Earn 20,000 more per year?
  • Have fewer credit cards?
  • Pay your debt in full?

Imagine an explanation that sounds like this:

Your application was rejected because your annual income is $45,000. If your current income had instead been $55,000, your application would have been approved.

As you can see, counterfactuals provide easy-to-understand explanations for the decisions made by algorithms. They help users trust the decision, and they can also be of fundamental use to the service provider. In this particular instance, the bank could check whether the decision is justified and complies with regulations.
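To make the credit example concrete, here is a minimal Python sketch. The `approve` function is a made-up scoring rule standing in for the bank's model (its coefficients are assumptions chosen so that the numbers match the example above), and `smallest_income_increase` is a hypothetical helper that looks for the smallest raise that flips the decision.

```python
from typing import Optional


def approve(income: float, credit_cards: int, debt: float) -> bool:
    """Hypothetical stand-in for a credit model: approve when the score is non-negative."""
    score = income / 1000 - 5 * credit_cards - debt / 500 - 30
    return score >= 0


def smallest_income_increase(
    income: float,
    credit_cards: int,
    debt: float,
    step: int = 1_000,
    max_increase: int = 100_000,
) -> Optional[int]:
    """Try increasingly large raises and return the first one that gets approved."""
    for increase in range(step, max_increase + step, step):
        if approve(income + increase, credit_cards, debt):
            return increase
    return None  # no counterfactual found within the search range


# Assumed applicant profile, for illustration only.
applicant = {"income": 45_000, "credit_cards": 3, "debt": 5_000}

increase = smallest_income_increase(**applicant)
if not approve(**applicant) and increase is not None:
    print(
        f"Your application was rejected because your annual income is "
        f"${applicant['income']:,}. If your income had instead been "
        f"${applicant['income'] + increase:,}, it would have been approved."
    )
```

The search changes only one feature and keeps everything else fixed, which is the simplest possible case. Real counterfactual methods typically search over several features at once.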


Counterfactual explanations are human-friendly because they allow users to understand what needs to change in order for them to get a predefined outcome. As their name suggests, thinking in ‘counterfactuals’ requires imagining a reality that contradicts the existing facts, and this is a kind of reasoning that comes naturally to humans.

Moreover, counterfactuals are selective, which means that they usually focus on a limited number of features. By adding new information to what is already known, they are informative and favor creative problem-solving.

On top of this, counterfactuals can offer important information about the decision-making process, allowing an organization to check whether the process is free of bias and follows legal regulations.


According to Molnar (2022), one of the disadvantages of counterfactuals is what he calls the ‘Rashomon effect’. In short, since each counterfactual tells a different story about how a predefined output could be reached, the process of selecting one over the other might become confusing.

In order to reach outcome C, you could choose to change feature A. But changing feature B might lead to the same output. This multitude of contradicting truths makes it challenging to select the explanation best suited to a given situation.

For example, imagine that you’re using a machine learning model to predict how much you can charge for an apartment you want to rent out.

According to the model, you can currently charge 600 dollars for it. Nevertheless, you would like to see if there’s anything you can do in order to charge more. For this purpose, you decide to use counterfactual explanations. You generate the report and there are 50 counterfactuals that can be taken into consideration.

Some of them might not be actionable, like increasing the size of your apartment. But after removing these from the equation, you might still end up with a high number of viable options like allowing pets, changing your windows and doors, carpeting the interiors, and so forth. They might all be reasonably good, yet very different.

Rolling back to the credit scenario, it would be difficult for the bank to choose which explanations to feed back to the applicant because, on a case-by-case basis, different people may find different suggestions useful.
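The apartment example can be sketched in Python as follows. Everything here is an assumption made for illustration: `predict_rent` is a toy model with invented coefficients, the list of candidate changes is hand-written, and the "effort" score is just one possible way to rank the counterfactuals that remain after the non-actionable ones are removed.

```python
def predict_rent(apartment: dict) -> int:
    """Hypothetical rent model (dollars per month); the coefficients are invented."""
    rent = 300 + 5 * apartment["size_sqm"]
    rent += 40 if apartment["pets_allowed"] else 0
    rent += 60 if apartment["new_windows"] else 0
    rent += 25 if apartment["carpeted"] else 0
    return rent


apartment = {
    "size_sqm": 60,
    "pets_allowed": False,
    "new_windows": False,
    "carpeted": False,
}
TARGET_RENT = 640  # the outcome we would like the model to predict

# (feature, new value, actionable by the landlord?, rough effort score)
candidate_changes = [
    ("size_sqm", 68, False, 10),
    ("pets_allowed", True, True, 1),
    ("new_windows", True, True, 3),
    ("carpeted", True, True, 2),
]


def counterfactuals(apartment: dict, changes: list, target: int) -> list:
    """Keep every single-feature change that lifts the prediction to the target."""
    found = []
    for feature, value, actionable, effort in changes:
        changed = {**apartment, feature: value}
        if predict_rent(changed) >= target:
            found.append((feature, value, actionable, effort))
    return found


all_cfs = counterfactuals(apartment, candidate_changes, TARGET_RENT)

# Drop changes the landlord cannot act on, then rank the rest by effort.
actionable_cfs = sorted((c for c in all_cfs if c[2]), key=lambda c: c[3])

print("current prediction:", predict_rent(apartment))
for feature, value, _, effort in actionable_cfs:
    print(f"set {feature} to {value} (effort {effort})")
```

Even in this tiny setting more than one valid counterfactual survives the filter, and the ranking depends entirely on how effort is scored. A different person could reasonably prefer a different ordering.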

Why use counterfactual explanations?

What makes counterfactuals a great tool for explainability is their capacity to be easily understood.

They’re clear and concise: if a specific feature’s value changes, the prediction will in turn change to the predefined output.

At the same time, they do not require the user to have access to the model’s internals or the data behind it. Generating counterfactuals only requires the model’s prediction function, which can be exposed, for example, through a web API.
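As a rough illustration of this black-box property, the sketch below searches for a counterfactual using nothing but a `predict` callable. The names are hypothetical: `remote_predict` is a local stand-in for a call to a hosted model, and the grid of candidate values and the per-feature scales used in the distance are assumptions, not part of any particular library.

```python
from itertools import product
from typing import Callable, Dict, List, Optional

Instance = Dict[str, float]


def nearest_counterfactual(
    predict: Callable[[Instance], int],
    instance: Instance,
    grid: Dict[str, List[float]],
    scales: Dict[str, float],
    desired: int,
) -> Optional[Instance]:
    """Brute-force search: query `predict` on every grid point and keep the
    closest one (scaled L1 distance) that yields the desired output."""
    best, best_dist = None, float("inf")
    names = list(grid)
    for values in product(*(grid[n] for n in names)):
        candidate = {**instance, **dict(zip(names, values))}
        if predict(candidate) != desired:
            continue
        dist = sum(abs(candidate[n] - instance[n]) / scales[n] for n in names)
        if dist < best_dist:
            best, best_dist = candidate, dist
    return best


def remote_predict(x: Instance) -> int:
    """Hypothetical stand-in for a model behind an API: 1 = approve, 0 = reject."""
    return int(x["income"] / 1000 - 5 * x["credit_cards"] >= 40)


applicant = {"income": 45_000, "credit_cards": 3}
grid = {
    "income": [45_000, 50_000, 55_000, 60_000],
    "credit_cards": [3, 2, 1, 0],
}
scales = {"income": 20_000, "credit_cards": 3}  # assumed feature ranges

result = nearest_counterfactual(remote_predict, applicant, grid, scales, desired=1)
print(result)
```

The search never inspects how `remote_predict` works; it only observes outputs. The choice of scales matters, though: changing them changes which counterfactual counts as the nearest one.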

Once implemented, counterfactuals increase the level of model transparency by allowing users to see what’s happening inside the AI’s black box.

Looking to make your ML model more explainable?

Lumenova AI can help your organization open up the algorithmic black box and make Trustworthy AI a key pillar of your Machine Learning strategy.

Get in touch with us for a custom demo.
