Impact evaluation is used to understand if government programs and policies work and how well they work.
The term ‘impact’ is used by evaluators in different ways. We use the term to mean the average effect of a program or policy on the outcome or outcomes it was designed to influence.
We recognise that ‘impact evaluation’ can also refer to theory-based impact evaluation approaches; however, we do not discuss them in this guidance.
On this page of the toolkit, you can find more background information on impact evaluation. You can also jump directly to more information on impact evaluation methods: randomised controlled trials and quasi-experimental methods.
Questions about program impact are causal in nature. We are asking if the program caused a change in outcomes. For example, did a school-based education program cause a change in average test scores?
There are other important evaluation questions, aside from assessing program impact. It is often appropriate to combine several evaluation methods to fully evaluate a program or policy. For more information, see How to evaluate.
Causal questions example
To see how impact evaluation can address causal questions, let’s consider a thought experiment.
You have a headache, and you want to know if taking a new medicine will make it go away.
We can clarify this question by imagining 2 worlds. These worlds are identical except for one thing: in one world you take the medicine, in the other you don’t.
The next step is to see if you still have a headache in the 2 imaginary worlds.
After taking the medicine in the first world, your headache disappears.
In the second world, where you didn’t take the medicine, your headache persists.
As the only difference between the 2 worlds was whether you took the medicine, you now know that the medicine worked for you. We call this the individual causal effect of the medicine.
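The thought experiment can be written compactly in potential-outcomes notation. The symbols below are our own assumption for illustration and are not used elsewhere in this guidance: Y_i(1) is person i's outcome in the world where they take the medicine, and Y_i(0) is their outcome in the world where they don't.

```latex
% Individual causal effect for person i
\tau_i = Y_i(1) - Y_i(0)

% Average effect across people, which is the sense of "impact" used on this page
\bar{\tau} = \mathbb{E}\left[ Y_i(1) - Y_i(0) \right]
```

In the headache example, Y_i(1) is "headache gone" and Y_i(0) is "headache persists", so the individual causal effect is non-zero: the medicine worked for you.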
Answering causal questions
The challenge with answering causal questions is that we never get access to both worlds, so we can never directly observe the individual causal effect.
Instead, we only see if your headache persists in this world after you take one course of action.
If your headache went away after taking the medication, there could be many reasons why.
Perhaps it was the medicine, or it was the water you drank with it, or it just got better on its own.
Without the second world where you didn’t take the medicine, how can we be sure the change in your outcome was caused by the medicine?
We need to find a way to mimic the 2 identical worlds above to make valid comparisons.
Comparing the number of headaches that persist among a group of people who took the medicine and a group that didn’t seems like it might be the solution. However, this approach can seriously mislead us.
It can mislead because looking at different people in 2 groups is not the same as looking at the same people in 2 worlds where everything is identical. In the 2 groups, the people who take the medicine and the people who don’t might be very different in ways that matter.
For example, what if the people who took the medicine did so because their headaches were worse, and thus less likely to go away? We might underestimate the impact of our medicine. What if the group that takes the medicine drinks more water with it, and it’s the water that helps relieve the headache? Then we might overestimate the impact of our medicine. In both cases, simply comparing the number of headaches in our 2 groups would be misleading.
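The first scenario can be illustrated with a small simulation. This is a minimal sketch, not a real analysis: the function name `naive_comparison` and every probability in it are made-up assumptions. The medicine is built to raise everyone's chance of recovery by 0.20, but people with severe headaches are more likely to take it.

```python
import random


def naive_comparison(n=100_000, true_effect=0.20, seed=0):
    """Compare recovery rates when people choose for themselves whether to take the medicine.

    All probabilities are made-up values for illustration only.
    """
    rng = random.Random(seed)
    counts = {True: 0, False: 0}     # people in each group, keyed by "took medicine"
    recovered = {True: 0, False: 0}  # headaches that went away in each group

    for _ in range(n):
        severe = rng.random() < 0.5
        # Assumption: people with severe headaches are more likely to take the medicine.
        took_medicine = rng.random() < (0.8 if severe else 0.2)
        # Assumption: severe headaches are less likely to go away on their own.
        p_recover = 0.2 if severe else 0.6
        if took_medicine:
            p_recover += true_effect  # the medicine helps everyone by the same amount
        counts[took_medicine] += 1
        recovered[took_medicine] += int(rng.random() < p_recover)

    rate_medicine = recovered[True] / counts[True]
    rate_no_medicine = recovered[False] / counts[False]
    return rate_medicine - rate_no_medicine


if __name__ == "__main__":
    print(f"True effect built into the simulation: {0.20:+.2f}")
    print(f"Naive difference between the 2 groups: {naive_comparison():+.3f}")
```

Under these assumed numbers, the group that takes the medicine is mostly made up of people with severe headaches, so the naive difference is expected to land near -0.04 even though the true effect is +0.20. The simple comparison does not just understate the impact; it makes a helpful medicine look slightly harmful.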
Creating credible counterfactuals
The field of causal inference looks for ways to solve this problem of misleading comparisons. The goal is to create a group that can stand in for the world that you didn’t observe (in this case, the world where you didn’t take the medicine). We call this a counterfactual.
The best way to create a credible counterfactual is to use a randomised controlled trial (RCT). By assigning people to groups using a coin flip, and then giving one group the medicine, we create 2 groups that are highly comparable. This is as close as we can get to creating 2 identical worlds. Since individuals were assigned at random, we are unlikely to get all the people with the worst headaches assigned to the group that takes the medicine.
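The effect of the coin flip can be sketched with a short simulation. This is an illustrative sketch only: the function name `randomised_comparison` and all probabilities are made-up assumptions, and the medicine is built to raise everyone's chance of recovery by 0.20.

```python
import random


def randomised_comparison(n=100_000, true_effect=0.20, seed=0):
    """Assign the medicine by coin flip and compare recovery rates between groups.

    Returns (estimated effect, share severe in medicine group, share severe in control group).
    All probabilities are made-up values for illustration only.
    """
    rng = random.Random(seed)
    groups = {
        True: {"n": 0, "severe": 0, "recovered": 0},   # medicine group
        False: {"n": 0, "severe": 0, "recovered": 0},  # control group
    }

    for _ in range(n):
        severe = rng.random() < 0.5
        # Coin flip: assignment does not depend on how bad the headache is.
        gets_medicine = rng.random() < 0.5
        # Assumption: severe headaches are less likely to go away on their own.
        p_recover = 0.2 if severe else 0.6
        if gets_medicine:
            p_recover += true_effect
        group = groups[gets_medicine]
        group["n"] += 1
        group["severe"] += int(severe)
        group["recovered"] += int(rng.random() < p_recover)

    med, ctrl = groups[True], groups[False]
    estimate = med["recovered"] / med["n"] - ctrl["recovered"] / ctrl["n"]
    return estimate, med["severe"] / med["n"], ctrl["severe"] / ctrl["n"]


if __name__ == "__main__":
    estimate, severe_med, severe_ctrl = randomised_comparison()
    print(f"Share with severe headaches: medicine {severe_med:.3f}, control {severe_ctrl:.3f}")
    print(f"Estimated effect: {estimate:+.3f} (true effect in the simulation: +0.20)")
```

Because assignment ignores headache severity, both groups should contain roughly the same share of severe headaches, and the difference in recovery rates should sit close to the effect built into the simulation.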
There are other methods, sometimes called quasi‑experimental methods, that can be used to create counterfactuals, including:
- regression discontinuity designs
- instrumental variables.