Randomised controlled trials (RCTs), when ethical, feasible and well‑conducted, are a strong method of evaluating program impact.
Part of the work of the Australian Centre of Evaluation is to increase the use of RCTs in government and improve the way we test policy and program effectiveness.
RCTs are widespread in other parts of the world. For example, a systematic review found that, between 1980 and 2016, more than 1,000 RCTs were undertaken in relation to education programs alone (Connolly et al 2018).
Running an RCT – the basics
To consider what is involved in designing and implementing an RCT, let’s look at an example.
Suppose we have a new employment program, and we’d like to know how it affects the employment status of those who participate in it.
To run an RCT, we would randomly assign individuals who are looking for work to two groups.
We flip a coin for each person: those who get heads will participate in the new program, and those who get tails will participate in standard job seeker activities.
Six months later, we measure the employment rate in each group.
The difference in employment outcomes between the group assigned to the program and the group assigned to standard job seeker activities gives us the impact of the program.
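The coin-flip assignment and the difference-in-rates calculation above could be sketched in a few lines of Python. All names, probabilities and outcomes here are invented purely for illustration; in a real trial the outcomes would be observed, not simulated:

```python
import random

random.seed(42)  # fixed seed so the illustration is reproducible

# Hypothetical job seekers (IDs are invented for this sketch)
job_seekers = [f"person_{i}" for i in range(200)]

# Flip a coin for each person: heads -> new program, tails -> standard activities
treatment, control = [], []
for person in job_seekers:
    (treatment if random.random() < 0.5 else control).append(person)

# Six months later we would measure real employment outcomes; here we
# simulate them with assumed employment probabilities for each group
in_treatment = set(treatment)
employed = {p: random.random() < (0.55 if p in in_treatment else 0.45)
            for p in job_seekers}

def employment_rate(group):
    return sum(employed[p] for p in group) / len(group)

# The difference in employment rates estimates the program's impact
impact_estimate = employment_rate(treatment) - employment_rate(control)
```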
Benefits of randomisation
Randomisation deals with a tricky problem called selection bias.
For example, you might worry that more highly motivated individuals will find their way into your new employment program. Their extra motivation might improve their chances of finding a job regardless of the program, which would exaggerate the program's apparent benefit.
However, because we randomised, our two groups, while not identical, will not be different in systematic ways.
We would expect that highly motivated individuals will be distributed evenly between the two groups.
This means that if we see a higher employment rate in the group that participated in the new program than in the control group, we can attribute the difference to the program.
Our randomised control group also protects us from external factors influencing our assessment of the program.
If Australia saw strong economic growth, we might expect more people to find employment regardless of our new program.
Without a control group, we would likely attribute the employment gains driven by economic growth to our new program. However, because we randomised and our two groups are very similar, we expect that both groups will see a similar increase in employment due to economic growth over the period.
The difference between the treatment and control group still gives us the pure impact of our program in this situation.
Human research is research conducted with people, or their data.
This research has contributed enormously to human good; however, there are also historical cases where it has resulted in great harm to individuals.
Ethical and culturally appropriate
The Commonwealth Evaluation Policy states that evaluations need to be robust, ethical and culturally appropriate.
Ethical and culturally appropriate approaches should be considered in all evaluation activities, including for the collection, assessment and use of information.
This applies to RCTs, just as it does to other evaluation methods.
The National Health and Medical Research Council’s National Statement on Ethical Conduct in Human Research 2007 (Updated 2018) sets out requirements for the ethical design, review and conduct of human research in Australia, with a focus on preventing harm.
All human research conducted by the Australian Centre for Evaluation team will be subject to independent ethical review in line with the National Statement.
Independent ethics review
Independent ethics review provides a systematic approach to identifying ethical considerations associated with evaluations, including RCTs. These ethical considerations include:
- when and how to establish that individuals have consented to participate in the evaluation
- how to secure and manage any sensitive or personal data that are collected (including compliance with the Australian Privacy Principles)
- how to engage sensitively with vulnerable populations, such as children
- how to conduct the evaluation in a culturally sensitive way, and
- what ‘distress protocols’ should be put in place to help evaluation participants, if required.
Sometimes not ethical
Sometimes ethical considerations mean that an RCT is not possible.
For example, it may not be ethical to randomise people to an untreated control group when there is already strong evidence of program effectiveness, or strong community expectation that access to the program will be provided.
In most cases, however, the RCT design can be adjusted to address relevant ethical considerations. As a result, independent ethics approval has been granted for thousands of RCTs in fields as ethically sensitive as medicine, preventive health, education and international development.
While it is not feasible to conduct an RCT for all government policies and programs, there is plenty of scope to increase their use.
It is always worth consulting an expert before deciding if an RCT is feasible.
Randomised trials don’t need to be expensive. Public sector RCTs can be conducted at lower cost where:
- the program or service being tested is already being delivered, and
- outcome data are already being collected through routine monitoring systems.
Feasibility of randomisation
Researchers have also developed sophisticated techniques to enable randomisation in many situations where you might think it was impossible:
- Randomise a group rather than an individual. For example, it may not be possible to randomise students in the same class to receive different lesson plans. Instead, researchers often randomise schools to implement different teaching practices. Similarly, we can randomise households or businesses – or even whole postcodes.
- Randomise in stages. For example, when the ATO delivered fiscal stimulus payments in 2008, it was unable to deliver them all at once so it staggered delivery to different regions over successive fortnights. Which region received the payments first? The answer: the delivery order was randomised, allowing researchers to evaluate the impact of the fiscal stimulus on consumer spending. An appealing feature of staggered roll‑outs is that everyone in the evaluation eventually has access to the program or payment.
- Randomise encouragement to join the program, rather than the program itself. This encouragement can then be used to estimate the impact of the program among those who respond.
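The first two techniques above could be sketched as follows. School and region names are invented, and the numbers of clusters are arbitrary; this only illustrates the mechanics of cluster and staggered randomisation:

```python
import random

random.seed(0)  # fixed seed for reproducibility

# Cluster randomisation: randomise whole schools, not individual students
schools = [f"school_{i}" for i in range(20)]
random.shuffle(schools)
new_practice = set(schools[:10])    # half the schools adopt the new practice
usual_practice = set(schools[10:])  # the rest continue as usual

# Staggered roll-out: randomise the ORDER in which regions are treated,
# so every region eventually receives the payment
regions = [f"region_{i}" for i in range(8)]
rollout_order = random.sample(regions, k=len(regions))
schedule = {region: fortnight
            for fortnight, region in enumerate(rollout_order, start=1)}
```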
Any research method must be well‑designed to generate high‑quality evidence. This is equally true for RCTs.
Some key design considerations to keep in mind when designing a trial are detailed below. This discussion is not exhaustive, and many design issues can be subtle, so we suggest consulting an expert to ensure your RCT is robust.
Randomisation is the heart of any RCT, so it is important to ensure that the process used to assign people (or groups) to different trial groups is truly random; otherwise the groups may not be comparable.
For example, assigning people to groups based on the time of day that they arrive at a service provider may introduce bias if those arriving in the morning differ systematically from those arriving in the afternoon.
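One simple way to guarantee a truly random allocation is to shuffle the full participant list and split it, rather than relying on arrival order or per-person coin flips. A minimal sketch (participant IDs are hypothetical):

```python
import random

random.seed(7)  # fixed seed for reproducibility

participants = [f"person_{i}" for i in range(100)]

# Shuffle the whole list, then split it in half: every allocation is
# equally likely, and the two arms are guaranteed to be the same size
order = random.sample(participants, k=len(participants))
treatment, control = order[:50], order[50:]
```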
Sample size and measurement
As with any quantitative research design, sample size and measurement matter. Having a large enough sample is key to ensuring that you’re able to see the impact of your program if there is one.
By conducting what we call a ‘power analysis’ we can get a handle on how many individuals we need to enrol in our RCT to detect a proposed effect of the program on an outcome of interest.
It’s also important that you are measuring the outcomes you care about directly and with as little error as possible. The more error in your measurement, the more people you will need to enrol in your RCT to overcome it.
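As an illustration of a power analysis, the standard closed-form sample-size formula for comparing two proportions can be computed with the Python standard library. The 45% and 55% employment rates below are assumed figures, not taken from the text:

```python
from math import ceil
from statistics import NormalDist

def sample_size_per_arm(p_control, p_treatment, alpha=0.05, power=0.8):
    """Approximate sample size per arm needed to detect a difference
    between two proportions with a two-sided z-test."""
    z = NormalDist().inv_cdf
    z_alpha = z(1 - alpha / 2)   # critical value for the significance level
    z_beta = z(power)            # quantile corresponding to desired power
    variance = p_control * (1 - p_control) + p_treatment * (1 - p_treatment)
    effect = abs(p_treatment - p_control)
    return ceil((z_alpha + z_beta) ** 2 * variance / effect ** 2)

# e.g. to detect a rise in the employment rate from 45% to 55%
n = sample_size_per_arm(0.45, 0.55)
```

Note how demanding higher power, or trying to detect a smaller effect, pushes the required sample size up quickly.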
It’s important to ensure that any benefits of treatment do not ‘spill over’ to the control group.
For example, suppose some students in a single class are randomly allocated to receive extra maths instruction. Later, some students in the treatment group might help friends in the control group by sharing what they learnt during the extra instruction.
In effect, the control group students would be receiving some of the benefits of the instruction, which would obscure the full benefits of the program.
This issue can be avoided by randomly allocating extra maths instruction to entire schools rather than classrooms.
The ability to collect data on all participants, even those in the control group, is a key design consideration.
This is usually not a problem when outcome data are available in administrative systems, but missing outcome data can be a big issue for outcomes collected by survey.
Missing outcome data becomes a problem if missingness is related to which treatment group individuals are assigned to.
For example, if people in the control group are less likely to answer the survey than those in the treatment group, this can introduce bias and impact the credibility of an RCT.
It’s very important to think through strategies for minimising missing data during the design phase.
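A minimal sketch of the kind of attrition check an analyst might run when survey data come back. The response counts and the 5-percentage-point threshold are invented for illustration:

```python
# Hypothetical follow-up survey responses (True = responded to the survey)
responses = {
    "treatment": [True] * 85 + [False] * 15,  # 85% response rate
    "control":   [True] * 70 + [False] * 30,  # 70% response rate
}

# Response rate in each arm, and the gap between arms
rates = {arm: sum(r) / len(r) for arm, r in responses.items()}
gap = abs(rates["treatment"] - rates["control"])

# A large gap suggests missingness is related to treatment assignment,
# which can bias the comparison (the threshold here is purely illustrative)
differential_attrition = gap > 0.05
```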
Preregistration and transparency
By publicly sharing information on the proposed RCT design, research hypotheses, data collection procedures, and analytic procedures, preregistration allows for greater transparency and reproducibility.
By pre‑committing to a specific set of analyses, preregistration greatly reduces the risk that researchers ‘fish’ or ‘mine’ the data for statistically significant results, or adjust their research hypotheses to suit the data.
It also lets others know you are working on a topic, reducing duplication and enabling collaboration.