Experimental designs are often touted as the most “rigorous” of all research designs, or as the “gold standard” against which all other designs are judged. In one sense, they probably are. If you can implement an experimental design well (and that is a big “if” indeed), then the experiment is probably the strongest design with respect to internal validity. Why? Recall that internal validity is at the center of all causal or cause-effect inferences. When you want to determine whether some program or treatment causes some outcome or outcomes to occur, then you are interested in having strong internal validity. Essentially, you want to assess the proposition:
If X, then Y
or, in more colloquial terms:
If the program is given, then the outcome occurs
Unfortunately, it’s not enough just to show that when the program or treatment occurs the expected outcome also happens. That’s because there may be lots of reasons, other than the program, for why you observed the outcome. To really show that there is a causal relationship, you have to simultaneously address the two propositions:
If X, then Y
If not X, then not Y
Or, once again more colloquially:
If the program is given, then the outcome occurs
If the program is not given, then the outcome does not occur
If you are able to provide evidence for both of these propositions, then you’ve in effect isolated the program from all of the other potential causes of the outcome. You’ve shown that when the program is present the outcome occurs and when it’s not present, the outcome doesn’t occur. That points to the causal effectiveness of the program.
Think of all this like a fork in the road. Down one path, you implement the program and observe the outcome. Down the other path, you don’t implement the program and the outcome doesn’t occur. But, how do we take both paths in the road in the same study? How can we be in two places at once? Ideally, what we want is to have the same conditions – the same people, context, time, and so on – and see whether when the program is given we get the outcome and when the program is not given we don’t. Obviously, we can never achieve this hypothetical situation. If we give the program to a group of people, we can’t simultaneously not give it! So, how do we get out of this apparent dilemma?
Perhaps we just need to think about the problem a little differently. What if we could create two groups or contexts that are as similar as we can possibly make them? If we could be confident that the two situations are comparable, then we could administer our program in one (and see if the outcome occurs) and not give the program in the other (and see if the outcome doesn’t occur). And, if the two contexts are comparable, then this is like taking both forks in the road simultaneously! We can have our cake and eat it too, so to speak.
That’s exactly what an experimental design tries to achieve. In the simplest type of experiment, we create two groups that are “equivalent” to each other. One group (the program or treatment group) gets the program and the other group (the comparison or control group) does not. In all other respects, the groups are treated the same. They are made up of similar people, who live in similar contexts, have similar backgrounds, and so on. Now, if we observe differences in outcomes between these two groups, then the differences must be due to the only thing that differs between them – that one got the program and the other didn’t.
OK, so how do we create two groups that are “equivalent”? The approach used in experimental design is to assign people randomly from a common pool of people into the two groups. The experiment relies on this idea of random assignment to groups as the basis for obtaining two groups that are similar. Then, we give one the program or treatment and we don’t give it to the other. We observe the same outcomes in both groups.
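To make the idea of random assignment concrete, here is a minimal sketch in Python. Everything in it is an illustrative assumption rather than a prescribed procedure: the function name randomly_assign, the made-up pool of twenty participant labels, and the fixed seed (used only so that the split can be reproduced).

```python
import random


def randomly_assign(pool, seed=None):
    """Split a common pool into two groups of (nearly) equal size by chance alone.

    `randomly_assign` and `seed` are hypothetical names used for illustration.
    """
    rng = random.Random(seed)      # a private generator so the split is reproducible
    shuffled = list(pool)          # copy, so the caller's pool is left untouched
    rng.shuffle(shuffled)          # chance, not the researcher, decides the order
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


# A made-up common pool of 20 people.
participants = [f"person_{i}" for i in range(20)]

# The first group gets the program; the second serves as the comparison group.
treatment, control = randomly_assign(participants, seed=42)
```

The point of the shuffle is that nobody (not the researcher, not the staff, not the participants) chooses who ends up in which group. Each person lands in exactly one of the two groups, and which one is decided by chance.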
The key to the success of the experiment is in the random assignment. In fact, even with random assignment we never expect that the groups we create will be exactly the same. How could they be, when they are made up of different people? We rely on the idea of probability and assume that the two groups are “probabilistically equivalent” or equivalent within known probabilistic ranges.
So, if we randomly assign people to two groups, and we have enough people in our study to achieve the desired probabilistic equivalence, then we may consider the experiment to be strong in internal validity and we probably have a good shot at assessing whether the program causes the outcome(s).
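One way to get a feel for what “enough people” means is to simulate it. The sketch below is only an illustration under stated assumptions: it invents a pre-program trait (think of a prior test score, drawn here from a normal distribution with mean 50 and standard deviation 10), randomly splits the people into two groups many times, and reports how far apart the two group averages are on that trait before any program is given. The function name average_imbalance and all of the numbers are hypothetical.

```python
import random
import statistics


def average_imbalance(n_people, n_trials=200, seed=0):
    """Average absolute gap between two randomly assigned groups on a pre-existing trait.

    All names and numbers here are illustrative assumptions.
    """
    rng = random.Random(seed)
    gaps = []
    for _ in range(n_trials):
        # Hypothetical pre-program trait: mean 50, standard deviation 10.
        trait = [rng.gauss(50, 10) for _ in range(n_people)]
        rng.shuffle(trait)                      # the random assignment step
        half = n_people // 2
        group_a, group_b = trait[:half], trait[half:]
        gaps.append(abs(statistics.mean(group_a) - statistics.mean(group_b)))
    return statistics.mean(gaps)


small_study = average_imbalance(10)      # 5 people per group
large_study = average_imbalance(1000)    # 500 people per group
print(f"typical pre-program gap, n=10:   {small_study:.2f}")
print(f"typical pre-program gap, n=1000: {large_study:.2f}")
```

In the small study the two groups can differ noticeably before the program even starts, purely by the luck of the draw. In the large study the typical gap shrinks, roughly in proportion to the square root of the group size. The groups are never identical, but with enough people they are equivalent within a narrow, predictable range.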
But there are lots of things that can go wrong. We may not have a large enough sample. Or, we may have people who refuse to participate in our study or who drop out part way through. Or, we may be challenged successfully on ethical grounds (after all, in order to use this approach we have to deny the program to some people who might be just as deserving of it as others). Or, we may get resistance from the staff in our study who would like some of their “favorite” people to get the program. Or, the mayor might insist that her daughter be put into the new program in an educational study because it may mean she’ll get better grades.
The bottom line here is that experimental design is intrusive and difficult to carry out in most real world contexts. And, because an experiment is often an intrusion, you are to some extent setting up an artificial situation so that you can assess your causal relationship with high internal validity. If so, then you are limiting the degree to which you can generalize your results to real contexts where you haven’t set up an experiment. That is, you have reduced your external validity in order to achieve greater internal validity.
In the end, there is just no simple answer (no matter what anyone tells you!). If the situation is right, an experiment can be a very strong design to use. But it isn’t automatically so. My own personal guess is that randomized experiments are probably appropriate in no more than 10% of the social research studies that attempt to assess causal relationships.
Experimental design is a fairly complex subject in its own right. I’ve been discussing the simplest of experimental designs – a two-group program versus comparison group design. But there are lots of experimental design variations that attempt to accomplish different things or solve different problems. In this section you’ll explore the basic design and then learn some of the principles behind the major variations.