Single Group Threats
The Single Group Case
What is meant by a “single group” threat? Let’s consider two single group designs and then consider the threats that are most relevant with respect to internal validity. The top design in the figure shows a “posttest-only” single group design. Here, a group of people receives your program and afterwards is given a posttest. In the bottom part of the figure we see a “pretest-posttest” single group design. In this case, we give the participants a pretest or baseline measure, give them the program or treatment, and then give them a posttest.
To help make this a bit more concrete, let’s imagine that we are studying the effects of a compensatory education program in mathematics for first grade students on a measure of math performance such as a standardized math achievement test. In the post-only design, we would give the first graders the program and then give a math achievement posttest. We might choose not to give them a baseline measure because we have reason to believe they have no prior knowledge of the math skills we are teaching. It wouldn’t make sense to pretest them if we expect they would all get a score of zero. In the pre-post design we are not willing to assume that they have no prior knowledge. We measure the baseline in order to determine where the students start out in math achievement. We might hypothesize that the change or gain from pretest to posttest is due to our special math tutoring program. This is a compensatory program because it is only given to students who are identified as potentially low in math ability on the basis of some screening mechanism.
The Single Group Threats
With either of these scenarios in mind, consider what would happen if you observe a certain level of posttest math achievement or a change or gain from pretest to posttest. You want to conclude that the outcome is due to your math program. How could you be wrong? Here are some of the ways, some of the threats to internal validity that your critics might raise, some of the plausible alternative explanations for your observed effect:
It’s not your math program that caused the outcome, it’s something else, some historical event that occurred. For instance, we know that lots of first graders watch the public TV program Sesame Street. And, we know that in every Sesame Street show they present some very elementary math concepts. Perhaps these shows cause the outcome and not your math program.
The children would have had the exact same outcome even if they had never had your special math training program. All you are doing is measuring normal maturation or growth in understanding that occurs as part of growing up – your math program has no effect. How is this maturation explanation different from a history threat? In general, if we’re talking about a specific event or chain of events that could cause the outcome, we call it a history threat. If we’re talking about all of the events that typically transpire in your life over a period of time (without being specific as to which ones are the active causal agents) we call it a maturation threat.
The testing threat only occurs in the pre-post design. What if taking the pretest made some of the children more aware of that kind of math problem – it “primed” them for the program so that when you began the math training they were ready for it in a way that they wouldn’t have been without the pretest. This is what is meant by a testing threat – taking the pretest (not getting your program) affects how participants do on the posttest.
Like the testing threat, this one only operates in the pretest-posttest situation. What if the change from pretest to posttest is due not to your math program but rather to a change in the test that was used? This is what’s meant by an instrumentation threat. In many schools when they have to administer repeated testing they don’t use the exact same test (in part because they’re worried about a testing threat!) but rather give out “alternate forms” of the same tests. These alternate forms were designed to be “equivalent” in the types of questions and level of difficulty, but what if they aren’t? Perhaps part or all of any pre-post gain is attributable to the change in instrument, not to your program. Instrumentation threats are especially likely when the “instrument” is a human observer. The observers may get tired over time or bored with the observations. Conversely, they might get better at making the observations as they practice more. In either event, it’s the change in instrumentation, not the program, that leads to the outcome.
Mortality doesn’t mean that people in your study are dying (although if they are, it would be considered a mortality threat!). Mortality is used metaphorically here. It means that people are “dying” with respect to your study. Usually, it means that they are dropping out of the study. What’s wrong with that? Let’s assume that in our compensatory math tutoring program we have a nontrivial dropout rate between pretest and posttest. And, assume that the kids who are dropping out are the low pretest math achievement test scorers. If you look at the average gain from pretest to posttest using all of the scores available to you at each occasion, you would include these low-scoring eventual dropouts in the pretest and not in the posttest. You’d be dropping the potential low scorers from the posttest or, put another way, you’d be artificially inflating the posttest average over what it would have been if no students had dropped out. And, you won’t necessarily solve this problem by comparing pre-post averages for only those kids who stayed in the study. This subsample would certainly not be representative even of the original entire sample. Furthermore, we know that because of regression threats (see below) the students who stay may actually appear to do worse on the posttest, simply as an artifact of the non-random dropout or mortality in your study. When mortality is a threat, the researcher can often gauge the degree of the threat by comparing the dropout group against the nondropout group on pretest measures. If there are no major differences, it may be more reasonable to assume that mortality was happening across the entire sample and is not biasing results greatly. But if the pretest differences are large, one must be concerned about the potential biasing effects of mortality.
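To make the mortality argument concrete, here is a minimal simulation sketch. Everything in it is an assumption chosen for illustration: the score scale (mean 50, standard deviation 10), the amount of test noise, the cutoff of 40, and the 60% dropout rate are hypothetical, and the program is deliberately given no effect at all. The function name `simulate_mortality` is also just an illustrative label.

```python
import random
import statistics


def simulate_mortality(n=5000, seed=0):
    """Pre-post study with NO true program effect and nonrandom dropout."""
    rng = random.Random(seed)
    pre_all, pre_stay, post_stay, pre_drop = [], [], [], []
    for _ in range(n):
        ability = rng.gauss(50, 10)       # hypothetical true math ability
        pre = ability + rng.gauss(0, 5)   # pretest = ability + noise
        post = ability + rng.gauss(0, 5)  # posttest: same ability, fresh noise
        pre_all.append(pre)
        # Assumed dropout rule: low pretest scorers leave 60% of the time.
        if pre < 40 and rng.random() < 0.6:
            pre_drop.append(pre)          # no posttest is ever recorded
        else:
            pre_stay.append(pre)
            post_stay.append(post)
    post_mean = statistics.fmean(post_stay)
    return {
        # All available scores at each occasion (dropouts only in the pretest).
        "naive_gain": post_mean - statistics.fmean(pre_all),
        # Only the children who stayed, at both occasions.
        "stayers_gain": post_mean - statistics.fmean(pre_stay),
        # The diagnostic from the text: compare groups on the pretest.
        "pre_mean_dropouts": statistics.fmean(pre_drop),
        "pre_mean_stayers": statistics.fmean(pre_stay),
    }


if __name__ == "__main__":
    for name, value in simulate_mortality().items():
        print(f"{name}: {value:.2f}")
```

Under these assumptions the naive gain comes out positive even though the program does nothing, while the stayers-only gain is smaller. The last two values are the pretest comparison of dropouts against nondropouts described above: a large gap between them is the warning sign.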
A regression threat, also known as a “regression artifact” or “regression to the mean,” is a statistical phenomenon that occurs whenever you have a nonrandom sample from a population and two measures that are imperfectly correlated. OK, I know that’s gibberish. Let me try again. Assume that your two measures are a pretest and posttest (and you can certainly bet these aren’t perfectly correlated with each other). Furthermore, assume that your sample consists of low pretest scorers. The regression threat means that the average for the group in your study will appear to increase or improve from pretest to posttest (relative to the overall population) even if you don’t do anything to them – even if you never give them a treatment. Regression is a confusing threat to understand at first. I like to think about it as the “you can only go up from here” phenomenon. If you include in your program only the kids who constituted the lowest ten percent of the class on the pretest, what are the chances that they would constitute exactly the lowest ten percent on the posttest? Not likely. Most of them would score low on the posttest, but they aren’t likely to be the lowest ten percent twice. For instance, maybe there were a few kids on the pretest who got lucky on a few guesses and scored at the eleventh percentile who won’t get so lucky next time. No, if you choose the lowest ten percent on the pretest, they can’t get any lower than being the lowest – they can only go up from there, relative to the larger population from which they were selected. This purely statistical phenomenon is what we mean by a regression threat. There is a more detailed discussion of why regression threats occur and how to estimate them.
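The “you can only go up from here” idea can be checked with a small simulation. The sketch below is illustrative only: the score scale, the noise level, and the helper names (`pearson_r`, `simulate_regression`) are assumptions, and no child receives any treatment. It selects the lowest ten percent on the pretest and then looks at the same children on the posttest. It also compares the observed posttest mean with the value that a simple linear regression of posttest on pretest would predict.

```python
import random
import statistics


def pearson_r(xs, ys):
    """Pearson correlation, written out to avoid version-specific helpers."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def simulate_regression(n=10000, seed=1):
    """Select the bottom 10% on the pretest; apply NO treatment."""
    rng = random.Random(seed)
    ability = [rng.gauss(50, 10) for _ in range(n)]   # hypothetical scale
    pre = [a + rng.gauss(0, 5) for a in ability]      # imperfect measure 1
    post = [a + rng.gauss(0, 5) for a in ability]     # imperfect measure 2

    pre_cut = sorted(pre)[n // 10]     # boundary of the lowest pretest decile
    post_cut = sorted(post)[n // 10]   # boundary of the lowest posttest decile
    chosen = [i for i in range(n) if pre[i] < pre_cut]

    pre_sel = statistics.fmean(pre[i] for i in chosen)
    post_sel = statistics.fmean(post[i] for i in chosen)

    # Linear-regression prediction of the selected group's posttest mean.
    r = pearson_r(pre, post)
    slope = r * statistics.pstdev(post) / statistics.pstdev(pre)
    predicted = statistics.fmean(post) + slope * (pre_sel - statistics.fmean(pre))

    still_lowest = sum(post[i] < post_cut for i in chosen) / len(chosen)
    return {
        "r": r,
        "pre_mean_selected": pre_sel,
        "post_mean_selected": post_sel,
        "predicted_post_mean": predicted,
        "share_still_in_lowest_decile": still_lowest,
    }


if __name__ == "__main__":
    for name, value in simulate_regression().items():
        print(f"{name}: {value:.3f}")
```

With these assumed settings, the selected group's posttest mean sits closer to the population mean than its pretest mean did, and only some of the selected children are still in the lowest ten percent on the posttest. The less correlated the two measures are, the larger this apparent “gain” becomes.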
How do we deal with these single group threats to internal validity? While there are several ways to rule out threats, one of the most common approaches to ruling out the ones listed above is through your research design. For instance, instead of doing a single group study, you could incorporate a control group. In this scenario, you would have two groups: one receives your program and the other one doesn’t. In fact, the only difference between these groups should be the program. If that’s true, then the control group would experience all the same history and maturation threats, would have the same testing and instrumentation issues, and would have similar rates of mortality and regression to the mean. In other words, a good control group is one of the most effective ways to rule out the single-group threats to internal validity. Of course, when you add a control group, you no longer have a single group design. And, you will still have to deal with two major types of threats to internal validity: the multiple-group threats to internal validity and the social threats to internal validity.
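As a rough illustration of why a control group helps, the sketch below simulates a compensatory program with a known effect. All of the numbers are assumptions made up for the example: a true program effect of 5 points, a shared gain of 3 points standing in for history and maturation, a screening cutoff of 40, and random assignment of the screened children to the two groups. The function name `simulate_control_group` is likewise hypothetical.

```python
import random
import statistics


def simulate_control_group(n=20000, true_effect=5.0, shared_gain=3.0, seed=2):
    """Compare a single-group gain with a treated-minus-control estimate."""
    rng = random.Random(seed)
    treated_gains, control_gains = [], []
    for _ in range(n):
        ability = rng.gauss(50, 10)      # hypothetical true math ability
        pre = ability + rng.gauss(0, 5)
        if pre >= 40:
            continue                     # screening: only low scorers enter
        treated = rng.random() < 0.5     # assumed random assignment
        # Both groups get the shared gain (history, maturation) and the
        # same regression to the mean; only the treated get the program.
        post = ability + shared_gain + rng.gauss(0, 5)
        if treated:
            post += true_effect
            treated_gains.append(post - pre)
        else:
            control_gains.append(post - pre)
    treated_gain = statistics.fmean(treated_gains)
    control_gain = statistics.fmean(control_gains)
    return {
        "single_group_gain": treated_gain,   # what a one-group study sees
        "control_gain": control_gain,        # gain with no program at all
        "estimated_effect": treated_gain - control_gain,
    }


if __name__ == "__main__":
    for name, value in simulate_control_group().items():
        print(f"{name}: {value:.2f}")
```

In this setup the single-group gain overstates the program's effect, because it bundles the shared gain and regression to the mean together with the program. Subtracting the control group's gain removes what the two groups have in common and leaves an estimate close to the assumed effect. This only works because the groups here differ in nothing but the program, which is exactly the condition stated above.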