Advances in Quasi-Experimentation
Reprinted from Trochim, W. (Ed.). (1986). Editor’s Notes. Advances in Quasi-Experimental Design and Analysis. New Directions for Program Evaluation, no. 31. San Francisco, CA: Jossey-Bass.
The intent of this volume is to update, perhaps even to alter, our thinking about quasi-experimentation in applied social research and program evaluation. Since Campbell and Stanley (1963) introduced the term quasi-experiment, we have tended to see this area as involving primarily two interrelated topics: the theory of the validity of causal inferences and a taxonomy of the research designs that enable us to examine causal hypotheses. We can see this in the leading expositions of quasi-experimentation (Campbell and Stanley, 1963, 1966; Cook and Campbell, 1979) as well as in the standard textbook presentations of the topic (Kidder and Judd, 1986; Rossi and Freeman, 1985), where it is typical to have separate sections or chapters that discuss validity issues first and then proceed to distinguishable quasi-experimental designs (for example, the pretest-posttest nonequivalent group design, the regression-discontinuity design, the interrupted time series design). My first inclination in editing this volume was to emulate this tradition, beginning the volume with a chapter on validity and following it with one chapter per major quasi-experimental design, each raising the relevant conceptual and analytical issues and discussing recent advances. But I think such an approach would simply have contributed to a persistent confusion about the nature of quasi-experimentation and its role in research.
Instead, this volume makes the case that we have moved beyond the traditional thinking on quasi-experiments as a collection of specific designs and threats to validity toward a more integrated, synthetic view of quasi-experimentation as part of a general logical and epistemological framework for research. To support this view that the notion of quasi-experimentation is evolving toward increasing integration, I will discuss a number of themes that seem to characterize our current thinking and that cut across validity typologies and design taxonomies. This list of themes may also be viewed as a tentative description of the advances in our thinking about quasi-experimentation in social research.
The Role of Judgment
One theme that underlies most of the others and that illustrates our increasing awareness of the tentativeness and frailty of quasi-experimentation concerns the importance of human judgment in research. Evidence bearing on a causal relationship emerges from many sources, and it is not a trivial matter to integrate that evidence or to resolve conflicts and discrepancies within it. In recognition of this problem of evidence, we are beginning to address causal inference as a psychological issue that can be illuminated by cognitive models of the judgmental process (see Chapter One of this volume and Einhorn and Hogarth, 1986). We are also recognizing more clearly the sociological bases of scientific thought (Campbell, 1984) and the fact that science is at root a human enterprise. Thus, a positivist, mechanistic view is all but gone from quasi-experimental thinking, and what remains is a more judgmental and more scientifically sensible perspective.
The Case for Tailored Designs
Early expositions of quasi-experimentation took a largely taxonomic approach, laying out a collection of relatively discrete research designs and discussing how weak or strong they were for valid causal inference. Almost certainly, early proponents recognized that there was a virtual infinity of design variations and that validity was more complexly related to theory and context than their presentations implied. Nonetheless, what seemed to evolve was a “cookbook” approach to quasi-experimentation that involved “choosing” a design that fit the situation and checking off lists of validity threats.
In an important paper on the coupling of randomized and nonrandomized design features, Boruch (1975) explicitly encouraged us to construct research designs as combinations of more elemental units (for example, assignment strategies, measurement occasions) based on the specific contextual needs and plausible alternative explanations for a treatment effect. I have been a minor participant in this move toward hybrid, tailored, or patched-up designs and in suggesting how such designs can be accomplished (Trochim and Land, 1982; Trochim, 1984), and Cordray emphasizes it in Chapter One of this volume. The implication for current practice is that we should focus on the advantages of different combinations of design features rather than on a relatively restricted set of prefabricated designs. In teaching quasi-experimental methods, we need to break away from a taxonomic design mentality and emphasize design principles and issues that cut across the traditional distinctions between true experiments, nonexperiments, and quasi-experiments.
The Crucial Role of Theory
Quasi-experimentation and its randomized experimental parent have been criticized for encouraging an atheoretical “black box” mentality of research (see, for instance, Chen and Rossi, 1984; Cronbach, 1982). Persons are assigned either to complex molar program packages or (often) to equally complex comparison conditions. The machinery of random assignment (or our quasi-experimental attempts to approximate random assignment) is the primary means of determining whether the program has an effect. This ceteris paribus mentality is inherently atheoretical and noncontextual: It assumes that the same mechanism works in basically the same way whether we apply it in mental health or criminal justice, income maintenance or education.
There is nothing inherently wrong with this program-group-versus-comparison-group logic. The problem is that it may be a rather crude, uninformative approach. In the two-group case, we are simply creating a dichotomous input into reality. If we observe a posttest difference between groups, it could be explained by this dichotomous program-versus-comparison-group input or by any number of alternative explanations, including differential attrition rates, intergroup rivalry and communication, initial selection differences among groups, or different group histories. We usually try to deal with these alternative explanations by ruling them out through argument, additional measurement, patched-up design features, and auxiliary analysis. Cook and Campbell (1979), Cronbach (1982), and others strongly favor replication of treatment effects as a standard for judging the validity of a causal assertion, but this advice does little to enhance the validity and informativeness within individual studies or program evaluations.
Chen and Rossi (1984, p. 339) approached this issue by advocating increased attention to social science theory: “not the global conceptual schemes of the grand theorists but much more prosaic theories that are concerned with how human organizations work and how social problems are generated.” Evaluators have similarly begun to stress the importance of program theory as the basis for causal assessment (for example, Bickman, in press). These developments allow increased emphasis to be placed on the role of pattern matching (Trochim, 1985) through the generation of more complex theory-driven predictions that, if corroborated, allow fewer plausible alternative explanations for the effect of a program. Because appropriate theories may not be readily available, especially for the evaluation of contemporary social programs, we are developing methods and processes that facilitate the articulation of the implicit theories which program administrators and stakeholder groups have in mind and which presumably guide the formation and implementation of the program (Trochim, 1985). This theory-driven perspective is consonant with Mark’s emphasis in Chapter Three on the study of causal process and with Cordray’s discussion in Chapter One on ruling in the program as opposed to ruling out alternative explanations.
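Pattern matching can take many forms, and Trochim (1985) treats it far more generally than any single statistic. As a minimal sketch only, assuming the simplest case, the Python below takes a program theory that predicts the relative size of effects on several outcomes and summarizes how closely the observed effects follow that prediction with a Pearson correlation. The function name `pattern_match`, the outcome names, and all numbers are hypothetical.

```python
from math import sqrt


def pattern_match(predicted: dict[str, float], observed: dict[str, float]) -> float:
    """Pearson correlation between theory-predicted and observed effects,
    matched by outcome name. Values near 1 mean the observed pattern of
    effects tracks the predicted one; values near 0 or below mean it does not.
    (Hypothetical helper; one simple index of correspondence among many.)"""
    if set(predicted) != set(observed):
        raise ValueError("predicted and observed must name the same outcomes")
    if len(predicted) < 3:
        raise ValueError("need at least three outcomes for a meaningful pattern")
    keys = sorted(predicted)
    xs = [predicted[k] for k in keys]
    ys = [observed[k] for k in keys]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


if __name__ == "__main__":
    # Hypothetical program theory: a literacy program should affect reading most,
    # writing somewhat, math slightly, and attendance not at all.
    predicted = {"reading": 3, "writing": 2, "math": 1, "attendance": 0}
    # Hypothetical observed standardized effects on the same four outcomes.
    observed = {"reading": 0.42, "writing": 0.30, "math": 0.12, "attendance": 0.02}
    print(f"pattern match r = {pattern_match(predicted, observed):.2f}")
```

A high correspondence does not by itself establish the effect; the value of a richer, theory-driven prediction is that fewer alternative explanations would be expected to produce the same pattern.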
Attention to Program Implementation
A theory-driven approach to quasi-experimentation will be futile unless we can demonstrate that the program was in fact carried out or implemented as the theory intended. Consequently, we have seen the development of program implementation theory (for example, McLaughlin, 1984) that directly addresses the process of program execution. One approach emphasizes the development of organizational procedures and training systems that accurately transmit the program and that anticipate likely institutional sources of resistance. Another strategy involves the assessment of program delivery through program audits, management information systems, and the like. This emphasis on program implementation has further obscured the traditional distinction between process and outcome evaluation. At the least, it is certainly clear that good quasi-experimental outcome evaluation cannot be accomplished without attending to program processes, and we are continuing to develop better notions of how to combine these two efforts.
The Importance of Quality Control
Over and over, our experience with quasi-experimentation has shown that even the best-laid research plans often go awry in practice, sometimes with disastrous results. Thus, over the past decade we have begun to pay increasing attention to the integrity and quality of our research methods in real-world settings. One way of achieving this goal is to incorporate techniques used by other professions – accounting, auditing, industrial quality control – that have traditions in data integrity and quality assurance (Trochim and Visco, 1985). For instance, double bookkeeping can be used to keep verifiable records of research participation. Acceptance sampling can be an efficient method for checking accuracy in large data collection efforts, where an exhaustive examination of records is impractical or excessive in cost. These issues are particularly important in quasi-experimentation, where it is incumbent upon the researcher to demonstrate that sampling, measurement, group assignment, and analysis decisions do not interact with program participation in ways that can confound the final interpretation of results.
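Acceptance sampling comes from industrial quality control, and the volume does not prescribe a particular plan. As an illustration only, the Python sketch below assumes a single-sampling plan: draw n records at random from a batch, count those containing errors, and accept the batch if that count is at most an acceptance number c. The acceptance probability follows the binomial distribution, a reasonable approximation when the batch is large relative to n. The function names and the values of n, c, and the error rates are hypothetical.

```python
import math
import random


def acceptance_probability(n: int, c: int, p: float) -> float:
    """Probability that a batch with true error rate p is accepted under a
    single-sampling plan: inspect n records, accept if at most c are in error.
    Binomial approximation (batch assumed large relative to n)."""
    return sum(math.comb(n, d) * p**d * (1 - p) ** (n - d) for d in range(c + 1))


def inspect_batch(records: list[bool], n: int, c: int, rng: random.Random) -> bool:
    """Draw a simple random sample of n records (True = record contains an
    error) and return True if the batch is accepted."""
    sample = rng.sample(records, n)
    return sum(sample) <= c


if __name__ == "__main__":
    n, c = 50, 1  # hypothetical plan: check 50 records, tolerate at most 1 error
    for p in (0.01, 0.05, 0.10):
        print(f"error rate {p:.2f}: P(accept) = {acceptance_probability(n, c, p):.3f}")

    rng = random.Random(1986)
    batch = [False] * 4900 + [True] * 100  # hypothetical batch with a 2% error rate
    rng.shuffle(batch)
    print("batch accepted:", inspect_batch(batch, n, c, rng))
```

Tabulating the acceptance probability across error rates (the plan's operating characteristic) shows the trade-off directly: a larger n or smaller c rejects error-prone batches more reliably but costs more inspection.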
The Advantages of Multiple Perspectives
We have long recognized the importance of replication and systematic variation in research. In the past few years, Cook (1985) and colleagues Shadish and Houts (Chapter Two in this volume) have articulated a rationale for achieving systematic variation that they term critical multiplism. This perspective rests on the notion that no single realization will ever be sufficient for understanding a phenomenon with validity. Multiple realizations – of research questions, measures, samples, designs, analyses, replications, and so on – are essential for convergence on the truth of a matter. However, such a varied approach can become a methodological and epistemological Pandora’s box unless we apply critical judgment in deciding which multiples we will emphasize in a study or set of studies (Chapter Two in this volume and Mark and Shotland, 1985).
Evolution of the Concept of Validity
The history of quasi-experimentation is inseparable from the development of the theory of the validity of causal inference. Much of this history has been played out through the ongoing dialogue between Campbell and Cronbach concerning the definition of validity and the relative importance that should be attributed on the one hand to the establishment of a causal relationship and on the other hand to its generalizability. In the most recent major statement in this area, Cronbach (1982) articulated the UTOS model, which conceptually links the units, treatments, observing operations and settings in a study into a framework that can be used for establishing valid causal inference. The dialogue continues in Chapter Four of this volume, where Campbell attempts to dispel persistent confusion about the types of validity by tentatively relabeling internal validity as local molar causal validity and external validity as the principle of proximal similarity. It is reasonable to hope that we might achieve a clearer consensus on this issue, as Mark argues in Chapter Three, where he attempts to resolve several different conceptions of validity, including those of Campbell and Cronbach.
Development of Increasingly Complex Realistic Analytic Models
In the past decade, we have made considerable progress toward complicating our statistical analyses to account for increasingly complex contexts and designs. One such advance involves the articulation of causal models of the sort described by Reichardt and Gollob in Chapter Six, especially models that allow for latent variables and that directly model measurement error (Joreskog and Sorbom, 1979).
Another important recent development involves analyses that address the problem of selection bias or group nonequivalence – a central issue in quasi-experiments because random assignment is not used and there is no assurance that comparison groups are initially equivalent (see Rindskopf’s discussion in Chapter Five). At the same time, there is increasing recognition of the implications of not attending to the correct unit of analysis when analyzing the data and of the advantages and implications of conducting analyses at multiple levels. Thus, when we assign classrooms to conditions but analyze individual student data rather than classroom aggregates, we are liable to get a different view of program effects than we are when we analyze at the classroom level, as Shadish, Cook, and Houts argue in Chapter Two. Other notable advances that are not explicitly addressed in this volume include the development of log-linear, probit, and logit models for the analysis of qualitative or nominal level outcome variables (Fienberg, 1980; Forthofer and Lehnen, 1981) and the increasing application of Bayesian statistical approaches in quasi-experimental contexts (Pollard, 1986).
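The classroom example can be made concrete with the design-effect approximation familiar from cluster sampling, which this volume does not itself present. When whole classrooms of m students are assigned to conditions and student outcomes within a classroom have intraclass correlation rho, a student-level analysis that ignores the clustering understates the sampling variance of the effect estimate by a factor of roughly 1 + (m - 1) * rho. The minimal Python sketch below, assuming equal classroom sizes, computes that factor and aggregates student scores to classroom means; the function names, class size, and rho are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def design_effect(cluster_size: float, icc: float) -> float:
    """Approximate variance inflation when whole clusters (e.g., classrooms)
    of equal size are assigned to conditions: 1 + (m - 1) * rho."""
    return 1 + (cluster_size - 1) * icc


def effective_sample_size(n_students: int, cluster_size: float, icc: float) -> float:
    """Roughly how many independent observations the clustered student data
    are worth."""
    return n_students / design_effect(cluster_size, icc)


def classroom_means(records: list[tuple[str, float]]) -> dict[str, float]:
    """Aggregate (classroom_id, score) pairs to one mean per classroom,
    the unit that was actually assigned to conditions."""
    by_class: dict[str, list[float]] = defaultdict(list)
    for classroom, score in records:
        by_class[classroom].append(score)
    return {classroom: mean(scores) for classroom, scores in by_class.items()}


if __name__ == "__main__":
    # Hypothetical study: 20 classrooms of 25 students, intraclass correlation 0.15.
    n_classes, m, rho = 20, 25, 0.15
    n = n_classes * m
    print(f"design effect: {design_effect(m, rho):.2f}")
    print(f"{n} students behave like about {effective_sample_size(n, m, rho):.0f} "
          "independent observations")
    print(classroom_means([("A", 70.0), ("A", 80.0), ("B", 60.0)]))
```

Even a modest intraclass correlation matters when classes are large: with the hypothetical values above, 500 students carry roughly the information of about 109 independent observations, which is why a student-level analysis can make a program effect look more precise than the classroom-level design warrants.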
Parallel to the development of these increasingly complex, realistic analytic models, cynicism has deepened about the ability of any single model or analysis to be sufficient. Thus, in Chapter Six Reichardt and Gollob call for multiple analyses to bracket bias, and in Chapter Five Rindskopf recognizes the assumptions on which any analytic approach to selection bias rests. We have virtually abandoned the hope of a single correct analysis, and we have accordingly moved to multiple analyses that are based on systematically distinct assumptional frameworks and that rely in an increasingly direct way on the role of judgment.
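As a deliberately simple illustration of bracketing, and not necessarily the procedure Reichardt and Gollob recommend, the Python sketch below assumes a pretest-posttest nonequivalent group design and estimates the program effect three ways, each resting on a different assumption about initial selection differences: the raw posttest difference (groups assumed equivalent at the outset), the gain-score difference (the initial gap assumed to persist unchanged), and an analysis of covariance with a common within-group slope on the pretest. The range of the three estimates serves as the bracket. The data and function names are hypothetical.

```python
from statistics import mean


def within_group_slope(groups: list[tuple[list[float], list[float]]]) -> float:
    """Pooled within-group slope of posttest on pretest (the common slope
    assumed by ANCOVA). Measurement error in the pretest attenuates it."""
    sxy = sxx = 0.0
    for pre, post in groups:
        mx, my = mean(pre), mean(post)
        sxy += sum((x - mx) * (y - my) for x, y in zip(pre, post))
        sxx += sum((x - mx) ** 2 for x in pre)
    return sxy / sxx


def effect_estimates(pre_t: list[float], post_t: list[float],
                     pre_c: list[float], post_c: list[float]) -> dict[str, float]:
    """Program-effect estimates for a pretest-posttest nonequivalent group
    design under three different assumptions about selection."""
    pre_gap = mean(pre_t) - mean(pre_c)
    post_gap = mean(post_t) - mean(post_c)
    slope = within_group_slope([(pre_t, post_t), (pre_c, post_c)])
    return {
        # Assumes the groups were equivalent before the program.
        "posttest difference": post_gap,
        # Assumes the initial gap would have persisted unchanged.
        "gain-score difference": post_gap - pre_gap,
        # Assumes a linear posttest-pretest relation with a common slope.
        "ANCOVA (common slope)": post_gap - slope * pre_gap,
    }


def bracket(estimates: dict[str, float]) -> tuple[float, float]:
    """Smallest and largest estimate: a crude range for the program effect."""
    return min(estimates.values()), max(estimates.values())


if __name__ == "__main__":
    # Hypothetical data: the program group started lower on the pretest.
    pre_t, post_t = [40, 45, 50, 55, 60], [52, 56, 60, 64, 68]
    pre_c, post_c = [50, 55, 60, 65, 70], [58, 62, 66, 70, 74]
    estimates = effect_estimates(pre_t, post_t, pre_c, post_c)
    for name, value in estimates.items():
        print(f"{name:>24}: {value:6.2f}")
    low, high = bracket(estimates)
    print(f"bracket: {low:.2f} to {high:.2f}")
```

With these made-up data the three estimates are -6, 4, and 2, so the bracket runs from -6 to 4. The width of that range is itself informative: it shows how heavily any conclusion about the program depends on what one is willing to assume about selection.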
All the developments just outlined point to an increasingly realistic and complicated life for quasi-experimentalists. The overall picture that emerges is that all quasi-experimentation is judgmental. It is based on multiple and varied sources of evidence, it should be multiplistic in realization, it must attend to process as well as to outcome, it is better off when theory driven, and it leads ultimately to multiple analyses that attempt to bracket the program effect within some reasonable range.
In one sense, this is hardly a pretty picture. Our views about quasi-experimentation and its role in causal inference are certainly more tentative and critical than they were in 1965 or perhaps even in 1979. But this more integrated and complex view of quasi-experimentation has emerged directly from our experiences in the conduct of such studies. As such, it realistically represents our current thinking about one of the major strands in the evolution of social research methodology in this century.
References
Bickman, L. (ed.). Program Theory and Program Evaluation. New Directions for Program Evaluation, no. 33. San Francisco: Jossey-Bass, in press.
Boruch, R. F. “Coupling Randomized Experiments and Approximations to Experiments in Social Program Evaluation.” Sociological Methods and Research, 1975, 4 (1), 31-53.
Campbell, D. T. “Can We Be Scientific in Applied Social Science?” In R. F. Conner and others (eds.), Evaluation Studies Review Annual. Vol. 9. Beverly Hills, Calif.: Sage, 1984.
Campbell, D. T., and Stanley, J. C. “Experimental and Quasi-Experimental Designs for Research on Teaching.” In N. L. Gage (ed.), Handbook of Research on Teaching. Chicago: Rand McNally, 1963.
Campbell, D. T., and Stanley, J. C. Experimental and Quasi-experimental Designs for Research. Chicago: Rand McNally, 1966.
Chen, H., and Rossi, P. H. “Evaluating with Sense: The Theory-Driven Approach.” In R. F. Conner and others (eds.), Evaluation Studies Review Annual. Vol. 9. Beverly Hills, Calif.: Sage, 1984.
Cook, T. D. “Postpositivist Critical Multiplism.” In R. L. Shotland and M. M. Mark (eds.), Social Science and Social Policy. Beverly Hills, Calif.: Sage, 1985.
Cook, T. D., and Campbell, D. T. Quasi-Experimentation: Design and Analysis Issues for Field Settings. Chicago: Rand McNally, 1979.
Cronbach, L. J. Designing Evaluations of Educational and Social Programs. San Francisco: Jossey-Bass, 1982.
Einhorn, H. J., and Hogarth, R. M. “Judging Probable Cause.” Psychological Bulletin, 1986, 99, 3-19.
Fienberg, S. E. The Analysis of Cross-Classified Categorical Data. (2nd ed.) Cambridge, Mass.: M.I.T. Press, 1980.
Forthofer, R. N., and Lehnen, R. G. Public Program Analysis: A New Categorical Data Approach. Belmont, Calif.: Wadsworth, 1981.
Joreskog, K. G., and Sorbom, D. Advances in Factor Analysis and Structural Equation Models. Cambridge, Mass.: Abt Books, 1979.
Kidder, L. H., and Judd, C. M. Research Methods in Social Relations. (5th ed.) New York: Holt, Rinehart & Winston, 1986.
Mark, M. M., and Shotland, R. L. “Toward More Useful Social Science.” In R. L. Shotland and M. M. Mark (eds.), Social Science and Social Policy. Beverly Hills, Calif.: Sage, 1985.
McLaughlin, M. W. “Implementation Realities and Evaluation Design.” In R. L. Shotland and M. M. Mark (eds.), Social Science and Social Policy. Beverly Hills, Calif.: Sage, 1984.
Pollard, W. E. Bayesian Statistics for Evaluation Research. Beverly Hills, Calif.: Sage, 1986.
Rossi, P. H., and Freeman, H. E. Evaluation: A Systematic Approach. (3rd ed.) Beverly Hills, Calif.: Sage, 1985.
Trochim, W. Research Design for Program Evaluation: The Regression-Discontinuity Approach. Beverly Hills, Calif.: Sage, 1984.
Trochim, W. “Pattern Matching, Validity, and Conceptualization in Program Evaluation.” Evaluation Review, 1985, 9 (5), 575-604.
Trochim, W., and Land, D. “Designing Designs for Research.” The Researcher, 1982, 1 (1), 1-6.
Trochim, W., and Visco, R. “Quality Control in Evaluation.” In D. S. Cordray (ed.), Utilizing Prior Research in Evaluation Planning. New Directions for Program Evaluation, no. 27. San Francisco: Jossey-Bass, 1985.