The concepts of internal and external validity both concern the logical ability to draw certain kinds of conclusions from the data collected within an experimental design. Internal validity concerns whether the design used to collect observations and measurements logically allows conclusions about causal relationships between variables. Specifically, can we conclude that variation in measures of the dependent variable was caused by variation in the independent variable? External validity concerns whether the results of a particular research study or design generalize to the population from which the sample was selected. Specifically, can we conclude that the effects of the independent variable on the behavior of the research participants in a particular study also apply to, or represent, the effects the independent variable would have in the larger population? In short, internal validity is about the logical ability to draw inferences about cause and effect; external validity is about the logical ability to infer that the treatment effect is likely to apply to a group of people larger than those observed in the study.
Internal validity is intimately connected to the design of the study and the procedures used to make observations and record measurements. The value of true experimental designs lies in their ability to maximize internal validity. In the simplest case, subjects are assigned to experimental and control groups that are treated identically in every way with one exception: the experimental group participants are exposed to some treatment hypothesized to affect observations and measurements of behavior, while the participants in the control group are not. In such circumstances, any systematic variation in the scores observed between the experimental and control groups can be attributed to the treatment effect alone, and a conclusion that the variation in treatments caused the variation in scores is warranted. This is often referred to as the “logic of the experiment.”
However, constructing circumstances in which this simple logical inference is valid requires a high level of control over all sources of variation that could co-occur with the treatment effect and confound the conclusion. These extraneous sources of variation, or “confounds,” must be controlled either by eliminating them from the research study or by holding the influence of the extraneous variables constant across the experimental and control groups, so that their effects on the two groups' measures are equivalent and do not contaminate the differential effect of the independent variable across the groups. In other words, if an unintended or extraneous source of variation affects the scores of the experimental and control groups equally, it does not interfere with the ability to assess whether variation in the independent variable produced variation in the dependent variable.
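The point about holding an extraneous influence constant can be illustrated with a small arithmetic sketch. It assumes a purely additive model in which each score is a baseline plus a treatment effect plus an extraneous influence; the function name, the baseline scores, and the effect sizes are all hypothetical and chosen only for illustration.

```python
def group_difference(base_scores, treatment_effect, extraneous_treated, extraneous_control):
    """Return mean(treated) - mean(control) under a simple additive model (an assumption)."""
    treated = [b + treatment_effect + extraneous_treated for b in base_scores]
    control = [b + extraneous_control for b in base_scores]
    return sum(treated) / len(treated) - sum(control) / len(control)

# Hypothetical baseline scores, identical for both groups.
base = [10, 12, 14, 16]

# No extraneous variation: the group difference equals the treatment effect.
no_extraneous = group_difference(base, 5, 0, 0)

# Extraneous influence held constant across groups: the difference is unchanged.
held_constant = group_difference(base, 5, 3, 3)

# Extraneous influence present in only one group: the difference is inflated.
confounded = group_difference(base, 5, 3, 0)

print(no_extraneous, held_constant, confounded)  # 5.0 5.0 8.0
```

In the third case the observed difference of 8 mixes a treatment effect of 5 with an extraneous effect of 3, and nothing in the group means alone separates the two.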
Extraneous variation that is neither controlled nor anticipated is often mistaken for variation produced by the independent variable and erroneously interpreted as evidence supporting the experimental hypothesis. Such mistakes of interpretation amount to Type I errors in the process of hypothesis testing. Because such errors are often regarded as the most serious errors of interpretation in scientific research, methods that reduce their likelihood by controlling extraneous sources of variation are essential.
Several general strategies can be used to control for extraneous or confounding variables. Some focus on sources of variation the researcher can anticipate in advance on theoretical grounds, while others attempt to control extraneous variation more generally, including variation that cannot necessarily be anticipated. One way to deal with an anticipated source of extraneous variation is to remove it from the study altogether. For example, if a researcher expected that gender might function as an extraneous variable, but was not interested in examining the effects of gender specifically, gender could be removed as a variable by including research participants of only one gender. A study that includes only male, or only female, research participants precludes the potential confounding effects of gender.
Another method to control for potential extraneous variation that can be anticipated in advance is to build the suspected extraneous variable into the study design as an additional independent variable. For example, in the case of gender, the treatment and control group measures could be collected with a sample of male research participants and then again with a sample of female participants. Any differential effects of the independent variable on the dependent variable across the treatment conditions for the male participants could then be compared to any corresponding effects found for the female participants. This “building into the study” method for controlling anticipated sources of extraneous variation creates a factorial design that also allows for the examination of potential interactions among independent variables.
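A minimal sketch of such a 2 x 2 factorial layout is given below. The scores are invented for illustration and are not data from any study; the sketch computes the treatment effect separately within each gender and then takes the difference between those two effects as a simple descriptive index of interaction. A real analysis would also apply an inferential test, which is omitted here.

```python
from statistics import mean

# Hypothetical scores for a 2 x 2 factorial design (gender x condition).
scores = {
    ("male", "treatment"): [14, 16, 15, 15],
    ("male", "control"): [10, 11, 9, 10],
    ("female", "treatment"): [18, 17, 19, 18],
    ("female", "control"): [10, 9, 11, 10],
}

def treatment_effect(gender):
    """Mean treatment score minus mean control score within one gender."""
    return mean(scores[(gender, "treatment")]) - mean(scores[(gender, "control")])

effect_male = treatment_effect("male")
effect_female = treatment_effect("female")

# A nonzero difference between the two effects suggests that the treatment
# effect depends on gender, i.e. a possible interaction.
interaction = effect_female - effect_male

print(effect_male, effect_female, interaction)  # 5 8 3
```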
One of the most common, and effective, methods for controlling unanticipated extraneous variation is random assignment. Nonrandom methods of assigning research participants to treatment and control conditions often introduce additional sources of variation across the groups that cannot be differentiated from any variation caused by the independent variable, and thus confound the logic of the experiment. In other words, nonrandom assignment may result in treatment and control group measures that differ for reasons other than the effect of the independent variable. Random methods of assigning research participants to groups, by contrast, effectively control for the unanticipated ways in which groups of participants might otherwise differ from one another, whether for systematic or chance reasons. Random assignment, in combination with a sufficient sample size, results in groups balanced on just about any characteristic or source of variation that might be present. Random assignment does not eliminate extraneous variation. Instead, it balances the effects of whatever sources of extraneous variation exist across the treatment groups, so that all groups are affected equally and additional differential effects on the group measures are avoided. Such differential effects would otherwise confound the logical ability to assess the differential effects of the independent variable.
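The balancing property of random assignment can be sketched with a short simulation. The participant pool, the age variable, its distribution, and the seed are all assumptions made for illustration; age stands in for any characteristic the researcher did not anticipate.

```python
import random
from statistics import mean

def randomly_assign(participants, rng):
    """Shuffle a copy of the participant list and split it into two equal halves."""
    shuffled = participants[:]
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

rng = random.Random(42)  # fixed seed so the sketch is reproducible

# Hypothetical pool: each participant has an age that was never measured or
# considered when the study was designed.
participants = [{"id": i, "age": rng.gauss(35, 10)} for i in range(10_000)]

treatment, control = randomly_assign(participants, rng)

# With a large sample, the two groups should have nearly the same mean age
# even though age played no role in the assignment.
age_gap = abs(mean([p["age"] for p in treatment]) - mean([p["age"] for p in control]))
print(len(treatment), len(control), round(age_gap, 3))
```

Rerunning the sketch with a much smaller pool (for example, 20 participants) typically produces a larger gap in mean age, which is why the balancing claim is tied to a sufficient sample size.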
External validity is often related to internal validity in an inverse fashion. The very same methods that allow for greater experimental control, and the isolation of the effects of one single variable upon another, maximizing internal validity, often create a rather peculiar and artificial experimental situation that is not representative of the kinds of circumstances in which most behavior typically occurs. The research study may tell us with some confidence that one variable very likely causes effects in the other in this sort of experimental context, but it may leave serious doubts about whether these same cause-effect relationships exist in the non-experimental environments in which behavior typically occurs. Thus, the factors that maximize internal validity may simultaneously compromise external validity.
However, attention to external validity is also important. The ultimate goal of research is not to learn how individuals behave or perform in constrained, artificial, unrepresentative environments. Instead, it is to learn how people behave in the “real world.” Thus, studies or research programs that do not attend to external validity in any fashion have little practical value other than service as fanciful exercises in theoretical abstraction.