Understanding Evaluation

“He uses statistics as a drunken man uses lamp posts – for support rather than illumination” – Scottish writer Andrew Lang

Let’s be honest: evaluation is overwhelming. You begin with a well-intentioned need to ensure that your program or organization is functioning at its highest level and suddenly find that you don’t even know where to begin.

Sometimes being deeply involved in an organization makes it hard to have perspective. If your brain is constantly thinking of all the different projects within your school or business, then it’s hard to focus on what needs to be evaluated first.

That’s where R.E.M. Consultants can help. We are committed to you, the organization, and to honest, effective evaluation. We don’t have skin in the game, so our goal is simply to help you be the best that you can be!

Data has the power to reveal a unique underlying story. If used properly, an analysis can uncover what you are doing well and what can use improvement. Rather than use data to reinforce or contradict an already existing opinion about a program, R.E.M. Consulting can help you to use data to discover hidden potential that you didn’t even know was there.

“It’s surprising how often we are able to perform an immediate analysis to answer vital questions with data the organization already has. On several occasions, additional data collection wasn’t even necessary after we discovered what was already available to our evaluators.” ~Dr. Karen Larwin, Ph.D.

Before we work on developing your plan, let’s address some myths about evaluation:

Myth: Quantitative and qualitative analyses are one-size-fits-all strategies that can be blindly applied to any situation.

Fact: Just because one organization used one kind of statistical analysis does not mean that strategy will work to answer your questions. In fact, just because you used a particular strategy in the past doesn’t mean that strategy will work for you in the future. Program, process, and impact evaluations are immensely personal. You may start with a broad overview with general data analysis, but inevitably your questions and/or area of concern will emerge. From there, a more precise analysis will be needed if you truly want to get to the bottom of your issues.

Myth: Evaluation of programs, processes, and/or impact only needs to be done when funders demand it. Otherwise, it’s not needed.

Fact: Evaluation should be a regular part of your practice. It should never end with a report to stakeholders. It should guide constant efforts for improvement. Essentially, evaluation is a starting point, not an ending point. R.E.M. Consulting can help organizational leaders determine what requires immediate evaluation and identify long-term evaluation measures that can become a regular part of operations to inform sustained growth.

Myth: Evaluation is just a way for someone to punish an organization by identifying areas of under-performance or failure.

Fact: Although evaluation is routinely used to assure funders that money is being well spent and to keep stakeholders happy, the biggest benefit is to get an accurate snapshot of current practices and how things are going. The best organizations are always looking for ways to improve, and you cannot improve until you fully understand your strengths and weaknesses.

Myth: Our program is already underway, so it’s too late to get an evaluator involved.

Fact: It’s true that the earlier your evaluator is involved in your program design and implementation the better. However, any evaluator involvement is better than none. R.E.M. Consulting can help during any phase of your program. In an ideal scenario, evaluation within your organization would follow the logic model below. However, our company can jump in at any point to facilitate helpful evaluation, strategy, presentations, or future planning.

Myth: Internal and external validity are essentially the same thing.

Fact: Actually, they are totally different things!


The concepts of internal and external validity each have to do with the logical ability to draw certain kinds of conclusions about the meaning of the data collected within an experimental design. Internal validity has to do with whether or not the design used for the collection of observations and measurements logically allows for conclusions about the causal relationships between variables. Specifically, can we conclude that variation in measures of the dependent variable was caused by variation in the independent variable? External validity has to do with whether or not the results of a particular research study or design are generalizable to the population from which the sample was selected. Specifically, can we conclude that the effects of the independent variable on the behavior of the research participants within a particular study also apply to, or represent, the effects the independent variable would have in the larger population? Internal validity is about the logical ability to draw inferences about cause and effect; external validity is about the logical ability to draw inferences about the likelihood of the treatment effect applying to a group of people larger than those observed within the study.

Internal validity is intimately connected to the design of the study and the procedures used to make observations and record measurements. The value of true experimental designs is in their ability to maximize internal validity. In the simplest case, subjects are assigned to experimental and control groups that are treated equally in every way with one exception: the experimental group participants are exposed to some treatment hypothesized to affect observations and measurements of behavior, while the participants in the control group are not. In such circumstances, any systematic variation in the scores observed between the experimental and control groups can be attributed to the treatment effect alone, and a conclusion that the variation in treatments caused the variation in scores is warranted. This is often referred to as the “logic of the experiment.”

However, constructing circumstances in which this simple logical inference can be made with some validity requires a high level of control over all sources of variation that could potentially co-exist with the treatment effect and confound the logic of this conclusion. These extraneous sources of variation, or “confounds,” must be controlled either by eliminating all such variation from the research study, or by holding the influence of the extraneous variables constant across the experimental and control groups, so that their effects on the experimental and control group measures are equivalent and do not contaminate the differential effect of the independent variable across the groups. In other words, if an unintended or extraneous source of variation were to affect the scores of the experimental and control groups equally, it would not interfere with the ability to assess whether or not the variation in the independent variable produced variation in the dependent variable.
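The point about equal versus unequal extraneous influences can be checked with simple arithmetic. The following is a minimal sketch in Python; the scores, the 3-point shift, and the helper name `group_difference` are all invented for illustration and do not come from any real study.

```python
from statistics import mean

# Hypothetical scores, invented for illustration only.
control = [70, 72, 68, 74, 71]
treatment = [75, 77, 73, 79, 76]


def group_difference(treated, untreated):
    """Difference between the treatment and control group means."""
    return mean(treated) - mean(untreated)


# Difference between the groups with no extraneous influence at all.
baseline = group_difference(treatment, control)

# An extraneous influence that raises every score in BOTH groups by 3 points.
shift = 3
equal_influence = group_difference(
    [score + shift for score in treatment],
    [score + shift for score in control],
)

# The same influence acting on the treatment group ONLY (a confound).
unequal_influence = group_difference(
    [score + shift for score in treatment],
    control,
)

print("No extraneous influence:    ", baseline)
print("Influence on both groups:   ", equal_influence)
print("Influence on one group only:", unequal_influence)
```

When the influence reaches both groups, the difference between the group means is unchanged. When it reaches only the treatment group, the difference grows by the size of the influence, and nothing in the scores themselves reveals how much of the gap is due to the treatment.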

Extraneous variation that is neither controlled nor anticipated is often mistaken for variation produced by the independent variable and erroneously interpreted as evidence supporting the experimental hypothesis. Such mistakes of interpretation amount to Type I errors in the process of hypothesis testing. Because such errors are often regarded as the most serious errors of interpretation in scientific research, methods that reduce the possibility of such errors by controlling extraneous sources of variation are essential.

There are several general strategies that can be used to control for extraneous or confounding variables. Some of these focus on sources of variation the researcher might be able to anticipate in advance on theoretical grounds, and others attempt to control extraneous variation more generally, including variation of the sort that cannot necessarily be anticipated in advance. One way to deal with an anticipated source of extraneous variation is to eliminate it from the study altogether. For example, if a researcher expected that gender might function as an extraneous variable, but was not interested in examining the effects of gender specifically, gender could be eliminated as a variable by including research participants of only one gender. A study that includes only male, or only female, research participants precludes the potential confounding effects of gender.

Another method to control for potential extraneous variation that can be anticipated in advance is to simply build the suspected extraneous variable into the study design as an additional independent variable. For example, in the case of gender, the treatment and control group observations and measures could be made and collected with a sample of male research participants and then again with female participants.  Any differential effects of the independent variable on the dependent variable across the treatment conditions for the male participants could be compared to any potential similar effects found for the female participants.  This “building into the study” method for controlling for anticipated sources of extraneous variation creates a factorial design that also allows for the examination of potential interaction amongst independent variables.
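To make the factorial idea concrete, here is a small sketch in Python of a 2 x 2 design (treatment condition by gender). The scores are hypothetical and chosen only so the arithmetic is easy to follow, and the helper name `treatment_effect` is likewise an assumption for illustration.

```python
from statistics import mean

# Hypothetical scores for a 2 x 2 design (condition x gender); invented data.
scores = {
    ("control", "male"): [60, 62, 64],
    ("treatment", "male"): [70, 72, 74],
    ("control", "female"): [61, 63, 65],
    ("treatment", "female"): [63, 65, 67],
}

# Mean score in each of the four cells of the design.
cell_means = {cell: mean(values) for cell, values in scores.items()}


def treatment_effect(gender):
    """Treatment mean minus control mean within one gender."""
    return cell_means[("treatment", gender)] - cell_means[("control", gender)]


effect_male = treatment_effect("male")
effect_female = treatment_effect("female")

# Interaction contrast: how much the treatment effect differs by gender.
# A value near zero would suggest the treatment works similarly for both.
interaction = effect_male - effect_female

print("Treatment effect, male participants:  ", effect_male)
print("Treatment effect, female participants:", effect_female)
print("Interaction contrast:                 ", interaction)
```

In this made-up data the treatment appears to help male participants far more than female participants. The sketch is purely descriptive; deciding whether such a difference is larger than chance would require a formal test such as a two-way analysis of variance.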

One of the most common, and effective, methods for controlling unanticipated extraneous variation is the use of random assignment. Nonrandom assignment methods for assigning research participants to treatment and control conditions often introduce additional sources of variation across the treatment and control groups that cannot be differentiated from any potential variation caused by the independent variable, and thus confound the logic of the experiment. In other words, nonrandom assignment may result in treatment and control group measures that differ for reasons other than the effect of the independent variable. However, random methods of assigning research participants to groups effectively control for all the unanticipated ways in which groups of participants may otherwise differ from one another, whether for systematic or chance reasons. Random assignment, in combination with a sufficient sample size, results in groups balanced on just about any characteristic or source of variation that might be present. Random assignment does not eliminate extraneous variation, but it balances the effects of those sources of extraneous variation that may exist across the treatment groups so that all groups are affected equally. This avoids additional differential effects on the group measures that would otherwise confound the logical ability to assess the differential effects of the independent variable.
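The balancing effect of random assignment can be illustrated with a short simulation. The sketch below, in Python, is built on stated assumptions rather than real data: "motivation" is a hypothetical extraneous variable, the true treatment effect is deliberately set to zero, and the function names (`simulate`, `self_selected`, `randomized`) are invented for this example.

```python
import random
from statistics import mean


def simulate(assign, n=4000, seed=0):
    """Return (gap in motivation, estimated treatment effect).

    The true treatment effect is zero by construction, so any nonzero
    estimate reflects extraneous variation rather than the treatment.
    """
    rng = random.Random(seed)
    groups = {True: [], False: []}
    for _ in range(n):
        motivation = rng.gauss(50, 10)          # hypothetical extraneous variable
        treated = assign(motivation, rng)
        outcome = motivation + rng.gauss(0, 5)  # the treatment adds nothing
        groups[treated].append((motivation, outcome))
    gap = (mean([m for m, _ in groups[True]])
           - mean([m for m, _ in groups[False]]))
    effect = (mean([o for _, o in groups[True]])
              - mean([o for _, o in groups[False]]))
    return gap, effect


def self_selected(motivation, rng):
    """Nonrandom assignment: more motivated participants opt in."""
    return motivation > 50


def randomized(motivation, rng):
    """Random assignment: a coin flip decides, ignoring motivation."""
    return rng.random() < 0.5


gap_self, effect_self = simulate(self_selected)
gap_random, effect_random = simulate(randomized)

print("Self-selected: motivation gap", round(gap_self, 1),
      "apparent effect", round(effect_self, 1))
print("Randomized:    motivation gap", round(gap_random, 1),
      "apparent effect", round(effect_random, 1))
```

Under self-selection the groups differ sharply in motivation, and that gap shows up as an apparent treatment effect even though the treatment does nothing. Under the coin flip, both the motivation gap and the apparent effect should land close to zero, and they shrink further as the sample size grows.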

External validity is often related to internal validity in an inverse fashion.  The very same methods that allow for greater experimental control, and the isolation of the effects of one single variable upon another, maximizing internal validity, often create a rather peculiar and artificial experimental situation that is not representative of the kinds of circumstances in which most behavior typically occurs.  The research study may tell us with some confidence that one variable very likely causes effects in the other in this sort of experimental context, but it may leave serious doubts about whether these same cause-effect relationships exist in the non-experimental environments in which behavior typically occurs.  Thus, the factors that maximize internal validity may simultaneously compromise external validity.

However, attention to external validity is also important. The ultimate goal of research is not to learn how individuals behave or perform in constrained, artificial, unrepresentative environments. Instead, it is to learn how people behave in the “real world.” Thus, studies or research programs that do not attend to optimizing external validity in any fashion have little practical value other than service as fanciful exercises in theoretical abstraction.


Youngstown Ohio
