Program evaluation

Program evaluation is a systematic method for collecting, analyzing, and using information to answer questions about projects, policies and programs, particularly about their effectiveness (whether they do what they are intended to do) and efficiency (whether they are good value for money).

In the public, private, and voluntary sector, stakeholders might be required to assess—under law or charter—or want to know whether the programs they are funding, implementing, voting for, receiving or opposing are producing the promised effect. To some degree, program evaluation falls under traditional cost–benefit analysis, concerning fair returns on the outlay of economic and other assets; however, social outcomes can be more complex to assess than market outcomes, and a different skillset is required. Considerations include how much the program costs per participant, program impact, how the program could be improved, whether there are better alternatives, if there are unforeseen consequences, and whether the program goals are appropriate and useful. Evaluators help to answer these questions. Best practice is for the evaluation to be a joint project between evaluators and stakeholders.

A wide range of different titles are applied to program evaluators, perhaps haphazardly at times, but there are some established usages: those who regularly use program evaluation skills and techniques on the job are known as program analysts; those whose positions combine administrative assistant or secretary duties with program evaluation are known as program assistants, program clerks (United Kingdom), program support specialists, or program associates; those whose positions add lower-level project management duties are known as program coordinators.

The process of evaluation is considered to be a relatively recent phenomenon. However, planned social evaluation has been documented as dating as far back as 2200 BC. Evaluation became particularly relevant in the United States in the 1960s during the period of the Great Society social programs associated with the Kennedy and Johnson administrations.

Program evaluations can involve both quantitative and qualitative methods of social research. People who do program evaluation come from many different backgrounds, such as sociology, psychology, economics, and social work, as well as political science subfields such as public policy and public administration, whose practitioners have studied a similar methodology known as policy analysis. Some universities also have specific training programs in program evaluation, especially at the postgraduate level, for those whose undergraduate subject area did not cover program evaluation skills.

Conducting an evaluation

Program evaluation may be conducted at several stages during a program's lifetime. Each of these stages raises different questions to be answered by the evaluator, and correspondingly different evaluation approaches are needed. Rossi, Lipsey and Freeman (2004) suggest the following kinds of assessment, which may be appropriate at these different stages:

Assessment of the need for the program
Assessment of program design and logic/theory
Assessment of how the program is being implemented (i.e., is it being implemented according to plan? Are the program's processes maximizing possible outcomes?)
Assessment of the program's outcome or impact (i.e., what it has actually achieved)
Assessment of the program's cost and efficiency

Assessing needs

A needs assessment examines the population that the program intends to target, to see whether the need as conceptualized in the program actually exists in the population; whether it is, in fact, a problem; and if so, how it might best be dealt with. This includes identifying and diagnosing the actual problem the program is trying to address, who or what is affected by the problem, how widespread the problem is, and what are the measurable effects that are caused by the problem. For example, for a housing program aimed at mitigating homelessness, a program evaluator may want to find out how many people are homeless in a given geographic area and what their demographics are. Rossi, Lipsey and Freeman (2004) caution against undertaking an intervention without properly assessing the need for one, because this might result in a great deal of wasted funds if the need did not exist or was misconceived.

Needs assessment involves the processes or methods used by evaluators to describe and diagnose social needs.

Evaluators also need to answer the 'how' and 'what' questions.

Perform a 'gap' analysis
:Evaluators need to compare the current situation to the desired or necessary situation. The difference, or gap, between the two situations will help identify the need, purpose and aims of the program.
Identify priorities and importance
:In the first step above, evaluators would have identified a number of interventions that could potentially address the need, e.g. training and development or organization development. These must now be examined in view of their significance to the program's goals and constraints, considering the following factors: cost-effectiveness (consider the budget of the program and assess the cost/benefit ratio), executive pressure (whether top management expects a solution) and population (whether many key people are involved).
Identify causes of performance problems and/or opportunities
:When the needs have been prioritized, the next step is to identify specific problem areas within the need to be addressed, and also to assess the skills of the people who will be carrying out the interventions.
Identify possible solutions and growth opportunities
:Compare the consequences of implementing each intervention with the consequences of not implementing it.

Needs analysis is hence a very crucial step in evaluating programs because the effectiveness of a program cannot be assessed unless we know what the problem was in the first place.
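
The gap analysis step described above can be illustrated with a short sketch. The indicator names and figures below are hypothetical and chosen only for illustration; ranking by relative gap is one possible way to order needs, not a method prescribed by the sources cited here.

```python
# Hypothetical indicators for a housing program; all figures are made up
# for illustration and do not come from any real needs assessment.
current = {"shelter_beds": 120, "outreach_workers": 4, "housed_within_90_days_pct": 35}
desired = {"shelter_beds": 200, "outreach_workers": 10, "housed_within_90_days_pct": 60}


def gap_analysis(current, desired):
    """Return the shortfall (desired minus current) for each indicator."""
    return {name: desired[name] - current[name] for name in desired}


def rank_by_relative_gap(current, desired):
    """Order indicators from largest to smallest gap relative to the desired level."""
    gaps = gap_analysis(current, desired)
    return sorted(gaps, key=lambda name: gaps[name] / desired[name], reverse=True)


gaps = gap_analysis(current, desired)
ranking = rank_by_relative_gap(current, desired)
print(gaps)
print(ranking)
```

In this made-up case the largest relative shortfall is in outreach workers, which would make that indicator a candidate for closer examination in the prioritization step.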

Assessing program theory

The program theory, also called a logic model, knowledge map, or impact pathway, is an assumption, implicit in the way the program is designed, about how the program's actions are supposed to achieve the outcomes it intends. This 'logic model' is often not stated explicitly by the people who run programs; it is simply assumed. An evaluator will therefore need to draw out from the program staff how exactly the program is supposed to achieve its aims and assess whether this logic is plausible. For example, in an HIV prevention program, it may be assumed that educating people about HIV/AIDS transmission, risk and safe sex practices will result in safer sex being practiced. However, research in South Africa increasingly shows that in spite of increased education and knowledge, people still often do not practice safe sex. Therefore, the logic of a program which relies on education as a means to get people to use condoms may be faulty. This is why it is important to read research that has been done in the area.

Explicating this logic can also reveal unintended or unforeseen consequences of a program, both positive and negative. The program theory drives the hypotheses to test for impact evaluation. Developing a logic model can also build common understanding amongst program staff and stakeholders about what the program is actually supposed to do and how it is supposed to do it, which is often lacking (see Participatory impact pathways analysis). Of course, it is also possible that during the process of trying to elicit the logic model behind a program the evaluators may discover that such a model is either incompletely developed, internally contradictory, or (in the worst cases) essentially nonexistent. This decidedly limits the effectiveness of the evaluation, although it does not necessarily reduce or eliminate the program.

Creating a logic model is a useful way to visualize important aspects of a program, especially when preparing for an evaluation. An evaluator should create a logic model with input from many different stakeholders. Logic models have five major components: resources or inputs, activities, outputs, short-term outcomes, and long-term outcomes. Creating a logic model helps articulate the problem, the resources and capacity that are currently being used to address the problem, and the measurable outcomes from the program. Looking at the different components of a program in relation to the overall short-term and long-term goals allows potential misalignments to be identified. Creating an actual logic model is particularly important because it helps clarify for all stakeholders the definition of the problem, the overarching goals, and the capacity and outputs of the program. Many of these elements rely on the prior correct implementation of other elements, and will fail if the prior implementation was not done correctly. This was conclusively demonstrated by Gene V. Glass and many others during the 1980s. Since incorrect or ineffective implementation will produce the same kind of neutral or negative results that would be produced by correct implementation of a poor innovation, it is essential that evaluation research assess the implementation process itself. Otherwise, a good innovative idea may be mistakenly characterized as ineffective, when in fact it simply had never been implemented as designed.
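
As a rough illustration of the five components listed above, a logic model can be written down as a simple data structure. The class name `LogicModel`, the helper `missing_components`, and the job-training entries are all hypothetical; this is a minimal sketch of one way to record the components and spot gaps, not a standard representation.

```python
from dataclasses import dataclass, field


@dataclass
class LogicModel:
    """The five components named in the text, each as a list of short statements."""
    inputs: list = field(default_factory=list)
    activities: list = field(default_factory=list)
    outputs: list = field(default_factory=list)
    short_term_outcomes: list = field(default_factory=list)
    long_term_outcomes: list = field(default_factory=list)

    def missing_components(self):
        """Names of components left empty, i.e. parts of the logic not yet articulated."""
        return [name for name, items in vars(self).items() if not items]


# Hypothetical job-training program, for illustration only.
model = LogicModel(
    inputs=["two trainers", "classroom space"],
    activities=["weekly interview workshops"],
    outputs=["number of participants completing the course"],
    short_term_outcomes=["improved interview skills"],
)
print(model.missing_components())  # ['long_term_outcomes']
```

An empty component, such as the missing long-term outcomes here, is the kind of gap an evaluator would raise with program staff and stakeholders when eliciting the program theory.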

Assessing the impact (effectiveness)

The impact evaluation determines the causal effects of the program. This involves trying to measure if the program has achieved its intended outcomes, i.e. program outcomes.

Program outcomes

An outcome is the state of the target population or the social conditions that a program is expected to have changed. According to Mouton (2009) measuring the impact of a program means demonstrating or estimating the accumulated differentiated proximate and emergent effect, some of which might be unintended and therefore unforeseen.

Outcome measurement serves to help understand whether the program is effective or not. It further helps to clarify understanding of a program. But the most important reason for undertaking the effort is to understand the impacts of the work on the people being served.

Determining causation

Perhaps the most difficult part of evaluation is determining whether the program itself is causing the changes that are observed in the population it was aimed at. Events or processes outside of the program may be the real cause of the observed outcome (or the real prevention of the anticipated outcome).

Causation is difficult to determine. One main reason for this is self-selection bias. People select themselves to participate in a program. For example, in a job training program, some people decide to participate and others do not. Those who do participate may differ from those who do not in important ways. They may be more determined to find a job or have better support resources. These characteristics may actually be causing the observed outcome of increased employment, not the job training program.

Evaluations conducted with random assignment are able to make stronger inferences about causation. Randomly assigning people to participate or not participate in the program reduces or eliminates self-selection bias. Thus, the group of people who participate would likely be more comparable to the group who did not participate.
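
The contrast between self-selection and random assignment can be illustrated with a small simulation. Everything in this sketch is an assumption made for illustration: the outcome is an invented score, "motivation" stands in for an unobserved trait that raises the outcome, the program's true effect is fixed at 2.0, and under self-selection only people with above-median motivation take part.

```python
import random
from statistics import mean

TRUE_EFFECT = 2.0  # assumed effect of the program on the outcome score


def simulate(assignment, n=10_000, seed=0):
    """Return the naive difference in mean outcomes between participants and non-participants.

    assignment: "self" (motivated people opt in) or "random" (coin flip).
    """
    rng = random.Random(seed)
    treated, control = [], []
    for _ in range(n):
        motivation = rng.random()  # unobserved trait that also raises the outcome
        if assignment == "self":
            participates = motivation > 0.5
        else:
            participates = rng.random() < 0.5
        outcome = 10 * motivation + rng.gauss(0, 1)
        if participates:
            treated.append(outcome + TRUE_EFFECT)
        else:
            control.append(outcome)
    return mean(treated) - mean(control)


self_selected = simulate("self")
randomized = simulate("random")
print(f"self-selected estimate: {self_selected:.2f}")
print(f"randomized estimate:    {randomized:.2f}")
```

Under these assumptions the self-selected comparison should land near 7, because it mixes the true effect of 2 with a motivation gap of about 5 between the groups, while the randomized comparison should land near the true effect of 2.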

However, since most programs cannot use random assignment, causation often cannot be determined conclusively. Impact analysis can still provide useful information. For example, the outcomes of the program can be described. Thus the evaluation can describe that people who participated in the program were more likely to experience a given outcome than people who did not participate.

If the program is fairly large, and there are enough data, statistical analysis can be used to make a reasonable case for the program by showing, for example, that other causes are unlikely.

Reliability, validity and sensitivity

It is important to ensure that the instruments (for example, tests, questionnaires, etc.) used in program evaluation are as reliable, valid and sensitive as possible. According to Rossi et al. (2004, p. 222), 'a measure that is poorly chosen or poorly conceived can completely undermine the worth of an impact assessment by producing misleading estimates. Only if outcome measures are valid, reliable and appropriately sensitive can impact assessments be regarded as credible'.

Reliability

The reliability of a measurement instrument is the 'extent to which the measure produces the same results when used repeatedly to measure the same thing' (Rossi et al., 2004, p. 218). These steps can happen in a cycle framework to represent the continuing process of evaluation.
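
One common way to quantify reliability in the sense quoted above is a test-retest correlation: the same instrument is given to the same people twice and the two sets of scores are correlated. The scores below are hypothetical, and the helper `pearson_r` is written out by hand only to keep the sketch self-contained.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of scores."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least two scores")
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical scores from the same six respondents on two occasions.
first_administration = [12, 15, 9, 20, 17, 11]
second_administration = [13, 14, 10, 19, 18, 11]

r = pearson_r(first_administration, second_administration)
print(f"test-retest correlation: {r:.2f}")
```

A correlation close to 1 suggests the instrument gives consistent results across administrations. What counts as acceptable depends on the instrument and the field, so no threshold is assumed here.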

Evaluating collective impact

Though program evaluation processes mentioned here are appropriate for most programs, highly complex non-linear initiatives, such as those using the collective impact (CI) model, require a dynamic approach to evaluation. Collective impact is "the commitment of a group of important actors from different sectors to a common agenda for solving a specific social problem" and typically involves three stages, each with a different recommended evaluation approach:

Early phase: CI participants are exploring possible strategies and developing plans for action. Characterized by uncertainty.

Recommended evaluation approach: Developmental evaluation to help CI partners understand the context of the initiative and its development: "Developmental evaluation involves real time feedback about what is emerging in complex dynamic systems as innovators seek to bring about systems change."

Middle phase: CI partners implement agreed upon strategies. Some outcomes become easier to anticipate.

Recommended evaluation approach: Formative evaluation to refine and improve upon the progress, as well as continued developmental evaluation to explore new elements as they emerge. Formative evaluation involves "careful monitoring of processes in order to respond to emergent properties and any unexpected outcomes."

Later phase: Activities achieve stability and are no longer in formation. Experience informs knowledge about which activities may be effective.

Recommended evaluation approach: Summative evaluation "uses both quantitative and qualitative methods in order to get a better understanding of what [the] project has achieved, and how or why this has occurred."

Planning a program evaluation

Planning a program evaluation can be broken up into four parts: focusing the evaluation, collecting the information, using the information, and managing the evaluation. Program evaluation involves reflecting on questions about evaluation purpose, what questions are necessary to ask, and what will be done with information gathered. Critical questions for consideration include:

What am I going to evaluate?
What is the purpose of this evaluation?
Who will use this evaluation? How will they use it?
What questions is this evaluation seeking to answer?
What information do I need to answer the questions?
When is the evaluation needed? What resources do I need?
How will I collect the data I need?
How will data be analyzed?
What is my implementation timeline?

Methodological constraints and challenges

The shoestring approach

The "shoestring evaluation approach" is designed to assist evaluators operating under a limited budget, limited access to or availability of data, and limited turnaround time to conduct effective evaluations that are methodologically rigorous (Bamberger, Rugh, Church & Fort, 2004). This approach responds to the continuing need for evaluation processes that are more rapid and economical under difficult circumstances of budget, time constraints and limited availability of data. However, it is not always possible to design an evaluation to achieve the highest standards available. Many programs do not build an evaluation procedure into their design or budget. Hence, many evaluation processes do not begin until the program is already underway, which can result in time, budget or data constraints for the evaluators, which in turn can affect the reliability, validity or sensitivity of the evaluation. The shoestring approach helps to ensure that the maximum possible methodological rigor is achieved under these constraints.

Budget constraints

Frequently, programs are faced with budget constraints because most original projects do not include a budget to conduct an evaluation (Bamberger et al., 2004). This results in evaluations being allocated smaller budgets that are inadequate for a rigorous evaluation. Due to the budget constraints it might be difficult to effectively apply the most appropriate methodological instruments. These constraints may consequently affect the time available in which to do the evaluation (Bamberger et al., 2004).

The five-tiered approach was originally developed by Jacobs (1988) as an alternative way to evaluate community-based programs and as such was applied to a statewide child and family program in Massachusetts, U.S.A. The five-tiered approach is offered as a conceptual framework for matching evaluations more precisely to the characteristics of the programs themselves, and to the particular resources and constraints inherent in each evaluation context. The five levels are organized as follows:

Tier 1: needs assessment (sometimes referred to as pre-implementation)
Tier 2: monitoring and accountability
Tier 3: quality review and program clarification (sometimes referred to as understanding and refining)
Tier 4: achieving outcomes
Tier 5: establishing impact

For each tier, purpose(s) are identified, along with corresponding tasks that enable the identified purpose of the tier to be achieved. However, there are many hurdles and challenges which evaluators face when attempting to implement an evaluation program which attempts to make use of techniques and systems which are not developed within the context to which they are applied. Some of the issues include differences in culture, attitudes, language and political process.

Culture is defined by Ebbutt (1998, p. 416) as a "constellation of both written and unwritten expectations, values, norms, rules, laws, artifacts, rituals and behaviors that permeate a society and influence how people behave socially". The understanding and meaning of the constructs which the evaluator is attempting to measure may not be shared between the evaluator and the sample population. The transference of concepts is therefore an important notion, as it will influence the quality of the data collection carried out by evaluators as well as the analysis and results generated from the data.

Internal versus external program evaluators

The choice of evaluator may be regarded as equally important as the process of the evaluation. Evaluators may be internal (persons associated with the program being executed) or external (persons not associated with any part of the execution or implementation of the program) (Division for Oversight Services, 2004). The following is a brief summary of the advantages and disadvantages of internal and external evaluators, adapted from the Division for Oversight Services (2004), which provides a more comprehensive list.

Internal evaluators

Advantages

May have better overall knowledge of the program and possess informal knowledge of the program
Less threatening as already familiar with staff
Less costly

Disadvantages

May be less objective
May be more preoccupied with other activities of the program and not give the evaluation complete attention
May not be adequately trained as an evaluator.

External evaluators

Advantages

More objective about the process; offers new perspectives and different angles from which to observe and critique the process
May be able to dedicate a greater amount of time and attention to the evaluation
May have greater evaluation expertise

Disadvantages

May be more costly and require more time for the contract, monitoring, negotiations etc.
May be unfamiliar with program staff and create anxiety about being evaluated
May be unfamiliar with organization policies and certain constraints affecting the program

Three paradigms

Positivist

Potter (2006) identifies and describes three broad paradigms within program evaluation. The first, and probably most common, is the positivist approach, in which evaluation can only occur where there are "objective", observable and measurable aspects of a program, requiring predominantly quantitative evidence. The positivist approach includes evaluation dimensions such as needs assessment, assessment of program theory, assessment of program process, impact assessment and efficiency assessment (Rossi, Lipsey and Freeman, 2004).

A detailed example of the positivist approach is a Public Policy Institute of California report titled "Evaluating Academic Programs in California's Community Colleges", in which the evaluators examine measurable activities (e.g. enrollment data) and conduct quantitative assessments such as factor analysis.

Interpretive

The second paradigm identified by Potter (2006) is that of interpretive approaches, where it is argued that it is essential that the evaluator develops an understanding of the perspective, experiences and expectations of all stakeholders. This would lead to a better understanding of the various meanings and needs held by stakeholders, which is crucial before one is able to make judgments about the merit or value of a program. The evaluator's contact with the program is often over an extended period of time and, although there is no standardized method, observation, interviews and focus groups are commonly used.

A report commissioned by the World Bank details 8 approaches in which qualitative and quantitative methods can be integrated and perhaps yield insights not achievable through only one method.

Critical-emancipatory

Potter (2006) also identifies critical-emancipatory approaches to program evaluation, which are largely based on action research for the purposes of social transformation. This type of approach is much more ideological and often includes a greater degree of social activism on the part of the evaluator. This approach would be appropriate for qualitative and participative evaluations. Because of its critical focus on societal power structures and its emphasis on participation and empowerment, Potter argues this type of evaluation can be particularly useful in developing countries.

Regardless of which paradigm is used in a program evaluation, whether positivist, interpretive or critical-emancipatory, it is essential to acknowledge that evaluation takes place in specific socio-political contexts. Evaluation does not exist in a vacuum, and all evaluations, whether or not the evaluators are aware of it, are influenced by socio-political factors. It is important to recognize that evaluations and their findings can be used in favour of or against particular ideological, social and political agendas (Weiss, 1999). This is especially true in an age when resources are limited and there is competition between organizations for certain projects to be prioritised over others (Louw, 1999).

Empowerment evaluation

Empowerment evaluation makes use of evaluation concepts, techniques, and findings to foster improvement and self-determination of a particular program aimed at a specific target population or group of program participants. Empowerment evaluation is value-oriented towards getting program participants involved in bringing about change in the programs they are targeted for. One of the main focuses in empowerment evaluation is to involve the program participants in conducting the evaluation process. This process is then often followed by some sort of critical reflection on the program. In such cases, an external evaluator serves as a consultant, coach or facilitator to the program participants and seeks to understand the program from the perspective of the participants. Once a clear understanding of the participants' perspective has been gained, appropriate steps and strategies can be devised (with the valuable input of the participants) and implemented in order to reach desired outcomes.

According to Fetterman (2002)

Transformative paradigm

The transformative paradigm is integral in incorporating social justice in evaluation. Donna Mertens, primary researcher in this field, states that the transformative paradigm, "focuses primarily on viewpoints of marginalized groups and interrogating systemic power structures through mixed methods to further social justice and human rights". The transformative paradigm arose after marginalized groups, who have historically been pushed to the side in evaluation, began to collaborate with scholars to advocate for social justice and human rights in evaluation. The transformative paradigm introduces many different paradigms and lenses to the evaluation process, leading it to continually call into question the evaluation process.

Both the American Evaluation Association and National Association of Social Workers call attention to the ethical duty to possess cultural competence when conducting evaluations. Cultural competence in evaluation can be broadly defined as a systematic, responsive inquiry that is actively cognizant, understanding, and appreciative of the cultural context in which the evaluation takes place; that frames and articulates epistemology of the evaluation endeavor; that employs culturally and contextually appropriate methodology; and that uses stakeholder-generated, interpretive means to arrive at the results and further use of the findings. Many health and evaluation leaders are careful to point out that cultural competence cannot be determined by a simple checklist, but rather it is an attribute that develops over time. The root of cultural competency in evaluation is a genuine respect for communities being studied and openness to seek depth in understanding different cultural contexts, practices and paradigms of thinking. This includes being creative and flexible to capture different cultural contexts, and heightened awareness of power differentials that exist in an evaluation context. Important skills include the ability to build rapport across difference, gain the trust of community members, and self-reflect and recognize one's own biases.

Paradigms

The paradigms axiology, ontology, epistemology, and methodology are reflective of social justice practice in evaluation. These examples focus on addressing inequalities and injustices in society by promoting inclusion and equality in human rights.

Axiology (Values and Value Judgements)

The transformative paradigm's axiological assumption rests on four primary principles:

Methodology (Systematic Inquiry)

Methodological decisions are aimed at determining the approach that will best facilitate use of the process and findings to enhance social justice; identify the systemic forces that support the status quo and those that will allow change to happen; and acknowledge the need for a critical and reflexive relationship between the evaluator and the stakeholders.

Feminist theory

The essence of feminist theories is to "expose the individual and institutional practices that have denied access to women and other oppressed groups and have ignored or devalued women".

Queer/LGBTQ theory

Queer/LGBTQ theorists question the heterosexist bias that pervades society in terms of power over and discrimination toward sexual orientation minorities. Because of the sensitivity of issues surrounding LGBTQ status, evaluators need to be aware of safe ways to protect such individuals’ identities and ensure that discriminatory practices are brought to light in order to bring about a more just society.

A six-step framework for conducting evaluation of public health programs, published by the Centers for Disease Control and Prevention (CDC), initially increased the emphasis on program evaluation of government programs in the US. The framework is as follows:

Engage stakeholders.
Describe the program.
Focus the evaluation.
Gather credible evidence.
Justify conclusions.
Ensure use and share lessons learned.

In January 2019, the Foundations for Evidence-Based Policymaking Act introduced new requirements for federal agencies, such as naming a Chief Evaluation Officer. Guidance published by the Office of Management and Budget on implementing this law requires agencies to develop a multi-year learning agenda, which has specific questions the agency wants to answer to improve strategic and operational outcomes. Agencies must also complete an annual evaluation plan summarizing the specific evaluations the agency plans to undertake to address the questions in the learning agenda.

Types of evaluation

There are many different approaches to program evaluation. Each serves a different purpose.

Utilization-Focused Evaluation
CIPP Model of evaluation
Formative Evaluation
Summative Evaluation
Developmental Evaluation
Principles-Focused Evaluation
Theory-Driven Evaluation
Realist-Driven Evaluation

CIPP Model of evaluation

History of the CIPP model

The CIPP model of evaluation was developed by Daniel Stufflebeam and colleagues in the 1960s. CIPP is an acronym for Context, Input, Process and Product. CIPP is an evaluation model that requires the evaluation of context, input, process and product in judging a programme's value. CIPP is a decision-focused approach to evaluation and emphasises the systematic provision of information for programme management and operation.

CIPP model

The CIPP framework was developed as a means of linking evaluation with programme decision-making. It aims to provide an analytic and rational basis for programme decision-making, based on a cycle of planning, structuring, implementing and reviewing and revising decisions, each examined through a different aspect of evaluation: context, input, process and product evaluation.