Why Your Evaluation Strategy Cannot Be an Afterthought

In March 2017, one of my protégés, Stephanie, interviewed me for a school assignment on the 4 Levels of Evaluation Model. The following question-and-response exchange has been lightly edited for clarity.

Vivian Bringslimark, President and Owner, HPIS Consulting, Inc., opened with a quote from one of her clients that captures the importance of evaluation: “When you look at the ADDIE model, without the A, you die. And then without the E, you’ve only just added another course to the curriculum.”

She finds the quote insightful because it makes the point that the evaluation strategy cannot be an afterthought. Evaluation has to be part of the design so that learners can engage with the material and practice what they will be assessed on. Evaluation, while potentially controversial, problematic, and open to a variety of opinions, has to be on the course agenda so that time is allotted to carry it out.

QUESTION 1: How do you evaluate learners’ Reaction (or Level 1) to the training, such as [a] Participants’ satisfaction with the training [b] Participants’ level of involvement (or engagement) in the learning experience [c] Degree to which participants found value (or relevance) in what they learned?

RESPONSE: While smile sheets have problems and flaws, Vivian states that they are the most popular tool, the easiest to administer, and their results can be quantified to generate reports. However, participants can love a course and smile sheets still cannot guarantee its effectiveness; participants who liked a course may not be able to apply what they learned back on the job. Another problem with smile sheets is that the emphasis tends to be on the presenter: whether or not they were on target and delivered the right course for the audience.

QUESTION 2: How do you evaluate Learning (or Level 2), such as [a] Degree to which participants acquired knowledge and skill [b] Participants’ beliefs (attitude) that it will be worthwhile to implement what is learned on the job [c] Participants’ level of confidence in doing what they learned on the job?

RESPONSE: Tests or quizzes can measure comprehension, knowledge, and recall, but they cannot measure attitude. Ideally, knowledge assessments should be situation- or scenario-based and problem-solving-oriented, and they should be written to assess higher levels of Bloom’s Taxonomy. In reality, tests rarely meet that ideal because developing a proper test takes too much time. SMEs also lack experience writing assessments: what they think are knowledge checks often measure only recall and comprehension.

QUESTION 3: How do you evaluate Behavior (or Level 3), such as [a] Degree to which participants apply what they learned during training when they are back on the job [b] Organizational support which reinforces, monitors, encourages, and rewards the performance of critical behaviors on the job [c] Organizational support for on-the-job learning opportunities?

RESPONSE: Vivian said, “I’m in love with Level 3!” She calls this level “transfer” and distinguishes between changing behavior and changing performance, which Thomas Gilbert said are not the same. When you focus only on behavior, she explained, the assumption is that the change will show up observably on the job. The focus has to be on performance, with behavior as one aspect of it. Focusing on behavior alone will not guarantee a change in performance on the job.

In the pharmaceutical industry, the ultimate measure of effectiveness is whether you can do what you learned on the job without error. The concern is not necessarily what you learned or how you learned it; what matters most is whether you can perform your job without making a mistake.

Vivian mentioned a book by Donald and Jim Kirkpatrick called “Transfer of Learning” that expands on organizational support and accountability. She emphasized a point that becomes prominent at Level 3: “most failures of transfer have nothing to do with training.” There are “barriers to the transfer of training” that have little to do with the quality of the training or the trainer, yet prevent employees from transferring learning to the job. Management often does not acknowledge these barriers and instead blames bad training. [See Article: Training Does Not Stand Alone | Transfer Paragraphs]

QUESTION 4: How do you evaluate Results (or Level 4) or the impact of training on organizational performance?

RESPONSE: Vivian mentioned Jack and Patti Phillips, who happened to write Chapter 30 in our ASTD Handbook. When I told her that I had read about the ROI model, she responded that most business managers (not necessarily executives) are not really looking for the bottom-line dollar figure, because producing it is a tremendous amount of work.

Vivian said there is a concept called Return on Expectations (ROE), which she finds very appealing: demonstrating that objectives were achieved can serve as a measure of effectiveness. She has become a huge fan of Robert Brinkerhoff’s “Success Case Method,” which suggests that Level 4 can be measured with qualitative data.

QUESTION 5: How do you evaluate Return on Investment (or Level 5)?

RESPONSE: Jack and Patti Phillips have done extensive research pointing to a lack of support for evaluation studies and showing that not everything you deliver is worthy of a Level 4 or Level 5 evaluation. The reason is cost: these studies are very expensive, and companies do not have the finances, people, and other resources to commit to them. Jack Phillips said only 5% of courses and programs are really worthy of an ROI study.
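To make the end product of a Level 5 study less abstract, here is a minimal sketch of the final arithmetic. It assumes the ROI formula as it is commonly presented in the Phillips ROI Methodology (net program benefits divided by fully loaded program costs, times 100) together with the related benefit-cost ratio; the function names and dollar figures are hypothetical. The expensive part Vivian describes, isolating the effects of training and converting them to monetary values, is assumed to have been done already.

```python
def benefit_cost_ratio(benefits: float, costs: float) -> float:
    """BCR = monetized program benefits / fully loaded program costs."""
    if costs <= 0:
        raise ValueError("program costs must be positive")
    return benefits / costs


def roi_percent(benefits: float, costs: float) -> float:
    """ROI (%) = (benefits - costs) / costs * 100, i.e. net benefits over costs."""
    if costs <= 0:
        raise ValueError("program costs must be positive")
    return (benefits - costs) / costs * 100


# Hypothetical figures for illustration only: $150,000 in monetized benefits
# attributed to a program that cost $100,000 to design, deliver, and evaluate.
print(benefit_cost_ratio(150_000, 100_000))  # 1.5
print(roi_percent(150_000, 100_000))         # 50.0
```

Read this way, a 50% ROI means each dollar invested was recovered plus an additional 50 cents in net benefit; the formula is simple, and the cost of the study lies in producing trustworthy inputs.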

QUESTION 6: Please provide an example of the kind of data you would collect for each level.

RESPONSE: For clients desperate for metrics, the first and easiest metric is completion against training requirements. But, she notes, it is a false metric: being at 100% against training requirements does not mean you won’t have deviations and errors on the job.

Another metric can be knowledge checks (targeting Level 2), such as scores tracked from verbal or paper-and-pencil tests. Metrics can also be obtained through a performance review or performance demonstration (targeting Level 3). Vivian has not seen a solid Level 4 package in her specific industry but hopes to see one in the future.

QUESTION 7: Which level(s) is the most important to you? 

RESPONSE: Level 3 (see the response to Question 3).

QUESTION 8: Which level(s) is the least important to you? 

RESPONSE: Vivian has seen Level 2 written assessments abused as a metric for terminating employees. That is not what tests were created for. Level 2 written assessments can lose their educational intent and value when they are turned into a performance issue under the guise that employees are not acquiring the knowledge.

Remember, the goal of the training is for employees to perform their procedures without human error after they return to work. It is less about scoring 100% on a knowledge recall check immediately after the session is over.

Vivian Bringslimark, HPIS Consulting, Inc.

QUESTION 9: Is there ever a time when training does not require evaluation?

RESPONSE: Yes. Evaluation is not supported by employers when all they need is documentation that employees attended the training. She has heard an HR director say, “I just need a checkmark. I need to produce training rosters to show the [agency] that we delivered what we promised.”

Instead of a written evaluation, Vivian uses activities that are more indicative of transfer of learning, such as practice with simulated scenarios. In one unique final activity, learners develop marketing posters for future attendees and present them in class; it is extremely engaging for learners and serves as a measure of effectiveness for the trainer.

QUESTION 10: What do you think about The Kirkpatrick Model or the New World Kirkpatrick Model? What do you like or dislike and why?

RESPONSE: Vivian admits to mixed feelings about the Kirkpatrick Model. She noted that Donald Kirkpatrick, whom she has met, had intended to write a research paper (for his dissertation), but the model took on a life of its own because people were grasping for anything that would make evaluation tangible. She feels the Kirkpatrick Model can be a very plausible solution when management is demanding proof of learning, and it works well in a classroom setting when you have a course.

Vivian explains the model’s limitations: it cannot be relied upon alone when expanding beyond the course into larger-scale program evaluations for performance improvement. The model becomes challenging to scale up when the scope of the training program expands to include site-wide initiatives.

However, Vivian likes the critical behaviors and leading indicators added in the New World Model because they make the model more practical and even useful for program initiatives. By identifying the critical behaviors, the organization is more likely to sustain the new behaviors and therefore improve the performance outcome(s). She said that leading indicators can serve as checkpoints or check-in intervals, which are more advantageous than waiting until the end, when the ROI study is done.

RE: Assignment 6: Training Manager Interview Comments

1.) Wow, Stephanie, you did a great job with this interview of a very interesting interviewee. It is so interesting that Jack and Patti Phillips feel that not every course is worthy of a Level 4 or 5 evaluation, and that Jack said only 5% are really worth a Level 5. I have never done or designed even a Level 4, so I don’t have first-hand experience with the effort and expense involved, but this is eye-opening. It sounds like part of the analysis might be to assess whether a course is worth going beyond Level 3, making that a conscious, careful decision. This would also depend partly on the organization; it would be interesting to know what the typical decision points are for conducting (and financing) the more advanced assessments.

I really enjoyed this; in fact, I am about to read it for the third time!

2.) There is so much great content in your interview summary. I particularly liked the section on how Ms. Bringslimark is “in love with Level 3.” The distinction between changing behavior and changing performance is an important one. It really is all about how you can perform your job, not how you learned it. Over the last couple of weeks, I have also gained a measurable appreciation for Level 3 evaluations. I have been developing training for a workshop and feeling stressed over how to perform an effective level 2 evaluation. The idea of focusing more on the transfer of information and performance in the workplace has allowed me to concentrate my efforts on Level 3 evaluations.

Looking at performance evaluations for performance demonstration also resonates. In a new competency-based training program currently being developed, both performance evaluations and training needs analyses based on the individual competencies will be used to determine transfer of learning and behavioral changes in the workplace. Using both the performance evaluations and the training needs analyses, both training and non-training issues can be identified.

Amazing job, Stephanie. Truly educational.

3.) Excellent, thorough interview and post. It looks like Vivian has built very strong evaluation processes into her design and execution.

A few comments:

Level One: I agree that “smile sheets” have many downsides; however, even if they only identify a logistical item (the room was too small for effective learning) or, as you indicated, something about the facilitator (the facilitator wasn’t boring), both bring value when setting up the next training event.

Level Three: I absolutely agree with her assessment of performance versus behavior. I thought her take on the barriers to training transfer was incredibly insightful.

Q #8: Her answer that Level 2 is the least important surprised me. It is very hard to say which levels are least important because they all have value if used correctly. I have found Level 2 critical for assessing whether learning is actually taking place, and it is hard to move on to Level 3 without a strong Level 2 execution.

Q #10: Great way to end the interview. It was great to hear her perspective on the New World Model.

Again, an excellent post. Thank you.

Do you have an evaluation procedure or a training effectiveness process? Yes, you need to make this formal. Allow me to explain why.

(c) HPIS Consulting, Inc.

What’s Your Training Effectiveness Strategy?

It needs to be more than a survey or knowledge checks.

When every training event is delivered using the same method, it’s easy to standardize the evaluation approach and the tool. Just answer these three questions:

  • What did they learn?
  • Did it transfer back to the job?
  • Was the training effective?

In this age of personalized learning and engaging experiences, one-size-fits-all training may be efficient for an organizational rollout, but it is not the most effective approach for organizational impact or even for changing behavior. The standard knowledge check can indicate how much learners remembered. It might be able to predict what will be used back on the job. But can it evaluate how effective the training was? That is asking a lot from a 10-question multiple-choice/true-false “quiz.”

Given the complexity of the task, or the significance of the improvement to the organization (such as addressing a consent decree or closing a warning letter), one would expect that allocating budget for proper training evaluation techniques would go unchallenged.

Do you have a procedure for that?

Perhaps the sticking point is explaining to regulators how decisions are made and what criteria are used. Naturally, documentation is expected, and this also requires defining the process in a written procedure. It can be done. It means staying in tune with training curricula, being aware of the types of training content being delivered, and recognizing the implications of the evaluation results. And of course, it means following the execution plan as described in the SOP. Three central components frame a Training Effectiveness Strategy: Focus, Timing, and Tools.

TRAINING EFFECTIVENESS STRATEGY: Focus

Our tendency is to look at the scope (the what) first. I ask that you pause long enough to consider your audience: identify your stakeholders and determine who wants to know what. This analysis shapes the span and level of your evaluation policy. For example, C-Suite stakeholders ask very different questions about training effectiveness than participants do.

The all-purpose standard evaluation tool weakens the results and disappoints most stakeholders. While it can provide interesting statistics, the real question is: what will “they” do with the results? What are stakeholders prepared to do, other than cut the training budget or stop sending employees to training? Identify what will be useful to whom by creating a stakeholder matrix.

Will your scope also include the training program (aka the Training Quality System), especially if it is not included in the Internal Audit Quality System? Is the quality system designed to process feedback efficiently and make the changes that the evaluation results call for? Assessing how efficiently the function performs is another opportunity to improve the workflow: reducing redundancies increases form completion speed and humanizes the overall user experience. What is not in scope? Is it clearly articulated?

TRAINING EFFECTIVENESS STRATEGY: Timing is, of course, everything

Your strategy needs to include when to administer your evaluation studies. With course feedback surveys, we are used to administering them immediately after the event; otherwise, the return rate drops significantly. For knowledge checks, we also “test” at the end of the session. Logistically, this is easier to administer because participants are still at the event, and it also increases the likelihood of higher “retention” scores.

But when does it make more sense to conduct the evaluation? Again, it depends on what the purpose is.

  • Will you be comparing before and after results? Then baseline data needs to be collected before the event begins, e.g., the current set of Key Performance Indicators and performance metrics.
  • How much time do learners need to become proficient enough for the evaluation to be accurate? Immediately after, 3 months later, or, more realistically, 6 months later?
  • When are metrics calculated and reported? Quarterly?
  • When will they be expected to perform back on the job?
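The first question in the list above, comparing before and after results, only works if the baseline is captured before the event. The sketch below is one hypothetical way to express that comparison: it assumes a single numeric KPI (a made-up deviation rate per 100 batch records, where lower is better) measured at follow-up checkpoints, and the class, function names, checkpoint months, and values are illustrative only, not data from any real program.

```python
from dataclasses import dataclass


@dataclass
class Checkpoint:
    months_after: int  # when the measurement was taken, relative to the training event
    value: float       # KPI value observed at that checkpoint


def percent_change(baseline: float, value: float) -> float:
    """Relative change from baseline, in percent (negative means the KPI went down)."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero to compute a relative change")
    return (value - baseline) / baseline * 100


def compare_to_baseline(baseline: float, checkpoints: list[Checkpoint]) -> dict[int, float]:
    """Map each follow-up checkpoint (months after training) to its change from baseline."""
    return {cp.months_after: round(percent_change(baseline, cp.value), 1) for cp in checkpoints}


# Hypothetical KPI: deviations per 100 batch records (lower is better).
baseline = 8.0  # collected before the training event begins
follow_ups = [Checkpoint(3, 6.0), Checkpoint(6, 5.0)]
print(compare_to_baseline(baseline, follow_ups))  # {3: -25.0, 6: -37.5}
```

Because this hypothetical KPI counts deviations, a negative change reads as improvement; for a KPI where higher is better, the sign is interpreted the other way. A change from baseline alone does not show that training caused it.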

Measuring Training Transfer: 3, 6 and maybe 9 months later

We can observe whether a behavior occurs and record the number of people who are demonstrating the new set of expected behaviors on the job. We can evaluate the quality of a work product (such as a completed form or executed batch record) by recording the number of people whose work product satisfies the appropriate standard or target criteria. We can record the frequency with which the target audience promotes the preferred behaviors in dialogue with peers and supervisors and in their observed actions.
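The counts described above reduce to simple proportions once observations are recorded. A minimal sketch, assuming a hypothetical observation record (employee, critical behavior, whether it was demonstrated) and a list of pass/fail reviews of work products; the record layout, function names, and the behavior label are illustrative assumptions, not a prescribed form.

```python
from collections import defaultdict
from typing import NamedTuple


class Observation(NamedTuple):
    employee: str
    behavior: str       # hypothetical label for a critical behavior being observed
    demonstrated: bool  # did the observer see the expected behavior?


def transfer_rates(observations: list[Observation]) -> dict[str, float]:
    """Share of distinct observed employees who demonstrated each behavior (0.0 to 1.0)."""
    observed = defaultdict(set)
    demonstrating = defaultdict(set)
    for obs in observations:
        observed[obs.behavior].add(obs.employee)
        if obs.demonstrated:
            demonstrating[obs.behavior].add(obs.employee)
    return {b: len(demonstrating[b]) / len(observed[b]) for b in observed}


def pass_rate(work_products_meet_standard: list[bool]) -> float:
    """Share of reviewed work products (e.g., completed forms) that satisfy the target criteria."""
    if not work_products_meet_standard:
        raise ValueError("no work products reviewed")
    return sum(work_products_meet_standard) / len(work_products_meet_standard)


obs = [
    Observation("A", "second-person verification", True),
    Observation("B", "second-person verification", False),
    Observation("C", "second-person verification", True),
    Observation("D", "second-person verification", True),
]
print(transfer_rates(obs))                         # {'second-person verification': 0.75}
print(pass_rate([True, True, False, True, True]))  # 0.8
```

In this sketch an employee observed several times counts as demonstrating a behavior if any observation showed it; a stricter rule, such as requiring the most recent observation to show it, would be an equally reasonable choice depending on what your stakeholders accept as evidence of transfer.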

It is possible to do this; however, management support for a more rigorous training effectiveness strategy hinges on the time, people, and budget needed to design the tools and capture the incidents. How important is it to the organization to determine whether your training efforts are effectively transferring back to the job? How critical is it to mitigate the barriers that get in the way when the evaluation results show that performance improved only marginally? It is cheaper to criticize the training event(s) than to address the real root cause(s). See Training Does Not Stand Alone (Transfer Failure Section).

TRAINING EFFECTIVENESS STRATEGY: Right tool for the right evaluation type

How will success be defined for each “training” event or category of training content? Are you using tools/techniques that meet your stakeholders’ expectations for training effectiveness? If performance improvement is the business goal, how are you going to measure it? What are the performance goals that “training” is supposed to support? Seek confirmation on what will be accepted as proof of learning, evidence of transfer to the workplace, and identification of leading indicators of organizational improvement. These become the criteria by which the evaluation has value for your stakeholders. Ideally, the choice of tool should be decided after the performance analysis is discussed and before content development begins.

Performance Analysis first; then possibly a training needs analysis

Starting with a performance analysis recognizes that performance occurs within organizational systems. The analysis provides a three-tiered picture of what is encouraging or blocking performance for the worker, the work tasks, and/or the workplace, and of what must be in place at those same three levels to achieve sustained improvement. The “solutions” are tailored to the situation based on the collected data, not on an assumption that training is needed. Otherwise, you have only a fragment of the solution, carrying high expectations for solving “the problem,” and you end up relying on the evaluation tool to show effective “training” results. Only when the cause analysis reveals a true lack of knowledge will training be effective.

Why aren’t more Performance Analyses being conducted?

For starters, most managers want the quick fix of training because it is a highly visible activity that everyone is familiar and comfortable with. The second possibility lies in the inherent nature of performance improvement work: very often the recommended solution resides outside the initiating department and requires the cooperation of others. Would a request to fix someone else’s system go over well where you work? A third, and most probable, reason is that it takes time, resources, and a performance consulting skill set to identify the behaviors, decisions, and “outputs” that are expected as a result of the solution. How important will it be for you to determine training effectiveness for strategic corrective actions?

You need an execution plan

Given the variety of training events and levels of strategic importance within your organization, one standard evaluation tool may no longer be suitable. Does every training event need to be evaluated with the same rigor? Generally speaking, the more strategic the focus, the more tedious and time-consuming the data collection will be. Again, review the purpose and scope of the evaluation. Refer to your stakeholder matrix and determine which evaluation tool(s) are best suited to meet their expectations.

For example, completing an after-training survey for every event is laudable; however, executive leadership values this data the least. According to Jack and Patricia Phillips (2010), executives most want to see business impact. Tools like balanced scorecards can be customized to capture and report on key performance indicators and meaningful metrics. Develop your plan wisely, start with a representative sample, and seek stakeholder agreement to conduct the evaluation study.

Life after the evaluation: What are you doing with the data collected?

Did performance improve? How will the evaluation results change future behavior and/or influence design decisions? Or perhaps the results will be used for budget justification, support for additional programs, or even a corporate case study? Evaluation appears to come at the end, but in reality it is continuous throughout. Training effectiveness means evaluating the effectiveness of your training: your process, your content, and your training quality system. It is a continuous and cyclical process that does not end when the training is over. – VB

Jack J. Phillips and Patricia P. Phillips, “How Executives View Learning Metrics”, CLO, December 2010.

Recommended Reading:

Jean-Simon Leclerc and Odette Mercier, “How to Make Training Evaluation a Useful Tool for Improving L&D”, Training Industry Quarterly, May-June 2017.

Who is the Author, Vivian Bringslimark?

Training Does Not Stand Alone | HPISC Published Article

Need some advice on developing your effectiveness strategy? Want a planning tool?
