How “Experience Sampling” Could Revolutionize Teacher Grade Reports


Originally published on

Think of a typical progress or grade report a teacher writes each quarter. Most of them require a teacher to fill out a rubric or checklist to indicate progress on a variety of behaviors, like active listening, respecting others and staying on task. But what are teachers actually basing these ratings off of after several weeks of school have passed?

If we’re honest with ourselves (and I spent eight years as a middle and high school history teacher, so I’m included here!), we’d admit that most of us base them off of vague recollections of past behavior that was either exceptional or very poor.

Johnny interrupted others twice in one class period two weeks ago? I’ll give him a “2” instead of a “3” on “Cooperates with others.” Sarah stayed after class a month ago to clean up the art supplies? She’ll definitely get an “Excellent” in “Positive classroom attitude.” Sound familiar?

This is clearly not good enough for accuracy’s sake, and what’s more, it’s unfair to kids. As a nation, we are becoming more focused on accurately measuring and evaluating academics within schools. To improve our own instruction, communicate better with parents, and meet the socioemotional needs of our students, we need equally reliable and meaningful reports on behavior. How can we do this?

The main issue here is that the life of a teacher is totally nuts. There are a million things to keep track of, and our brains often just can’t keep up. Trying to reflect on weeks of interaction time with each student and give an accurate assessment of behavioral progress is simply impossible. But it’s what we’re tasked with, regardless.

Some teachers try to overcome this issue with a valiant attempt at a personal behavior tracking system. I used to use a paper checklist on a clipboard, complete with eight carefully defined behaviors and a 1 through 5 number system.

But we should be honest with ourselves once again: How often do we lose momentum with these systems because they require conscious effort to maintain amidst the daily barrage of pressing student concerns? I know I could never quite keep up with my clipboard system much past the first three weeks of school.

A second problem is that if a teacher is left to choose when to capture the behavior they observe, a phenomenon well known to academic researchers starts to creep in: halo effects (1).

Much like it sounds, the “good kids” (the angels with the halos) have a kind of positive emotional valence in the minds of their teachers, which leads to two troubling things with regard to accuracy:

1) Negative behavior has a tendency to be overlooked or downplayed, and therefore, goes unrecorded.

2) Ratings on unrelated behaviors become inflated.

To explain #2 above a bit more, everyone would probably agree that there’s little overlap between “Cooperates with others” and “Exhibits a positive attitude.” But the “good kids” who so often exhibit that positive attitude may not actually cooperate with others that regularly. Yet they still receive high ratings on cooperation because that positive attitude overshadows everything in the mind of their teachers.

The reverse happens with kids that teachers have labeled as “bad” in their minds. These students’ positive behavior has a tendency to be overlooked while behaviors unrelated to overall attitude (like cooperating with others) are inaccurately rated lower than they are in reality.

What we need is a system where teachers have to actually think less. Automate the process, take out the psychological burden of having to remember to record those observed student behaviors, and narrow the timeframe a teacher must reflect upon when considering a student’s behavior.

Here’s where “experience sampling” comes in. Also known as “ecological momentary assessment,” it’s a technique that’s come into favor in the past 20 years in social science research that asks questions about one day’s experiences rather than asking participants to reflect on a long time period or on a “global” statement.

For instance, rather than asking “How happy are you in your life right now?” (Very Happy, Somewhat Happy, etc.) an experience sampling approach would ask “How happy do you feel about your life today?” multiple times over the course of a longer time period. Combine all those individual check-ins over time into an average score and you get a much more accurate picture of how that person actually feels about the question.
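To make the averaging step concrete, here is a minimal sketch in Python. The ratings, the 1 through 5 scale, and the function name `experience_sampling_score` are all made up for illustration; the point is only that many small check-ins are combined into one summary number.

```python
from statistics import mean

# Hypothetical answers to "How happy do you feel about your life today?"
# collected on seven different days, on an assumed 1-5 scale.
daily_checkins = [4, 3, 5, 4, 2, 4, 4]


def experience_sampling_score(ratings):
    """Combine repeated momentary ratings into a single average score."""
    if not ratings:
        raise ValueError("need at least one check-in to compute a score")
    return mean(ratings)


score = experience_sampling_score(daily_checkins)
print(round(score, 2))
```

One unusually bad day (the “2” above) nudges the average but does not define it, which is exactly what a single end-of-quarter recollection tends to get wrong.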

So why couldn’t we use experience sampling techniques with teachers trying to accurately assess student behavior?

In other words, what if there was a way to be gently reminded to watch one particular student and then record what you saw, just that day? And what if you could record what you saw with the tap of a finger on your phone rather than dealing with a paper checklist or your laptop?

Conceivably, if you only have to think about a small number of kids (perhaps no more than three per day) and only look for one type of behavior, the cognitive load will be much lower. This would reduce halo effects and also allow you to “rate and forget about it,” knowing that over time you’re getting a series of accurate snapshots that you can refer to later when filling out those quarterly progress or grade reports.
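As a rough illustration of what “a few kids and one behavior per day” could look like, here is a small Python sketch that rotates through a class roster. It is an assumption-laden toy, not a description of any real product: the function name `build_observation_schedule`, the student names, and the simple round-robin rotation are all hypothetical.

```python
from itertools import cycle, islice


def build_observation_schedule(roster, behaviors, days, per_day=3):
    """Assign up to `per_day` students and one focus behavior to each day.

    Students and behaviors are taken in round-robin order, so a student
    comes up again only after the rest of the roster has had a turn.
    """
    if not roster or not behaviors:
        raise ValueError("roster and behaviors must not be empty")
    if per_day < 1:
        raise ValueError("per_day must be at least 1")

    students = cycle(roster)
    focus = cycle(behaviors)
    group_size = min(per_day, len(roster))  # never repeat a student within a day

    schedule = []
    for day in range(1, days + 1):
        schedule.append({
            "day": day,
            "behavior": next(focus),
            "students": list(islice(students, group_size)),
        })
    return schedule


# Hypothetical roster; the behavior labels are the ones used in this post.
roster = ["Ava", "Ben", "Cal", "Dee", "Eli"]
behaviors = ["Cooperates with others", "Exhibits a positive attitude"]

schedule = build_observation_schedule(roster, behaviors, days=3)
for entry in schedule:
    print(entry["day"], entry["behavior"], entry["students"])
```

Because the schedule, not the teacher, decides who gets observed on a given day, the “good kids” and the “bad kids” get looked at equally often, which is the part that matters for reducing halo effects.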

This is what we’re working on here at EduMetrics. We’ve called it iNOTED, with a version for students and for teachers. Put simply, it’s a flexible question-asking platform that works on smartphones, tablets and desktop or laptop computers and pushes a small number of easy-to-answer questions each day, at times decided upon by the school.

We’ve already launched the school version (see a video of it in action and more information at and the teacher version is in a pilot phase with a handful of partner schools as you read this.

The teacher version will sync with a teacher’s class rosters, send a morning reminder message telling that teacher which students to observe that day and what behaviors to assess, and then will send one question at a time at the end of the day for each student. Just go about your day, think less, do less, and let the system do the work for you.

More very soon, but we’re excited about the possibilities! It’s a paradigm shift, and it may take some time to catch on, but we feel confident that the data on student behavior will be more accurate. This, in the end, is fairer to the kids, the ones we’re supposed to be most concerned about.

(1) Hartung et al. (2010). Halo effects in ratings of ADHD and ODD: Identification of susceptible symptoms. Journal of Psychopathology and Behavioral Assessment.

Abikoff et al. (1993). Teachers’ ratings of disruptive behaviors: The influence of halo effects. Journal of Abnormal Child Psychology.

Phelps et al. (1986). The effects of halo and leniency on cooperating teacher reports using Likert-type rating scales. The Journal of Educational Research.

Also published on Medium.
