Observational instruments are increasingly being used in early childhood education contexts to describe and evaluate teacher performance or classroom and program quality. To obtain credible data, well-trained raters who can follow a standardized observation protocol and reliably assign scores are crucial. The Classroom Assessment Scoring System™ (CLASS) is an observational instrument developed at the University of Virginia to assess classroom quality in PK–12 classrooms. Read how researchers from UVA's Center for Advanced Study of Teaching and Learning, where the CLASS was developed, presented CLASS trainings as opportunities for programs to build staff capacity to assess and improve classroom quality.

Observation instruments and their purposes may vary. For example, the Arnett Caregiver Interaction Scale is used to rate teacher responsiveness, tone, and discipline style. The Early Childhood Environment Rating Scale (ECERS and its revision, ECERS-R) is used to assess child care quality, and includes measurement of classroom routines, activities, materials, and interactions among children and staff.

The Classroom Assessment Scoring System™ (CLASS) is used to observe the quality of teacher–child interactions and is currently being used in Quality Rating Systems in several states. It is also being used to monitor program quality for Head Start programs in all 50 states.

In 2009, the Center for Advanced Study of Teaching and Learning (CASTL) at the University of Virginia gathered data during a nationwide effort to train Head Start staff members to use the CLASS tool. At that time, use of the CLASS was voluntary for Head Start grantees, and the tool served primarily as a professional development resource for assessing and improving quality in their programs.

After two days of training on the CLASS using video segments of real classrooms, raters from Head Start were initially tested on three 20-minute video segments. Their scores (on a scale of 1–7) were compared to the scores of master coders for each section of the instrument. To pass the test, 80 percent of their scores had to fall within one point of the master codes. Seventy-one percent of the 2,093 raters who participated passed on this initial attempt, and the others completed similar follow-up tests. Before beginning their training, some of the raters also completed a survey about their professional experience and beliefs regarding children and teaching practices.
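For readers who run their own calibration tests, the pass rule described above is simple to automate. The sketch below is a minimal illustration, not part of the official CLASS protocol; the function name and the example scores are invented for demonstration.

```python
def passes_reliability(rater_scores, master_scores, tolerance=1, threshold=0.80):
    """Return True if at least `threshold` of the rater's scores fall
    within `tolerance` points of the master codes (the rule described above)."""
    if len(rater_scores) != len(master_scores):
        raise ValueError("Score lists must be the same length")
    within = sum(abs(r - m) <= tolerance
                 for r, m in zip(rater_scores, master_scores))
    return within / len(master_scores) >= threshold

# Made-up scores on the 1-7 CLASS scale for one rater across ten dimensions:
rater  = [5, 4, 6, 3, 5, 2, 6, 4, 5, 3]
master = [5, 5, 6, 4, 7, 2, 6, 4, 4, 3]
print(passes_reliability(rater, master))  # 9 of 10 within one point -> True
```

In practice a testing platform would apply this check per video segment and per instrument section, but the core comparison is the same.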

Our analysis of the data turned up a number of interesting findings that may be useful to school administrators and others responsible for training raters to use an observational assessment tool.

Considering Rater Bias

The clearest message coming out of this study was that rater beliefs matter. When administrators are selecting or supporting individuals to become classroom raters, they should attend to whether rater beliefs are aligned with the underpinning theories of the assessment tool they will be using.

Our study found that raters whose beliefs about the role of a teacher in a classroom were aligned with the CLASS approach (for example, those who agreed that intentional teaching practices are important) scored more closely to the master codes than those whose beliefs differed significantly. Organizations may want to select raters whose beliefs align with the purpose of the assessment.

Rater beliefs can be explored with a screening questionnaire or discussion before training. Individuals with differing beliefs do not have to be excluded, however. Trainers and administrators can allow sufficient time for discussions that fully reveal rater bias, and can provide ongoing support to maintain assessment standards. Raters should be reminded to set contrary beliefs aside for the purpose of the observations.

Try and Try Again

About 30 percent of raters were unsuccessful in passing our assessment on the first try. About 100 of those raters attempted a similar assessment a second time, and 40 of them passed. Forty-five made a third attempt, and 33 of them passed.

When planning large-scale classroom observations, you may consider provisionally hiring more raters than you eventually need and retaining only those who pass the assessment standards on the first attempt. Alternatively, if you hire only the number of raters actually needed, plan to provide extra support for the percentage of raters who will require additional calibration attempts with the observation instrument.

What Didn’t Matter As Much

Surprisingly, level of education and job responsibilities were not significantly related to raters' ability to calibrate to the CLASS. This suggests that schools and administrators may be able to use existing staff and resources to align raters with assessment standards when ongoing training and support are available. However, this finding should be interpreted cautiously given that most raters in this study had at least a bachelor's degree and an average of 9 years of experience supervising teachers.


Overall, observational assessments provide valuable data for schools in supporting teachers and children's development. As our findings highlight, when planning observation systems, schools should be aware of rater beliefs and thoughtful about the resources available, and necessary, to train raters.

For more information on this research, please review the 2-page CASTL Research Brief, which includes a citation to the full article.
For more information about CASTL, visit the website or email castl@virginia.edu.

Submitted by: Lynn Bell, Dr. Anne Cash, and Leslie M. Booren at the University of Virginia