Inter-rater Reliability Sampling:
This session provides information about the reliability of the Idaho Alternate Assessment (IAA) process. The State Department of Education annually conducts an inter-rater reliability study that monitors the reliability of teacher judgment when rating student performance.
Validation of Proficiency Levels:
All rating scales that measure human behavior contain some error. However, confidence in a rating scale and its results increases when a second person or group independently provides ratings similar to those of the initial rater.
For each school that has at least one student taking the assessment, students will be randomly selected to be rated a second time, up to 20 percent of the students statewide.
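The random selection described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `select_for_second_rating`, the `percent` and `seed` parameters, and the student identifiers are hypothetical, and the sketch simply draws a statewide sample at the 20 percent cap. The source does not say how selections are distributed across schools, so that part is not modeled.

```python
import random


def select_for_second_rating(student_ids, percent=20, seed=None):
    """Randomly pick students to be rated a second time.

    Assumption: the sample is drawn statewide and sized at the cap
    (percent of all students, rounded down). The real selection rules
    may differ, for example in how schools are represented.
    """
    rng = random.Random(seed)  # a seed makes the draw repeatable for auditing
    cap = len(student_ids) * percent // 100  # integer math avoids float rounding
    return rng.sample(list(student_ids), cap)


# Hypothetical roster of 50 students: 20 percent is 10 students.
roster = [f"student-{i}" for i in range(50)]
selected = select_for_second_rating(roster, seed=7)
print(len(selected), "students selected for a second rating")
```

Using a fixed seed is a design choice for the sketch only; it lets the same selection be reproduced later if the draw needs to be checked.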
Inter-rater Reliability Sampling:
Special education teachers will be notified of the students selected before March 1.
The special education teacher selects the second rater, which may be either an individual or a team. The special education teacher may also want to draw on the help of IEP team members in making the selection.
Both raters rate the student's proficiency level, so, like Rater 1, the second rater must be familiar with and knowledgeable about the student. Good candidates are individuals who have worked with the student and know the student's program and how the student responds to instruction.
State accountability measures are intended to ensure that all students reach a level of proficiency. The IAA reliability study therefore checks that, when two people rate a student's performance, both raters judge the performance accurately as "proficient" or "not yet proficient."
Confidentiality and Elimination of Bias:
If the second rater is a non-employee, such as a parent or service coordinator, that person must mark a printed rating form, and an authorized employee must enter the ratings online.
Whether the second rater is an individual or a team, one of the second rater's responsibilities is to review all sources of performance data. It is also recommended that raters clarify the meaning of the alternate knowledge and skills, along with the achievement level and progress level rubrics.
Results of the Inter-rater Reliability:
The second rating is for reliability purposes only; therefore, the scores from the second rater are not part of the student's total score. The results from the first rater are finalized once the raters reach agreement and the reliability of the results is established.
The Idaho State Department of Education monitors the number of initial disagreements to ensure the reliability of the statewide assessment tool.
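As a rough illustration of what monitoring initial disagreements could look like, the sketch below counts how many rating pairs differed before any discussion and reports that count as a share of all pairs. The function name `initial_disagreements` and the sample data are hypothetical, and the source does not describe the actual calculation the Department uses.

```python
def initial_disagreements(pairs):
    """Count initial disagreements and their share of all rating pairs.

    pairs: list of (rater_1_level, rater_2_level) tuples recorded before
    the raters discuss their ratings. Assumption: each level is one of
    the strings "proficient" or "not yet proficient".
    """
    count = sum(1 for first, second in pairs if first != second)
    rate = count / len(pairs) if pairs else 0.0  # avoid dividing by zero
    return count, rate


# Hypothetical initial ratings for four students in one content area.
sample = [
    ("proficient", "proficient"),
    ("not yet proficient", "not yet proficient"),
    ("proficient", "not yet proficient"),
    ("proficient", "proficient"),
]
count, rate = initial_disagreements(sample)
print(f"{count} initial disagreement(s), {rate:.0%} of pairs")
```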
In summary, the process for Inter-rater Agreement involves these six steps, which begin after the random selection of students has been made and the second raters have been selected and informed of their responsibilities:
1. Rater 1 enters ratings
2. Rater 2 enters ratings
3. Raters confirm whether agreement has been reached
4. Disagreements are settled by 1) discussing the data and information collected, 2) reviewing additional performance data, 3) determining which items the raters disagreed on, and 4) establishing a clear understanding of the student's performance. Both raters then decide whether to change any of their ratings and confirm a second time whether agreement has been reached.
5. Rater 1 finalizes results
6. SDE verifies reliability of the assessment tool
Inter-rater Reliability Guide:
The computer program calculates the scores from both raters and determines the student's proficiency level for each of the content areas assessed. Agreement is defined as both raters judging a student's performance to be "proficient" or both judging it to be "not yet proficient." Therefore, this quick table indicates how the scores from two raters may end in a disagreement.
In conclusion, it is important to remember that Inter-rater Reliability Sampling is conducted statewide to increase confidence in the results from the Idaho Alternate Assessment tool and process. It is also important that everyone involved in interpreting IAA results recognize that some error will probably exist because we are measuring human behavior, but we are taking steps to minimize that error.
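The agreement rule can be written out as a small Python sketch that lists every combination of the two raters' proficiency levels. The names `LEVELS`, `raters_agree`, and `agreement_table` are hypothetical. The sketch starts from proficiency levels that have already been determined, because the source does not give the scoring rules that turn ratings into levels.

```python
from itertools import product

# The two proficiency levels named in the guide.
LEVELS = ("proficient", "not yet proficient")


def raters_agree(level_1, level_2):
    """Return True when both raters assigned the same proficiency level."""
    if level_1 not in LEVELS or level_2 not in LEVELS:
        raise ValueError("unknown proficiency level")
    return level_1 == level_2


def agreement_table():
    """List every pairing of Rater 1 and Rater 2 levels with its outcome."""
    return [
        (a, b, "agreement" if raters_agree(a, b) else "disagreement")
        for a, b in product(LEVELS, repeat=2)
    ]


for rater_1, rater_2, outcome in agreement_table():
    print(f"Rater 1: {rater_1:<20} Rater 2: {rater_2:<20} {outcome}")
```

Of the four possible pairings, the two in which the raters assign different levels are disagreements.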