Mathematics Curriculum Materials Analysis Reliability Study

Gerald Kulm
Laura Grier
AAAS -- Project 2061
May 6, 1998

Project 2061 is developing two products for reviewing and reporting on analyses of mathematics and science curriculum materials: (1) a print and CD-ROM tool, Resources for Science Literacy: Curriculum Materials Evaluation, and (2) a database of reports on middle grades mathematics and science textbooks. For users to have confidence in the analysis procedure and in the published textbook reviews, the analysis must be reliable. That is, the procedure for producing reports must be one in which independent reviewers come to similar judgments for similar reasons.

The Project 2061 procedure for analyzing mathematics curriculum materials, and the support documents that clarify it, have been revised extensively based on feedback from the numerous educators who have used them. In addition, we have developed a list of indicators and a rating scheme to make the procedure more useful for producing ratings and reports. This study was carried out to determine the effectiveness of the procedure and training materials in producing consistent, valid, and reliable ratings of middle grades mathematics textbooks.

Method

Raters: To help us test the reliability and validity of the revised procedure, we convened twelve of our most capable analysts for a rater reliability study. Each had been trained to analyze middle grades mathematics materials. Some had been trained by Gerald Kulm of Project 2061 as part of a project directed by Bill Bush at the University of Kentucky; others had been trained by Kulm as part of the Expert Panel effort of the U.S. Department of Education. This training was based on the Project 2061 procedure. Both of these experiences showed that the most reliable ratings are produced when analysts work in two-person teams, and that the most valid ratings come from a team that pairs a middle grades mathematics teacher with a university mathematics or mathematics education faculty member or supervisor.

The analysts for the study were chosen with these factors in mind. Eight experienced mathematics teachers and four university mathematics education faculty members were selected. Each analyst received a $1,500 consulting fee, plus expenses, for (1) attending the training meeting and (2) submitting a satisfactorily completed report. The names and positions of the analysts are provided in Table 1.

Table 1. Mathematics Curriculum Analysts

Name              Position                           Location/Institution
Diane Surati      Mathematics Teacher                Montpelier, VT
Bill Kunnecke     Mathematics Teacher                Calvert City, KY
Mark Deegan       Mathematics Teacher                Alexandria, VA
Michele Crowley   Mathematics Education Instructor   Northern Kentucky University
Kathleen Morris   Mathematics Teacher                Lorton, VA
Sue P. Reehm      Mathematics Education Professor    Eastern Kentucky University
Linda Hackett     Mathematics Education Professor    American University
Peg Darcy         Mathematics Teacher                Louisville, KY
Marshall Gordon   Mathematics Teacher                Columbia, MD
Jan McDowell      Mathematics Teacher                Louisville, KY
Alice Mikovch     Mathematics Education Professor    Western Kentucky University
Faye Stevens      Mathematics Teacher                Cadiz, KY

Training: The raters attended a three-day meeting in Washington, DC, to become familiar with the revised procedure and to practice applying the rating criteria. The training, carried out with a mathematics benchmark and a sixth-grade textbook, consisted of the following steps:

  1. After clarifying the benchmark, the participants identified sightings (places in the textbook where the benchmark is addressed). The sightings were discussed and then used for the remainder of the training session.
  2. For each Instructional Cluster criterion, analysts were guided in a discussion of the cluster, the criterion, and the rating indicators.
  3. Working in teams, analysts used the indicators to rate each criterion. The ratings from the six teams were displayed, and discrepancies were discussed as a way to strengthen understanding of the criteria, indicators, and rating procedure.

Following the meeting, the indicators and rating criteria that were unclear or inconsistent were revised to produce a final set of instructions and a rating form. The instructions, together with a full set of textbook examples illustrating materials rated from low to high on each criterion, were compiled into a notebook. This notebook was available to the analysts as they studied the procedure and after they returned home to do their own analyses and ratings.

Design: Following the training, two sets of middle grades mathematics materials were sent to the analysts: Transition Mathematics and Connected Mathematics. For Connected Mathematics, two units from each of three mathematics strands were selected for rating. Each team rated two mathematics benchmarks, one conceptual and one skill, for each of the two sets of curriculum materials. Each of the three pairs of teams was assigned to one of three mathematics strands: Number, Geometry, and Algebra. The two teams for each strand rated the same benchmarks and the same materials independently. Table 2 summarizes the mathematics strands, materials, analysts, and benchmarks used in the study.
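
The scale of this design can be sketched in a few lines of code; the team pairings below are taken from Table 2, while the variable names and the enumeration itself are only illustrative, not part of the procedure.

    # Sketch of the study design; pairings from Table 2, names illustrative.
    TEAM_PAIRS = {
        "Number":   [("Diane Surati", "Bill Kunnecke"),
                     ("Mark Deegan", "Michele Crowley")],
        "Geometry": [("Kathleen Morris", "Sue Reehm"),
                     ("Linda Hackett", "Peg Darcy")],
        "Algebra":  [("Marshall Gordon", "Jan McDowell"),
                     ("Alice Mikovch", "Faye Stevens")],
    }
    BENCHMARKS_PER_STRAND = 2  # one concept benchmark, one skill benchmark
    MATERIALS = ["Transition Mathematics", "Connected Mathematics"]
    CRITERIA = 24              # criteria spread across 7 instructional clusters

    # The two teams in each strand rate the same benchmarks and materials
    # independently, so each (strand, benchmark, material, criterion) cell
    # yields exactly one paired comparison:
    comparisons = len(TEAM_PAIRS) * BENCHMARKS_PER_STRAND * len(MATERIALS) * CRITERIA
    print(comparisons)  # 288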

Rating: The analysis and rating were done during March 1998. Team members were encouraged to consult with each other and to ask questions of the study director. They were asked not to consult or communicate with members of other teams, especially the team analyzing the same set of materials. Teams submitted reports that included (1) the sightings for each indicator, (2) justifications for the sightings, (3) a rating [Met, Not Met, Unsure] for each indicator, (4) an overall rating [High, Medium, Low, None] for each criterion, and (5) a justification of each criterion's overall rating. In all, 24 criteria across 7 instructional clusters were rated.
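
One team's report can be pictured as a simple record. This is only a sketch: the rating vocabularies are the ones given above, but the class and field names are hypothetical, chosen to mirror the five required elements.

    # Hypothetical record structure for one team's report; field names are
    # illustrative, the rating vocabularies are from the procedure.
    from dataclasses import dataclass, field
    from typing import List

    INDICATOR_RATINGS = ("Met", "Not Met", "Unsure")
    CRITERION_RATINGS = ("High", "Medium", "Low", "None")

    @dataclass
    class IndicatorEntry:
        indicator: str
        sightings: List[str]    # (1) where the material addresses the benchmark
        justification: str      # (2) why the sightings support the rating
        rating: str = "Unsure"  # (3) one of INDICATOR_RATINGS

    @dataclass
    class CriterionReport:
        criterion: str          # e.g., "4.1 Building a Case"
        indicators: List[IndicatorEntry] = field(default_factory=list)
        overall_rating: str = "None"     # (4) one of CRITERION_RATINGS
        overall_justification: str = ""  # (5) rationale for the overall rating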

All but two or three of the analysts completed their ratings within the month; the remainder finished within two more weeks. All of the reports were complete and usable in the study.

Table 2. Design of Reliability Study

Number
  Analyst teams: Diane Surati and Bill Kunnecke; Mark Deegan and Michele Crowley
  Materials: Connected Mathematics (Bits and Pieces I; Comparing and Scaling); Transition Mathematics
  Concept benchmark: 9A 6-8#5. The expression a/b can mean different things: a parts of size 1/b each, a divided by b, or a compared to b.
  Skill benchmark: 12B 6-8#2. Use, interpret, and compare numbers in several equivalent forms such as integers, fractions, decimals, and percents.

Geometry
  Analyst teams: Kathleen Morris and Sue Reehm; Linda Hackett and Peg Darcy
  Materials: Connected Mathematics (Stretching and Shrinking; Looking for Pythagoras); Transition Mathematics
  Concept benchmark: 9C 6-8#1. Some shapes have special properties: Triangular shapes tend to make structures rigid, and round shapes give the least possible boundary for a given amount of interior area. Shapes can match exactly or have the same shape in different sizes.
  Skill benchmark: 12B 6-8#3. Calculate the circumferences and areas of rectangles, triangles, and circles, and the volumes of rectangular solids.

Algebra
  Analyst teams: Marshall Gordon and Jan McDowell; Alice Mikovch and Faye Stevens
  Materials: Connected Mathematics (Variables and Patterns; Thinking with Mathematical Models); Transition Mathematics
  Concept benchmark: 9B 6-8#3. Graphs can show a variety of relationships between two variables. As one variable increases uniformly, the other may do one of the following: increase or decrease steadily, increase or decrease faster and faster, get closer and closer to some limiting value, reach some intermediate maximum or minimum, alternately increase and decrease indefinitely, increase or decrease in steps, or do something different from any of these.
  Skill benchmark: 11C 6-8#4. Symbolic equations can be used to summarize how the quantity of something changes over time or in response to other changes.

Summary of Results

There are 24 criteria across the seven instructional clusters. For each criterion, the paired teams' ratings could be compared six times per material (three team pairs, each rating two benchmarks), yielding 144 paired comparisons for each material and 288 in all. The results are summarized in Table 3.

Ratings by the two teams in a pair were counted as disagreements when they differed by more than one step on the 4-point [High, Medium, Low, None] rating scale. There were 34 such disagreements, so the overall percentage agreement was 88.2 percent. Agreement on the two materials differed considerably. For Transition Mathematics, raters disagreed on 29 of 144 comparisons, an agreement of 79.9 percent; for Connected Mathematics, they disagreed on only 5 of 144, an agreement of 96.5 percent. A closer look at the individual benchmarks shows that 14 of the 34 rating differences were on concept-related benchmarks. This result is due primarily to the Transition Mathematics data, indicating that in that material, skill-related benchmarks were more difficult to rate.
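
These percentages follow directly from the counts; a minimal check, using only the figures reported above:

    # Recompute the agreement percentages from the reported counts.
    comparisons_per_material = 144  # 3 team pairs x 2 benchmarks x 24 criteria
    disagreements = {"Transition Mathematics": 29, "Connected Mathematics": 5}

    total = comparisons_per_material * len(disagreements)  # 288
    overall = 100 * (1 - sum(disagreements.values()) / total)
    print(f"overall: {overall:.1f}%")  # 88.2%
    for material, d in disagreements.items():
        pct = 100 * (1 - d / comparisons_per_material)
        print(f"{material}: {pct:.1f}%")  # 79.9% and 96.5%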

As shown in Table 3, some criteria appeared more difficult to rate regardless of the benchmark or the material. For example, there were difficulties in rating criteria 4.4 Connecting Ideas and 7.1 Teacher Content Learning on both materials. For Transition Mathematics, there were four rating differences on criterion 4.1 Building a Case, and Cluster 4, Developing and Using Mathematical Ideas, accounted for the greatest number of rater differences.

Table 3. Summary of Rater Agreements and Differences in Ratings on Benchmarks

Transition Mathematics
Benchmark   Rater agreement (%)   Criteria with disagreements > 1 step
9A#5        83                    4.1, 5.2, 6.2, 7.1
9C#1        96                    4.2
9B#3        75                    4.1, 4.2, 4.4, 4.5, 5.1, 7.1
12B#2       63                    2.1, 4.1, 4.3, 4.5, 5.2, 5.3, 6.2, 7.1, 7.2
12B#3       100                   (none)
11C#4       63                    1.3, 2.1, 2.4, 3.2, 4.1, 4.3, 4.4, 5.1, 7.1

Connected Mathematics
Benchmark   Rater agreement (%)   Criteria with disagreements > 1 step
9A#5        100                   (none)
9C#1        100                   (none)
9B#3        88                    2.2, 4.4, 7.1
12B#2       100                   (none)
12B#3       100                   (none)
11C#4       92                    2.2, 4.4
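
As a rough cross-check of the observations above, the disagreement lists in Table 3 can be transcribed and tallied per criterion. The data below simply restate the table's entries; the tallying code itself is illustrative.

    # Tally Table 3's "criteria with disagreements > 1 step" lists.
    from collections import Counter

    table3 = {
        "Transition Mathematics": {
            "9A#5":  ["4.1", "5.2", "6.2", "7.1"],
            "9C#1":  ["4.2"],
            "9B#3":  ["4.1", "4.2", "4.4", "4.5", "5.1", "7.1"],
            "12B#2": ["2.1", "4.1", "4.3", "4.5", "5.2", "5.3", "6.2", "7.1", "7.2"],
            "12B#3": [],
            "11C#4": ["1.3", "2.1", "2.4", "3.2", "4.1", "4.3", "4.4", "5.1", "7.1"],
        },
        "Connected Mathematics": {
            "9A#5": [], "9C#1": [], "12B#2": [], "12B#3": [],
            "9B#3":  ["2.2", "4.4", "7.1"],
            "11C#4": ["2.2", "4.4"],
        },
    }
    for material, rows in table3.items():
        total = sum(len(crits) for crits in rows.values())
        counts = Counter(c for crits in rows.values() for c in crits)
        print(material, total, counts.most_common(4))
    # Transition Mathematics: 29 disagreements; 4.1 and 7.1 appear four times each.
    # Connected Mathematics: 5 disagreements; 2.2 and 4.4 appear twice each.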
