AAAS - Project 2061 - Mathematics Curriculum Materials Analysis Reliability Study

Mathematics Curriculum Materials Analysis Reliability Study

Gerald Kulm
Laura Grier

AAAS -- Project 2061
May 6, 1998

Project 2061 is developing two initiatives and products for reviewing and reporting on the analysis of mathematics and science curriculum materials: (1) a print and CD-ROM tool, Resources for Science Literacy: Curriculum Materials Evaluation, and (2) a database of reports on middle grades mathematics and science textbooks. For users to have confidence in the analysis procedure and in the published textbook reviews, the analysis must be reliable. That is, the procedure for producing reports must be one in which independent reviewers can come to similar judgments for similar reasons.

The Project 2061 procedure for analyzing mathematics curriculum materials and the support documents that clarify the procedure have been revised extensively based on feedback from the numerous educators who have used it. In addition, we have developed a list of indicators and a rating scheme to make the procedure more useful for producing ratings and reports. This study was carried out to determine the effectiveness of the procedure and training materials in producing consistent, valid, and reliable ratings of middle grades mathematics textbooks.

Method

Raters: To help us test the reliability and validity of the revised procedure, we convened twelve of our most able analysts for a rater reliability study. Each of the raters had been trained to analyze middle grades mathematics materials. Some of the raters had been trained by Gerald Kulm, Project 2061, as part of a project directed by Bill Bush at the University of Kentucky. Others had been trained by Kulm as part of the Expert Panel effort of the U.S. Department of Education. This training was based on the Project 2061 procedure. Both of these experiences showed that the most reliable ratings are produced when analysts work in teams of two persons. In addition, the most valid ratings were produced by a team consisting of a middle grades mathematics teacher and a university mathematics or mathematics education faculty member or supervisor.

The analysts for the study were chosen with these factors in mind. Eight experienced mathematics teachers and four university mathematics education faculty were selected. The analysts received a $1500 consulting fee as well as expenses for (1) attending the training meeting and (2) submitting a satisfactorily completed report. The names and positions of the analysts are provided in Table 1.

Table 1. Mathematics Curriculum Analysts

Diane Surati, Mathematics Teacher, Montpelier, VT	Bill Kunnecke, Mathematics Teacher, Calvert City, KY
Mark Deegan, Mathematics Teacher, Alexandria, VA	Michele Crowley, Mathematics Education Instructor, Northern Kentucky University
Kathleen Morris, Mathematics Teacher, Lorton, VA	Sue P. Reehm, Mathematics Education Professor, Eastern Kentucky University
Linda Hackett, Mathematics Education Professor, American University	Peg Darcy, Mathematics Teacher, Louisville, KY
Marshall Gordon, Mathematics Teacher, Columbia, MD	Jan McDowell, Mathematics Teacher, Louisville, KY
Alice Mikovch, Mathematics Education Professor, Western Kentucky University	Faye Stevens, Mathematics Teacher, Cadiz, KY

Training: The raters attended a three-day meeting in Washington, DC to become familiar with the revised procedure and to practice the rating criteria. The training used a mathematics benchmark and a sixth-grade textbook and consisted of the following steps:

After clarifying the benchmark, the participants identified sightings in the textbook. The sightings were discussed, then used for the remainder of the training session.
For each Instructional Cluster criterion, analysts were guided in a discussion of the cluster, the criterion, and the rating indicators.
Working in teams, the analysts used the indicators to rate each criterion. The ratings for the six teams were displayed, and the discrepancies were discussed as a way to strengthen understanding of the criteria, indicators, and rating procedure.

Following the meeting, the indicators and rating criteria that were unclear or inconsistent were modified to produce a final set of instructions and a rating form. The instructions, along with a full set of examples from textbooks, illustrated a range of materials rated from low to high on how well they addressed each criterion. This notebook was available to the analysts as they studied the procedure and when they returned home to do their own analysis and ratings.

Design: Following the training, two sets of middle grades mathematics materials were sent to the analysts: Transition Mathematics and Connected Mathematics. For the latter material, two units from each of three mathematics strands were selected for rating. Each team rated two mathematics benchmarks, one conceptual and one skill, for each of the two sets of curriculum materials. Each of the three pairs of teams was assigned to one of three mathematics strands: Number, Geometry, and Algebra. The two teams for each strand rated the same benchmarks and the same materials independently. Table 2 summarizes the mathematics strands, materials, analysts, and benchmarks that were used in the study.

Rating: The analysis and rating were done during March 1998. Team members were encouraged to consult with each other and to ask questions of the director. They were asked not to consult or communicate with the members of other teams, especially the team that was analyzing the same set of materials. Teams submitted reports that included (1) the sightings for each indicator, (2) the justifications for the sightings, (3) the rating [Met, Not Met, Unsure] of each indicator, (4) the overall rating of each criterion [High, Medium, Low, None], and (5) a justification of the overall rating of each criterion. In all, 24 criteria across 7 instructional clusters were rated.

With the exception of two or three analysts, the ratings were completed within the month, with the remainder being completed within two more weeks. All of the reports were complete and useable in the study.

Table 2. Design of Reliability Study

Strand	Analyst Teams	Materials	Benchmarks
Number	Diane Surati, Bill Kunnecke, Mark Deegan, Michele Crowley	Connected Mathematics: Bits And Pieces I; Connected Mathematics: Comparing And Scaling; Transition Mathematics	Concept 9A 6-8#5: The expression a/b can mean different things: a parts of size 1/b each, a divided by b, or a compared to b. Skill 12B 6-8#2: Use, interpret, and compare numbers in several equivalent forms such as integers, fractions, decimals, and percents.
Geometry	Kathleen Morris, Sue Reehm, Linda Hackett, Peg Darcy	Connected Mathematics: Stretching And Shrinking; Connected Mathematics: Looking For Pythagoras; Transition Mathematics	Concept 9C 6-8#1: Some shapes have special properties: Triangular shapes tend to make structures rigid, and round shapes give the least possible boundary for a given amount of interior area. Shapes can match exactly or have the same shape in different sizes. Skill 12B 6-8#3: Calculate the circumference and areas of rectangles, triangles, and circles, and the volumes of rectangular solids.
Algebra	Marshall Gordon, Jan McDowell, Alice Mikovch, Faye Stevens	Connected Mathematics: Variables And Patterns; Connected Mathematics: Thinking With Mathematical Models; Transition Mathematics	Concept 9B 6-8#3: Graphs can show a variety of relationships between two variables. As one variable increases uniformly, the other may do one of the following: increase or decrease steadily, increase or decrease faster and faster, get closer and closer to some limiting value, reach some intermediate maximum or minimum, alternately increase and decrease indefinitely, increase or decrease in steps, or do something different from any of these. Skill 11C 6-8#4: Symbolic equations can be used to summarize how the quantity of something changes over time or in response to other changes.

Summary of Results

There are 24 criteria across the seven instructional clusters. Overall, six separate ratings were done for each of these criteria on each of the two materials, resulting in 288 ratings. The results are summarized in Table 3.
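The count of 288 can be reproduced by enumerating the design. The sketch below is illustrative only: the benchmark labels come from Table 2, the criteria are represented by placeholder indices because their labels are not listed in full in this report, and the variable names are hypothetical. It also assumes that each counted rating corresponds to one criterion on one benchmark for one material, compared across the two teams assigned to that strand.

```python
from itertools import product

# Benchmarks per strand as listed in Table 2 (one concept, one skill each).
benchmarks = {
    "Number": ["9A 6-8#5", "12B 6-8#2"],
    "Geometry": ["9C 6-8#1", "12B 6-8#3"],
    "Algebra": ["9B 6-8#3", "11C 6-8#4"],
}
materials = ["Transition Mathematics", "Connected Mathematics"]

# Placeholder indices 1..24 stand in for the 24 criteria; the actual
# criterion labels are not enumerated here (assumption for illustration).
criteria = range(1, 25)

# Flatten the strands into the six benchmarks rated on each material.
all_benchmarks = [b for strand in benchmarks.values() for b in strand]

# One cell per (material, benchmark, criterion) combination.
cells = list(product(materials, all_benchmarks, criteria))
per_material = len(cells) // len(materials)

print(len(all_benchmarks), per_material, len(cells))
```

Under this reading, the six ratings per criterion are the six benchmarks, and the totals per material and overall follow directly from the product.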

There were 34 rating differences of more than one step on the 4-point [High, Medium, Low, None] rating scale, so the overall percentage agreement was 88.2 percent. The percentage agreement on the two materials differed considerably. For Transition Mathematics, there were 29 out of 144 differences, resulting in a rater agreement of 79.9 percent. For Connected Mathematics, there were 5 out of 144 differences, which is a 96.5 percent rater agreement. A closer look at the individual benchmarks shows that 14 of the 34 rating differences were on concept-related benchmarks and 20 were on skill-related benchmarks. This result is due primarily to the Transition Mathematics data, indicating that in this material, skill-related benchmarks are more difficult to rate.
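The agreement figures can be checked with a few lines of arithmetic. The following is a minimal sketch, not the procedure the authors used: the function and variable names are hypothetical, and it assumes that a difference is counted only when two ratings are more than one step apart on the ordered scale, so that High versus Medium is not counted but High versus Low is.

```python
# Ordered 4-point scale from the report, lowest to highest.
SCALE = ["None", "Low", "Medium", "High"]

def steps_apart(a, b):
    """Number of scale steps between two ratings."""
    return abs(SCALE.index(a) - SCALE.index(b))

def is_disagreement(a, b):
    """Assumed reading of the report: only gaps of more than one step count."""
    return steps_apart(a, b) > 1

def percent_agreement(n_disagreements, n_comparisons):
    """Share of comparisons that were not counted as disagreements."""
    return 100.0 * (n_comparisons - n_disagreements) / n_comparisons

# 24 criteria on each of 6 benchmarks per material, as stated in the text.
COMPARISONS_PER_MATERIAL = 24 * 6

# Disagreement counts reported in the text.
reported = {"Transition Mathematics": 29, "Connected Mathematics": 5}

results = {
    name: percent_agreement(count, COMPARISONS_PER_MATERIAL)
    for name, count in reported.items()
}
results["Overall"] = percent_agreement(
    sum(reported.values()), COMPARISONS_PER_MATERIAL * len(reported)
)

for name, pct in results.items():
    print(f"{name}: {pct:.1f}%")
```

Treating adjacent ratings as agreement is a lenient criterion. An exact-match criterion would give lower percentages, but the report does not provide the data needed to compute them.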

As shown in Table 3, some of the criteria appeared more difficult to rate, regardless of the benchmark or type of material. For example, there were difficulties in rating criteria 4.4 Connecting Ideas and 7.1 Teacher Content Learning for both materials. For Transition Mathematics, there were four rating differences on criterion 4.1 Building a Case. Cluster 4 Developing and Using Mathematical Ideas resulted in the greatest number of rater differences for Transition Mathematics.

Table 3. Summary of Rater Agreements and Differences in Ratings on Benchmarks
Transition Mathematics
Benchmarks	Rater agreement (%)	Criteria with disagreements > 1
9A#5	83	4.1, 5.2, 6.2, 7.1
9C#1	96	4.2
9B#3	75	4.1, 4.2, 4.4, 4.5, 5.1, 7.1
12B#2	63	2.1, 4.1, 4.3, 4.5, 5.2, 5.3, 6.2, 7.1, 7.2
12B#3	100
11C#4	63	1.3, 2.1, 2.4, 3.2, 4.1, 4.3, 4.4, 5.1, 7.1
Connected Mathematics
Benchmarks	Rater agreement (%)	Criteria with disagreements > 1
9A#5	100
9C#1	100
9B#3	88	2.2, 4.4, 7.1
12B#2	100
12B#3	100
11C#4	92	2.2, 4.4
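The entries in Table 3 can be tallied by criterion and by cluster to see where rater differences concentrate. The sketch below is a hypothetical illustration rather than part of the original analysis. It transcribes the table by hand, assumes the last entry for benchmark 12B#2 under Transition Mathematics denotes criterion 7.2, and rounds halves upward when recomputing the agreement column from 24 criteria per benchmark.

```python
import math
from collections import Counter

# Criteria with disagreements > 1, transcribed from Table 3.
# Assumption: the final 12B#2 entry for Transition Mathematics is read as 7.2.
transition = {
    "9A#5": ["4.1", "5.2", "6.2", "7.1"],
    "9C#1": ["4.2"],
    "9B#3": ["4.1", "4.2", "4.4", "4.5", "5.1", "7.1"],
    "12B#2": ["2.1", "4.1", "4.3", "4.5", "5.2", "5.3", "6.2", "7.1", "7.2"],
    "12B#3": [],
    "11C#4": ["1.3", "2.1", "2.4", "3.2", "4.1", "4.3", "4.4", "5.1", "7.1"],
}
connected = {
    "9A#5": [],
    "9C#1": [],
    "9B#3": ["2.2", "4.4", "7.1"],
    "12B#2": [],
    "12B#3": [],
    "11C#4": ["2.2", "4.4"],
}

def tally(table):
    """Count disagreements per criterion and per cluster (the digit before the dot)."""
    flat = [c for crits in table.values() for c in crits]
    by_criterion = Counter(flat)
    by_cluster = Counter(c.split(".")[0] for c in flat)
    return by_criterion, by_cluster

def agreement_pct(n_disagreements, n_criteria=24):
    """Percent agreement for one benchmark, rounding halves upward."""
    return math.floor(100 * (n_criteria - n_disagreements) / n_criteria + 0.5)

t_criterion, t_cluster = tally(transition)
c_criterion, c_cluster = tally(connected)

print("Transition by cluster:", dict(t_cluster))
print("Connected by cluster:", dict(c_cluster))
print({b: agreement_pct(len(crits)) for b, crits in transition.items()})
```

Recomputing the agreement column this way is a useful consistency check on a hand transcription, since each listed criterion lowers a benchmark's agreement by one part in 24.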

Kulm, G., Grier, L. 1998. Mathematics Curriculum Materials Analysis Reliability Study.