Middle Grades Mathematics Textbooks
A Benchmarks-Based Evaluation
Part 2 (Continued)
The Project 2061 curriculum-materials analysis procedure generates a wealth of in-depth information about the textbook being evaluated. Most users will want to begin their study of the evaluation results with the In Brief charts (found in Part 3 of this report) that provide profiles showing how each textbook scored on content and instructional quality. For the content profile, the coverage of each specific mathematical idea in the selected benchmark was rated on a 0 to 3 scale (no coverage to substantive coverage). These ratings were then averaged to obtain an overall rating for each benchmark (Most content 2.6-3.0, Partial content 1.6-2.5, Minimal content 0-1.5). For the instruction profile, the score for each instructional category was computed by averaging the criterion ratings for the category. This was repeated for each benchmark, to produce ratings of instructional quality on a 0 to 3 scale (High potential for learning to take place 2.6-3.0, Some potential for learning to take place 1.6-2.5, Little potential for learning to take place 0.1-1.5, Not present 0).
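The averaging and banding described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the procedure's actual tooling: the function names `content_band` and `instruction_band` are hypothetical, and the sketch assumes that averages are rounded to one decimal place before a band is assigned, since the published ranges (for example, 1.6-2.5 and 2.6-3.0) leave no room for values in between.

```python
from statistics import mean


def content_band(ratings):
    """Average 0-3 coverage ratings and label the result.

    Assumption: the average is rounded to one decimal before banding.
    """
    avg = round(mean(ratings), 1)
    if avg >= 2.6:
        label = "Most content"
    elif avg >= 1.6:
        label = "Partial content"
    else:
        label = "Minimal content"
    return avg, label


def instruction_band(ratings):
    """Average 0-3 criterion ratings for one instructional category."""
    avg = round(mean(ratings), 1)
    if avg == 0:
        label = "Not present"
    elif avg >= 2.6:
        label = "High potential for learning to take place"
    elif avg >= 1.6:
        label = "Some potential for learning to take place"
    else:
        label = "Little potential for learning to take place"
    return avg, label


# Hypothetical ratings for the ideas within one benchmark.
print(content_band([3, 3, 2]))
print(instruction_band([1, 0, 0]))
```

Under these assumptions, three idea ratings of 3, 3, and 2 average to 2.7 and fall in the "Most content" band.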
Using these profiles, educators can begin to identify some textbook series that may be worthy of further investigation. More specifically, the profiles can help educators draw some conclusions about what the textbook series can be expected to accomplish in terms of its potential for helping students to learn the selected mathematics content. For example, the profiles may indicate that a textbook covers number skills well and provides thorough instructional guidance for teaching these skills yet does a poorer job of dealing with algebra concepts.
Here is an example of an In Brief chart:
In addition to the In Brief charts, further detail is available on how well each textbook series meets the 24 instructional criteria across the selected benchmarks. This Instruction Highlights chart points to major strengths and weaknesses in the instructional guidance provided by the material. For example, the Highlights chart may indicate that the textbook includes appropriate firsthand experiences with mathematics benchmark ideas but only occasionally provides students with opportunities to reflect on these experiences.
To compare the instructional profiles of all the books that were evaluated, we developed an overall score for each book by averaging the scores from all 12 teams on all 24 of the criteria and for all six of the benchmarks. We then used SPSS® software to graph the data showing the median score and the distribution of the scores across all of the criteria averaged over all of the benchmarks. Textbooks were then ranked by their median scores; those with a median score of more than 2.0 out of 3.0 points were rated as satisfactory. The top three textbooks received median scores of more than 2.5 points. [Click here to see the chart.]
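As a rough illustration of the ranking step, the sketch below orders books by median score and flags those above the 2.0 cutoff. The evaluation itself used SPSS; the function name `rank_textbooks`, the book names, and the scores here are all hypothetical, and the sketch assumes each book is represented by a list of per-criterion scores already averaged over the benchmarks.

```python
from statistics import median


def rank_textbooks(scores_by_book, threshold=2.0):
    """Rank books by median score, highest first.

    Returns (name, median, satisfactory) tuples, where satisfactory
    means the median is strictly greater than the threshold.
    """
    rows = [(name, median(scores)) for name, scores in scores_by_book.items()]
    rows.sort(key=lambda row: row[1], reverse=True)
    return [(name, med, med > threshold) for name, med in rows]


# Made-up scores for illustration only, not results from the evaluation.
sample = {
    "Book A": [2.8, 2.6, 2.7, 2.4],
    "Book B": [1.9, 2.1, 2.0, 2.0],
    "Book C": [1.2, 0.8, 1.5, 1.0],
}

for name, med, ok in rank_textbooks(sample):
    print(name, round(med, 2), "satisfactory" if ok else "not satisfactory")
```

Note that a median of exactly 2.0 does not count as satisfactory here, because the text specifies "more than 2.0".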
Limitations of the Evaluation
As with most research efforts, this evaluation of middle school mathematics textbooks was constrained in a number of ways that affect its results and how they will be used. To begin with the most obvious, there were constraints on the time and funding available for the evaluation. As a result, we could not include every textbook that is likely to be of interest to every district or school. For the same reasons, analysts evaluated only the printed student and teacher editions and did not evaluate all of the support materials, test banks, software, and other supplements that are available from some publishers. We expect that schools themselves can decide whether these materials add to or subtract from the quality of a textbook series.
Other constraints on the evaluation were due simply to the focus of Project 2061's curriculum-materials analysis procedure itself, that is, on alignment with important benchmark goals and with a carefully selected set of research-based instructional criteria.
For this reason, the evaluation did not attempt a broad and comprehensive review of content. Instead, the procedure consisted of an in-depth analysis of a sample of six benchmarks that represented key skills and concepts from three core strands of mathematics content. Other important content, such as probability and statistics, patterns and functions, and measurement, was not reviewed. Similarly, overall content accuracy was not a focus either, although analysts did note errors when they occurred in the specific lessons that addressed the sample benchmarks. And even though AAAS benchmarks represent a national consensus about mathematical literacy, they may not align perfectly with the mathematics standards or framework of a given state or school district.
The choice of the 24 instructional criteria, even though they are soundly supported by research on mathematics teaching and learning, may not represent the philosophy of a particular district, school, or teacher. The criteria are intended to address features of materials that are most important for teaching and learning for the large majority of students and teachers. A particular teacher or set of students may not require a textbook to address every one of the criteria. For example, highly able students often can learn a concept with only a few examples or may need less teacher guidance. If these criteria are not rated highly for a textbook and other important criteria are, then a book that is not satisfactory for all students may be appropriate for this particular class. On the other hand, some schools or teachers may find that the 24 instructional criteria we have used in this evaluation do not reflect what is most important to them. For example, there is no Project 2061 criterion that requires textbooks to include or exclude the use of calculators or computers. This limitation must be taken into account in schools that emphasize or discourage these skills.
Other constraints relate to the procedure itself. The time allotted for the evaluation of each textbook series, while sufficient for an in-depth analysis, was not infinite. Even though independent teams rated every textbook series, it is possible that a small number of lessons or activities that addressed benchmark ideas were overlooked, or that, occasionally, some instructional criteria were not given credit in a lesson. Also, while analysts made use of very explicit clarifications, indicators, and scoring guides, they also used their judgments (which they supported with appropriate evidence from the textbooks) in assigning ratings to criteria. Finally, although the analysts used the procedure carefully and objectively, individual biases from their experiences as teachers and researchers might filter through the process.
All of these factors and others have the potential to contribute to uncertainties in the ratings, especially on individual benchmark ideas or instructional criteria. Project 2061 did not, however, note any systematic patterns of error or variance from the prescribed procedure and rating guidelines. With a reliability rate of 90%, the overall scores on each of the criteria and the relative rankings of the textbook series reflect consistent judgments by the reviewers, regardless of the particular benchmark, instructional criterion, or textbook series.