Reprinted here with the permission of Science Books & Films. No further republication or redistribution is permitted without the written permission of the editor. Source: 
A Benchmarks-Based Approach to Textbook Evaluation
By Gerald Kulm, Jo Ellen Roseman, and Michelle Treistman
PROJECT STAFF The Project 2061 curriculum materials evaluation project was directed by Gerald Kulm, mathematics, and Jo Ellen Roseman, science. Staff included: Laura Grier and Kathleen Morris, mathematics; and Ann Caldwell, Sofia Kesidou, and Luli Stern, science. 
In today's classrooms, textbooks serve as tool and tutor, guidebook and gauge. Teachers throughout the world use texts to guide their instruction, so textbooks greatly influence how content is delivered (Association for Supervision and Curriculum Development, 1997). Schmidt, McKnight, and Raizen (1997) identified textbooks as playing an important role in making the leap from intentions and plans to classroom activities, by making content available, organizing it, and setting out learning tasks in a form designed to be appealing to students.
To make the most effective use of a textbook, however, teachers must decide which textbooks are appropriate for their needs. A teacher needs to determine the extent to which a textbook focuses on and is aligned with a coherent set of significant, age-appropriate student learning goals that the teacher, school, or district has identified as integral to the understanding of and progress in a particular academic subject. The teacher must also assess how well a textbook's instructional design supports the attainment of those specified learning goals. The only way to gain this information is through careful evaluation of textbooks and other curriculum materials.
Project 2061, the long-term science education reform initiative of the American Association for the Advancement of Science, began work on a curriculum-materials analysis process in 1995 with funding from the National Science Foundation. Since then, support for an evaluation of textbooks for their match to benchmarks and standards has grown. For example, the National Education Goals Panel (1998) has called for "an independent and credible 'consumer reports' review service" to inform educators, policymakers, and the general public about "the degree to which instructional materials are aligned with challenging academic standards." The panel also recommended that "students and teachers should have instructional materials, whether textbooks or other classroom materials, that directly help students achieve challenging academic standards" (National Education Goals Panel, 1998).
Earlier this year, Project 2061 released the results of an evaluation of middle school (grades 6 through 8) mathematics textbooks using its curriculum-materials analysis procedure. Results from a similar evaluation of middle school science textbooks will be released this fall. Together, these evaluation reports will be the first components of a K-12 database of curriculum reviews that will be easily available to educators online and in print.
While there are other, more abbreviated methods for evaluating curriculum materials, the Project 2061 procedure is unique. It reveals how well a textbook can support teachers in their efforts to help students learn specific ideas and skills, specifically those in nationally accepted standards and benchmarks. A Project 2061 textbook evaluation gives busy educators the solid information they need to make informed choices about which textbooks can help their students improve their knowledge and skills in science and mathematics.
Content Analysis
The first step in evaluating a textbook is to identify the learning goals with which the textbook should be aligned. Although the Project 2061 curriculum-materials analysis procedure was developed using the learning goals in its own Benchmarks for Science Literacy and the national standards for mathematics and science, subsequent work has indicated that state education frameworks also can be used (Kulm, 1999). The process can be applied to any K-12 school subject for which well-defined learning goals have been agreed upon. There are, however, two conditions that the learning goals must meet: (1) they must reflect a consensus on what all students should know and be able to do, and (2) their intent must be clear, specific, and unambiguous.
The Project 2061 procedure is based on the assumption that an in-depth examination of the quality of a material's treatment of a few carefully selected learning goals is more revealing than a superficial look at many learning goals. In the course of developing its analysis procedure, Project 2061 did indeed find that by studying a material's treatment of a small set of learning goals, the strengths and weaknesses of the material's instructional design and support can be identified. For example, to conduct its evaluation of middle grades mathematics and science textbooks, Project 2061 chose learning goals representing three important mathematical strands (number, geometry, and algebra) and ideas that encompass several important concepts in physical, life, and earth science: the kinetic molecular theory, the flow of matter and energy in ecosystems, and processes that shape the earth.
Once the learning goals are selected, the analysis of the content begins with making "sightings" in the material: specific activities, lessons, exercises, and other learning opportunities in the student or teacher material in which the specific benchmarks and standards are addressed.
The judgment on whether the material actually addresses these learning goals is based on two main ideas: substance and sophistication. Reviewers keep both ideas in mind as they evaluate the material. They consider whether the activities address the specific substance of a learning goal or whether there is only a "topic" match. It is easy for a material to achieve alignment at the topic level; the tables of contents of most textbooks reveal that they cover the same topic headings. However, although many different textbooks cover the same topics (fractions, states of matter, graphing, weather, etc.), they can differ greatly in the specific ideas, or substance, that they cover. The distinction between activities that correspond only to the general topic of the content learning goal and activities that actually address its substance is based on a careful study of the ideas contained in that learning goal. Reviewers also consider whether the activities are developmentally appropriate. That is, do they reflect the level of sophistication of the learning goal, or do the activities target a learning goal at an earlier or later grade level?
Classroom teachers and higher education faculty learn Project 2061's analysis procedure during a three-day workshop.
Instructional Analysis
Project 2061's analysis doesn't stop with an examination of content but goes further to evaluate the quality of instructional support for the included content. The purpose here is to estimate how well each activity addresses the targeted learning goal from the perspective of what is known about student learning and effective teaching. Rather than looking at the textbook's instructional design as a whole, reviewers must consider whether the instructional strategies that relate to an activity will help students learn the specific concepts and skills contained in the learning goals used in the evaluation.
Working with science and mathematics educators and cognitive researchers, Project 2061 identified important instructional criteria that represent a set of features that are characteristic of good instructional design. The criteria were derived from research on learning and teaching and from the knowledge of experienced educators. Primary sources for the criteria included: Chapter 13, "Effective Learning and Teaching," of Science for All Americans (AAAS, 1989); Chapter 15, "The Research Base," of Benchmarks for Science Literacy (AAAS, 1993); Research Ideas for the Classroom: Middle Grades Mathematics (Owens, 1993); and Handbook of Research on Mathematics Teaching and Learning (Grouws, 1992).
The procedure requires textbook reviewers to focus only on those textbook activities and lessons that are aligned with the identified content learning goals, and to examine the specific guidance provided to help students learn that content. To evaluate the quality of instructional support, reviewers use specific criteria within each of the following categories:
 Identifying a Sense of Purpose. Part of planning a coherent curriculum involves deciding on its purposes and on what learning experiences will likely contribute to achieving those purposes. Reviewers determine how effective the material is at conveying a unit purpose and a lesson purpose and justifying the sequence of activities.
 Building on Student Ideas. Fostering better understanding in students requires taking time to attend to the ideas they already have, both ideas that are incorrect and ideas that can serve as a foundation for subsequent learning. Reviewers determine how well the material specifies prerequisite knowledge, alerts teachers to commonly held student ideas, assists teachers in identifying student ideas, and addresses misconceptions.
 Engaging Students. For students to appreciate the power of mathematics and science, they need to have a sense of the range and complexity of ideas and applications that mathematics and science can explain or model. Reviewers determine how well the material provides a variety of phenomena or mathematical contexts and makes them vivid to students, particularly through an appropriate number of firsthand experiences.
 Developing Ideas. Science and mathematics literacy requires that students see the link between concepts and skills, see them as logical and useful, and become skillful at using them. Reviewers determine how well the material justifies ideas, introduces terms and procedures, represents ideas, connects ideas, demonstrates/models procedures and applications of knowledge, and provides practice opportunities.
 Promoting Student Thinking. No matter how clearly materials may present ideas, students (like all people) will devise their own meaning, which may or may not correspond to targeted learning goals. Students need to make their ideas and reasoning explicit, hold them up to scrutiny, and recast them as needed. Whether or not the material is effective in promoting student thinking is determined by how much the material encourages students to explain their reasoning, guides students in their interpretation and reasoning, and encourages them to think about what they've learned.
 Assessing Student Progress. Assessments must address the range of knowledge and skills that students are expected to learn, as well as the kinds of applications and contexts in which such knowledge and skills are useful. Reviewers determine how well the material's assessments align with the learning goals addressed in the material, test students' ability to apply them, and are used to inform instruction.
 Enhancing the Learning Environment. Providing features that enhance the use and implementation of the textbook for all students is important. Reviewers determine whether the material provides teacher content support, establishes a challenging classroom, and supports all students.
To evaluate a textbook, reviewers examine each content-matched activity in light of the instructional criteria and rate the set of activities according to a prescribed set of indicators and scoring scheme for each one. Their findings are presented as profiles of judgments for each learning goal across the set of criteria, with evidence provided to support each judgment.
Assuring Reliability
Reliability comes from several aspects of the procedure. First, the criteria are specific and well defined, and each is explained and clarified with indicators and examples. Second, the analysis procedure is carried out by carefully trained reviewers who are experienced, practicing classroom teachers and higher education faculty who are knowledgeable about research on learning and teaching. Each textbook is analyzed by all of the reviewers, who are organized into independent teams of two and assigned one learning goal. Finally, each team must provide evidence-based arguments for their judgments, which are used to reconcile ratings with the other team, if necessary, and then made available in the final report.
Project 2061 tested its curriculum-materials analysis procedure for consistency of results from reviewer to reviewer. In one reliability study, 14 reviewers who had received extensive training in the procedure independently evaluated two sets of middle grades mathematics materials. There was agreement on 80% of the analysts' ratings on one set and 97% on the other (Kulm & Grier, 1998). In a similar reliability study for science materials, there was agreement on 87% of the reviewers' ratings (Kesidou, 1999). The analysis procedure continued to produce a high level of reviewer agreement across all of the learning goals and all of the textbooks. These results provided sufficient confidence in the procedure's reliability to proceed with the full-scale evaluation of middle grades mathematics and science textbooks.
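The article does not say how the agreement percentages were calculated. As a rough illustration only, the Python sketch below assumes the simplest reading: two reviewers rate the same items, and agreement is the share of items on which their ratings are identical. The function name and the sample ratings are hypothetical and are not taken from the reliability studies.

```python
def percent_agreement(ratings_a, ratings_b):
    """Percentage of items on which two reviewers gave the same rating.

    Assumption: agreement means an exact match on each item. The studies
    cited in the article may have used a different definition.
    """
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("both reviewers must rate the same non-empty set of items")
    matches = sum(a == b for a, b in zip(ratings_a, ratings_b))
    return 100.0 * matches / len(ratings_a)


# Hypothetical ratings on the 0 to 3 scale for five criteria.
reviewer_1 = [3, 2, 2, 1, 0]
reviewer_2 = [3, 2, 1, 1, 0]
print(percent_agreement(reviewer_1, reviewer_2))
```

A measure like this treats every disagreement equally, whether the two ratings differ by one point or by three; a study could reasonably choose a more forgiving or a chance-corrected statistic instead.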
Putting the Procedure to Work: Evaluating Middle Grades Mathematics and Science Textbooks
By 1998, Project 2061 was ready to begin the first large-scale application of its procedure. With funding from the Carnegie Corporation of New York, the project began the first-ever benchmarks-based evaluation of middle grades mathematics and science textbooks. In both subjects, Project 2061 looked specifically at middle grades because data on poor student performance from the Third International Mathematics and Science Study and other research indicate that the middle school curriculum requires urgent attention. In the case of mathematics, it is in middle school that many students find themselves in mathematics programs that are repetitious and unchallenging. As a result, their achievement and interest in mathematics stall, and they are unable to take advantage of the full range of academic and career options in the future. For these and other reasons, middle school is a critical leverage point for education reform efforts and offers a productive focus for Project 2061's first evaluation effort. (Data on similar studies of middle grades science will be available in the fall of 1999.)
Project 2061 began its evaluations with three basic propositions. First, good textbooks can play a central role in improving education for all students. Second, the quality of the textbooks should be judged mainly on their likely effectiveness in helping students to achieve important science and mathematics learning goals for which there is a broad national consensus. And, third, as mentioned previously, a thorough examination of a material's treatment of a few carefully selected learning goals would be more revealing than a superficial look at the content alignment to many learning goals.
The project selected two different types of textbook series for review. Some are "best sellers" that are representative of the textbooks that most middle school teachers are likely to be using in their classrooms or considering for adoption. Others represent the current efforts of curriculum developers, researchers, and textbook publishers. These are just entering the textbook market and are not as well known or well established as the more commercial series.
Because the analysis of textbooks requires a great deal of resources, Project 2061 decided to focus the first round of evaluations on printed materials and not to include supplemental software or other media resources. The project also decided to focus on programs written specifically for the middle grades rather than on K-8 basal series or on supplementary materials that did not span grades 6, 7, and 8.
The following mathematics textbook series were reviewed:
Heath Mathematics Connections. D.C. Heath and Company, 1996
Heath Passport. McDougal Littell, 1996
Math Advantage. Harcourt Brace & Company, 1998
Math 65, Math 76, Math 87. Saxon Publications, 1997, 1995
Mathematics in Context. Encyclopedia Britannica Educational Corporation, 1998
Mathematics: Applications and Connections. Glencoe/McGraw-Hill, 1998
Mathematics Plus. Harcourt Brace & Company, 1994
MathScape. Creative Publications, 1998
Middle Grades Math. Prentice Hall, 1997
Middle Grades Math Thematics. McDougal Littell, 1999
Middle School Math. Scott Foresman-Addison Wesley, 1998
Transition Mathematics. ScottForesman, 1995
The following science textbook series were reviewed:
Glencoe Life, Earth, and Physical Science. Glencoe/McGraw-Hill, 1997
Macmillan McGraw-Hill Science. Macmillan/McGraw-Hill, 1995
Matter and Molecules. Michigan State University, 1988
Middle School Science & Technology. Kendall/Hunt, 1999
New Directions Teaching Units. Michigan Department of Education: Chemistry That Applies, 1993; Food, Energy, and Growth, 1992
Prentice Hall Science. Prentice Hall, 1997
PRIME Science. Kendall/Hunt, 1994
Science Insights. Addison-Wesley, 1997
Science Interactions. Glencoe/McGraw-Hill, 1995
Science 2000. Decision Development Corporation, 1995
SciencePlus. Holt, Rinehart and Winston, 1997
For the mathematics textbooks evaluation, the project identified six mathematics learning goals that were examples of the core content likely to appear in any middle grades material. They were chosen from AAAS's Benchmarks for Science Literacy. (See Figure 1.) A comparison of Benchmarks to the National Council of Teachers of Mathematics' Curriculum and Evaluation Standards revealed a close correspondence in their content, especially through the 8th grade (AAAS, 1999).
Figure 1. The following learning goals were used for content analysis in the middle grades mathematics textbook evaluations.
Number Concepts: The expression a/b can mean different things: a parts of size 1/b, a divided by b, or a compared to b.
Number Skills: Use, interpret, and compare numbers in several equivalent forms such as integers, fractions, decimals, and percents.
Geometry Concepts: Some shapes have special properties: Triangular shapes tend to make structures rigid, and round shapes give the least possible boundary for a given amount of interior area. Shapes can match exactly or have the same shape in different sizes.
Geometry Skills: Calculate the circumferences and areas of rectangles, triangles, and circles, and the volumes of rectangular solids.
Algebra Graph Concepts: Graphs can show a variety of possible relationships between two variables. As one variable increases uniformly, the other may do one of the following: increase or decrease steadily, increase or decrease faster and faster, get closer and closer to some limiting value, reach some intermediate maximum or minimum, alternately increase and decrease indefinitely, increase or decrease in steps, or do something different from any of these.
Algebra Equation Concepts: Symbolic equations can be used to summarize how the quantity of something changes over time or in response to other changes.
For the science textbooks evaluation, Project 2061 chose topics within physical, life, and earth sciences because they were most consistently included in state frameworks. Three specific topics were selected to correspond to the high priority placed on these topics in the two major sets of nationally recommended learning goals, the National Research Council's National Science Education Standards and AAAS's Benchmarks for Science Literacy, as well as in state frameworks and educator surveys. (See Figure 2.) The content analysis learning goals used to analyze these topics were crafted based on statements in both Benchmarks and Standards, and represent core science concepts that any middle grades science textbook should cover. Because of their specialized expertise, science reviewers remained topic specific (i.e., physical science educators reviewed physical science textbooks only), and a separate team of reviewers was chosen for each science subject.
Figure 2. The following learning goals were used for content analysis in the middle grades science textbook evaluations.
Physical science (the kinetic molecular theory):
Life science (flow of matter and energy in ecosystems):
Earth science (processes that shape the earth):

Each textbook series was rated according to both its degree of alignment with the selected learning goals and the quality of instructional support in its student and teacher materials. For the mathematics content profile, the coverage of each specific idea in the selected learning goal was rated on a 0 to 3 scale (no coverage to substantive coverage). These ratings were then averaged to obtain an overall rating for each benchmark (Most content, 2.6 to 3.0; Partial content, 1.6 to 2.5; and Minimal content, 0 to 1.5). For the instruction profile, the score for each instructional category was computed by averaging the criterion ratings for the category. This was repeated for each learning goal, to produce ratings of instructional quality on a 0 to 3 scale (High potential for learning to take place, 2.6 to 3.0; Some potential for learning to take place, 1.6 to 2.5; Little potential for learning to take place, 0.1 to 1.5; Not present, 0). Figure 3 shows a sample chart profiling both the content and instructional quality of a sample textbook.
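The averaging and banding described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Project 2061's own scoring tool: the function names and sample ratings are hypothetical, and because the article does not say how an average that falls between two bands (for example, 2.55) is handled, the sketch assumes averages are rounded to one decimal place before a band is assigned.

```python
def average(ratings):
    """Mean of a non-empty list of 0-3 ratings, rounded to one decimal place.

    Assumption: rounding to one decimal is our choice, made so that every
    average lands inside one of the published bands.
    """
    if not ratings:
        raise ValueError("at least one rating is required")
    return round(sum(ratings) / len(ratings), 1)


def content_band(score):
    """Map an averaged content rating to the content-profile labels."""
    if score >= 2.6:
        return "Most content"
    if score >= 1.6:
        return "Partial content"
    return "Minimal content"


def instruction_band(score):
    """Map an averaged category rating to the instruction-profile labels."""
    if score == 0:
        return "Not present"
    if score >= 2.6:
        return "High potential for learning to take place"
    if score >= 1.6:
        return "Some potential for learning to take place"
    return "Little potential for learning to take place"


# Hypothetical ratings: three ideas within one benchmark, and four
# criteria within one instructional category for that benchmark.
idea_ratings = [3, 3, 2]
criterion_ratings = [2, 1, 2, 1]

content_score = average(idea_ratings)
instruction_score = average(criterion_ratings)
print(content_score, content_band(content_score))
print(instruction_score, instruction_band(instruction_score))
```

Repeating the second calculation for every instructional category and every learning goal would yield the kind of grid of ratings that a textbook profile summarizes.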
The Results
In January 1999, Project 2061 released the results of its middle grades mathematics textbooks evaluations. Reviewers found the following textbooks to be satisfactory: Connected Mathematics, Mathematics in Context, MathScape, and Middle Grades Math Thematics. The actual ranking values and a complete report of the evaluations for each textbook can be found online at http://www.project2061.org/tools/textbook/matheval/default.htm. Overall, the evaluation yielded both good news and bad.
GOOD NEWS:
 There are a few excellent middle grades mathematics textbook series.
 The best series contain both in-depth mathematics and excellent instructional support.
 Most of the textbooks do a satisfactory job on number and geometry skills.
 A majority of textbooks do a reasonable job in the key instructional areas of engaging students and helping them develop and use mathematical ideas.
BAD NEWS:
 There are no popular commercial textbooks among the best rated.
 Most of the textbooks are inconsistent and often weak in their coverage of conceptual benchmarks in mathematics.
 Most of the textbooks are weak in their instructional support for students and teachers.
 Many textbooks provide little development in sophistication of mathematical ideas from grades 6 to 8, corroborating similar findings of the Third International Mathematics and Science Study.
Figure 3: Textbook Profiles
The Project 2061 curriculum-materials analysis procedure generates a wealth of information about the textbook being evaluated. For example, the sample chart below provides a profile showing how one textbook scored on both content and instructional quality. Using these profiles, educators can draw some conclusions about what the textbook series can be expected to accomplish in terms of its potential for helping students to learn the selected mathematics content. The profiles may indicate that a textbook covers number skills well and provides thorough instructional guidance for teaching these skills yet does a poorer job of dealing with algebra concepts.
Choosing Textbooks for Your School
Because most textbooks are designed with an eye to sales in as many districts as possible, they include the content specified by the guidelines from a number of different states. As a result, textbooks usually contain much more material than a teacher can cover fully in a year, especially in mathematics and science. Consequently, these textbooks cannot focus on specific learning goals to the extent needed in today's classrooms. Often, states and school districts are bombarded with information from textbook publishers claiming their materials are aligned with benchmarks and standards. However, as Project 2061 has found, and as recent data from the Third International Mathematics and Science Study (Schmidt et al., 1997) demonstrate, most curriculum materials suffer from a lack of coherence and focus. Each of the many different textbooks includes somewhat different topics from which teachers in various districts can choose.
And choose they do. According to a study by the National Center for Education Statistics, U.S. teachers usually have the latitude to design the content and pace of their courses to suit their perception of their students' needs, and few states or districts closely monitor or enforce compliance with state or district standards (Owen, 1996). This latitude makes it essential for teachers to have access to resources such as Project 2061's curriculum-materials analysis procedure and the results of its textbook evaluations, along with training in the use of each.
Project 2061's procedure and the evaluation results are just the first step in the project's curriculum analysis plans. Over the next few years, with appropriate funding, Project 2061 will compile a database of information about the quality of textbooks and other materials. Teachers will be able to use this database to select appropriate textbooks, or to redesign their curriculum using textbooks they already have, to effectively address the standards and benchmarks they need to teach in their classroom. The next step is to evaluate high school and then elementary school textbooks; a proposal to evaluate high school biology and algebra textbooks has already been submitted.
Early next year, Project 2061 will be releasing Resources for Science Literacy: Curriculum Materials Evaluation, a book and CD-ROM that will contain: (1) detailed instructions for evaluating curriculum materials in light of Benchmarks, national standards, or other learning goals of comparable specificity; (2) case-study reports illustrating the application of the analysis procedure to a variety of curriculum materials; (3) information for relating findings in the case-study reports to state and district learning goals; and (4) a discussion of issues and implications of using the procedure.
In addition to its focus on matching content and instruction to specific learning goals, the Project 2061 procedure is a powerful professional development tool. Most obviously, the evaluation experience builds a strong understanding of the learning goals used for a particular evaluation and the ability to distinguish activities that can help students achieve those goals from activities that cannot. It requires the careful collection of evidence and the ability to make judgments about a material's alignment to specific learning goals based on logical arguments from that evidence. The process also calls for reconciled judgments between two independent review teams, thus providing opportunities for reviewers to defend their own judgments about materials and to question those of other reviewers.
Once educators see how curriculum materials are evaluated, they will be able not only to choose better curriculum materials but also to demand more effective, standards-aligned materials from publishers. Project 2061's Professional Development Programs department offers a workshop that takes participants through the essential steps of the curriculum-materials analysis procedure. The workshop is designed to help states, districts, or individual schools to develop, revise, or adopt curriculum materials that are well aligned with national, state, or local standards.
The daily decisions teachers make about which teaching materials to use and how to use them, along with the recommendations made by textbook adoption committees, largely determine what and how students will be expected to learn. The Project 2061 curriculum-materials analysis procedure is an invaluable tool that teachers, and anyone involved in education, can use to help all students achieve science and mathematics literacy.
For more information about Project 2061, please visit www.project2061.org or call (202) 326-6666.
References
American Association for the Advancement of Science. (1999). Middle Grades Mathematics Textbooks: A Benchmarks-Based Evaluation. Washington, DC: American Association for the Advancement of Science.
American Association for the Advancement of Science. (1993). Benchmarks for science literacy. New York: Oxford University Press.
American Association for the Advancement of Science. (1989). Science for all Americans. New York: Oxford University Press.
Association for Supervision and Curriculum Development. (1997). Education Update, Vol. 39, No. 1.
Grouws, D. A. (Ed.). (1992). Handbook of research on mathematics teaching and learning. New York: Macmillan.
Kesidou, S. (1999). Producing analytical reports on curriculum materials in science: Findings from Project 2061's 1998 curriculum review study. Presented at the annual meeting of the National Association for Research in Science Teaching in Boston, MA.
Kulm, G. (1999). Making sure that your mathematics curriculum meets standards. Mathematics Teaching in the Middle School, 4(8), 536-541.
Kulm, G., & Grier, L. (1998). Mathematics curriculum materials reliability study. Washington, DC: Project 2061, American Association for the Advancement of Science.
National Education Goals Panel. (1998). National education goals panel recommendations regarding the implementation of standards. [Online]. Available http://www.negp.gov/page1139.htm.
Owen, E. (1996). Pursuing Excellence: A study of U.S. eighth-grade mathematics and science teaching, learning, curriculum, and achievement in international context (NCES 97-198). Washington, DC: U.S. Department of Education, National Center for Education Statistics.
Owens, D. T. (Ed.). (1993). Research ideas for the classroom: Middle grades mathematics. New York: Macmillan.
Roseman, J. E., Kesidou, S., & Stern, L. (1997). Identifying Curriculum Materials for Science Literacy: A Project 2061 Evaluation Tool. Based on a paper prepared for the colloquium "Using the National Science Education Standards to Guide the Evaluation, Selection, and Adaptation of Instructional Materials." National Research Council, November 10-12, 1996.
Schmidt, W. H., McKnight, C. C., & Raizen, S. A. (1997). A splintered vision: An investigation of U.S. science and mathematics education. Boston/Dordrecht/London: Kluwer Academic Press.
Kulm, G., Roseman, J. E., & Treistman, M. (1999). A Benchmarks-Based Approach to Textbook Evaluation. Science Books & Films, 35(4).