Identifying Curriculum Materials for Science Literacy: A Project 2061 Evaluation Tool

This report describes an early version of Project 2061's curriculum-materials evaluation procedure. The report is based on a paper prepared for the Colloquium "Using the National Science Education Standards to Guide the Evaluation, Selection, and Adaptation of Instructional Materials," which was held at the National Research Council, November 10-12, 1996.

Jo Ellen Roseman, Sofia Kesidou, and Luli Stern
Project 2061

Project 2061 of the American Association for the Advancement of Science gratefully acknowledges the National Science Foundation’s support of the work described in this paper.


With Project 2061’s publication of Science for All Americans (1989) and Benchmarks for Science Literacy (1993) and the National Research Council’s release of the National Science Education Standards (1996), there now exists a strong national consensus among educators and scientists on what all K-12 students need to know and be able to do in science, mathematics, and technology. The overwhelming similarity between Benchmarks and the NSES means that curriculum materials that support students’ learning benchmarks will likewise promote learning science standards. Valid identification of such curriculum materials is of great interest to educators nationwide, even more so since state and district frameworks are drawing heavily on the national documents.

Beginning in 1991, Project 2061 sought to quickly develop a database of reviewed materials that could be used in making local adoption decisions. Because the pool of existing materials was large, the Project 2061 staff alone could not analyze them all. Engaging a large number of people in the analysis required a procedure that could be used, with a reasonable amount of training, to give valid and reliable results. By valid, we mean that the conclusions reached would derive from accurate interpretations of benchmarks and sound principles of effective teaching. By reliable, we mean that independent analysts would reach similar conclusions and cite similar evidence for them. Unfortunately, no procedure existed for judging whether or how well curriculum materials matched specific learning goals like benchmarks. We found that, by using only the impressionistic or check-list procedures that are common in curriculum evaluation, neither other people’s judgments nor our own yielded consistent or defensible results. We therefore began a major effort to develop an adequately valid and reliable procedure, and we have made considerable progress.


The central proposition in the Project 2061 procedure for analyzing curriculum materials is that they are to be judged primarily in terms of how well they are likely to contribute to the attainment of specific learning goals. While it is certainly important that materials are scientifically accurate, age appropriate, and motivating, if they do not also contribute significantly to students’ learning of important, agreed-upon ideas and skills, they are not suitable for adoption. Hence the Project 2061 procedure concentrates on examining curriculum materials in the light of a coherent set of learning goals. The particular learning goals used here are those found in Benchmarks and in the content component of National Science Education Standards, but in principle any well-thought-out goal statements could be used as long as they are learning goals and as long as they are specific.

A second premise of the Project 2061 approach to the analysis of curriculum materials is that account has to be taken of both the content and instructional properties of the materials under examination. It does little good for materials to simply include the content of specific learning goals if the instructional strategies recommended in the material are not consistent with what is known about how students learn. To bring home this point, consider that Benchmarks for Science Literacy is a perfect content match to benchmarks—it contains explicit and specific material that is relevant to all benchmarks and does not contain any material that goes beyond them—yet no one would argue that Benchmarks should be used as a student textbook. Mere statement of the goals is far from enough to help students to achieve them. For materials to be credible, it must be evident how students could actually learn what is intended from them.

The Procedure in Brief

The analysis of curriculum materials involves four steps, summarized below and described in greater detail in Appendix A:

Preliminary Inspection.

This is to determine whether the material merits further analysis and, if so, to identify the learning goals that will serve as the focus of further study.

Content Analysis.

The purpose here is to determine whether the content in the material matches specific learning goals—not just whether the topic headings are similar. (At the "topic" level it is possible to align most any curriculum with Benchmarks or NSES. This step in the analysis demands more than a topic correspondence.) Reviewers proceed to the next step only if results of this analysis are promising.

Instructional Analysis.

This step looks at the match between the material’s treatment of specific learning goals and what is known about student learning and effective teaching. The purpose here is to estimate how well the instructional strategies in the material support students’ learning of those very ideas and skills for which there is a content match. It should be possible to point to evidence of effective instruction in the material, benchmark by benchmark. (It is possible that materials would both (a) show a content match to particular benchmarks and (b) have plausible instructional strategies in general, yet not focus those strategies on those particular benchmarks.)

Summary Report.

The analysis concludes with a summary of what the material under consideration can be expected to accomplish in terms of specific learning goals.

We have found that this four-step procedure is helpful to those doing the analysis for the first time. More experienced users of the procedure tend to combine some of the steps. For example, as knowledge of benchmarks and standards increases, the preliminary examination may be combined with the content analysis.
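The gating logic of the four steps can be sketched in code. The following Python fragment is only an illustration and is not part of the Project 2061 procedure: the function name `analyze` and the two judgment functions are hypothetical stand-ins for a review team's evidence-based judgments, and the sample benchmark codes are taken from the River Cutters results reported in Figure 1.

```python
def analyze(material, candidate_goals, matches_content, supports_learning):
    """Apply the four steps in order, stopping when a step is unpromising.

    matches_content and supports_learning are hypothetical callables that
    stand in for a review team's judgment on one learning goal at a time.
    """
    report = {
        "material": material,
        "stopped_at": None,
        "content_matched": [],
        "instructionally_supported": [],
    }

    # Step 1, preliminary inspection: are there learning goals worth studying?
    if not candidate_goals:
        report["stopped_at"] = "preliminary inspection"
        return report

    # Step 2, content analysis: keep only goals whose substance is matched.
    report["content_matched"] = [g for g in candidate_goals if matches_content(g)]
    if not report["content_matched"]:
        report["stopped_at"] = "content analysis"
        return report

    # Step 3, instructional analysis: examined only for content-matched goals.
    report["instructionally_supported"] = [
        g for g in report["content_matched"] if supports_learning(g)
    ]

    # Step 4, summary report: what the material can be expected to accomplish.
    return report


# Judgments for three River Cutters benchmarks, as reported in Figure 1.
content_ok = {"4C3-5#1", "4C6-8#2"}
instruction_ok = {"4C3-5#1"}

report = analyze(
    "River Cutters",
    ["4C3-5#1", "4C6-8#2", "12A6-8#1"],
    matches_content=lambda goal: goal in content_ok,
    supports_learning=lambda goal: goal in instruction_ok,
)
print(report["content_matched"])            # ['4C3-5#1', '4C6-8#2']
print(report["instructionally_supported"])  # ['4C3-5#1']
```

The point of the sketch is the ordering: instructional strategies are examined only for goals that already show a content match, so a material cannot earn credit for good instruction that is aimed at something else.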


Several features of the 2061 procedure are key to its design. The first four are strategic elements of the procedure itself; the last four characterize the tools available to users. Taken together, these features distinguish the 2061 analysis procedure from other evaluation forms.

  1. Specific Learning Goals. Both Benchmarks and NSES specify what students should know or be able to do at a fairly fine grain size. The 2061 analysis procedure examines the alignment of materials to specific learning goals (benchmarks and fundamental understandings) rather than to section headings or general topics. For example, within the section "The Earth," Benchmarks specifies precisely what students should know about the water cycle by the end of grades 2, 5, 8, and 12. By the end of grade 2 students should know that "Water left in an open container disappears, but water in a closed container does not disappear." By the end of grade 5 students should know the (more sophisticated) idea that "When liquid water disappears it turns into a gas…" and by the end of grade 8 students should be able to explain evaporation in terms of molecules. A first grade activity that has students comparing yearly rainfall patterns around the world addresses the topic "water cycle" but not the substance of the K-2 benchmark "Water left in an open container disappears, but water in a closed container does not disappear." But having students observe the water loss in their classroom fish tank after a holiday and investigate whether covering it solves the problem is more to the point. Benchmarks and NSES provide a set of developmentally appropriate goals based on learning research, making it unnecessary for reviewers to intuit what is appropriate for students in different grades.
  2. Instruction Tied to Learning Goals. The 2061 procedure examines how well the instructional and assessment strategies in the material contribute to students’ learning the specific benchmarks. For example, criteria that probe whether or not the material "includes activities that provide first-hand experiences with phenomena" and "includes question sequences to guide student interpretation and reasoning about phenomena" ask evaluators to examine whether the material contains activities and calls for reflection on each benchmark to be learned. Similarly, the assessment tasks are examined for their match to specific benchmarks. It is important to note that in looking at many materials in terms of many different benchmarks, we have seen that a material may treat one benchmark quite well and another quite poorly. That makes it difficult to form good global judgments. It’s possible, of course, to consider whether a material employs generally supportive instructional strategies such as hands-on activities and cooperative groups, independent of what learning goals they may contribute to. But the instructional strategies might turn out to be aimed at only some of the learning goals of interest—or none at all. (This is the most central proposition of our work: The desirability of instruction and content cannot be considered separately!)
  3. Evidence-based Arguments. The 2061 analysis procedure requires that judgments be supported by evidence-based arguments and take account of both the quantity and the quality of the evidence. The 2061 criteria are not used as a check list, nor can analysts get by with giving unsupported opinions. Analysis reports include descriptions of the supporting evidence and page number references, benchmark by benchmark, so that readers can check them and form independent judgments (and judge the credibility of the reviewer). In coming to conclusions about the quality of evidence, a reviewer might argue that a material that includes carefully sequenced questions to prompt student reflection on a particular phenomenon is deserving of a higher rating than a material that simply instructs teachers to "encourage students to discuss their ideas about the motion of molecules." We have found that when reviewers are required to justify their conclusions, the quality and reliability of their judgments improve.
  4. Feedback. The 2061 procedure involves teams of educators who examine and comment on one another’s judgments and supporting evidence. For example, one team member may question whether activities in a curriculum material that claimed to match a particular benchmark really do. Another may question whether the supporting evidence included in a report truly responds to a criterion. Holding judgments and evidence up to such scrutiny improves the quality of the analysis reports, and having successfully defended their claims builds reviewers’ confidence in their judgments. Team members have indicated both how much they value good feedback when it is given and miss it when it is not.
  5. Clarification of Learning Goals. The meaning of any learning goal, however specific and clearly written it may be, is nonetheless subject to the interpretation of users. We have observed that simply reading a benchmark is often insufficient to grasp its intent, because users typically couch it in their own idiosyncratic understanding. This causes them to both overestimate and underestimate what a benchmark is expecting students to know or be able to do (Roseman, 1997). Before attempting to match activities in a curriculum material to the content of any benchmark, review teams study Science for All Americans and Benchmarks for Science Literacy to clarify the benchmark’s meaning. Science for All Americans provides a narrative context that clarifies where the benchmark is aiming. Other benchmarks before or after a benchmark in a K-12 list clarify the level of sophistication intended in the one under consideration. A strand map from Benchmarks on Disk provides still more context in terms of what goes into understanding a benchmark and where it will lead. Benchmarks essays describe difficulties students may have with the benchmark topic and offer some suggestions for helping students achieve the benchmark. Research summaries suggest likely limitations in student understanding of the benchmark and provide rationale for its grade-level placement.
  6. Specific Criteria. The 2061 procedure uses highly specific analysis criteria. For example, materials are examined for how well they alert teachers to commonly held student ideas (both troublesome and helpful) such as those described in Benchmarks Chapter 15: The Research Base and then for how well the material explicitly addresses those commonly held student ideas. In contrast, other procedures ask for more general impressions: For example, "Do the materials reflect current knowledge about effective teaching and learning practices based on research related to science education?" The more general the criterion, the more open it is to varied interpretations and sampling variations and hence the less likely it is that different analysts will reliably interpret it and respond accordingly.
  7. Clarification of Criteria. To help users interpret the criteria, the 2061 analysis procedure gives the rationale for the analysis criteria and elaborates what a response to each should include. For example, the question "Does the material alert teachers to commonly held student ideas?" is clarified with the following paragraphs:

    Students usually have ideas about how the world works even before instruction. Some ideas are intuitions that are in basic agreement with the scientists’ views, others (labeled often as misconceptions) are in disagreement/conflict with currently accepted scientific theories. Some of the students' misconceptions work fairly well in familiar contexts and are highly resistant to change. Knowing the ideas that students typically have helps teachers decide what ideas to build on and what changes to promote…

    Responding to this question involves examining a) whether there is research on commonly held student ideas in the topic area(s) that the material addresses, b) whether the material alerts teachers to such ideas, c) whether the material accurately represents research findings, and d) what proportion of commonly held ideas identified by the research are described in the material. Summaries of research on students’ ideas in science (such as those included in Benchmarks Chapter 15: The Research Base or Making Sense of Secondary Science: Research Into Children’s Ideas, by Rosalind Driver, Ann Squires, and Valerie Wood-Robinson. New York: Routledge, 1994) will be helpful to reviewers who want to know what ideas students typically have about the topics that the curriculum material they are examining addresses. If there is no research on student ideas in the topic area(s) that the material addresses, the material should not be faulted for not addressing this criterion.

  8. Concrete Examples of Applying Criteria. The 2061 analysis procedure employs examples of applying the criteria to specific goals and specific materials. Case study reports of fully analyzed materials are provided to illustrate the use of the criteria, what a good argument consists of, and what evidence justifies each rating (high, medium, and low). To illustrate a "high" score on the criterion "Reflecting on Activities," the following example is used. The material in question, Matter and Molecules (Berkheimer et al., 1988), is attempting to teach the idea that "All matter is composed of molecules that are constantly moving." During an activity in which students place hard candy in both hot and cold water, they are asked to make some predictions:
  1. How do you think what happens in the two cups will be the same?
    How do you think what happens in the two cups will be different?
    Explain your predictions.

[After 10 minutes, students are asked to look at the two cups and compare them.]

  1. How are the two cups the same?
    How are they different?

  2. There are many ways that the two cups are the same after 10 minutes, and one important way is that some of the candy dissolved in each cup. Try to write an explanation of how this happened. Remember to answer the question about substances and the question about molecules in your explanation.

  3. An important difference is that the candy dissolved faster in one of the cups. In which cup did the candy dissolve faster?
    What was different about the molecules of hot and cold water that would make the candy dissolve faster?

As was evident in comparing our first cycle to our second cycle of development (described below), including all of the features described above greatly increases the validity and reliability of the analysis procedure. Furthermore, because of the specificity and detail provided in the analysis reports, results of the 2061 procedure can facilitate attempts to better align curriculum materials with benchmarks and standards.

Developing and Testing the Procedure

In developing and trying out the analysis procedure, Project 2061 has involved over 100 K-12 teachers, teacher educators, materials developers, cognitive researchers, and scientists. In two cycles of materials evaluation 3-person teams used the procedure to analyze a few materials, reported their findings, and suggested modifications to the procedure. In each cycle, the reliability of the analysis reports was tested by having two teams analyze each curriculum material independently. A similar strategy was used at 6 national sites—both statewide and school-district—to test the procedure under field conditions. A list of materials examined is provided in Appendix B.
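One simple way to express how closely two independent teams agree is to compare the sets of benchmarks each team judged to be matched. The sketch below uses the Jaccard overlap (shared benchmarks divided by all benchmarks named by either team). This is an assumption made for illustration only: the report does not state which agreement measure, if any, was used, the function name `team_agreement` is hypothetical, and the two teams' lists are invented sample data rather than actual findings.

```python
def team_agreement(team_a, team_b):
    """Jaccard overlap of two teams' benchmark lists (1.0 = identical)."""
    a, b = set(team_a), set(team_b)
    union = a | b
    if not union:
        # Two empty lists agree trivially.
        return 1.0
    return len(a & b) / len(union)


# Invented example: each team's content-matched benchmarks for one material.
team_1 = ["4C3-5#1", "4C6-8#2", "4C6-8#6"]
team_2 = ["4C3-5#1", "4C6-8#2", "1B6-8#2"]

print(team_agreement(team_1, team_2))  # 0.5
```

A measure of this kind captures only whether teams reach similar conclusions; the definition of reliability used here also asks that they cite similar evidence, which still has to be judged by reading the reports.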

We have learned that the procedure can be reliably used only if the following conditions are met:

    • Review teams have more than a superficial understanding of the content to be learned.
    • Review teams understand what constitutes an appropriate level of sophistication of that content for K-12 students. (It is not enough that analysts are familiar with the content; they should also know what of that content contributes to literacy at various grades. For Benchmarks this has been spelled out for K-2, 3-5, 6-8, and 9-12; for Standards, for K-4, 5-8, and 9-12.)
    • Review teams are knowledgeable about reported difficulties that students have learning that content. (Benchmarks Chapter 15: The Research Base summarizes research on students’ ideas that contributed to the substance and grade-level placement of benchmarks. The new Project 2061 tool Resources for Science Literacy (AAAS, 1997) supplements Benchmarks with descriptions of over 100 articles, reports, books, and videos that summarize difficulties students have with Benchmarks ideas.)
    • Review teams have undergone considerable training in the use of the analysis procedure that includes practice with feedback. (A minimum of 4 days is needed to teach the procedure but this needs to be followed by practice with feedback to ensure learning.)
    • The materials to be analyzed are not too dissimilar in type of content from the materials used as examples in training. (Case study materials illustrate the application of the analysis criteria to benchmarks in commonly taught topics in life, earth, and physical science. Analysts have difficulties transferring the procedure to benchmarks less commonly taught—for example, to benchmarks in the nature of science or common themes. We are planning to develop additional case studies to illustrate how the procedure applies to less familiar topics.)

Reviewers were enthusiastic about their involvement in the analysis work. Over 80% indicated on a follow-up questionnaire that they are interested in reviewing other materials according to the 2061 procedure.


Results of using the procedure are often surprising. Analysis teams find that quick judgments about alignment to benchmarks or content standards are frequently contradicted by a more rigorous analysis. This held for single units or across several units in a program. A superficial examination most often overestimates what a material can be expected to accomplish. For example, when educators were asked how well River Cutters (a grade 6-9 curriculum module) addressed benchmarks, their initial judgments were far more optimistic than their judgments after completing the 2061 analysis. After initially listing 22 benchmarks that a cursory read led them to suspect were addressed in River Cutters, they found actual sightings for only 12 of them. After studying the meaning of the benchmarks carefully and revisiting the sightings with this more sophisticated understanding, they found that only 6 had a respectable content match. And on considering the instructional strategy of the material, only 1 was found to be instructionally well-supported, as shown below.

Suspected     Sighted       Content-Matched   Instructionally-Supported
Benchmarks    Benchmarks    Benchmarks        Benchmarks
1A3-5#1       1A3-5#1
1B6-8#2       1B6-8#2       1B6-8#2
3B6-8#2       3B6-8#2       3B6-8#2
3C6-8#6       3C6-8#6
4C3-5#1       4C3-5#1       4C3-5#1           4C3-5#1
4C6-8#2       4C6-8#2       4C6-8#2
4C6-8#3       4C6-8#3
4C6-8#6       4C6-8#6       4C6-8#6
11B6-8#1      11B6-8#1      11B6-8#1
12A6-8#1      12A6-8#1
12A6-8#3      12A6-8#3
12C6-8#5      12C6-8#5
Figure 1: Benchmarks identified at different stages of analysis: "suspected" benchmarks, "sighted" benchmarks, content-matched benchmarks, and instructionally-supported benchmarks. Suspected benchmarks are identified after briefly looking at the material; sighted benchmarks have identifiable treatment in the material; content-matched benchmarks survive after the meaning of the benchmarks is clarified; instructionally-supported benchmarks are rated "high" or "medium" on most of the instructional analysis criteria.
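The narrowing shown in Figure 1 can be tallied directly from its rows. The short Python sketch below is an illustration only: the names `STAGES`, `FIGURE_1`, and `count_surviving` are hypothetical, and the data are transcribed from the rows of the figure as reproduced here. The text reports 22 suspected benchmarks, while only 12 rows appear in the figure, so the count for the first column reflects the listed rows only.

```python
# Stage order follows the columns of Figure 1.
STAGES = ["suspected", "sighted", "content_matched", "instructionally_supported"]

# Benchmark code -> furthest stage reached, transcribed from Figure 1.
FIGURE_1 = {
    "1A3-5#1": "sighted",
    "1B6-8#2": "content_matched",
    "3B6-8#2": "content_matched",
    "3C6-8#6": "sighted",
    "4C3-5#1": "instructionally_supported",
    "4C6-8#2": "content_matched",
    "4C6-8#3": "sighted",
    "4C6-8#6": "content_matched",
    "11B6-8#1": "content_matched",
    "12A6-8#1": "sighted",
    "12A6-8#3": "sighted",
    "12C6-8#5": "sighted",
}


def count_surviving(rows):
    """Count how many benchmarks reached at least each stage."""
    rank = {stage: i for i, stage in enumerate(STAGES)}
    return {
        stage: sum(1 for reached in rows.values() if rank[reached] >= rank[stage])
        for stage in STAGES
    }


counts = count_surviving(FIGURE_1)
for stage in STAGES:
    print(stage, counts[stage])
```

The tallies for the last three columns come out to 12, 6, and 1, matching the figures given in the text for sighted, content-matched, and instructionally supported benchmarks.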

These results are meant less as a criticism of River Cutters (which could certainly be modified to address more benchmarks) than to illustrate how easily one can be misled by a superficial analysis. A similar pattern is obtained from analysis of a variety of K-12 curriculum materials. To the extent that other evaluation procedures’ and developers’ claims about their own materials are made at the suspected or at the "sightings" level, their reports will not be credible.

The good news is that, through the use of the procedure, we have identified highly credible materials that are likely to support students’ learning of benchmarks and standards. For example, stand-alone units developed over a period of years by the Institute for Research on Teaching and by the Michigan Department of Education are well-aligned in terms of content and instruction. These units, though not commercially polished, are readily available at relatively low cost and could be used as is by those eager to get started. And some selected units within elementary, middle, and high-school courses can be used as is or readily modified to be better aligned with national learning goals. Even a small module like River Cutters could contribute to students’ understanding of the utility of models by, for example, including question sequences to guide student interpretation of their river runs and their reasoning about the usefulness of their river cutters in understanding how real rivers shape the earth.

Because of the effort required by the procedure, it would be helpful if large-scale curriculum materials could be evaluated by means of sampling a few typical units. Unfortunately, we have found some year-long and grade-range materials to be quite uneven in their treatment of benchmarks and standards. Hence, the analysis results from single units, whether favorable or unfavorable, cannot be generalized to whole programs. Project 2061 is currently exploring sampling techniques for grade-range curriculum materials that do not compromise the reliability and validity of the analysis procedure and that still yield a fairly accurate picture of the material. In addition, we are identifying content and instructional analysis criteria that are especially important for the analysis of programs.

An important consequence of involving a variety of educators in the development and testing of the 2061 analysis procedure is the creation of a pool of reformers who understand what alignment involves and who can bring that knowledge to bear on their work. A cadre of K-12 educators now exists with increased knowledge of specific learning goals in Benchmarks and NSES and with skills to evaluate curriculum materials for their fit to these goals. Moreover, teacher educators are adapting the procedure for use in preservice and professional development programs, which will produce a larger cadre still.

Additional Considerations

Empirical Verification.

It is important to note that the 2061 analysis procedure, however meticulous, produces judgments of the likelihood of effectiveness. The potential of curriculum activities can be estimated by examining whether they address specific, important learning goals (content match), and whether they are based on effective principles of teaching and learning (instructional match) for these goals. However, the "implemented" curriculum will depend on teachers’ interpretations and use of the materials, and the "achieved" curriculum will depend on individual students’ skills, interests, and prior knowledge. Until activities are tried out with students, there will not be hard evidence for what is actually learned. Yet sound studies of effects on student learning are expensive and difficult to do. Such data are available for only a handful of materials and, where available, they correspond well to the results of the 2061 analysis.

Cost and support.

As noted earlier, the Project 2061 procedure concentrates on only some aspects of materials analysis: their alignment with specified learning goals and learning psychology. There are other important variables—such as affordability and availability of publisher support—that influence the usefulness of materials and whether or not they will even be used at all. Other evaluation procedures currently in use focus on these other variables.

Uses of the Project 2061 Procedure

The discussion to this point might seem to suggest that the only purpose of the Project 2061 analysis procedure is to improve decisions about the selection of curriculum materials. That might indeed be its initial use. But the procedure has other important uses, including:

  • Identifying shortcomings in existing curriculum materials and suggesting ways to improve them.

In the short term, schools and districts may not be able to replace all existing materials with new ones. Indeed, given the short time that Benchmarks and Standards have been widely available, in contrast to the much longer time it takes to develop good materials, it is unlikely that sufficient materials for building a K-12 science literacy curriculum currently exist. So for the foreseeable future teachers and curriculum specialists in schools and districts will have to continue the creative improvisations that they have made for so long. The specific information provided about materials from a 2061 analysis can help them with this task.

  • Increasing teachers’ knowledge of characteristics of well-aligned materials and developing skills to distinguish materials that are well-aligned from those that are not.

All of the teachers who have been involved in the development of the procedure (and even those exposed to it less rigorously during 1- or 2-day workshops) tell us that the experience has changed forever how they look at curriculum materials. They are less likely to assume alignment based on developers’ claims or a cursory look at materials themselves. Many claim that it is the best professional development experience they’ve ever had, because it highlights distinctions between effective and ineffective instruction toward specific learning goals. Teacher educators who have participated in the project are already building components of the training into their preservice teacher preparation programs so that new teachers will start to develop these important skills.

  • Stimulating the development of and the market for well-aligned materials.

On the one hand, an important part of the Project 2061 rationale for involving materials developers in the development of the analysis procedure was to encourage them to attend to both content and instructional analysis criteria in the new materials they are developing. If funders encourage and support the use of these criteria then materials developers will be further encouraged to use them. On the other hand, teachers who understand what constitutes well-aligned materials are likely, as informed consumers, to increase the demand for materials that are instructionally well aligned with Benchmarks and Standards.

But while it may be important for every teacher to have some experiences evaluating materials, to have every teacher evaluating every material is not an effective use of their time or the nation’s dollars. We recommend three levels of involvement:

The first attends to immediate practicality. Some choices of materials have to be made now in order to sustain the momentum of the systemic reform movement. In the short run, the evaluation procedure we have researched is too demanding for processing all of the materials that would fairly have to be included in a resource pool. Yet people who are necessarily going to be using a simpler procedure should be aware of what a more thorough evaluation would be like and what sort of results it produces. That raised sensitivity should improve the quality of simpler judgments that are made. A couple of days of tutored practice—soon—would be enough for this beginning.

The second level looks ahead over the next couple of years. The number of qualified evaluators has to grow enough to mount a more searching analysis and build a base of reviews of well evaluated materials. A fairly large number of practitioners should be trained over a period of a week or so and brought back together periodically to share conclusions and assess reliability. A number of these more experienced evaluators should be curriculum developers, so that they can plan new projects (and revise old ones) that are aligned with standards from the start.

The third level of involvement continues to improve the method itself, preparing the way for more efficient evaluation of more diverse materials in the future. This R&D will also provide updates to the second-layer work that can improve and simplify it along the way. Since we believe that Project 2061 (thanks to foresighted NSF funding) is well ahead on this, we see ourselves as playing a leading role in that work, and we would welcome collaborators.

A Possible Approach: The Philadelphia Story

The Philadelphia School District had already committed itself to teaching toward specific learning goals in Benchmarks and NSES and was searching for suitable materials. Faced with the need to select materials fairly quickly, it is using a strategy that combines the selection, fixing-up, and professional development benefits of the 2061 analysis procedure.

Selecting materials.

A curriculum review committee (about a dozen K-12 teachers and teacher educators) was trained to use the 4-step analysis procedure, used it to produce draft analysis reports, compared their reports to others on the same material, and attempted to reconcile, or at least account for, differences. This group will review existing materials and recommend promising alternatives for district use. Due to time constraints, their recommendations will be based on results from only the first step in the 2061 procedure—eliminating materials that do not appear to focus a significant amount of instruction on specific learning goals. Although they acknowledge the desirability of subjecting materials to both a content and instructional analysis before using them with students, these more rigorous steps in the analysis procedure will be postponed and used over time on the greatly reduced list that survives the preliminary inspection. The committee will rely on members’ first-hand experience with the more rigorous analyses to inform their preliminary judgments. Efforts will focus on the materials already at least partially examined and found promising by 2061 analysis teams.

Fixing up materials.

To use the 2061 procedure to improve existing materials, teachers will (a) undertake the more rigorous analysis of the content and instructional match of the chosen materials to benchmarks and (b) use the results of the analysis as a basis for modifying the materials to better align them. This could include such remedies as developing questions to focus students’ reflection on benchmark ideas, adding activities to address reported student learning difficulties, providing evidence-based arguments to foster student generalization of concepts, and/or explicitly demonstrating how benchmark ideas are useful for making sense of the students’ world outside the classroom.

Professional development.

At the same time, a larger and more diverse group of educators is becoming knowledgeable, through a series of workshops, about specific learning goals in Benchmarks and NSES and about the analysis criteria used to judge materials in light of these goals. This knowledge will help them to recognize both the strengths and weaknesses in existing curriculum materials in terms of their treatment of specific learning goals -- important because even relatively good materials still have some distance to go before they are well aligned. As new, better-aligned materials become available, the District will have a cadre of informed consumers who can recognize and appreciate them.

Professional development of teachers is being extensively supported by USI, Eisenhower, and other funds. To help improve curriculum and instruction in both undergraduate and graduate college courses, the District will encourage the college faculty themselves to develop knowledge of science literacy as defined by Science for All Americans, to recognize the importance of specific learning goals, and to use them in designing professional development for Philadelphia teachers. In this way, both teachers pursuing graduate degrees and new teachers will have significant blocks of time focused on science literacy.

References Cited

American Association for the Advancement of Science. (1997). Resources for Science Literacy: Professional development. New York, NY: Oxford University Press.

American Association for the Advancement of Science. (1993). Benchmarks for Science Literacy. New York, NY: Oxford University Press.

American Association for the Advancement of Science. (1989). Science for All Americans. New York, NY: Oxford University Press.

Berkheimer, G.D., Anderson, C.W., Lee, O., and Blakeslee, T.D. with Eichinger, D. and Sands, K. (1988). Matter and Molecules: Teachers Guide: Science Book and Activity Book. East Lansing, MI: The Institute for Research on Teaching, College of Education, Michigan State University. Occasional paper No. 121.

Driver, R., Squires, A., and Wood-Robinson, V. (1994). Making sense of secondary science: Research into children’s ideas. New York, NY: Routledge.

National Research Council. (1996). National Science Education Standards. Washington, DC: National Academy Press.

Roseman, J. (1997). Implementing benchmarks and standards: Lessons from Project 2061. The Science Teacher 64 (1), pp. 26-29.

Appendix A: A More Detailed Look at the Procedure

Preliminary Inspection.

Let’s assume for the moment that we are starting with materials that appear promising—the content doesn’t appear too far outside the scope of science literacy and the material includes lots of hands-on activities. The task becomes listing some specific learning goals on which the material appears to focus.

First, reviewers search fairly quickly through the material (both student material and teachers’ guides) to make a preliminary list of all the specific learning goals that would seem likely to be targeted. The material is then examined more carefully to locate and record all places where each learning goal is actually served—e.g., particular readings, experiments, discussion questions. (A sighting must be explicit in the material.) Then, based on the number and types of sightings, a decision is made about which benchmarks and standards warrant a more careful analysis.
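The bookkeeping in this step can be pictured with a short sketch. It is only an illustration and not part of the Project 2061 procedure: the `Sighting` record, the goal labels, the page references, and the `min_sightings` cutoff are all hypothetical. In practice the decision also weighs the types of sightings, which this sketch records but does not score.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Sighting:
    """One explicit place in the material where a learning goal is served."""
    goal: str      # hypothetical label for a benchmark or standard
    kind: str      # e.g. "reading", "experiment", "discussion question"
    location: str  # hypothetical page or activity reference


def tally_sightings(sightings):
    """Group the recorded sightings by learning goal."""
    by_goal = defaultdict(list)
    for sighting in sightings:
        by_goal[sighting.goal].append(sighting)
    return dict(by_goal)


def goals_for_closer_analysis(sightings, min_sightings=3):
    """Return the goals with at least `min_sightings` sightings, sorted by label.

    The numeric cutoff is an assumption made for this sketch; the procedure
    leaves the decision to the reviewers' judgment.
    """
    by_goal = tally_sightings(sightings)
    return sorted(
        goal for goal, found in by_goal.items() if len(found) >= min_sightings
    )


# Hypothetical record for one unit.
sightings = [
    Sighting("goal A", "reading", "p. 12"),
    Sighting("goal A", "experiment", "Activity 3"),
    Sighting("goal A", "discussion question", "p. 20"),
    Sighting("goal B", "reading", "p. 31"),
]
selected = goals_for_closer_analysis(sightings)
print(selected)
```

Keeping each sighting as a record with its location, rather than as a bare count, preserves the evidence that independent reviewers would need when comparing their lists.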

Content Analysis.

This analysis is a more rigorous examination of the link between the subject material and the selected learning goals. This involves giving precise attention to both ends of the match – the precise meaning of the benchmark on one end, and the precise intention of the material on the other. The material is examined with respect to such questions as:

Do the activities called for in the material address the substance of a specific benchmark or only the benchmark’s general "topic"?

Do the activities reflect the level of sophistication of the specific benchmark or are the activities more appropriate for targeting benchmarks at an earlier or later grade level?

Do the activities address all parts of a specific benchmark, or only some? If the latter, what is the consequence? (While it is not necessary that any particular activity or unit would address all the ideas in a benchmark or standard, the K-12 curriculum as a whole should do so. The purpose of this question is to provide an account of precisely what ideas are treated.)

For the material as a whole an attempt is made to estimate the degree of overlap between its content and the learning goals of interest. Thus it strives to answer questions such as these:

Does the material address all benchmarks of interest for a given topic and grade level? Which, if any, are not treated? Are they useful or even essential to the development of already included benchmarks?

Does the material contain content not required for reaching science literacy learning goals? If so, in what proportion? Does the material clearly distinguish between the two kinds of content? (While distinguishing content essential for literacy from non-essential content might seem to be a luxury in a material, its presence increases the range of students for which the material can be used. Distinguishing excess material makes it easier for the teacher to direct better students to enrichment activities and allows students themselves to avoid overload from ideas that go beyond what science literacy requires.)
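The whole-material questions above amount to two rough estimates: which goals of interest go untreated, and what share of the content lies outside the literacy goals. The sketch below shows one way to tabulate them. The function name, the goal labels, and the idea of counting content in "units" (such as activities) are assumptions made for illustration; the procedure itself does not prescribe a formula.

```python
def content_overlap(goals_of_interest, goals_treated, total_units, units_on_goals):
    """Summarize the overlap between a material's content and the learning goals.

    goals_of_interest: set of goal labels for the topic and grade level
    goals_treated:     set of goal labels the content analysis found addressed
    total_units:       number of content units (e.g. activities) in the material
    units_on_goals:    how many of those units serve some literacy goal
    """
    if total_units <= 0:
        raise ValueError("total_units must be positive")
    if not 0 <= units_on_goals <= total_units:
        raise ValueError("units_on_goals must be between 0 and total_units")

    treated_of_interest = goals_of_interest & goals_treated
    coverage = (
        len(treated_of_interest) / len(goals_of_interest)
        if goals_of_interest
        else 0.0
    )
    return {
        # Goals of interest that the material does not treat.
        "untreated": sorted(goals_of_interest - goals_treated),
        # Fraction of the goals of interest that the material treats.
        "coverage": coverage,
        # Fraction of the content not required for the literacy goals.
        "extra_share": (total_units - units_on_goals) / total_units,
    }


# Hypothetical figures for one unit.
summary = content_overlap(
    goals_of_interest={"goal A", "goal B", "goal C", "goal D"},
    goals_treated={"goal A", "goal B", "goal C"},
    total_units=20,
    units_on_goals=15,
)
print(summary)
```

Numbers like these only frame the discussion. Whether an untreated goal is essential to the development of the included ones remains a judgment for the reviewers.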

Instructional Analysis.

The purpose of the instructional analysis is to estimate how well the material addresses targeted benchmarks from the perspective of what is known about student learning and effective teaching. The criteria for making such judgments are derived from research on learning and teaching and from the craft knowledge of experienced educators. In the context of science literacy, summaries of these have been formulated in Chapter 13: Effective Learning and Teaching in Science for All Americans and in Chapter 15: The Research Base of Benchmarks for Science Literacy; for science education alone, they appear in Chapter 3: Science Teaching Standards in National Science Education Standards.

From those sources, seven criteria clusters have been identified to serve as a basis for the instructional analysis. (One could view these as standards for instructional materials.) A draft of the specific questions within each cluster is shown below. The proposition here is that (1) in the ideal all questions within each cluster would be well addressed in a material – they are not alternatives; and (2) this analysis has to be made for each benchmark separately – if we are serious about having science literate high school graduates then we want to focus effective instruction on every single one of the important ideas in Science for All Americans.

Cluster I. Providing a Sense of Purpose:

Part of planning a coherent curriculum involves deciding on its purposes and on what learning experiences will likely contribute to achieving those purposes. But while coherence from the designers’ point of view is important, it may be inadequate to give students the same sense of what they are doing and why. This cluster includes criteria to determine whether the material attempts to make its purposes explicit and meaningful, either by itself or by instructions to the teacher.

Framing. Does the material begin with important focus problems, issues, or questions about phenomena that are interesting and/or familiar to students?

Connected sequence. Does the material involve students in a connected sequence of activities (versus a collection of activities) that build toward understanding of a benchmark(s)?

Fit of frame and sequence. If there is both a frame and a connected sequence, does the sequence follow well from the frame?

Activity purpose. Does the material prompt teachers to convey the purpose of each activity and its relationship to the benchmarks? Does each activity encourage each student to think about the purpose of the activity and its relationship to specific learning goals?

Cluster II. Taking Account of Student Ideas:

Fostering better understanding in students requires taking time to attend to the ideas they already have, both ideas that are incorrect and ideas that can serve as a foundation for subsequent learning. Such attention requires that teachers are informed about prerequisite ideas/skills needed for understanding a benchmark and what their students’ initial ideas are -- in particular, the ideas that may interfere with learning the scientific story. Moreover, teachers can help address students’ ideas if they know what is likely to work. This cluster examines whether the material contains specific suggestions for identifying and relating to student ideas.

Prerequisite knowledge/skills. Does the material specify prerequisite knowledge/skills that are necessary to the learning of the benchmark(s)?

Alerting to commonly held ideas. Does the material alert teachers to commonly held student ideas (both troublesome and helpful) such as those described in Benchmarks Chapter 15: The Research Base?

Assisting the teacher in identifying students’ ideas. Does the material include suggestions for teachers to find out what their students think about familiar phenomena related to a benchmark before the scientific ideas are introduced?

Addressing commonly held ideas. Does the material explicitly address commonly held student ideas?

Assisting the teacher in addressing identified students’ ideas. Does the material include suggestions for teachers on how to address ideas that their students hold?

Cluster III. Engaging Students With Phenomena:

Much of the point of science is explaining phenomena in terms of a small number of principles or ideas. For students to appreciate this explanatory power, they need to have a sense of the range of phenomena that science can explain. "Students need to get acquainted with the things around them—including devices, organisms, materials, shapes, and numbers—and to observe them, collect them, handle them, describe them, become puzzled by them, ask questions about them, argue about them, and then try to find answers to their questions." (SFAA, p. 201) Furthermore, students should see that the need to explain comes up in a variety of contexts.

First-hand experiences. Does the material include activities that provide first-hand experiences with phenomena relevant to the benchmark when practical and, when not practical, make use of videos, pictures, models, simulations, etc.?

Variety of contexts. Does the material promote experiences in multiple, different contexts so as to support the formation of generalizations?

Questions before answers. Does the material link problems or questions about phenomena to solutions or ideas?

Cluster IV. Developing and Using Scientific Ideas:

Science for All Americans includes in its definition of science literacy a number of important yet quite abstract ideas—e.g., atomic structure, natural selection, modifiability of science, interacting systems, common laws of motion for earth and heavens. Such ideas cannot be inferred directly from phenomena and the ideas themselves were developed over many hundreds of years as a result of considerable discussion and debate about the cogency of theory and its relationship to collected evidence. Science literacy requires that students see the link between phenomena and ideas and see the ideas themselves as useful. This cluster includes criteria to determine whether the material attempts to provide links between phenomena and ideas and to demonstrate the usefulness of the ideas in varied contexts.

Building a case. Does the material suggest ways to help students draw from their experiences with phenomena, readings, activities, etc. to develop an evidence-based argument for benchmark ideas? (This could include reading material that develops a case.)

Introducing terms. Does the material introduce technical terms only in conjunction with experience with the idea or process and only as needed to facilitate thinking and promote effective communication?

Representing ideas. Does the material include appropriate representations of scientific ideas?

Connecting ideas. Does the material explicitly draw attention to appropriate connections among benchmark ideas (e.g., to a concrete example or instance of a principle or generalization, to an analogous idea, or to an idea that shows up in another field)?

Demonstrating/modeling skills and use of knowledge. Does the material demonstrate/model or include suggestions for teachers on how to demonstrate/model skills or the use of knowledge?

Practice. Does the material provide tasks/questions for students to practice skills or using knowledge in a variety of situations?

Cluster V. Promoting Student Reflection:

No matter how clearly materials may present ideas, students (like all people) will make their own meaning out of them. Constructing meaning well is facilitated by having students (a) make their ideas and reasoning explicit, (b) hold them up to scrutiny, and (c) recast them as needed. This cluster includes criteria for whether the material suggests how to help students express, think about, and reshape their ideas to make better sense of the world.

Expressing ideas. Does the material routinely include suggestions (such as group work or journal writing) for having each student express, clarify, justify, and represent his/her ideas? Are suggestions made for when and how students will get feedback from peers and the teacher?

Reflecting on activities. Does the material include tasks and/or question sequences to guide student interpretation and reasoning about phenomena and activities?

Reflecting on when to use knowledge and skills. Does the material help or include suggestions on how to help students know when to use knowledge and skills in new situations?

Self-monitoring. Does the material suggest ways to have students check their own progress and consider how their ideas have changed and why?

Cluster VI. Assessing Progress:

There are several important reasons for monitoring student progress toward specific learning goals. Having a collection of alternative assessments can ease the creative burden on teachers and increase the time available to analyze student responses and make adjustments in instruction based on them. This cluster includes criteria for whether the material includes a variety of goal-relevant assessments.

Alignment to goals. Assuming a content match of the curriculum material to this benchmark, are assessment items included that match the content?

Application. Does the material include assessment tasks that require application of ideas and avoid allowing students a trivial way out, like using a formula or repeating a memorized term without understanding?

Embedded. Are some assessments embedded in the curriculum along the way, with advice to teachers as to how they might use the results to choose or modify activities?

Cluster VII. Enhancing the Learning Environment:

Many other important considerations are involved in the selection of curriculum materials—for example, the help they provide teachers in encouraging student curiosity and creating a classroom community where all can succeed, or the material’s scientific accuracy or attractiveness. Each of these can influence student learning, even whether the materials are used. The criteria listed in this cluster provide reviewers with the opportunity to comment on these and other important features.

Teacher content learning. Would the material help teachers improve their understanding of science, mathematics, and technology and their interconnections?

Classroom environment. Does the material help teachers to create a classroom environment that welcomes student curiosity, rewards creativity, encourages a spirit of healthy questioning, and avoids dogmatism?

Welcoming all students. Does the material help teachers to create a classroom community that encourages high expectations for all students, that enables all students to experience success, and that provides all different kinds of students a feeling of belonging in the science classroom?

Connecting beyond the unit. Does the material explicitly draw attention to appropriate connections to ideas in other units?

Other strengths. What, if any, other features of the material are worth noting?
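Because the criteria within a cluster are not alternatives, and the analysis is made for each benchmark separately, the record of an instructional analysis is naturally a benchmark-by-criterion table. The sketch below is a minimal illustration of that bookkeeping. The cluster and criterion names come from the draft above, but the yes/no judgment, the `gaps_by_cluster` helper, and the benchmark label are simplifying assumptions; real reviews rest on cited evidence rather than a checkbox.

```python
# Cluster and criterion names as listed in the draft. "Other strengths" is
# left out because it is an open-ended comment, not a criterion to be met.
CLUSTERS = {
    "Providing a Sense of Purpose": [
        "Framing", "Connected sequence", "Fit of frame and sequence",
        "Activity purpose",
    ],
    "Taking Account of Student Ideas": [
        "Prerequisite knowledge/skills", "Alerting to commonly held ideas",
        "Assisting the teacher in identifying students' ideas",
        "Addressing commonly held ideas",
        "Assisting the teacher in addressing identified students' ideas",
    ],
    "Engaging Students With Phenomena": [
        "First-hand experiences", "Variety of contexts",
        "Questions before answers",
    ],
    "Developing and Using Scientific Ideas": [
        "Building a case", "Introducing terms", "Representing ideas",
        "Connecting ideas",
        "Demonstrating/modeling skills and use of knowledge", "Practice",
    ],
    "Promoting Student Reflection": [
        "Expressing ideas", "Reflecting on activities",
        "Reflecting on when to use knowledge and skills", "Self-monitoring",
    ],
    "Assessing Progress": ["Alignment to goals", "Application", "Embedded"],
    "Enhancing the Learning Environment": [
        "Teacher content learning", "Classroom environment",
        "Welcoming all students", "Connecting beyond the unit",
    ],
}


def gaps_by_cluster(addressed):
    """For one benchmark, list the criteria in each cluster not yet addressed.

    `addressed` is the set of criterion names the reviewers judged to be well
    addressed (a yes/no simplification assumed for this sketch). Clusters with
    no gaps are omitted from the result.
    """
    gaps = {}
    for cluster, criteria in CLUSTERS.items():
        missing = [c for c in criteria if c not in addressed]
        if missing:
            gaps[cluster] = missing
    return gaps


# Hypothetical judgments for a single benchmark.
judgments = {
    "benchmark X": {
        "Framing", "Connected sequence", "Fit of frame and sequence",
        "Activity purpose", "First-hand experiences",
    },
}
report = {benchmark: gaps_by_cluster(ok) for benchmark, ok in judgments.items()}
print(report["benchmark X"]["Engaging Students With Phenomena"])
```

Keying the report by benchmark keeps the analysis separate for each learning goal, so a material that serves one benchmark well cannot mask weak treatment of another.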

Summary Report.

Having analyzed both the content and the instruction aimed at that content, the final step in the process is to prepare a report that summarizes the material’s treatment of specific benchmarks and, drawing on that evidence, comments more generally on strengths and weaknesses of the material. Nonetheless, the report stops short of an overall recommendation. It is for educators to decide whether to use the material as is, to use it with modifications, or not use it at all. A goal-centered analysis of the kind developed by Project 2061 should help them make better decisions than would otherwise be possible.

Appendix B: Curriculum Materials Analyzed

K-5 Units

Insights: Changes of State (EDC)

ESS: Where is the Moon? (EDC)

Science and Technology for Children: Food Chemistry (NSRC)

Nuffield Primary Science: Earth & Space (Collins Educational)

Nuffield Primary Science: Living Things in Their Environment (Collins Educational)

FOSS: Models & Designs (LHS, Britannica)

FOSS: The Structures of Life (LHS, Britannica)

Used Numbers: The Shape of the Data (TERC)

Grades 6-8 Units

Changes in Matter (Macmillan)

Food, Energy, and Growth (Michigan Department of Education)

GEMS: River Cutters (LHS)

Matter and Molecules (Institute for Research on Teaching, MSU)

Power Plant (Institute for Research on Teaching, MSU)

Science 2000 (D.C. Heath)

Science Focus: The Salters’ Approach: Drinks (Heinemann)

SciencePlus: Life Processes (Holt, Rinehart, Winston)

SEPUP: Issues, Evidence and You (LHS)

Technology Units

Materials World Modules: Composites (Northwestern University)

Nuffield Design and Technology (Collins Educational)

Introduction To Design & Technology: Control Technology Systems (Todd et al., Taylor)

TSM Integration Project: Cabin Insulation (LaPorte and Sanders, Glencoe)

Middle School Program

SciencePlus (Holt, Rinehart, Winston)

Grades 9-12 Biology Units

Insights in Biology: The Matter of Life (EDC)

Biological Science: A Human Approach: Evolution (BSCS, Kendall Hunt)

Biological Science: An Ecological Approach (BSCS, Kendall Hunt)

Heath Biology: Unit II (D.C. Heath)

Grades 9-12 Chemistry Units

ChemCom: Conserving Chemical Resources (American Chemical Society, Kendall Hunt)

Chemistry that Applies (Michigan Department of Education)

Salters Chemistry: Burning & Bonding

Visualizing Matter: Atomic Structure (Holt)

Grades 9-12 Physics Units

Active Physics: Predictions (AAPT)

Project Star: Chapters 10-15 (Coyle et al., Kendall Hunt)

Conceptual Physics: Chapters 2-6 (Hewitt, Addison-Wesley)

Roseman, J. E., Kesidou, S., and L. Stern (1997). Identifying Curriculum Materials for Science Literacy. A Project 2061 Evaluation Tool. Based on a paper prepared for the colloquium "Using the National Science Education Standards to Guide the Evaluation, Selection, and Adaptation of Instructional Materials." National Research Council, November 10-12, 1996.