Creating Benchmarks For Science Education

Andrew Ahlgren
Project 2061

Progression-of-understanding maps were one important tool in coordinating the research and information needed for creating K-12 science standards.

Project 2061 has been constructing goals for science, mathematics, and technology education since 1985. During our first three years of work, we recommended what students should remember by the time they leave high school (Science for All Americans 1989). Since 1988, we've been working on reasonable expectations for students at earlier grade levels (Benchmarks for Science Literacy, in draft). This new volume will include benchmark lists, some of our progression-of-understanding maps, and essays related to the benchmark topics.

We intend the benchmarks to be used by school districts or curriculum developers in constructing alternative K-12 curriculum models adapted to their own populations and circumstances. Before we reach that point, though, we believe some reflection on how we created the benchmarks can be a stimulus to other curriculum reform efforts.

The experience of writing benchmarks is highly stimulating. But make no mistake about it: The work is difficult, and getting started seems to discourage many people from the undertaking. Still, the quality of thinking and conversation that goes on is often impressive, even when the tentative product may not be.

Benchmark Grades

The National Assessment of Educational Progress (NAEP) has popularized grades 4 and 8 as benchmark grades, and the National Council of Teachers of Mathematics (NCTM) has followed that pattern (Curriculum and Evaluation Standards for School Mathematics 1989). However, the district teams working with Project 2061 decided that the psychological distance from K to 8 requires more than a single benchmark. The ends of grades 2 and 5 were recommended as being more meaningful developmental breaks, and we are designing benchmarks for those grades.

It is not our intention that 2nd graders should be subjected to formal, national examinations on their progress in science (as seems to be in the cards for older children). The feel of the expectations for grade 2 is distinctly different from that for grade 5, so we believe it is important to discourage kindergarten teachers from embarking immediately toward grade 5 goals. By their modest, nontechnical nature, grade 2 benchmarks suggest what students cannot be counted on to know as they begin grade 3, and this may temper expectations for what children can learn in grades 3 through 5.

Inferring Benchmarks

In crafting the lower-grade expectations, we drew partly on an analysis of what ideas would be needed to achieve the 12th-grade understandings in Science for All Americans. We also considered estimates of what students are capable of at different ages, drawing information from the experienced teachers on our district teams and from researchers who study how children understand and learn science. Unfortunately, the availability of published research on children's understanding of science is very uneven over different content areas.

We have found that it is seldom possible to work backward from 12th-grade goals one at a time to create a neat stack of previous levels of sophistication. Usually there are convergences (several ideas required to understand a subsequent idea) and divergences (several ideas depending on one prior idea). The natural medium to express such goals is therefore a diagram with boxes and arrows. We called our speculative charting a progression-of-understanding map (to distinguish it from the concept map currently popular in science education).
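For readers who want to keep such a diagram in software, the boxes and arrows can be treated as a directed graph. The following is only a minimal sketch in Python, not the charting method Project 2061 used. The idea labels and the names PREREQUISITES, convergences, divergences, and bottom_to_top are hypothetical placeholders chosen for illustration; they are not taken from Figure 1.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical map: each idea (box) is paired with the set of prior
# ideas it depends on (arrows pointing into the box). The labels are
# illustrative placeholders, not the contents of the actual map.
PREREQUISITES = {
    "materials have properties": set(),
    "total weight stays the same": set(),
    "combinations can have new properties": {"materials have properties"},
    "things are made of parts too small to see": {"materials have properties"},
    "few basic ingredients": {"combinations can have new properties"},
    "atoms and molecules": {
        "few basic ingredients",
        "things are made of parts too small to see",
        "total weight stays the same",
    },
}


def convergences(prereqs):
    """Ideas that require several prior ideas (more than one arrow in)."""
    return {idea for idea, needs in prereqs.items() if len(needs) > 1}


def divergences(prereqs):
    """Prior ideas that several later ideas depend on (more than one arrow out)."""
    counts = {}
    for needs in prereqs.values():
        for prior in needs:
            counts[prior] = counts.get(prior, 0) + 1
    return {idea for idea, n in counts.items() if n > 1}


def bottom_to_top(prereqs):
    """Return one ordering in which every idea follows all of its prerequisites."""
    return list(TopologicalSorter(prereqs).static_order())
```

Calling bottom_to_top(PREREQUISITES) gives one possible bottom-to-top reading of the map. It says nothing about grade placement, which would still have to be estimated separately.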

Fig. 1

Figure 1 is an example of a draft of a progression-of-understanding map, depicting ideas related to a section on the structure of matter (Science for All Americans Chapter 4: The Physical Setting). Reading from bottom to top, the map shows a rough progression in time, beginning from notions students hold when they enter school. When we sketched this map, the sequence of ideas was more important than tying ideas to any particular grade. Estimation of approximate grade placement for each idea usually came later.

The toughest part of the map was in the middle. Without the firm guidance of research on many topics, we were in the same place as the Geography Task Force of the National Council on Education Standards and Testing when they wrote: "It was difficult to set 8th grade standards, other than indicating that students should be expected to know more than they did in the 4th grade and less than the 12th grade" (Raising Standards for American Education 1992, p. L-4). Compounding our uncertainty was the possible difference between what children could be expected to do now, with their current history in the school system, and what 5th or 8th graders eventually might be able to do if they had optimal experiences.

When we began mapping, we intended to cover one conceptual strand at a time, leaving some loose ends that would later connect to other strands. For example, the structure of matter shows obvious connections to the flow of matter and energy. The structure of matter was a whole section in Science for All Americans, and perhaps it was too large a conceptual chunk to represent comfortably on a single map. (Notice that it is incomplete in fig. 1.) It soon became evident that a progression-of-understanding map for a single strand was already more complex than most people would find inviting.

Software support for constructing each map and for making connections among them would be very helpful (and we now have a grant to develop such software). We intend to create a curriculum resource database that will link appropriate parts of the progression-of-understanding maps, giving users the option of choosing the complexity of information they want to consider. The resource base would also link benchmarks to blocks, to activities and materials, and to appropriate assessment suggestions.

We plan to accompany benchmark lists with essays that will call attention to the progressions of various strands and connections among them. For example, the essay accompanying the benchmarks for the structure of matter would draw attention to the parallel development of four different strands: properties of substances, combinations of parts, invisibly small pieces, and conservation of matter.

Essays, which are being prepared by our teams and staff, will also draw on the available educational and psychological research, calling attention to difficulties that students are likely to have at each level and, in particular, to persistent previous conceptions that may interfere with learning (see fig. 2). We are still uncertain about how far essays should go beyond suggesting appropriate kinds of instruction. (The research has much less to say about instruction to overcome difficulties than about the difficulties themselves.)

Figure 2
Literacy Goal: The Structure of Matter
The following example of an essay and benchmark list is taken from Benchmarks for Science Literacy (in draft).

Students will learn about the nature of atoms and molecules and the structure of matter.

Of all sections, this one may have the most implications for students' eventual understanding of the picture that science paints of how the world works. However, it may also offer the most difficulties. The theory of atoms and molecules is powerful in explaining our world, but it requires bringing together a number of lines of evidence and imagination: about the properties of materials and their combinations, changes of state, effects of temperature, behavior of large collections of pieces, the construction of objects from parts, even about the desirability of simplicity in explanation. All of these should be grasped by children during middle school, so that the unifying ideas of atoms can be developed by the end of grade 8.

The scientific understanding of atoms and molecules requires students to entertain the notion that all visible things are composed of invisible particles. Another notion is that everything might be made up of a relatively few ingredients. An idea preliminary to this is that materials combined in different ways can have different properties. And still preliminary to that is the very notion of properties of materials.

Parallel to consideration of properties of combinations is the notion that the bulk properties of materials can be very different from the properties of their minute parts, an idea counter to the students' intuition.1 And parallel too is the idea of an unchanging total amount of matter, beginning with the evidence that total weight stays the same during all sorts of changes in materials.2

Grades 3 through 5

The study of materials should continue throughout these years but become more systematic and quantitative. Students should design and build things that put different requirements on the properties of materials. They should be expected to write clear descriptions of their designs and experiments, present their findings whenever possible in tables and graphs (designed by the students, not the teacher), and enter their data and results in a computer database.

Students should measure (weight, dimensions, temperature), estimate (dimensions, weight, population size), and calculate (area, volume, population size) using hand-held calculators when necessary.3 With magnifiers, they should inspect substances composed of large collections of particles (sand, spices, powders) to discover the unexpected details at smaller scales. They should observe and describe the (sometimes solid-like, sometimes liquid-like) behavior of large populations of pieces: powders, marbles, sugar cubes, or wooden blocks.

By the end of the 5th grade, students should know that
  • Heating and cooling cause changes in the properties of materials. Many kinds of changes occur faster under hotter conditions.
  • However parts are assembled, the weight of the thing made is always the same as the sum of the parts; and when a thing is broken into parts, the parts together weigh the same as the original thing.
  • Materials may be composed of parts that are too small to be seen without magnification.
  • When a new material is made by combining two or more other materials, it can have properties that are different from any of them. For that reason, a lot of different kinds of materials can be made from a small number of basic kinds.
  • A collection of a large number of pieces may keep its shape or flow like a liquid, depending on how the pieces stick together or how they are stacked.

1. Brook et al. 1984, Driver 1987.
2. At the beginning, children have various ideas about what is matter. For many, gases and even liquids are not seen as having weight or as being matter (Lee et al. in press, Driver 1987, Stavy 1990). Very tiny solid particles are also not seen as having weight because their weight cannot be felt (Smith, Carey, and Wiser 1985, Carey 1991).
3. Research shows that children may consider anything so light that they cannot feel its weight to have no weight at all (Smith, Carey, and Wiser 1985; Carey 1991). Lots of weighing on increasingly sensitive balances, including weighing piles of small things and dividing them to find the weight of each, will help.

Benchmark Adjustments

Once you are in the thick of producing benchmarks, occasions will arise when a benchmark statement doesn't seem well suited to its designated grade level. The easiest option is to move the benchmark intact to another grade level, making adjustments to any benchmarks connected to it.

A second option is to rewrite the statement at a level of sophistication more appropriate to the current grade level, but this is more than a stylistic transformation. Grade adjustments can seldom be made so simply as changing the vocabulary. Rewriting usually requires rethinking what students should be able to do, in their heads or behaviorally, at that level.

A third rewriting option is the most difficult but probably the most fruitful: tease apart the substance of the benchmark and create two new ones, keeping one at the current level and putting the other at a different one. Again, merely making style changes in language won't change the substance of the benchmark. One must reconsider what students could understand and what the likely sequence of understanding is.

Knowledge vs. Belief

Research shows that children may understand a scientific explanation of phenomena before they believe it (for example, Hewson and Hewson 1992, Osborne and Freyberg 1985). The longer the time gap between being able to state an idea and eventually believing it, the greater the problem for writing benchmarks. Should a benchmark about children's ability to explain something specify that they can produce a scientific explanation, or should we try to require their acceptance of it as well?

From a philosophical point of view, Project 2061 would prefer to require knowledge rather than belief. A similar dilemma appeared in writing the Values and Attitudes section in Science for All Americans. We rejected the goal that everyone should like science, mathematics, and technology or should believe these endeavors are of net benefit to humankind. We agreed instead on the goal that students' attitudes, whether they turn out to be positive, negative, or neutral, should be based on a sound understanding.

A poignant case might be that of evolution through natural selection. We can reasonably require that students understand what the scientific theory is, but do not have to require students to believe that is how present life on earth necessarily came to be.

Final Thoughts

Benchmark drafts should be tried out with a variety of readers, not just for approval or minor editing, but to see how they are likely to be interpreted and used. Writing good benchmarks may not require setting fixed rules as much as it requires being continually vigilant about how one's intent might be misunderstood.

Researcher Pat Heller (of the University of Minnesota) summarized the task for us after a recent writing retreat:

  • Make benchmarks not so specific as to be limiting and not so general that no one is quite sure what you're talking about.
  • Have a clear sequence where necessary within a grade level.
  • Have a progression from one grade level to the next that illustrates increasing sophistication.
  • Show connections between benchmarks under different goals.
  • Write them to be developmentally appropriate, assessable, and relevant to the child's world.


Hewson, P., and M. Hewson. (1992). "The Status of Students' Conceptions." In Research in Physics Learning: Theoretical Issues and Empirical Studies edited by R. Duit, F. Goldberg, and H. Niedderer. Kiel, Germany: Institute for Science Education.

Osborne, R., and P. Freyberg. (1985). "Roles for the Science Teacher." In Learning in Science edited by R. Osborne and P. Freyberg. Auckland: Heinemann.

Raising Standards for American Education. (January 24, 1992). Washington, D.C.: National Council on Education Standards and Testing.

Science for All Americans. (1989). Washington, D.C.: American Association for the Advancement of Science.

Ahlgren, A. 1993. Creating Benchmarks For Science Education. Educational Leadership, 50 (5).