Linking Middle and Early High School Science and Mathematics Assessment Items to Local, State, and National Content Standards

A Proposal Submitted to the Division of Elementary, Secondary, and Informal Education under the IMD Assessment Program
by the American Association for the Advancement of Science Project 2061
August 2003

Project Description

The purpose of this five-year project is to develop a bank of high-quality assessment items and related tools in middle- and early high-school science and mathematics that are aligned with state and national content standards, that are easily accessible to users, and that can be used throughout the educational system by curriculum researchers, curriculum developers, teachers, test developers, and the general public. The items will also be valuable as demonstration models of what aligned assessment looks like, models that can be used in college and university teacher education programs.

The requirements of the new federal No Child Left Behind Act of 2001 have given high-quality assessment new importance. By mandating tests that are based on state standards, the legislation provides the impetus to design assessment tasks that measure understanding of the content specified in those standards.

As one of the first organizations to focus on content standards and their role in curriculum, instruction, and assessment, Project 2061 of the American Association for the Advancement of Science (AAAS) has been studying the alignment and effectiveness of hundreds of test items drawn from a variety of sources, including items from the Third International Mathematics and Science Study (TIMSS) and National Assessment of Educational Progress (NAEP) tests, items from state tests, released state test items, and items from various curriculum materials. Using specially designed criteria, Project 2061 and teams of experienced educators and assessment specialists have developed a procedure for analyzing and profiling items for their alignment with content standards and for other characteristics that affect how well the items measure student understanding of those content standards (AAAS, 2003). Applying the procedure yields a detailed analysis of each item on such features as content alignment; comprehensibility; test-wiseness; bias related to gender, class, race, and ethnicity; and item context. The results of the analysis can then be used as the basis for revising those items. In the project proposed here, we will extend our ongoing work with assessment items to further develop this analysis procedure by examining assessment items for additional linguistic features and for their suitability for testing students with limited English proficiency. This new work will draw on research being done by Rebecca Kopriva at the University of Maryland (Kopriva, 2000).

Our current work also involves developing assessment maps that can be used as conceptual frameworks for creating multi-item tests that measure student understanding of targeted content standards and related ideas. The assessment maps identify common misconceptions, prerequisite ideas, and ideas that come later in the developmental progression. These maps draw from the strand maps that we have developed for the Atlas of Science Literacy (AAAS, 2001) and from the work on progress variables in learning (Wilson & Draney, 1997). Tests built around assessment maps can be used to provide a diagnostic analysis of student understanding of ideas identified in the content standards.

Because the new federal legislation requires each state to hold students accountable for that state's specific content standards, the ability to cross-link standards documents is essential for providing national resources to the states and for sharing resources among states. To accomplish this cross-linking, our proposed project will draw on existing efforts to connect the content standards of the states with one another and with national content standards. The work done by Mid-continent Research for Education and Learning (McREL) and Align-to-Achieve, for example, allows one to match the content standards of approximately 40 states, the National Science Education Standards (NRC, 1996), and their own Compendix, a set of benchmarks and standards drawn from primary national documents such as AAAS's Benchmarks for Science Literacy (1993) and the National Research Council's National Science Education Standards, as well as various state documents. (AAAS's Benchmarks was used extensively in the creation of the Compendix and will soon become part of a complete set of content standards in the Align-to-Achieve database.) The proposed project will work with the linked standards already included in the Align-to-Achieve database to create a utility that will allow users to access test items matched to national content standards or to the content standards of any state.
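The kind of lookup such a utility would perform can be illustrated with a minimal Python sketch. It assumes that the crosswalk can be represented as a table linking each state standard to one national standard, and that items are stored against national standards. All identifiers below are hypothetical placeholders, not actual Align-to-Achieve, McREL, or state codes, and the sketch does not describe the Align-to-Achieve database itself.

```python
from typing import Dict, List, Tuple

# Hypothetical crosswalk tables: every identifier is invented for
# illustration and does not correspond to a real standards database.
STATE_TO_NATIONAL: Dict[Tuple[str, str], str] = {
    ("StateA", "A-SCI-7.3"): "NAT-FORCE-1",
    ("StateB", "B-PHYS-8.1"): "NAT-FORCE-1",
    ("StateA", "A-SCI-7.9"): "NAT-MATTER-2",
}
ITEMS_BY_NATIONAL: Dict[str, List[str]] = {
    "NAT-FORCE-1": ["item-017", "item-042"],
    "NAT-MATTER-2": ["item-101"],
}


def items_for_state_standard(state: str, code: str) -> List[str]:
    """Follow the state -> national link, then return the aligned items."""
    national = STATE_TO_NATIONAL.get((state, code))
    if national is None:
        return []  # this state standard has not been cross-linked yet
    return list(ITEMS_BY_NATIONAL.get(national, []))


def comparable_standards(state: str, code: str) -> List[Tuple[str, str]]:
    """Standards elsewhere that link to the same national standard."""
    national = STATE_TO_NATIONAL.get((state, code))
    return [
        key for key, nat in STATE_TO_NATIONAL.items()
        if nat == national and key != (state, code)
    ]


print(items_for_state_standard("StateB", "B-PHYS-8.1"))
print(comparable_standards("StateA", "A-SCI-7.3"))
```

Routing every lookup through a national standard keeps the number of links manageable in this sketch: each state standard needs one link rather than a separate link to every other state's standards.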

The Need for Standards-Based Assessment Items and Tools

Alignment of all elements of the education system to content standards is at the heart of the standards-based reform movement that has taken hold over the past dozen years or so. Standards-based reform of K-12 education is founded on the premise that fundamental improvement begins with (and continues to be tied to) a coherent, well-articulated set of specific content standards. This vision for the reform of education in science and mathematics has been promoted for over a decade through AAAS Project 2061 and its Science for All Americans (1989) and Benchmarks for Science Literacy (1993), through the National Council of Teachers of Mathematics (NCTM) and its Curriculum and Evaluation Standards for School Mathematics (1989) and Principles and Standards for School Mathematics (2000), and through the National Research Council (NRC) and its National Science Education Standards (1996).

For the standards-based reform agenda in science and mathematics education to continue to move forward, the field needs access to high-quality assessment items that are aligned to the content standards specified in national and state standards documents. As stated in the National Science Education Standards: "…assessment is a primary feedback mechanism [that]…leads to changes in the science education system by stimulating changes in policy, guiding teacher professional development, and encouraging students to improve their understanding…." (NRC, 1996, p. 76).

Ultimately, the quality of any test comes down to the specific tasks that students are asked to perform. Obviously, one item, or even a single set of items, can never give us complete confidence that students understand or do not understand an idea, but every item should contribute some evidence of student understanding. At present, however, there is general awareness in the field that there are too many poorly written assessment items that do not align properly with the content standards for which students are being held responsible. One reason for this lack of alignment between content standards and assessment items, which we will discuss later in this proposal, is that it is not always clear exactly what the content standards themselves are saying about the ideas and skills students are expected to learn and what they should be able to do on a test.

At present, released items are available from such sources as state tests and the tests administered by NAEP and TIMSS. There is also a large item bank, available to participating states, that the Council of Chief State School Officers (CCSSO) developed as part of its State Collaborative on Assessment and Student Standards (SCASS) project. In general, however, these items are linked to content standards at a coarser grain than we propose for this project, as is evident, for example, on the CCSSO Web site.

None of the alignment models listed on that Web site, including those of Achieve (n.d.), Webb (1999), the Council on Basic Education (n.d.), and the CCSSO Survey of Enacted Curriculum (SEC) (n.d.), provides the level of precision in its alignment procedure that we propose. The SEC model, for example, examines alignment at the level of topics such as multiple-step equations, inequalities, and linear equations (Porter, 2002). Although these alignment models are an important first step in moving instruction, materials, and assessment toward alignment, without test items that assess the specific ideas and skills in the content standards, teaching and learning will still lack the precision that a standards-based environment calls for. Standards-based teaching and learning require accurate measurement of student progress toward the attainment of those content standards. If we are not committed to creating and using test items that actually assess the specific ideas and skills identified in widely accepted content standards, then the purposes of both assessment and the standards themselves become unclear, and standards-based reform is jeopardized.

The findings from Project 2061's current NSF-funded assessment study described earlier (ESI-9819018) provide a useful framework for the new work that is proposed here. Project 2061's ongoing efforts related to assessment and to curriculum materials research enable us to identify some of the most urgent needs that the proposed work will address. Key areas where the proposed products and tools are needed include:

Curriculum materials research. Assessment items that are aligned with content standards but not specific to any single materials development project will enable researchers to compare the effectiveness of various instructional materials objectively. Curriculum researchers need assessment items that policy makers and the public regard as fair measures of student knowledge. Without credible evidence that new and innovative materials can help students learn, stakeholders may decide that the benefits of implementing such materials do not justify the costs.

Researchers also need high-quality assessment items linked to content standards to test such things as the comparative effectiveness of instructional sequences, the viability of particular visual representations of abstract concepts, and the value of using certain phenomena and real-world examples to make ideas concrete and understandable to students. Broad-stroke evaluation of the effectiveness of curriculum materials is not enough. As a result of the work of our new Center for Curriculum Materials in Science (ESI-0224186), we recognize that items aligned to content standards are essential for conducting rigorous, fine-grained research on materials as they are being developed. Existing assessment items are not focused enough on specific content standards to be used for these purposes. Without assessment tasks that provide precise measures of student understanding of the specific ideas and skills addressed in the curriculum material, it is impossible to conduct rigorous research studies with replicable results.

Materials-embedded assessment. High-quality assessment items linked to content standards should also be integrated into instructional materials themselves. Through our evaluations of curriculum materials and through our role as consultants on curriculum development projects, we have learned that developers do not consistently include high-quality assessment items in their materials. Even when assessment items are present in the materials, they are not deployed strategically so that teachers can gauge students' understanding of the ideas or skills being taught and modify their instruction accordingly. Instructional materials generally include questions, but they are not explicitly linked to specific content standards and are not included as a way to give teachers feedback on how they can improve their instruction based on how students respond to those questions. Most assessment items seem intended simply to provide students with additional practice and to document their success or failure rather than to guide instruction (AAAS, 2003).

Classroom assessment. Teachers need high-quality assessment items that are linked to the content standards for which their students are being held accountable on local and state tests. For teachers to take a standards-based approach seriously and think in terms of moving their students toward the attainment of specific content standards, they need assessment resources that are aligned with those content standards. It is one thing to say that students should know, for example, that "an unbalanced force acting on an object changes its speed or direction of motion, or both" (AAAS, 1993, p. 90), but without assessment items and other resources that focus directly on that content standard, it will be difficult for teachers to find out exactly what their students know or do not know about the ideas specified in that standard. Assessment items should test student understanding of key ideas independently of the instructional contexts used by a particular teacher or textbook, so that students can demonstrate an understanding of important ideas that goes beyond merely parroting back the words they hear in class. Items should enable teachers to interpret students' thinking about the ideas and skills targeted in content standards and should provide diagnostic information on what may be impeding student learning. Assessment items that are aligned to content standards also enable teachers to keep track of their students' understanding of selected ideas over time and to conduct classroom research on the effects of various instructional strategies on learning.

Large-scale assessment. Most states have adopted content standards in science and mathematics, and many have moved toward statewide testing in these subjects. But according to the National Science Foundation's Science & Engineering Indicators 2002 report, there are persistent concerns over "the degree to which state tests align with state standards." The report goes on to identify several groups, from the American Federation of Teachers to the Council of Chief State School Officers, that have issued studies in which "the problem of alignment between standards, testing, instruction, and accountability remains a common theme" (2002).

There is also unease that, given the rapidly increasing pressure to test, the demand for new items will lead to the development and use of low-quality assessment items. Concern about the quality of test items has reached the popular press, as can be seen in a July 16, 2003, New York Times article entitled "Before the Answer, the Question Must Be Correct." The article rightly suggests that "…no amount of wizardry can create a good test out of poorly written items…" (Dillon, 2003).

To be useful in a standards-based context, items in large-scale assessments that purport to assess the ideas and skills specified in content standards need to be linked explicitly to exact ideas and skills, not just to broadly defined topical areas. Test developers and test administrators, particularly those at the state and district levels, need models for items that are well aligned to the content standards targeted in state and national documents and that also conform to rigorous psychometric, linguistic, and cognitive requirements. Tests that are developed from such high-quality items can reliably inform education policy and decision making and ensure that the consequences for students, teachers, administrators, and schools are fair. The item bank we propose is not meant to be used by large-scale test developers as a major source of items. Most large-scale test developers have item security requirements that our test bank does not provide and need many more items than this project can supply. However, commercial test developers and state assessment officers can use the proposed item bank as a source of models for rigorous alignment of assessments with standards. In addition, the tools that we develop will give test developers the means to revise existing items in their own item banks so that they are more carefully aligned with targeted content standards.

Public support. Parents and other members of the public need to know what it is that children, teachers, and schools are being held accountable for with respect to the content standards of their state and local communities and what alignment to those content standards means. Clear statements of the standards themselves, as well as assessment items that measure understanding of the ideas in the standards, are essential for parents to contribute meaningfully to their children's education. Research with parents has shown that when well-informed, parents can be vital allies in education reform efforts, but, according to focus groups of parents convened by Project 2061 for its 1998 report Blueprints for Reform (AAAS, 1998), without the necessary information, parents often "hesitate to support initiatives that promote untraditional methods of learning science because they are unfamiliar with them." In opinion research conducted by Public Agenda (2000), nearly half of all parents reported that they were not aware of standards-based reform initiatives, even in their own districts. The same survey shows that when parents do become aware of these initiatives, they support them by a large majority. With easy Web-based access to content standards and assessment information, parents and other community members and organizations such as science centers, zoos, and nature museums can become a significant force in focusing the formal and informal educational experiences of children.

The Need to Clarify Content Standards

We view alignment as a precise match between content standards and assessment tasks. However, as stated earlier, the exact meaning of content standards is not always evident. Those who have responsibility for student learning and for measuring that learning must have a clear understanding of what students are expected to know and what constitutes evidence of that knowledge. The Commission on Instructionally Supportive Assessment (2001) identified nine requirements for assessments that support instruction and accountability. One of these requirements says: "A state's high priority content standards must be clearly and thoroughly described so that the knowledge and skills students need to demonstrate competence are evident." This clarification "should result in relatively brief, educator-friendly descriptions of each high priority standard's meaning" (McColskey & McMunn, 2002, p. 5).

AAAS, NCTM, and the NRC already include essays to clarify each cluster of their content standards at each grade band. These essays spell out the instructional implications of content standards by focusing on the activities that can be used to advance student understanding. These activities are based on an overall trajectory of instructional aims. For example, the AAAS essay dealing with the topic of diversity of life at the 6-8 grade band states in part: "Students should begin to extend their attention from external anatomy to internal structures and functions. Patterns of development may be brought in to further illustrate similarities and differences among organisms" (AAAS, 1993, p. 104). In mathematics, the NCTM essay for middle school algebra says in part: "Students in the middle grades should learn algebra both as a set of concepts and competencies tied to the representation of quantitative relationships and as a style of mathematical thinking for formalizing patterns, functions, and generalizations. In the middle grades, students should work more frequently with algebraic symbols than in the lower grades. It is essential that they become comfortable in relating symbolic expressions containing variables to verbal, tabular, and graphical representations of numbers and quantitative relationships" (NCTM, 2000, p. 223). Though helpful for guiding instruction, these essays do not say how students at a given grade band should be asked to demonstrate their understanding; nor are they written for each individual content standard. To ensure that assessment tasks are aligned with content standards, more needs to be done to make the meaning of each content standard clear with respect to what students can be asked to do with their knowledge.

Products and Activities

Working with teams of experienced teachers, scientists, mathematicians, and curriculum researchers and developers, Project 2061 will address the assessment needs identified above and will produce the following:

  • 20 assessment maps focusing on selected content standards to provide a context for choosing sets of items to gauge student progress and to diagnose their problems in understanding the ideas targeted in the content standards;
  • a bank of approximately 400 test items for grades 6 through 10 (including multiple choice and both short and extended open-response items) and full descriptions of each item's alignment to specific science or mathematics standards and other salient features;
  • clarifying statements for the content standards on each of the 20 assessment maps to provide insights on which ideas are, and are not, targeted in the content standard and to suggest ways in which students might demonstrate or apply the targeted ideas; and
  • an online tool for accessing the assessment items and related resources from a variety of starting points, such as a state standard, a topic, or type of assessment item.

This product development will be accomplished through the following activities to be undertaken over the course of this five-year project. Development of the item bank will be the focus of our work and involves the following efforts:

Screen and analyze assessment items. We will screen hundreds of existing middle- and early high-school science and mathematics assessment items from as many sources as possible, including released items from the TIMSS and NAEP tests and state tests. In the initial screening, items will be sorted by the content standards and the related ideas we will be targeting. (How we will define the domain of ideas around which items are to be selected is described in the section on assessment maps below.) Following the initial screening and sorting, items will undergo a more rigorous analysis to describe precisely their alignment to the ideas being targeted and to make sure that they meet specific effectiveness criteria. This analysis will be based on an examination of the items themselves and score reports of student performance on the items from states, NAEP, TIMSS, and similar instruments. The analysis procedure that we will use is modeled after a procedure previously developed by AAAS (2003) and involves the following considerations: (1) Are the ideas and skills specified in the targeted content standard needed to successfully complete the assessment item or can the item be answered without that knowledge and skill? (2) Are the ideas and skills specified in the content standard enough by themselves to successfully complete the assessment item or is other knowledge and skill needed? (3) Are students likely to understand the task statement, diagrams, symbols, etc.? (4) Are students likely to understand what they are expected to do and what sort of response is considered satisfactory? (5) Is the task context appropriately familiar, engaging, and realistic to students? (6) Could students respond satisfactorily to the task by guessing or employing other general test-taking strategies? (7) Are scoring rubrics for open-ended items accurate, clear, complete, and specific?
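To show how the seven questions might be recorded consistently across reviewers, the following Python sketch stores a yes/no answer for each question. This is a simplification under stated assumptions: the AAAS (2003) procedure yields written analyses rather than booleans, and the class and field names are hypothetical. Question 7 is applied only to open-ended items, and strict alignment is taken to mean that the targeted knowledge is both necessary (question 1) and sufficient (question 2).

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Paraphrases of the seven analysis questions; an answer of True always
# means "no problem found" for that question.
CRITERIA = (
    "targeted knowledge is necessary",                        # question 1
    "targeted knowledge is sufficient",                       # question 2
    "task statement, diagrams, and symbols are clear",        # question 3
    "expected response is clear",                             # question 4
    "context is familiar, engaging, and realistic",           # question 5
    "cannot be answered by guessing or test-wiseness",        # question 6
    "scoring rubric is accurate, clear, complete, specific",  # question 7
)


@dataclass
class ItemReview:
    """Hypothetical record of one review panel's judgments about one item."""
    item_id: str
    answers: Dict[str, bool] = field(default_factory=dict)
    open_ended: bool = False

    def problems(self) -> List[str]:
        """Criteria answered 'no' or left unanswered (treated as unresolved)."""
        # Question 7 (scoring rubrics) applies only to open-ended items.
        applicable = CRITERIA if self.open_ended else CRITERIA[:-1]
        return [c for c in applicable if not self.answers.get(c, False)]

    def aligned(self) -> bool:
        """Strict alignment: targeted knowledge is necessary AND sufficient."""
        return all(self.answers.get(c, False) for c in CRITERIA[:2])


review = ItemReview("item-042", {c: True for c in CRITERIA[:6]})
review.answers[CRITERIA[5]] = False  # reviewers judged the item guessable
print(review.aligned())
print(review.problems())
```

Here the item is aligned in the strict sense but is still flagged for revision because students could answer it by guessing.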

In addition, to fully address equity concerns and to ensure that items are accessible to the greatest number of students, we will also review items for various linguistic features. Items will be analyzed on the basis of linguistic criteria that support student access to assessment items, especially for English language learners. Using items that meet these criteria will increase the validity of interpretations that can be made about student understanding for a wider range of students. Kopriva (2000) lists the following as important considerations for making assessment items accessible: (1) Item sentences or stems must be kept brief and straightforward, with a simple sentence or phrase structure. (2) Consistency in paragraph structures should be employed. (3) The present tense and active voice should be used as much as possible. Concerning the use of visuals in test items: (1) Visuals should mirror, or parallel, the item statements and expectations. (2) No supplementary or unnecessary information should be placed in the visual to distract students from the requirements in the item. (3) Simple text that corresponds to important words in the item can and should be used in the visuals. Although these recommendations were written in the context of English language learners, the principles apply to all students. The point is that to draw valid inferences regarding a construct being measured, linguistic issues must be taken into account. The question to ask is: Are there any features of an item that may limit access for any particular group of students?
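A small part of this linguistic review could be supported by automated screening. The Python sketch below flags long sentences and likely passive constructions in an item stem. The 20-word limit and the passive-voice pattern are hypothetical heuristics, not criteria taken from Kopriva (2000), and such a check could only direct reviewers' attention, never replace their judgment.

```python
import re
from typing import List

# Hypothetical threshold: Kopriva (2000) asks for brief stems, but the
# 20-word limit is an illustrative assumption.
MAX_WORDS_PER_SENTENCE = 20

# Crude passive-voice cue: a form of "to be" followed by a word ending in
# -ed or -en. It misses irregular participles and flags some adjectives.
PASSIVE = re.compile(r"\b(?:is|are|was|were|be|been|being)\s+\w+(?:ed|en)\b",
                     re.IGNORECASE)


def linguistic_flags(stem: str) -> List[str]:
    """Return warnings about an item stem for a human reviewer to check."""
    flags = []
    # Split after sentence-ending punctuation that is followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.?!])\s+", stem.strip()) if s]
    for sentence in sentences:
        words = len(sentence.split())
        if words > MAX_WORDS_PER_SENTENCE:
            flags.append(f"long sentence ({words} words): {sentence}")
        if PASSIVE.search(sentence):
            flags.append(f"possible passive voice: {sentence}")
    return flags


stem = "A ball was pushed across the floor by a student. What happens to its speed?"
for warning in linguistic_flags(stem):
    print(warning)
```

In this example only the first sentence is flagged; rewriting it in the active voice ("A student pushes a ball across the floor.") would clear the warning.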

We will also examine items for the cognitive demands that they place on students. We will draw in part from work currently being done by Baker et al. (2002) at the Center for Research on Evaluation, Standards, and Student Testing (CRESST) at UCLA and from the classification of knowledge and process categories of Anderson and Krathwohl (2001), adapted from Bloom (1956). We will also make use of the cognitive demand designations used in the alignment models of Achieve (n.d.), Webb (1999), the Council on Basic Education (n.d.), and the CCSSO Survey of the Enacted Curriculum (n.d.) mentioned earlier in this proposal. Finally, psychometric consultants from the Department of Measurement, Statistics, and Evaluation at the University of Maryland will review items for their psychometric features and for the impact that including them might have on whole-test construction.

Revise items. Drawing on the written reports from the item analysis described above, we will revise items to correct deficiencies that hinder alignment. Depending on those analyses, an item's context might be changed, its language clarified or simplified, or its distracters replaced. In addition to addressing issues raised by Kopriva's work, the revision process will make use of work being done by Jim Minstrell (1992) on facets of knowledge. Facets are bits of knowledge or strategies for reasoning (both correct and incorrect) that students use when faced with problem situations. Facets can be very specific or quite general. Examples include: "Active objects [like hands] exert forces." "Passive objects [like tables] cannot exert forces." "Heavier objects fall faster." Some facets are generic and cut across subject areas: "More of one thing means more of another thing." We will draw on Minstrell's work (1982a, 1982b, 1984, 1989, 1992, 2001), as well as other available research in this area, to build what is known about student thinking into distracters and to redesign test items so that they probe student thinking.
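To illustrate how distracters written around facets can yield diagnostic information, the Python sketch below tags each option of a hypothetical multiple-choice item with the facet it is meant to reveal and then summarizes a class's responses. Only the facet "Heavier objects fall faster." comes from the examples above; the item, option letters, and function name are invented for illustration.

```python
from collections import Counter
from typing import Dict, List, Optional

# Hypothetical item: "A student drops two solid balls, one heavy and one
# light, from the same height at the same time. Ignoring air resistance,
# which ball lands first?"
# Each option maps to the facet it is written to reveal (None = no facet).
OPTION_FACETS: Dict[str, Optional[str]] = {
    "A": None,                            # both land together (correct)
    "B": "Heavier objects fall faster.",  # the heavy ball lands first
    "C": None,                            # the light ball (not yet linked)
}
CORRECT = "A"


def diagnose(responses: List[str]) -> dict:
    """Summarize one class's answers by proportion correct and by facet."""
    if not responses:
        raise ValueError("no responses to summarize")
    counts = Counter(responses)
    facets: Counter = Counter()
    for option, n in counts.items():
        facet = OPTION_FACETS.get(option)
        if facet is not None:
            facets[facet] += n
    return {
        "proportion_correct": counts[CORRECT] / len(responses),
        "facets": dict(facets),
    }


print(diagnose(["A", "B", "B", "A", "C", "B"]))
```

A teacher reading this summary learns not only that two-thirds of the class missed the item but also that half the class chose the answer associated with a specific, researched facet.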

For some content standards, a significant body of research already exists on the preconceptions that students often hold (see, for example, AAAS, 1993; Driver et al., 1994). For content standards where the research on student learning is more limited, we will administer open-ended tasks related to the content standards to a representative sample of students from schools serving diverse populations. These open-ended tasks will be used to gain additional insights into student thinking and to draw attention to productive areas for further research on student thinking. As part of this work, we will interview a subsample of students about their responses, using a modified version of a procedure used by Driver et al. (1994). Interviews will include, for example, the introduction of a discrepant event to challenge the students' explanations and further probe the rigidity of their knowledge frameworks. Results of these interviews will help us design distracters that probe student understanding more effectively.

Reanalyze items. Following the initial analysis and revision, the revised items will be field tested in a wide sampling of school districts around the country. Teachers with whom we have worked over the years will provide us with access to students from a wide range of backgrounds. Items will then be reanalyzed using the student data and the analysis procedures described above. Following this reanalysis, reviewers will recommend that each item be accepted, modified further, or eliminated from the item pool.
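The proposal does not specify which statistics will be used in the reanalysis. As one plausible example, the Python sketch below computes two classical item statistics from field-test data: difficulty (the proportion of students answering correctly) and an upper-lower discrimination index (the proportion correct among the highest-scoring students minus the proportion correct among the lowest-scoring students, using the common 27 percent groups). The decision thresholds in the hypothetical `recommend` function stand in for the reviewers' judgment and are not part of the proposed procedure.

```python
from typing import List, Tuple


def item_statistics(item_scores: List[int], total_scores: List[float],
                    group_fraction: float = 0.27) -> Tuple[float, float]:
    """Classical difficulty and upper-lower discrimination for one 0/1 item."""
    n = len(item_scores)
    if n == 0 or n != len(total_scores):
        raise ValueError("need one item score and one total score per student")
    k = max(1, round(n * group_fraction))    # size of each comparison group
    order = sorted(range(n), key=lambda i: total_scores[i])
    lower, upper = order[:k], order[-k:]     # lowest and highest total scores
    difficulty = sum(item_scores) / n        # proportion answering correctly
    discrimination = (sum(item_scores[i] for i in upper)
                      - sum(item_scores[i] for i in lower)) / k
    return difficulty, discrimination


def recommend(difficulty: float, discrimination: float) -> str:
    """Hypothetical rule of thumb; real decisions rest with the reviewers."""
    if discrimination < 0:                   # stronger students do worse
        return "eliminate"
    if discrimination < 0.2 or not 0.2 <= difficulty <= 0.9:
        return "modify"
    return "accept"


item = [0, 0, 0, 1, 1, 1, 1, 1, 1, 1]
totals = [3, 5, 6, 8, 9, 11, 12, 14, 15, 17]
p, d = item_statistics(item, totals)
print(p, d, recommend(p, d))
```

An item that students with higher overall scores tend to miss (negative discrimination) is the clearest candidate for elimination, since it may be measuring something other than the targeted idea.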

Describe item features. Each retained item will be accompanied by descriptive information concerning the details of its alignment with content standards, the knowledge needed to answer the item correctly, whether the item tests for common misconceptions, and whether diverse learners are likely to approach the item differently, taking into account the item's use of visuals, its linguistic demands, and similar features.

The description will contain a statement about whether the assessment item measures declarative knowledge (concepts), procedural knowledge (skills), or contextual knowledge (applications). These categories are similar to the categories of conceptual knowledge, scientific investigation, and practical reasoning used in the Science Framework for the 1996 and 2000 National Assessment of Educational Progress (U.S. Department of Education, 1999). For items in which students are asked to apply their knowledge, a further description of the type of application will be included as well. For example, items may ask students to rephrase an idea in their own words, explain a phenomenon, identify a generalization based on relevant instances, etc. In mathematics, items will be categorized according to the levels of complexity described in the 2004 Mathematics Framework for the National Assessment of Educational Progress (U.S. Department of Education, 2001). These categorizations of items will allow them to be classified, and thus retrieved, by item type.
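Because each retained item will carry these descriptors, retrieval by item type reduces to filtering on them. The Python sketch below is a minimal illustration, assuming one knowledge category per item and an optional free-text application type; the field names and sample items are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Iterable, List, Optional


class Knowledge(Enum):
    DECLARATIVE = "concepts"
    PROCEDURAL = "skills"
    CONTEXTUAL = "applications"


@dataclass(frozen=True)
class Item:
    item_id: str
    subject: str                       # "science" or "mathematics"
    knowledge: Knowledge
    application: Optional[str] = None  # e.g. "explain a phenomenon"


def retrieve(bank: Iterable[Item], knowledge: Optional[Knowledge] = None,
             application: Optional[str] = None) -> List[Item]:
    """Return the items that match every descriptor that was supplied."""
    return [
        item for item in bank
        if (knowledge is None or item.knowledge is knowledge)
        and (application is None or item.application == application)
    ]


bank = [
    Item("sci-001", "science", Knowledge.DECLARATIVE),
    Item("sci-002", "science", Knowledge.CONTEXTUAL, "explain a phenomenon"),
    Item("sci-003", "science", Knowledge.CONTEXTUAL, "rephrase an idea"),
]
print([i.item_id for i in retrieve(bank, Knowledge.CONTEXTUAL)])
```

Omitted criteria act as wildcards, so the same function serves a broad query (all application items) and a narrow one (only items asking students to explain a phenomenon).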

To guide our selection, screening, and revision of each assessment item, we will frame our work through the following activities. The resulting products will be available to those who want to create or revise items themselves and to those who want to construct assessment scales based on the conceptual framework provided in the assessment maps.

Create assessment maps and link items to maps. We will create an assessment map for each of 20 middle- and early high-school science and mathematics content standards selected from national standards documents. (See Appendix A for an example of an assessment map.) Assessment maps reflect the interconnectedness of ideas by showing a progression of learning from prerequisite ideas to targeted ideas to more sophisticated ideas. Each map will be built around one or more content standards. The maps will include the ideas from the content standard itself, prerequisite ideas, one or more related ideas that come later in the developmental trajectory, and common misconceptions that have been confirmed through research on student learning. The maps will allow test developers to choose assessment items that can yield diagnostic information about student learning, especially with respect to misconceptions and prerequisite knowledge that pertain to specific ideas on the maps.

Maps are also a practical device to provide test developers with a convenient visual boundary around the ideas they might want to test at any particular time. The maps are not, however, a template for test construction. They simply present in a convenient format the targeted ideas and related ideas that could be tested. A test might be constructed around all of the ideas or just one and might take into account some of the prerequisite ideas and misconceptions or only a subset. Nor are the maps restrictive. Single maps can be combined with other maps to focus test design on a larger set of ideas at the same time.
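An assessment map can be treated as a small directed graph in which each idea lists the ideas it builds on. The Python sketch below, using invented idea labels loosely modeled on the Newton's First Law topic, shows how the prerequisites of a targeted idea could be collected and how two maps could be merged into one larger set of ideas. The data layout and function names are assumptions for illustration, not the format of Project 2061's maps.

```python
from collections import deque
from typing import Dict, List, Set

# Hypothetical map: each idea lists the ideas it builds on. The labels are
# invented placeholders, not ideas from an actual Project 2061 map.
MAP_A: Dict[str, List[str]] = {
    "forces are pushes or pulls": [],
    "motion is described by speed and direction": [],
    "an unbalanced force changes speed or direction": [
        "forces are pushes or pulls",
        "motion is described by speed and direction",
    ],
    "net force determines the rate of change of motion": [
        "an unbalanced force changes speed or direction",
    ],
}


def prerequisites(amap: Dict[str, List[str]], idea: str) -> Set[str]:
    """Every idea the given idea builds on, directly or indirectly (BFS)."""
    seen: Set[str] = set()
    queue = deque(amap.get(idea, []))
    while queue:
        current = queue.popleft()
        if current not in seen:
            seen.add(current)
            queue.extend(amap.get(current, []))
    return seen


def combine(*maps: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Merge maps; an idea that appears in several maps keeps all its links."""
    merged: Dict[str, List[str]] = {}
    for amap in maps:
        for idea, builds_on in amap.items():
            links = merged.setdefault(idea, [])
            for other in builds_on:
                if other not in links:
                    links.append(other)
    return merged


MAP_B = {"forces are pushes or pulls": ["objects can push or pull on each other"]}
combined = combine(MAP_A, MAP_B)
print(sorted(prerequisites(combined, "an unbalanced force changes speed or direction")))
```

A test developer could select items for the targeted idea and for any subset of the returned prerequisites, which mirrors the point above that the maps bound the choice of ideas without dictating it.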

We will provide at least 20 assessment items for each of the 20 assessment maps, with at least one item for each idea represented on a map, for a total of at least 400 items. The items will range from low to high cognitive demand and will include both multiple-choice and free-response formats. Some items will rely heavily on visual representations to describe the problem; others will rely on verbal descriptions. Having this range of items is particularly important when testing students with differing capabilities and learning preferences (Kopriva, 2000).

We have already developed 10 maps. In science the maps deal with Control of Variables, Changes in the Earth's Surface, Flow of Matter and Energy in Living Systems, Newton's First Law, Kinetic Molecular Theory, Conservation of Matter, and Light and Sight. In mathematics the maps deal with ideas in Number, Algebra, and Data. The new maps that we will develop for the proposed project are in addition to the 10 existing maps.

The maps will be interactive, so users will be able to click on specific ideas on a map to access the related items in the test bank as well as the comparable standards from their own state. In fact, all resources will be accessible from the assessment maps, as described in the section on integration of products below.

Create content standard clarification statements. We will write clarifying statements for each content standard identified on the 20 assessment maps and include them in a Web-based utility. These clarifying statements will focus on what each content standard does and does not suggest regarding what students should be able to do with their knowledge and skills. As stated above, the existing essays that accompany AAAS's Benchmarks and the NRC's National Science Education Standards were written primarily to provide guidance on the kinds of learning activities in which students should be engaged. The statements that we will write will help users see in more detail how assessment items are related to the ideas in the content standards. The statements will describe what knowledge is and is not included in the content standard, the ways that students might demonstrate the knowledge in the content standard, task contexts that are appropriate and engaging for students at that age, and the range of cognitive skills that students might reasonably be expected to use to demonstrate their understanding of the idea. The clarification statements will be linked to, and therefore accessible from, the content standards on the 20 assessment maps.

With finalized items and other assessment resources in hand, we will then focus on making the items and resources easily accessible through the following activities:

Integrate assessment items, maps, and accompanying information and link to state and national content standards. We will link assessment items, maps, and accompanying information directly to national benchmarks and standards and indirectly to state standards through the McREL/Align-to-Achieve Academic Standards e-Library. This e-library is a database of state and national content standards that have been explicitly linked together based on grade level and the ideas targeted in the standards. The e-library also contains its own synthesis of these standards, called the Compendix. The Compendix lists a total of 154 middle-school benchmarks in science and mathematics plus additional benchmarks that are appropriate for early high-school students. The Compendix benchmarks closely overlap with the content standards produced by AAAS, NCTM, and the NRC.
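
The indirect linkage described above amounts to a two-step lookup: a state standard is first resolved to the national standards the e-library links it to, and those national standards are then resolved to the items the project has linked to them directly. The sketch below is a minimal illustration, assuming the cross-links can be represented as simple mappings; every identifier, table, and function name in it is hypothetical and does not reflect the actual schema of the Align-to-Achieve database.

```python
from __future__ import annotations

# Hypothetical stand-in for the e-library cross-links:
# state standard ID -> national standard IDs.
STATE_TO_NATIONAL: dict[str, set[str]] = {
    "STATE-A-SCI-7.2": {"NAT-4D-1"},
    "STATE-B-SCI-8.1": {"NAT-4D-1", "NAT-4E-2"},
}

# Hypothetical stand-in for the project's direct links:
# national standard ID -> item IDs.
NATIONAL_TO_ITEMS: dict[str, list[str]] = {
    "NAT-4D-1": ["item-101", "item-102"],
    "NAT-4E-2": ["item-201"],
}


def items_for_state_standard(state_id: str) -> list[str]:
    """Return item IDs reachable from a state standard, without duplicates."""
    items: list[str] = []
    for national_id in sorted(STATE_TO_NATIONAL.get(state_id, set())):
        for item_id in NATIONAL_TO_ITEMS.get(national_id, []):
            if item_id not in items:
                items.append(item_id)
    return items


print(items_for_state_standard("STATE-B-SCI-8.1"))
# ['item-101', 'item-102', 'item-201']
```

One consequence of this design, under the stated assumptions, is that items never need to be relinked when a state revises its standards; only the first mapping changes.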

Although we are purchasing the Align-to-Achieve e-library database and the software that links the state and national standards, we will create our own customized user interfaces and functions. This will allow us to seamlessly integrate the assessment maps, clarification statements, prerequisite ideas, common student misconceptions, items, and item descriptions with the database of cross-linked content standards. Eventually we will be able to provide access to additional resources (visual representations of scientific concepts, examples of scientific phenomena, and question sequences) that can help students learn and teachers teach the ideas targeted in the selected content standards.

The online utility, hosted on the Project 2061 Web site, will provide access from any set of standards, whether at the state, local, or national level. The utility will offer users free access to all of the resources developed. Users will be able to find resources using topic and keyword searches or by browsing the assessment maps or the section headings from the various standards documents.

To be sure that the utility is easy to use and meets the needs of its potential users, we will conduct interviews with teachers and administrators, parents, materials developers, curriculum researchers, and state assessment and curriculum directors and will incorporate their ideas into the design of the utility interfaces. A prototype utility will be tested with users to be sure that it functions according to design specifications and that it can accomplish the intended tasks (e.g., easily accessing assessment items from any K-12 standards document). It will then be revised based on the feedback we receive.

Once the database is created, it will be a permanent feature of the Project 2061 Web site and will be updated regularly as we develop additional assessment maps and continue to add high-quality assessment items to the item bank. Over time it will become part of our growing collection of Web-based resources. The site license purchased from Align-to-Achieve provides free updates as states modify their standards. The Web site will also be linked to the National Science Digital Library (NSDL).

Dissemination. Disseminating the assessment items and tools that we create is essential to the success of the project. Dissemination efforts will target researchers in science education (including faculty, doctoral students, and postdoctoral fellows) through the NSF Centers for Learning and Teaching; curriculum developers; state directors of curriculum and assessment; teacher educators; and classroom teachers. Centers that will be particularly interested in these tools for curriculum research purposes include the Center for Curriculum Materials in Science and the new mathematics curriculum materials center. We will communicate information about the resources that we develop through the Project 2061 newsletter and Web site, through communication outlets of organizations such as NCTM and the National Science Teachers Association (NSTA), and through Web-based links with other organizations. We will present papers at meetings of professional organizations such as the National Association for Research in Science Teaching, the American Educational Research Association, the Association for Supervision and Curriculum Development, NSTA, NCTM, and other relevant organizations. We will submit articles to refereed journals such as the Journal for Research in Mathematics Education and the Journal of Research in Science Teaching, and to journals that reach a broader audience, such as Mathematics Teaching in the Middle School, Mathematics Teacher, The Science Teacher, Educational Leadership, and The Kappan. We will use the print and Web-based distribution outlets of our NSF-funded public outreach campaign (ESI-0103678) to inform parents, informal science organizations, and other community members about the assessment bank and related tools. We will also disseminate information about our work through the popular press.

Advisory Board

An Advisory Board will meet in years two and four to review the project's activities and products and to provide feedback and counsel. The following individuals have agreed to serve: Theron Blakeslee, Director of the Math and Science Center of Jackson County, Michigan; Rolf K. Blank, Director of Education Indicators, Council of Chief State School Officers; Danine Ezell, Science Specialist, San Diego Public Schools; Fred Goldberg, Professor of Physics, Center for Research in Mathematics and Science Education, San Diego State University and developer of the Constructing Ideas in Physical Science curriculum; Marshall Gordon, mathematics teacher at the Park School in Baltimore, Maryland; Mary Lindquist, Callaway Professor of Mathematics Education, Emeritus, Columbus State University; Virginia Malone, vice president for evaluation at Harcourt Brace; Marge Petit, Senior Associate, National Center for the Improvement of Educational Assessment (Center for Assessment); Barbara Reys, Professor of Mathematics Education and Director of the Show-Me Center, University of Missouri; Norman Webb, Senior Research Scientist, Wisconsin Center for Education Research, University of Wisconsin; and David E. Wiley, emeritus professor, School of Education and Social Policy, Northwestern University and Research Faculty, Center for the Study of Assessment Validity and Evaluation, University of Maryland.

Work Plan

The activities described above will be distributed over the five years of the grant. Each year the review teams will create four assessment maps; clarify the relevant content standards; and screen, review, revise, and pilot-test assessment items. In Years One and Two, the technology team will develop the basic architecture for the Web-based item bank and related tools. They will also conduct focus groups with typical users to define needs and will revise the design based on that feedback. As items are screened and added to the item bank over the course of the five years, the technology team will create links between items, maps, and state and national content standards and build in functions such as browsing, searching, and sorting. Dissemination activities, including presentations at relevant meetings and conferences and submission of papers to journals, will take place throughout the life of the grant, culminating in a final rollout when the item bank is completed in Year Five.

Results of Prior NSF Support

With support from NSF, Project 2061 has developed an array of science literacy tools to promote understanding and use of content standards, beginning with the publication of Benchmarks for Science Literacy (AAAS, 1993) (ESI-9350003; $5,000,000; 10/93-9/99). To increase understanding of conceptual connections among K-12 learning goals, Project 2061 published Atlas of Science Literacy (AAAS, 2001) (ESI-9618093; $4,746,014; 4/97-3/01), which has sold nearly 15,000 copies and is serving as the basis for several recently submitted NSDL and Math and Science Partnership proposals. To improve the quality of science and mathematics curriculum materials, Project 2061 developed a set of criteria to analyze their alignment to important learning goals and the quality of instructional support they provide for those goals (ESI-9553594; $888,466; 3/96-2/97 and ESI-9618093). These criteria have been used to analyze science and mathematics instructional materials and are being used to guide the design of new materials.

In the area of assessment, Project 2061 is conducting a study of the alignment of assessment items to national and state standards and benchmarks for science and mathematics (ESI-9919018; $2,476,875; 5/99-1/04). The goals of the project are to (1) develop a set of criteria and a procedure for evaluating assessment quality and alignment and (2) demonstrate the use of the assessment analysis procedure in typical situations by conducting a series of case studies. To date, Project 2061 has analyzed nearly 500 items, including items from two large state pools and from the NAEP and TIMSS tests, along with items developed for our own research projects. We have also created guidelines for revising items based on the results of our analysis and for validating the revisions through student interviews. Project 2061 staff and consultants have conducted case studies documenting the application of the analysis procedures and the revision efforts, and presentations on our work have been made at conferences and meetings sponsored by organizations such as NSTA, NCTM, the School Science and Mathematics Association, and Research for Better Schools. Publications include "Accountability and Assessments" by Leah Bricker in Research for Better Schools' Currents, Volume 6.1, Fall/Winter 2002; "Aligning Assessment with Learning Goals" by Natalie Nielsen in ENC Focus, 2000, Volume 7, Number 2; "Putting Tests to the Test" in the Spring/Summer 2001 issue of 2061 Today; "A Revision Protocol Design: Item Revision and Impact Analysis Report," a report prepared by Robert Capraro, Mary Margaret Capraro, and Mary Hammer of Texas A&M University and Kay Dighans, a Montana teacher; and "Lessons Learned from Students about Assessment and Instruction" by Richard Kitchen and Linda Wilson, to be published in NCTM's Teaching Children Mathematics.

Evaluation

Horizon Research, Inc. (HRI) will conduct the external evaluation for the project. HRI has over 15 years of experience evaluating mathematics and science education improvement projects, including relatively small and narrowly focused teacher enhancement projects, a number of Statewide Systemic Initiatives, and materials development projects. In addition, HRI has expertise in digital library technologies and recently evaluated the development of one of the digital libraries under the National Science Digital Library umbrella.

Evaluation resources will be divided between formative and summative components. The formative component, designed to inform mid-course corrections in the project, will focus on two fundamental processes: (1) analysis and revision of assessment items, and (2) development of the online utility. The summative component, designed to gauge the impact of resources created by the project, will be guided by four questions: (1) What is the quality of the resources, including the assessment items, descriptions of item features, assessment maps, and clarification statements? (2) How effectively are the resources disseminated? (3) How are the resources used? (4) What impact do the resources have when they are used?

The collection of assessment items is the cornerstone of the project. These items will be only as good as the processes used to collect, analyze, and revise them. HRI will observe a sample of the earliest analysis sessions and will conduct focus group interviews with participants. Feedback to the project will focus on maximizing the efficiency of the iterative analysis and revision process.

A second critical feature of the project is the online utility; even the best resources are of little value if potential users see them as inaccessible. HRI will conduct a think-aloud protocol with a sample of individual potential users as they interact with the utility prototype. HRI will also arrange for a review of the prototype utility by an expert in designing user interfaces for digital libraries. Both activities will provide information the project can use to maximize the usability of the online utility.

The summative component will focus on the quality and impact of the resources. HRI will arrange for review of the assessment items, assessment maps, and clarification statements by content experts who are external to the project. Logs of search and browse activity on the Web site will be analyzed as one means of determining how users interact with the resources. These logs will shed light on the paths that users most frequently take through the site; e.g., do users simply "grab items and go," or do they also access the assessment maps and clarification statements? The logs can also be used to compare the resources that users access most frequently with the full range of what is available in the collection, which may inform future collection and development efforts.
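
The "grab items and go" question can be answered by grouping log records by session and checking which kinds of pages each session touched. The sketch below is a minimal illustration, assuming the logs can be reduced to (session ID, page type) pairs; the page-type labels, the sample records, and the function name are hypothetical and do not describe HRI's actual analysis procedure.

```python
from collections import Counter

# Hypothetical, simplified log records: (session ID, page type).
LOG = [
    ("s1", "search"), ("s1", "item"),
    ("s2", "map"), ("s2", "clarification"), ("s2", "item"),
    ("s3", "item"),
]

# Page types that show a user looked beyond the items themselves.
CONTEXT_PAGES = {"map", "clarification"}


def classify_sessions(log):
    """Label each session by whether it viewed items, context pages, or both."""
    pages_by_session = {}
    for session_id, page_type in log:
        pages_by_session.setdefault(session_id, set()).add(page_type)

    labels = {}
    for session_id, pages in pages_by_session.items():
        if "item" in pages and not (pages & CONTEXT_PAGES):
            labels[session_id] = "grab items and go"
        elif "item" in pages:
            labels[session_id] = "items with maps/clarifications"
        else:
            labels[session_id] = "no items viewed"
    return labels


print(Counter(classify_sessions(LOG).values()))
```

Counting labels across many sessions gives the proportion of users who follow each path; comparing per-item view counts with the full item list would address the second use of the logs.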

The most valuable source of information about impacts will be the users themselves. HRI will conduct in-depth interviews with a sample of users focused on: (1) resources they are seeking when they come to the site; (2) components of the online utility they access; (3) how they actually use the resources they access; and (4) how the resources impact their work.

Effectiveness of the project's dissemination efforts will be gauged in two ways. First, HRI will survey a sample of members of the different target audiences regarding their awareness of the resources. Second, a sample of individuals using the online utility will be surveyed to gather demographic information.

HRI will report evaluation findings informally to the project staff through regular phone and e-mail contact. In addition, HRI will prepare one formative memo and one evaluation report each year detailing all evaluation activities and findings.

Key Personnel

George E. DeBoer is deputy director of Project 2061 and will serve as PI on the project. He holds a Ph.D. in science education from Northwestern University and joined Project 2061 from the Division of Elementary, Secondary, and Informal Education of the National Science Foundation. He is associate director and co-PI for the Center for Curriculum Materials in Science, and co-PI on Project 2061's IERI mathematics project and the Project 2061 assessment project. He has been a professor of education at Colgate University since 1974, where he taught courses in the teaching of science and mathematics and in applied research methodology in the social sciences. At Colgate, Dr. DeBoer held a number of administrative positions, including chair of the Department of Education, acting director of the Division of Social Sciences, and director of the Master of Arts in Teaching Program. His primary research interests lie in clarifying the goals of the science curriculum, analyzing the history of science education, and analyzing the many meanings of scientific literacy. He has written extensively on these topics.

Jo Ellen Roseman is director of Project 2061 and will serve as co-PI on the project, helping to coordinate all efforts at AAAS/Project 2061. Dr. Roseman is also director and PI for the Center for Curriculum Materials in Science and PI for Project 2061's IERI mathematics study, which is examining the relationship between teaching and learning, on the one hand, and the characteristics of curriculum materials and of the professional development that can improve them, on the other. She served as curriculum director for Project 2061 from 1989 through 2001. In that capacity she was involved in the design, testing, and dissemination of Project 2061's science literacy reform tools. She participated in the development of Benchmarks for Science Literacy, which describes specific K-12 content standards on the way to science literacy, and directed the development of Resources for Science Literacy, which helps educators focus curriculum, instruction, and assessment, as well as their own professional development, on science literacy. She holds a Ph.D. in biochemistry from Johns Hopkins University.

Linda Wilson is an assessment expert in mathematics education who has been the primary consultant for Project 2061's IERI middle-school mathematics project and for developing assessment maps and goals-based assessments. She will be the primary consultant for the mathematics portion of this project. She has a Ph.D. in mathematics education from the University of Wisconsin. She was on the faculty of the College of Education at the University of Delaware, where she taught mathematics education courses. She helped write the Assessment Standards for School Mathematics, published by NCTM. She worked at the U.S. Department of Education on the Voluntary National Test in Mathematics, and she headed the committee that wrote the framework for the 2004 NAEP test in mathematics. Her research has included teachers' classroom assessment practices, analyses of student work on test items, the development of tests that measure specific learning goals in mathematics, and increasing the validity of mathematics test items for English language learners.

Jim Minstrell will be the primary consultant for the science portion of the project and will advise the project on issues regarding revision of test items using the facets of student knowledge approach. He has been a PI on several teaching and learning grants. Through his classroom experience and interest in the cognition of learners, he has focused on the development of assessment, curriculum, and teaching systems with a two-part goal in mind: to identify problematic conceptions and reasoning in learners, and to adapt instruction accordingly. His approach aims to build on strengths in students' thinking while specifically challenging problematic ideas and procedures. Minstrell serves as an advisor to several institutions, has delivered many presentations and workshops on learning and teaching nationally and internationally, and has received numerous awards and honors for his research and teaching. Dr. Minstrell holds a Ph.D. in science education from the University of Washington.

Rebecca J. Kopriva is director of the Center for the Study of Assessment Validity and Evaluation (C-SAVE), which is housed in the Department of Measurement, Statistics, and Evaluation at the University of Maryland. Through workshop training sessions and ongoing consultation, she will advise project staff on the psychometric properties of items and on the accessibility of items to English language learners. Formerly she was an associate professor in the California State University System, a state testing director, and a consultant for test publishers, the U.S. Department of Education, national legal and policy groups, and a variety of states and districts. Dr. Kopriva is a researcher who publishes and presents regularly on the theory and practice of improving large-scale test validity and comparability. She is a leader in addressing these topics as they relate to the measurement of academic knowledge and skills in racial, cultural, and ethnic minority students and students with disabilities.

Joan D. Pasley, senior research associate at HRI, will be responsible for data collection related to science assessment and will coordinate all external evaluation activities. Dr. Pasley received a Ph.D. in curriculum and instruction from the University of North Carolina at Chapel Hill. Dr. Pasley has been working with HRI since 1994 on a number of research and evaluation projects, including the evaluation of the Ohio, South Carolina, and New Jersey Statewide Systemic Initiatives. Dr. Pasley currently coordinates the standardized evaluation system for NSF's Local Systemic Change through Teacher Enhancement project and directs the evaluation of e-Mentoring for Student Success, an online mentoring program for beginning science and mathematics teachers. In addition, Dr. Pasley manages the Increasing the Availability of Materials for the Professional Development of Science and Mathematics Teachers project.

Daniel J. Heck, senior research associate at HRI, will be responsible for data collection related to mathematics assessment. Mr. Heck received a Bachelor's Degree in Mathematics and History and a Master's Degree in Education from Wake Forest University. He is completing his Ph.D. in educational psychology at the University of Illinois at Urbana-Champaign, with a specialization in quantitative and evaluative research methodologies. Mr. Heck directed the study of the Impact of the Statewide Systemic Initiatives project, a research study of the National Science Foundation-funded initiatives in 25 states and the Commonwealth of Puerto Rico. Mr. Heck currently directs the evaluation of the Indiana Mathematics Initiative and the Center for Curriculum Materials in Science. He also leads HRI's longitudinal studies of the core evaluation of the Local Systemic Change project.

Iris R. Weiss, president of HRI, will provide consultation to the evaluation team and will review all data collection instruments and evaluation reports. Dr. Weiss received a Bachelor's Degree in biology from Cornell University, a Master's Degree in science education from Harvard University, and a Ph.D. in curriculum and instruction from the University of North Carolina at Chapel Hill. Dr. Weiss has directed many of HRI's research, development, and evaluation projects since the company's initiation in 1987 and continues to be responsible for quality control of all HRI projects.