In this chapter, we explore how student assessment in American schools might be redesigned to match the goals of science education reform that are exemplified in efforts such as Project 2061's Benchmarks for Science Literacy (American Association for the Advancement of Science [AAAS], 1993) and the National Science Education Standards (Standards) (National Research Council [NRC], 1996). The chapter is divided into three sections in which we describe current assessment practice in the United States, suggest changes that are needed in order for science reform goals to be implemented, and explore possibilities and recommendations for closing the gap between current practice and a more ideal vision of science assessment.
In reading this chapter, it is important to keep in mind two often overlooked facts. First, although many people immediately associate the term "assessment" with standardized, group-administered, multiple-choice tests, assessment actually includes an enormous range of procedures used to gather information about what a student knows, believes, or can do. Student-focused assessment procedures (for example, judging students by a compiled portfolio of work on a range of performances rather than by an average of individual test scores) are currently being developed and used effectively by many educators, meaning that the rich variety of assessment procedures is likely to increase.
Second, assessments are not used only to assign grades to students or rankings to schools. Assessments have several goals, which are classified as either "internal purposes" or "external purposes." Internal purposes for assessment include:
The decentralized system of schooling in the United States ensures great variation in the way assessment takes place from school to school, district to district, and state to state. On the whole, however, the assessment techniques that would be ideally compatible with the goals of science education reform are not widely practiced in American schools. This section presents a brief overview of current assessment practice and policies at three levels: classroom, district and state, and national. District, state, and national assessments are especially relevant to reform because of their increasing dominance of student and teacher time and their visibility to the public in displaying achievement levels in mathematics and science.
Classroom assessment remains the most important direct influence on students' day-to-day learning. Teacher-designed tests for an individual class can serve the needs of individual students to a far greater degree than state-wide or nationwide standardized skills tests. However, research suggests that teacher-made tests are often as limited in measuring student thinking as their standardized counterparts (Stiggins & Conklin, 1992). First, teacher-made tests are mostly short-answer or matching items that place far more emphasis on student recall than on student thinking ability. Second, evidence suggests that because teachers do not receive proper training in effective assessment methods, they tend not to change the methods they use as assessment needs change. Different assessments are needed to measure performance, effort, and achievement, for instance, but teachers tend to use the same type of assessment to measure all three. Third, because of time constraints, teachers often use the assessments that are found at the end of textbook chapters or included in the textbook publisher's package. These assessments contain mainly short-answer questions that require only low-level thinking skills and simple recall of factual knowledge (Center for the Study of Testing, Evaluation, and Educational Policy, 1992). Many newer science and mathematics curriculum projects and textbooks, however, include assessments that address an array of valued outcomes and show promise for improving classroom assessment if teachers are trained in using them.
Even if teachers receive the training, time, and resources that would allow them to broaden their science assessment practices, students themselves may be a barrier. Students, especially high school students who have become test-wise, sometimes object to the more labor-intensive format of assessments that require performing tasks, answering essay questions, or providing possible solutions to open-ended problems. Parents also have become accustomed to report cards that contain letters and percentages and may question new approaches that are not clearly explained and justified.
Finally, there are logistical and technical reasons why prevailing practice persists. Standardized, machine-scored tests are efficient and cost-effective. In addition, they provide quantifiable results that are easily understood by both internal actors (teachers and students) and external actors (legislators, policymakers, and the lay public). Issues such as assessing individual contributions in group activities or determining how to address student absenteeism during multiple-day tasks are quite real and require further time and effort by teachers and others.
District and State Assessments
Nearly every state has some type of state-wide assessment in place. Some of these assessments have been developed specifically to align with state curriculum frameworks in science and mathematics, and a few use performance tasks and open-ended items. However, many statewide and most district-wide tests are inconsistent with the goals of science reform. They are "off-the-shelf," standardized, multiple-choice tests that are not well-aligned with standards or benchmarks and do not allow students to develop their own solutions to problems or to analyze, synthesize, and present information on their own. The tests are often chosen for the content of their mathematics and reading sections rather than their science content, and their results are seldom used to improve science instruction.
State and district tests have high stakes for both students and schools. For students, some states require passing scores on the state assessment in order to graduate from high school; school districts often use the tests as a factor in decisions about student placement into remedial, regular, or honors classes. High-stakes tests usually place students in a passive, reactive role rather than allowing them to develop ideas or to solve problems. For schools, the consequences of these high-stakes tests are also serious. The pressure on administrators and teachers to improve test scores is enormous, overshadowing other educational concerns. Local newspapers often publish the test averages of individual schools. In some states and districts, funding decisions are based on student performance on standardized tests; those doing well get bonuses. In a few cases, school districts have been placed under state control because of poor student performance on tests.
There is considerable evidence suggesting that high-stakes tests can have negative consequences for instruction (Darling-Hammond, 1995). These tests force students and teachers to emphasize test-taking skills over and above other educational concerns, and they exclude many kinds of knowledge. Fairness and freedom from bias, particularly in large-scale assessments, continue to be issues, especially for female students, minority students, students with disabilities, and students for whom English is a second language. Unfair testing is especially troubling because the test results may be used for purposes as varied as tracking students, determining promotions or hiring, and allocating rewards or imposing sanctions.
Despite the multiple negative consequences that may be produced by overemphasizing the importance of standardized test results, assessment used appropriately can be a powerful factor in science education curriculum reform. It is a widely accepted claim in educational circles that what is tested is what gets taught. Therefore, poorly designed assessments are an enormous barrier to reform. At the same time, state and district science assessments that are designed with the goals and philosophy of Science for All Americans (AAAS, 1989) in mind could help produce results that are at least somewhat akin to the goals that science educators envision.
National and International Assessments
The public concern over a perceived decline in the quality of education in the United States has largely been fueled by student performance on three tests: the National Assessment of Educational Progress (NAEP), which measures student achievement within the United States; the International Evaluation of Achievement (IEA); and the International Assessment of Educational Progress (IAEP). The latter two assessments compare student achievement among countries. NAEP, commonly known as the "nation's report card," regularly assesses a broad sample of US students in various subjects.
NAEP mathematics and science scores declined from the time of the test's introduction in 1969 until the mid-1980s, when they began to improve slightly. The results of international tests and comparisons have been equally dismal. For example, in a 1994-95 IEA study, US 8th graders ranked 28th out of 41 countries in mathematics (Beaton et al., 1996). More detailed analyses of curriculum, time spent in and out of the classroom, and other variables reveal that these rankings provide just a small part of the picture. For example, US achievement has more to do with the nature of the curriculum and how mathematics is taught than with the time spent in classrooms. Recent comparisons between countries and individual states have revealed that using overall US averages gives too simplistic a picture. For example, the mathematics achievement averages for many states compare well with the top countries in the world (National Science Foundation, 1996).
With these scores fueling public outrage over the quality of American education, and with Goal 4 of the President's "Goals 2000" plan for American education calling for US students to be first in the world in science and mathematics by the year 2000, the tests are likely to become even more high-stakes in the near future. An IEA assessment is planned for the year 1999, and teachers and administrators are likely to place increasing emphasis on improving exam scores as this date approaches. And while earlier calls for national achievement tests in all subjects have diminished, the National Education Goals Panel has been formed to monitor states' performances in mathematics and reading achievement, issuing annual reports of progress toward Goals 2000 (National Education Goals Panel, 1996).
Again, it is worth noting that while high-stakes standardized exams have extreme limitations and potentially negative consequences in their current design, national goals in science and mathematics reflecting Benchmarks or Standards would have enormous impact on the way these subjects are taught in American schools. The Standards document developed by the National Research Council is similar to Project 2061's Benchmarks in its call for greater depth of understanding in fewer topics. Still, there are differences in content, tone, style, and grade-level focus in these two sets of standards, which could pose a problem in implementing united efforts to improve science education through common assessments.
We have identified some important aspects of student assessments that need to change in order to support reforms such as those envisioned by Project 2061. They include assessment philosophy and practice, assessment and instruction, teacher preparation in assessment, and external assessments.
Assessment Philosophy and Practice
Communicating the goals of science education reform and the philosophy of science literacy outlined in Science for All Americans presents perhaps the greatest challenge to science education. Teachers, educational administrators, local board members, and state department of education officials are working to develop science frameworks and standards that reflect Science for All Americans, Benchmarks for Science Literacy, and the National Research Council's Standards (Blank & Pechman, 1995). To translate these state frameworks into district policy and classroom practice, an extensive communication effort is needed to promote the goals of science education reform. If science educators hope to have wide impact, they must work to influence and support state-level efforts to set and implement science learning goals through curriculum frameworks and related assessments.
Many states now test every student several times (typically at grades 4, 8, and 11 or 12) as a part of their accountability system. This type of widespread testing argues for inexpensive, easily scored tests that report quantitative scores. Because the multiple measures needed to reflect science literacy are expensive and student-centered approaches require further development to make them reliable, there is a direct conflict between current accountability needs and the interests of science education reformers. Some possibilities for overcoming this conflict are described in "Recommendations."
In designing any assessment process, it is necessary to specify several factors: the type of data needed (e.g., achievement, attitude, performance), the way it is collected, who will use it, and for what purpose it will be used. Remembering that assessments have both internal and external purposes, science education reformers can influence policymakers to keep these factors in mind. For example, assessments of science achievement to monitor the effects of state frameworks may not require testing every student in the state. Sampling and in-depth approaches such as observations and interviews could minimize the importance placed on a single high-stakes exam and the influence of such exams on teaching practice.
For internal purposes, assessments can be changed from tools that force students to learn into instruments that encourage student reasoning by scoring students on the rigor of their arguments; that examine the quality of work samples; that base results on observations of behavior over long periods of time; and that support students in developing their own judgments of the quality of their work. This type of philosophy is likely to blur the boundaries between instruction and assessment and to reflect the broad goals of science literacy described in Science for All Americans.
There are barriers to changing assessment philosophy and practice, but also some possible strategies for overcoming those barriers. Teachers need to know how to design and use new assessment methods. Students also must become accustomed not only to the idea of new assessments but also to taking some responsibility for assessing their own science learning. If teachers are comfortable with new testing procedures, student opinions are likely to follow.
Policymakers, administrators, and the constituents to whom they answer may resist seemingly "soft," nonquantifiable assessments. Such procedures might not seem like "real testing" to those who were taught using traditional multiple-choice and short-answer exams. Efforts are needed to show the skeptics that new assessments are anything but "soft" and that they measure the development of higher-order thinking skills. These are the skills that education policymakers and the media insist will be needed for Americans to live healthy, productive lives in our increasingly technological world. At the same time, business leaders insist that tomorrow's workers must be prepared to be flexible thinkers in order to perform in a workplace constantly transformed by technology.
Science education reformers should emphasize that higher-order thinking skills cannot be measured by machine-scored, multiple-choice tests. In fact, development of those skills may be hampered by such tests. By casting new assessments in this light, science educators may find political allies where they otherwise would have found opponents.
Assessment and Instruction
Currently, assessment and student testing are not included in Science for All Americans or Benchmarks. Project 2061 avoids dictating assessment techniques to teachers, reasoning that teachers are better able to design flexible assessments that measure whether their individual students have reached the respective benchmarks. Nonetheless, teachers have little time to develop multiple assessments in science, especially given the effort necessary to design assessment activities that are valid and reliable. Although it may not be appropriate for Project 2061 to design assessments to accompany the benchmarks, a set of criteria for such assessments, complete with detailed examples, would help bring reform goals into the classroom. Such a guide would surely play a major role in encouraging classroom implementation and would provide an additional layer of structure and depth to reform.
Testing practice in the United States, especially at state and district levels, still relies a great deal on short-answer tests that emphasize reflexive rather than reflective thinking on the part of students. Classroom science assessment, used by teachers and students to diagnose difficulties, plan instruction, give feedback on progress, make improvements in learning activities, and monitor attitudes, can use a variety of tasks and methods.
New, multiple methods of assessing science knowledge and reasoning have the potential to greatly increase our ability to promote higher-order, critical thinking skills among students. New assessments can also improve equity by measuring a wider variety of student abilities and skills in science than current methods. Accommodations for individual differences and the use of tools beyond pencil and paper can make assessments more valid and useful than traditional tests, which tend to reward those students with great recall ability over those who have other academic strengths like creativity or clarity of expression.
Testing remains distinct from learning in the minds of most American students and teachers. A typical scenario, especially in secondary school, is to read the text, listen to lectures, perhaps do some lab work, and then be tested on the week's work on Friday. This process is not unlike assessment procedures in most colleges and universities. Classroom science instruction and assessment can be brought together through observations and checklists of students' performance in activities such as solving problems and conducting lab experiments, assessment of individual and group projects using several criteria, and the use of portfolios that reflect student growth and achievement on a variety of activities over time. In all of these, students can have a role in selecting the criteria for evaluating their work, in making choices about what will be assessed, and in making improvements on performances.
To help break the cycle that maintains the status quo, the recommendations of this chapter should be implemented in concert with those in Blueprints' Chapters 9: Teacher Education and 10: Higher Education, so that prospective science teachers will learn about and experience improved assessments in their own science learning. Changing teacher preparation programs and improving teacher development will go a long way toward helping educators rethink how they assess student performance in science classrooms.
Teacher Preparation in Assessment
Teachers continue to rely on traditional short-answer tests for three main reasons. First, they do not feel confident that new assessment techniques will be accepted for accountability purposes by school administrators and the public at large. (This reminds us that until administrators approve, teachers are unlikely to use new assessments, even if they know how.) Second, many teachers have not yet learned how to develop and use new assessments in their classrooms. Finally, many new assessments take more time to develop or to administer than traditional tests. All of these reasons point to a great need to demonstrate the value of new assessments and to train teachers and administrators to use them.
Achieving this goal will require a cooperative effort between schools, colleges and universities, and state science education leaders. Teachers' fears that new assessments will not meet accountability needs could be mitigated by encouraging them to follow state science and mathematics curriculum frameworks and by providing state and district-sponsored staff development that reflects those frameworks. States that have used Benchmarks or Standards to guide their science and mathematics framework would need to rethink their methods of state-wide assessment, lending further support to teachers who wish to align classroom assessments with science literacy goals.
Both teacher preparation and staff development can focus on assessment along with other components of science instruction. Rather than ignoring assessment or keeping it as a separate course or workshop, as has been done in past work with teachers, professional development can integrate work on assessment with work on instruction and materials. Teachers can also learn how to analyze existing assessments in the same way they analyze curricula to determine how well they meet benchmarks and standards. These experiences would provide a natural way for science teachers to think about fusing these functions in their classrooms.
Science education reform can make great strides by working to include open-ended questions and other performance-based tasks in district, state, and national exams. However, teachers cannot be expected to change their assessment procedures only to be judged on the students' performance on standardized, multiple-choice exams. Realizing this, some states have brought together test publishers, state education officials, the reform community, and school personnel to create new forms of assessment. As other states implement frameworks that are based on Benchmarks and Standards, they will need to align assessments with these new goals.
Perhaps the greatest barrier to moving away from machine-scored tests is that they are efficient and cost-effective. It is hard to imagine the time and effort that would be required to judge and score open-ended responses and performance of an entire state's students at several grade levels. As assessments change to reflect the goals of standards-based reform, states will need to rethink the uses of their assessments. States must consider seriously the ideas of sampling and of returning responsibility for monitoring student progress and graduation requirements to local districts. It is critical that these issues be recognized and discussed, and that a viable solution be found.
Recommendations for Improved Assessment
The previous section presented the changes in assessment that are necessary for implementing reform in science education. We now describe characteristics of ideal student assessment and recommend some steps toward this ideal.
Assessment and Science Content
Ideally, the content of assessment activities, both for internal and external purposes, should reflect the content of Benchmarks and Standards. State and district frameworks for science and mathematics education should embrace these documents, and assessment programs should be linked to those frameworks.
Any assessment program designed for use in science classrooms should frequently test student familiarity with and comprehension of systems, models, constancy, patterns of change, evolution, and scale, and should assess students' habits of mind (curiosity, openness to new ideas, and skepticism) as described by Project 2061. This emphasis on themes rather than on bits of information, and on habits of mind rather than on recall, means that assessments should stress reflective thinking rather than reflexive thinking.
Assessment strategies and activities should be available for each science curriculum unit used by teachers. These curriculum units, and the assessments that accompany them or that are selected or developed by science teachers, should be analyzed using a valid, comprehensive, and standardized procedure that describes their alignment with Benchmarks and Standards (see note 1).
Science assessment activities can also be occasions for learning. Rather than setting aside time for the testing of memorized facts, teachers and students together can design assessments that are integrated with the curriculum.
Appropriate and Fair Assessment
Assessment procedures must be appropriate and fair for all students. Techniques that aim to promote equity, that is, to ensure that all American school students have an equal opportunity to learn science, will emphasize student accomplishment rather than document failure (Malcom, 1991).
Pre-service programs should ground future science teachers in a variety of assessment techniques. All teachers should be well prepared to understand and use effective and varied ways to judge student performance and to develop effective methods for blurring the line between testing and teaching.
Although state and district policies should emphasize assessment, teachers should be prepared to distinguish between assessments designed to meet state and district accountability needs and the richer, more comprehensive science assessments they use for instruction.
District, state, and national-level examinations should be based on the content of Benchmarks for Science Literacy and the National Science Education Standards. Because of content and student sampling considerations and time limitations, multiple-choice items may be used on these exams, but the tests should also include tasks that are similar to those used for internal purposes, such as open-ended questions, essays, or performance tasks. The format of the exams should be familiar to students to avoid jeopardizing the validity of inferences drawn from test results.
Assessment techniques for both internal and external purposes should meet acceptable standards of validity, reliability, feasibility, and equity. The National Research Council's National Science Education Standards (1996) includes a set of standards for assessment in science education, and the National Council of Teachers of Mathematics has published Assessment Standards for School Mathematics (1995). Both of these documents are good starting points for addressing criteria that should be used to judge science assessments at all levels: classroom, district, state, or national. In addition, the American Educational Research Association, the American Psychological Association, and the National Council for Measurement in Education have established standards for testing, and several books and journals are devoted to the quality of tests.
Note 1: Project 2061 has developed a procedure for analyzing how well the content and pedagogy of science curriculum materials match Benchmarks and Standards, and is training teachers and others to analyze the materials. A small set of science curriculum materials has been analyzed; work is underway to analyze more materials, develop a greater capacity for training people to do the analysis, and develop a procedure for analyzing assessment activities.
Copyright © 1998 by American Association for the Advancement of Science