Policy & Student Learning: What Textbooks, Assessment, and Professional Development Can Contribute

A Conference Report
Drafted by Andrew Porter
University of Wisconsin–Madison

September 12, 2002


In its continuing effort to significantly improve student learning in science, mathematics, and technology, Project 2061 of the American Association for the Advancement of Science hosted a conference on state and district policies that influence student learning May 15-17, 2002, in Washington, D.C. The conference, the third in a series of Project 2061 conferences dedicated to improving science and mathematics textbooks and curriculum, examined policies that affect the quality of instructional materials, student assessment, and teacher professional development.

Jo Ellen Roseman, acting director of Project 2061, opened the conference. She noted that, although the past 2 decades of research have produced some curriculum materials and examples of teaching focused on helping students understand important science and mathematics ideas, the scaling up of these effective materials and practices has been difficult. With regard to curriculum materials, recent evaluations have shown that in science (unlike mathematics, where several effective research-based textbook series have been developed and are being implemented on a national scale) very little has been done to incorporate the best that research and experience have to offer. Furthermore, student assessments, which are becoming increasingly important in science, may not align well with what states and districts agree are important ideas for students to learn.

The conference focused on five case studies that shed light on state and district policies that contribute to students’ learning important ideas in math and science. The case studies addressed the selection and implementation of curriculum materials, the role played by state assessment instruments, and the professional development available to teachers. Discussion attempted to identify patterns across the case studies that can inform state and district efforts to improve student learning.

Dr. David K. Cohen, Professor of Education and Public Policy at the University of Michigan, gave the keynote address, presenting findings from his recent book, Learning Policy: When State Education Reform Works (coauthored with Heather Hill). Gary Sykes, Professor of Education at Michigan State University, chaired the conference and provided the closing analysis and summary.


Conference Attendees

The conference was attended by 81 people representing a rich mixture of groups knowledgeable about and having a stake in math and science textbooks, assessments, and professional development. Approximately 20% of the attendees came from school district staff, and another 20% from higher education. There were about 10% from each of the following groups: textbook publishers, education professional organizations, state departments of public instruction, the U.S. government, technical assistance centers, and research houses. There were a couple of teachers, and a principal as well.


Intervening in Instruction to Enhance Teaching and Learning

In his keynote address, Cohen stressed the importance of research that shows connections between policy and student achievement and gives careful attention to the nature and quality of instruction as an intervening variable. He emphasized that teachers’ responses to curriculum policies (as evidenced in their instructional practices) are filtered through their knowledge and norms. Surprisingly, relatively little research on the effects of policies has included good measures of teachers’ content knowledge and pedagogical content knowledge. Yet these two sets of variables may offer strong explanations for how policies lead to instructional practice. More work needs to be done to operationalize both teacher content knowledge and teacher pedagogical content knowledge and to develop adequate measures of these.

Cohen listed curriculum materials, assessments, professional opportunities to learn, and incentives as instruments of practice. He characterized these instruments as working in concert to produce desired change, concluding that curriculum materials alone are unlikely to succeed in creating the types of instruction sought by today’s reforms.

Cohen concluded that teachers’ knowledge of science and math is modest, school managers know even less, teachers and managers both receive conflicting guidance about what should be taught and how, and generally professional development is weak. He concluded that tests, academic standards, and accountability by themselves rarely influence practice in the directions and to the extent hoped.

Cohen sketched a more hopeful picture of California's math reform from 1985 to 1995 (the reform investigated in his book with Heather Hill). He described the California reform as consisting of a math framework, a new state test matched to the framework, new curriculum materials aligned with the framework, and professional development aligned with both the framework and the test. The curriculum materials were replacement units, each covering a single topic and lasting 4–6 weeks. Cohen described the replacement units as grounded in practice, supported with aligned professional development, and fitting together so that as a set they aligned with the state framework.

Cohen and Hill surveyed 600 elementary school teachers to investigate teachers’ opportunity to learn, their instructional practices, their school conditions, and their students’ achievements. Cohen and Hill found positive effects on student achievement from replacement units, teachers’ scoring of performance items on the state test (the California Learning Assessment System, or CLAS), and professional development on replacement units. The major distinction between the successful California approach and the general approach described earlier by Cohen was California's provision of aligned and curriculum-embedded professional development.

Cohen ended his talk where he began, calling for more research and evaluation on the effects of reforms on student achievement mediated by instructional practices. He urged researchers to undertake research on the alignment among standards, assessments, curriculum materials, and professional development. Cohen further advocated that professional associations should play a larger role in reform so that education becomes less dependent on the government.



Four interactive sessions at the conference provided a look at the AAAS Project 2061 procedures for judging alignment between (a) national and state science and mathematics standards (Kathleen Morris and Lori Kurth), (b) curriculum materials and standards (Linda Hackett and Ann Caldwell), and (c) assessments and standards (Leah Bricker). All served to highlight the importance of examining alignment to specific learning goals in standards (rather than to just their general topic).  Since both benchmarks and standards specify a carefully and deliberately chosen subset of knowledge and skills on any particular topic, judgments about alignment should only be made with this subset in mind.  At the same time, it is important when assessing alignment to recognize any content in the assessments or materials that is not in the standards as well as content in the standards but not in the assessments or materials.

These four interactive sessions were useful in clarifying Project 2061 procedures for judging alignment.  Exercises were conducted in a small-group format, allowing all participants to have firsthand experience with attempting to judge alignment among standards, assessments, and materials.  Group discussion identified differences in judgment and helped all to better understand how difficult alignment can be to achieve.

Not only was alignment discussed during the four interactive sessions, but alignment was a major focus of the entire conference.  Conference presentations and discussions went a long way toward clarifying what is meant by alignment, how alignment should be judged, who should do the judging, and how the results should be used. In addition to the Project 2061 procedures for judging alignment, recognition was given to methodologies developed by Achieve, Inc.; and the Council of Chief State School Officers, working with the National Institute for Science Education.

Conference discussions made clear that alignment remains for many a somewhat elusive concept, notwithstanding its essential role in standards-based reform. Uniformity does not exist in the criteria used for judging alignment, the objectivity and replicability of the procedures followed, policies on use of third parties to make judgments, or the uses to which alignment is put. There remain many questions for which consensus answers are not yet available. For example, is it important to have a quantitative index of alignment?  Since alignment exists in degrees, how much is enough?  When can a state say its tests are sufficiently aligned to its content standards? Is it necessary to look at alignment between assessments and standards not only from the perspective of the assessment (i.e., is everything on the test in the content standards?), but also from the perspective of the content standards (i.e., is everything in the content standards assessed?)? When judging alignment, what “grain size” is most useful?  If alignment must be an exact match between, say, a test item and a standard so that each is virtually identical, then no assessment will be judged to be aligned (very small grain size).  On the other hand, if one only asks if a test item is about mathematics and the standard is about mathematics, then every test would be found to be aligned (very large grain size).  Generally, participants agreed that questions of alignment of assessments to standards should not be left to test publishers because of their vested interest in marketing their products.
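
The two perspectives raised above (is everything on the test in the standards, and is everything in the standards assessed?) can be illustrated with a small sketch. It is not the Project 2061, Achieve, or CCSSO procedure. It only assumes that reviewers have already judged which standards each test item addresses, at whatever grain size they chose. The item labels, standard codes, and the names summarize_alignment and AlignmentSummary are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class AlignmentSummary:
    """Two-directional summary of how a test relates to a set of standards."""
    items_on_standard: float        # share of test items that match at least one standard
    standards_assessed: float       # share of standards addressed by at least one item
    unmatched_items: list[str]      # items on the test with no matching standard
    untested_standards: list[str]   # standards that no item addresses


def summarize_alignment(item_to_standards: dict[str, set[str]],
                        standards: set[str]) -> AlignmentSummary:
    """Summarize alignment from the test's side and from the standards' side.

    item_to_standards holds the reviewers' judgments: for each test item,
    the set of standard codes the item was judged to address.
    """
    if not item_to_standards or not standards:
        raise ValueError("need at least one item and one standard")
    # Keep only judgments that point at standards actually in the framework.
    matched = {item: codes & standards for item, codes in item_to_standards.items()}
    unmatched_items = sorted(item for item, codes in matched.items() if not codes)
    covered = set().union(*matched.values())
    return AlignmentSummary(
        items_on_standard=1 - len(unmatched_items) / len(matched),
        standards_assessed=len(covered) / len(standards),
        unmatched_items=unmatched_items,
        untested_standards=sorted(standards - covered),
    )


# Hypothetical reviewer judgments for a four-item test and a four-standard framework.
standards = {"S1", "S2", "S3", "S4"}
items = {"Q1": {"S1"}, "Q2": {"S1", "S2"}, "Q3": set(), "Q4": {"S9"}}
summary = summarize_alignment(items, standards)
print(summary)
```

In this made-up example, half of the items match a standard and half of the standards are assessed. The sketch reports both shares separately and does not say how much alignment is enough, which the discussion above leaves open.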


Case Studies

El Centro, California

(Presenter: Olga Amaral, San Diego State University)

The El Centro reform initiative addresses science reform in kindergarten through sixth grade and involves a local and regional K-16 collaborative partnership among districts, business, volunteer scientists, and the Imperial Valley Campus of San Diego State University. The El Centro reform is comprehensive, encompassing (a) curriculum, (b) professional development, (c) materials support, (d) administrative and community support, and (e) assessment and evaluation.

Located in the extreme southeast corner of California, Imperial County is the poorest of all 58 counties in the state, with 30% unemployment and a mean income of approximately $16,000. Of the total K-16 student population, 73% receive free or reduced-price lunch, 51% are designated English language learners, and 81% are Hispanic.

The science curriculum focuses on life, earth/space, and physical science, with three to four science units covered each year at each grade level. The curriculum units are designed to be (a) developmentally appropriate, (b) research-based, and (c) focused on “big ideas.” The units give students the opportunity to explore, investigate, inquire, question, test hypotheses, and collect and analyze data.

A sustained professional development program involves over 100 hours across 5 years, along with both preservice and master’s degree components. Inservice professional development is embedded in the content of the science units and includes a lesson lab component.

The reform project’s administrative and community support component offers administrator training on content, pedagogy, classroom supervision, teacher evaluation, and assessment. A volunteer scientist initiative trains volunteers to serve as content consultants, validate the curriculum, model questioning, model inquiry, and become community advocates. The project includes a parent education initiative, as well.

Evidence of the reform’s effects can be seen in student achievement. The project has compared the SAT-9 science scores of participating and nonparticipating classes and of individual students based on their cumulative years in the program. In each case, the program has been found to yield strong positive effects. Admittedly, student stability is confounded with years in the program, and there is no randomization of participation versus nonparticipation. Still, the results in science are encouraging, and there are similar positive results for reading, math, writing proficiency, and science notebooks. The writing and reading improvements are likely a function of the reform’s science literacy components. The project’s effects on mathematics scores are less easily understood.

The El Centro reform is highly focused and ambitious, with an impressive, comprehensive set of program components.


Baltimore, Maryland

(Presenters: Patricia Campbell, University of Maryland; Melva Greene and Andrea Bowden, Baltimore City Public Schools)

The Baltimore reform focuses on kindergarten through fifth-grade mathematics, drawing in part on National Science Foundation urban systemic and local systemic initiative funding and in part on special state funding. The collaboration among the Baltimore City Public Schools, the University of Maryland–College Park, Morgan State University, and the Maryland State Department of Education has undertaken an ambitious and comprehensive approach to reforming elementary school mathematics.

The Baltimore City Public School System serves approximately 50,000 students. In 122 elementary schools, 74% of the students receive free or reduced-price lunch, and 87% are African American. Scores on student achievement tests have been low historically relative to state and national norms.

When the reform began in 1995, the mathematics program was diagnosed as exhibiting “curriculum overload,” too many objectives, lack of uniformity in instructional materials, too little and poor-quality professional development, and conflicting curriculum policies. Six years of reform have resulted in (a) a standards-based K-5 mathematics curriculum; (b) one districtwide textbook aligned to the curriculum; (c) an instructional model that shifts instruction from a show-and-tell demonstrated practice approach to an approach based on asking questions and questioning answers (called Mathematics: Applications and Reasoning Skills, or MARS); (d) an aligned program of professional development, with a requirement of at least 100 hours for each teacher; (e) instructional leaders in mathematics in two thirds of the elementary schools; and (f) systemwide, grade-level assessments, consisting of 15 assessments aligned with the curriculum and administered and scored by teachers.

Since 1998, Baltimore district student achievement medians on national percentiles are up at every grade level, both for concepts and applications and for computation. Increases are dramatic on the Comprehensive Test of Basic Skills (CTBS), but improvement is also seen on the Maryland School Performance Assessment Program (MSPAP) test.

Clearly, the Baltimore reform made excellent use of additional funding from NSF and the state. The reform initiative also profited from excellent leadership both at the university level and in the district.


Pittsburgh, Pennsylvania

(Presenter: Diane Briars, Pittsburgh Public Schools)

The Pittsburgh reform focuses on mathematics in Grades K-12, though the case presented was limited to Grades K-8. Like other reform initiatives described at the conference, the Pittsburgh reform is multifaceted; like the Baltimore effort, the Pittsburgh initiative received some additional funding from NSF. Beginning in 1990 with kindergarten, the initiative has phased in new instructional materials for an additional grade each year. In elementary school, the curriculum materials are Everyday Mathematics; at middle school, Connected Mathematics. Assessments are the Iowa Test of Basic Skills (ITBS) and the world-class New Standards. Student performance is reported by performance level. There is a curriculum-embedded and sustained professional development program for both teachers and administrators, delivered in part by school-based resource teachers. A parent component attempts to elicit parental support and involve parents in the education of their children. Accountability is at the school and teacher level.

The Pittsburgh Public Schools serve approximately 40,000 students, with 59 elementary schools, 19 middle schools, and 11 high schools. Of the Pittsburgh student population, 56% are African American, and 62% qualify for free or reduced-price lunch.

Leaders of the Pittsburgh reform conclude that the key barrier to successful implementation is lack of teacher content knowledge. Their vigorous professional development program includes in-class support using resource teachers and professional development embedded in standards-based instructional materials. The Pittsburgh reform puts an unusually heavy emphasis on measuring the quality of reform implementation in the classroom.

Effects on student achievement are seen through pre-reform/post-reform comparisons, as well as comparisons between classrooms with a strong reform implementation and those with a weaker implementation. For all grade levels and on both New Standards and ITBS, the comparisons have shown improved achievement in favor of the reform effort.

Once again, the importance of external funding can be seen. Also, the reform was led by highly qualified, charismatic leaders and put an emphasis on instructional leadership at both the district and school principal levels.

Indian River School District, Delaware

(Presenters: Linda Bledsoe, Cedar Lane Elementary School, Middleton, Delaware, and Jon Manon, University of Delaware)

In 1995, the Indian River School District decided that, although its elementary school mathematics achievement compared favorably to the state average, achievement in the district was not good enough. In particular, the district wished to address the achievement gap between African American and White students. Reform planning began in 1995; by 1998, the district had adopted the NSF-funded curriculum Math Trailblazers. With the new curriculum materials as its lead policy instrument, Indian River built a system of reform with a heavy emphasis on professional development. All elementary mathematics teachers participated at least monthly in a professional development program known as the Math Club. The initiative created a cadre of lead teachers to provide school-based leadership, engaged a district mathematics specialist to provide district-level leadership, and sought additional funding through grants. Other components of the reform include curriculum-embedded assessment, monitoring of individual student performance to ensure that no child falls behind, increased instructional time in mathematics, and parents’ involvement in support of the reform and their own children’s learning.

Strategically, Indian River seeks to create increased teacher responsibility for student learning. More generally, the district wishes to shift its focus from teaching to student learning. The curriculum is ambitious, aligns with state standards, provides vertical articulation of “big ideas,” focuses on problem solving and communication, and achieves a balance between conceptual understanding and skills.

Performance on the state mathematics test has been impressive, with steady increases from 1998 through 2001 at both third- and fifth-grade levels. Student achievement, disaggregated by race, shows steady reductions in the achievement gap from 1998 to 2001.

The program is now being replicated in eight other Delaware school districts. The success of these replications is not yet known.

Battle Creek Area Mathematics and Science Center, Michigan

(Presenters: Theron Blakeslee, Connie Duncan, Chris Lapekas, and Barry Linscott)

The state of Michigan has regional mathematics and science centers, one of which serves the Battle Creek area. The state has also taken on a leadership role in reforming K-12 science education. The case study presented at the conference described the collaborative relationships between the Battle Creek center and the state, on the one hand, and districts and schools drawing on the center’s expertise, on the other.

Since 1989, with Project 2061’s publication of Science for All Americans, the Michigan Department of Education has been pursuing a new framework for science education. This ambitious framework was revised in 2000. State tests in Grades 5, 8, and 11 are aligned to the framework. These tests, required of all students, are demanding: just under 50% of 5th graders pass, 20% of 8th graders pass, and 60% of 11th graders pass. Student incentives are tied to performance on the state test, with a $2,500 college scholarship available to students who pass the 11th-grade tests in all subjects and a $500 award available to students who pass the 8th-grade tests.

Recognizing that assessment alone cannot bring about deep and meaningful reform at the classroom level, Michigan developed a number of curriculum units in science, including Chemistry That Applies and Food, Energy, and Growth. These units are made available to anyone in the state who wishes to use them. In addition, a number of the regional math and science centers have developed complete unit-based curricula for the elementary grades. Over 100 districts across the state use the New Directions Teaching Units distributed by the Battle Creek area center. The motivation for local development of science curriculum units is the inability to find off-the-shelf materials that are sufficiently well aligned to the state framework and test.

Science reform in Michigan is also supported through professional development. Again, the regional math and science centers play an important role, with state funding augmented by Eisenhower and NSF funds. The emphasis is on curriculum-embedded professional development.

The Battle Creek area math and science center reforms are working best in schools with strong principal leadership. Another key to successful reform appears to be careful record keeping relating to teacher coverage of state science framework material. Finally, the use of outside observers to assess implementation has proved useful.

The effects of the Battle Creek reform on achievement are not well documented. Nevertheless, when a cluster of Michigan schools—the Michigan Invitational Group—participated in the Third International Mathematics and Science Study–Repeat (TIMSS-R), students scored extremely well.



The last conference session brought together a panel of three to talk about science assessment and particularly issues of alignment of science assessments to science content standards: Virginia Malone, from Harcourt Publishing; Edward Smith, from Michigan State University; and David Potter, from the South Carolina Education Oversight Committee.

The challenges of building assessments aligned to content standards are sharply different for a publisher and a state. The Michigan educational assessment of progress in science, developed in partnership with Michigan State University, provides one example of how a state can build aligned assessments. At least two aspects of the Michigan assessment are innovative and merit consideration by others:

  1. Use of both framework-wide and area-specific assessment. Michigan's framework-wide assessment covers a representative sample of objectives from the state framework. Area-specific assessment provides a more in-depth assessment of a small set of closely related objectives.
  2. Assessment of student skills in investigation. In the month prior to state testing, all science classrooms in Michigan conduct the same investigation. The state then includes questions about that investigation in its assessment. This type of curriculum-embedded state assessment has great potential—not only for upgrading the enacted curriculum statewide, but also for allowing an on-demand paper-and-pencil test to assess students’ understanding of how to carry out and interpret a science experiment.

Michigan hopes to convey to teachers that intensive work on a particular topic pays off in student achievement and that students need to be writing in science, doing investigations, and engaging in critical reasoning based on science source materials. By taking innovative approaches to assessment, Michigan has ensured that some of the more difficult-to-assess content in its framework is being tested.

South Carolina offers another example of a state successfully building science assessments aligned to its content standards. South Carolina is particularly aggressive in analyzing the alignment of its assessments to its standards. Alignment is a key consideration at every step of the test preparation process, from test construction to field testing to reporting and interpretation of student performance. South Carolina believes that its work on alignment has improved the quality of its assessment and accountability system.

But many states lack the expertise and funds to develop their own assessments. Such states rely on test publishers to supply an assessment product that meets their needs. At the same time, a test publisher must have a product that is both affordable and useful. Typically this has meant developing one set of assessments per academic subject area for a nationwide market. Virginia Malone represented the publishing industry in explaining how her company copes with the tension between each state’s need for science assessments aligned to its own standards and the publisher’s need to market nationally assessments that are of high psychometric quality yet affordable. While marketing an assessment nationwide yet aligned to each state’s content standards is difficult in any academic subject, it is particularly difficult in science, where a very broad range of content might be taught and where prerequisites are often not well established. Malone stated that the test publishing industry is finding new ways to partner with states to provide assessments aligned to state standards. With the No Child Left Behind legislation’s call for science assessment aligned to state standards, the challenges will surely become more pressing over the course of the next few years.


Gary Sykes’ Summary

At the conclusion of the conference, facilitator Gary Sykes summarized what he had learned from the presentations and discussions. For him, there were two “stories” running through the conference—one elaborating on the nature, the complexities, and the measurement of alignment; and the other exploring the role of instructional materials and policies in determining what happens in schools.

A commitment to standards-based reform served as the context for Sykes’ remarks. Standards-based reform—originally thought to be technically easy but politically difficult—has turned out, according to Sykes, to be both technically and politically difficult. A core problem appears to be the desire to cram too much content into the curriculum, resulting in standards that are too broad and textbooks that are too fat. Still, Sykes concluded from the case studies presented at the conference that standards-based reform can work. He urged conference attendees to continue to pursue such reform, not only because success stories exist, but also because the alternatives of school choice and privatization seem even more problematic. Moreover, as standards-based reform is pursued, the focus should be kept squarely on the education of children from low-income families.

According to Sykes, the success of standards-based reform depends on doing a number of things well. In particular, the failure to make any of the following a strong part of a standards-based reform effort represents a fatal flaw:

  • Alignment. According to Sykes and as seen in the case studies, alignment is complicated and difficult, requires considerable interpretation, yet must be understood by both teachers and principals. Alignment is needed (a) among the instruments of policy (e.g., standards, assessments, instructional materials, professional development) and (b) between state instruments of policy and instruction as delivered in the classroom.
  • Professional development. Professional development must be embedded in aligned curriculum materials. The professional development should focus on content and the manner in which students learn content, teachers’ use of curriculum materials in classrooms, and the involvement of teachers in helping other teachers. Duration of professional development must be sufficient (both in contact hours and across months). Sykes contended that there is much too much low-quality professional development in K-12 math and science, and he called for a moratorium on all weak professional development.
  • Time. Sykes expressed concern that we are trying to incorporate too much content into too little time, especially at the middle school and high school levels, and that at the elementary school level, we are devoting too little time to science and perhaps even mathematics.
  • Leadership. The Baltimore and Pittsburgh case studies brilliantly captured the value of leadership at both district and school levels.
  • External assistance. From the case studies, Sykes concluded that help from a local university was a common ingredient to success.
  • Funding. In all of the cases presented, the reforms were supported in part by “extra dollars,” money beyond the school’s and district’s regular budgets. In particular, NSF leadership and support were present in virtually every case. Thus, it appears that outside funding is a key ingredient to success. On the other hand, Sykes concluded that money is available, and so funding may not really be a problem.
  • Strategy. Each successful case laid out a strategy in advance, typically in the form of a 5- to 10-year plan. Strategies involved careful selection of policy instruments and a thoughtful approach to phasing them in over time. However, Sykes noted one issue without a definitive answer—namely, how scripted strategies should be for teachers. He also noted that, although research and development were a part of each strategy, there is a need for higher quality evaluations against student achievement data, a concern expressed at the beginning of the conference by David Cohen and recognized as a shortcoming in virtually every case.
  • Politics. In Sykes’ words, standards-based reform should not be “too avant-garde.” He worried about a possible backlash to “fuzzy math.” He also worried about the difficulties of a revolving door in district superintendent and school board positions. Sykes favored standards-based reform that is focused on content, not too controversial, and supported by stable leadership. Stable funding may be an issue as well.

As a final thought, Sykes observed that high school reform was missing from the conference agenda.


In Closing

The case studies presented at the conference make clear that standards-based reform, to be successful, must go well beyond ambitious content standards and high-quality, aligned assessments. High-quality, aligned curriculum materials must also be readily available. Professional development must be embedded in the curriculum materials, require a significant and sustained investment of time by teachers, and use teachers to teach teachers. Leadership was seen as essential in the conference case studies, at both district and school levels. External agents played an important role, as well; the National Science Foundation may have been the impetus for many, if not all, of the reforms, and certainly provided some of the funding. Finally, universities typically played an important collaborative role, providing expertise and support.

There were a number of things missing from the cases, as well. First, although one should be careful when generalizing from so few cases, mathematics was more visible than science. Similarly, elementary school reform was more visible than high school reform. A parent component was mentioned in many of the cases but seemed poorly planned and weak. Not one mention was made of comprehensive, research-based, whole-school reform. Surprisingly little was said about accountability: There was some evidence of accountability for students sometimes and for schools and teachers at other times, but accountability was not built into the reforms in the way called for in the No Child Left Behind legislation. One cannot help wondering how these reforms will adjust to the new Title I requirements. Finally, although every case had a research and evaluation component, none was as strong as one would wish. Comparison groups were often lacking, and baselines were not well established for across-cohort trend data. Longitudinal student data was nonexistent. Analyses were descriptive; the reform initiatives did not include sufficient controls to provide convincing arguments of cause and effect.

Although much remains to be done, participants left the conference clearer about the future agenda, excited about what can be accomplished, and inspired by the importance of the work.