Implementing NCLB Assessment and Accountability

This paper was presented at the Tenth Annual Education Law Conference, sponsored by the Department of Education, University of New England and Education Law Programs, Franklin Pierce Law Center. Portland, Maine, July 29-31, 2003.


Schools are not perfect, and laws are not perfect. The No Child Left Behind Act (NCLB) has state departments of education scurrying to become compliant, testing companies gearing up to handle vastly increased volumes of testing, and schools wondering how they can reach the lofty goal of one-hundred-percent-proficient students. Some view the law as the federal government overstepping its bounds in an area that, under the Tenth Amendment, is a responsibility of the states. Drafters and other supporters of this recent reauthorization of Title I see it as a businessman’s no-nonsense, just-do-it approach to raising student performance levels, which have shown little change for decades.

Despite the impossibility of some NCLB requirements, state education officials and local educators are making significant efforts to comply with the law for two reasons. First, NCLB is the law of the land, and the consequences of noncompliance (in terms of the creation of an accountability system and meeting performance targets) are severe. Second, for many, the Golden Rule applies: those who have the gold rule. With state budgets in dire straits because of a failing economy, state officials cannot afford to pass up the federal monies associated with Title I initiatives.

It would be unfair to adopt a totally cynical point of view in characterizing the motives behind passing and complying with NCLB. No one disagrees with the notion that there is room for significant improvement in student performance, and there are many—educators and non-educators alike—who are dedicated to that end. Nevertheless, it is important to examine the imperfections associated with the law itself, the mindset of the political figures behind the legislation, the implementation of it, and the actions taken by states to comply with it.

We should also keep in mind that there are some very positive aspects of the legislation worthy of mention. The work done on state content standards (actually required under previous federal initiatives but given greater emphasis under NCLB) has done much to clarify for educators, parents, and students what it is students are to know and be able to do. The NCLB emphasis on all students is making schools accountable for subgroups of students previously excluded from accountability systems. Finally, NCLB has been the impetus for a great deal of activity in areas such as professional development, the development of new products for instruction and assessment, and educational research.

The Requirements

Assessments

By the year 2005-06, states must be testing all students in reading (or English language arts) and mathematics in grades 3 through 8, plus one higher grade. Until that time, they must be testing these subjects in three grades: one elementary grade (3, 4, or 5), one middle grade (6, 7, 8, or 9), and one higher grade. (This is consistent with the 1994 Title I legislation.) By 2006-07, science must be assessed in three grades within the same ranges. The assessments must be closely aligned with state content standards. While local (school or district) testing components may be used to address NCLB requirements, such an approach may not be readily approved because of the requirements of technical quality and comparability. Results must be reported in terms of percentages of students in at least three performance categories, and the results must be disaggregated by subgroups based on gender, race/ethnicity, poverty level, English-language proficiency, and disability status.

Adequate Yearly Progress

Schools and all required subgroups within them must show adequate yearly progress toward the ultimate goal of one hundred percent of students proficient in 2013-14. The interim targets rise in equal annual steps, each one-twelfth of the distance from the state’s 2001-02 baseline percent proficient to one-hundred-percent proficiency. States may choose to report annual progress in terms of change from one year to the next, or in terms of change in consecutive, three-year cumulative averages. They may aggregate school and school subgroup results over grade levels.
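The arithmetic behind the interim targets can be sketched in a few lines of Python. This is only an illustration of the equal-step rule described above, not an official formula: the function names and the 40 percent baseline are hypothetical, and actual state plans may define their steps differently.

```python
def interim_targets(baseline_pct, first_year=2002, final_year=2014, goal_pct=100.0):
    """Return {spring year: target percent proficient} rising in equal steps.

    Assumes the baseline is the 2001-02 percent proficient and the goal is
    reached in 2013-14, which gives twelve equal annual increments.
    """
    steps = final_year - first_year              # 12 annual steps
    increment = (goal_pct - baseline_pct) / steps
    return {first_year + k: baseline_pct + k * increment for k in range(steps + 1)}


def school_year_label(spring_year):
    """Format a spring year such as 2014 as the school year '2013-14'."""
    return f"{spring_year - 1}-{spring_year % 100:02d}"


# Hypothetical baseline: 40 percent of students proficient in 2001-02.
targets = interim_targets(40.0)
for spring_year, target in targets.items():
    print(f"{school_year_label(spring_year)}: {target:.1f}% proficient")
```

With a 40 percent baseline, each annual step in this sketch is one-twelfth of the remaining 60 points, or 5 percentage points per year.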

If a school fails to meet its adequate yearly progress (AYP) target for two consecutive years, it is designated as a school in need of improvement. Parents of students in a school so designated will be given the option of sending their children to another school. Continued failure to attain AYP targets beyond two years can result in more severe consequences, including restructuring or changes in governance. The accountability requirements contain many more details pertaining to such things as inclusion rules and special situations, such as a school meeting its overall target while one or more of its subgroups do not.

Major Problems

One-Hundred-Percent Proficiency

Projections in many states are that as many as eighty-five to ninety percent of their schools will ultimately be designated as “in need of improvement.” These are reasonable projections for several reasons. Many states have already been testing at three isolated grades (one elementary, one middle, and one high school grade), thereby addressing the requirements of the 1994 legislation. With such a system, results at the school level fluctuate considerably from one year to the next. This is because the impact of year-to-year variation in the capabilities of the students passing through a tested grade is far greater than that due to any general improvement that might be expected in an instructional program. This fluctuation is particularly pronounced in small schools in which the tested students in a year are a smaller sampling of the student population served. Thus, with this isolated-grade model, even in schools making reasonable instructional improvement, resulting improvements in achievement could not be detected. Targets will be met or not met simply because of sampling variance. Even the use of three-year cumulative averages does not eliminate the problem. The difference between two consecutive years’ three-year cumulative averages is still due to the difference between two classes of students: the one being deleted from the computation and the one being added to it. Cumulative averages only lessen the magnitude of the fluctuations in results. Thus, most schools in states that meet the requirements of the law (in terms of the grades tested before 2005-06) by testing isolated grades should expect not only to miss interim targets but also to see their results improve in some years and decline in others, no matter how much effort they put into instructional improvement. This problem will be lessened only when more grades are tested and their results aggregated, which will happen in 2005-06 in many states.
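A small simulation can illustrate both points made above: that year-to-year swings are larger in small schools, and that the change between consecutive three-year averages depends only on the class added and the class dropped. The sketch below rests on a deliberately simple assumption that is not from the paper: each tested class is a random draw of students who each have the same fixed chance of scoring proficient, so instruction never changes and all movement is sampling variation. The function names, class sizes, and the 50 percent rate are hypothetical.

```python
import random
import statistics


def simulate_school(class_size, true_rate, years, rng):
    """Percent proficient per year when each tested class is a random draw."""
    results = []
    for _ in range(years):
        proficient = sum(rng.random() < true_rate for _ in range(class_size))
        results.append(100.0 * proficient / class_size)
    return results


def three_year_averages(yearly_pcts):
    """Three-year cumulative (rolling) averages of yearly percents proficient."""
    return [sum(yearly_pcts[i:i + 3]) / 3 for i in range(len(yearly_pcts) - 2)]


def yearly_changes(series):
    """Change from each value to the next."""
    return [later - earlier for earlier, later in zip(series, series[1:])]


rng = random.Random(0)  # fixed seed so the run is repeatable
small = simulate_school(class_size=25, true_rate=0.5, years=200, rng=rng)
large = simulate_school(class_size=400, true_rate=0.5, years=200, rng=rng)

spread_small = statistics.pstdev(yearly_changes(small))
spread_large = statistics.pstdev(yearly_changes(large))
print(f"Spread of one-year changes, 25 tested students:  {spread_small:.1f} points")
print(f"Spread of one-year changes, 400 tested students: {spread_large:.1f} points")

# The change between consecutive three-year averages equals
# (class added - class dropped) / 3; the two shared classes cancel out.
small_avgs = three_year_averages(small)
spread_avgs = statistics.pstdev(yearly_changes(small_avgs))
print(f"Spread of changes in three-year averages (small school): {spread_avgs:.1f} points")
```

Under these assumptions the averaged series still moves up and down with no change in instruction; averaging shrinks the swings but does not remove them.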

Once all seven NCLB grades are tested in a state, another phenomenon, one associated with the aggregation of percents proficient computed separately by grade, will still make progress toward one-hundred-percent-proficient students difficult. To the extent that there is new content covered and new expectations for students at each grade level in school, a school is “starting from scratch” each year. Thus, to some extent, progress toward one-hundred-percent proficiency is not cumulative. There are many factors that might inhibit school improvement, and generally such barriers should be addressed. However, this non-cumulative component of improvement is one that cannot be eliminated, and it may limit how much progress toward one-hundred-percent proficiency is possible.

Finally, one must question whether one-hundred-percent proficiency is a reasonable goal for all subgroups of interest. Drafters of NCLB have stated that nothing less than 100 percent proficiency would “play” politically. How can we tell some parents their child cannot be proficient? [1] Given how high some states have set their standards for proficiency, maybe the problem is the use of the word “proficient.”

Consequences. Years before NCLB was first conceptualized, many states were already reporting test results by performance levels. Some were even using performance level names such as “Proficient” or “Advanced.” However, this does not mean that those words had the same meanings across states. Some states were testing lower-level basic skills, while others were placing great emphasis on higher-order thinking skills. Some were setting cut scores so that perhaps seventy-five percent of the students were in those top two performance levels, while others set cut scores so that as few as twenty-five percent were proficient or above.

As a result, there are states reporting high percentages of proficient students based on their own assessment programs, but showing performance on the National Assessment of Educational Progress (NAEP) that is considerably lower than that of states that report low percentages of proficient students based on their own assessments. Despite the different program purposes and emphases and different meanings of performance level labels, federal officials have praised states reporting high percents proficient on their own assessments and publicly denounced higher performing states that have considered lowering their cut scores to make them more reasonable given the new purposes and uses of their assessments and assessment results. These latter states must do what is right and just hope that this blindness of some federal officials is only temporary.

States have found a variety of ways to identify fewer schools in need of improvement. [2] Texas and Michigan are simply lowering the scores students need to achieve to be designated “proficient.” Texas was originally criticized for having tests focusing on low-level skills and with low standards for proficiency. Having created more challenging tests, the state found few students passing and therefore decided to lower cut scores. Thus, Texas now finds itself in the same predicament as states whose pre-NCLB programs had high achievement standards for proficiency, states such as Wyoming, Michigan, and many states in the Northeast.

Colorado has determined that students in its old “partially proficient” category will be considered “proficient” for NCLB purposes. Ohio’s plan for Adequate Yearly Progress calls for lesser gains in the early years and larger gains as 2013-14 is approached. Thus, Ohio, like other states, is hoping reason will ultimately rule and some requirements of NCLB will be modified. [2] Massachusetts uses an accountability index, rather than percent proficient, as its basic reporting statistic. The index equals the sum of weighted proportions of students at the various performance levels. The proportion proficient or above is multiplied by 100, while the proportions in lower levels are multiplied by lesser weights. This system rewards shifts to higher levels that are still below proficient and is particularly useful for states with high standards.
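A weighted index of the kind just described can be sketched as follows. Only the weight of 100 for “proficient or above” comes from the description above. The two lower performance levels, their names, and their weights of 50 and 0 are assumptions made for illustration, not the actual Massachusetts values.

```python
# Weights per performance level. Only the 100 is taken from the text;
# the lower levels and their weights are hypothetical.
ILLUSTRATIVE_WEIGHTS = {
    "proficient_or_above": 100,
    "middle_level": 50,   # assumed
    "lowest_level": 0,    # assumed
}


def accountability_index(proportions, weights=ILLUSTRATIVE_WEIGHTS):
    """Sum of weighted proportions of students across performance levels."""
    if abs(sum(proportions.values()) - 1.0) > 1e-9:
        raise ValueError("proportions must sum to 1")
    return sum(weights[level] * share for level, share in proportions.items())


# A hypothetical school whose percent proficient does not change, but which
# moves 15 percent of its students from the lowest level to the middle level.
before = {"proficient_or_above": 0.30, "middle_level": 0.30, "lowest_level": 0.40}
after = {"proficient_or_above": 0.30, "middle_level": 0.45, "lowest_level": 0.25}

print(f"Index before: {accountability_index(before):.1f}")
print(f"Index after:  {accountability_index(after):.1f}")
```

In this example the percent proficient stays at 30, so a percent-proficient statistic would show no progress, while the index rises because students moved up from the lowest level.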

The Same Standards

The previous section described problems related to NCLB’s flexibility in allowing states to use their own achievement standards. The problems discussed in this section are associated with NCLB’s requirement that each state hold all students to the same content and achievement standards.

One of the very positive aspects of NCLB, and in fact of the Individuals with Disabilities Education Act of 1997 (IDEA ’97), is the requirement that schools be more accountable for students with disabilities and hold these students to more rigorous academic standards than in the past. As states and their schools have addressed these requirements, they have come to realize that even students with moderate to severe disabilities can attain academic knowledge and skills. Special education programs need not be viewed as babysitting. Nevertheless, the NCLB requirements for the same standards and tests for ALL students are quite unrealistic.

Consequences. With respect to content standards, many states have “extended” the content standards they developed for the general population of students. This means that for a special subpopulation, they have created content standards that consist of prerequisite concepts and skills, the attainment of which would allow these students to gain “entry” into the more rigorous academic standards for the general student. The USDOE has allowed this approach. This is one of many examples of how the enforcing agency can help educators overcome some of the shortcomings of legislation. Of course, the achievement of many students with disabilities is measured by alternate assessments—also required by law. However, NCLB wording assumes these assessments would be measuring the same standards as the general assessments. They are not.

States are using different approaches to aggregating the results of their students with disabilities with the results of their larger general population of students. Some are attempting to put the scores of alternate assessments on the same scales as those of the general assessments. This approach results in almost all these students being placed in the bottom performance category. In truth, this situation is unlikely to change, which makes one-hundred-percent proficiency an even more unrealistic goal than the factors described earlier would suggest.

Another approach used by many states is to set achievement standards (cut scores) independently for the alternate assessments. While this causes “proficiency” to have a different meaning for students taking alternate assessments, from a program evaluation perspective, counting proficient alternately assessed students in with proficient general students makes sense. Progress of students who may otherwise never escape the lowest performance category should be rewarded. That is, both the students and the school programs should be credited with such successes. Again, this approach has been approved by the USDOE.

Inadequate Yearly Progress and Sanctions

In February 2003, twenty testing company executives were invited by Secretary of Education Paige to a meeting in Washington. The sole purpose of the meeting was for USDOE officials to urge testing companies to significantly reduce turnaround times for results. Old state accountability systems (particularly those testing in isolated grades) recognized that several years of data had to be aggregated to determine school performance status, let alone change. Thus, it was acceptable for a few months to pass between the time tests were administered and the time results were reported. Now, because NCLB requires that annual results be available for parents to use in making school choice decisions, only weeks, rather than months, are allowed for producing test results. Parents must know in the summertime if their children’s school is identified as needing improvement. Yet with the increased need for school test score files to be complete and categorizations of students accurate, data processing and file clean-up demands are greater.

Consequences. The result of the demand for quick turnaround of results in many states (e.g., Utah, Georgia, Maryland) is that the use of constructed-response questions has been discontinued so that time is not needed for the human scoring of written responses. (This action is a response to time pressures, not technical quality of data. Constructed-response questions are now scored accurately enough to contribute significantly to the overall reliability of test scores.)

A lesson learned in the 1980s and 1990s was that the higher the stakes associated with test results, the more the test content and format influenced curriculum and instruction. This could be good or bad. One potential problem, depending on the nature of an assessment instrument, was the narrowing of instructional focus, in terms of both content and cognitive processes. Several researchers suggested strongly that heavy reliance on the multiple-choice format in high-stakes programs negatively impacted instruction and ultimate outcomes. [3, 4, 5] Because of the growing dissatisfaction with the commercial standardized tests at that time, curriculum specialists significantly influenced the nature of states’ customized tests. Greater use of constructed-response questions was the result. (Also, extended performance tasks and portfolios became more widely used.) Such instruments are more direct measures of many of the competencies educators want students to achieve, particularly many higher-order skills.

Some states have “stuck to their guns” with respect to constructed-response testing. They have found ways to adjust their testing schedules, and their contractors have found ways to score, analyze, and report faster so that NCLB deadlines can be met. Other states, using more traditional tests in some grades, are working to develop local assessment systems that allow for a variety of assessment types to be used in the “off grades.” We will have to wait to see if such approaches meet the approval of the USDOE.

Another Issue: Security. In the late 1980s and 1990s, we learned another important lesson about testing for accountability purposes. With the passing of reform acts and accountability laws, many states made use of readily accessible off-the-shelf tests that were developed for purposes other than school accountability. The annual reuse of such tests for high-stakes accountability purposes led to security problems that no doubt contributed to inflated results associated with the “Lake Wobegon effect.” Some states chose to avoid this problem by developing their own customized tests that were replaced annually. (Of course, released test questions are quite useful for results interpretation and other instructional uses.) With the more recent requirements for and emphasis on states’ unique content standards, customized tests offer the added advantage of better alignment of tests to content standards. NCLB allows states to use off-the-shelf products, but recognizes their shortcoming with respect to coverage of standards by suggesting that they need to be augmented with items filling gaps in coverage. Thus, NCLB, in allowing states to use instruments that are not changed over time, has ignored the issue of security that led to a major scandal in the testing industry a few years ago.

Commentary

Obviously, education is always an important, safe, and fruitful political issue. Everybody can relate to it, and it is easy to find fault with current levels of student performance. This has increasingly become the case in recent years with the emphasis in reporting on percentages of students at various performance levels, as opposed to normative results. With dissatisfaction with current performance a reasonable position to take, finger pointing at educators naturally results. Throughout the drafting and revising of the House and Senate bills, distrust of educators was considerable. Conference committee members and their education advisors received a great deal of information about problems with the assessment and accountability provisions, but this input was treated as educator excuse making. Contributing to this situation was legislator distrust of the USDOE as well. Remember that the requirements of the new legislation for the first few years are not drastically different from the requirements of the 1994 Title I authorization. Lawmakers felt that the USDOE did not start cracking down on non-compliant states soon enough. The politicians’ solution to the lack of progress in education was a simple business solution: mandate improvement and throw a lot of money at the problem (much of which is directed at overmeasuring the extent of the problem). Of course, this solution, attempted at various levels in the past, has not worked. In this case, the NCLB accountability requirements guarantee that it will not work. When “adequate” progress is not made on a large scale, will educators be blamed? Will lack of progress provide the impetus for a move to vouchers and charter schools? Unfortunately, the lawmakers have not uncovered the secret to raising performance levels, certainly not to the extent required by NCLB. Mandating it is not enough. Changes in governance have not worked either.

The No Child Left Behind Act of 2001 was passed in the aftermath of the 9/11 tragedies. Passing the bill was important at that time. Given President Bush’s high approval rating associated with his response to the terrorist attacks that touched all our lives, and given the need to demonstrate an ability to tend to other domestic issues, the No Child Left Behind Act met with little resistance. Who could argue with its intent and the “just-do-it” mentality associated with it? Even educators, though recognizing the serious flaws in the requirements, chose to take the position that NCLB presents us with an opportunity for school improvement. Thus, they generally have rallied to become compliant as best they can. This is the proper response, because NCLB is the law of the land. Now, with some regulations and guidelines and a year of great effort behind us, the NCLB flaws are staring us in the face. It is just a matter of time before the assessment and accountability requirements change or fall apart.

It is especially important at this time that educators keep their focus on the primary goal of their profession: doing what’s best to educate young people. Even though 100 percent proficiency is generally impossible, compliance with the law in all other ways can still be accomplished by programs that not only do no harm to students but also represent good practice based on what we have learned over the years. This is not a time to succumb to some of the pitfalls that might be associated with NCLB. The unreasonable requirements will eventually have to change.

References

1. 2001-03 conversations with education advisors working on the NCLB legislation in conference committee

2. Dillon, S. (2003). States cut test standards to avoid sanctions. New York Times, May 22, 2003.

3. Shepard, L. (1989). Why we need better assessments. Educational Leadership, 46(7), 4-9.

4. National Research Council. (1989). Everybody counts: a report to the nation on the future of mathematics education. Washington, DC: National Academy Press.

5. National Council of Teachers of Mathematics. (1991). Mathematics assessment: myths, models, good questions, and practical suggestions. Reston, VA: The Council.

© Copyright 2003, Measured Progress, Inc. All rights reserved.