May 21, 2001
NONE OF THE ABOVE / Second of two articles

When a Test Fails the Schools, Careers and Reputations Suffer

By JACQUES STEINBERG and DIANA B. HENRIQUES

Sitting in his cramped office in Fort Wayne, Ind., with his calculator running, John Kline became the first to suspect that a major test publisher had erred in computing the standardized test scores of thousands of his students.

As testing director for the local school system, Mr. Kline quickly alerted the company, CTB/McGraw-Hill, but it did not fully investigate his complaint at the time.

If it had, CTB would have discovered a crippling programming error in time to prevent it from upending the lives of students, parents and educators as it rippled across the nation over the first eight months of 1999. This mishap, the most far-reaching in the recent history of school testing, jolted school districts in at least six states, including New York City, where it mistakenly sent nearly 9,000 students packing off to summer school.

A post-mortem of how this error spread unimpeded for so long lays bare a basic truth of standardized testing: school districts lack the ability to uncover serious testing errors on their own, and must rely on the testing companies to do so voluntarily.

Because the testing industry has succeeded in fending off various proposals for federal oversight, the companies themselves decide what they will disclose and when.

CTB's error hit hardest in New York City, the nation's largest school system. Apart from the children, the most prominent victim may have been the city's schools chancellor, Rudy Crew. The error showed — incorrectly — that reading scores citywide had stagnated after rising for two years, raising questions about Dr. Crew's leadership. Within months, he was out of a job.

Before the mistake was discovered, Dr. Crew had been a leading advocate for using standardized tests to hold students and educators accountable. But now, as Congress is poised to vote on a presidential proposal that would sharply increase the nation's reliance on standardized testing, Dr. Crew says he has been chastened by his personal experience with the testing industry.

"The answer is not to use test scores as the sole source of information about a student's performance," he said. "These are human errors. They're going to happen again."

The issue, then, is how the test companies handle mistakes once they occur, educators say. A New York Times examination of CTB's error shows that the company had been warned repeatedly by testing officials in Indiana, New York City and other districts that their percentile scores seemed wrong. While CTB told each not to worry, the company did not mention the other complaints.

Then, after finding an error, CTB officials waited seven weeks before passing that critical information on to New York City and other school districts.

When told of these findings, Dr. Crew, who begins work next month at an education foundation in San Francisco, expressed disappointment and anger.

"What CTB did was lie," he said.

CTB officials say they did their best to uncover a deeply embedded software problem. Once the problem was located, the officials say, they did not immediately alert any school districts because they wanted to be absolutely sure of the damage it had caused.

"It was hard to see this," David M. Taggart, the company president, said. "But, and I think this speaks to the integrity of our company, we didn't stop looking."

Robert Tobias, the longtime testing director in New York City, does not accept the company's explanation, particularly in light of the early warnings that CTB received.

"They clearly did not check carefully enough," he said. "It's that simple."

Dr. Crew sees a broader problem. "The largest testing companies are guilty of what most people accuse public schools of," he said. "They've actually got a monopoly."

In Indiana: The First Indication of a Costly Error

CTB has its headquarters in a tan fortress perched atop a hill overlooking California's idyllic Monterey Peninsula. Founded in 1926 by a Los Angeles public school official and his wife, CTB grew into an industry giant after being acquired in 1965 by McGraw-Hill, a financial information and publishing company.

CTB's biggest rival, NCS Pearson, might score more student tests — about one in every two nationwide — but CTB is an industry giant, too, providing test design as well as scoring. By 1998, nine million students were taking CTB tests annually, about 40 percent of the market.

Each spring, answer sheets descend on Monterey like a steady rain, with postmarks from as far away as American military bases in Japan. Once scored, the results are shipped back to the schools in boxes full of numbers that are regarded as the definitive educational measure of children and teachers and schools.

Though CTB's work is widely praised by educators, the company did make two errors in 1998: one resulted in wrong math scores for a number of Missouri school districts; the other affected the math scores of a small number of Florida students who took the company's tests.

Still, as the 1999 testing season began, CTB was the envy of the testing industry. The company could claim nearly 20 states as customers, all under contract for several years.

Indiana was one state that believed in CTB, hiring the company to test about 320,000 students in grades 3, 6, 8 and 10. But when Mr. Kline, the testing director in Fort Wayne, got his district's scores in early 1999, he saw that they had plunged unexpectedly.

"I felt sick," he said. "How am I going to explain it to the superintendent?" Although Indiana did not use the test to promote students, as many states do, the scores gave politicians and educators a yardstick to measure student progress. Bad test scores, Mr. Kline knew, would echo through the city like a tornado warning, causing parents to worry and teachers to wonder what they had done wrong.

Before releasing the bad news, Mr. Kline called half a dozen other testing directors to see how they fared. To his surprise, each described nearly identical drops in scoring. "It was almost unbelievable how similar the patterns were," Mr. Kline recalled.

 

It did not make sense, Mr. Kline thought, for so many students in so many places to fail by nearly the same margin. So he called the testing company.

CTB officials were not particularly alarmed to hear Mr. Kline's complaint, because they knew that when test scores drop, the first and easiest reaction of school officials is to blame the test.

But CTB did agree to look into Indiana's scores, and within days it found a problem. In trying to compare Indiana students with the rest of the country, CTB had used an old formula. When the problem was fixed, most student scores rose, some by as much as 10 percentage points.

But Mr. Kline still was not satisfied. He and his colleagues told CTB that the error did not account for other large, unexplained drops. "Our feeling was, 'There is still more to it, there's something out there that no one's been able to explain,' " he said.

By now, Mr. Kline had come to suspect that the scoring drop could be traced to an arcane area of test design called equating.

This process is necessary so scores one year can be compared with those from previous years, even if different questions are used. States ask for new questions because they are worried the old questions will leak out.

CTB told Indiana that its sophisticated software program had ensured that the current test was comparable, or equated, to the previous year's test. But just to be sure, the company agreed to take another look. This time, the company said it found nothing wrong. "Our confidence in the accuracy of the equating was reconfirmed," CTB told Indiana in a memorandum on Jan. 18, 1999.

CTB even sent its president, Mr. Taggart, to Indiana in early March, to personally assure educators that the test scores were solid. In a follow-up letter, though, the company said it was developing "procedures to improve quality control in the future."

Reluctantly, Fort Wayne distributed the results to its schools, but not before Mr. Kline had ordered them stamped: "May contain inaccurate scores."

Then, with no options left, Mr. Kline gave up, assuming he had heard the last of the matter.

In New York: Unearned Tickets to Summer School

In April, about the time Mr. Kline was conceding his fight, 300,000 students in New York City's public schools were taking their reading and math tests in grades 3, 5, 6 and 7. Those tests, too, were designed by CTB. And though many of the multiple-choice questions were different from Indiana's, both school systems drew some of their questions from the same versions of the company's flagship test, Terra Nova.

But the New York City Board of Education and its chancellor, Dr. Crew, had decided to attach a much greater value to CTB's tests than Indiana did. For the first time that spring, students in grades 3 and 6 were required to pass CTB's test, or attend summer school. And if they did poorly in summer school, they would be held back.

Making such decisions based on a single test score violates the testing industry's standards, and both CTB and city school officials agree that the company advised the city against putting such a premium on its test. But the board forged ahead anyway.

Dr. Crew raised the stakes not only for children but also for school principals and superintendents of the city's 32 neighborhood school districts. He announced that, for the first time, school officials would be judged by how well their students did on the CTB tests. Those educators whose students scored poorly faced the loss of their jobs.

Dr. Crew's future was also at stake. For two years, Dr. Crew had managed to do something that had eluded his predecessor, Ramon C. Cortines: forge a warm relationship with Mayor Rudolph W. Giuliani. But that was changing. The issue: school vouchers.

Mr. Giuliani said he believed that taxpayer money should help finance private-school tuition for thousands of students who were attending failing public schools. Dr. Crew disagreed with the mayor, and he did so publicly.

So long as test scores kept going up, Dr. Crew felt that he could defend his position. If the scores were bad, Dr. Crew's own job would be on the line.

When the eagerly awaited reading scores arrived from Monterey in early May, Mr. Tobias, the New York system's testing director, was among the first to see them.

The news was not good. As in Indiana, many of the students' scores had dipped sharply from the previous year — so steeply and uniformly as to appear improbable, Mr. Tobias thought. Knowing how high the stakes were this year, Mr. Tobias directed his staff to ask CTB whether it had made a mistake. The company's response, Mr. Tobias recalls, was as swift as it was definitive: "We can't find anything wrong."

Mr. Tobias continued to press CTB, eventually calling the company himself to make an argument the company had already heard: perhaps the tests from one year to the next were not quite equal. No one told him that he was echoing Indiana's earlier suspicions.

Still, CTB held firm. "If we were not comfortable, we would have advised them not to release the data," said Mr. Taggart, CTB's president.

Unsure of what to do, Mr. Tobias held off releasing the results until June 8, the last possible day the scores could be used to make summer-school assignments.

As the date approached, Mr. Tobias finally told Dr. Crew about his doubts. Dr. Crew says he seriously considered calling the press to disavow the results. But as a national spokesman for the movement toward standardized assessment, Dr. Crew decided his credibility would be lost. He thought he would be seen as a crybaby.

Mr. Tobias concurred.

"Errors of measurement are a fact of life in this business," Mr. Tobias said in an interview. "There are times you can explain them. Other times you just bite the bullet and accept the data as they are."

And so, Dr. Crew summoned reporters to deliver the disappointing news: two years of progress in reading had apparently stalled.

The mayor said he was "very alarmed and concerned." And Dr. Crew knew he had some homework to do.

In Tennessee: State Officials Seek Review of Test

Most school districts, including New York City, gauge progress by comparing students in a particular grade with their predecessors in the same grade a year earlier. But Tennessee has long used a more sophisticated approach: it compares a student's test scores as a first grader with that same student's scores as a second grader, third grader, and so on through school.
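The difference between the two approaches can be sketched in a few lines of Python. This is only an illustration: the student IDs, grades and scores are invented, and the functions are hypothetical helpers, not Professor Sanders's statistical model, which this article does not describe.

```python
from statistics import mean

# Hypothetical records: scores[year][grade] maps student ID -> score.
scores = {
    1998: {3: {"a": 50, "b": 60}, 4: {"c": 55, "d": 65}},
    1999: {3: {"e": 48, "f": 58}, 4: {"a": 57, "b": 66}},
}


def cohort_change(scores, grade, year):
    """Compare this year's students in a grade with LAST year's
    students in the same grade (different children)."""
    this_year = mean(scores[year][grade].values())
    last_year = mean(scores[year - 1][grade].values())
    return this_year - last_year


def student_gains(scores, grade, year):
    """Compare each student with that SAME student one year and
    one grade earlier."""
    before = scores[year - 1][grade - 1]
    now = scores[year][grade]
    return {sid: now[sid] - before[sid] for sid in now if sid in before}


fourth_grade_cohort = cohort_change(scores, 4, 1999)
fourth_grade_gains = student_gains(scores, 4, 1999)
```

In this toy data the cohort comparison sets students "a" and "b" against "c" and "d", while the student-by-student comparison follows "a" and "b" from third grade into fourth.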

 

This approach was pioneered and overseen by William Sanders, a longtime professor at the University of Tennessee, who was curious about how class size and teaching styles influenced student performance.

In early May 1999, when Professor Sanders received Tennessee's scores from CTB, he knew from his own data that they could not be right, state testing officials said. The drops were much too sharp.

Again, state officials recall the company saying not to worry — the scores were accurate. But Tennessee had something that Indiana and New York City did not: a treasure trove of data on the performance of actual children going back six years or more. CTB's results broke patterns in individual students' scores that had been uninterrupted for years.
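One simple way to picture a "broken pattern" is a check of a new score against a student's own history. The sketch below is an assumption made for illustration, with an invented score history and an arbitrary tolerance; the article does not say what method Tennessee or Professor Sanders actually used.

```python
from statistics import mean, pstdev


def breaks_pattern(history, new_score, tolerance=3.0):
    """Return True if new_score lies more than `tolerance` standard
    deviations from the mean of the student's earlier scores.
    (Hypothetical rule, chosen only for illustration.)"""
    center = mean(history)
    spread = pstdev(history) or 1.0  # fall back to 1.0 if the history is flat
    return abs(new_score - center) / spread > tolerance


# Six years of invented percentile ranks for one student.
history = [62, 64, 63, 65, 64, 63]

sharp_drop = breaks_pattern(history, 52)   # far below the usual range
usual_score = breaks_pattern(history, 64)  # in line with earlier years
```

A district with only one or two years of data per student has little history to compare against, which is why a check of this kind depends on records like Tennessee's.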

Professor Sanders was so insistent that there was a problem that he told the company he would call a news conference to challenge the results, Tennessee school officials said.

Then CTB did something that it would not do in any other state: it simply raised the comparative rankings of many Tennessee students, and lowered some others, to conform with Mr. Sanders's statistical models — even though the company could find no error to justify those changes.

The company made this adjustment in late May or early June, just as it was assuring New York City that its results were correct.

CTB did not tell any of its other customers what it had done for Tennessee. CTB considers its relationship with each state or district to be confidential, even if the products that state uses are similar to others, said Mr. Taggart, the company president.

Moreover, Mr. Taggart said, CTB's researchers had not yet detected any similarity in the complaints from New York City, Tennessee, Indiana and another state, Nevada, which had contacted the company around the same time. Finding a common thread was difficult, Mr. Taggart said, because each had used a customized version of the same basic test.

But after certifying New York City's results as accurate, and altering Tennessee's results, CTB began to have its own doubts, the company now says. In June and into July, unbeknown to its customers, CTB assigned an army of researchers to investigate its results.

The Results: School Districts Cope With Falling Scores

While CTB stepped up its inquiry, its clients were dealing with the consequences of the test results they had been given.

In Tennessee, the adjusted results were not distributed to teachers and principals until late summer, too late to play their customary role in many districts' decisions on summer school or student promotion.

In Indiana, the districts' very public concerns about the accuracy of the scores led teachers and principals to be wary about how much stock, if any, to put in those numbers. And so, educators there grew reluctant to use the test results to shape their lesson plans.

Nevada had voiced similar concerns to CTB. But state education officials nonetheless moved forward, branding a handful of schools as "inadequate" based on their poor scores. One of them was Cambeiro Elementary, in the shadow of the Las Vegas strip, which was put under the supervision of a state oversight panel and awarded over $100,000 for remedial programs. School administrators felt more than a little humiliation.

"At bowling night and at church," Cenie Nelson, the school principal, said, "teachers were asked by other teachers and friends, 'Why would you want to be associated with a school not doing a good job?' "

But nowhere did CTB's scores have more impact than in New York City. Based solely on their performance on the test, Dr. Crew immediately ordered nearly 40,000 third and sixth graders to attend summer school.

"Your child must attend summer school," the superintendent in one district wrote to parents. "We feel that your child would benefit from this enriching experience."

Two weeks after releasing the test results, Dr. Crew took direct control of 43 failing schools, saying he intended to fire many of their principals. He also fired or eased out 5 of the 32 superintendents who preside over the city's neighborhood school districts, citing their failures as leaders as well as their students' test scores.

One of them was Robert Riccobono, then 54, who had brought rigorous literacy programs to one of the poorest districts in the city, No. 19 in East New York, Brooklyn. After four years as superintendent, Mr. Riccobono says, his efforts were starting to bear fruit when Dr. Crew fired him.

"Giuliani was talking tough," Mr. Riccobono said. "Crew felt the need to find victims."

The day after Dr. Crew announced his firing at a news conference broadcast live on local television, Mr. Riccobono attended his son's graduation from high school.

"I felt singled out and embarrassed," said Mr. Riccobono, who had known teachers at the school for a decade. "I was wondering where I had gone wrong."

The Inquiry: An Error Is Found Deep in the Software

While New York City was firing administrators and disrupting the summer vacations of students and teachers, CTB was closing in on evidence that would undermine those very decisions.

The company's focus was again on the equating process, which allows test scores to be compared year over year.

As it turned out, CTB — despite its assurances to Indiana and others — had done an incomplete job of reviewing test data. When a much larger sample was reviewed, a programming error surfaced.

The error had — erroneously — made the current test appear easier than the previous year's. To make the tests equal in difficulty, the computer had then compensated by making it harder for some students to do as well as they had last time. The error did not change students' right and wrong answers, but it did affect their comparative percentile scores.
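To make the mechanism concrete, here is a minimal sketch, assuming a simple additive adjustment for test difficulty and an invented norm table. CTB's actual software and formulas are not described in this article, so every name and number below is hypothetical. The sketch shows how an equating step that wrongly treats this year's test as easier leaves a raw score untouched but lowers the percentile rank.

```python
from bisect import bisect_right

# Hypothetical norm table: sorted scores of a 100-student reference group.
NORM_SCORES = list(range(1, 101))


def equate(raw_score, difficulty_offset):
    """Adjust a raw score for test difficulty (assumed additive model).
    A negative offset means the form was judged EASIER than last year's,
    so the same raw score is worth less."""
    return raw_score + difficulty_offset


def percentile_rank(score):
    """Percent of the reference group scoring at or below `score`."""
    return 100 * bisect_right(NORM_SCORES, score) / len(NORM_SCORES)


raw = 60  # the student's right and wrong answers do not change

# Correct equating: the two forms are equally hard, so no adjustment.
correct_rank = percentile_rank(equate(raw, 0))

# Faulty equating: the form is wrongly judged 5 points easier.
faulty_rank = percentile_rank(equate(raw, -5))
```

With these made-up numbers the same answer sheet lands at the 60th percentile when equated correctly and at the 55th when the test is mistaken for an easier one, enough to push a student near a cutoff onto the wrong side of it.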

On July 20, Wendy Yen, then the vice president of research for CTB, walked into the office of Mr. Taggart, the company president, and announced, "We have found something."

Mr. Taggart decided not to tell schools just yet about the problem, because, he says, he did not yet know how bad it was. "Would it be a positive impact, a negative impact, no impact?" Mr. Taggart said.

 

At the time the company found the error, New York City's students were just two weeks into a monthlong summer-school program, sweltering in a heat wave. Even classrooms with air-conditioners routinely registered 90 degrees on indoor thermometers.

Dr. Crew would later say that had he known what CTB knew — no matter how tentative — "we could have corrected the action midstream, and not put families through all that torment."

A month later, on Aug. 24, after summer school had ended, Mr. Taggart traveled to New York City to hear, in person, the city's lingering concerns about the spring results.

"We're the largest school system in the country," Dr. Crew recalls saying. "You have got to get this right with us."

Again, Mr. Taggart promised to look into the city's complaints. And again, he did not tell them what he knew about the error.

Mr. Taggart had more to say when he called Mr. Tobias, the city's testing director, on the first day of school, Sept. 9, 1999.

"We have done further analysis into your concerns about the scoring," Mr. Tobias recalls being told. "And we have found a problem."

"It's a small problem," Mr. Tobias remembers the company president saying. "We don't believe it's going to have a huge impact on your scores."

Mr. Tobias quickly did a few calculations of his own.

It seemed, at first, that 3,000 students who had been sent to summer school in June had in fact scored well enough to have spent the summer as they wished. That number eventually grew to nearly 9,000 — almost a quarter of the mandatory summer-school roster.

So much for "a small problem," Mr. Tobias thought.

But the real shock came when school officials learned what the corrected test scores meant for the entire city. Instead of reading scores stagnating over all, the citywide average had actually risen five percentage points — a substantial jump, particularly for an urban school district.

"I was feeling really horribly," Mr. Tobias said. "I realized that what was a bad story last spring really could have been a triumph for the chancellor."

Dr. Crew agreed.

"You've got the mayor and the political people saying you haven't done a damn thing," Dr. Crew said. "This was the beginning of the end for me. You can't go back and retrieve this."

The following week, Mr. Taggart flew back to New York City to tell a packed meeting of the New York City Board of Education that he was sorry. His voice shaking, Mr. Taggart said that CTB had "worked diligently" to find the problem, and had notified New York "as soon as those calculations were complete and verified."

Mr. Taggart also said it was not his company's idea to use CTB's test to decide who had to go to summer school. Even so, he said, "The test itself remains a valid measure of student performance."

William C. Thompson Jr., the president of the board, was disbelieving. "Why would I use your company after this?" he asked.

Two days later, Mr. Taggart appeared at the Indiana Board of Education, where he told a similar story and received a similar reception. It was his second trip to Indiana in six months, and he was armed with his company's third version of that state's test scores.

But this time, the corrected percentile scores virtually eliminated the unexplained drops that had troubled Mr. Kline, the Fort Wayne testing director. "It was just good to know we were right," Mr. Kline said.

Mr. Taggart did not travel to Nevada, but he called testing officials there. Careful readers of The Las Vegas Sun on Oct. 20, 1999, may have noticed the headline, "Cambeiro Elementary School Taken Off Academic Probation by State."

When CTB recalculated the results of the Nevada tests, students at Cambeiro, and another school, in Reno, were found to have exceeded the state's criteria for the label "inadequate." They were, in fact, "adequate."

The school was no longer entitled to the more than $100,000 in remedial money it had been given, but the money had already been spent. A cloud had lifted, but it was hard for the school to tell.

"You can't undo an 'inadequate,' " Ms. Nelson, the school principal, said. "It's not something that goes away."

CTB also called Tennessee, with word that it could finally explain the unexplainable dips in its rankings. Now the company could actually correct the percentile scores, rather than simply adjust them to meet what Professor Sanders thought they should have been.

The Future: Most School Districts Have Few Options

When Mr. Tobias first learned of the error, he says, he asked Mr. Taggart if any districts outside New York had been affected. Mr. Tobias was told that was proprietary information.

The press release issued in New York, written by CTB's parent company, McGraw-Hill, mentioned only New York. And a release issued the same day in Indiana referred only to Indiana.

While the company has since confirmed that in addition to Tennessee and Nevada, two other states were affected — Wisconsin and South Carolina — it has refused to identify two other school districts involved, or to say whether the districts ever alerted teachers and parents to the error.

Subsequent audits by Indiana and New York criticized CTB for lax supervision in the research department — the department that had created the error, and then was charged with finding and correcting it. The auditors wrote that managers were only "informally involved in the day-to-day work of subordinates."

Wendy Yen, the CTB official who oversaw the research department, has since left the company to work for the Educational Testing Service, which administers the SAT. Dr. Yen, through a spokesman at her current company, refused several interview requests.

But Benjamin Brown, Tennessee's testing director, said the problem went beyond research: he said CTB's greatest error was in treating each customer as if its problem was isolated, even after the company knew otherwise.

"It'd be like someone holding a barking dog and saying, 'This dog won't bite,' knowing he's bitten three neighbors in the previous month," Mr. Brown said.

Mr. Taggart said the company had since installed new quality controls to intercept such an error, and had put its employees through a customer relations course.

The New York City Board of Education voted to renew CTB's contract despite its record, although the board did negotiate financial penalties that totaled $500,000 on a multimillion-dollar agreement renewable over four years. Dr. Crew supported retaining CTB because the city had already spent years working with the company to create tests specifically designed for city students. Also, CTB's competitors had experienced their own quality control problems.

"There was no place else to go," Dr. Crew said.

Dr. Crew did not fare as well as CTB.

On Dec. 23, 1999, a board majority led by the mayor's appointees voted not to extend his contract, saying that after four years he had lost interest in his job.

Though he lamented that no one noticed the city's vastly improved scores, Dr. Crew refused to rehire the superintendents and principals whom he had fired, saying their problems went beyond bad test scores.

But New York State's education commissioner, Richard Mills, disagreed, at least in the case of Mr. Riccobono, the innovative superintendent from Brooklyn. Mr. Mills is taking steps to help Mr. Riccobono, who teaches part time at New York University, get his old job back.

"I suppose I felt vindicated," Mr. Riccobono said. "I am certain that had the correct scores been reported initially, I wouldn't have been fired."

But he says he still bears emotional scars from the experience. After his firing, he applied for at least 30 other superintendent jobs in New York State — and did not get one of them.

"Clearly standardized tests are a valid way of providing part of the picture," Mr. Riccobono said. "But they should not be the ultimate determinant of success."

New York City now uses multiple measures — teacher evaluations as well as test scores — to make summer-school assignments.

Indiana's contract with CTB expires this year, and the state is soliciting bidders. For the first time, the state is requiring bidders to list all errors made over the last two years and to promise, if hired, to disclose any new errors immediately.

The superintendent of education, Suellen Reed, has said she would consider rehiring CTB, particularly if it was the low bidder. But officials in Fort Wayne are not awaiting the outcome.

Mr. Kline and his superintendent, Thomas Fowler-Finn, have instead written their own tests for the district's students, to be administered in grades 3 through 9.

"I still believe in standardized testing," Mr. Kline said. "I just don't think the industry is ready to give us the tests we need."