Adding Up: Phyllis Rosser and the Gender Politics of Standardized Math Tests

By KJ Shepherd

Between 2016 and 2017, over 1.7 million high school students took a revamped version of the SAT. On average, girls scored over 20 points lower than boys on the math section. Even before the College Board released its official data, however, the new SAT had already been accused of gender bias for underpredicting girls’ college readiness, especially in quantitative fields. In fact, calls to rectify gender bias in standardized testing are now at least thirty years old. Phyllis Rosser brought them to national attention during the 1980s, yet this bias continues to harm girls’ college, career, and financial prospects.

Although Rosser wasn’t the first to investigate why girls scored lower on standardized math tests, her work received congressional attention and provided a foundation for successful challenges to testing policies. Rosser’s efforts also highlighted the extent to which colleges trusted standardized test scores more than girls’ classroom achievements.

A sculptor by trade, Rosser first engaged with the idea of sex bias in standardized testing as a contributing editor to Ms., writing the article “Do SATs Shortchange Women?” for the magazine in 1985. Later, while directing the Equality in Testing Project and consulting for FairTest, Rosser demonstrated in The SAT Gender Gap: Identifying the Causes that “sex bias” in standardized testing described several related phenomena. Standardized tests created sex bias by underpredicting women’s collegiate performance. Educational institutions exacerbated this bias by failing to account for this underprediction when making admissions decisions. Standardized tests could also exhibit sex bias through their content and context. Test-makers disadvantaged girls by using reading comprehension passages and word problems that depicted women in stereotypical roles, and by relying on scientific and mechanical information that boys had been socially conditioned to favor. This surface-level material could hamper girls’ ability to focus on the underlying logic and reasoning skills that these tests purported to measure. High-school-aged girls were thus shortchanged by the SAT through both its design and its interpretation.

During the late 1980s, Rosser conducted an unprecedented analysis of SAT questions to demonstrate that the College Board and Educational Testing Service (ETS) had done far too little to reduce the math gender gap in their premier testing program. Working with Princeton Review founder John Katzman and sociologist James Loewen, Rosser analyzed the SAT performances of roughly 1,100 students. Girls consistently scored worse on the math portion of the SAT, even though they performed just as well as boys in their high school math courses. On 10 of the 60 math questions, boys answered correctly more often than girls by a margin of at least ten percent. This discrepancy led Rosser’s team to wonder why ETS hadn’t removed these questions during its cultural sensitivity vetting process, especially since the organization had edited the SAT’s verbal content in the early 1970s to eliminate girls’ slight advantage on that section.

Rosser’s research also played a vital role in a 1987 US House Judiciary hearing on the role gender and racial bias played in standardized testing. Rosser testified before the Subcommittee on Civil and Constitutional Rights that the millions of SATs, ACTs, and PSATs taken each year created systemic bias against women in higher education. As Rosser argued in her congressional testimony, if the SAT were truly an accurate predictor of college-bound women’s first-year grades, “girls would score 20 points higher than boys, rather than 61 points lower” (3). Girls of color were penalized even more by these tests. This chronic underprediction shut girls out of higher-ranking schools and robust scholarship competitions, eventually causing “a real dollar loss for females in later life, as they get less prestigious jobs, earn less money, and have fewer leadership opportunities” (4). A test with weak predictive power on its own thus often set boys and girls on markedly different life paths.

Rosser’s testimony resonated with Colorado Representative Patricia Schroeder, who observed that her sixteen-year-old daughter’s peers had already fallen prey to the disorienting effects of surprisingly low SAT scores set against strong math grades, wondering “if they had charmed their high school teachers, if maybe suddenly they’re not as good as they used to be” (61). For Schroeder, her daughter’s and peers’ sense of fraud compounded the damage the SAT scores did to the girls’ collegiate ambitions. Girls who dismissed their actual academic accomplishments because of the SAT’s faulty measurements may have preemptively shut themselves out of fields of study in which they had already shown promise.

The subcommittee’s Republican minority counsel, Alan Slobodin, remained dubious. In the hearing, Slobodin challenged Rosser to define exactly where test bias began: “I mean, if there’s a one point difference, would you consider that bias? How about five points?” (67). He wondered why Rosser did not object to women’s slight score advantage on the Law School Admission Test (LSAT). For Slobodin, the idea of sex bias made little sense given how drastically certain postsecondary student bodies had changed over the past decade to include more women; test scores didn’t seem to have held those women back. Yet Slobodin’s objections missed the point that standardized testing was not solely about admission to college. Test results often shaped scholarship opportunities, college options, and choices of major and career, thereby affecting lifelong earnings and networking potential.

The College Board and ETS officials interviewed by the subcommittee also countered Rosser’s argument by insisting that standardized testing opened more collegiate pathways for young women than any other method. The test-makers maintained that the SAT gender gap reflected the growing number of girls who took the test between the mid-1960s and the early 1980s. These test-takers had, on average, poorer educational opportunities than boys, and their lower SAT scores reflected their limited college readiness. As part of its formalized Sensitivity Review Process, ETS had developed “Special Review Criteria for Women’s Concerns.” By eliminating references to “women doctors” and uses of the “generic he,” for example, the SAT’s developers believed they had removed the factors within their own control that could skew girls’ test scores. Anything else, they claimed, was a matter of students’ lack of preparedness or admissions officers’ misuse of scores. By pointing to its internal review protocols, ETS deflected any responsibility for how its massive testing programs limited the educational and economic opportunities of a new generation of college-hopeful girls.

Despite the College Board’s and ETS’s claims of fairness, young women filed suit against New York’s education department during the late 1980s and changed how SAT scores factored into statewide scholarship programs. The plaintiffs in Sharif by Salahuddin v. New York State Education Department alleged that the state’s use of SAT scores as the sole determinant for its Regents and Empire Scholarships violated girls’ Title IX rights and their constitutional guarantee of equal protection. Judge John M. Walker agreed, reasoning that these violations occurred even if no discriminatory intent existed. Taken alone, the SAT did more harm than good as a tool for assessing merit.

Although women now outpace men in the number of postsecondary degrees they attain, they remain relatively underrepresented in mathematics, computer science, engineering, and physical science degrees. Women are even more marginalized in STEM-related employment. If standardized tests help block women’s access to esteemed educational institutions and prized fields of study, they also have a deleterious effect on women’s economic power.

Surely, as Rosser pointed out three decades ago, there are measures that can make the SAT and similar tests more equitable assessments. But rather than back away from tools that promote educational disparity, the American education system has increasingly bound itself to standardized testing of all sorts. Girls continue to be set on markedly different life paths than boys by standardized math tests that fail to add up.

Further Reading

Phyllis Rosser, The SAT Gender Gap: Identifying the Causes (Washington, D.C.: Center for Women Policy Studies, 1989).

U.S. House Judiciary Committee, Subcommittee on Civil and Constitutional Rights, 100th Congress, 1st Session, Sex and Race Differences on Standardized Tests (Washington, D.C.: U.S. Government Printing Office, 1989).

KJ Shepherd recently finished a PhD in History at the University of South Florida. They are a historian of standardized testing and American culture and the Social Media Editor for Lady Science.

Lady Science is an independent magazine that focuses on the history of women and gender in science, technology, and medicine and provides an accessible and inclusive platform for writing about women on the web. For more articles, information on pitching, and to subscribe to our newsletter, visit ladyscience.com.