Folk Etymologies

A sensitive speaker of English could consider removing the phrase “rule of thumb” from their vocabulary. The term has its roots in domestic violence: a British law stipulated that a man could beat his wife provided he used a switch no wider than his own thumb. This was the history referenced by The Elimination of Harmful Language Initiative in a list intended for use in Stanford University’s IT department, which made the rounds online just before Christmas last year. In a column helpfully labeled “Context,” the authors explained why “rule of thumb” should be nixed from readers’ usage. “Although no written record exists today,” the document reasoned, “this phrase is attributed to an old British law that allowed men to beat their wives with sticks no wider than their thumb.” A similar text, this one circulated in 2021 by a student resource center at Brandeis, references the same claim, linking the phrase to domestic violence. The Brandeis list (which, like the Stanford one, was removed from official university websites following backlash) includes a similar disclaimer that “no written record of this law exists today.” But it suggests, nevertheless, that speakers opt for “general rule” over “rule of thumb.” The Stanford list favors both “general rule” and “standard rule.”

As the hedging about the lack of records hints, there is no evidence that the expression “rule of thumb” has its roots in spousal abuse. In fact, this claim has been consistently debunked by scholars for decades. It’s a folk etymology, and an incredibly persistent one: it resurfaces with whack-a-mole insistence as fast as linguists and historians can challenge it.

“Rule of thumb” isn’t the only English idiom haunted by folk linguistic history. Nor is it the only case in which that false history is redolent of past (and present) atrocities: domestic violence, slavery, brutal class inequality. Around the same time the Stanford list came out, a viral tweet thread claimed the phrase “knocked up” has roots in slavery. Periodically, somebody (a Florida politician, for instance) insists that the word “picnic” originated with lynchings in the Jim Crow South; the Brandeis list notes somewhat less apocryphally that the term is associated with lynchings and suggests “outdoor eating” instead. Meanwhile, in the U.K., it’s often claimed that “chav,” an insult for young working-class people, isn’t merely offensive, but that it is also an acronym for “Council House and Violent,” with rudeness in its very roots. Similarly, the story goes, “pussy” hasn’t just become a slight over time; its source is a truncation of “pusillanimous,” meaning “cowardly.”

None of these sources are real. They’re folk etymologies (rumors, essentially), some relatively new and some perennial. The relationship between “knocked up” and slavery has been disproven, and seems to stem from a well-meaning misconstrual of (nevertheless severely depressing) primary sources, in particular a joke made by Davy Crockett, of all people. The joke was wildly racist, but it’s not where “knocked up” comes from. “Chav” is likely borrowed from a Romani word meaning “child,” and the “Council House and Violent” source has been identified as a “backronym” rather than a true acronym. Even if “pussy” is now lobbed as an emasculating barb, nobody uses the word “pusillanimous” except to pretend that it’s the source material for “pussy.”

Efforts to clean up language, like the Stanford and Brandeis lists, inevitably prompt conservative accusations of censorship and political correctness run amok. These conservative reactions are, of course, so overblown and so obviously hypocritical that they result in a backlash. On the left, some earnestly argue that such lists are actually good, because it’s easy to make some minor alterations to one’s language for the sake of avoiding dehumanization and cruelty. Others wave away the cleanups as stupid but ultimately uninteresting performative exercises, worthy of brief mockery but otherwise meaningless. Online, apparently inconsequential documents like the Stanford list are subjected to a combination of stark scrutiny and gleeful dissemination, becoming culture war playthings. Both aspects of this culture war drama—the conservative panic and the liberal downplay—elide the most intriguing elements of the lists themselves. While these false histories are circulated by individuals as well as by organized bodies embarking on efforts to alter speech, a closer look at these lists—without judgment or defensiveness—can shed light on our relationship to language; nightmarish pasts; and the long, entangled project of trying to perfect our speech.

The neatness of the lists, in particular, is revealing. Their organization makes explicit the logic behind the horrific-but-false folk etymology as a whole. On the lists, each word or idiom receives a three-step treatment: the vetoed word or phrase in one column, the alternatives in another, and the reason, justification, or history behind the condemned language in the third. More broadly, they are sectioned by thematic category: the Stanford list, for instance, contained headings like “Institutionalized Racism,” “Violent,” and “Gender-based.” This organization conflates three distinct types of potentially insulting speech, and that conflation is sanctioned and legitimized by most of the backlashes, as well as the responses to those backlashes. First, there’s contemporarily offensive language (for instance, slurs). Then there’s language with sinister but non-obvious roots: it might seem entirely innocent, but a speaker, learning the history of the word or phrase, might elect to not use it any longer. Finally, there’s language with an entirely fictional, but appalling, history. The Stanford and Brandeis lists split their recommendations thematically, but, for our purposes, it would be more telling to instead divide archaeologically: is the supposed linguistic transgression a visible, surface-layer one, or is it located in the linguistic past? Or is it a fiction, accessible only via a labyrinthine etymological rabbit hole?

These narratives about everyday language reveal a fear that our language will reproduce the political structures that already determine our realities and a simultaneous desire for it to reflect those structures. Often, these realities are fully visible on the very surface of everyday language, even without convoluted and untrue taxonomies. “Knocked up” doesn’t have its roots in the slave trade, but it carries a tinge of misogyny, and “chav” is, in a U.K. context, an insult that permeates deep into the membrane of the British class system, even if its roots lie in the more mundane word-formation process of borrowing. White people treated lynchings as leisure activities, even if the origins of “picnic” have nothing to do with them. Domestic violence, slavery, and classism are too often omitted or insufficiently plumbed in official histories; that’s clear in the panic around critical race theory, and the frenzy to keep these facts as far as possible from American children. Against this backdrop, there is satisfaction in seeing language as an index of historical truth. Even if the specific linguistic history at hand isn’t real, it seems to vividly express a deeper reality of injustice. Injustices are revealed to be crouching in the corners of our speech, and reasonably horrified readers pledge not to use this or that word or idiom any longer. The fact-check that follows, too late and feeble, appears hyper-literal and know-it-all-ish beside the vivid evocation of historic horrors.

This is a thought process similar to the one that’s often used to explain why women love true crime: yes, the stories are exaggerated, cherry-picked and ill-sourced. Yes, they fearmonger, encouraging racist paranoia and overpolicing. But, the argument goes, women do live with very real and reasonable fears of male violence. And so this genre legitimizes women’s fears, offering a twisted kind of comfort. If you can’t fix the problem, you can at least expose it. If that doesn’t work, then conjure a fiction or a half-truth or an exaggeration, a parable that gets your meaning across with brutal efficiency, whether that’s a true crime podcast or a false etymology.

This desire to see speech and political reality as harmonious parts of a working whole—every injustice reflected in our idioms, clear to anyone who bothers examining them—tracks with the enduring appeal of the Sapir-Whorf hypothesis, which posits that the very structure of a given language shapes, influences, and maybe even limits the cognition of its speakers.

The discipline of linguistics generally rejects the more extreme ends of the hypothesis, favoring a milder position known as “linguistic relativism” over linguistic determinism. This determinism is embodied in the work of linguist Benjamin Lee Whorf, who posited (absurdly) that the Hopi people, based on their grammar, essentially lacked a concept of time. As far as linguists can tell today, it’s not that language doesn’t impact perception (anybody who has ever given a moment’s consideration to literary style, among other things, could tell you that), but rather that it does so in ways that are relatively subtle and non-dramatic. It’s not that Russian speakers’ eyes see more shades of blue than English speakers’ do, but that, over time, their richer vocabulary for that segment of the color wheel enables better discrimination between those hues.

Still, the Sapir-Whorf hypothesis, in its more extreme interpretations, flourishes in the popular imagination. There seems to be a desire or maybe an inevitable tendency to see language as deterministic, as tracking and shaping reality. The plot twist of the film Arrival, perhaps the most prominent linguistics-focused pop culture product of recent history, hinges on linguistic determinism: a new language imbues its speaker with cognitive and emotional superpowers, renovating their very cognition through the passive presence of its lexis and syntax. Whorfianism has even cropped up in self-help content: in Forbes, readers are exhorted to weaponize something called “Leadership Whorfianism.”

If language can shape and limit thought, these cultural products suggest, it can also be the key to revolutionizing our minds and social systems. Unraveling the past is then a simple exercise in historical linguistics: bring the roots of our speech out into the light, and replace the flawed remnants of the linguistic past with better alternatives. In projects like the Stanford and Brandeis lists, those flawed remnants are disproportionately figurative speech, which by definition is entangled and complex, relying upon convention and association, introducing more stops on the route between sign and signified. Meanwhile, the suggested alternatives tend to reject idiom, metaphor, and polysemy. Even if a figurative expression doesn’t in actuality have its roots in slavery, misogyny or classism, it is suspect by virtue of its figurativeness. Language is the problem; language is the solution.

Linguistic determinism is not new. Its appeal seems transhistorical—it thrived, under other names, for hundreds of years before Whorf’s hypothesis about Hopi perceptions of time. Language has always been the problem, and the solution, too. “The dream of a perfect language,” wrote Umberto Eco, “has always been invoked as a solution to religious or political strife.”

Seventeenth-century England offers us the strangest and most elaborated examples. In this period, “natural language,” with its shifting meanings and its arbitrariness, its fanciful metaphors, seemed at fault for the unrest and violence of a civil war. Thomas Hobbes’ reasoning in the 1651 text Leviathan rhymes peculiarly with contemporary suspicion of idiom and figurative language. Hobbes’ vision was for a scientifically precise, socially useful tongue, which would have each utterance convey just what it appears to convey in perpetuity. Leviathan condemns “inconstancy of signification” and metaphor as “abuses of language.” (It should be noted that removing metaphor from speech is harder than it might sound, and in fact, Hobbes himself uses plenty of it in Leviathan. Figurative language is such a common source of semantic change that it lurks in speech we think of as quite literal.)

He detested paradiastole, the rhetorical art of substituting a more positive-sounding euphemism for a distasteful word. To Hobbes, this discipline was politically dangerous, a way to manipulate and incite. What ties together all the unproven and disproven folk etymologies is also a distaste for euphemism, albeit an overactive one that churns out sinister underlying truth where none lies—a kind of allergic reaction, in which sophisticated defenses against danger turn on something benign.

Others in Hobbes’ period went further. Some of the most ambitious projects sought to abolish semiotics entirely, thus rendering language entirely immune to euphemism and manipulation. This semiotic dissolution had been the goal of generations seeking to recreate the language of Adam, enamored of what Eco calls the “myth of a language that followed the contours of the world.” At the moment when Adam named God’s creatures, it was thought, each linguistic sign existed in perfect and non-arbitrary relation with its referent. In John Wilkins’ 1668 “Essay towards a Real Character and a Philosophical Language,” this striving for prelapsarian speech collided with Hobbesian distaste for euphemism, all packaged in the supremely unruffled assuredness of the scientific revolution. Wilkins painstakingly lays out a language and accompanying writing system intended to map logically onto all of creation by assigning clearly defined syllabic units or visual markings to scientifically delineated categories. As Jorge Luis Borges put it, in Wilkins’ language, “each word defines itself,” since any concept can be expressed by combining these well-defined markings or syllables in predictable ways. The problem of “inconstant signification” disappears in Wilkins’ system, thereby eliminating the possibility that a piece of language might conceal its past. Language has no past, and it has no power to conceal. It wears its genesis in its form, and its form is its meaning.

For me, looking at Wilkins’ project induces a visceral blend of admiration and pity: admiration at his idealism, pity at his shortsightedness. His confidence, in both the practical possibility of the project and the urgency with which it is needed, is evident in the intricacies of his fold-out charts, diagrams and engravings. Trying, after decades of upheaval, to reflect the organization of the world in language, he failed to anticipate the waves of historical and technological upheavals that would follow, rendering his perfectly organized vocabulary archaic. John Wilkins did not account for the concept of a telephone or Darwinian evolution or a true crime podcast, or for the ways that the entire organization of reality might seem to form and re-form. He also did not anticipate the endless conveyor belt of the semantic treadmill, nor did he consider the sheer arbitrariness of his own perspective as a white European clergyman and speaker of English within the vast, multivariegated world.

Connotation, metaphor and metonymy stick like gum to the soles of language. Language wants complexity. That was true in Wilkins’ day, no matter how impressive his attempts to hold language fast, and it’s true now. This is not a nihilistic thing to say. On the contrary, it’s depressing and untruthful to think that a justice-oriented language should be a sanded-down, streamlined one. It’s more realistic and intellectually honest (and exciting) to instead see the palimpsest quality of language, its gummy shoe-sole-ness, as a vengeance of the silenced. One need only look at the complexity of the creole languages formed by enslaved people, the revival of Yiddish following the Holocaust, or the creation of dictionaries for Indigenous languages so that they can be spoken, reinterpreted, and re-complicated by new generations.

Consider the creole languages of the African diaspora. A first generation was kidnapped and enslaved alongside others with whom they did not share a common language. These survivors of the Middle Passage had no choice but to piece together contact languages. We can’t know how most of these individuals felt, subjected together to the most extreme and bizarre cruelties but in many cases armed with just a shaky, makeshift tool to talk about them. But we know that the next generations, born into these impromptu linguistic spaces, lived inside them and built them out, into fully-fledged languages still spoken. On examination, these creoles reveal themselves to contain the metamorphosed materials of European and African and Indigenous languages.

Or else, consider the revivals of Yiddish following the Holocaust. A world was wiped out. The presentist gaze reduces this world, its hundreds of years of language and culture and survival, to a bleak but naive prelude preceding inevitable tragedy. Dead people speaking a dead language, all mere doomed pawns, perfect silent silhouettes in the drama of history to be killed or silently saved. Fewer Jews remain alive today than before World War II, but among them, some are recovering the language of their ancestors’ lost world, and, in doing so, remembering those ancestors as more than mere tragic bodies.

Consider America’s Indigenous language revivals. The status of America’s Indigenous languages varies enormously, but, in some cases, only a few elderly speakers remain. Residential schools instituted sustained and systematic efforts to quash entire ways of speaking, and punished students for using their native tongues, in twisted acknowledgment of the truth that languages contain the histories of their speakers. Now, elders, activists, Indigenous linguists, and young people (including those whose own parents may not speak these near-extinct languages) are collaboratively crafting dictionaries and curricula. New generations are speaking ancestral languages out loud in person and online, sometimes to audiences who have never heard them spoken.

What I mean to say, ultimately, is that “chav” is an ugly word, but its real etymology is a living record of the linguistic heritage of generations of Romani people in Britain, conversing in a language that was long viewed with suspicion, derision, and ignorance—which, despite all this, in a small way edged into the speech of the wider population, and stuck around.