TikTok LLM

TikTok’s opaque censorship regime has led to the development of euphemisms now adopted offline too, but the language acquisition its users are modeling has precedent.

A grinning young couple points upward toward a caption reading “our simple seggs aftercare routine.” They playact the steps: shower, snuggle, bed. Scroll: next video, a woman in sunglasses tells us, “once again in Central Florida, Yahtzees are out in numbers protesting.” Behind her, green-screened, a photo of men in red shirts and black masks, holding flags emblazoned with swastikas. TikTok is home to a mirror-lexicon, a distinct vocabulary that has arisen in response to the platform’s intense and often inscrutable censorship. In this newly conventionalized glossary of euphemisms, sex becomes seggs, and Nazis are yahtzees. Kill becomes unalive. Sexual assault is S.A. Porn is corn, or the corn emoji, and rape is grape. “Boycott MAC! They support unaliving watermelon people,” read a recent TikTok comment, screenshotted and mocked elsewhere on the internet. What this means is: “Boycott MAC! They support killing Palestinians.”

In linguistics, this would be called a “replacement vocabulary,” though others have dubbed it “algospeak”—a reference to the peculiar way in which language develops on algorithmically-centered platforms. TikTok is extremely coy about what they choose to censor. Their community guidelines are vague: “TikTok provides content that ranges from very family friendly to more mature. Given the diversity of our global community, developmental and cultural considerations inform our approach to potentially sensitive and more mature content that may be considered offensive by some.” The guidelines go on to suggest that TikTok will censor or limit videos touching on certain “potentially sensitive and more mature” themes, including “Sexually Suggestive Content” and “Shocking and Graphic content.” TikTok is equally opaque about the mechanisms they use to censor content. In more extreme cases, they’ve removed videos that include disallowed words. Sometimes they’ll allow content to remain online, only to restrict its reach by suppressing its ranking in the algorithm or “shadowbanning” the creator entirely.

A good deal of evidence suggests that TikTok censors more or less as it pleases, and seemingly has little appetite for developing the kinds of universal community guidelines that govern most other social media platforms. Some content—videos critical of the Chinese government, for instance—will almost certainly be censored. Censorship also seems to function on a per-user level, with nonwhite and visibly disabled creators facing worse consequences for violating guidelines (to the extent that it’s even possible to violate unspoken, inconsistent guidelines).

This panopticonish murkiness has produced a micro-folklore, a communal culture of guesswork as creators try to suss out the motivations of a shadowy algorithm, along with an accompanying array of taboos and rituals and euphemisms deployed to ward off negative consequences. Each creator has a different risk tolerance when it comes to posting potentially “sensitive and more mature” content and a different set of strategies for navigating that risk: seek a less inflammatory synonym? Speak normally but leave risky words out of your written caption? Or, perhaps, use part of the mirror-lexicon: seggs or unalive or yahtzee are far less risky ways of getting your point across, without sacrificing linguistic clarity.

Inevitably, TikTok-speak has crept outside of the censored arena, spreading into the wider internet and even offline. “When was the last time you had a good, rough, out-of-this-world seggs?” asks one tweet (I urge you not to search “seggs” on Twitter, by the way). In response to a tweet asking people to share little-known Black history facts, someone has written “Martin Luther King Jr. tried to unalive himself twice before the age of 13.” These euphemisms have even left the bounds of social media entirely—the copy for one new book, seemingly quoting the text, teases “sometimes life gives you choices. Therapy or unaliving spree. Therapy was too expensive.” Teachers fret about students using TikTok euphemisms. “Student wrote ‘Un-alive’ in an email to me,” worries one Reddit user. “I keep trying to figure out if it means what I think it means: she really truly thinks that nobody is supposed to use the word ‘dead’ in real life??” In other words, these euphemisms are no longer really algospeak, in the sense that they are in many cases not being driven by the TikTok algorithm. They have become genuine, voluntarily-chosen real-world euphemisms, driven by real-world linguistic taboos.

Linguistic taboos have odd and counterintuitive effects on language, expanding it even as they draw limits around what speakers can and cannot say. As the linguistic anthropologists Luke Fleming and Michael Lempert observe, “Efforts to proscribe speech may be justified variously, by appeal to religious dictates, state policy, or etiquette. They may be conventionalized and institutionalized, policed and punished in myriad ways. But a familiar irony haunts all these efforts: proscription is, in a word, productive.” In other words, taboos drive the creation of new words. When one route to expression is blocked, speakers build alternative paths to get to the same place. We often call these paths euphemism. Sometimes, euphemisms become tainted too. They require speedy replacement, bringing new vocabulary to speakers at a much higher rate than would occur without the taboo in play.

Consider, for instance, so-called avoidance speech, also known as the “mother-in-law taboo.” In a wide range of unrelated languages—most famously Kambaata, spoken in parts of Ethiopia, but also in, for instance, certain Bantu languages as well as some spoken in Australia—avoidance speech limits how sons- or daughters-in-law address the parents of their wife or husband. There is a great deal of diversity among types of avoidance speech: in some varieties, a man does not address his mother-in-law directly, or a woman does not address her father-in-law directly. In other cases, the taboo concerns the use of the in-law’s name—including (at times) words or syllables phonetically similar to that name. This is the case, for instance, in the language Datooga, spoken in parts of Tanzania. In Datooga, many women avoid words that sound similar to the names of their in-laws. In these cases, a relatively large amount of vocabulary becomes impolite for the speaker. This in turn prompts speakers to employ a replacement register. The linguist Alice Mitchell offers the following example:

... if a woman’s father-in-law is called Gidabasooda, a name which consists of the masculine name prefix gida and the common noun basóodà, ‘lake’, she will refrain from ever saying this name or the word basóodà.

But, Mitchell goes on to explain, this does not lead to a simple excising of terms from the speaker’s vocabulary. “In place of [the avoided words], she will say heywándà,” Mitchell writes. This term, heywándà, does not appear in Datooga vocabulary except in avoidance speech. But, because it is a conventionalized word within the avoidance register, it is “unlikely to impede communication.”

Avoidance speech offers an excellent example of the productive nature of taboo. When it becomes advisable to avoid a given set of vocabulary items, language expands rather than shrinks. For an English speaker unfamiliar with avoidance speech, the practice might initially sound confusing or intricate—yet these complex systems of taboo, avoidance, and euphemism are in some ways linguistic universals. An English speaker experiences a milder version of avoidance speech when they use terms like “Mr.” and “Mrs.” and “Ma’am” in place of names. These are not the only ways in which English speakers can bear witness to the productive nature of taboo. Consider the rate at which words for concepts like “toilet” are replaced and euphemized: ladies’ room, lavatory, privy, W.C., washroom. “Toilet” itself began as a euphemism, originally referring to a piece of cloth covering a dressing table. In contemporary American English, the word “toilet” is a little too direct, slightly too redolent of the bodily acts it implies—speakers are more likely to say they need to “use the bathroom” (my grandmother preferred “the john”). Words, when associated with impolite concepts, pejorate easily. Euphemisms imported to replace a rude word soon become rude themselves. Euphemism gives way to euphemism; proscription produces productivity.

This principle is easy enough to see on TikTok, and in the increasing number of speech scenarios where TikTok mirror-speak now appears. But while TikTok’s censorship is linguistically productive in that it has created new words or new senses of words, this productiveness differs from the typical politeness-driven process around euphemism. The transformation of “toilet” to “restroom” was organic, a collective decision to change how we speak about this particular room. The TikTok version, on the other hand, is top-down. It is about serving the needs of a censor whose rules and motives are unclear. Thus, the parallel vocabulary emerges out of a hermeneutic process that involves speculating about, and trying to understand, the unstated preferences of the corporation. A TikToker can’t simply adhere to the rules of the platform (which are unknowable)—they have to intuit the thought processes that underlie those rules. In doing so, the TikToker reinforces these unspoken rules. By carefully acting according to what they believe the platform desires, the speaker creates and articulates those corporate desires.

Another way to think about this is that the TikToker (or at least the TikToker who employs the replacement vocabulary of seggs and unalive and corn) does not simply speak via TikTok the platform. They speak on behalf of and in response to TikTok the company. In this way, TikTok differs immensely from platforms like Twitter and Facebook, which are steeped in Silicon Valley libertarianism. While these platforms mold, suppress, and promote speech in their own ways, their methods of doing so are deliberately underemphasized and invisible; overt censorship has no place in their self-presentation. In Mark Zuckerberg’s imagining, Facebook is “the digital equivalent of a town square”—a flat, neutral public gathering space unmediated by private owners. On TikTok, the company is front and center, always evoked by speakers’ own self-censorship.

Even off TikTok, TikTok asserts its presence through the assimilation of its replacement vocabulary. Both taboo utterances themselves, and the euphemisms used to circumvent those utterances, can evoke and recall a potentially offended party, whether or not the addressee is actually present. “Strong performatives usher into existence not just actions, like blasphemy, but addressees,” observe Fleming and Lempert. Similarly, when speakers deploy TikTok euphemisms like unalive outside of the platform, they summon into being their addressee and the hierarchical frame by which taboos are enforced: TikTok itself.

If TikTok is the structure that enforces the taboos, who is the ultimate addressee of TikTok replacement vocabulary? According to TikTok’s own rhetoric, it is the child—not any specific child, but the figure of the ideal, innocent child in need of linguistic protection. The figure of the youth is reiterated over and over again in TikTok’s official policies and justifications for their censoriousness: “We allow a range of content on our platform, but also recognize that not all of it may be suitable for younger audiences. We restrict content that may not be suitable so that it is only viewed by adults (18 years and older),” the Community Guidelines explain, going on to reiterate that “Youth safety is our priority.” Indeed, the very notion of “mature” themes implicitly invites us to imagine the child as the beneficiary of the platform’s rules, even when youth is not specifically invoked.

While these policies suggest that children are the passive audience that must be protected, they also serve to infantilize creators. By this, I don’t mean merely that creators are shielded from X-rated concepts like “death,” “Palestine,” and “sex.” Rather, they are engaging in a language-acquisition process not dissimilar to the one young children undergo. The growing discipline of parsing the platform’s “desires” resembles the way children learn to engage in polite conversation without causing offense or invoking censoriousness from adult authorities. English-speaking adults might instruct children to say please, or to avoid profanity—but for the most part, the pragmatic elements of a child’s linguistic education are unspoken and unwritten. They are learned by trial and error and observation, rather than by instruction and explanation. After all, these pragmatic principles are largely unconscious for adult speakers as well as for children.

Eventually, most children internalize the unspoken rules of polite speech, and come to reinforce, shape, and shift them in their own speech communities. Just as a child learning to speak for the first time comes not only to internalize unspoken norms but also to help create the next iteration of those norms, so too does the TikTok creator. The use of the TikTok replacement vocabulary elsewhere on the internet reflects the degree to which these norms are internalized by users, whether consciously (to help content spread smoothly across platforms, for instance) or unconsciously, in the manner of most of our linguistic habits.

At the same time, a competing regime of euphemism makes real children, described in third-person reporting, into so many small adults. We can bear witness to a jarring reversal of the infantile and the adult.

As one variety of infantilization occurs online, much traditional media avoids characterizing actual children as childlike—a tendency on full display in current coverage of Gaza. In fact, this particular system of proscription is so conventionalized and widely acknowledged that The Onion, at the end of 2023, published a satirical listicle titled “Every Word Besides ‘Children’ Used To Describe Palestinians Under 18.” In January, The Intercept completed an analysis of coverage of Israel/Palestine in three major U.S. papers, and noted that “Despite Israel’s war on Gaza being perhaps the deadliest war for children — almost entirely Palestinian — in modern history, there is scant mention of the word ‘children’ and related terms in the headlines of articles surveyed.” TikTokers, by contrast, make clumsy but sincere attempts to counter this dominant media framing by highlighting the plight of these children robbed of their childhoods. In a recent interview, Mitt Romney mused that “Some wonder why there was such overwhelming support for us to shut down potentially TikTok or other entities of that nature. If you look at the postings on TikTok and the number of mentions of Palestinians, relative to other social media sites—it’s overwhelmingly so among TikTok broadcasts.”

The American creators who substitute the watermelon emoji for the word “Palestinians” are, it’s true, complicit in their own infantilization. Yet they use this childlike replacement vocabulary to crack an inscrutable system of censorship, to recoup denied childhoods. Meanwhile, American efforts to ban TikTok at times hinge on the belief that the platform enforces the wrong type of censorship—allowing the wrong type of sympathy for the wrong type of child. Through this system of taboo and euphemism, social-media creators are made to seem—and to think and feel—younger than they truly are, like children navigating the capriciousness of language for the first time.