On Twitter, when a simple ha won’t do, there’s always hahahaaaa, haaaahaaaa, or even hahahahahahahahahahahahaha, indicating you’ve just read the funniest thing you’ve ever seen. (Or that you’re a sarcastic talking raccoon.) These are known as stretchable or lengthened words, and now researchers from the University of Vermont have figured out just how pervasive they are on Twitter, uncovering fascinating patterns about their use.
Stretchability is a powerful linguistic device that visually punches up a written word, imparting a wide range of emotions. That goes for the gooooooaaaaaaal of a soccer announcer, a teenager’s exasperated finallyyyyy, and a surfer’s aweeeeeesome. And booooy are they popular on Twitter. Writing today in the journal PLOS One, the researchers detail how they combed through 100 billion tweets, mapping how often these words are stretched, and how far they are elongated—haha versus hahahahaaaa, for example.
Consider dude and its many formulations. “That can convey basically anything, like ‘Duuuuude, that's awful,’” says University of Vermont applied mathematician Peter Sheridan Dodds, one of the study’s coauthors. On the other hand, “Dude!” is different. “It could be excitement; it could be joy,” says Dodds.
But not everyone is down with using exclamation marks for emphasis or emotion, including yours truly. “I hate using exclamation marks because they just don't fit my personality,” I tell Dodds and his coauthor, Chris Danforth, also an applied mathematician at the University of Vermont. But I do stretch words: “I’ve found myself recently in texts to friends or messages to coworkers doing thaaanks with three As, to signify some sort of excitement and appreciation without having to use a stupid exclamation mark.”
“Just three?” asks Danforth. “That's restraint. Because two would not work. Two is like, this person doesn't know how to spell. They've made a mistake.”
All right, sooooo, we use stretchable words all the time to convey extra meaning—sadness, anger, excitement. And that can be particularly powerful on a platform like Twitter, whose inherent brevity doesn’t exactly encourage nuanced communication. Those extra letters add some oomph to a brief message, making it more attention-grabbing. “You're taking what we would think of as the dictionary text and you're turning it into something visual,” says Danforth. “It can't be ignored when you see 20 As in a row.”
To quantify this, Dodds, Danforth, and the lead author of the paper, University of Vermont computational linguist Tyler Gray, randomly selected 10 percent of all tweets sent out between 2008 and 2016, around 100 billion in all. (They have an arrangement with Twitter to obtain this data.) Gray wrote a program that searched the data for stretched words, specifically looking for repeated letters.
First, they wanted to quantify which letters were repeated, and how often. So take gooooaaaaal for example. The program “sees one G, and then it sees an O,” says Dodds. It counts up the As and Ls, too. Even if it only counts one G, it will see that the rest of the letters are highly repetitive—maybe there are 20 Os and 20 As. “So this seems like a candidate for a stretchable word,” continues Dodds.
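To make that counting step concrete, here is a minimal sketch in Python of how a program might tally runs of repeated letters. It is not Gray's actual code. The function names and the `min_run` threshold of three are assumptions made purely for illustration.

```python
from itertools import groupby


def letter_runs(word):
    """Return (letter, run_length) pairs for each run of identical letters."""
    return [(ch, len(list(group))) for ch, group in groupby(word.lower())]


def is_stretch_candidate(word, min_run=3):
    """Flag a word when any letter repeats at least min_run times in a row.

    The threshold is a hypothetical choice, not a value from the study.
    """
    return any(count >= min_run for _, count in letter_runs(word))


# One G, then a run of Os, a run of As, and one L.
print(letter_runs("gooooaaaaal"))
print(is_stretch_candidate("goal"))
```

Under this assumed threshold, thaaanks with three As counts as a candidate, while thaanks with two does not.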
The system then represents these stretchable candidates with a simple notation. If the G and the L in gooooaaaal aren’t repeated, the formula will look like g[o][a]l. Gggooooaaaallll, on the other hand, would look like [g][o][a][l], because each letter is repeated.
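The bracket notation can be sketched in a few lines of Python. This is an illustrative version that covers only the single-letter case described here. The function name `bracket_notation` is hypothetical, and the researchers' own implementation may differ.

```python
from itertools import groupby


def bracket_notation(word):
    """Collapse each run of letters, bracketing letters that repeat.

    A letter that appears once in a row is kept bare; a letter repeated
    two or more times in a row is written once inside square brackets.
    """
    parts = []
    for ch, group in groupby(word.lower()):
        run_length = len(list(group))
        parts.append(f"[{ch}]" if run_length > 1 else ch)
    return "".join(parts)


print(bracket_notation("gooooaaaal"))       # g[o][a]l
print(bracket_notation("gggooooaaaallll"))  # [g][o][a][l]
```

Two limitations of this sketch are worth noting. It treats an ordinary double letter, like the two Ls in finally, as a stretch, and it does not recognize alternating patterns such as hahaha, which contain no back-to-back repeats.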
This quantifies what the researchers call the “balance” of a stretchable word. Goooooaaaal isn’t very balanced, because the four different letters repeat at different rates. Hahahahaha, on the other hand, is highly balanced, because H and A repeat at the same frequency. Haaaaa, though, is unbalanced.
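The article does not spell out how balance is calculated, so the following Python sketch is only one plausible way to put a number on it. It assumes balance can be scored as the normalized Shannon entropy of how often each letter appears: 1.0 when every letter appears equally often, lower as one letter dominates. The function name `balance_score` is hypothetical.

```python
import math
from collections import Counter


def balance_score(word):
    """Score how evenly a word's letters are used, from 0 to 1.

    Assumption: balance is approximated by normalized Shannon entropy of
    the letter counts. This is an illustrative measure, not necessarily
    the formula used in the study.
    """
    counts = Counter(word.lower())
    if len(counts) < 2:
        return 1.0  # convention chosen here: one distinct letter is trivially balanced
    total = sum(counts.values())
    entropy = -sum((n / total) * math.log2(n / total) for n in counts.values())
    return entropy / math.log2(len(counts))


for w in ("hahahahaha", "haaaaa", "goooooaaaal"):
    print(w, round(balance_score(w), 2))
```

Under this assumption, hahahahaha scores a perfect 1.0 while haaaaa scores noticeably lower, which matches the intuition above. Because the sketch counts letters across the whole word, it would lump together a letter that appears in two separate positions, as the O does in follow.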
The researchers could then visualize the average number of repetitions per character, like in the graph above. With the various stretched spellings of the word goal on Twitter, the G repeats maybe once or twice. (Think about a soccer announcer yelling guh-guh-guh-guh-guh-oal and how quickly they’d be fired.) So here in this graph, you can see the number of characters as the vertical axis, and the repetition of specific characters as the horizontal axis. Moving from the top of the graph to the bottom, the word stretches. But if you look at G, its frequency doesn’t increase much at all as the word lengthens. You can see that the O, A, and L, by contrast, are repeated more as the word stretches.
This is because the G sound is a plosive, a consonant that is spoken by stopping the airflow in your mouth. You can’t drag it out like you can an aaaaah or ooooh. So in the case of the word goal, it’s the vowels and the final L that do the lengthening, and they tend to lengthen in lockstep with one another. “What we didn't know beforehand is that those lines are pretty linear,” says Dodds. “So if you do 140 characters or 80 characters, the balance of O, A, and L is actually pretty much the same.” That is in keeping with the classic soccer announcer cry of “Gooooooaaaaaaaaalllllll,” which is light on the Gs and heavy on the rest of the word.
Now, consider ha. Boring, unenthused, but stretchable into a galaxy of different forms, visualized in the image above—call it the Tree of Laughter. That H at top is where any tweeted “ha” begins. Branching to the left is what happens if the tweeter for some reason adds another H instead of an A. Some tweeters finally add an A to make hha, branching to the right, but at far left you can see what happens if they keep adding Hs at the beginning.
Returning to the top of the image, if we move right from the starting H, tweeters are adding an A to begin making hahahaha instead of hhhhaaaa. This is the more popular path, so the bars connecting the letters here are thicker. Going from ha to hah, for instance, is more popular than going from ha to haa. The prevailing path, as we might expect, is a nice, clean, highly balanced hahahahahaha. The aberrant haaha or hahhah is likely just a mistype.
In general, two-letter words such as ha stretch farther than regular words like finallyyyy. The words in the above trees also play out as we might expect. Fuuuuuu is a popular expression of that particular linguistic rage. “People start with F, and then they lay on the Us,” says Danforth. Same for awwwwwww.
Because stretched words can be embedded with so much extra meaning beyond the words themselves, understanding them is critical for artificial intelligences that analyze text, like chatbots. At the moment, a stretched word may be so perplexing for an AI that the program just skips over it entirely. We don’t want to have to bold or italicize words to emphasize them for the chatbot to parse—and even then, such formatting can’t replicate the range of emotions that stretched words convey.
“If we're ever going to get to a point where an AI can understand the range of communication that people actually use in a day-to-day basis, this is one of the places where it's at,” says Sam Brody, who published his own research on Twitter word lengthening in 2011, prior to joining Bloomberg's AI group as a senior research scientist. This new research, which Brody wasn’t involved in, is a step toward quantifying and translating stretched words into subtle linguistic rules that machines can understand.
Who, after all, will help save Justin Bieber from attention-hungry fans? One quirk the researchers noticed was that when Twitter users were trying to be uber-emphatic, like to attract the attention of a celebrity, they elongated everything. “There was a second kind of word,” says Dodds, “like: ‘fffffooooolllllllloooooowwwwww mmmmmmeeeeee, Justin Bieber.’ People would stretch out the F, the O, the L, or they would just stretch the whole thing out. Because there was a sense that this would be exciting to Justin.”
It probably doesn’t work. But no harm tttttrrrrrryyyyyiiiiiinnnnggggg.