You probably assume that someone can only see what's on your computer screen by looking at it. But a team of researchers has found that they can glean a surprising amount of information about what a monitor displays by listening to and analyzing the unintended, ultrasonic sounds it emits.
The technique, presented at the Crypto 2018 conference in Santa Barbara on Tuesday, could allow an attacker to initiate all sorts of stealthy surveillance by analyzing livestreams or recordings taken near a screen—say from a VoIP call or video chat. From there, the attacker could extract information about what content was on the monitor based on acoustic leakage. And though distance degrades the signal, especially when using low-quality microphones, the researchers could still extract monitor emanations from recordings taken as far as 30 feet away in some cases.
"I think there's a lesson here about being attuned to the unexpected in our physical environment and understanding the physical mechanisms that are behind these gadgets that we use," says Eran Tromer, a cryptography and systems security researcher at Tel Aviv University and Columbia University, who participated in the research. The acoustic leaks are "a phenomenon that in this case was not intended by the designers, but it's there and therefore forms a security vulnerability."
The attack is possible because of what's known as a "physical side channel," data exposure that comes not from a software bug, but from inadvertent interactions that leak information between a computer's hardware and the data it processes. In the case of the monitor investigation, the researchers—who also include Daniel Genkin of University of Michigan, Mihir Pattani of University of Pennsylvania, and Roei Schuster of Tel Aviv University and Cornell Tech—found that the power supply boards in many screens emit a high-pitched or inaudible whine as they work to modulate current. That whine changes based on varying power demands from a screen's content-rendering processor. This connection between user data and the physical system creates an unforeseen opportunity for snooping.
"One day I happened to be browsing a particularly boring legal agreement with many lines of proverbial small print," Tromer says. "It was too small, so I zoomed in, and then I realized that something in the ambient noise in the room changed. So I zoomed back out and the sound changed back. After a while I realized that something about the periodicity of the image was affecting the periodicity of the sound."
The researchers tested dozens of LCD monitors in a variety of different sizes, and found acoustic emanations of some sort in all of them. The test models were made as early as 2003 and as recently as 2017, and came from virtually all leading manufacturers.
All electronics whir and whine, but monitors specifically produce a type of acoustic emanation that proves particularly useful for an attacker. "The thing about this one is that it’s at a high frequency, and therefore it can bear much more modulated information on top of it," Schuster says. "And it is indeed modulated by something sensitive, in this case the screen information."
Having confirmed those ultrasonic whines, the researchers next tried to extract information from them. They built a program that generated different patterns of alternating black and white lines or chunks, then made audio recordings as it cycled through them. Once they had a solid base of data, they moved to taking measurements while displaying popular websites, Google Hangouts, and human faces, to see if they could differentiate between them in the recordings.
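That pattern-generation step can be sketched roughly as follows. This is a hypothetical reconstruction, not the researchers' actual code, and the stripe periods are illustrative:

```python
import numpy as np

def zebra_pattern(width, height, period):
    """Build an image of alternating black and white horizontal stripes,
    each `period` pixels tall, as a 2D uint8 array (0 = black, 255 = white)."""
    stripes = (np.arange(height) // period) % 2   # 0,0,...,1,1,... per stripe
    return np.repeat(stripes[:, None], width, axis=1).astype(np.uint8) * 255

# Cycling through different stripe periods while recording near the monitor
# varies the rendering workload, and with it the power supply's whine.
patterns = [zebra_pattern(1920, 1080, p) for p in (1, 2, 4, 8, 16)]
```

Displaying each pattern for a few seconds while recording gives labeled audio clips tied to known screen content, which is the raw material for the training step described next.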
The group fed all of this information into machine learning algorithms as training data, and began generating increasingly accurate translations of what was on the screen based on the inaudible emanations captured in recordings. On some zebra patterns and websites, the researchers had a 90 to 100 percent success rate. The researchers even started to notice that their system could sometimes extract meaningful data from recordings of screens their machine learning model had never encountered before.
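The classification stage can be illustrated with a toy sketch. The snippet below is not the researchers' pipeline: it substitutes synthetic tones for real recordings and a simple logistic-regression classifier for their models, but it shows the overall train-then-classify shape, using each clip's magnitude spectrum as the feature vector:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

def synthetic_clip(freq, n=2048, sr=48000):
    """Stand-in for a recording: a noisy tone at `freq` Hz, playing the
    role of the power-supply whine modulated by one on-screen pattern."""
    t = np.arange(n) / sr
    return np.sin(2 * np.pi * freq * t) + 0.3 * rng.standard_normal(n)

# Three hypothetical on-screen patterns, each tagged by a distinct tone.
freqs = [5000, 7000, 9000]
X = np.array([np.abs(np.fft.rfft(synthetic_clip(f)))
              for f in freqs for _ in range(60)])
y = np.repeat(np.arange(len(freqs)), 60)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)
clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
print(f"held-out accuracy: {clf.score(X_te, y_te):.2f}")
```

In the actual attack, the feature rows would come from recordings of real monitors and the labels from the content shown on screen; the models involved were more capable than this one, but the structure of labeled training followed by classification is the same.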
"Even if an attacker can't train on a specific monitor model, there's still a very good chance that the attack will work anyway," Schuster says.
The group then expanded its work, training the system to decipher letters and words onscreen. While a much more challenging task—words don't follow reliable visual patterns like the layout of a website—the researchers could generate reliable results for words in large font. As Genkin notes, black words on a white screen are similar in many ways to zebra stripes, and while there are countless word combinations, there are still only 26 letters in the Roman alphabet for the system to learn.
The researchers even realized that they could detect what someone typed on a smartphone's onscreen keyboard with some accuracy. Typically, digital keyboards are seen as safer than mechanical keyboards, which can give away what someone's typing with their acoustic emanations. As it turns out, on-screen keyboards aren't immune to these acoustic side-channel attacks either.
Though the researchers used high-quality studio microphones for some of their experiments, they focused primarily on consumer-grade microphones like those found in webcams and smartphones. They found those to be totally adequate for extracting the acoustic emanations of a screen. If an attacker wanted to surveil the screen of someone she was video chatting with, for instance, she could simply record the audio coming through the target's microphone.
In another scenario, like an interview, an attacker could put their smartphone on a table or chair next to them and use it to record room noise while their interviewer looked at a screen turned away from the attacker. The researchers also note that the microphones in smart assistant devices can pick up monitor emanations. So if you keep one of those gadgets near one of your screens, the snippets of audio the smart assistant sends to its cloud processing platform likely contain emanations from the monitor. And since acoustic leakage from screens is mostly ultrasonic, audible noise like loud music or talking doesn't interfere with a microphone's ability to pick it up.
The researchers say that this speaks to the larger challenge of mitigating these attacks. It's not practical to flood most spaces with ultrasonic noise that would mask screen emanations. Manufacturers could better shield electronic components inside monitors, but that would add cost. Another approach would be to develop software countermeasures that specifically manipulate the information a monitor is processing, making it harder to discern. But you would need to embed those measures in every application, which the researchers concede is likely not practical. At the very least, though, it might be worth considering for browsers or heavily used video chat programs.
For a hacker, using this type of acoustic screen attack would obviously be much more complicated and labor intensive than phishing or infecting a computer with malware. But the researchers say they were surprised by the accuracy they could achieve, and a motivated attacker could potentially refine their machine learning techniques much further. With so many screens unintentionally leaking these signals, the world is a playground for an attacker skilled and motivated enough to try.