In late December 2022, a team of scientists from several US universities published a paper on wiretapping. The eavesdropping method they explore is rather unusual: the words of the person you’re talking to, played through your phone’s speaker, can be picked up by a built-in sensor known as the accelerometer. At first glance, this approach doesn’t seem to make sense: why not just intercept the audio signal itself or the data? The fact is that modern smartphone operating systems do an excellent job of protecting phone conversations, and in any case most apps don’t have permission to record sound during calls. The accelerometer, however, is freely accessible, which opens up new methods of surveillance. This is a type of side-channel attack, one that so far, fortunately, remains completely theoretical. But, over time, such research could make non-standard wiretapping a reality.
Accelerometer features
An accelerometer is a special sensor for measuring acceleration; together with another sensor, a gyroscope, it helps to detect changes in the position of the phone it resides on. Accelerometers have been built into all smartphones for more than a decade now. Among other things, they rotate the image on the screen when you turn your phone round. Sometimes they are used in games or, say, in augmented reality apps, when the image from the phone’s camera is superimposed with some virtual elements. Step-counters work by tracking phone vibrations as the user walks. And if you flip your phone to mute an incoming call, or tap the screen to wake up the device, these actions too are picked up by the accelerometer.
How can this standard yet “invisible” sensor eavesdrop on your conversations? When the other person speaks, their voice is played through the built-in speaker, causing it, and the body of the smartphone, to vibrate. It turns out that the accelerometer is sensitive enough to detect these vibrations. Although researchers have known about this for some time, the tiny size of these vibrations ruled out full-fledged wiretapping. But in recent years, the situation has changed for the worse: smartphones now boast more powerful speakers. Why? To improve the volume and sound quality when you’re watching a video, for example. A byproduct of this is better sound quality during phone calls, since they use the same speaker. The U.S. team of scientists clearly demonstrates this in their paper:
Spectrogram generated while playing the word “zero” six times:
(a) – from accelerometer data of Oneplus 3T ear speaker (older model, no stereo speakers);
(b) – from accelerometer data of Oneplus 7T ear speaker (newer model, with stereo speakers);
(c) – from accelerometer data of Oneplus 7T loud speaker (newer model, with stereo speakers).
On the left is a spectrogram from a relatively old smartphone of 2016 vintage, not equipped with powerful stereo speakers. In the center and on the right are spectrograms from the accelerometer of a more modern device. In each case, the word “zero” is played six times through the speaker. With the old smartphone, the sound is barely reflected in the acceleration data; with the new one, a pattern emerges that roughly corresponds to the played words. The best result can be seen in the graph on the right, where the device is in loudspeaker mode. But even during a normal conversation, with the phone pressed to the ear, there is enough data for analysis. It turns out that the accelerometer acts as a microphone!
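To make the idea of a spectrogram built from sensor readings more concrete, here is a minimal sketch in Python. It does not use the researchers' data or code: the trace is synthetic (six bursts of an 80Hz tone standing in for six repetitions of a word), and the frame length, tone frequency, and function names are all assumptions chosen for illustration. A naive transform is used to keep the example free of dependencies.

```python
import cmath
import math

SAMPLE_RATE_HZ = 420   # one of the sensor rates quoted in the article
FRAME_LEN = 42         # 0.1 s per frame (assumed, for illustration only)

def synthetic_trace(bursts=6, tone_hz=80.0):
    """Hypothetical stand-in for accelerometer data: tone bursts separated by silence."""
    samples = []
    for _ in range(bursts):
        # 0.2 s of "speech" vibration followed by 0.2 s of silence
        for n in range(2 * FRAME_LEN):
            samples.append(math.sin(2 * math.pi * tone_hz * n / SAMPLE_RATE_HZ))
        samples.extend([0.0] * (2 * FRAME_LEN))
    return samples

def frame_spectrum(frame):
    """Magnitude of the DFT for bins 0..N/2 (naive O(N^2) transform)."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(frame)))
        for k in range(n // 2 + 1)
    ]

def spectrogram(samples, frame_len=FRAME_LEN):
    """Split the trace into non-overlapping frames and transform each one."""
    return [
        frame_spectrum(samples[start:start + frame_len])
        for start in range(0, len(samples) - frame_len + 1, frame_len)
    ]

spec = spectrogram(synthetic_trace())
# Bin width is SAMPLE_RATE_HZ / FRAME_LEN = 10 Hz, so the 80 Hz tone lands in bin 8.
active_frames = [i for i, column in enumerate(spec) if max(column) > 1.0]
print(f"{len(spec)} frames, {len(active_frames)} with visible energy")
```

Each column of `spec` corresponds to one vertical slice of a spectrogram like those in the figure: frames that contain the tone show a clear peak, silent frames show nothing. Real sensor data would be far noisier than this idealized trace.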
Let’s pause here to evaluate the difficulty of the task the researchers set for themselves. The accelerometer may act as a microphone, but a very, very poor one. Suppose we got the user to install malware that tries to eavesdrop on phone conversations, or we built a wiretapping module into a popular game. As mentioned above, our program doesn’t have permission to directly record conversations, but it can monitor the state of the accelerometer. The number of requests to this sensor per second is limited and depends on the specific model of both the sensor and the smartphone. For example, one of the phones in the study allowed 420 requests per second, that is, 420 hertz (Hz); another allowed 520Hz. Starting with version 12, the Android operating system introduced a limit of 200Hz. This request rate, known as the sampling rate, limits the frequency range of the resulting “sound recording”: the highest frequency that can be captured is half the sampling rate at which we can receive data from the sensor. This means that at best the researchers had access to the frequency range from 1 to 260Hz.
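The "half the sampling rate" rule (the Nyquist limit) is easy to check with a few lines of Python. This is only an arithmetic sketch: the three rates are the ones quoted above, while the labels and the function name are made up for the example.

```python
def nyquist_limit_hz(sampling_rate_hz: float) -> float:
    """Highest frequency that can be represented at a given sampling rate."""
    if sampling_rate_hz <= 0:
        raise ValueError("sampling rate must be positive")
    return sampling_rate_hz / 2.0

# Sampling rates quoted in the text; the labels are illustrative only.
rates_hz = {
    "study phone A": 420,
    "study phone B": 520,
    "Android 12+ cap": 200,
}

limits_hz = {label: nyquist_limit_hz(rate) for label, rate in rates_hz.items()}

for label, limit in limits_hz.items():
    print(f"{label}: sampled at {rates_hz[label]} Hz -> usable band up to {limit:.0f} Hz")
```

The 520Hz phone gives the 260Hz ceiling mentioned above, while the Android 12 cap of 200Hz leaves only about 100Hz of usable bandwidth.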
The frequency range for voice transmission is from around 300 to 3400Hz, but what the accelerometer “overhears” is not a voice: if we try to play back this “recording” we get a murmuring noise that only remotely resembles the original sound. The researchers used machine learning to analyze these voice traces. They created a program that takes known samples of the human voice and compares them with data they captured from the accelerometer. Such training further allows a voice recording of unknown content to be deciphered with a certain margin of error.
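The general idea of learning from known samples and then labeling an unknown one can be illustrated with a toy nearest-template classifier. This is emphatically not the model from the paper, which is far more sophisticated; the feature vectors below are invented numbers (imagine the energy in three low-frequency bands), and all names are hypothetical.

```python
import math

def centroid(vectors):
    """Element-wise mean of equally sized feature vectors."""
    return [sum(values) / len(vectors) for values in zip(*vectors)]

def train(labelled_samples):
    """Build one averaged template per label from known recordings."""
    return {label: centroid(vectors) for label, vectors in labelled_samples.items()}

def classify(templates, features):
    """Return the label whose template is closest (Euclidean distance)."""
    return min(templates, key=lambda label: math.dist(templates[label], features))

# Made-up feature vectors; real features would come from sensor spectrograms.
training_data = {
    "zero": [[0.9, 0.2, 0.1], [0.8, 0.3, 0.1]],
    "one":  [[0.2, 0.9, 0.3], [0.1, 0.8, 0.4]],
}
templates = train(training_data)
unknown = [0.85, 0.25, 0.15]
print(classify(templates, unknown))
```

Here the unknown vector sits closest to the "zero" template, so that is the label returned. With noisy real-world traces the templates overlap much more, which is where the margin of error comes from.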
Spying
For researchers of wiretapping methods, this is all too familiar. The authors of the new paper refer to a host of predecessors who have shown how to obtain voice data using the seemingly most unlikely of objects. Here’s a real example of a spying technique: from a nearby building, attackers direct an invisible laser beam at the window of the room where the conversation they want to eavesdrop on is taking place. The sound waves from the voices cause the window pane to vibrate ever so slightly, and this vibration is traceable in the reflected laser beam. This data is sufficient to restore the content of a private conversation. Back in 2020, scientists from Israel showed how speech can be reconstructed from the vibrations of an ordinary light bulb. Sound waves cause small changes in its brightness, which can be detected at a distance of up to 25 meters. Accelerometer-based eavesdropping is very similar to these spying tricks, but with one important difference: the “bug” is already built into the device to be tapped.
Yes, but to what extent can the content of a conversation be recovered from accelerometer data? Although the new paper seriously improves the quality of wiretapping, the method cannot yet be called reliable. In 92% of cases, the accelerometer data made it possible to distinguish one voice from another. In 99% of cases, it was possible to correctly determine gender. Actual speech was recognized with an accuracy of 56%, meaning that almost half of the words could not be reconstructed. And the data set used in the test was extremely limited: just three people saying a number several times in succession.
What the paper did not cover was the ability to analyze the speech of the smartphone user. If we only hear the sound from the speaker, at best we have only half the conversation. When we press the phone to our ear, vibrations from our speech should also be felt by the accelerometer, but the quality is bound to be far worse than the vibrations from the speaker. This remains to be studied in more detail in new research.
Unclear future
Fortunately, the scientists were not looking to create a usable wiretapping device for the here and now. They were simply testing out new methods of privacy invasion that may one day become relevant. Such studies allow device manufacturers and software developers to proactively develop protection against theoretical threats. Incidentally, the 200Hz sampling rate limit introduced in Android 12 does not really help: the recognition accuracy in real experiments decreased, but not by much. Far greater interference comes from the smartphone user’s natural behavior during a conversation: their voice, hand movements, general moving around. The researchers were unable to reliably filter out these vibrations from the useful signal.
The most important aspect of the study was the use of the smartphone’s built-in sensor: all previous methods relied on various additional tools, but here we have out-of-the-box eavesdropping. Despite the modest practical results, this interesting study shows how such a complex device as a smartphone is full of potential data leaks. On a related note, we recently wrote about how signals from Wi-Fi modules in phones, computers, and other devices unwittingly give away their location, how robot vacuum cleaners spy on their owners, and how IP cameras like to peep where they shouldn’t.
And while such surveillance methods are unlikely to threaten the average user, it would be nice if the technology of the future were armed against all risks of spying, eavesdropping, and sneaky peeking, however small. But since these cases involve malware being installed on your smartphone, you should always have the ability to trace and block it.