Present on more than 31 million devices around the world, Alexa enables user interaction after a wake-up word (specifically, “Alexa”) activates it. Next, the Intelligent Personal Assistant (IPA) launches the requested capability or application – called skill, it either comes built-in or is installed from the Alexa Skills Store.
Checkmarx researchers built a malicious skill application capable of recording user’s speech in the background and then exfiltrating the recording, all without alerting the user.
Because of the required wake-up word, the recording would have to be performed after the activation. However, the listening session would normally end after a response is delivered to the user, to protect privacy, yet the researchers found a way to keep the session alive and to hide that from the user.
A shouldEndSession flag allows a session to stay alive for another cycle, after reading back the service’s text as a response. However, reading back the text would reveal to the user that the device is still listening.
To overcome this issue, the researchers used a re-prompt feature, which works in a similar manner, but accepts “empty re-prompts.” Thus, they could start a new listening cycle without alerting the user on the matter.
Finally, the researchers also focused on being able to accurately transcribe the voice received by the skill application. For that, they added a new slot-type to capture any single word, not limited to a defined list of words. They also built a formatted string for each possible length.
Of course, users would still be alerted on a device listening to them because the blue light on Amazon Echo lights-up when a session is alive. However, some Alexa Voice Services (AVS) vendors would embed Alexa capabilities into their devices without providing the visual indicator, and it’s also highly likely that users would not pay attention to that light.
“While the shining blue light discloses that Alexa is still listening, much of the point of an IPA device is that, unlike a smartphone or tablet, you do not have to look at it to operate it. In fact, these IPAs are made to be placed in a corner where users simply speak to a device without actively looking in its direction,” the researchers say.
As long as speech is recognized and words picked up, the malicious skill can continue to eavesdrop in the background, without the user noticing it. In case of silence, Alexa closes the session after 8 seconds, but a silence re-prompt (defined with an empty output-speech that the user cannot hear) can double the grace period to 16 seconds, the security researchers say.
Checkmarx informed Amazon on their findings and worked with the company to mitigate the risks. Specific criteria to identify (and reject) eavesdropping skills during certification were put in place, along with measures to detect both empty-reprompts and longer-than-usual sessions, and take appropriate actions in both cases.
The security researchers also published a video demonstration of how the attack works.