In many busy homes around the world, it’s not uncommon for kids to shout out commands to Apple’s Siri or Amazon’s Alexa. They might make a game of asking a Voice Activated Personal Assistant (VAPA) the time, or request a popular song. While this may seem like a mundane part of family life, there is much more going on. VAPAs continuously listen to, record, and process sound events in a process that can be called “eavesmining,” a portmanteau of eavesdropping and data mining. This raises major concerns about privacy, surveillance, and discrimination, as the sonic traces of people’s lives are datafied and scrutinized by algorithms.
These concerns are exacerbated when applied to children. Their data accumulates over a lifetime, going far beyond anything collected about their parents, with far-reaching effects that we haven’t even begun to understand.
The adoption of VAPAs is proceeding at a breakneck pace as they expand from mobile phones and smart speakers to an ever-increasing number of products connected to the Internet. These include digital toys for children, home security systems that listen for break-ins, and smart doorbells that can pick up sidewalk conversations.
There are pressing problems with how voice data from parents, adolescents, and children is collected, stored, and analyzed. The alarm has been raised before: in 2014, privacy advocates raised concerns about how much the Amazon Echo listens to, what data it collects, and how that data feeds Amazon’s recommendation engine.
Yet despite these concerns, VAPAs and other eavesmining systems have spread exponentially. Recent market research predicts that the number of voice-activated devices will balloon to more than 8.4 billion by 2024.
More than just voices are recorded
More than verbal statements are collected: VAPAs and other eavesmining systems overhear personal characteristics of the voice that involuntarily reveal biometric and behavioural attributes such as age, gender, health, intoxication, and personality.
Information about the acoustic environment (such as a noisy apartment) or specific sound events (such as broken glass) can also be collected through “auditory scene analysis” to judge what is happening in that environment.
Eavesmining systems have a recent record of cooperating with law enforcement agencies, and have been subpoenaed for data in criminal investigations. This raises concerns about other forms of surveillance and the profiling of children and families.
For example, smart speaker data can be used to create profiles such as “noisy homes,” “disciplinary styles,” or “problem youth.” In the future, governments could use such profiles to target those who rely on social assistance or families in crisis, with potentially dire consequences.
New eavesmining systems are also being proposed as solutions to keep children safe, called “aggression detectors.” These technologies, which consist of microphone systems loaded with machine-learning software, dubiously claim they can help predict violence by listening for signs of heightened volume and emotion in voices, as well as other sounds such as glass breaking.
Aggression detectors are advertised in school safety magazines and at law enforcement conferences. They have been deployed in public spaces, hospitals, and high schools under the premise that they can pre-empt and detect mass shootings and other instances of deadly violence.
But there are serious questions about the effectiveness and reliability of these systems. One brand of detector repeatedly misinterpreted children’s vocal cues, including coughing, screaming, and cheering, as indicators of aggression. This raises the question of who is protected and who is made less secure by these designs.
Some children and youth are disproportionately harmed by this form of securitized listening, and the interests of all families are not uniformly protected or served. A recurring criticism of voice-activated technology is that it reproduces cultural and racial biases by enforcing sonic norms and misidentifying culturally diverse forms of speech related to language, accents, dialects, and slang.
We can foresee that the speech and voices of racialized children and youth will be disproportionately misinterpreted as aggressive. This disturbing prediction should come as no surprise, as it follows deep-rooted colonial and white supremacist histories that have consistently maintained “sonic colour lines.”

Sound policy

Eavesmining is a rich site of information gathering and monitoring, as the voice activity of children and families becomes a valuable source of data that is collected, monitored, stored, analyzed, and sold to thousands of third parties. These companies are profit-oriented and have few ethical obligations to children or their data.
Because there is no legal requirement to delete this data, it accumulates over a child’s lifetime and could last forever. It is not known how long-lasting and far-reaching these digital traces will be as children age, how widely the data will be shared, or how extensively it will be cross-referenced with other data. These problems have serious implications for children’s lives, both now and as they grow up.
Eavesmining poses countless threats to privacy, and raises issues of surveillance and discrimination. Individualized responses, such as information privacy education and digital literacy training, will not be effective in addressing these issues; they place an overly large responsibility on families to develop the literacy needed to deal with eavesmining in public and private spaces.
We need to consider advancing a collective framework to combat the unique risks and realities of eavesmining. Perhaps developing a set of fair listening practice principles — an auditory spin on the “fair information practice principles” — would help in assessing the platforms and processes that affect the sonic lives of children and families.