Recently, the military coup government in Myanmar added serious allegations of corruption to a set of existing spurious cases against Burmese leader Aung San Suu Kyi. These new charges build on the statements of a prominent detained politician, first released in a March video that many in Myanmar suspected of being a deepfake.
In the video, the political prisoner's voice and face appear distorted and unnatural as he makes a detailed claim about providing gold and cash to Aung San Suu Kyi. Social media users and journalists in Myanmar immediately questioned whether the statement was real. The incident illustrates a problem that will only get worse: as real deepfakes get better, so does people's willingness to dismiss real footage as a deepfake. What tools and skills will be available to investigate both kinds of claims, and who will use them?
In the video, Phyo Min Thein, the former chief minister of Myanmar's largest city, Yangon, sits in a bare room, apparently reading from a statement. His speech sounds odd, unlike his normal voice; his face is static; and in the poor-quality version that first circulated, his lips look out of sync with his words. Seemingly everyone wanted to believe it was a fake. Screenshotted results from an online deepfake detector spread rapidly, showing a red box around the politician's face and an assertion, with 90-percent-plus confidence, that the confession was a deepfake. Burmese journalists lacked the forensic skills to make a judgment, and past actions by the state and the present military gave ample cause for suspicion: government spokespeople have shared staged images targeting the Rohingya ethnic group, while military coup organizers have denied that social media evidence of their killings could be real.
But was the prisoner's "confession" really a deepfake? Along with deepfake researcher Henry Ajder, I consulted deepfake creators and media forensics specialists. Some noted that the video was sufficiently low-quality that the mouth glitches people saw were as likely to be artifacts from compression as evidence of deepfakery. Detection algorithms are also unreliable on low-quality compressed video. His unnatural-sounding voice could be a result of reading a script under extreme pressure. If it is a fake, it's a very good one, because his throat and chest move at key moments in sync with his words. The researchers and creators were generally skeptical that it was a deepfake, though not certain. At this point it is more likely to be what human rights activists like myself are familiar with: a coerced or forced confession on camera. Additionally, the substance of the allegations should not be trusted, given the circumstances of the military coup, unless there is a legitimate judicial process.
Why does this matter? Regardless of whether the video is a forced confession or a deepfake, the results are most likely the same: words digitally or physically compelled out of a prisoner's mouth by a coup d'état government. And while the use of deepfakes to create nonconsensual sexual images currently far outstrips political instances, deepfake and synthetic media technology is rapidly improving, proliferating, and commercializing, expanding the potential for harmful uses. The case in Myanmar demonstrates the growing gap between the capabilities to make deepfakes, the opportunities to claim a real video is a deepfake, and our ability to challenge either.
It also illustrates the challenges of having the public rely on free online detectors without understanding the strengths and limitations of detection or how to second-guess a misleading result. Deepfake detection is still an emerging technology, and a detection tool applicable to one approach often does not work on another. We must also be wary of counter-forensics, where someone deliberately takes steps to confuse a detection approach. And it's not always possible to know which detection tools to trust.
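The compression caveat deserves a moment of unpacking. Many detection approaches key on faint, fine-grained pixel patterns, and lossy compression discards exactly that fine detail, so the cue a detector needs may simply no longer be in the video. The toy sketch below (not any real detector; the signal, frequency band, and numbers are invented for illustration) shows a faint high-frequency "telltale" riding on coarse image content, and how a crude stand-in for compression erases it:

```python
import numpy as np

# Toy "frame": one scan line of smooth image content plus a faint
# high-frequency pattern standing in for the subtle pixel-level cues
# many detection models rely on. (Illustrative only.)
n = 256
x = np.arange(n) / n
content = np.sin(2 * np.pi * 2 * x)        # coarse image content
cue = 0.05 * np.sin(2 * np.pi * 60 * x)    # faint fine-grained telltale
frame = content + cue

def compress(signal, keep=40):
    """Crude stand-in for lossy compression: discard high frequencies."""
    spec = np.fft.rfft(signal)
    spec[keep:] = 0
    return np.fft.irfft(spec, n=signal.size)

def cue_energy(signal):
    """Spectral energy in the band where the telltale cue lives."""
    spectrum = np.abs(np.fft.rfft(signal))
    return spectrum[50:70].sum()

original = cue_energy(frame)
after = cue_energy(compress(frame))
print(f"cue energy before compression: {original:.2f}")
print(f"cue energy after compression:  {after:.2f}")
```

The coarse content survives compression, but the band carrying the telltale is wiped out, which is one reason a detector can return a confident but meaningless score on a heavily compressed upload.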
How do we keep conflicts and crises around the world from being blindsided by deepfakes and supposed deepfakes?
We should not be turning ordinary people into deepfake spotters, parsing the pixels to discern truth from falsehood. Most people will do better relying on simpler approaches to media literacy, such as the SIFT method, that emphasize checking other sources or tracing the original context of videos. In fact, encouraging people to be amateur forensics experts can send them down the conspiracy rabbit hole of distrust in images.
We can prepare better for deepfakes, however. For the past three years, my organization WITNESS, the global human rights, video, and tech network, has led a process to do just that. We met with journalists, activists, and civic leaders in Brazil, sub-Saharan Africa, Southeast Asia, and elsewhere to ask about their concerns about media manipulation. They wanted tools to spot new forms of deception, yet had questions about who would have access to deepfake detection tools, whether access would be a luxury for the media elites of Europe and the US, and who would have the media forensics skills to use the findings. If we are not attentive to these questions, the default path for access to detection will perpetuate existing global inequalities. A lack of attention to technology's impacts has already played out with harmful real-world consequences in countries like Myanmar, Ethiopia, and India.
We need to make good detection tools available to selected civic institutions and media, and to do so equitably, prioritizing the places most vulnerable to deepfakes. This is not without its challenges. Civic activists and rights defenders in Africa and Latin America whom WITNESS and the Partnership on AI consulted rightly asked who could be trusted with access when media entities are often tightly tied to the very governments that are the sources of misinformation and disinformation. But we should try now, before we are beset by an increasing number of global situations where realistic deceptive videos cannot be countered.
Equitable access to detection needs to be combined with the ability to escalate critical would-be fakes to people with expertise. First proposed by the technologist Aviv Ovadya, a standing media forensics expert capacity could be made available for high-public-interest emergencies. Rapidly escalating a contested corruption allegation against a leading politician is exactly the type of case where this would be warranted. This approach only works for a small number of cases, so we also need a commitment by funders, journalism educators, and the social media platforms to deepen the media forensics capacity of journalists, rights defenders, and others who are at the forefront of protecting truth and challenging lies. That capacity is currently concentrated in American and European academia and media, social media companies, and intelligence and law enforcement groups, rather than in global media and civil society.
To stem the potential chaos of both real deepfakes and claims of deepfakes that exploit our inability to discern falsified video from real, we must ensure that the professionals and committed individuals who spend their days sorting fact from fiction have the skills and tools available to them to question and interpret videos, to appropriately incorporate the outputs from media forensics and detection tools, and to explain these to their audiences and publics. The future doesn’t have to be one in which anything can be called a deepfake, anyone can claim something is manipulated, and trust is further corroded.
WIRED Opinion publishes articles by outside contributors representing a wide range of viewpoints.