Natural language processing
White-collar drones complain about going to meetings. But they also complain about not going to them: If you miss a crucial discussion, you’re out of the loop. Nobody takes good notes; people forget what was said or remember selectively. Information dies. You can record meetings, but who’s going to listen to hours of that?
Last year, Zack Kanter—the CEO of Stedi, which makes business-to-business software—experimented with a new tool, Rewatch, that tries to solve this problem. Every time Stedi employees hold meetings on Zoom, Rewatch records the whole thing, then uses voice-dictation AI to transcribe it. That means Kanter and his 56 employees, who all work remotely, now have a searchable archive of everything said in all meetings.
“It’s unbelievably useful for us,” he says. Let’s say he’s setting up a meeting with a client. He can search in the archive and find every instance where that client was discussed, and if he really wants context, he can watch the video. If engineers miss a meeting, they can skim the transcript and grasp the essentials of a decision or an argument. “In a distributed company, the biggest question is like, how do I get visibility? I can’t overhear hallway conversations,” Kanter says. “Well, now you can.”
We generate a fire hose of audio and video these days. Corporations use video for their official communiqués; meetings get Zoomed. The breakout social apps are TikTok and Clubhouse. Podcasts eat up gigabytes on our phones. When kids today want to learn something, they don’t go to Wikipedia; they search YouTube.
But the moving image and spoken word are hard to parse and skim, which has limited their usefulness. This is where AI auto-transcription comes in: After years of being kinda lousy, speech-to-text apps are now often extremely good, and the market is crowded with them—Otter.ai, Trint, and Zoom’s built-in wares.
The effects on how we work can be subtle but powerful. Judy Dang is a productivity consultant for small businesses who used to take notes while talking to clients, frantically trying to capture everything. Now she just records every conversation (with permission) and uses Otter to auto-transcribe it. It dramatically improved her ability to remember details—“I have a time capsule of where my clients are when they come to me”—and, more crucially, to pay attention when she’s with them. “I can listen to them carefully,” she says, “and ask really good questions.”
I spoke to Sam Liang, the CEO and founder of Otter, on Zoom, using an Otter plug-in that transcribed our talk in real time. It was spookily accurate. Liang says the experience of watching the words stream by lets people interact with meetings in new ways. They can highlight a snippet of text as a reminder for later; if they space out or have to step away to manage kids, they can scroll back.
“For a long, two-hour meeting—not everybody has to pay attention all the time,” Liang tells me. Why force them to? In the future, he expects there’ll be even better tools for auto-summarizing these long transcripts, making them even more useful.
There are obvious privacy dangers here. Having all your words transcribed could be “a stifling experience as an employee,” says John Davisson, senior counsel at the Electronic Privacy Information Center. The prospect of having idle banter immortalized could make everyone less talkative. Worse, it allows for what Brookings Institution scholar John Villasenor, writing about authoritarian governments, calls “retrospective surveillance.” If your bosses want to fire you, they can rifle through their mammoth files to find something that seems incriminating. Plus, some transcription AIs appear (surprise, surprise) to have racist biases. In a recent study she coauthored, Stanford engineering PhD Allison Koenecke found that AIs from Apple, IBM, Google, Amazon, and Microsoft were worse at transcribing Black speakers than white ones.
It’s worth fixing those biases, though. A world where this tech works correctly would be a huge boon for accessibility. Running automated live captions during a video call helps anyone with hearing loss or just a lousy data connection. Intriguingly, transcription even appears to make social media posts more viral: André Bastié, the CEO of speech-to-text firm Happy Scribe, told me that clients use his app to auto-caption videos they post to Facebook or Instagram. The text attracts viewers scrolling with their audio off. (TikTok users, of course, often manually caption their videos for precisely this reason.)
Consider it the irony of our digital age. Video and audio are the hottest media—but to really thrive, they’ll rely on text, one of our oldest media. Even in a world of cameras and microphones, the power of the written word isn’t going away.