Can a Database of Animal Viruses Help Predict the Next Pandemic?

In 2016, Michael Letko moved from New York City to Hamilton, Montana—a town of 4,800 nestled between Blodgett Canyon and Highway 93 at the southern end of the Bitterroot Valley.

During the state’s earliest days, a strange, deadly disease emerged from these dark lodgepole pine forests, striking down settlers with a black rash and raging infection. Scientists eventually named it Rocky Mountain spotted fever, and they named the facility they built to study the bacteria responsible for the fever (and the ticks that carry it) the Rocky Mountain Laboratory. In 1937, the lab became part of the National Institutes of Health, evolving into a national vaccine factory when the US entered World War II. This is where, in 2008, the NIH opened its first biosafety level 4 laboratory—the highest level there is for biological containment facilities. Today, more than 400 scientists like Letko work inside the red-roofed complex, conducting research on some of the nastiest pathogens known to humans.

Letko arrived in the lab of virologist Vincent Munster, eager to work on some of these germs. Munster studies virus ecology: how viruses live in different hosts and sometimes jump between species. He often sends research fellows to places like the Democratic Republic of Congo, Trinidad and Tobago, and Jordan to collect blood samples or fecal swabs from bats and camels, which his team then studies back in the lab’s maximum containment facilities. Bats are of particular interest because they’ve evolved a unique ability to coexist with viruses, including ones particularly likely to transfer to humans. SARS, MERS, the Marburg virus, Nipah, and perhaps even Ebola all started in bats.


Letko wasn’t really that kind of scientist. He’d spent his PhD a block from Central Park in Manhattan, studying a protein produced by HIV and modeling its molecular structure to understand how it shuts down the host’s immune response. He had gotten really good at figuring out the shapes of viral proteins and how those molecular grooves and pockets grant access to cells or fend off attacks. But it wasn’t until 2017, when he met a Belgian student visiting Munster’s lab, that he had an idea for what to do with this talent.

The Belgian student had spent his whole PhD on a virus discovery project, sequencing bat samples like the ones Munster’s team brings back from the field. Many of the genomes he’d put together came from coronaviruses, one of the most abundant families in the viral kingdom. After the SARS outbreak of 2003, scientists realized that maybe they should pay more attention to them, given their ability to jump between species. This new urgency—combined with the arrival of new sequencing technologies catalyzed by the Human Genome Project—kicked off a viral discovery boom. Over the next decade and a half, scientists uncovered a massive trove of coronaviruses circulating in wild animal populations around the world.

Search “coronavirus” on GenBank, a public repository for genomes, and today you’ll find more than 35,000 sequences. Alpaca coronaviruses. Hedgehog coronaviruses. Beluga whale coronaviruses. And, of course, lots and lots of bat coronaviruses.

But very few people have carried out the downstream laboratory work—figuring out how these coronaviruses behave, how they get into the bodies of their hosts, and how likely it is that they could make the hop to humans. “I realized just how much data there is and how little we know about all of it,” says Letko.

He was particularly haunted by a coronavirus called HKU4-CoV. A sequence of its spike protein was published in February 2007 by a team of Chinese researchers who’d discovered it in the blood of bats they’d collected from caves deep in Guangdong province. It was one of hundreds of sequences published during the sequencing boom to no fanfare. Then, five years later, MERS broke out in Saudi Arabia. When scientists sequenced the new MERS virus, they noticed that the protein it used to attack human cells looked almost exactly like the one HKU4-CoV uses. When other researchers looking at relatives of the MERS virus tested the bat virus, they realized that it, too, was capable of infiltrating human cells through the same receptor. But back then, no one had made the link between HKU4-CoV’s protein sequence and its ability to infect humans. “If that data had been available at the time of the MERS outbreak, scientists would have had a head start at figuring out how it’s transmitted and what drugs might work against it,” says Letko.

Letko wanted to make that kind of data available. So he decided to build a platform that could experimentally test the world’s collection of coronavirus genomes, to see which ones had the highest likelihood of infecting human cells.

At any given time, there are tens of thousands of unique coronaviruses being carried by animals. But only a handful have ever crossed into humans. If you could understand what makes those viruses different, Letko hypothesized, you could create a prediction engine for forecasting which ones have the potential to emerge in human populations. “If you want to figure out where the next pandemic is going to come from,” he says, “the coronaviruses are a good place to start, because they cross the species barrier, they can infect people, and they’re everywhere.”

So why had no one else tried this before? For one thing, isolating viruses from field samples is tricky. Cells in culture don’t look much like cells in wild animals. They often fail to offer viruses collected in nature what they need to grow, which means scientists can’t keep them alive long enough to run their experiments. And reverse-engineering a whole virus from its sequence is expensive. Coronaviruses have the biggest genomes of all RNA viruses. Making just one would cost around $15,000.

Coronaviruses are so named because of the array of spike proteins on their surface that, under magnification, look like a crown. Those spike proteins are what the virus uses to gain entry to host cells, where it can replicate and spread. Most coronaviruses have nearly identical spike proteins, save for the very tip of what’s called “the receptor binding domain,” or RBD. Subtle differences in the shape of this part of the spike dictate which kinds of cells the virus can infect. So that’s the part Letko zoomed in on.

Throughout 2018, he worked to build a system of synthetic virus particles engineered to express a generic version of the coronavirus spike protein in which he could swap out the RBDs like Legos. These synthetic particles looked like viruses. And they could get into cells like viruses. But they were missing the key parts they needed to replicate. Instead, when they got into a cell, they would trigger a chemical reaction causing it to fluoresce yellow-green. When Letko let loose these synthetic virus bits on hamster cells he’d made to express different human receptors, he could easily test which RBD sequences could access each receptor: He could tell because they were glowing. It took a whole year for him to develop the concept and prove it could work.

In January 2019, he began putting it into action. Starting with all the published sequences from a sub-branch of the coronavirus family tree called beta-coronaviruses, he identified their RBD regions and began dividing them into sub-groups. Although these viruses are genetically distinct from one another, many of them share the same RBDs. (There are only about 30 RBD variants across all 200 known strains of beta-coronaviruses.) Then he copied and pasted those sequences into his synthetic virus particles, exposed them to human receptor-expressing cell lines, and started to rank their infection potential.
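To make the grouping step concrete, here is a minimal Python sketch of how strains that share an identical RBD can be collapsed into one group, so that each distinct RBD only has to be tested once. It is an illustration, not Letko's actual pipeline: the function name, the strain names, and the short sequences are all made-up placeholders.

```python
from collections import defaultdict


def group_by_rbd(strain_to_rbd):
    """Map each distinct RBD sequence to the sorted list of strains carrying it.

    `strain_to_rbd` is a hypothetical dict of strain name -> RBD sequence.
    Sequences are upper-cased so that case differences do not split a group.
    """
    groups = defaultdict(list)
    for strain, rbd in strain_to_rbd.items():
        groups[rbd.upper()].append(strain)
    return {rbd: sorted(names) for rbd, names in groups.items()}


# Toy placeholder data: four invented strains, two distinct RBDs.
toy_strains = {
    "strain_A": "NGVEGF",
    "strain_B": "NGVEGF",
    "strain_C": "TGVIAD",
    "strain_D": "ngvegf",
}

rbd_groups = group_by_rbd(toy_strains)
print(len(toy_strains), "strains,", len(rbd_groups), "distinct RBDs")
```

The same idea is what lets roughly 200 strains reduce to about 30 experiments: the work scales with the number of distinct RBDs, not the number of genomes.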

In addition to known beta-coronaviruses, like SARS, he investigated uncharacterized strains, mostly collected from Chinese horseshoe bats. It took time to test and validate his results, but as the months passed, Letko was able to refine the system. By the end of 2019, he could grab a sequence from GenBank and, a week later, produce experimental data on whether a virus could infect human cells, which cells it could enter, and how well it could infiltrate them.

In December, he began typing up the results of his last two years of labor. He was getting ready to submit them to a journal for peer review when reports of a mysterious pneumonia started dribbling out of Wuhan, China. In early January, Chinese health authorities announced they had isolated the pathogen behind the mysterious outbreak. It was a novel coronavirus, never before seen in humans.

“That changed everything,” says Letko. Researchers around the world pounced on the data—to try to figure out where the virus had come from and gather clues about how it was attacking human cells. “All of a sudden we had this outbreak and this perfect opportunity to demonstrate the power of the approach. We dropped everything to try to identify the receptor,” he says.

On January 10, Chinese scientists made the virus’ genome public. It was late on a Friday. Letko downloaded the genome and located the RBD sequence, the stretch of code that carries instructions for the key receptor binding tip. He entered it into an Excel spreadsheet that automatically added other fragments of letters to make it work with his system. Thirty minutes later he had a sequence he could test.
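The spreadsheet step amounts to wrapping the RBD sequence in fixed flanking fragments so that the synthesized DNA fits the rest of the system. A rough Python equivalent might look like the sketch below. The flanking strings here are arbitrary placeholders, not the fragments Letko used, and the function name is invented for illustration.

```python
# Arbitrary placeholder flanks (assumptions for illustration only).
UPSTREAM_FLANK = "GGATCC"
DOWNSTREAM_FLANK = "GAATTC"

VALID_BASES = set("ACGT")


def build_order_sequence(rbd_dna, upstream=UPSTREAM_FLANK, downstream=DOWNSTREAM_FLANK):
    """Clean an RBD DNA sequence and wrap it in fixed flanking fragments."""
    # Remove whitespace and line breaks, then normalize to upper case.
    cleaned = "".join(rbd_dna.split()).upper()
    if not cleaned:
        raise ValueError("empty RBD sequence")
    unexpected = set(cleaned) - VALID_BASES
    if unexpected:
        raise ValueError(f"unexpected characters in sequence: {sorted(unexpected)}")
    return upstream + cleaned + downstream


# Toy input with mixed case and a line break, as might be pasted from a file.
order_sequence = build_order_sequence("atg gct\nAAA")
print(order_sequence)
```

Validating the input before concatenating is a design choice worth keeping in any version of this step, since a stray character in a synthesis order is expensive to discover after the DNA arrives.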

Then came the hardest part: waiting. Since DNA synthesis companies don’t take orders over the weekend, he couldn’t submit the sequence until Monday morning. But by Thursday, the DNA fragment had been mailed to Munster’s lab in Hamilton and Letko began cloning the code into his viral particles. Soon, they were expressing spike proteins with a little piece of the novel coronavirus on the end. These virus lookalikes, Letko discovered, could infect human cells using the same receptor that SARS uses, ACE2. This receptor is prevalent in lung cells, notable because the new coronavirus causes a cough in mild cases and severe respiratory distress in the worst.

Time elapsed from the release of the sequence to Letko identifying its attack site: seven days.

“It’s unbelievably fast, almost too fast to imagine,” says Kristian G. Andersen, an infectious disease geneticist at Scripps Research Institute, who was not involved in the work. His lab uses DNA data to trace the evolution of outbreaks including Ebola, Zika, and now, the novel coronavirus officially named SARS-CoV-2.

Such speed could prove pivotal during the current outbreak, says Andersen. With vaccines and new therapeutics still months away from being ready for human testing, the only hope of combatting—rather than simply containing—the virus is repurposing pre-existing drugs. And the trick to picking the right one is to know which might block the virus’ path to entry. “A lot of that comes down to how it binds to human cells,” says Andersen. “Studies like this, which show the binding experimentally, are critical.”

Other groups, working with just sequence data in that first week following the genome’s publication, used computer modeling to guess at what the spike protein looked like and which receptors it might use. They, too, posited that it would use ACE2. But in their simulations, the virus appeared unable to attach to that site as strongly as SARS does. In a pre-print posted online January 21, a group from City University of Hong Kong and Hong Kong Polytechnic University wrote that “the infectivity and pathogenicity of this new virus should be much lower than the human SARS virus.” Within days, as the number of new infections in China exploded beyond those of the SARS epidemic, the limitations of such computational approaches became clear.

In a sign of the breakneck pace at which scientific research is being done during this outbreak, Letko and Munster posted their pre-print (which has since been accepted for publication) the following day. They didn’t have to wait long for validation. The next day, January 23, a research group from Wuhan’s Institute of Virology reported they had tested live samples of the new virus against human cell lines expressing ACE2 proteins and those without ACE2. It could only infect the ones that carried the receptor.

Currently, the ACE inhibitors already approved by the FDA work on a different target, not ACE2. Screening for chemicals that might prevent the new coronavirus from entering through ACE2 has already begun. But Andersen says any new drugs targeted to ACE2 likely won’t be developed in time to quell the current outbreak.


In the meantime, clinicians in China are testing an experimental antiviral called remdesivir, which had previously been used in 2018 to try to bring the Democratic Republic of Congo’s Ebola outbreak under control. It works by blocking an enzyme viruses use to self-replicate. Genomic analyses suggest coronaviruses have a similar enough enzyme that the drug might be effective against the current outbreak. Last week, scientists in China published a report showing that remdesivir could in fact block the virus. And on Thursday, the New York Times reported that Chinese health authorities have begun enrolling patients in two clinical trials of the drug that are expected to conclude as soon as April.

So while he hopes his contribution gives drugmakers and public health authorities the clues they need to contain this outbreak, Letko is already thinking about the next one. His survey of beta-coronaviruses turned up a number of strains that currently reside in bats but are capable of infecting humans. He wants to learn more about them so that data will be available next time a novel disease suddenly appears. “The ultimate goal is to predict spillover events. And you can only do that if you know which viruses circulating right now in animals are capable of infecting people,” says Letko. “If we had these types of tools, then we could see the looming threats much sooner.”

Since December, SARS-CoV-2 has infected nearly 45,000 people globally, and claimed the lives of 1,114, according to a real-time outbreak dashboard maintained by researchers at Johns Hopkins.

In the next few months Letko will be leaving Hamilton to start his own lab at Washington State University. There, he plans to expand his project to study the other families of coronaviruses, and the proteins they use not only to enter cells, but to evade immune systems and spread between people. Eventually, he hopes his lab will be one of many across the world using the system he built to characterize coronaviruses, creating a database of information about protein interactions that scientists can use to quickly flag new viruses that might have pandemic potential.

“For all the people collecting and generating all these sequences, we need just as many people characterizing them,” says Letko. “It’s going to take a really big effort. But I think it will be worth it.”
