Jacob West-Roberts, a computational biologist at the University of California (UC) Berkeley, was scouring microbial DNA sequences for giant genes and discovered what he thought was a whopper: a gene encoding a protein made up of 1,800 amino acids. The average protein has a few hundred.
“Wait till you see this,” responded his PhD adviser, UC Berkeley environmental microbiologist Jillian Banfield, and pointed out proteins longer than 30,000 amino acids, already known from sequencing data.
What’s next for AlphaFold and the AI protein-folding revolution
Their team has now found dozens of even bigger proteins, including what might be the longest ever: an 85,000-amino-acid behemoth. The mega-molecules could help an enigmatic group of environmental microorganisms to feed on other microbial cells, the researchers propose. They describe their findings in a preprint posted on bioRxiv1 last month.
“It’s a good study,” says Brian Hedlund, a microbiologist at the University of Nevada, Las Vegas. “They essentially doubled the size of the largest known predicted proteins from 40,000 to 85,000 amino acids, which are all insane.”
A roughly 35,000-amino-acid protein found in muscles, named Titin, has held the title of world’s biggest protein for decades. But the Guinness World Records might not need to be updated — yet. West-Roberts and his colleagues inferred the existence of their giant proteins from gene sequences and predicted parts of the molecules’ shapes with the artificial intelligence (AI) tool AlphaFold. It’s possible that, after being made by cells, the giant proteins get snipped into bits that have different functions. “You just don’t know at this point,” says Hedlund.
Hidden giants
The most recent comprehensive survey of giant proteins was published2 in 2008. To get a more up to date picture, West-Roberts looked for genes encoding giant proteins in public databases and in new genome data from environmental sources, including a seasonal pond near Banfield’s home in northern California and a Colorado swamp.
Giant proteins were especially common in Omnitrophota, a bacterial phylum first discovered in Yellowstone National Park in the northern United States in the 1990s and now commonly found in environmental samples. In total, the researchers found 46 Omnitrophota genes encoding proteins longer than 30,000 amino acids, including the 85,804-amino-acid-colossus, which turned up in waste water. “They were just absolutely everywhere,” says West-Roberts.
Despite the microbes’ ubiquity, researchers know little about Omnitrophota, beyond what can be gleaned from sequences. In large part, this is because scientists have had little luck growing examples in the lab. But last year, a team led by microbiologist Jens Harder at the Max Planck Institute for Marine Microbiology in Bremen, Germany, reported a breakthrough3. In wastewater samples that had been incubating in the lab in slow-growing cultures since the 1990s, they found cells — small even by microbial standards — that contained Omnitrophota genetic material. The team developed methods to enrich samples with these cells, and then analysed them.
How AlphaFold can realize AI’s full potential in structural biology
Genome sequencing identified a gene predicted to encode a protein nearly 40,000 amino acids long, and matching protein fragments turned up in a biochemical assay. Harder’s team might even have caught a glimpse of the giant in electron micrographs of the Omnitrophota cells, which seemed to show them attacking and devouring other bacteria and microbes called archaea.
Protein predators
To determine whether the giant proteins West-Roberts had discovered are involved in similar predation, he and his team attempted to study the sequences using computational methods designed for determiningwhat much smaller proteins do. “These tools are not made for giant proteins. They just don’t know what to do with them,” he says.
Despite those challenges, the team managed to learn a bit about the titans. Many of the proteins seem to wind their way through the cellular membrane a dozen or more times. A number of the giant proteins include sequences that resemble those of enzymes that attach to and break up sugars and other biomolecules found on cell walls. These could be used to bind and digest the cell walls of prey, the researchers suggest.
To try to find out what some of the giant proteins look like, West-Roberts and his colleagues plugged portions of their sequences into AlphaFold, Google DeepMind’s revolutionary protein-structure-prediction tool. But the network isn’t equipped for proteins larger than a couple of thousand amino acids, so the researchers split their mega-proteins into overlapping 1,000-amino-acid stretches. “If you make it too long, AlphaFold will just kind of give up at some point and give you a ball of spaghetti,” says West-Roberts.
The AI predictions of the proteins’ structures revealed more cell-wall-binding regions, but also a big surprise: a very long tube-like apparatus unlike anything researchers have ever seen. This structure could be involved in delivering molecules to prey, or could attach to other cells before the host microbe devours them.
Martin Steinegger, a computational biologist at Seoul National University, is impressed with how the researchers made sense of the mega-molecules using AlphaFold and other cutting-edge tools. “Being able to annotate such giant molecular machinery beyond the capabilities of traditional methods marks a substantial leap forward,” he says.
The fact that giant proteins are so common in Omnitrophota is especially surprising because of the microbes’ tiny physical size, says Oleg Reva, a bioinformatician at the University of Pretoria in South Africa. The study shows that giant proteins are “sophisticated weapons wielded by the diminutive microbial hunters in their pursuit of bacterial and archaeal prey”, he adds.
Real-world existence
The discovery of genes encoding proteins as longer than 85,000 amino acids does not mean that the molecules exist in this state in cells, researchers say. One possibility is that the protein is chopped into smaller pieces after it’s made, and these portions take on a range of functions in cells. That could explain why Harder’s team was able to find only pieces of its giant protein. “Currently I don’t see experimental evidence that these large proteins exist,” Harder says.
Many of the giant proteins contain protein-breaking enzymes called peptidases, which could chop the Goliaths down into Davids, West-Roberts and his team say. Firm answers might require researchers to grow Omnitrophota cells, something that only Harder’s team has managed to do so far. “All the others, they’re just imaginary,” says Harder. “There’s a lot of mystery to solve.”
West-Roberts will get his PhD soon, and he’s got other projects to tie up, including a study of giant proteins in archaea. He would love to see others study the giant proteins he has found, and he dreams of someone determining what they really look like using experimental techniques such as cryo-electron microscopy or a related method that can map proteins in cells. “I just really want to see it and get a ground truth of what it really is,” he says. “It would be such a cool photo.”
Source link