

Evo composes realistic bacteriophage genomic sequences. (A) Key considerations highlighting the complexity of genome-scale design. (B) Generative genome language models have the potential to access novel phage genome design space when trained on a subset of observed natural evolution. (C) We benchmarked Evo 1 and Evo 2 on a broad zero-shot prompting task for phage genome design. (D) Generated sequences are consistently classified as viral by geNomad (left), with high average virus classification scores across prompts (right). D, Duplodnaviria; M, Monodnaviria; R, Riboviria. (E) Generated sequences show low query cover and sequence identity against natural sequences in nucleotide BLAST searches, indicating high novelty. (F) Generated sequences contain predicted phage-like architectures, with regions that match natural sequences (blue) and novel regions without nucleotide BLAST hits (gray). (G) Predicted coding densities of generated sequences are high, similar to natural sequences and unlike scrambled natural sequences. (H) ESMFold-predicted protein structures from generated sequences have mean predicted local distance difference test (pLDDT) scores similar to natural proteins, substantially higher than scrambled natural sequence controls. (I) Generated proteins align to proteins in the OpenGenome and PHROGs databases with generally low sequence identities, indicating high novelty. (J) Functional annotations of generated proteins closely match those of natural phages when queried against the PHROGs database. (biorxiv.org)
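Panel G's "coding density" usually means the fraction of a genome's bases that fall inside predicted protein-coding genes. The short Python sketch below computes that fraction from a list of gene coordinates. It is an illustration, not the paper's pipeline, and it makes several assumptions:

- The input is 1-based, inclusive (start, end) pairs from some gene caller. The function name and input format are hypothetical.
- The genome is treated as linear. A gene that wraps around the origin of a circular genome like ΦX174 would need to be split into two intervals first.
- Overlapping genes are merged so that shared bases are counted once. This matters for ΦX174, whose genes overlap.

```python
from typing import Iterable, Tuple


def coding_density(genome_length: int, genes: Iterable[Tuple[int, int]]) -> float:
    """Fraction of genome positions covered by at least one predicted gene.

    Assumes (hypothetical format) 1-based inclusive (start, end) pairs on a
    linear coordinate system; origin-spanning genes must be split beforehand.
    """
    if genome_length <= 0:
        raise ValueError("genome_length must be positive")
    # Normalize so start <= end regardless of strand, then sort by start.
    intervals = sorted((min(s, e), max(s, e)) for s, e in genes)
    covered = 0
    cur_start = cur_end = None
    for start, end in intervals:
        if cur_end is None or start > cur_end + 1:
            # Disjoint from the current block: count it, then start a new block.
            if cur_end is not None:
                covered += cur_end - cur_start + 1
            cur_start, cur_end = start, end
        else:
            # Overlapping or adjacent gene: extend the current block.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        covered += cur_end - cur_start + 1
    return covered / genome_length
```

For example, `coding_density(100, [(1, 40), (30, 60), (81, 90)])` merges the first two genes into one 60-base block. Adding the 10 bases of the third gene gives 70 covered bases, so the function returns 0.7. A scrambled control sequence would typically yield few long open reading frames and therefore a lower value.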
Editor’s note: As we expand outward from Earth to other worlds we are almost certainly going to encounter things we did not expect to find – things that are unlikely or impossible on Earth. Life on other worlds may arise from a totally different set of chemical pathways than was the case on Earth. Or it may follow a very similar path. Or both. How do we estimate what could exist such that we are better prepared to search for the unexpected?
The genetic code of all life on our home world (with a few minor exceptions) is written in an alphabet of four standard nucleotides. Earth life may once have used a different assortment of letters in its instructional alphabet, but the evidence suggests that the current code has been in use for quite some time. Life on other worlds, however, might use a different genetic alphabet.
As we tinker with earthly genomics for commercial and health reasons, we are discovering novel ways to tweak the standard genomic model and alter the outcome of a genetic sequence. This study, which uses Artificially Expanded Genetic Information Systems (AEGIS), shows that the pairing of non-standard nucleotides is at least possible. Whether the new sequences will work is another matter. Still, the work gives us insight into how genetic sequences work and, by extension, how they might work elsewhere.
Out of about 300 phage genomes the scientists synthesized and tested in dishes full of E. coli, 16 were functional.
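For scale, these figures imply a hit rate of about 5 percent. The calculation uses the approximate count of 300 genomes given above, so the result is approximate too:

```latex
\frac{16}{300} \approx 0.053 \approx 5\%
```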
The experiment itself wasn’t dangerous. Designing “life” is also a far heavier lift than designing the simple phage (a bacteria-infecting virus) that the team created. The scientists used “Evo,” a generative AI model trained on the genomes of living things. Other large language models are trained on a massive corpus of text; in the same way, the most advanced version of Evo ingested about 9 trillion letters of DNA from an atlas spanning all domains of life.
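The comparison to text-trained language models can be made concrete. Genome language models such as Evo read DNA at single-nucleotide resolution. Like text models, they are trained autoregressively to predict the next token.

The toy Python below sketches that setup. The four-letter vocabulary and its integer IDs are hypothetical choices made for illustration, not Evo's actual tokenizer. Real training would also operate on very long batched windows rather than a single short string.

```python
# Hypothetical single-nucleotide vocabulary; NOT Evo's actual token IDs.
VOCAB = {"A": 0, "C": 1, "G": 2, "T": 3}
INV_VOCAB = {i: base for base, i in VOCAB.items()}


def encode(seq: str) -> list:
    """Map a DNA string to one token ID per nucleotide (case-insensitive)."""
    try:
        return [VOCAB[base] for base in seq.upper()]
    except KeyError as err:
        raise ValueError(f"unsupported character: {err.args[0]!r}") from None


def decode(ids: list) -> str:
    """Map token IDs back to a DNA string."""
    return "".join(INV_VOCAB[i] for i in ids)


def next_token_pairs(ids: list) -> tuple:
    """Shift by one: at position t the model sees ids[:t+1] and must predict ids[t+1]."""
    return ids[:-1], ids[1:]
```

For example, `encode("gattaca")` gives `[2, 0, 3, 3, 0, 1, 0]`. `next_token_pairs` then turns that list into an input sequence and a target sequence offset by one position, which is the same next-token objective used for text.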
Many important biological functions arise not from single genes, but from complex interactions encoded by entire genomes.
Genome language models have emerged as a promising strategy for designing biological systems, but their ability to generate functional sequences at the scale of whole genomes has remained untested.
Here, we report the first generative design of viable bacteriophage genomes. We leveraged frontier genome language models, Evo 1 and Evo 2, to generate whole-genome sequences with realistic genetic architectures and desirable host tropism, using the lytic phage ΦX174 as our design template.
Experimental testing of AI-generated genomes yielded 16 viable phages with substantial evolutionary novelty. Cryo-electron microscopy revealed that one of the generated phages utilizes an evolutionarily distant DNA packaging protein within its capsid.
Multiple phages demonstrate higher fitness than ΦX174 in growth competitions and in their lysis kinetics. A cocktail of the generated phages rapidly overcomes ΦX174-resistance in three E. coli strains, demonstrating the potential utility of our approach for designing phage therapies against rapidly evolving bacterial pathogens.
This work provides a blueprint for the design of diverse synthetic bacteriophages and, more broadly, lays a foundation for the generative design of useful living systems at the genome scale.
Generative design of novel bacteriophages with genome language models, biorxiv.org (open access)






