DNA is our identity: it contains all the instructions needed to shape us and make us function. In recent decades, researchers have compiled vast databases of genetic sequences. However, interpreting them (understanding what they do, what happens when they change, or how to create new sequences with a specific function) has remained a formidable challenge. This is where Evo 2 comes in.
Evo 2 is an artificial intelligence foundation model specialised in analysing and generating genomic sequences. A foundation model is a large-scale machine learning model trained on a massive quantity of broad, general-purpose data, which can then be adapted to a wide variety of tasks. Evo 2 was trained on over 9 trillion bases from more than 128,000 complete genomes, covering bacteria, archaea, plants, animals and other organisms (excluding, for safety reasons, viruses that infect eukaryotes, the group that includes humans).
So, what can Evo 2 do? Its practical applications are broad and promising. One of the most delicate fields is the interpretation of genetic mutations. In tests on variants of BRCA1, a gene associated with breast cancer, Evo 2 was over 90% accurate in predicting whether a mutation is benign or likely to cause disease. This could improve diagnostics, prevention and personalised treatment.
Evo 2 can also generate new genomes. The model is capable of producing realistic DNA sequences: for example, an entire mitochondrial genome (the DNA inside mitochondria) with a coherent structure, or a new bacterial genome that retains the characteristic features of natural ones. It can also be used to design genetic regulators that control gene expression, such as sequences that activate a gene only in specific tissues (like the liver or brain). This capacity is fundamental for precision medicine and for safe gene therapies. Its potential extends to synthetic biology: designing enzymes for pharmaceutical production, creating sequences for more resilient crops, or engineering microorganisms to produce biofuels.
Evo 2 is not the only AI tool applied to biology, but what sets it apart is the scale of the data it has “digested”. While many models are trained on specific genes or organisms, Evo 2 was trained on genomes drawn from across Earth’s biodiversity. It is also one of the first open-source models in this domain, offering the scientific community a powerful and accessible starting point for future research.
Naturally, with great power comes great responsibility. The ability to generate new genetic sequences raises ethical concerns, such as the risk of inadvertently creating harmful sequences or manipulating human genes inappropriately. For this reason, the developers excluded viruses that infect eukaryotes, humans included, from the training dataset: better not to mess with potentially lethal material.
Evo 2 marks a major leap forward in the use of artificial intelligence in biology. It is a powerful demonstration of how AI can not only imitate human language but also learn to read and write the language of life. In the near future, models like Evo 2 could help us design new therapies, create organisms that benefit the environment, or unlock deeper insights into the origins of life itself.
