On Wednesday, the 28th, Google announced AlphaGenome, a new artificial intelligence tool for analyzing the human genome that focuses on how long stretches of DNA influence gene regulation within cells. According to Folha de S.Paulo, the initiative aims to go beyond simply reading the genetic code by investigating the mechanisms that control when, where, and how genes are switched on or silenced in the body.
Presenting AlphaGenome, which is described in a paper in the journal Nature, Google DeepMind’s vice president of research, Pushmeet Kohli, stressed that sequencing the complete human genome, finished in 2003, was only the first step. “Decoding the entire human genome in 2003 gave us the book of life, but reading it remains a challenge,” he said. According to him, although the genetic text, roughly 3 billion base pairs long, is now available, understanding its “grammar” remains one of the great frontiers of science. “We have the text, but understanding the grammar and how it governs life constitutes the next great frontier of research,” said Kohli.
Most human DNA does not directly encode proteins. Only about 2% of the genome carries the instructions for making them, a function essential to living organisms. Much of the remaining 98% plays a complex regulatory role, acting like a conductor that coordinates, protects, and fine-tunes gene expression in each cell. Many disease-associated variants are concentrated precisely in these regions, and it is this territory that AlphaGenome sets out to explore. The new model complements other tools from Google’s artificial intelligence lab, including AlphaMissense, which focuses on variants in protein-coding sequences; AlphaProteo, dedicated to protein design; and AlphaFold, which predicts protein structures and whose developers shared the 2024 Nobel Prize in Chemistry. What sets AlphaGenome apart is its ability to analyze long DNA sequences and predict, down to individual base pairs, how each position influences different biological processes within the cell.
Built on deep learning, the system was trained on data from large public consortia that carried out experimental measurements in hundreds of human and mouse cell and tissue types. This foundation allowed the model to learn complex patterns of gene regulation and apply that knowledge in an integrated way.
Models capable of studying regulatory regions of DNA existed before AlphaGenome, but they faced a technical trade-off: researchers had to choose between analyzing long sequences at lower precision or focusing on shorter segments at finer resolution. According to Žiga Avsec, one of the project’s co-authors, fully understanding a gene’s regulatory environment requires analyzing sequences of up to one million base pairs. The new tool aims to resolve this dilemma by combining sequence length with precision.
Another distinguishing feature of AlphaGenome is its ability to model, simultaneously, how DNA influences 11 distinct biological processes. Until now, researchers had to combine several different models to obtain this kind of integrated analysis. For Natasha Latysheva, also a co-author of the study published in Nature, the tool represents a significant advance. “It can accelerate our understanding of the genome by helping to map the location of functional elements and determine their roles at the molecular level,” she said.
Kohli also highlighted the collaborative nature of the project. “We hope that researchers will enrich it with more data and modalities,” he said. According to Google, AlphaGenome has already been tested by approximately 3,000 scientists in 160 countries and is now available as open source for non-commercial research.
External experts welcomed the model, though with caution. Ben Lehner, head of generative and synthetic genomics at the Wellcome Sanger Institute in Cambridge, said the tool is very effective but still has limitations. “Accurately identifying the differences in our genomes that make us more or less susceptible to developing thousands of diseases is a crucial step towards better treatments,” he observed. At the same time, he noted that “AI models are only as good as the data used to train them,” stressing that many available datasets are still small and poorly standardized.
Robert Goldstone, head of genomics at the Francis Crick Institute, offered a similar assessment. In his view, AlphaGenome should not be seen as a definitive answer to every question in biology, since gene expression also depends on complex environmental factors. Nevertheless, he considers the tool essential to advancing the field. According to Goldstone, it will allow scientists to “study and simulate in a programmatic way the genetic basis of complex diseases,” broadening the possibilities for research into how the human genome works.
Source: brasil247.com
