10/15/2025
MetaGraph, a new open-source search engine, can quickly sift through the staggering volumes of biological sequence data (such as DNA and RNA) housed in public repositories. It has been dubbed a "Google for DNA."
Published in the journal Nature in October 2025, the tool addresses the challenge of making the massive volumes of raw sequencing data from public repositories searchable.
Public repositories such as the US-based Sequence Read Archive (SRA) and the European Nucleotide Archive (ENA) contain petabytes of genomic data.
How MetaGraph works
MetaGraph makes these repositories searchable by building a compressed, indexed representation of the sequence data they hold. Its key features include:
• Graph-based compression: The tool organizes the raw sequencing data into complex mathematical graphs that link together overlapping DNA fragments. This process dramatically compresses the data, reducing the storage needed by a factor of 300.
• Full-text search: Much like a standard web search engine, MetaGraph allows scientists to search the indexed sequence data directly. Researchers can input a DNA, RNA, or protein sequence and quickly find where it appears across millions of public datasets.
• Scalability: The system is designed to be highly scalable. As the amount of biological data grows, the tool requires minimal additional computing power, making it a sustainable solution for future research.
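The graph and search ideas above can be illustrated with a toy example. The following is a minimal Python sketch, not MetaGraph's actual code or API: it assumes sequences are cut into overlapping fragments of length k (k-mers), stores each distinct k-mer once with the labels of the samples that contain it, and answers a query by looking up the query's own k-mers. The function names (build_index, successors, search), the sample labels, and the choice k=4 are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def kmers(seq, k):
    """Yield every overlapping substring of length k in seq."""
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def build_index(samples, k):
    """Map each distinct k-mer to the set of sample labels containing it.

    A k-mer shared by many reads or samples is stored only once,
    which is the source of the space saving in this toy version.
    """
    index = defaultdict(set)
    for label, reads in samples.items():
        for read in reads:
            for kmer in kmers(read, k):
                index[kmer].add(label)
    return index


def successors(index, kmer):
    """Graph edges: indexed k-mers that overlap this one by k-1 letters."""
    return [kmer[1:] + base for base in "ACGT" if kmer[1:] + base in index]


def search(index, query, k, min_fraction=1.0):
    """Return {label: fraction of query k-mers found} for matching samples."""
    query_kmers = list(kmers(query, k))
    if not query_kmers:
        return {}
    hits = defaultdict(int)
    for kmer in query_kmers:
        for label in index.get(kmer, ()):
            hits[label] += 1
    total = len(query_kmers)
    return {
        label: count / total
        for label, count in hits.items()
        if count / total >= min_fraction
    }


# Hypothetical toy data: two "samples", each a list of short reads.
samples = {
    "sample_A": ["ACGTACGTGA", "GTACGTGACC"],
    "sample_B": ["TTGACCATGG"],
}
index = build_index(samples, k=4)

exact = search(index, "TACGTGAC", k=4)                      # all k-mers required
partial = search(index, "TACGTGAC", k=4, min_fraction=0.1)  # partial matches allowed
print(exact)
print(partial)
print(successors(index, "ACGT"))
```

In this sketch, a sample is reported when enough of the query's k-mers appear in it, so a lower min_fraction allows partial matches. A production system would replace the Python dictionary with compressed data structures; the sketch only shows why storing shared fragments once and querying by fragment lookup go together.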
Applications and benefits
MetaGraph has significant implications for accelerating biomedical research:
• Pathogen research: Scientists can quickly scan biological repositories to track the emergence and spread of pathogens, for example by following variants of SARS-CoV-2.
• Antibiotic resistance: Researchers can efficiently identify and study antibiotic resistance genes in bacteria across various environments.
• Disease research: The tool can be used to identify genetic indicators of diseases and accelerate research into rare genetic conditions.
• Biodiscovery: As demonstrated by a related tool, this technology can help uncover naturally occurring variants of enzymes, such as those that can degrade plastic.
With tools like MetaGraph, scientists can make potentially explosive new discoveries, understand the etiology of diseases, and develop effective treatments or possible cures using gene therapy. It will be an exciting time to watch science!
“It’s a huge achievement,” says Rayan Chikhi, a biocomputing researcher at the Pasteur Institute in Paris.