
08/20/2025
🚀 Expanding access to data to accelerate AI drug discovery.
A new article in Future Medicine AI looks at the critical role open datasets play in driving discoveries – and how academic research groups, government agencies, and pharma and biotech companies are coming together to build these datasets and make them more widely accessible.
🔹 These efforts include:
▪️ The AI Structural Biology Network, which allows researchers to tap into pharma’s proprietary protein structure data to train AI models through a “federated learning” approach that preserves confidentiality and keeps proprietary information from being revealed.
▪️ Recursion’s six open datasets, drawn from its 65+ petabytes of proprietary data – the largest, RxRx3, is a dataset of more than 100 TB covering CRISPR knockouts of over 17,000 genes (most of the human genome) and 2.2 million images of HUVEC cells.
▪️ The Billion Cells Project from the Chan Zuckerberg Initiative and partners – which aims to build a single-cell dataset of 1 billion cells for training AI models.
▪️ The OpenBind consortium – backed by an £8 million UK government investment to generate more than 500,000 protein-ligand complex structures along with their affinity measurements. This new dataset would represent a 20-fold increase over all public data of this kind produced in the last half-century.
The article also looks at how Recursion and others are using open data to identify biomarkers (such as specific genetic mutations) that can help pinpoint the patient populations most likely to benefit from a new drug in clinical trials.
👉 Read more: https://www.fmai-hub.com/how-open-data-is-fueling-the-ai-drug-discovery-era-2/