AI in Pharma 2025: Hype, Reality, and the Road Ahead

Artificial intelligence (AI) has permeated the pharmaceutical industry with promises to transform everything from drug discovery to clinical trials. By 2025, a clearer picture has emerged of what AI is actually delivering in pharma – and what core challenges remain.
This comprehensive review takes a critical look at how AI is used in pharma today: its assumptions and limitations, technical hurdles, diverse applications beyond headline-grabbing protein folding, practical resources for newcomers, key players and deals, the evolving model landscape, and the next frontiers on the horizon.
1. Core Assumptions, Proxy Targets, and Limitations of AI Models in Pharma
AI in pharma is often sold on efficiency gains – faster discovery, better success rates – but these claims rest on critical assumptions that warrant scrutiny. One foundational assumption is that the patterns learned from existing data will generalize to future drug candidates and patient populations. In practice, distributional shifts between training data and real-world scenarios are pervasive. Models trained on historical compound libraries or clinical datasets may falter when faced with novel chemotypes or more diverse patient populations.
A recent analysis noted that drug discovery tasks often suffer from overconfident predictions under distribution shift, highlighting the need for benchmarks specifically targeting this concern (neurips.cc, openreview.net). In other words, an AI model might perform impressively on known chemical series or cell-line assays, only to mispredict on a new series or in an in vivo context – a cautionary tale about relying on AI without accounting for where it might break.
Another implicit assumption is that the proxy targets optimized by AI correlate with ultimate clinical success. Drug discovery involves many proxies: binding affinity as a stand-in for efficacy, cell toxicity assays as surrogates for human safety, animal models for human biology. AI models are usually trained on these proxy labels. For example, a deep model might optimize a molecule’s predicted activity against an enzyme in vitro or its clearance in mouse liver microsomes. These are useful proxies but incomplete. As one commentary dryly observed, “it really doesn’t matter if we engineer and optimize a drug to be its best self if its target or mechanism is rendered moot in a clinical setting by the environment, comorbidities, or compensatory mechanisms” (nature.com).
In 2025, we still see AI-designed molecules often hitting known, well-validated targets – which reduces risk – but not yet breaking the mold with first-in-class mechanisms in the clinic (nature.com). The assumption that optimizing proxy metrics (binding, etc.) will yield a successful drug remains just that – an assumption – until translational biology catches up. AI can propose a molecule with perfect predicted properties on paper, yet biology can throw a curveball (off-target effects, metabolic quirks, emergent toxicity) that wasn’t captured in the training data.
Data biases present further limitations. Pharma datasets can be biased and unrepresentative, leading AI models to learn spurious correlations. If a model’s training set under-represents certain chemical scaffolds or patient demographics, its predictions will be skewed.
As one integrative review noted, “AI models may be biased if the training data is not representative of the population or if data are limited. For example, many oncology trials predominantly enroll patients of European ancestry… Models trained on biased datasets can perform worse on previously unseen populations” (pmc.ncbi.nlm.nih.gov).
Indeed, an AI might confidently “discover” a drug that works great – but only in cell lines or genotypes that were over-represented in the data. The push toward diversity in training data (e.g. including multi-ethnic genomic data, varied chemical libraries) reflects recognition of this risk. Even within preclinical research, distribution bias is a concern: a benchmark study called DrugOOD curated out-of-distribution test sets for drug property prediction, revealing that many models struggle when asked to predict beyond the chemistry space they’ve seen (ojs.aaai.org, openreview.net).
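One common way to probe this kind of shift is a scaffold split, in which whole groups of compounds sharing a core scaffold are held out so that the test set contains only chemotypes the model never saw. The sketch below is a minimal, dependency-free version that assumes scaffold labels have already been computed (in practice, for example, with RDKit's Bemis–Murcko scaffold utilities); `scaffold_split` and the toy records are illustrative and are not taken from DrugOOD.

```python
from collections import defaultdict


def scaffold_split(records, test_fraction=0.2):
    """Split (compound_id, scaffold) pairs so that no scaffold is shared
    between the training and test sets.

    Scaffold groups are sorted from largest to smallest. Groups are added to
    the training set until the next one would overshoot the target size; that
    group and every group after it go to the test set, so the test set is
    made of rarer, unseen chemotypes.
    """
    groups = defaultdict(list)
    for compound_id, scaffold in records:
        groups[scaffold].append(compound_id)

    ordered = sorted(groups.values(), key=len, reverse=True)
    n_train_target = int(round((1 - test_fraction) * len(records)))

    train, test = [], []
    for members in ordered:
        # Whole groups only: a scaffold never appears on both sides.
        if not test and len(train) + len(members) <= n_train_target:
            train.extend(members)
        else:
            test.extend(members)
    return train, test


# Hypothetical compounds with precomputed scaffold labels.
records = [
    ("c1", "scaf_A"), ("c2", "scaf_A"), ("c3", "scaf_A"),
    ("c4", "scaf_B"), ("c5", "scaf_B"),
    ("c6", "scaf_C"), ("c7", "scaf_C"),
    ("c8", "scaf_D"), ("c9", "scaf_E"), ("c10", "scaf_F"),
]
train_ids, test_ids = scaffold_split(records)
```

A random split would scatter close analogues across both sets and tend to flatter the model; a scaffold split is a cheap, if imperfect, proxy for meeting a new chemical series.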
Crucially, AI’s effectiveness depends on what biological constraints are (or aren’t) captured in data. Machine learning excels at interpolating within the domain of its training set, but biology often surprises us with emergent behavior. Complex issues like human metabolism, polypharmacology (one drug hitting multiple targets), and compensatory pathways in disease can confound AI predictions. A candid industry view is that “the initial wave of optimism... is being tempered by the realities of complex biological systems and the challenges of translating AI-driven discoveries into clinical successes.”
In other words, AI can design a molecule to hit a target, but if that target’s role in disease was based on an oversimplified model, the drug may still fail to show efficacy in humans. This is not so much a failing of AI as a reminder that correct assumptions about biology are as important as algorithmic prowess. AI often assumes ceteris paribus – e.g. “if I optimize this one property, all else stays equal” – but in drug discovery, improving one property (say potency) can adversely affect another (like solubility or toxicity). These intertwined effects impose a ceiling on purely data-driven approaches.
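Because improving one property can degrade another, one way to avoid hiding trade-offs inside a single score is to keep only Pareto-optimal candidates: those that no other candidate beats on every objective at once. A minimal sketch, assuming every objective has already been converted to a higher-is-better score; the candidate names and numbers are hypothetical.

```python
def dominates(a, b):
    """True if score vector a is at least as good as b on every objective
    and strictly better on at least one (all objectives are maximized)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(candidates):
    """Return the names of candidates not dominated by any other candidate."""
    front = []
    for name, scores in candidates.items():
        dominated = any(
            dominates(other_scores, scores)
            for other_name, other_scores in candidates.items()
            if other_name != name
        )
        if not dominated:
            front.append(name)
    return front


# Hypothetical (potency score, solubility score) pairs, higher is better.
candidates = {
    "A": (9.0, 2.0),  # very potent, poorly soluble
    "B": (7.0, 6.0),
    "C": (6.0, 5.0),  # worse than B on both axes
    "D": (4.0, 8.0),  # weakly potent, highly soluble
}
front = pareto_front(candidates)
```

The front does not pick a winner; it narrows the choice to candidates where gaining on one axis costs something on another, leaving the final call to the project team.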
As one pharma CEO remarked, the problem is sometimes “not AI itself, but the approach… The underlying issue lies in the quality of the inputs and how AI models are used. Think of AI as the teamwork and coaching that enable players to reach their full potential” (labiotech.eu). In other words, garbage in, garbage out – AI is only as good as the data and human strategies guiding its use.
Finally, there’s the oft-cited “black box” phenomenon. Many advanced models (deep neural networks) lack interpretability, which can breed skepticism among drug researchers and regulators. If an AI suggests a molecule but cannot explain why – which atom pattern contributed to activity, or which patient feature drove a prediction – scientists may hesitate to trust it for critical decisions. Explainable AI is making inroads to address this (for instance, attention mechanisms highlighting which molecular substructures most influence a prediction), but it remains a limitation, especially in regulated domains where decisions must be justified.
In sum, by 2025 the pharma industry has developed a more nuanced (and sometimes sobering) understanding of AI’s limitations. Biased training data, distribution shifts, reliance on proxy endpoints, and the gap between prediction and complex in vivo reality are all on the radar.
These challenges do not spell doom for AI in pharma – rather, they underscore the need for careful validation and integration of AI into the broader scientific process. AI can and does accelerate parts of R&D (as we’ll explore), but success requires respecting the assumptions and ensuring they hold. In the next sections, we delve into how these principles manifest in specific technical challenges and applications.
2. Key Technical Challenges in Drug Discovery AI
Drug discovery spans diverse tasks – each with its own technical hurdles where AI is being applied. Here we dissect several core challenges: how to represent complex molecules for algorithms, how to identify targets and predict interactions, ways to generate new compounds (even using reinforcement learning), integration of multiple data modalities, predicting pharmacokinetics and toxicity, simulating clinical trials, and inferring causal relationships. Each challenge is an active research frontier in 2025, with notable progress as well as open problems.
2.1 Molecular Representation: SMILES, Graphs, and 3D Formats
The first challenge in applying AI to chemistry is telling the computer what a molecule looks like. Molecules are not naturally grid-like or sequential data, so choosing a representation format is pivotal. The workhorse of computational chemistry has long been the SMILES notation (Simplified Molecular Input Line Entry System) – a linear text string encoding a molecule’s structure. SMILES strings are convenient (one can treat them like sentences and use language models), but they come with quirks.
A small typo in a SMILES string (e.g. a misplaced ring-closure digit) can render it invalid or turn it into a chemically nonsensical structure (like a pentavalent carbon). In early attempts, generative models often produced a large fraction of invalid SMILES. To address this, researchers introduced more robust string representations.
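Some of these failures can be caught with a cheap syntactic scan before any chemistry toolkit is involved. The sketch below is a hypothetical helper, not part of any library: it checks only that parentheses balance and that every ring-closure label is both opened and closed. It does not check valence, so full validation still needs a real parser such as RDKit's `Chem.MolFromSmiles`, which returns `None` for strings it cannot parse and sanitize.

```python
import re

# Token pattern: bracket atoms such as [nH] or [13C] are kept whole so their
# digits are not mistaken for ring closures; %nn is a two-digit ring label;
# a bare digit is a one-digit ring label; anything else is one character.
TOKEN = re.compile(r"\[[^\]]*\]|%\d{2}|\d|.")


def check_smiles_syntax(smiles):
    """Return a list of syntactic problems (empty if none were found).

    Checks parenthesis balance and ring-closure pairing only; a string that
    passes can still be chemically invalid (e.g. a pentavalent carbon).
    """
    problems = []
    depth = 0
    open_rings = set()
    for tok in TOKEN.findall(smiles):
        if tok == "(":
            depth += 1
        elif tok == ")":
            depth -= 1
            if depth < 0:
                problems.append("unmatched ')'")
                depth = 0
        elif tok.isdigit() or (len(tok) == 3 and tok[0] == "%"):
            # Ring labels toggle: the first use opens a ring bond and the
            # next use of the same label closes it (labels may be reused).
            open_rings ^= {tok}
    if depth > 0:
        problems.append("unclosed '('")
    for label in sorted(open_rings):
        problems.append(f"ring bond {label} never closed")
    return problems


print(check_smiles_syntax("c1ccccc1"))  # benzene
print(check_smiles_syntax("c1ccccc"))   # ring-closure digit dropped
```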
SELFIES (SELF-referencing Embedded Strings) emerged as a 100% robust alternative that guarantees any string produces a valid molecule (pmc.ncbi.nlm.nih.gov). SELFIES use a clever grammar that encodes rules of chemical valency, so even a completely random SELFIES string will decode to a valid molecule, though not necessarily a drug-like one.
This has been a boon for generative modeling – no more wasted effort on gibberish molecules. A 2023 update on SELFIES (v2.1.1) demonstrated expanded support for a wider range of molecules and features (aromaticity, charged species, stereochemistry) while retaining its robust nature (pmc.ncbi.nlm.nih.gov). For modelers, SELFIES means you can let a generative model output free-form strings without special validity constraints and still get sensible chemistry out.
Nonetheless, strings are ultimately a 1D serialization of a 2D molecular graph. Many AI approaches now work directly with graph representations of molecules, where atoms are nodes and bonds are edges. Graph Neural Networks (GNNs) have become popular for predicting molecular properties and bioactivities because they natively handle the relational structure of chemistry. A GNN “message passing” algorithm can naturally compute over a molecular graph, aggregating information from neighboring atoms/bonds in layers to produce a learned representation (an embedding) of the molecule.
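The message-passing idea can be shown without any deep learning library. This is a minimal sketch under simplifying assumptions: atoms carry hand-made one-hot features, each layer simply adds up neighbour states (no learned weights or nonlinearities, which real GNN layers in libraries such as PyTorch Geometric or DeepChem would apply), and the molecule embedding is a sum over atoms.

```python
def message_passing(node_features, edges, num_layers=2):
    """Unweighted sum-aggregation message passing on an undirected graph.

    node_features: one feature vector (list of numbers) per atom.
    edges: (i, j) pairs of bonded atom indices.
    Each layer updates every atom as h_v <- h_v + sum of h_u over its
    neighbours u, using the previous layer's states for all atoms.
    Returns the final per-atom states and a sum-pooled molecule vector.
    """
    neighbours = {v: [] for v in range(len(node_features))}
    for i, j in edges:
        neighbours[i].append(j)
        neighbours[j].append(i)

    h = [list(f) for f in node_features]
    for _ in range(num_layers):
        # The comprehension reads the old h for every atom before rebinding it.
        h = [
            [h[v][k] + sum(h[u][k] for u in neighbours[v]) for k in range(len(h[v]))]
            for v in range(len(h))
        ]
    molecule = [sum(column) for column in zip(*h)]
    return h, molecule


# Ethanol heavy atoms C-C-O with one-hot features [is_carbon, is_oxygen].
atoms = [[1, 0], [1, 0], [0, 1]]
bonds = [(0, 1), (1, 2)]
atom_states, molecule_vector = message_passing(atoms, bonds)
```

After two layers each atom's state reflects its two-bond neighbourhood, so the carbon bonded to oxygen ends up distinguishable from the terminal carbon even though both started with identical features.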
The advantage is that the model can learn chemistry directly (rings, functional groups, etc.) without the awkwardness of a linear notation. By 2025, GNNs are a staple in many pharma AI models – for example, to predict a compound’s solubility, clearance, or binding affinity for a protein receptor. Studies have cataloged various GNN architectures (Graph Convolutional Networks, Graph Attention networks, message-passing neural nets) applied to drug discovery, often outperforming classical fingerprints on benchmarks (pmc.ncbi.nlm.nih.gov).
One practical win: the latest ADMET prediction platforms like ADMETlab 2.0 moved from manual molecular descriptors to graph neural nets, significantly improving accuracy and speed in predicting pharmacokinetic properties.
But molecules aren’t just 2D graphs; their 3D shape and stereochemistry matter hugely, especially for protein binding. This brings in 3D conformer representations. Recent AI models increasingly incorporate 3D information, whether through atomic distance matrices, point clouds, or specialized 3D equivariant neural networks. An exciting development is E(3)-equivariant networks, whose internal features transform consistently under rotations, translations, and reflections in 3D space (so scalar predictions such as a binding energy are invariant); they can learn from 3D molecular geometries without being fooled by arbitrary orientation.
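The property these networks build on can be checked numerically: interatomic distances do not change when a conformer is rotated or translated, even though the raw coordinates do. The sketch below (hypothetical coordinates, rotation about the z-axis only) is not an equivariant network itself; it only demonstrates why distance-based features give orientation-independent inputs.

```python
import math


def rotate_z(coords, angle):
    """Rotate (x, y, z) points by `angle` radians about the z-axis."""
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x - s * y, s * x + c * y, z) for x, y, z in coords]


def translate(coords, shift):
    """Shift every point by the vector `shift`."""
    dx, dy, dz = shift
    return [(x + dx, y + dy, z + dz) for x, y, z in coords]


def pairwise_distances(coords):
    """Distances for every unordered atom pair, in a fixed (i < j) order."""
    n = len(coords)
    return [math.dist(coords[i], coords[j]) for i in range(n) for j in range(i + 1, n)]


# Hypothetical three-atom conformer (coordinates in angstroms).
conformer = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (2.0, 1.4, 0.3)]
moved = translate(rotate_z(conformer, 0.7), (3.0, -2.0, 1.0))

before = pairwise_distances(conformer)
after = pairwise_distances(moved)
```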
For instance, researchers have built models that take a candidate drug’s 3D conformer and the 3D structure of a protein pocket, and predict binding affinity or even generate new molecules that fit that pocket (more on diffusion models for docking later). The technical challenge is that obtaining a relevant 3D pose can be non-trivial (you might need a docking program or a force-field to propose one), and molecules are flexible (many possible conformations). Nonetheless, by 2025 we see hybrid approaches: a GNN operating on a 2D graph augmented with 3D coordinate features, or a transformer that processes a protein sequence and a ligand SMILES jointly.
In summary, representing molecules for AI is a solved problem in the trivial sense (plenty of options exist), but each representation has trade-offs. SMILES allow leveraging text modeling advances but require care with validity; SELFIES fix validity at the cost of a more abstract vocabulary; graphs preserve chemical connectivity elegantly but ignore conformational nuance; 3D models capture true shape but demand more data and computation. Often, multiple representations are used in tandem to exploit their complementary strengths.
The representation chosen must align with the task – for fast virtual screening of millions of compounds, a 2D graph model might suffice; for designing a novel molecular shape to snugly fit a protein site, a 3D-aware generative model might be preferred. The good news is that open libraries make these representations readily accessible: e.g. RDKit (open-source cheminformatics toolkit) can generate SMILES or 3D conformers; DeepChem and PyTorch Geometric provide convenient loaders for molecular graphs (deepchem.readthedocs.io). This democratization of molecular featurization has lowered the barrier for new entrants to play with AI in chemistry.
2.2 Target “Deorphanization” and Drug–Target Interaction Prediction
Finding what to drug – and ensuring your drug hits it – is a critical early step. Target deorphanization refers to assigning functions or ligands to previously “orphan” targets (e.g. proteins of unknown function or no known drug binding). Conversely, predicting the interactions between a given drug molecule and possible protein targets (on- or off-targets) is essential for efficacy and safety. AI is tackling both sides: suggesting new targets for diseases, and mapping which compounds are likely to bind to which targets.
On the target identification front, AI can mine large omics datasets and literature to propose links between a gene/protein and a disease phenotype. For example, graph-based algorithms integrate protein–protein interaction networks, gene expression signatures, and genome-wide association studies (GWAS) to highlight novel disease drivers. By 2025, knowledge graphs – large networks of biomedical entities (genes, compounds, diseases) – are a common tool. Running link prediction on these graphs can suggest, say, that Protein X is likely involved in Disease Y because of various indirect connections (pathways, shared genetics, etc.). These methods assist deorphanization by pointing to possible roles for understudied proteins.
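A very simple form of link prediction scores a candidate edge by the neighbours the two nodes share, down-weighting shared "hub" nodes that connect to everything. The sketch below uses the classic Adamic–Adar index on a tiny hypothetical graph of (head, relation, tail) triples; it ignores relation types, whereas production knowledge-graph systems more often rely on learned graph embeddings.

```python
import math
from collections import defaultdict


def build_graph(triples):
    """Undirected adjacency sets from (head, relation, tail) triples.
    Relation labels are ignored in this simple heuristic."""
    graph = defaultdict(set)
    for head, _relation, tail in triples:
        graph[head].add(tail)
        graph[tail].add(head)
    return graph


def adamic_adar(graph, a, b):
    """Adamic-Adar score for a candidate link a-b: each shared neighbour n
    contributes 1 / log(degree(n)), so highly connected hubs count less."""
    shared = graph.get(a, set()) & graph.get(b, set())
    return sum(1.0 / math.log(len(graph[n])) for n in shared if len(graph[n]) > 1)


# Hypothetical entities and relations.
triples = [
    ("ProteinX", "interacts_with", "KinaseA"),
    ("ProteinX", "member_of", "Pathway1"),
    ("DiseaseY", "associated_with", "KinaseA"),
    ("DiseaseY", "involves", "Pathway1"),
    ("ProteinZ", "interacts_with", "KinaseB"),
    ("KinaseB", "member_of", "Pathway2"),
]
graph = build_graph(triples)
score_x = adamic_adar(graph, "ProteinX", "DiseaseY")  # two shared neighbours
score_z = adamic_adar(graph, "ProteinZ", "DiseaseY")  # none shared
```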
A prominent case came early in the pandemic: BenevolentAI’s knowledge graph analysis identified the JAK1/2 kinase inhibitor baricitinib as a potential COVID-19 therapy within 48 hours by connecting viral entry mechanisms and inflammation pathways. That AI-driven hypothesis proved prescient – baricitinib later showed clinical benefit in COVID and was granted emergency use authorization (benevolent.com). While that example is drug repurposing, the underlying principle is the same: AI sifting through massive biomedical data to illuminate hidden target biology.
For drug–target interaction (DTI) prediction, deep learning now supplements traditional methods like docking. Classical virtual screening docks each compound into a protein’s 3D structure to evaluate binding – accurate but relatively slow and requiring a known structure. AI offers ligand-based or sequence-based prediction when structures are unknown. One approach is to use two encoders: one for the molecule (e.g. a SMILES-based transformer or graph network) and one for the protein (e.g. an amino acid sequence model or a structural graph), then have the model learn to output a binding score or affinity.
These are essentially neural versions of QSAR and protein-ligand scoring functions combined. Recent models even treat DTI as a language translation task: “translating” from the language of proteins to the language of compounds, generating likely binding pairs.
A multitask deep learning framework called DeepDTAGen (reported in 2023) exemplifies this by jointly predicting drug–target binding and generating new molecules, rather than doing each in isolation (nature.com). Performance-wise, AI DTI models can approximate docking accuracy in many cases, and they’re extremely useful for scanning billions of “virtual” compounds against a target to shortlist candidates – something traditional docking can’t practically do at that scale.
However, challenges remain. If a model has seen many examples of kinases binding ATP-like molecules, it may excel on similar cases but be clueless about an orphan receptor with no known ligands (the classic cold-start problem). Efforts like the Therapeutics Data Commons and drug–target binding databases such as BindingDB are expanding the training data, and zero-shot learning techniques (leveraging protein features like AlphaFold structures) are improving generalization.
Also, explainability is important: researchers want to know which part of a molecule or protein drove the prediction. Attention-based models and attribution methods are helping with that, highlighting, for instance, that “this polar group on the drug is predicted to interact with that pocket residue on the protein.” Despite progress, AI-predicted interactions still need experimental confirmation – they’re hypotheses, not guarantees. But they dramatically enrich the pool of plausible targets and pairs to test.
A related task is predicting off-targets – i.e. polypharmacology. Companies are keen to use AI to flag if a new compound might inadvertently hit, say, the hERG potassium channel (linked to cardiac toxicity) or other anti-targets. Indeed, one can treat off-target prediction as a multi-label DTI problem, and some platforms do an in silico safety screen via AI before a molecule ever gets synthesized. The FDA has even invested in machine learning classifiers for key safety liabilities like drug-induced arrhythmia, using proxy targets like hERG blockade as endpoints (pubs.acs.org).
Such classifiers can identify potentially cardiotoxic compounds early by recognizing structural features associated with hERG binding (pubs.acs.org).
In summary, AI is increasingly the compass for navigating protein–ligand space. It can propose which new targets to pursue (especially when human intuition hits its limits combing through genomic data), and it can map which chemical matter is likely to modulate those targets. By accelerating target validation and hit finding, AI helps address the long-standing “needle in a haystack” problem of drug discovery. Still, savvy organizations treat these outputs as leads for augmented decision-making: an AI suggesting “Protein X” or “Compound Y” is just the start of a rigorous experimental campaign to verify the hypothesis.
The hope is that by 2030, we’ll look back at 2025 as the inflection point when AI became a routine part of target and interaction discovery – much like high-throughput screening did in the 2000s – albeit one tempered by careful validation.
2.3 Reinforcement Learning for Molecule Generation
Designing novel molecules with desired properties is a creative task – one well-suited to generative models. In addition to the generative approaches using VAEs or transformers (treated in Section 3), the use of reinforcement learning (RL) has gained traction for de novo molecular design. The idea is tempting: treat the molecular generator as an “agent” in a game, where each move is adding or modifying a chemical structure, and define a reward function for desirable drug properties. The agent then explores chemical space, guided by the reward signal, to find molecules that maximize the reward.
For example, one might reward high predicted activity against a target and low predicted toxicity. Standard generative models might struggle to juggle multiple objectives or hit a precise range (they often just mimic the training distribution). But an RL agent can, in theory, learn to balance these by trial and error feedback.
A simple early approach was to start with a model (say a recurrent neural network generating SMILES) pre-trained on known molecules, and then fine-tune it with RL where the reward is, e.g., predicted binding affinity from another model. This tends to push the generator to produce molecules that score better on that particular metric.
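The fine-tuning loop can be sketched with the REINFORCE policy-gradient rule on a deliberately tiny generator. Everything here is a simplifying assumption: the "generator" is an independent categorical distribution over four tokens at each of eight positions (starting uniform, standing in for a pre-trained model), the reward is a made-up stand-in for a property predictor, and a moving-average baseline reduces variance. Published methods such as REINVENT use an RNN generator and also keep the agent close to its pre-trained prior, which is omitted here.

```python
import math
import random

VOCAB = ["C", "N", "O", "F"]


def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def toy_reward(tokens):
    """Hypothetical stand-in for a property predictor: favours N, mildly
    favours O, penalises F. Returns one scalar for the whole sequence."""
    score = tokens.count("N") + 0.5 * tokens.count("O") - tokens.count("F")
    return score / len(tokens)


def reinforce(steps=300, length=8, lr=0.1, seed=0):
    rng = random.Random(seed)
    # One row of logits per position; all zeros = uniform starting policy.
    logits = [[0.0] * len(VOCAB) for _ in range(length)]
    baseline = 0.0
    rewards = []
    for _ in range(steps):
        probs = [softmax(row) for row in logits]
        actions = [rng.choices(range(len(VOCAB)), weights=p)[0] for p in probs]
        reward = toy_reward([VOCAB[a] for a in actions])
        advantage = reward - baseline
        baseline = 0.9 * baseline + 0.1 * reward  # moving-average baseline
        # REINFORCE: logits += lr * advantage * d(log pi(action)) / d(logits),
        # where the gradient of log-softmax is one_hot(action) - probs.
        for pos, (p, a) in enumerate(zip(probs, actions)):
            for k in range(len(VOCAB)):
                grad = (1.0 if k == a else 0.0) - p[k]
                logits[pos][k] += lr * advantage * grad
        rewards.append(reward)
    return logits, rewards


final_logits, rewards = reinforce()
early, late = sum(rewards[:50]) / 50, sum(rewards[-50:]) / 50
```

Note that every position receives the same scalar advantage for a sampled sequence, whichever token actually helped; that is exactly the credit-assignment weakness discussed next.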
However, RL for molecules is non-trivial. A primary challenge is the sparse and scalar reward: when the agent proposes a complete molecule, you compute a single reward (good or bad), and that’s it. If the molecule was suboptimal, the agent gets no detailed feedback on why or which part of the molecule was problematic. As one study noted, “RL-based methods typically condense the evaluation of sampled compounds into a single scalar value, making it difficult for the generative agent to learn the optimal policy.” (pubmed.ncbi.nlm.nih.gov).
In essence, the agent might try many random modifications before it stumbles on what makes the reward go up, which can be inefficient and get stuck in local optima.
Researchers have developed several tricks to address this. One is using fragment-based or stepwise rewards – e.g., giving intermediate feedback as the molecule is constructed (rewarding partial structures that look promising). Another is using policy gradients with attention mechanisms so that the model can attend to which parts of the molecule contribute most to the reward (pubmed.ncbi.nlm.nih.gov).
A 2023 approach combined a transformer-based generator with an attention mechanism to evaluate how each atom/group contributes to binding affinity, and used that to inform the updates. This helped the model identify which regions of a candidate molecule are key for the target, effectively giving a more informative “policy shaping” than a single numeric score.
There have also been innovations like multi-objective reinforcement learning for chemistry. For instance, Google’s MolDQN (Deep Q-Network) approach combined multiple objectives (in the original work, drug-likeness and similarity to a starting molecule) via a weighted sum, and the agent could trade off between them as those weights were adjusted. Batch RL methods, meanwhile, attempt to leverage existing datasets as a guide so the agent doesn’t wander into completely unrealistic chemistries.
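Weighted-sum scalarization, as described for MolDQN, collapses several objective scores into one reward. The sketch below uses hypothetical molecules, objective names, and weights to show how shifting the weights changes which candidate an agent would prefer.

```python
def scalarize(scores, weights):
    """Combine per-objective scores (higher is better) into one reward."""
    return sum(weights[name] * scores[name] for name in weights)


def preferred(candidates, weights):
    """Name of the candidate with the highest scalarized reward."""
    return max(candidates, key=lambda name: scalarize(candidates[name], weights))


# Hypothetical objective scores, each already scaled to [0, 1].
candidates = {
    "mol_a": {"activity": 0.9, "novelty": 0.2, "synth_access": 0.4},
    "mol_b": {"activity": 0.6, "novelty": 0.8, "synth_access": 0.7},
}
activity_first = {"activity": 0.8, "novelty": 0.1, "synth_access": 0.1}
balanced = {"activity": 0.4, "novelty": 0.3, "synth_access": 0.3}
```

One known limitation of weighted sums is that no choice of weights can select candidates lying in non-convex regions of the Pareto front, which is one reason Pareto-based selection is also used.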
Despite these advances, RL in practice can be brittle – models sometimes exploit the reward in unintended ways (e.g., by generating weird molecules that trick the predictor model to give a high score – an AI “chemical hack”). To mitigate this, the field moved toward using more robust reward predictors (or even docking simulations as part of the loop) and constraining the chemical space (e.g., only allow adding fragments from a known library, to keep molecules drug-like). Open challenges include ensuring diversity (agents can collapse to a narrow set of high-reward solutions) and synthesizability.
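Collapse of this kind is often monitored with an internal-diversity score: one minus the average pairwise Tanimoto similarity of the generated set. In this minimal sketch, fingerprints are hypothetical sets of "on" bit indices; in practice they would come from something like RDKit Morgan fingerprints.

```python
from itertools import combinations


def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity of two fingerprints given as sets of on-bits."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def internal_diversity(fingerprints):
    """1 - mean pairwise Tanimoto similarity; near 0 means a collapsed set."""
    pairs = list(combinations(fingerprints, 2))
    if not pairs:
        return 0.0
    return 1.0 - sum(tanimoto(a, b) for a, b in pairs) / len(pairs)


# Hypothetical fingerprints for two batches of generated molecules.
collapsed = [{1, 2, 3, 4}, {1, 2, 3, 4}, {1, 2, 3, 5}]
varied = [{1, 2, 3, 4}, {5, 6, 7, 8}, {9, 10, 11, 12}]
```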
So, how successful is RL for molecule design by 2025? It has certainly produced some impressive case studies: molecules generated de novo that were experimentally confirmed to hit their targets, some even advancing to preclinical stages. One example is Insilico Medicine’s generative pipeline (which included RL elements) reportedly yielding a preclinical candidate for fibrosis in under 18 months (nature.com).
Another is a joint academia-industry project (the GENTRL study from Insilico Medicine and academic collaborators, published in 2019) in which a generative model trained with reinforcement learning designed novel DDR1 kinase inhibitors within a few weeks; the top compounds were synthesized and found active at low-nanomolar potency.
However, broad success has been patchy; not every paper result translates to a real drug candidate. Many pharma teams use RL as one tool among many: for instance, to slightly optimize an already known lead compound (fine-tuning it to improve a property) rather than to generate an entirely unprecedented structure from scratch.
In summary, reinforcement learning adds a powerful paradigm to AI drug design: the ability to optimize via interaction with a model of the problem (be it a predictive model or even actual lab feedback). It shifts generation from pure pattern imitation to goal-directed exploration. The technical challenges – sparse rewards, credit assignment, maintaining chemical validity – are gradually being overcome with creative solutions (self-attention mechanisms, multi-step rewards, etc.).
By 2025, RL in drug discovery is not a silver bullet, but it’s increasingly a standard component in advanced platforms, used to fine-tune molecules towards better in silico profiles. As compute and algorithms improve, one can imagine an RL agent eventually “playing the game” of drug design at superhuman levels, much as AlphaGo did for Go – but in this game, the victory will be a real-world therapy making it to patients.
2.4 Multimodal and Multi-omics Data Integration
Biology is inherently multimodal – diseases manifest in genomic changes, transcriptomic profiles, proteomic alterations, imaging findings, clinical phenotypes, and more. A single data modality rarely captures the full picture. One of the grand challenges (and opportunities) for AI in pharma is to integrate these diverse data (“multi-omics” integration) to discover drug targets, biomarkers, and patient subgroups that would be invisible to any one data source in isolation.
Consider cancer drug discovery: we have DNA sequencing (to find mutations), RNA-seq (gene expression), proteomics (protein levels and modifications), metabolomics, pathology images of tumor biopsies, radiology scans, patient clinical histories – an embarrassment of riches. The challenge is fragmentation: these data often come from different experiments, different patients, and different time points. Early attempts at “multiomic” analysis often did little more than put results side by side.
In the past few years, AI (especially deep learning) has enabled truly integrative models that learn from multiple data types at once. For example, a neural network might take as input both an MRI image and a gene expression profile and output a predicted treatment response – essentially learning a joint representation of imaging+genomics.
However, achieving this integration requires overcoming noise and alignment issues. As one industry expert lamented, “current ‘multiomics’ typically involve piecemeal workflows... separate sample sets or different conditions, introducing noise and variability that make downstream data integration complex and unreliable” (drugtargetreview.com).
Differences in timing, sample handling, and batch effects can confound naive integration. A crucial insight is that for AI to effectively integrate modalities, the data must be as synchronized and high-quality as possible – an area where experimental innovation (single-cell multiomic assays, spatial transcriptomics, etc.) is helping.
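The simplest correction for a batch effect is to remove each batch's mean from its own samples. The sketch below does only that (a location-only adjustment; methods such as ComBat also adjust per-batch variance), on hypothetical expression values for one gene measured in two runs. It also carries the key caveat: if biology is confounded with batch, for instance all disease samples in one run, centering removes real signal along with the artifact.

```python
from collections import defaultdict


def center_by_batch(values, batches):
    """Subtract each batch's mean from the values in that batch.

    values: measurements for one feature (e.g. one gene) across samples.
    batches: batch label per sample, same length as values.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for value, batch in zip(values, batches):
        sums[batch] += value
        counts[batch] += 1
    means = {batch: sums[batch] / counts[batch] for batch in sums}
    return [value - means[batch] for value, batch in zip(values, batches)]


# Hypothetical: the same gene measured in two runs with a ~10-unit offset.
expression = [5.0, 6.0, 7.0, 15.0, 16.0, 17.0]
run = ["run1", "run1", "run1", "run2", "run2", "run2"]
corrected = center_by_batch(expression, run)
```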
On the AI architecture side, researchers have developed multi-modal neural networks that might have distinct sub-networks for each modality (e.g. a CNN for images, a transformer for sequences, etc.) which then merge into a common layer. Some foundation models in 2024 even handle multimodal input directly; for instance, transformers that can attend to both text (or sequence data) and images simultaneously.
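The "separate sub-networks that merge" pattern, often called intermediate fusion, can be sketched in a few lines. Here each modality's encoder is a stand-in (a fixed linear projection followed by ReLU, with made-up weights) for what would really be a trained CNN, transformer, or similar; the point is only the structure: encode each modality on its own, then concatenate the embeddings for a shared downstream layer.

```python
def make_encoder(weights):
    """Toy modality encoder: fixed linear projection then ReLU.
    In a real model these weights would be learned."""
    def encode(x):
        return [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in weights]
    return encode


def fuse(sample, encoders):
    """Intermediate fusion: encode each modality separately, then
    concatenate the embeddings in a fixed modality order."""
    fused = []
    for modality, encoder in encoders.items():
        fused.extend(encoder(sample[modality]))
    return fused


# Hypothetical encoders and one patient sample with two modalities.
encoders = {
    "expression": make_encoder([[1.0, -1.0, 0.5], [0.0, 1.0, 1.0]]),
    "imaging": make_encoder([[0.5, 0.5]]),
}
sample = {"expression": [2.0, 1.0, 0.0], "imaging": [0.2, 0.6]}
embedding = fuse(sample, encoders)
```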
Graph neural networks have also been used to integrate omics by constructing knowledge graphs where different node types (genes, diseases, drugs) are connected – an AI can then propagate information in this graph to find cross-modal links (e.g. a mutation node connecting to a pathway node connecting to a drug node suggests that drug might work for that mutation).
The payoff for successful integration is high. Multi-omics AI has helped accelerate target discovery by finding causal pathways across molecular layers. An example: by combining genomics and spatial proteomics (imaging of protein expression in tissue), researchers discovered why certain tumors with the “right” mutation still resist immunotherapy – it turned out microenvironmental cues (captured in spatial data) were suppressing immune response.
Only by overlaying the where (spatial proteomics) on the what (genomics) did this insight emerge. Another success is in biomarker identification: linking transcriptomic profiles with digital pathology features has revealed novel biomarkers that neither alone would uncover – for instance, a subtle histological pattern correlated with a gene expression signature of aggressive disease. These integrated biomarkers can stratify patients better for clinical trials.
Patient stratification itself is boosted by multi-modal data. For complex diseases like Alzheimer’s or autoimmune disorders, one modality might cluster patients in ambiguous ways, but integrating genotype, blood biomarkers, and imaging might reveal distinct subtypes. As Drug Target Review noted, “Integrated multiomics can enhance patient stratification by linking genotype to phenotype in a spatial and cellular context” (drugtargetreview.com). In other words, by seeing the full cascade from DNA variant to altered protein in tissue to clinical symptom, AI can define coherent patient groups who might respond differently to therapies.
By 2025, pharma companies and consortia have launched significant initiatives to gather integrated datasets. For example, the UK’s Medicine Discovery Catapult worked on generating large sample sets with paired genomics, proteomics and clinical data for AI modeling. Government programs (like Singapore’s “Ignition” accelerator and others in Asia) explicitly fund efforts to combine multi-omics for drug discovery (biospectrumasia.com).
AI Company | Partner(s) | Partner Type | Country (AI Co.) | Focus Area |
---|---|---|---|---|
Insilico Medicine | Johnson & Johnson, Fosun Pharma | Pharma | China / Hong Kong | Small molecule generation, fibrosis |
Recursion | Roche, Bayer | Pharma | USA | Cell painting, phenomics AI |
Exscientia | Sanofi, Bristol Myers Squibb | Pharma | UK | Generative design, automated lab loop |
Atomwise | Hansoh Pharma | Pharma | USA | Structure-based drug design |
XtalPi | Pfizer, SoftBank | Pharma / Investor | China / USA | Physics-driven simulation |
BenevolentAI | AstraZeneca | Pharma | UK | Knowledge graphs, target identification |
PostEra | NIH | Government | USA | Open-source drug design, synthesis AI |
BioAge | Astellas | Pharma | USA | Aging biology, biomarker discovery |
Deep Genomics | BioMarin | Pharma | Canada | RNA therapeutics, splicing correction |
Genesis Therapeutics | Genentech, Eli Lilly | Pharma | USA | Graph neural networks for binding prediction |
The rationale is clear: if AI is fed with more complete data about human biology, it should make more clinically relevant discoveries – addressing one limitation noted in Section 1: that many AI-designed drugs focused on known targets because they hadn’t truly accounted for human physiological complexity (nature.com). Multi-omics AI is a step toward incorporating that complexity upfront.
One interesting frontier is cross-modal generation: e.g. generating a hypothesis (text) from data (like “Based on gene expression and imaging, this patient likely has subtype X of disease”), or even designing interventions (molecules) conditioned on patient omics. We’re seeing early examples like Genentech’s multimodal AI that, given a tumor’s multi-omic profile, suggests an optimal drug combo (somewhat akin to how Netflix recommends movies based on multi-faceted user data).
The technical challenges in multi-omics integration are far from solved. Models can overfit noisy multi-modal correlations (seeing patterns in the noise). There’s also the curse of dimensionality: each modality by itself is high-dimensional; combining them multiplies the complexity, often needing massive sample sizes to avoid false discoveries. Causal relationships are hard to disentangle – just because an RNA and an image feature correlate doesn’t mean one causes the disease. Hence, there is increasing interest in causal modeling within multi-omics (more in Section 2.7 and Section 8).
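One standard guard against false discoveries when many features are tested at once is the Benjamini–Hochberg procedure, which controls the expected fraction of false positives among reported hits. A minimal sketch on hypothetical p-values: naive thresholding at 0.05 would report five features, while the procedure keeps two.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Indices of hypotheses rejected at false discovery rate `alpha`.

    Sort p-values ascending, find the largest rank k with
    p_(k) <= (k / m) * alpha, and reject the k smallest p-values.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank / m * alpha:
            cutoff = rank
    return sorted(order[:cutoff])


# Hypothetical p-values from testing ten omics features for association.
p = [0.039, 0.001, 0.205, 0.008, 0.041, 0.060, 0.042, 0.074, 0.212, 0.216]
hits = benjamini_hochberg(p)
naive_hits = [i for i, value in enumerate(p) if value < 0.05]
```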
Nonetheless, the direction is set: the future of drug discovery AI will not be siloed. In a sense, AI is becoming the master integrator we always needed – capable of synthesizing a mind-boggling amount of heterogeneous data into hypotheses and decisions. The famous line “the whole is greater than the sum of its parts” applies: multi-modal AI aims to create a holistic model of biology that yields insights unattainable from any single data stream alone.
To quote a 2025 article by a bioinformatics SVP: “With truly integrated datasets, AI can transform noisy biological complexity into structured, predictive frameworks – shifting machine learning from a descriptive tool to a discovery engine.” This captures the optimism: once AI can see the forest (not just individual trees of omics), it may uncover fundamental mechanisms and new therapeutic avenues, ushering in an era of rational drug discovery informed by systems-level understanding (drugtargetreview.com).
2.5 ADMET and Pharmacokinetics/Pharmacodynamics (PK/PD) Prediction
Even the most potent drug is useless if it doesn’t reach the target in the body at the right concentration, or worse, if it’s toxic. Predicting ADMET properties (Absorption, Distribution, Metabolism, Excretion, Toxicity) and PK/PD profiles (how the drug concentration changes over time and how that relates to effect) is thus a cornerstone of drug development. Traditionally, ADMET optimization relied on rules of thumb (Lipinski’s Rule of Five, etc.) and lots of in vitro assays (like microsomal stability, hERG binding, etc.). Now AI is supercharging ADMET prediction by learning directly from large collections of historical data on compounds.
Pharma companies have assembled big internal datasets of compounds with various ADMET measurements – and some public datasets exist too (e.g. the FDA’s LD50 database for toxicity, or human PK parameters in the literature). Machine learning, especially multi-task deep learning, has proven very effective here. One model can simultaneously predict multiple properties, sharing learned chemical features across tasks.
For instance, a graph neural network might output a vector of 50 different property predictions (solubility, permeability, clearance, half-life, various toxicity endpoints) for each compound. A 2023 review highlighted platforms like ADMETlab 2.0, which can predict far more ADMET endpoints in one go than the 31 covered by the earlier version.
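The multi-task idea can be sketched in NumPy. A shared hidden layer feeds one output head per endpoint, and the loss skips endpoints that were never measured for a given compound, since ADMET tables are typically sparse. This is an untrained toy with hypothetical sizes and endpoint names. It is not ADMETlab's architecture, which applies graph attention to molecular graphs rather than a dense layer to fingerprints.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical sizes: 1024-bit fingerprints, 64 shared hidden units, 5 ADMET endpoints.
N_BITS, N_HIDDEN = 1024, 64
TASKS = ["logS", "logPapp", "CLint", "t_half", "hERG_pIC50"]  # illustrative endpoint names
N_TASKS = len(TASKS)

# Random, untrained weights: the point is the shape of the model, not its accuracy.
W_shared = rng.normal(scale=0.05, size=(N_BITS, N_HIDDEN))  # representation shared by all tasks
W_heads = rng.normal(scale=0.05, size=(N_HIDDEN, N_TASKS))  # one linear output head per endpoint

def predict(fingerprints):
    """Return an (n_molecules, N_TASKS) matrix: one prediction per endpoint per molecule."""
    hidden = np.maximum(fingerprints @ W_shared, 0.0)  # ReLU on the shared layer
    return hidden @ W_heads

def masked_mse(pred, labels):
    """Mean squared error over measured entries only; NaN marks an unmeasured endpoint."""
    mask = ~np.isnan(labels)
    diff = np.where(mask, pred - np.nan_to_num(labels), 0.0)
    return float(np.sum(diff ** 2) / mask.sum())

fps = rng.integers(0, 2, size=(3, N_BITS)).astype(float)   # three random toy fingerprints
labels = np.full((3, N_TASKS), np.nan)                      # most endpoints unmeasured...
labels[0, 0], labels[1, 2], labels[2, 4] = -3.1, 12.0, 5.2  # ...as in real ADMET tables
preds = predict(fps)
print(dict(zip(TASKS, np.round(preds[0], 3))))
print("masked loss:", masked_mse(preds, labels))
```

During training, gradients from the masked loss update the shared layer from every measured endpoint. That is how endpoints with few labels borrow strength from endpoints with many.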
ADMETlab 2.0 uses an attention-based graph convolutional network under the hood, reflecting the state-of-the-art: its multi-task graph attention model improved both accuracy and speed versus the prior random-forest-based models. Impressively, it can evaluate 1,000 molecules in 84 seconds, whereas older methods took hours (pmc.ncbi.nlm.nih.gov). This kind of productivity gain means chemists can get ADMET feedback on virtual designs almost in real time, guiding synthesis toward molecules with better drug-like profiles.
Some specific advances in AI-based ADMET prediction include:
Human PK parameter prediction
Machine learning models (some published by AstraZeneca and others) have improved the prediction of human pharmacokinetic parameters (like volume of distribution, clearance, half-life) from chemical structure (pubmed.ncbi.nlm.nih.gov). These models often incorporate predicted or experimental in vitro data as inputs (for example, using predicted liver microsome stability plus the structure to predict human clearance). The best models are ensemble approaches combining mechanistic PBPK (physiologically-based pharmacokinetic) modeling with AI – essentially hybrid models where ML fills gaps in knowledge-driven models (frontiersin.org).
An example is an AI-assisted PBPK platform that uses machine learning to estimate certain parameters for simulation of drug concentration–time profiles (ascpt.onlinelibrary.wiley.com). By 2025, the FDA itself has signaled openness to AI in PK modeling, issuing draft guidance on using AI to support drug safety and efficacy assessments with a “risk-based credibility” approach.
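Here is one way the hybrid idea can look in code. Assume an ML model supplies scaled intrinsic clearance and unbound fraction in blood (the values below are invented), and pass those predictions through the classic well-stirred liver model. The mechanistic equation then enforces physiology the regressor does not know, such as hepatic clearance never exceeding hepatic blood flow. The 20.7 mL/min/kg blood-flow constant is an assumed typical human value.

```python
from dataclasses import dataclass

HEPATIC_BLOOD_FLOW = 20.7  # mL/min/kg; an assumed typical human value

@dataclass
class PredictedInputs:
    clint: float  # scaled intrinsic clearance, mL/min/kg (hypothetically from an ML model)
    fu_b: float   # unbound fraction in blood (hypothetically from an ML model)

def well_stirred_hepatic_clearance(inputs: PredictedInputs, q_h: float = HEPATIC_BLOOD_FLOW) -> float:
    """Mechanistic well-stirred liver model: CL_H = Q_H * fu_b * CLint / (Q_H + fu_b * CLint)."""
    unbound_clint = inputs.fu_b * inputs.clint
    return q_h * unbound_clint / (q_h + unbound_clint)

# Usage with illustrative ML outputs for two hypothetical compounds
for name, inp in {"cmpd_A": PredictedInputs(clint=10.0, fu_b=0.1),
                  "cmpd_B": PredictedInputs(clint=500.0, fu_b=0.5)}.items():
    cl = well_stirred_hepatic_clearance(inp)
    print(f"{name}: CL_H = {cl:.2f} mL/min/kg (extraction ratio {cl / HEPATIC_BLOOD_FLOW:.2f})")
```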
Toxicity prediction
AI is used extensively to predict toxicity endpoints – from acute toxicity in animals to complex endpoints like liver injury or carcinogenicity. Deep learning can capture subtle structural features correlated with toxicity that traditional QSAR often missed. One prominent example is cardiotoxicity: blockade of the hERG potassium channel in the heart can cause arrhythmias. AI classifiers for hERG liability are now quite reliable, flagging compounds likely to cause QT prolongation (pubs.acs.org).
Companies also deploy models for DILI (drug-induced liver injury) risk, often by integrating chemical structure with gene expression perturbation signatures (if available).
The FDA’s recent draft guidance encourages using AI to support safety assessments, provided the models are validated. It’s telling that by 2025, regulators are actively engaging with AI outputs for decisions – a sign that predictive toxicology has matured. Still, caution abounds: these models must be interpretable and have known applicability domains.
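A common, simple way to make a model's applicability domain explicit is a nearest-neighbour similarity check. If a query compound's highest Tanimoto similarity to the training set falls below a cutoff, its prediction is flagged as extrapolation. The sketch represents fingerprints as sets of bit indices. The toy data and the 0.4 threshold are illustrative assumptions, not validated values.

```python
def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto (Jaccard) similarity between two sets of 'on' fingerprint bits."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def in_applicability_domain(query, training_set, threshold=0.4):
    """Return (inside?, nearest-neighbour similarity). The 0.4 cutoff is an illustrative assumption."""
    nearest = max(tanimoto(query, t) for t in training_set)
    return nearest >= threshold, nearest

# Toy fingerprints as sets of bit indices (hypothetical, not real hERG training data)
training = [frozenset({1, 2, 3, 4, 5}), frozenset({2, 3, 4, 10, 11})]
print(in_applicability_domain(frozenset({1, 2, 3, 4, 6}), training))  # close analog: inside
print(in_applicability_domain(frozenset({50, 51, 52}), training))     # novel chemotype: outside
```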
Drug–Drug Interaction (DDI)
Because polypharmacy is common, predicting whether a new drug will interact with others via enzyme inhibition or other mechanisms is crucial. Deep learning models now exist that take two drug structures and predict the likelihood of a DDI. Alternatively, knowledge-graph-based approaches combine drug structures with known metabolic pathways. A 2023 review in Briefings in Bioinformatics concluded that integrated graph and deep learning methods show promise in predicting DDIs by leveraging both chemical and network features (pubmed.ncbi.nlm.nih.gov).
For example, a model might predict that Drug A will boost Drug B’s levels by inhibiting CYP3A4, because it “recognizes” features in Drug A similar to known CYP3A4 inhibitors.
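The CYP3A4 scenario can be quantified with a basic mechanistic static model for reversible (competitive) inhibition, with an ML model imagined as the source of the inhibition constant Ki. The sketch assumes a single inhibited pathway. It ignores gut-wall metabolism, time-dependent inhibition, and induction, and the numbers are hypothetical.

```python
def auc_ratio_reversible_inhibition(fm: float, inhibitor_conc: float, ki: float) -> float:
    """Static-model AUC ratio for a victim drug when one clearance pathway is competitively inhibited.

    fm             fraction of the victim's clearance through the inhibited enzyme (e.g. CYP3A4)
    inhibitor_conc unbound inhibitor concentration at the enzyme (same units as ki)
    ki             inhibition constant; here imagined as the output of an ML model (hypothetical)
    """
    if not 0.0 <= fm <= 1.0:
        raise ValueError("fm must lie in [0, 1]")
    remaining_fraction = fm / (1.0 + inhibitor_conc / ki) + (1.0 - fm)
    return 1.0 / remaining_fraction

# Drug B cleared 90% via CYP3A4; Drug A has predicted Ki = 0.1 uM at 1 uM unbound concentration
print(round(auc_ratio_reversible_inhibition(fm=0.9, inhibitor_conc=1.0, ki=0.1), 2))  # 5.5
```

Under these assumptions, Drug B's exposure rises about 5.5-fold. Dropping fm to 0.5 cuts the predicted increase to roughly 1.8-fold, which is why the victim drug's clearance routes matter as much as the inhibitor's potency.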
Overall, AI-driven ADMET models help filter out poor candidates early (those likely to have low bioavailability or high toxicity) and focus medicinal chemistry efforts on compounds with a higher probability of success. Importantly, these models also help in drug design iteration: medchemists can use AI to answer “if I add a polar group here, will it reduce volume of distribution or make the compound a P-gp substrate?” – essentially running in silico experiments to guide real ones.
That being said, predictive accuracy is not perfect. Many ADMET properties have high experimental variability themselves (different labs may measure slightly different clearance for the same compound), so models face a noisy target.
Models tend to be more reliable for properties governed by clear physicochemical rules (like lipophilicity influencing passive permeability) and a bit less so for complex biological outcomes (like idiosyncratic toxicity). Acknowledging this, a 2022 internal study at Boehringer Ingelheim pointed out that beyond a certain dataset size, returns diminished – bigger data or fancier algorithms didn’t always yield large accuracy gains, hinting at an intrinsic uncertainty in some ADMET predictions.
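The experimental-variability point can be checked with a quick simulation under assumed numbers: true values with unit variance and assay noise with an SD of 0.5. Scored against noisy measurements, even an oracle that knows each compound's true value cannot exceed an R² of roughly var_true / (var_true + var_noise).

```python
import numpy as np

rng = np.random.default_rng(7)

# Assumptions: true log-clearance values have variance 1.0; assay noise SD is 0.5 log units.
n, true_sd, assay_sd = 100_000, 1.0, 0.5
true_values = rng.normal(0.0, true_sd, size=n)
measured = true_values + rng.normal(0.0, assay_sd, size=n)  # what a model is scored against

# Even an oracle that predicts the *true* value exactly is penalized by assay noise.
oracle_r2 = 1 - np.mean((measured - true_values) ** 2) / np.var(measured)
theoretical_ceiling = true_sd**2 / (true_sd**2 + assay_sd**2)
print(f"oracle R^2 = {oracle_r2:.3f}, theoretical ceiling = {theoretical_ceiling:.3f}")
```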
Nevertheless, the trend is toward more comprehensive, integrated ADMET prediction platforms. We see efforts to connect predictions into a coherent PK/PD simulation. For instance, AI-PBPK models are an emerging idea: using ML to estimate PBPK model parameters (like tissue partition coefficients, clearance rates) and then running a mechanistic simulation of plasma drug concentration over time (ascpt.onlinelibrary.wiley.com). Such models can even incorporate pharmacodynamics: e.g., predicting not just the concentration but the blood pressure reduction over time for an antihypertensive, given patient characteristics.
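Here is a minimal sketch of this PK/PD chain. It assumes an ML model has already supplied absorption rate, clearance, volume, and bioavailability (all values below are hypothetical). A one-compartment oral model (the Bateman equation) then feeds a direct-effect Emax model. Real AI-PBPK platforms simulate many physiological compartments; this only shows how predicted parameters become concentration-time and effect-time profiles.

```python
import math

def concentration(t_h, dose_mg, f_oral, ka_per_h, cl_l_per_h, v_l):
    """One-compartment model with first-order absorption (Bateman equation), in mg/L."""
    ke = cl_l_per_h / v_l
    if math.isclose(ka_per_h, ke):
        raise ValueError("ka == ke needs the limiting form of the equation")
    scale = f_oral * dose_mg * ka_per_h / (v_l * (ka_per_h - ke))
    return scale * (math.exp(-ke * t_h) - math.exp(-ka_per_h * t_h))

def effect(conc, emax, ec50):
    """Direct-effect Emax model, e.g. mmHg blood-pressure reduction at a given concentration."""
    return emax * conc / (ec50 + conc)

# Hypothetical ML-predicted parameters for a 50 mg oral dose
params = dict(dose_mg=50, f_oral=0.6, ka_per_h=1.2, cl_l_per_h=5.0, v_l=70.0)
for t in (0, 1, 2, 4, 8, 12, 24):
    c = concentration(t, **params)
    print(f"t={t:>2} h  C={c:.3f} mg/L  BP reduction={effect(c, emax=20.0, ec50=0.2):.1f} mmHg")
```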
By 2025, a few papers demonstrated AI can assist in predicting human dose–response curves from preclinical data (frontiersin.org). This remains on the cutting edge, but it’s easy to see how valuable that would be – informing dose selection and likelihood of clinical success before you even go to trial.
In summary, AI has become an indispensable toolkit in ADMET and PK/PD prediction, helping to de-risk candidates early and design better ones from the get-go. It addresses a major cause of late-stage failure (poor pharmacokinetics and safety) by shifting those evaluations left, into the design phase. One could say AI is augmenting “medicinal intuition” with data-driven insights: whereas a chemist might recall that bulky cations often have low CNS penetration, an AI model can quantify that tendency across thousands of examples and flag an issue or suggest a fix.
An interesting reflection is that these models have themselves driven experimentation in AI: multi-task deep learning, for example, gained early prominence in chemistry partly through QSAR and ADMET modeling challenges (Merck’s 2012 Kaggle competition on molecular activity prediction is famous in ML lore, pmc.ncbi.nlm.nih.gov).
The continuous feedback loop between cheminformatics and machine learning communities has led to steady improvements. By 2025, we have moved well beyond simple rule-based filters to an era where high-capacity models capture ADMET patterns that even expert chemists might miss (pmc.ncbi.nlm.nih.gov). The ultimate validation will be fewer clinical failures due to ADMET – a metric we’ll watch in the coming years as “AI-born” drug candidates move through development.
2.6 Clinical Trial Simulation and Synthetic Patient Data
Long after a molecule is designed and optimized in the lab, the final challenge lies in clinical trials – testing in humans. This stage is notoriously costly, slow, and failure-prone (attrition rates from Phase I to approval hover around 90%). AI is increasingly being employed to improve clinical development, in two major ways: simulating trials with synthetic data (including digital twins of patients) to optimize trial design, and improving patient recruitment and stratification using predictive modeling (nature.com).
One burgeoning area is the creation of synthetic patient data or “digital twins” for clinical trials. The concept of a digital twin is to have a computational model of a patient that can predict how that patient would respond to treatment. If you have reliable digital twins, you could potentially run in silico trials, or more immediately, you can augment real trials with synthetic control arms.
For instance, instead of enrolling 100 patients on placebo, a trial could enroll 50 on drug and use 50 AI-simulated placebo responses (based on historical data and patient profiles) as the control.
This approach is already being piloted: companies like Unlearn.AI have developed AI systems that, given a patient’s baseline data, generate a “potential outcome” for that patient if they were in the placebo group (unlearn.ai). Regulators have shown cautious optimism – in 2023, the European Medicines Agency (EMA) qualified Unlearn’s digital twin approach for regulatory use in certain trial designs. The FDA has issued guidance on using real-world data and AI for external control arms, acknowledging the value if bias can be controlled.
The benefit of synthetic controls is huge: it can cut the required sample size and get answers faster, while avoiding giving patients placebos when not absolutely necessary (an ethical win). A news article in late 2024 reported that digital twin technology (specifically citing Unlearn’s platform) could reduce trial timelines by over 25% (drugdiscoverytrends.com).
Essentially, by boosting statistical power (via more precise control comparisons), one could achieve significant results with fewer patients or shorter follow-up. One CEO was quoted saying that such AI “aims to accelerate clinical trials with digital twins”, underscoring that speed and efficiency are the selling points (drugdiscoverytrends.com).
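The power argument can be quantified with the standard large-sample formula for comparing two means. One common use of a model-predicted control outcome is as a prognostic covariate in ANCOVA, rather than as a literal replacement for control patients. If that prediction correlates with the real outcome at ρ, residual variance shrinks by a factor (1 − ρ²). The ρ values below are illustrative assumptions, not reported figures for any platform.

```python
import math
from statistics import NormalDist

def n_per_arm(delta, sigma, alpha=0.05, power=0.80, rho=0.0):
    """Per-arm sample size for a two-arm comparison of means.

    rho is the correlation between the outcome and a prognostic covariate (for example a
    model-predicted 'twin' outcome) used in ANCOVA; adjustment shrinks the residual
    variance by a factor (1 - rho**2). This is the usual large-sample approximation.
    """
    z = NormalDist().inv_cdf
    z_sum = z(1 - alpha / 2) + z(power)
    return math.ceil(2 * (sigma**2) * (1 - rho**2) * z_sum**2 / delta**2)

for rho in (0.0, 0.5, 0.7):
    print(f"rho={rho}: {n_per_arm(delta=0.5, sigma=1.0, rho=rho)} patients per arm")
```

For a half-SD effect at 80% power, the requirement falls from 63 to 48 to 33 patients per arm as ρ rises from 0 to 0.5 to 0.7.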
How does AI create these synthetic patients or twins? Typically through a combination of generative modeling and causal inference on historical clinical data. Models are trained on large datasets of prior trials or observational health records to learn how patients with certain characteristics tend to progress on standard of care. Then, given a new trial participant’s profile (demographics, biomarkers, disease stage, etc.), the model generates an outcome trajectory (e.g., tumor size over time without the new drug).
This prediction becomes the patient’s “twin control”. Techniques used include Bayesian models, GANs, and longitudinal deep models that can handle time-series data. Ensuring the synthetic data mimics real variability and correlations is critical – one must convince regulators that the synthetic outcomes are exchangeable with real ones in a statistical sense.
AI is also being applied to clinical trial design optimization beyond controls. For dose finding (Phase I/II), AI models can simulate different dose escalation schemes or adaptive randomization protocols to see which might yield the most information with the fewest patients. For example, AI-driven trial simulators can run thousands of virtual trials under various assumptions (enrollment rates, dropout, effect size) to help choose an optimal design and endpoints.
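A trial simulator of this kind can be sketched as a Monte Carlo loop: generate many virtual trials under assumed enrollment, dropout, and effect size, analyze each one, and count how often it succeeds. All parameter values here are hypothetical. A real simulator would also model accrual over time, interim analyses, and non-normal endpoints.

```python
import numpy as np
from statistics import NormalDist

def simulate_power(n_per_arm, effect, sd=1.0, dropout=0.15, n_trials=5000, alpha=0.05, seed=0):
    """Estimate power by running many virtual two-arm trials (all settings are illustrative).

    Each trial enrolls n_per_arm patients per arm, loses each patient independently with
    probability `dropout`, and tests the difference in means with a two-sided z-test.
    """
    rng = np.random.default_rng(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    successes = 0
    for _ in range(n_trials):
        kept_c = max(rng.binomial(n_per_arm, 1 - dropout), 2)  # completers in control arm
        kept_t = max(rng.binomial(n_per_arm, 1 - dropout), 2)  # completers in treated arm
        control = rng.normal(0.0, sd, kept_c)
        treated = rng.normal(effect, sd, kept_t)
        se = np.sqrt(control.var(ddof=1) / kept_c + treated.var(ddof=1) / kept_t)
        if abs(treated.mean() - control.mean()) / se > z_crit:
            successes += 1
    return successes / n_trials

for design_n in (50, 75, 100):
    print(design_n, "per arm ->", simulate_power(design_n, effect=0.5))
```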
Some trials are even using AI in real-time: adaptive trials where interim analyses via machine learning help drop futile arms or adjust dosing on the fly.
Patient recruitment is another area ripe for AI. Finding eligible patients (especially for rare diseases or trials with complex criteria) is like searching for a needle in a haystack. Machine learning models applied to electronic health records (EHR) can identify patients who match a trial’s criteria far more efficiently than manual screening. Natural Language Processing (NLP) can parse clinicians’ notes to find mentions of, say, “Stage III lung adenocarcinoma with EGFR mutation” that line up with a trial’s target population (pmc.ncbi.nlm.nih.gov).
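To illustrate the kind of matching involved, here is a deliberately simple rule-based prescreen for the example criterion above. The regular expressions are hypothetical and handle negation only crudely. Modern systems typically rely on trained clinical NLP models, and any flag would still go to a human for eligibility review.

```python
import re

# Hypothetical, hand-written patterns for one eligibility criterion.
STAGE_III = re.compile(r"\bstage\s*(?:III|3)[ABC]?\b", re.IGNORECASE)
ADENOCARCINOMA = re.compile(r"\blung\s+adenocarcinoma\b", re.IGNORECASE)
EGFR_MUTANT = re.compile(r"\bEGFR\b[^.]{0,40}?\b(?:mutation|mutant|exon\s*19|L858R)\b", re.IGNORECASE)
NEGATION = re.compile(r"\b(?:no|negative for|without|wild[- ]type)\b[^.]{0,30}\bEGFR\b", re.IGNORECASE)

def prescreen(note: str) -> bool:
    """Flag a note for human review if every criterion matches and EGFR is not negated."""
    return (bool(STAGE_III.search(note)) and bool(ADENOCARCINOMA.search(note))
            and bool(EGFR_MUTANT.search(note)) and not NEGATION.search(note))

notes = [
    "72F with stage IIIA lung adenocarcinoma; EGFR exon 19 deletion detected.",
    "Stage III lung adenocarcinoma, negative for EGFR mutation, ALK positive.",
    "Stage IV lung adenocarcinoma with EGFR L858R mutation.",
]
print([prescreen(n) for n in notes])  # [True, False, False]
```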
AI can also predict which recruitment sites are likely to yield more patients, guiding site selection to improve enrollment rates. These data-driven approaches help tackle the chronic problem of trial delays due to slow enrollment.
Once patients are in a trial, AI can assist with retention and adherence as well – for instance, by analyzing patterns in who drops out and intervening early (perhaps via digital apps reminding or encouraging patients). In trial execution, AI algorithms monitor incoming data for signals of efficacy or safety in adaptive trials. Some Phase I trials are using AI to adapt dosing faster than traditional 3+3 designs, guided by model-based toxicity prediction (pmc.ncbi.nlm.nih.gov) – these are akin to model-based dose-escalation methods like Bayesian optimal interval design, enhanced with machine learning.
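For reference, the Bayesian optimal interval (BOIN) design mentioned above reduces each dosing decision to comparing the observed dose-limiting toxicity (DLT) rate with two precomputed boundaries. The sketch follows the published boundary formulas with their usual defaults (φ1 = 0.6φ, φ2 = 1.4φ). It omits BOIN's dose-elimination safety rule and any machine-learning enhancement, which would sit on top of this baseline.

```python
import math

def boin_boundaries(target, phi1_ratio=0.6, phi2_ratio=1.4):
    """Escalation/de-escalation boundaries of the BOIN design (Liu and Yuan, 2015).

    target    target DLT probability (phi)
    phi1/phi2 'too low'/'too high' toxicity rates; 0.6*phi and 1.4*phi are the usual defaults.
    """
    phi, phi1, phi2 = target, phi1_ratio * target, phi2_ratio * target
    lam_e = math.log((1 - phi1) / (1 - phi)) / math.log(phi * (1 - phi1) / (phi1 * (1 - phi)))
    lam_d = math.log((1 - phi) / (1 - phi2)) / math.log(phi2 * (1 - phi) / (phi * (1 - phi2)))
    return lam_e, lam_d

def next_dose_action(n_dlt, n_treated, target=0.30):
    """Compare the observed DLT rate at the current dose with the BOIN boundaries."""
    lam_e, lam_d = boin_boundaries(target)
    rate = n_dlt / n_treated
    if rate <= lam_e:
        return "escalate"
    if rate >= lam_d:
        return "de-escalate"
    return "stay"

print(boin_boundaries(0.30))   # approximately (0.236, 0.359)
print(next_dose_action(1, 6))  # 1/6 = 0.167 -> escalate
print(next_dose_action(2, 6))  # 2/6 = 0.333 -> stay
```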
One could say we are witnessing the rise of in silico experimentation paralleling in vivo trials. By 2025, it’s still early days for full-blown in silico trials (regulators will not accept a new drug approval without extensive real patient data anytime soon). But the incorporation of AI in the clinical development toolset is evident. The FDA even created an AI-focused team to evaluate such approaches, and published a framework on using AI/ML in drug development decisions (veranahealth.com). They emphasize “credibility” – that AI predictions must be backed by evidence and uncertainty estimates.
A concrete example: in oncology, an AI may simulate different trial stratifications – e.g., should we restrict to biomarker-positive patients or not? If we include all, the effect might dilute; if we restrict, enrollment slows. AI can model these trade-offs by “creating” virtual patients with and without the biomarker and simulating response rate (pmc.ncbi.nlm.nih.gov).
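A small calculation on a hypothetical virtual population puts numbers on this trade-off. The all-comers effect is a prevalence-weighted average of the effects in biomarker-positive and biomarker-negative patients. Required sample size scales with 1/δ², and an enriched trial must screen 1/prevalence patients for every one it enrolls. All inputs are assumptions chosen for illustration.

```python
import math
from statistics import NormalDist

def n_per_arm(delta, sigma=1.0, alpha=0.05, power=0.80):
    """Large-sample per-arm size for a two-arm comparison of means."""
    z = NormalDist().inv_cdf
    return math.ceil(2 * sigma**2 * (z(1 - alpha / 2) + z(power)) ** 2 / delta**2)

def compare_designs(prevalence, effect_pos, effect_neg, screened_per_month):
    """Contrast an all-comers trial with a biomarker-enriched one (all inputs hypothetical)."""
    diluted = prevalence * effect_pos + (1 - prevalence) * effect_neg
    n_all = 2 * n_per_arm(diluted)
    n_enriched = 2 * n_per_arm(effect_pos)
    return {
        "all-comers": (n_all, n_all / screened_per_month),
        "enriched": (n_enriched, n_enriched / prevalence / screened_per_month),
    }

# Virtual population: 30% biomarker-positive, effect 0.6 SD in positives, 0.1 SD in negatives
for design, (n, months) in compare_designs(0.3, 0.6, 0.1, screened_per_month=40).items():
    print(f"{design}: {n} patients, ~{months:.1f} months to enroll")
```

Under these assumptions the enriched design needs 88 patients versus 504. It also enrolls faster, despite screening out 70% of candidates, because the diluted effect inflates the all-comers sample size so sharply.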
This helps in deciding trial criteria. Another example, as mentioned earlier, is using synthetic arms in rare disease trials where recruiting a control group is ethically or logistically hard (when everyone wants the new hopeful therapy, or patient numbers are scant).
In 2023, a well-known case was a neurological disease trial that supplemented with a matched external control from patient registry data analyzed by AI; this helped demonstrate the drug’s effect without a traditional placebo group.
The use of causal inference methods (like causal forests, marginal structural models) plays an important role here, to ensure the AI is comparing apples to apples – adjusting for confounders when using historical or external control data. We discuss causal techniques more in the next subsection, but suffice it to say, combining deep learning with causal frameworks is an active research frontier for trials.
Finally, the concept of a “digital twin” can extend beyond trials to individual patient care. Some visionary efforts imagine that for each patient, one could have a personal AI model that, fed with their health data, could predict outcomes under various treatment options (a virtual randomized trial for that patient). While that’s beyond current capability for most conditions, steps in that direction are being taken in oncology and chronic diseases where ample longitudinal data exist (unlearn.ai). If realized, it would truly be a paradigm shift: physicians consulting an AI-simulated outcome for “if this patient goes on drug A vs drug B” to inform decisions.
In summary, AI is helping to de-risk and speed up clinical development – historically the most expensive and failure-prone phase of pharma R&D. It does so by learning from the troves of past trial and real-world data to model patient outcomes. By leveraging synthetic data and trial simulations, AI can optimize design and potentially reduce the need for some control patients.
By identifying the right patients (and the right subset for precision trials), it increases the chances that a trial will show a benefit if one exists. This is incredibly important because a failed Phase III due to a dilution of effect or an avoidable safety issue can sink a program and waste hundreds of millions of dollars. As one 2024 headline put it, “The future of drug trials might be virtual AI patients”, hinting that the classical fully physical trial may give way to a cyber-physical hybrid (forbes.com).
We’re not there yet, but the trajectory is clear: clinical trials are the new frontier for AI to prove its worth, and early signs (like faster trials with AI-augmented control arms) are encouraging.
2.7 Causal Inference in Biomedicine
In the hierarchy of evidence, correlation is not causation – a mantra every pharma scientist knows well. Observational biomedical data is full of correlations: genes co-expressed, proteins interacting, clinical factors predicting outcomes. But to develop effective interventions, we need to uncover causal relationships: Does inhibiting this protein cause disease improvement? Would changing this biomarker cause a downstream effect?
Traditional clinical research addresses causality through randomized trials and carefully controlled experiments. However, with the glut of biomedical data available now, there is a drive to apply causal inference algorithms to extract causal insights from data directly, complementing experiments. AI and machine learning are increasingly intertwined with causal inference techniques in biomedicine – an area sometimes dubbed “causal AI”.
A key realization is that many AI applications in drug discovery implicitly seek causality. When we choose a target based on genetic data, we’re hoping that modifying the target will cause a clinical benefit (not just correlate with it). When we use AI on real-world patient data to find who benefits from a drug, we want to know the drug caused the benefit, not just that certain patients happened to do better. Traditional ML often ignores this distinction, happily picking up associational patterns that maximize prediction accuracy.
Causal inference forces us to incorporate knowledge (or assumptions) about the underlying data-generating process – often through directed acyclic graphs (DAGs) or structural causal models – and then use algorithms to infer causal effects or identify causal structure from data.
By 2025, there’s a notable uptick in applying causal inference in pharma R&D. One 2023 Drug Discovery Today review plainly stated: “To discover new drugs is to seek and to prove causality… causal inference holds the promise of reducing cognitive bias and improving decision-making in drug discovery.” (pubmed.ncbi.nlm.nih.gov)
The article offered a nontechnical intro to the concepts, signaling an attempt to bring causal thinking to a wider pharma audience. The promise is that causal approaches can help overcome some limitations of purely correlational AI models (like those that might mislead when training and deployment data differ, as discussed in Section 1).
Some practical areas where causal inference is making an impact:
Target validation
Causal AI is used on human genetics and clinical data to validate targets. For example, techniques like Mendelian randomization use genetic variants as instruments to test if a biomarker causally influences disease. AI helps by scanning across thousands of potential biomarkers and outcomes, automating what used to be one-hypothesis-at-a-time analyses. A Nature article by an AI biotech (biotx.ai) described a platform that ingests large genomic and phenotypic datasets (over 22 million cases across 3,300 diseases) and uses causal inference at scale to find causal links between drug targets and diseases (nature.com).
They reported that about two-thirds of recently FDA-approved drugs have genetic evidence supporting their target (i.e., the target gene is implicated by human genetics), and that causal AI on GWAS data is a logical next step to go beyond correlation. This underscores a paradigm: when a gene variant affecting a protein’s function also affects disease risk, that protein is likely a causal driver – a good drug target. AI accelerates finding such genes among the sea of GWAS hits and network interactions.
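The core Mendelian randomization computation is short. Each variant gives a Wald ratio (its effect on disease divided by its effect on the biomarker), and the inverse-variance-weighted (IVW) estimate pools those ratios. The summary statistics below are invented. The estimate is only meaningful if the instrumental-variable assumptions hold, in particular that no pleiotropic path leads from variant to disease while bypassing the biomarker. This sketch is not the method of the platform described above, which would run such analyses at scale with additional robustness checks.

```python
import math

def wald_ratio(beta_exposure, beta_outcome):
    """Causal-effect estimate from a single genetic instrument."""
    return beta_outcome / beta_exposure

def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect inverse-variance-weighted MR estimate and its standard error."""
    weights = [bx**2 / se**2 for bx, se in zip(beta_exposure, se_outcome)]
    numerator = sum(bx * by / se**2 for bx, by, se in zip(beta_exposure, beta_outcome, se_outcome))
    estimate = numerator / sum(weights)
    return estimate, 1 / math.sqrt(sum(weights))

# Illustrative summary statistics for three variants (not real GWAS data)
bx = [0.10, 0.08, 0.15]       # variant effect on the biomarker (exposure)
by = [0.050, 0.041, 0.074]    # variant effect on disease risk (log-odds)
se_y = [0.010, 0.012, 0.015]  # standard errors of the disease associations
print([round(wald_ratio(x, y), 3) for x, y in zip(bx, by)])
print(ivw_estimate(bx, by, se_y))
```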
Analyzing real-world data (RWD)
Observational health data (from EHRs, claims, registries) can be a goldmine for understanding drug effects in the wild, but confounders muddy the waters. For instance, patients on Drug A might fare better than those on Drug B, but maybe Drug A patients were younger (a confounder). Causal inference techniques like propensity score matching, causal forests, or deep causal models are being used to adjust for confounders in these datasets, effectively emulating a randomized trial. AI comes into play to handle the high-dimensional confounders (lots of variables) – using representation learning to summarize patient data in a way that makes treated and control groups comparable.
This is crucial for generating real-world evidence on drug effectiveness or safety. The FDA’s 2024 guidance explicitly discusses using RWD with methods to address bias for regulatory decision-making. An example: using causal inference, one study found no association between a new vaccine and an adverse event in RWD, providing reassurance beyond clinical trial data (trinetx.com).
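The Drug A versus Drug B example above can be simulated directly. In this hypothetical setup, younger patients are both more likely to receive Drug A and more likely to do well, so the naive comparison lands near 2.2: the confounder adds 2 × (0.8 − 0.2) = 1.2 to the true effect of 1.0. Inverse probability weighting, one of several confounder-adjustment methods, weights each patient by the inverse of their probability of receiving the treatment they actually got. It recovers the true effect, provided the confounder is measured.

```python
import numpy as np

rng = np.random.default_rng(1)
n, true_effect = 20_000, 1.0

# Simulated observational data: younger patients (young=1) are both more likely to get
# Drug A and more likely to do well, so 'young' confounds the naive comparison.
young = rng.binomial(1, 0.5, n)
treated = rng.binomial(1, np.where(young == 1, 0.8, 0.2))
outcome = 2.0 * young + true_effect * treated + rng.normal(0.0, 1.0, n)

naive = outcome[treated == 1].mean() - outcome[treated == 0].mean()

# Propensity score P(treated | young); with one binary confounder it can be estimated
# exactly by group frequencies (a real analysis would fit a model over many covariates).
propensity = np.where(young == 1, treated[young == 1].mean(), treated[young == 0].mean())
weights_t = treated / propensity
weights_c = (1 - treated) / (1 - propensity)
ipw = np.sum(weights_t * outcome) / np.sum(weights_t) - np.sum(weights_c * outcome) / np.sum(weights_c)

print(f"naive difference = {naive:.2f}, IPW estimate = {ipw:.2f}, truth = {true_effect}")
```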
Causal structure learning
On the discovery end, AI is being used to learn causal networks from complex data (like gene regulatory networks from single-cell omics). Algorithms (e.g., NOTEARS, causal Bayesian networks) attempt to infer which genes regulate which others. While still challenging with many variables, the integration of perturbation data (like CRISPR screens) with observational data via AI is improving these networks. Knowing the causal gene network can highlight key bottlenecks to target or predict combination therapies by identifying synergistic causal pathways.
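NOTEARS (Zheng et al., 2018) turns DAG learning into continuous optimization. It penalizes a smooth acyclicity function, h(W) = tr(exp(W ∘ W)) − d, which is zero exactly when the weighted adjacency matrix W has no directed cycles. The sketch below computes only that constraint, using a truncated Taylor series for the matrix exponential to stay NumPy-only. The full method also fits W to data with an augmented Lagrangian solver. The toy gene network is hypothetical.

```python
import numpy as np

def expm_taylor(a, terms=30):
    """Matrix exponential via a truncated Taylor series (adequate for small, well-scaled matrices)."""
    result = np.eye(a.shape[0])
    term = np.eye(a.shape[0])
    for k in range(1, terms):
        term = term @ a / k
        result = result + term
    return result

def notears_acyclicity(w):
    """NOTEARS constraint h(W) = tr(exp(W * W)) - d; it is zero exactly when W encodes a DAG."""
    return float(np.trace(expm_taylor(w * w)) - w.shape[0])

# W[i, j] != 0 means an edge from gene i to gene j (weights are illustrative)
dag = np.array([[0.0, 0.8, 0.0],
                [0.0, 0.0, -1.2],
                [0.0, 0.0, 0.0]])   # gene0 -> gene1 -> gene2
cyclic = dag.copy()
cyclic[2, 0] = 0.5                  # adds gene2 -> gene0, closing a loop
print(notears_acyclicity(dag), notears_acyclicity(cyclic))
```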
Counterfactual prediction
In clinical decision support, causal AI allows prediction of counterfactuals – e.g., would this patient have improved if we had given treatment X instead of Y? Some AI models in 2025 for personalized medicine explicitly try to estimate individual treatment effects. For instance, causal forests or uplift models partition patients into those who benefit from drug A vs drug B. This goes beyond standard ML that might only predict overall risk. It’s directly asking the causal question: how does outcome differ if we “do” one treatment vs another?
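Causal forests are one way to estimate individual treatment effects. A simpler stand-in that shows the same idea is a T-learner: fit one outcome model per treatment arm and take the difference of their predictions for each patient. The simulation below is hypothetical and randomized, so confounding is not an issue, and the linear models happen to be correctly specified. Real data would need flexible models and confounding adjustment.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 4000

# Simulated patients: a binary biomarker changes how much drug A helps (effect 2.0 vs 0.0).
biomarker = rng.binomial(1, 0.4, n)
age = rng.normal(60, 10, n)
drug_a = rng.binomial(1, 0.5, n)  # randomized assignment (drug_a == 0 means drug B)
outcome = 0.05 * age + 2.0 * biomarker * drug_a + rng.normal(0, 1, n)

X = np.column_stack([np.ones(n), biomarker, age])

def fit(mask):
    """Least-squares outcome model fitted on one treatment arm."""
    coef, *_ = np.linalg.lstsq(X[mask], outcome[mask], rcond=None)
    return coef

# T-learner: one outcome model per arm; estimated effect(x) = mu_A(x) - mu_B(x)
coef_a, coef_b = fit(drug_a == 1), fit(drug_a == 0)
cate = X @ coef_a - X @ coef_b

print(f"mean estimated effect, biomarker+: {cate[biomarker == 1].mean():.2f}")
print(f"mean estimated effect, biomarker-: {cate[biomarker == 0].mean():.2f}")
```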
Causal discovery of disease drivers
By integrating multi-omics and longitudinal data, causal models attempt to differentiate mere disease markers from true drivers. For example, in an Alzheimer’s dataset, many proteins change as the disease progresses; causal modeling (especially with time-series data) can identify which changes precipitate others. Those upstream signals could be therapeutic targets. Traditional correlation clustering might label dozens of proteins as disease-associated; causal inference narrows it down to the mastermind regulators.
One can imagine an AI parsing a time-lagged omics dataset to suggest “protein X’s rise causes protein Y’s accumulation six months later, which correlates with cognitive decline – intervene at X.”
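Here is a hedged illustration of this time-lagged reasoning, using simulated monthly data in which protein X drives protein Y six steps later. Adding lagged X to a model of Y sharply reduces unexplained variance, while the reverse direction does not. This Granger-style test of predictive precedence is swapped in as the simplest example. It is not proof of causation, since an unmeasured common driver could produce the same pattern. All values are synthetic.

```python
import numpy as np

rng = np.random.default_rng(5)
T = 500

# Simulated monthly levels: protein X drives protein Y six steps later (illustrative only).
lag = 6
x = rng.normal(size=T)
y = np.zeros(T)
for t in range(lag, T):
    y[t] = 0.3 * y[t - 1] + 0.8 * x[t - lag] + rng.normal(scale=0.5)

def residual_variance(target, predictors):
    """Variance left after a least-squares fit of target on an intercept plus predictors."""
    design = np.column_stack([np.ones(len(target))] + predictors)
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    return np.var(target - design @ coef)

def lagged_gain(cause, effect, lag):
    """Fractional drop in residual variance of `effect` when lagged `cause` is added.

    A Granger-style check of predictive precedence, not proof of causation.
    """
    tgt = effect[lag:]
    own_past = [effect[lag - 1:-1]]  # effect at t-1, aligned with tgt at t
    base = residual_variance(tgt, own_past)
    full = residual_variance(tgt, own_past + [cause[:-lag]])  # cause at t-lag
    return 1 - full / base

print(f"X -> Y gain: {lagged_gain(x, y, lag):.2f}, Y -> X gain: {lagged_gain(y, x, lag):.2f}")
```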
The field is not without challenges. Causal inference relies on assumptions (no hidden confounders, correct model specification) that, if violated, can lead to wrong conclusions with high confidence. Pharma folks have learned to be cautious: some early excitement around in silico causal discovery has been tempered by cases where AI inferred spurious causation due to biases in data.
There’s also the matter of interpretability – causal models often output human-interpretable relations (like a DAG), which is a plus, but getting domain experts to agree with those relations is an iterative process.
By 2025, there’s a convergence happening: the tools of modern machine learning (like deep learning, which can handle high complexity) are being combined with the principles of causal inference (which provide directionality and robustness). For example, “causal embeddings” are an active research topic: representation learning that respects causal structure, or training neural nets with causal objectives to enforce invariances.
This synergy is sometimes dubbed the quest for causal AI: AI that not only predicts well but also understands cause and effect, enabling better generalization and decision-making (research.ibm.com, nature.com).
One concrete sign of progress: pharma companies are hiring causal statisticians and forming groups like “Machine Learning for Causal Drug Development” (research.ibm.com). They’ve recognized that unlocking the full value of AI requires embedding it in a causal framework to answer the “what if we do X?” questions that drug development ultimately hinges on. As an IBM research blog provocatively put it, “we’re developing ML models for innovative drug discovery technologies and causal inference”, essentially merging these once-disparate domains (arxiv.org).
To wrap up, causal inference in biomedicine is bringing us closer to the ideal of rational drug discovery and development. Rather than just fishing correlations from big data and hoping some pan out, AI-driven causal methods aim to identify the true levers to pull – the targets to hit, the patients to treat, the biomarkers to measure – that will causally change patient outcomes.
It’s a powerful complement to both wet-lab experiments and standard AI, and in the coming years we can expect it to play a role in everything from target discovery (e.g. using human causal genetics) to trial analysis (e.g. adjusting for confounders in observational comparisons) to post-market pharmacovigilance (distinguishing causally drug-related adverse events from background noise).
Or, as one 2023 panel on the topic mused: when properly combined, causal reasoning and machine learning could make AI a reliable partner in the scientific process, not just an oracle of correlations (nature.com).
Having surveyed the technical challenges and how AI is tackling them, we next turn to concrete use cases beyond the famous example of protein folding, diving into the breadth of applications where AI is proving its value in pharma.
3. AI Use Cases Beyond Protein Folding
When DeepMind’s AlphaFold cracked the protein folding problem (its CASP14 results in late 2020 were followed by the landmark 2021 paper), it grabbed global headlines as a triumph of AI for biology. Yet drug discovery involves many tasks beyond predicting protein structure. In fact, a large share of real-world AI investment in pharma is going into other use cases that, while less publicized, are critical for bringing new drugs to market.
Here we provide a comprehensive overview of AI applications beyond protein folding, including de novo molecular design, retrosynthesis planning, drug repurposing, biomarker discovery, patient stratification, knowledge mining, and real-world evidence analysis.
3.1 De Novo Drug Design (Generative Chemistry)
One of the most exciting use cases is de novo drug design – using AI to generate novel molecular structures with desired properties, essentially creating new chemical matter from scratch. Traditional medicinal chemistry relied on human intuition and stepwise modification of known scaffolds. AI flips this by learning the rules of “good molecules” from data and then inventing candidates that a chemist might never conceive unaided.
Generative models for molecules come in many flavors: variational autoencoders (VAEs) that learn a continuous latent space of chemistry, generative adversarial networks (GANs) that try to produce molecules indistinguishable from real ones, and transformers or RNNs that treat SMILES string generation much like natural-language generation. We discussed reinforcement learning in Section 2.3 as an overlay that guides these models toward specific objectives.
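Treating SMILES as language starts with tokenization. A string is split into atom, bond, branch, and ring-closure tokens, which a sequence model then learns to predict one at a time. The sketch uses a regular expression of the kind found in several published SMILES transformer models. It checks that no characters were dropped but does not validate chemistry; that would need a cheminformatics toolkit such as RDKit.

```python
import re

# Regex tokenizer of the kind used in several SMILES language models: bracket atoms,
# two-letter halogens, aromatic atoms, bonds, branches and ring-closure digits.
SMILES_TOKEN = re.compile(
    r"(\[[^\]]+]|Br?|Cl?|N|O|S|P|F|I|b|c|n|o|s|p|\(|\)|\.|=|#|-|\+|\\|/|:|~|@|\?|>|\*|\$|%[0-9]{2}|[0-9])"
)

def tokenize(smiles: str) -> list[str]:
    tokens = SMILES_TOKEN.findall(smiles)
    # Refuse inputs where characters would be silently dropped.
    if "".join(tokens) != smiles:
        raise ValueError(f"untokenizable characters in {smiles!r}")
    return tokens

print(tokenize("CC(=O)Oc1ccccc1C(=O)O"))  # aspirin
print(tokenize("Clc1ccc2[nH]ccc2c1Br"))
```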
By 2025, these approaches have matured to the point that several AI-designed molecules are in clinical trials – a tangible validation of the technology. For example, Exscientia, a UK-based AI drug design company, advanced multiple molecules to trials in under 2 years each.
One of their compounds (for obsessive-compulsive disorder) was touted as the first AI-designed drug to enter Phase I (back in 2020). Since then, Exscientia has also initiated trials for an immuno-oncology agent in collaboration with Bristol Myers Squibb, and reported promising Phase I results.
Insilico Medicine similarly took an AI-designed fibrotic disease drug (targeting TNIK) from project start to IND filing in ~30 months (nature.com). These cases suggest that AI can compress the design cycle (which often takes 4–5 years) by a significant factor – roughly cutting it in half by some claims. How? Mainly by exploring chemical space more efficiently and prioritizing only the most promising candidates for synthesis/testing.
The typical workflow for de novo design: an AI model generates a batch of virtual molecules optimized for predicted target binding and drug-like properties; these are synthesized and tested; data comes back and is used to refine the model or the selection for the next round. This iterative loop is sometimes called AI-driven DEL (Design–Evaluate–Learn), an AI-accelerated take on the classic design-make-test-analyze (DMTA) cycle. It stands in contrast to brute-force screening of millions of random compounds.
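The loop can be sketched as simple active learning on a toy problem. A surrogate model (here ridge regression on made-up descriptors) ranks virtual candidates, and the top batch is "synthesized and tested" by a hidden noisy function standing in for the assay. The new results then retrain the model. Everything here is synthetic; real pipelines use learned molecular representations, generative proposal steps, and explicit exploration terms.

```python
import numpy as np

rng = np.random.default_rng(11)

# Toy 'chemical space': 5,000 virtual candidates described by 8 numeric descriptors.
candidates = rng.normal(size=(5000, 8))
hidden_w = rng.normal(size=8)  # unknown structure-activity relationship

def assay(idx):
    """Stand-in for synthesis and testing: hidden structure-activity relationship plus noise."""
    return candidates[idx] @ hidden_w + rng.normal(scale=0.5, size=len(idx))

def fit_ridge(X, y, alpha=1.0):
    """Ridge regression with an unpenalized intercept; returns [intercept, weights...]."""
    X1 = np.column_stack([np.ones(len(X)), X])
    penalty = alpha * np.eye(X1.shape[1])
    penalty[0, 0] = 0.0
    return np.linalg.solve(X1.T @ X1 + penalty, X1.T @ y)

tested = list(rng.choice(len(candidates), size=20, replace=False))  # initial random batch
results = list(assay(np.array(tested)))
initial_best = max(results)

for cycle in range(4):                               # design -> make/test -> learn
    coef = fit_ridge(candidates[tested], np.array(results))
    preds = np.column_stack([np.ones(len(candidates)), candidates]) @ coef
    preds[tested] = -np.inf                          # never re-propose tested compounds
    batch = np.argsort(preds)[-20:]                  # top-20 predicted designs
    tested += batch.tolist()
    results += assay(batch).tolist()
    print(f"cycle {cycle}: best measured activity so far = {max(results):.2f}")
```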
One metric that has emerged is “AI hit rate” – the percentage of AI-generated molecules that meet activity criteria. Several pharma reports claim hit rates an order of magnitude higher than random screening, thus saving time and cost (pmc.ncbi.nlm.nih.gov). For instance, instead of 0.1% of tested compounds being hits, an AI-guided selection might yield 1–5% hits, because the AI pruned the space to those likely to bind.
Beyond single-target potency, AI design is expanding to tackle multi-objective optimization: designing molecules that not only hit the target but also avoid certain off-targets, have good ADMET, etc. This is where the reward function engineering (from Section 2.3) and multi-task models come in. Generative AI can be asked to propose, say, “a CNS-penetrant kinase inhibitor that doesn’t hit hERG and is stable in liver microsomes”.
Handling such complex requests pushes the limits of current models, but progress is steady. There have been success stories of multi-target designs too – e.g., AI generating a single molecule that modulates two separate disease pathways (polypharmacology by design), something that historically was found serendipitously if at all.
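One common way to encode such a multi-objective request as a single reward is a desirability score. Each predicted property is mapped onto [0, 1], and the values are combined with a weighted geometric mean, so a molecule that fails any one criterion badly scores zero. The thresholds, weights, and property names below are hypothetical. The TPSA term reflects the common heuristic that lower polar surface area favors CNS penetration.

```python
import math

def ramp(value, bad, good):
    """Linear desirability in [0, 1]: 0 at `bad`, 1 at `good` (works for either direction)."""
    frac = (value - bad) / (good - bad)
    return min(1.0, max(0.0, frac))

def mpo_score(props, weights):
    """Weighted geometric mean of desirabilities; any single 0 vetoes the molecule."""
    d = {
        "potency": ramp(props["pIC50"], bad=5.0, good=8.0),
        "herg_safety": ramp(props["hERG_pIC50"], bad=6.0, good=4.5),  # lower hERG potency is better
        "stability": ramp(props["microsomal_t_half_min"], bad=5.0, good=60.0),
        "cns": ramp(props["tpsa"], bad=120.0, good=70.0),            # lower TPSA favors CNS entry
    }
    if min(d.values()) == 0.0:
        return 0.0
    total = sum(weights.values())
    return math.exp(sum(w * math.log(d[k]) for k, w in weights.items()) / total)

weights = {"potency": 2.0, "herg_safety": 1.5, "stability": 1.0, "cns": 1.0}  # hypothetical priorities
mol = {"pIC50": 7.4, "hERG_pIC50": 5.0, "microsomal_t_half_min": 35.0, "tpsa": 85.0}
print(round(mpo_score(mol, weights), 3))
```

The geometric mean is the design choice that matters here. An arithmetic mean would let excellent potency paper over an unacceptable hERG signal, whereas this form lets any single failing property veto the molecule.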
A noteworthy sub-area is 3D structure-based generative design. With protein structures (experimental or AlphaFold-predicted) in hand, algorithms can design molecules to fit the binding site. Diffusion models (as mentioned earlier) and reinforcement learning are used to generate molecules and their conformations directly in the pocket. A closely related tool is MIT’s DiffDock, which uses a diffusion generative model to predict how a given ligand binds to a protein (its binding pose) rather than to invent new molecules (news.mit.edu).
Early results show it can predict binding poses more accurately than traditional docking on some benchmarks. Companies like Schrödinger (a physics-driven software firm that increasingly integrates ML) and startups like Atomwise (a pioneer of CNNs for structure-based design) are active here, often combining physics (force fields) with AI to improve reliability.
Generative design isn’t a panacea, however. One limitation is model bias: if trained on known drug-like molecules, the AI might stick to safe, known chemotypes (essentially remixing what’s in its training set). There’s a tension between novelty and reliability – push the AI too much towards unexplored chemistry and you risk weird, synthesis-intractable structures; constrain it too much and you only get me-toos.
Generative models have to be carefully tuned and often incorporate chemistry rules, either implicitly (through fine-tuning on realistic datasets) or explicitly (through filters for stability and synthesizability, such as retrosynthesis analysis).
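The simplest explicit filter is a rule-based one such as Lipinski's Rule of Five, which flags likely poor oral absorption. This sketch assumes the descriptors were already computed by a cheminformatics toolkit (not shown) and only implements the rule itself: MW > 500, logP > 5, more than 5 H-bond donors, or more than 10 H-bond acceptors each count as a violation, and more than one violation fails.

```python
from dataclasses import dataclass

@dataclass
class Descriptors:
    # Assumed precomputed elsewhere (e.g. by a toolkit such as RDKit).
    mol_weight: float
    clogp: float
    h_donors: int
    h_acceptors: int

def lipinski_violations(d: Descriptors) -> int:
    """Count Rule-of-Five violations."""
    return sum([
        d.mol_weight > 500,
        d.clogp > 5,
        d.h_donors > 5,
        d.h_acceptors > 10,
    ])

def passes_ro5(d: Descriptors, max_violations: int = 1) -> bool:
    # Lipinski's guidance tolerates a single violation.
    return lipinski_violations(d) <= max_violations

small = Descriptors(mol_weight=180.16, clogp=1.2, h_donors=1, h_acceptors=4)  # roughly aspirin
bulky = Descriptors(mol_weight=720.0, clogp=6.3, h_donors=3, h_acceptors=12)  # hypothetical
print(passes_ro5(small), passes_ro5(bulky))
```

In a generative pipeline such a filter is cheap enough to run on every proposed molecule before the more expensive retrosynthesis check.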
Another limitation is that generative AI predictions are only as good as the predictive models guiding them. If the property predictors (activities, ADMET) are off, the generator will optimize toward a mirage. Therefore, generative pipelines often involve an ensemble of predictive models and are updated continuously with new experimental data – a sort of co-evolution between the model and the project data.
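One way an ensemble guards against optimizing toward a mirage is to penalize disagreement between its members. The sketch below scores a molecule by its mean prediction minus a multiple of the ensemble's standard deviation (a lower-confidence-bound style reward). The three "models" are hypothetical lookup tables, and `kappa` is an illustrative tuning knob.

```python
import statistics
from typing import Callable, Sequence, Tuple

Predictor = Callable[[str], float]

def ensemble_score(mol: str, models: Sequence[Predictor],
                   kappa: float = 1.0) -> Tuple[float, float, float]:
    """Return (mean, spread, conservative score) for one molecule.

    The conservative score is mean - kappa * std, so a generator rewarded
    with it is steered away from molecules the models disagree on."""
    preds = [model(mol) for model in models]
    mean = statistics.fmean(preds)
    spread = statistics.pstdev(preds)
    return mean, spread, mean - kappa * spread

# Three hypothetical potency predictors, stubbed as lookup tables.
tables = [
    {"mol_A": 7.0, "mol_B": 8.9},
    {"mol_A": 7.2, "mol_B": 6.1},
    {"mol_A": 6.8, "mol_B": 7.6},
]
models = [lambda m, t=t: t[m] for t in tables]

for mol in ("mol_A", "mol_B"):
    mean, spread, score = ensemble_score(mol, models)
    print(f"{mol}: mean={mean:.2f} std={spread:.2f} conservative={score:.2f}")
```

Here mol_B has the higher average prediction, but because the models disagree sharply on it, the conservative score prefers mol_A. New experimental data on molecules like mol_B is exactly what narrows that disagreement over time.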
All told, de novo design is one of the crown jewels of AI in pharma – it directly aims at the core creative act of drug discovery. By 2025 it has moved from academic demos to industrial practice, with some drugs in trials to show for it.
As one Nature commentary opined, “We have seen several success stories where AI substantially accelerated preclinical development... 12–18 months to advance programs to IND-enabling studies”, though they wryly note that improved speed hasn’t yet translated to improved Phase II success rates (nature.com).
It’s a reminder that designing the molecule is just step 1 – but an important step it is. If nothing else, AI de novo design is expanding the reachable chemical space and could lead to therapies for targets that medicinal chemists had struggled with by conventional means.
3.2 Retrosynthesis and Route Prediction
Once you have a promising molecule (whether AI-designed or not), the next question is: can we make it? Retrosynthesis planning is the art of figuring out how to synthesize a target molecule from simpler starting materials. It’s like solving a puzzle in reverse – breaking down the molecule into pieces until you reach commercially available building blocks.
Historically, retrosynthesis has been the domain of expert chemists and rule-based computer-aided synthesis planning (CASP) software, such as the long-standing LHASA program or Reaxys’s planning tools. In recent years, AI has gained a strong foothold here, significantly improving the speed and scope of retrosynthetic analysis.
How does AI tackle retrosynthesis? Typically by treating it as either a graph transformation or a sequence prediction task. One popular approach is a template-based model: it uses a library of reaction templates (generalized patterns of how bonds break/form) learned from databases of reactions, and it predicts which template applies to a given molecule to yield plausible precursors.
Another is template-free: a model (such as a sequence-to-sequence transformer) directly proposes reactants for a given product, essentially “translating” a product SMILES string into a set of reactant SMILES strings (ui.adsabs.harvard.edu).
These models often leverage the same neural machine translation techniques that revolutionized language translation – treating chemistry as a language where reactions are sentences.
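Before a SMILES string can be "translated", it has to be split into tokens, much as a sentence is split into words. The sketch below uses a regular expression similar to those commonly used with SMILES transformer models (the exact pattern here is an assumption, not any specific tool's): bracket atoms, two-letter halogens, bonds, branches, and ring-closure digits each become one token. The acetanilide example pairs a product with plausible amide-coupling reactants purely for illustration.

```python
import re
from typing import List

# Bracket atoms, Br/Cl, organic-subset and aromatic atoms, bonds, branches,
# dots, ring closures (including %nn) each become a single token.
SMILES_TOKEN = re.compile(
    r"(\[[^\]]+]|Br?|Cl?|N|O|S|P|F|I|b|c|n|o|s|p|\(|\)|\.|=|#|-|\+|\\|/|:|~|@|\?|>|\*|\$|%[0-9]{2}|[0-9])"
)

def tokenize(smiles: str) -> List[str]:
    tokens = SMILES_TOKEN.findall(smiles)
    # Tokenisation must be lossless, otherwise the string had unknown characters.
    if "".join(tokens) != smiles:
        raise ValueError(f"untokenisable SMILES: {smiles}")
    return tokens

# A source/target pair as a seq2seq retrosynthesis model would see it.
product = "CC(=O)Nc1ccccc1"          # acetanilide
reactants = "CC(=O)Cl.Nc1ccccc1"     # acetyl chloride + aniline
print(" ".join(tokenize(product)))
print(" ".join(tokenize(reactants)))
```

The model is then trained to emit the reactant token sequence given the product token sequence, exactly as a translation model maps one sentence to another.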
The results are impressive. An MIT thesis noted that since molecules can be encoded as sequences, “Seq2Seq learning serves as an effective tool for retrosynthesis prediction”, with models generating correct retrosynthetic steps that mimic expert logic (dspace.mit.edu). Modern retrosynthesis tools (such as IBM RXN for Chemistry, or open-source planners like AstraZeneca’s AiZynthFinder and MIT’s ASKCOS) can propose a route in seconds that might take a human days to reason through.
They also propose multiple alternative routes, allowing chemists to choose based on practicality (cost, safety, available equipment). For instance, a model might suggest three synthetic routes for a 10-step target, two of which involve known chemistry and one that’s more creative – the chemist can then evaluate which is best.
One of the challenges is ensuring the AI’s suggestions are reasonable and valid. Early models sometimes gave chemically implausible disconnections. Today’s models incorporate chemical knowledge: they implicitly learn from thousands of known reactions what typical reagents and conditions work. A Nature Communications paper in 2023 introduced RetroExplainer, a deep learning retrosynthesis model with an interpretable “molecular assembly” view, which outperformed earlier approaches on benchmarks (nature.com).
Notably, it matched literature routes about 87% of the time in multi-step planning – a high figure indicating it often converges on the same steps a human would have published. The interpretability means it’s not just a black box: it can show which part of the molecule is being disconnected in each step, aligning with human reasoning.
Beyond single-step retrosynthesis (predicting one reaction), AI tools handle multi-step planning by recursively applying single-step models, usually inside a search algorithm. Because the number of possible routes grows combinatorially, advanced planners use heuristics and optimization, such as Monte Carlo Tree Search guided by a neural network (analogous to how DeepMind’s AlphaGo plans moves). As a result, AI can often plan synthetic routes of 5–10 steps within minutes, a task that a decade ago typically required hours of manual work.
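Multi-step planning can be sketched as a search over "which molecules still need to be made". The example swaps in a simple uniform-cost (best-first) search rather than the neural-guided Monte Carlo Tree Search mentioned above, and its one-step "model" is a hypothetical hand-written table (`ONE_STEP`) with made-up probabilities; a real planner would query a trained single-step model at each expansion.

```python
import heapq
import math
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical single-step model output: product -> [(probability, precursors)].
ONE_STEP: Dict[str, List[Tuple[float, Tuple[str, ...]]]] = {
    "target": [(0.7, ("int_A", "bb_1")), (0.2, ("int_X",))],
    "int_A": [(0.9, ("bb_2", "bb_3"))],
    "int_X": [(0.5, ("int_Y",))],  # int_Y has no known disconnection
}
BUILDING_BLOCKS = {"bb_1", "bb_2", "bb_3"}  # hypothetical purchasable stock

def plan_route(target: str, one_step, stock: Set[str],
               max_steps: int = 10) -> Optional[Tuple[list, float]]:
    """Uniform-cost search. A state is the set of molecules still to be made;
    cost is the sum of -log(probability), so the first complete route popped
    has the highest product of step probabilities."""
    tie = 0  # tie-breaker so heapq never compares frozensets
    start = frozenset() if target in stock else frozenset({target})
    frontier = [(0.0, tie, start, ())]
    while frontier:
        cost, _, open_mols, route = heapq.heappop(frontier)
        if not open_mols:
            return list(route), math.exp(-cost)
        if len(route) >= max_steps:
            continue
        mol = min(open_mols)  # every open molecule must be solved; expand one
        for prob, precursors in one_step.get(mol, []):
            remaining = (open_mols - {mol}) | {p for p in precursors if p not in stock}
            tie += 1
            heapq.heappush(frontier, (cost - math.log(prob), tie,
                                      frozenset(remaining), route + ((mol, precursors),)))
    return None  # no route found within max_steps

route, probability = plan_route("target", ONE_STEP, BUILDING_BLOCKS)
for product, precursors in route:
    print(product, "<=", " + ".join(precursors))
print(f"route score: {probability:.2f}")
```

The lower-scoring branch through `int_X` is never completed because `int_Y` cannot be broken down to stock; tree-search planners spend most of their effort pruning dead ends like this.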
Route optimization is another aspect: not just how to make it, but how to make it cheaply, in high yield, with minimal steps. AI can rank routes by estimated yield or cost, if trained on data that includes such info. Some systems integrate with cost databases and even Green Chemistry metrics to suggest the most cost-effective or environmentally friendly route. This is valuable for process chemists in late-stage development, who must scale up a synthesis – the AI might propose a different route that uses fewer steps or avoids a toxic reagent.
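A small worked example shows why step count alone is a poor guide. For a linear route, overall yield is the product of the step yields, so a longer route with good steps can lose to a shorter one. The per-step yields below are hypothetical estimates, of the kind a yield-prediction model might supply.

```python
import math

# Hypothetical per-step yield estimates (fractions) for three linear routes.
routes = {
    "route_1 (5 steps)": [0.8, 0.8, 0.8, 0.8, 0.8],
    "route_2 (3 steps)": [0.7, 0.7, 0.7],
    "route_3 (4 steps)": [0.9, 0.5, 0.9, 0.9],
}

def overall_yield(step_yields):
    """Linear route: overall yield is the product of the step yields."""
    return math.prod(step_yields)

# Rank by overall yield, breaking ties in favour of fewer steps.
ranked = sorted(routes.items(), key=lambda kv: (-overall_yield(kv[1]), len(kv[1])))
for name, yields in ranked:
    print(f"{name}: {overall_yield(yields):.1%}")
```

Five steps at 80% each give about 33% overall, slightly less than three steps at 70%. A real ranking would also weigh reagent cost, safety, and green-chemistry metrics, which could be folded into the sort key.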
It’s worth mentioning the synergy between generative design and retrosynthesis AI. One criticism of AI-designed molecules was that they might be hard to synthesize. Now, the workflow often includes a retrosynthesizability filter: generate molecules, then immediately run retrosynthesis AI to filter out those with no viable make-route (pmc.ncbi.nlm.nih.gov). Conversely, if a highly potent molecule is hard to make, AI might suggest an easier-to-make analog with similar properties.
In the pharmaceutical industry, retrosynthesis planning AI is probably one of the most widely adopted tools (perhaps second only to docking and QSAR models) because it directly boosts chemists’ productivity. Major companies have either built in-house models or licensed them. In 2023, for example, Merck described an internal platform that automatically designs and evaluates synthetic routes for newly proposed targets, cutting down the initial route-scouting from weeks to hours.
However, human oversight remains crucial. Models sometimes suggest obscure reagents or routes that work “on paper” but might be low yielding or require exotic steps. Seasoned chemists use the AI as a brainstorming partner – they often find that the AI surfaces ideas they hadn’t thought of, including using disconnections that mimic named reactions or creative use of building blocks, spurring new approaches. One could say AI hasn’t replaced the chemist, but it has augmented them, acting like a super well-read colleague who instantly recalls literature precedents. As one retrosynthesis researcher put it, “Automating retrosynthesis with AI expedites organic chemistry research in digital laboratories” (nature.com), highlighting how such tools accelerate the ideation phase tremendously.
Looking ahead, retrosynthesis models are even being combined with robotics: a fully AI-driven lab might plan a route and then execute it with automated synthesizers, iterating if a step fails. This concept of a “self-driving lab” is on the horizon, and retrosynthesis AI is a key enabler by providing the plan the robots should carry out.
In sum, AI for retrosynthesis and route design is a less glitzy but extremely impactful use case. It addresses a very practical bottleneck: how to manufacture complex molecules efficiently. By democratizing expert-level planning, it frees chemists to focus on creative problem-solving and fine-tuning, rather than brute-force searching through reaction options. The success metric here is straightforward – if AI can cut the time to get a reliable synthetic route or reduce the number of steps, it translates to cost savings and faster development.
Given the progress by 2025, it’s fair to say we’re entering an era where AI-assisted synthesis planning is routine, much like computer-aided design (CAD) is routine in engineering. The days of flipping through thick reagent catalogs and retrosynthesis textbooks may soon give way to simply asking the AI, “How do I make this?” and getting a list of ready-to-run recipes (sciencedirect.com).
3.3 Drug Repurposing
Developing a brand new drug from scratch is costly and lengthy – so the idea of finding new therapeutic uses for existing drugs (or shelved compounds) is extremely attractive. Drug repurposing aims to identify drugs approved for one disease (or failed in one indication) that could be effective in another. AI has become a powerful ally in this search by sifting through vast biomedical datasets to find hidden connections between drugs and diseases.
There are a few angles AI takes for repurposing:
Knowledge graph mining
We have troves of data in literature and databases about drugs, targets, pathways, and diseases. By constructing a knowledge graph where nodes might be drugs, genes, diseases, etc., and edges represent known relationships (drug binds gene, gene associates with disease, etc.), AI algorithms can look for new links. Essentially, if Drug A targets Protein X, and Protein X is implicated in Disease Y (even if Drug A was originally for Disease Z), then Drug A might help Disease Y. In 2020, as cited earlier, BenevolentAI applied this approach during the COVID-19 outbreak. Their AI system queried a knowledge graph of biomedical data to find any existing compounds that could block viral infection or the inflammatory cascade.
It homed in on baricitinib (a JAK inhibitor for rheumatoid arthritis) as a candidate to inhibit both viral entry and cytokine storm. This suggestion, made in January 2020, was remarkably prescient – baricitinib indeed showed benefit and was later authorized to treat hospitalized COVID patients, a success often held up as a case study of AI-driven repurposing (benevolent.com).
Another example: IBM’s Deep Drug Repurposing tool in the 2010s looked at literature and omics data to propose that an older Parkinson’s disease drug (tolcapone, a COMT inhibitor) could be repurposed for multidrug-resistant tuberculosis by targeting a certain enzyme – a non-obvious connection found via AI cross-referencing.
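The "Drug A targets Protein X, Protein X is implicated in Disease Y" reasoning described above can be sketched as a two-hop path query over (head, relation, tail) triples. The entities, relation names, and edges are illustrative placeholders, not curated biomedical facts, and production systems typically add learned link-prediction models on top of explicit path queries like this one.

```python
from collections import defaultdict
from typing import Dict, List

# Toy knowledge graph: illustrative triples only.
TRIPLES = [
    ("drug_A", "inhibits", "protein_X"),
    ("drug_A", "treats", "disease_Z"),
    ("protein_X", "implicated_in", "disease_Y"),
    ("drug_B", "inhibits", "protein_W"),
    ("protein_W", "implicated_in", "disease_Z"),
]

def repurposing_candidates(triples, disease: str) -> Dict[str, List[str]]:
    """Find drug -> protein -> disease paths for drugs that are not already
    linked to `disease` by a 'treats' edge."""
    targets_of = defaultdict(set)
    disease_proteins = set()
    already_treats = set()
    for head, rel, tail in triples:
        if rel == "inhibits":
            targets_of[head].add(tail)
        elif rel == "implicated_in" and tail == disease:
            disease_proteins.add(head)
        elif rel == "treats" and tail == disease:
            already_treats.add(head)
    hits = {}
    for drug, targets in targets_of.items():
        shared = targets & disease_proteins
        if shared and drug not in already_treats:
            hits[drug] = sorted(shared)  # proteins that explain the link
    return hits

print(repurposing_candidates(TRIPLES, "disease_Y"))
print(repurposing_candidates(TRIPLES, "disease_Z"))
```

Returning the connecting proteins alongside each drug keeps the hypothesis explainable, which matters when a scientist has to decide whether the suggested link is worth testing.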
Signature matching
Another approach is using gene expression or other omics signatures. Diseases often have characteristic expression changes; drugs also induce expression changes in cells. AI can match a drug’s signature that is the “inverse” of a disease’s signature, suggesting the drug might reverse the disease state. This was done in the NIH’s LINCS program where ML models scanned hundreds of thousands of compound-induced gene expression profiles to find ones that counteract the gene signature of diseases like inflammatory bowel disease or certain cancers. It’s like finding a puzzle piece that fits the shape of the missing piece.
If Disease X upregulates a set of genes, look for a drug that downregulates those genes. Some repurposing leads have come from this: for instance, the anti-epileptic drug topiramate was predicted to help inflammatory bowel disease because its expression profile was the opposite of the disease’s, and it did reduce disease severity in a rodent model of colitis.
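A simplified version of signature matching scores each drug by how negatively its expression changes correlate with the disease's. This is a sketch under stated assumptions: the log fold-changes are made up, and it uses plain Pearson correlation over shared genes, whereas Connectivity Map-style methods use a rank-based enrichment statistic.

```python
import math
from typing import Dict, List

def pearson(x: List[float], y: List[float]) -> float:
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def reversal_score(disease_sig: Dict[str, float], drug_sig: Dict[str, float]) -> float:
    """Negative correlation over shared genes: higher means the drug's
    expression changes look more like the inverse of the disease's."""
    genes = sorted(disease_sig.keys() & drug_sig.keys())
    return -pearson([disease_sig[g] for g in genes], [drug_sig[g] for g in genes])

# Hypothetical log fold-changes (disease vs healthy; drug-treated vs control).
disease = {"G1": 2.0, "G2": 1.5, "G3": -1.0, "G4": -2.0}
drugs = {
    "drug_A": {"G1": -1.8, "G2": -1.2, "G3": 0.9, "G4": 2.1},  # near-mirror image
    "drug_B": {"G1": 1.7, "G2": 1.1, "G3": -0.8, "G4": -1.5},  # mimics the disease
    "drug_C": {"G1": 0.2, "G2": -0.1, "G3": 0.3, "G4": 0.1},   # weak effect
}
ranking = sorted(drugs, key=lambda d: reversal_score(disease, drugs[d]), reverse=True)
print(ranking)
```

A drug that mimics the disease signature (drug_B) lands at the bottom; in practice such drugs are sometimes flagged as potential safety concerns rather than simply discarded.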
Real-world data mining: AI can also scour clinical data for unexpected positive outcomes. For instance, some blood pressure medications appeared (in retrospective analysis) to reduce the risk of Alzheimer’s. By analyzing millions of patient records, AI might flag that patients on Drug M have lower incidence or severity of Disease N compared to matched controls.
While such findings are observational, they generate hypotheses for repurposing that can then be clinically tested (with the aid of causal inference to ensure it’s not just confounding – linking back to section 2.7). A classic example: the diabetes drug metformin was observed in epidemiological studies to be associated with lower cancer rates, sparking trials of metformin as a potential anti-cancer adjuvant. AI is making such pattern-finding more systematic.
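A classic way such an association turns out to be confounding: if the people taking Drug M are mostly younger, Drug M will look protective against an age-related disease even when it does nothing. Stratifying by the confounder and pooling with the Mantel–Haenszel risk ratio exposes this. The counts below are hypothetical, constructed so that the risk is identical for users and non-users within each age stratum.

```python
from typing import Dict, Tuple

# Per stratum: (exposed_events, exposed_total, unexposed_events, unexposed_total).
# Hypothetical counts: Drug M users are mostly under 65.
Strata = Dict[str, Tuple[int, int, int, int]]
strata: Strata = {
    "age < 65": (9, 900, 1, 100),
    "age >= 65": (10, 100, 90, 900),
}

def crude_risk_ratio(strata: Strata) -> float:
    """Risk ratio ignoring the strata entirely."""
    a = sum(s[0] for s in strata.values())
    n1 = sum(s[1] for s in strata.values())
    c = sum(s[2] for s in strata.values())
    n0 = sum(s[3] for s in strata.values())
    return (a / n1) / (c / n0)

def mantel_haenszel_rr(strata: Strata) -> float:
    """Risk ratio pooled across strata of the confounder."""
    num = den = 0.0
    for a, n1, c, n0 in strata.values():
        total = n1 + n0
        num += a * n0 / total
        den += c * n1 / total
    return num / den

print(f"crude RR: {crude_risk_ratio(strata):.2f}")
print(f"age-adjusted (Mantel-Haenszel) RR: {mantel_haenszel_rr(strata):.2f}")
```

The crude analysis suggests an 80% risk reduction; the adjusted ratio is 1.0. Real-world data pipelines use richer adjustments (propensity scores, target-trial emulation), but the failure mode they guard against is the same.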
Cheminformatics-driven repurposing
Sometimes a drug might bind an off-target that is relevant to another disease. AI models that predict drug–target interactions (as in section 2.2) can be used to screen all approved drugs against new protein targets. If, say, an antifungal drug is predicted by AI to also inhibit a human protein that’s a pain mediator, that drug could be repurposed as a painkiller (this is hypothetical but illustrates the approach). This “virtual screening of old libraries” is something companies do – basically asking, for every approved drug, does it have any predicted activity that might help with this new disease’s biology?
A success story here is sildenafil (Viagra) – originally developed for angina (heart pain) by affecting blood vessels, then famously repurposed for erectile dysfunction. But later on, someone realized that the same vasodilatory effect could help pulmonary hypertension (a lung vessel disease), and indeed sildenafil was repurposed (under the name Revatio) for that life-threatening condition. AI wasn’t involved in that older example, but today AI could flag such connections faster by understanding mechanistic overlaps.
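The "virtual screening of old libraries" described above can be sketched with the simplest drug–target interaction model there is: nearest-neighbour similarity to known ligands of the new target, on the assumption that structurally similar molecules tend to share targets. The fingerprints are hypothetical sets of "on" bits; in practice they would be circular fingerprints computed by a cheminformatics toolkit, and learned models would usually replace the similarity rule.

```python
from typing import Dict, FrozenSet, List, Tuple

def tanimoto(a: FrozenSet[int], b: FrozenSet[int]) -> float:
    """Tanimoto (Jaccard) similarity of two binary fingerprints."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

# Hypothetical fingerprints (sets of on-bits).
known_ligands = {
    "ref_ligand_1": frozenset({1, 4, 9, 15, 22, 30}),
    "ref_ligand_2": frozenset({1, 4, 9, 16, 23, 31}),
}
approved_drugs = {
    "drug_P": frozenset({1, 4, 9, 15, 22, 40}),
    "drug_Q": frozenset({2, 5, 11, 17, 29, 33}),
    "drug_R": frozenset({1, 4, 16, 23, 50, 51}),
}

def screen(drugs: Dict[str, FrozenSet[int]], ligands: Dict[str, FrozenSet[int]],
           threshold: float = 0.4) -> List[Tuple[float, str]]:
    """Rank drugs by their maximum similarity to any known ligand of the target."""
    scored = {d: max(tanimoto(fp, ref) for ref in ligands.values())
              for d, fp in drugs.items()}
    return sorted(((s, d) for d, s in scored.items() if s >= threshold), reverse=True)

for score, drug in screen(approved_drugs, known_ligands):
    print(f"{drug}: {score:.2f}")
```

The threshold is an arbitrary illustrative choice; each surviving drug is only a hypothesis to be checked in a binding assay.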
Retrospective rescue of shelved drugs
Pharma companies have many compounds that were safe but ineffective for their original indication. AI can help find a new indication where they might be effective. One company, NuMedii, built a database of compounds that failed in trials and uses AI to find disease matches for them. Another company, Healx, focuses on rare diseases, scanning existing drugs to see if any might hit pathways known to be relevant in a rare disease, often suggesting combinations of two existing drugs to cover a pathway network. In the knowledge graph, we see Healx partnering with pharma (Sanofi, etc.) to find therapies for rare neurological diseases using an AI platform.
The COVID-19 pandemic was a major catalyst that showcased AI repurposing efforts: dozens of teams applied models to find existing antivirals or anti-inflammatories that could be quickly deployed. Besides baricitinib, AI models highlighted other candidates like famotidine (a heartburn medicine, though that one had mixed results in trials) (weforum.org).
One big advantage of repurposing is speed – since the drugs are already known in humans, you can often skip straight to Phase II trials for efficacy in the new indication. This can shave years off timelines, as seen during COVID-19 or in rare diseases, where testing a known drug can be done very quickly if there’s anecdotal support.
AI has some limitations in repurposing: it can produce many hypotheses, but testing them still requires experimental work and trials. The “hit rate” can be low if one naively trusts the AI without further validation. However, because repurposing deals with known entities (approved drugs), it’s easier to test a hypothesis (you can often go directly to an animal model or small human trial). This synergy of AI hypothesis and quick testing led to a burst of repurposing trials. Not all succeeded, but the cost was relatively low.
The regulatory and business angle: many repurposed drugs are off-patent, so who will invest in trials? This is where some AI-driven repurposing companies partner with foundations or use creative IP strategies (like formulation patents or combining two generics into a new patented combo). Governments also stepped in – e.g., the UK’s AI Center for Health has funded AI repurposing projects for COVID and for rare diseases.
By 2025, repurposing via AI is an established part of the pharmaceutical playbook. It might not have the glamour of a brand-new molecule, but it’s arguably one of the highest-ROI activities: find a new use for something that has already passed safety testing. One caveat: repurposing often tackles low-hanging fruit (e.g., diseases with shared pathways), and many of the obvious candidates have been tried by now; nonetheless, AI opens up less obvious connections and can continuously adapt as new data emerges.
The fastest way to discover a new drug could be to realize you already had it all along – you just needed an algorithm to point it out.
3.4 Biomarker Identification
Biomarkers – measurable indicators of biological state – are central to modern drug development and precision medicine. They can indicate disease presence (diagnostic biomarkers), predict disease progression (prognostic), or predict/monitor response to therapy (predictive and pharmacodynamic biomarkers). AI is supercharging biomarker discovery by finding subtle patterns in complex datasets (genomic, imaging, proteomic, etc.) that correlate with clinical outcomes.
In oncology, for example, AI has helped identify gene expression signatures or immune cell patterns in tumors that predict which patients respond to immunotherapy. Traditional biostatistics might consider one marker at a time; AI can consider combinatorial patterns of markers. A deep learning model might learn that a combination of 5 genes being high and 3 genes being low is predictive of response – a pattern no single-gene test would capture.
Such multi-gene signatures have become common (like the 21-gene Oncotype DX for breast cancer, though that one was developed pre-AI). Now, with AI, we’re seeing even more refined signatures emerging, possibly involving different data types.
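The simplest form of a multi-gene signature is a score that rewards high expression of some genes and low expression of others in the same patient. This sketch standardizes each gene across patients and computes "mean z of up-genes minus mean z of down-genes"; the gene names and values are hypothetical, and published signatures typically use fitted weights rather than the equal weights assumed here.

```python
import statistics
from typing import Dict, List

def zscores(values: List[float]) -> List[float]:
    mu = statistics.fmean(values)
    sd = statistics.pstdev(values)  # assumes the gene is not constant
    return [(v - mu) / sd for v in values]

def signature_scores(expr: Dict[str, List[float]], up: List[str],
                     down: List[str]) -> List[float]:
    """Per-patient score: mean z-score of 'up' genes minus mean z-score of
    'down' genes. expr maps gene -> values across patients (same order)."""
    z = {g: zscores(expr[g]) for g in up + down}
    n_patients = len(expr[up[0]])
    return [
        statistics.fmean([z[g][i] for g in up]) - statistics.fmean([z[g][i] for g in down])
        for i in range(n_patients)
    ]

# Hypothetical expression values for four patients.
expr = {"UP1": [5, 1, 3, 2], "UP2": [6, 2, 3, 1], "DN1": [1, 5, 3, 4]}
scores = signature_scores(expr, up=["UP1", "UP2"], down=["DN1"])
print([round(s, 2) for s in scores])
```

Patient 0, with both up-genes high and the down-gene low, scores highest even though no single gene alone would separate the patients as cleanly.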
Proteomic and metabolomic biomarker discovery is another area where AI shines. Mass-spec datasets are huge and complex; ML helps sift through thousands of molecules to find a handful that differ between patients who do well vs poorly on a drug. For instance, an ML analysis might discover a particular lipid metabolite that consistently rises in patients who later experience neurotoxicity – that metabolite becomes a candidate safety biomarker to monitor.
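When thousands of metabolites are each tested for a difference between groups, some will look significant purely by chance. A standard guard before any candidate is taken forward is the Benjamini–Hochberg procedure, which controls the false discovery rate. The feature names and p-values below are hypothetical.

```python
from typing import Dict, Set

def benjamini_hochberg(pvalues: Dict[str, float], q: float = 0.05) -> Set[str]:
    """Return the features declared discoveries at false discovery rate q."""
    ranked = sorted(pvalues.items(), key=lambda kv: kv[1])
    m = len(ranked)
    cutoff = 0
    for k, (_, p) in enumerate(ranked, start=1):
        if p <= k / m * q:
            cutoff = k  # the largest rank k meeting the BH condition
    return {name for name, _ in ranked[:cutoff]}

# Hypothetical per-metabolite p-values from patient-group comparisons.
pvals = {"lipid_A": 0.008, "met_B": 0.2, "met_C": 0.001, "lipid_D": 0.039, "met_E": 0.9}
print(sorted(benjamini_hochberg(pvals)))
```

Note that `lipid_D` (p = 0.039) would pass a naive 0.05 cut-off but is not retained once the number of tests is taken into account.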
Imaging biomarkers
Imaging data (MRI, CT, digital pathology slides) is rich but hard to quantify. AI (especially convolutional neural networks) can identify features in images predictive of outcome. In neurology, AI has found MRI patterns that predict progression to Alzheimer’s years before cognitive symptoms. In oncology, radiomic patterns extracted by AI from CT scans (like texture or shape features of tumors) can serve as non-invasive biomarkers for tumor aggressiveness or likelihood of response to a treatment. The beauty is these patterns are often invisible to the naked eye; the AI picks up on pixel-level variations correlated with biology.
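Handcrafted radiomics starts from simple statistics of the voxel intensities inside a tumour mask; CNNs learn their own features instead, but first-order features show the idea. This minimal sketch assumes the region of interest has already been segmented and its intensities discretized into grey levels; open-source packages such as PyRadiomics compute these and many more texture features.

```python
import math
from collections import Counter
from typing import Dict, List

def first_order_features(roi: List[List[int]]) -> Dict[str, float]:
    """roi: 2D grid of discretised grey levels inside a (pre-segmented) mask."""
    values = [v for row in roi for v in row]
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / n
    # Shannon entropy (bits) of the grey-level histogram: a simple measure
    # of intensity heterogeneity within the tumour.
    counts = Counter(values)
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
    return {"mean": mean, "variance": variance, "entropy": entropy}

homogeneous = [[3, 3], [3, 3]]
heterogeneous = [[1, 1], [2, 2]]
print(first_order_features(homogeneous))
print(first_order_features(heterogeneous))
```

Whether a feature like entropy actually tracks aggressiveness or treatment response is an empirical question that each radiomic biomarker has to answer in validation cohorts.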
Composite biomarkers: The holy grail is integrating multiple modalities (like Section 2.4’s multi-omics) into composite biomarkers. AI is enabling this by taking, say, genetic + clinical + imaging data together to output a risk score. For example, an AI model might combine a patient’s genomic mutation profile, tumor histology image, and blood test results to predict if they will respond to a certain therapy. The output (e.g., a probability or score) functions as a composite biomarker more powerful than any single measure (drugtargetreview.com).
By 2025, regulators and industry have warmed to AI-derived biomarkers. The FDA has begun qualifying digital biomarkers (for instance, an AI-derived speech pattern for diagnosing neurological disease). In drug trials, biomarkers found by AI are used to enrich patient selection – e.g., only enrolling patients with the signature that the AI says is likely to respond, which increases the trial’s chance of success (biospectrumasia.com). This is especially useful in diseases with heterogeneity: using AI-defined biomarkers, one can split what used to be one disease into subgroups that can be targeted differently.
A telling example: researchers used unsupervised ML on Parkinson’s patient data (symptoms, genetics, etc.) and identified distinct subtypes of Parkinson’s, each with different progression rates. These subtypes served as biomarkers to stratify patients – which could allow future trials to target the fast-progressing subtype with neuroprotective drugs without diluting the effect with slow progressors (a problem that plagued previous trials).
Similarly, in cancer, AI often reclassifies tumors beyond classical pathology – you might have an “immunogenic” vs “non-immunogenic” subtype across cancers, discovered through gene expression clustering, guiding use of immunotherapies beyond a single tissue origin.
Safety biomarkers are another angle: AI can analyze past trial data to find early signals of toxicity. For instance, perhaps an elevation in a certain lab value after 2 weeks on drug correlates with eventual liver injury. If AI finds that, that lab value can be used as a safety biomarker in future trials to monitor patients closely or make stop/go decisions early.
The FDA’s Sentinel initiative and others are exploring such patterns in pharmacovigilance data with ML to find biomarkers of adverse events (like EKG changes preceding arrhythmias, etc.).
The challenge with AI-driven biomarkers is validation and interpretability. A model might churn out a high-dimensional signature that correlates with outcome, but to become a useful biomarker, it often needs to be distilled into a simpler test (like an immunohistochemistry panel, or a PCR kit for a gene signature). There’s a process: discovery (AI finds it), then reduction to practice (choose a small set of markers that approximate the model), then prospective validation.
This takes time and rigorous testing. We’ve seen some AI-found biomarkers already make it to that level: e.g., a company called Freenome uses AI on combined genomic and proteomic blood data to find early cancer signals; their test is now in large trials.
Put simply, AI is helping make precision medicine more precise by finding the molecular signposts that tell us who needs which drug. It’s somewhat ironic that while blockbusters of the past treated millions with a one-size-fits-all drug, the trend now is to develop drugs paired with biomarkers that narrow down who should get them – smaller populations, but higher success and often higher willingness to pay (since the drug is very effective in that subgroup).
Economically, biomarkers can salvage drugs that would fail in an unselected population by identifying the niche where they work. For example, an AI could analyze a failed Phase III trial and realize that a subset of patients (maybe those with a specific inflammatory marker) did benefit – that suggests a path to “rescue” the drug by targeting that subgroup in a new trial.
Pharma companies have definitely used AI in post-hoc analyses to identify such responder subsets (though one must be careful to avoid false discovery; causal techniques help ensure it’s real and not just data dredging).
In sum, biomarker identification is a wide-ranging use case where AI’s pattern recognition prowess directly contributes to personalized medicine. The success of targeted therapies like HER2 inhibitors for breast cancer (enabled by the HER2 biomarker) has set the stage; AI is now expanding our repertoire of biomarkers from a few well-known ones to potentially hundreds of new ones across diseases.
The ultimate beneficiaries are patients – instead of trial-and-error treatment, they get the therapy likely to work for their biological profile, and they get it sooner. In a sense, AI is deciphering the biological Babel, translating omics data and images into clinically useful signals that can guide the next generation of precision drugs.
3.5 Patient Stratification
Closely related to biomarker discovery, patient stratification is about grouping patients into subpopulations that differ in their prognosis or treatment response. In the age of “one-size-fits-all” medicine, patient stratification was coarse (maybe by disease stage or a single lab value). Now, with AI, we can stratify patients on complex patterns gleaned from data, enabling truly personalized therapeutic strategies and more efficient clinical trials.
AI-driven patient stratification takes many forms:
Clustering analyses: Unsupervised machine learning can cluster patients based on multi-dimensional data – genetics, gene expression, metabolomics, symptoms, etc. Often, this reveals that what we considered one disease is actually several distinct subtypes. For example, in diabetes, clustering of blood biomarkers and genetic traits unveiled distinct subtypes beyond the classic Type 1 and Type 2 – such as a subgroup of Type 2 with severe insulin deficiency versus one with obesity-related insulin resistance, with different complication risks.
In psychiatry, ML on symptom patterns (and sometimes imaging or genetics) can divide depression into subtypes that respond differently to medications; some patients have high inflammatory markers, suggesting an immune-related depression subtype. These insights were hard to obtain without AI, because the interactions are multi-factorial.
Risk scoring: Stratification often involves predicting risk or likely outcomes. AI can produce risk scores (essentially continuous stratification) for events like disease recurrence or mortality. For instance, an AI might integrate tumor genomics, patient age, and treatment details to output a risk score of cancer relapse in 5 years. Patients above a threshold might get more aggressive therapy or closer follow-up, while those below could be spared chemo. Traditional risk scores exist (e.g., 10-year cardiovascular risk calculators), but AI is making them more dynamic and fine-tuned by incorporating more data (like imaging of coronary arteries, genomic risk scores, etc. into one model).
A concrete example: polygenic risk scores for heart disease – AI can combine millions of genetic variants into a single risk stratifier that identifies individuals whose risk is equivalent to that of someone a decade older (drugtargetreview.com).
Therapy response stratification: The holy grail in many diseases (cancer especially) is identifying upfront who will respond to a treatment and who won’t, thus sparing non-responders the side effects and switching them to something else. AI is used to build predictive models: feed in a patient’s data, output a probability of response to Drug A vs Drug B. For instance, some startups use deep learning on pre-treatment biopsy images plus molecular data to predict if immunotherapy will work for a given patient’s tumor.
Those models essentially stratify patients into likely responders and likely non-responders. Clinically, this can guide therapy choice, as seen with emerging AI tests in oncology that advise on chemo vs targeted therapy decisions beyond what single gene mutations can tell us.
Clinical trial stratification: In trial design, AI can stratify to ensure balanced groups or identify enrichment factors. By training on previous trial data, an AI may find that patients with a certain profile drive most of the drug’s benefit. Stratifying enrollment to include more of those, or at least stratifying randomization on that feature, can make results clearer. This overlaps with what we discussed in Section 2.6 (using AI in trials).
Regulatory agencies now often expect pre-specified stratification factors in trial analysis if known predictors exist. AI is helping find those predictors. For example, in a heart failure trial, an AI model might reveal that patients with high blood pressure respond differently to the drug than those with low BP, so in a follow-up trial, “baseline blood pressure” could be a stratification factor ensuring equal representation and separate analysis in high vs low BP strata.
Social determinants and beyond: AI can stratify based not just on biology but on factors like socioeconomic or environmental exposures that affect health outcomes. While not traditionally in pharma’s purview, these factors influence patient adherence and outcome. Some advanced models include these to better stratify risk or tailor interventions (e.g., identifying patients who might need extra follow-up due to social risk factors).
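The clustering analyses described above can be illustrated with plain k-means (Lloyd's algorithm), one of many clustering methods in use. The two features and six "patients" are hypothetical, already-standardized values; real subtype studies use more variables, choose the number of clusters carefully, and check that clusters are stable.

```python
import math
import random
from typing import List, Tuple

def kmeans(points: List[List[float]], k: int, iters: int = 50,
           seed: int = 0) -> Tuple[List[int], List[List[float]]]:
    """Lloyd's k-means. Features should be standardised first so that no
    single variable dominates the Euclidean distance."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each patient joins the nearest centroid.
        labels = [min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points]
        # Update step: each centroid moves to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster empties
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

# Hypothetical standardised features, e.g. [insulin-deficiency marker, BMI].
patients = [[2.0, -1.0], [2.2, -0.8], [1.9, -1.2],
            [-1.0, 2.0], [-0.8, 2.1], [-1.1, 1.9]]
labels, centroids = kmeans(patients, k=2)
print(labels)
print([[round(v, 2) for v in c] for c in centroids])
```

The centroids double as cluster profiles ("high on feature 1, low on feature 2"), which is the starting point for the clinical labelling and validation of clusters discussed below.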
AI stratification is especially valuable in conditions with large phenotypic heterogeneity, such as many rare diseases, where AI can cluster patients by similarity to identify who truly has the same condition. A similar approach has been applied to autism spectrum disorder: unsupervised ML on a large battery of behavioral and neuroimaging measures suggested there are multiple “autisms” with different underlying biology. That could eventually lead to stratified interventions (some children might benefit more from behavioral therapy, others perhaps from microbiome-based treatments); at minimum, the subgroups are delineated.
Patient stratification by AI is essentially making medicine more granular. Instead of treating diseases, we treat subgroups or even individuals as a “segment of one.” The challenge, however, is that the more granular you get, the fewer patients per group, which complicates drug development economics. That’s why stratification needs to be done smartly – you want to find the meaningful subgroups that you can act on. While in theory every patient is unique, in practice grouping them into a few actionable strata is both an art and a science, one where AI is aiding the science part.
Interpretability in stratification is critical too. If AI says “there are 5 clusters of patients,” clinicians need to characterize those clusters in plain terms (“Cluster 1: young, metabolically healthy, Cluster 2: older with inflammatory phenotype,” etc.) to make use of it. There’s a back-and-forth between AI and experts to label and validate these clusters. It’s not uncommon that initial AI clusters are refined after seeing clinical meaning (sometimes two clusters might be merged if they were not clinically distinct, or a cluster might be split further if outcomes within it differ).
In terms of outcome, we’re seeing stratification pay off: drugs that fail in an unselected population sometimes succeed in a stratified one. For example, a certain cancer drug might have no significant benefit overall, but in patients with a certain gene expression signature (identified via ML) it actually improves survival a lot. If that signature can be prospectively identified, the drug can be approved for that subgroup. This is increasingly how oncology drugs get approved – alongside a companion diagnostic biomarker. AI is expanding that companion diagnostic concept beyond single mutations to complex signatures.
In summary, patient stratification by AI is making possible a more nuanced approach to therapy, where we acknowledge not all patients with the same diagnosis are alike. It’s like going from broadcasting one radio signal to the whole population to narrowcasting targeted signals to various subpopulations. By identifying those segments rigorously, AI enables both more efficient drug development (targeted trials) and better patient care (tailored treatments). One might say the industry is moving from treating “the average patient” to treating the right patient, and AI is the compass guiding that journey.
3.6 Knowledge Graphs and Literature Mining
The sheer volume of biomedical literature and data is overwhelming – millions of papers, clinical trial reports, patents, and databases. Knowledge graph construction and literature mining are AI-driven efforts to structure this information and extract insights that no single human could synthesize. In pharma, these techniques are employed to support everything from target discovery to competitive intelligence.
Text mining has long been used to pull facts from papers (like “Gene X interacts with Protein Y”). What’s different now is the scale and sophistication. Modern NLP (natural language processing), especially large language models fine-tuned for biomedical text, can read and interpret scientific text with some level of understanding. For example, NLP can scan all of PubMed to find associations: “What diseases has this gene been mentioned in connection with?” or “List all compounds that inhibit this target.” In 2025, tools like EBI’s Europe PMC text mining and commercial ones from startups can answer these queries pretty well, backed by extracted knowledge graphs.
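The simplest form of literature mining counts how often two dictionary terms share a sentence. The sketch below assumes tiny hand-written gene and disease dictionaries and three made-up abstracts; tools like those named above rely on trained entity recognition and relation extraction rather than exact string matching, so treat this only as an illustration of the co-occurrence idea.

```python
import re
from collections import Counter
from itertools import product

# Hypothetical dictionaries; real systems use curated lexicons and trained entity recognizers.
GENES = {"IL17A", "TNF", "JAK1"}
DISEASES = {"uveitis", "psoriasis", "fibrosis"}

abstracts = [
    "IL17A levels were elevated in uveitis. TNF blockade was not studied.",
    "We report that JAK1 inhibition reduced psoriasis severity; IL17A was also reduced in psoriasis.",
    "TNF and IL17A both contribute to psoriasis pathology.",
]


def sentence_cooccurrences(text):
    """Return the set of (gene, disease) pairs mentioned together in at least one sentence."""
    pairs = set()
    for sentence in re.split(r"(?<=[.;])\s+", text):
        tokens = set(re.findall(r"[A-Za-z0-9]+", sentence))
        genes = GENES & tokens
        diseases = DISEASES & {tok.lower() for tok in tokens}
        pairs.update(product(genes, diseases))
    return pairs


# Count in how many abstracts each pair co-occurs (document frequency, not raw mentions).
counts = Counter()
for abstract in abstracts:
    counts.update(sentence_cooccurrences(abstract))
print(counts.most_common())
```

Restricting co-occurrence to the same sentence is a deliberate choice: TNF and uveitis appear in the same first abstract but not in the same sentence, so they are not linked.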
A knowledge graph (KG) is basically a network where nodes are entities (drugs, genes, diseases, phenotypes, etc.) and edges are relationships (e.g., “inhibits”, “causes”, “treats”). Pharma companies build proprietary KGs combining internal data and public info to serve as a centralized knowledge base. AI helps populate these graphs by reading literature and databases. One open example is OpenBioLink, a benchmark KG that integrates multiple biomedical databases to evaluate link prediction models (academic.oup.com). Within companies, KGs often have tens of millions of nodes/edges covering known biology.
The value comes when running graph algorithms on these KGs. For instance, link prediction (as discussed for repurposing) can suggest new plausible edges (e.g., “Drug A – treats – Disease B?”) that weren’t explicitly in the graph but are inferred. Or one can do graph-based clustering: find communities in the graph (say a cluster of genes and diseases all interlinked could imply a disease pathway). Some companies use KGs to power their AI target selection: they might rank targets by graph-derived measures (such as PageRank centrality or node-embedding similarity) that correlate with being critical nodes in disease sub-networks.
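Link prediction can be illustrated with a classic neighborhood heuristic, the Adamic-Adar index, which scores a candidate edge by the neighbors its two endpoints share, weighting rare (low-degree) neighbors more. The toy triples and entity names are hypothetical, and production KGs typically use learned embeddings or GNNs instead; the sketch only shows how a missing drug-disease edge can be inferred from graph structure.

```python
import math
from collections import defaultdict

# Hypothetical toy KG as (head, relation, tail) triples.
triples = [
    ("DrugA", "inhibits", "Gene1"),
    ("DrugA", "inhibits", "Gene2"),
    ("DrugB", "inhibits", "Gene3"),
    ("DrugC", "inhibits", "Gene2"),
    ("Gene1", "associated_with", "DiseaseX"),
    ("Gene2", "associated_with", "DiseaseX"),
    ("Gene3", "associated_with", "DiseaseY"),
]

# Undirected adjacency; relation types are ignored by this simple heuristic.
neighbors = defaultdict(set)
for head, _, tail in triples:
    neighbors[head].add(tail)
    neighbors[tail].add(head)


def adamic_adar(u, v):
    """Sum of 1/log(degree) over shared neighbors (a shared neighbor always has degree >= 2)."""
    return sum(1.0 / math.log(len(neighbors[z])) for z in neighbors[u] & neighbors[v])


drugs = ["DrugA", "DrugB", "DrugC"]
diseases = ["DiseaseX", "DiseaseY"]
ranked = sorted(
    ((adamic_adar(d, s), d, s) for d in drugs for s in diseases), reverse=True
)
for score, drug, disease in ranked[:3]:
    print(f"{drug} -treats?-> {disease}: {score:.3f}")
```

DrugA ranks first for DiseaseX because it shares two genes with that disease, one of which (Gene1) is connected to nothing else, so that link counts more than the shared hub Gene2.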
An example of knowledge graph use: A few years ago, a startup (twoXAR, now Aria Pharmaceuticals) used an algorithm on an integrated KG to identify a new target for a liver disease and even found a drug candidate in months. They basically automated the literature review and hypothesis generation process. Another example is knowledge-driven gene prioritization: given a disease, a KG traversal might score which genes are most connected to known aspects of the disease, surfacing novel genes for validation.
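A common way to score genes by their connectivity to known disease genes is random walk with restart (RWR), a personalized variant of PageRank. This sketch assumes a small, hypothetical gene network with two seed genes already linked to the disease; it illustrates the general technique, not the specific algorithm twoXAR or any other company used.

```python
import numpy as np


def random_walk_with_restart(adj, seeds, restart=0.3, tol=1e-10, max_iter=1000):
    """Stationary visiting probabilities of a walker that returns to the seed genes with prob. `restart`."""
    adj = np.asarray(adj, dtype=float)
    col_sums = adj.sum(axis=0)
    W = adj / np.where(col_sums == 0, 1.0, col_sums)  # column-stochastic transition matrix
    p0 = np.zeros(adj.shape[0])
    p0[seeds] = 1.0 / len(seeds)                     # restart distribution over seed genes
    p = p0.copy()
    for _ in range(max_iter):
        p_next = (1 - restart) * (W @ p) + restart * p0
        if np.abs(p_next - p).sum() < tol:
            return p_next
        p = p_next
    return p


# Hypothetical gene-gene network: G0 and G1 are known disease genes (seeds).
genes = ["G0", "G1", "G2", "G3", "G4", "G5"]
edges = [(0, 1), (1, 2), (0, 2), (2, 3), (3, 4), (4, 5)]
adj = np.zeros((len(genes), len(genes)))
for i, j in edges:
    adj[i, j] = adj[j, i] = 1.0

scores = random_walk_with_restart(adj, seeds=[0, 1])
ranking = sorted(genes[2:], key=lambda g: -scores[genes.index(g)])
print([(g, round(float(scores[genes.index(g)]), 4)) for g in ranking])
```

Candidates closer to the seeds in the network receive more probability mass, so G2 (adjacent to both seeds) is surfaced first for validation.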
Literature mining can also help with things like mode of action elucidation. Suppose you have a phenotypic screen hit (compound that works but target unknown). AI can take the compound structure, find similar compounds or mentions in literature, and suggest possible targets or pathways (maybe it finds that similar compounds often mention a particular pathway, hinting at the mechanism).
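One simple version of this idea is nearest-neighbor target inference: compare the hit's fingerprint with compounds whose targets are annotated and let the most similar ones vote. In practice the fingerprints would come from a cheminformatics toolkit (for example, Morgan fingerprints computed with RDKit) and the annotations from a database such as ChEMBL; here the on-bit sets and target labels are hypothetical placeholders.

```python
from collections import defaultdict


def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two sets of 'on' fingerprint bits."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical annotated reference compounds: fingerprint on-bits and a known target.
REFERENCE = [
    ({1, 4, 7, 9, 12}, "KinaseA"),
    ({1, 4, 7, 10, 12}, "KinaseA"),
    ({2, 5, 8, 11}, "GPCR_B"),
    ({2, 5, 8, 13, 14}, "GPCR_B"),
    ({3, 6, 15}, "IonChannelC"),
]


def suggest_targets(query_bits, k=3):
    """Similarity-weighted vote over the k most similar annotated compounds."""
    neighbors = sorted(
        ((tanimoto(query_bits, bits), target) for bits, target in REFERENCE), reverse=True
    )[:k]
    votes = defaultdict(float)
    for sim, target in neighbors:
        if sim > 0:  # ignore compounds that share no bits with the query
            votes[target] += sim
    return sorted(votes.items(), key=lambda kv: kv[1], reverse=True)


hit = {1, 4, 7, 12, 16}  # phenotypic screening hit with unknown target
suggestions = suggest_targets(hit)
print(suggestions)
```

The output is a ranked hypothesis list, not an assignment; the top-voted target would still need biochemical confirmation.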
Another area is patent mining: companies use NLP to scan patents for mentions of targets or chemical structures. An AI might alert, “Competitor X has a patent mentioning our target Y with a new series of compounds” – valuable intel.
Models such as BERT variants (BioBERT, SciBERT), GPT-style generative models, and newer biomedical models in 2025 (possibly akin to ChatGPT but trained on biomedical knowledge) allow more semantic search. Instead of keyword matching, you can ask a complex question: “What evidence is there that IL-17 is involved in uveitis?” and the AI will bring relevant snippets from papers, possibly even summarizing them. This helps R&D teams not miss important findings across the avalanche of publications.
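The retrieval step behind such questions can be sketched with a lexical stand-in: TF-IDF vectors ranked by cosine similarity, using scikit-learn. This is a deliberate simplification, since the semantic search described above would replace the TF-IDF vectorizer with dense embeddings from a biomedical language model; the snippets are made up for illustration.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical snippets standing in for an indexed literature corpus.
snippets = [
    "IL-17 blockade reduced ocular inflammation in experimental autoimmune uveitis.",
    "Th17 cells producing IL-17 were enriched in aqueous humor of uveitis patients.",
    "JAK inhibitors improved joint symptoms in rheumatoid arthritis.",
    "Statin use was associated with lower cardiovascular mortality.",
]

vectorizer = TfidfVectorizer(stop_words="english", ngram_range=(1, 2))
doc_matrix = vectorizer.fit_transform(snippets)


def search(query, top_k=2):
    """Return the top_k snippets ranked by cosine similarity to the query's TF-IDF vector."""
    sims = cosine_similarity(vectorizer.transform([query]), doc_matrix).ravel()
    order = sims.argsort()[::-1][:top_k]
    return [(snippets[i], float(sims[i])) for i in order]


results = search("What evidence is there that IL-17 is involved in uveitis?")
for text, score in results:
    print(f"{score:.2f}  {text}")
```

A lexical index like this misses paraphrases (e.g., "interleukin 17" vs. "IL-17"), which is precisely the gap embedding-based semantic search is meant to close.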
Knowledge maintenance is another challenge: new papers come out daily, so how is the KG kept up to date? AI pipelines are set up to continuously ingest new literature and update relationships. For example, a weekly job might scan new PubMed entries: if a sentence like “We show that inhibiting ABC kinase reduces fibrosis in a mouse model” appears, the pipeline adds an edge (ABC kinase – causal role – fibrosis) with its reference (thelancet.com). Over time, a richly annotated graph emerges.
Question answering systems in pharma R&D are also being built on these KGs. Scientists can query in natural language, and the system uses the KG and source texts to give answers with citations. This is blurring into the realm of interactive AI assistants for scientists, still experimental but being piloted in some companies.
Now, one must be cautious: automatically extracted knowledge can contain errors (e.g., a negation misread by NLP could assert the opposite of what a paper said). Therefore, many companies have a human curation step for critical knowledge or at least a confidence scoring mechanism.
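To make the ingest-and-curate loop concrete, here is a deliberately naive, rule-based extractor for one hypothetical sentence pattern ("inhibiting X reduces Y"). It records negated statements as a separate relation with a lower confidence instead of silently asserting the opposite, which is the failure mode described above. The pattern, relation names, confidence values, and document IDs are all illustrative assumptions; real pipelines use trained relation-extraction models.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    subject: str
    relation: str
    obj: str
    source: str        # document identifier, kept for provenance
    confidence: float  # hypothetical score used to route edges to human curation


# Hypothetical pattern: "inhibiting <X [kinase]> [does/did not] reduce[s/d] <Y>".
PATTERN = re.compile(
    r"inhibiting (?P<subj>\w+(?: kinase)?) (?P<neg>(?:does|did) not )?reduce[sd]? (?P<obj>\w+)",
    re.IGNORECASE,
)


def extract_edges(sentence, source):
    edges = []
    for match in PATTERN.finditer(sentence):
        negated = match.group("neg") is not None
        edges.append(Edge(
            subject=match.group("subj"),
            relation="does_not_reduce" if negated else "reduces",
            obj=match.group("obj").lower(),
            source=source,
            confidence=0.5 if negated else 0.8,
        ))
    return edges


weekly_batch = [
    ("doc-1", "We show that inhibiting ABC kinase reduces fibrosis in a mouse model."),
    ("doc-2", "Surprisingly, inhibiting XYZ kinase did not reduce fibrosis in vivo."),
]
graph = [edge for doc_id, text in weekly_batch for edge in extract_edges(text, doc_id)]
for edge in graph:
    print(edge)
```

Keeping the source ID and a confidence score on every edge is what makes the human curation step practical: low-confidence or negated edges can be queued for review.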
The benefit of knowledge graphs is particularly felt in multidisciplinary areas like systems pharmacology: linking chemistry to biology to clinical outcomes. A KG might connect a drug to a protein to a pathway to a side effect, thereby enabling side effect prediction or mechanism hypotheses. A well-known resource here is SIDER, which links drugs to their recorded side effects. ML on such data has been used to predict unknown side effects by completing the graph (e.g., if two drugs causing similar side effects both bind an off-target, predict that off-target’s involvement for similar new drugs).
Also, as the review by Tekade et al. shows in Figure 4, leading pharma companies form associations with AI organizations – many of those partnerships revolve around building or leveraging knowledge graphs. E.g., Roche with Owkin and XtalPi, Pfizer with IBM Watson (which attempted a KG+QA approach for immuno-oncology), Sanofi with Exscientia and Healx, etc. The collaborative work often mentions “knowledge graphs, NLP, and combination of data for target discovery”.
One interesting anecdote: IBM’s Watson for Drug Discovery was an attempt to do large-scale literature mining and hypothesis generation. It famously had both successes and failures. It did suggest some interesting targets in Parkinson’s that were later experimentally validated, but as a product it didn’t gain huge traction likely due to integration challenges. Still, it paved the way for next-gen startups focusing on narrower but deeper knowledge extraction tasks.
While not as glamorous as designing a drug, the grunt work of reading and connecting the dots across biomedical knowledge is where AI quietly excels. It’s turning what used to be “reading stacks of journals” into a more computable query. One might note with mild irony that in the early 2000s, information overload was a lament of scientists, yet two decades later, we’ve unleashed even more data but also smarter algorithms to tame it, albeit algorithms that themselves require careful handling.
To sum up, knowledge graphs and literature mining are the digital librarians and analysts of pharma R&D, ensuring that no pertinent piece of knowledge remains buried in the noise. They provide the connective tissue between disparate findings, often revealing serendipitous insights – like repurposing opportunities or new target-disease links – that fuel innovation.
3.7 Real-World Evidence Analysis
Real-World Evidence (RWE) refers to insights derived from real-world data – such as electronic health records (EHRs), insurance claims, patient registries, wearable devices, and other sources outside of controlled trials. In pharma, RWE has become increasingly important for understanding how drugs perform in routine clinical practice, for pharmacovigilance (safety monitoring), and for supporting regulatory and reimbursement decisions. AI is instrumental in unlocking the value of these messy, heterogeneous datasets.
One major application is in safety signal detection. Traditionally, adverse event reports (such as those submitted to FDA’s FAERS database) were analyzed with disproportionality algorithms to flag unusual reporting rates of side effects. Now, AI can cast a wider net, analyzing EHRs and even doctors’ notes to identify potential side effects that might not be formally reported. For example, an NLP system might scan millions of clinical notes to find that patients on Drug X often have mentions of Symptom Y soon after, more so than similar patients not on Drug X. This can lead to earlier detection of rare or long-term side effects.
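The classical disproportionality statistics mentioned above are simple ratios over a 2x2 table of spontaneous reports. The sketch computes the proportional reporting ratio (PRR) and the reporting odds ratio (ROR) with a normal-approximation 95% confidence interval on the log scale. The counts are invented, and the flagging rule (PRR of at least 2 with at least 3 cases) is a simplified version of commonly used screening thresholds, not a regulatory standard.

```python
import math


def disproportionality(a, b, c, d, z=1.96):
    """
    a: reports with the drug and the event     b: reports with the drug, other events
    c: reports with other drugs and the event  d: reports with other drugs, other events
    """
    prr = (a / (a + b)) / (c / (c + d))
    ror = (a * d) / (b * c)
    se_log_ror = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    ci = (math.exp(math.log(ror) - z * se_log_ror), math.exp(math.log(ror) + z * se_log_ror))
    return {"PRR": prr, "ROR": ror, "ROR_95CI": ci}


# Invented counts for one drug-event pair in a spontaneous-report database.
a, b, c, d = 30, 970, 200, 98_800
stats = disproportionality(a, b, c, d)
signal = stats["PRR"] >= 2 and a >= 3  # simplified screening rule
print(stats, "signal" if signal else "no signal")
```

These statistics measure reporting patterns, not incidence, which is one reason the EHR- and notes-based approaches described above are used to corroborate such signals.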
The FDA’s Center for Drug Evaluation and Research has been actively exploring ML for pharmacovigilance, and a 2024 final guidance on RWD quality included considerations of AI tools to process EHR and claims data (mwe.com).
Comparative effectiveness is another area. AI can emulate trials by finding treated vs control cohorts in observational data and adjusting for differences (as discussed under causal inference). For instance, if a new cancer drug is launched, long before any head-to-head trial is done, one can use ML to compare outcomes of patients on the new drug vs similar patients on standard therapy, to glean early evidence of whether it’s better in some way (effectiveness, or perhaps different side effect profile). During COVID, real-world studies often informed treatment guidelines faster than trials (e.g., noticing in large hospital data that steroid use was associated with lower mortality in severe COVID, which later was confirmed in trials). AI played a role in many of those analyses, especially to correct for confounding in the absence of randomization.
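A minimal sketch of one such adjustment, inverse probability of treatment weighting (IPW), on simulated data where sicker patients are more likely to receive the drug. The data-generating process, the single confounder, and the logistic-regression propensity model are assumptions for illustration; IPW only removes bias from confounders that are measured and correctly modeled.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(42)
n = 20_000
severity = rng.normal(size=n)                                # measured confounder
treated = rng.random(n) < 1 / (1 + np.exp(-severity))        # sicker patients treated more often
outcome = 2.0 + 1.0 * treated + 1.5 * severity + rng.normal(size=n)  # true treatment effect = 1.0

# Naive comparison is biased because treated patients are sicker.
naive = outcome[treated].mean() - outcome[~treated].mean()

# Propensity model: probability of treatment given the confounder.
X = severity.reshape(-1, 1)
propensity = LogisticRegression().fit(X, treated).predict_proba(X)[:, 1]
propensity = np.clip(propensity, 0.01, 0.99)                 # guard against extreme weights

# Normalized (Hajek) IPW estimate of the average treatment effect.
w_treated = treated / propensity
w_control = (~treated) / (1 - propensity)
ipw = (w_treated @ outcome) / w_treated.sum() - (w_control @ outcome) / w_control.sum()
print(f"naive difference: {naive:.2f}, IPW estimate: {ipw:.2f}")
```

In this simulation the naive difference overstates the benefit by more than double, while the weighted estimate recovers the true effect of about 1.0.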
Patient journey analysis: AI can track patterns in patient histories to find milestones or issues. For example, in diabetes, an RWE analysis might find that patients tend to switch medications or escalate doses about 6 months before an A1c level spike – indicating perhaps earlier intervention is needed. For pharma, understanding patient adherence and outcomes is crucial: AI can help identify why patients discontinue a medication (maybe side effect mentions in notes, or pharmacy fill patterns), or predict which patients are at risk of non-adherence, enabling interventions like nurse follow-ups.
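Adherence from pharmacy fill patterns is commonly summarized as the proportion of days covered (PDC), often with 80% as the adherence threshold. The sketch below assumes fills are given as (date, days' supply) pairs and shifts supply forward when a refill arrives early, which is one of several conventions in use; the dates and supplies are made up.

```python
from datetime import date, timedelta


def proportion_of_days_covered(fills, start, end):
    """Covered days in [start, end] divided by days in the period; early refills are carried forward."""
    period_days = (end - start).days + 1
    covered = set()
    next_free = start
    for fill_date, days_supply in sorted(fills):
        begin = max(fill_date, next_free)  # leftover supply from earlier fills is used first
        for offset in range(days_supply):
            day = begin + timedelta(days=offset)
            if start <= day <= end:
                covered.add(day)
        next_free = begin + timedelta(days=days_supply)
    return len(covered) / period_days


fills = [(date(2024, 1, 1), 30), (date(2024, 2, 5), 30), (date(2024, 3, 20), 30)]
pdc = proportion_of_days_covered(fills, date(2024, 1, 1), date(2024, 3, 31))
adherent = pdc >= 0.8  # widely used threshold
print(f"PDC = {pdc:.2f}, adherent = {adherent}")
```

Here two refill gaps leave the patient just below the threshold (72 of 91 days covered), the kind of case a nurse follow-up program might target.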
External control arms in clinical trials (discussed in section 2.6) are a form of RWE usage: using historical patient data or parallel registry data to supplement a trial. AI is needed to properly match these external patients to the trial participants (to ensure apples-to-apples comparison).
Market access and value evidence
Health economics and outcomes research (HEOR) teams use RWE to demonstrate a drug’s value in the real world (like reducing hospitalizations or improving quality of life). AI models can predict the total cost offset by a drug by analyzing claims data. For example, an AI might show that patients on Drug A have 20% fewer ER visits than matched patients on Drug B, a compelling point for payers. This involves analyzing many variables, which ML is well-suited for, and dealing with missing data, etc.
Subpopulation identification in RWE: Sometimes clinical trials might not detect a benefit in the overall population, but RWE could hint at a benefit in a subgroup under real-world conditions. AI can search for such subgroups. For instance, a drug that failed to show broad benefit might actually be helping patients with a specific biomarker or comorbidity in practice – if the data exists and the AI can parse it out. This can inform label expansions or revisions.
Digital phenotyping: New types of RWE like data from smartphones or wearable sensors require AI to interpret. For instance, Parkinson’s disease progression can be monitored via a smartphone app tracking movement and speech. AI models turn those raw sensor readings into clinically meaningful endpoints (e.g., an “activity score” or detecting early tremor). These can then serve as RWE endpoints in studies or for regulatory submissions. The FDA’s 2024 guidance on digital health technologies and AI acknowledges such novel endpoints.
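As an illustration of turning raw sensor readings into a candidate endpoint, the sketch estimates how much of an accelerometer signal's power falls in a tremor band. It assumes a single-axis signal sampled at 50 Hz and uses a 4-6 Hz band, roughly where parkinsonian rest tremor is usually described; the signal is synthetic and this feature is not a validated endpoint.

```python
import numpy as np


def tremor_features(signal, fs, band=(4.0, 6.0)):
    """Return (fraction of non-DC power in `band`, peak frequency within `band`) from an FFT."""
    x = np.asarray(signal, dtype=float)
    x = x - x.mean()  # remove the DC/gravity offset
    power = np.abs(np.fft.rfft(x)) ** 2
    freqs = np.fft.rfftfreq(x.size, d=1.0 / fs)
    in_band = (freqs >= band[0]) & (freqs <= band[1])
    band_ratio = power[in_band].sum() / power[1:].sum()
    peak_freq = freqs[in_band][np.argmax(power[in_band])]
    return band_ratio, peak_freq


fs = 50.0                       # assumed sampling rate (Hz)
t = np.arange(500) / fs         # 10 seconds of data
rng = np.random.default_rng(0)
voluntary = 0.5 * np.sin(2 * np.pi * 0.8 * t)    # slow arm movement
tremor = 0.3 * np.sin(2 * np.pi * 5.0 * t)       # 5 Hz oscillation
noise = 0.05 * rng.normal(size=t.size)

ratio_tremor, peak = tremor_features(voluntary + tremor + noise, fs)
ratio_control, _ = tremor_features(voluntary + noise, fs)
print(f"with tremor: {ratio_tremor:.2f} (peak {peak:.1f} Hz); without: {ratio_control:.3f}")
```

A band-power ratio is deliberately simple and interpretable; learned models can replace it, but would face the same validation questions before serving as an endpoint.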
One interesting development is causal AI for RWE (mixing section 2.7 and RWE): using causal inference to not just observe but simulate interventions in RWD. E.g., using AI to simulate “What if this patient had started Drug X 3 months earlier?” If robust, that could answer questions even trials didn’t (like optimal timing of therapy). We aren’t fully there, but initial attempts exist (like the AI-PBPK models for dosage decisions (ascpt.onlinelibrary.wiley.com), or causal forests to estimate individual treatment effects from RWD).
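Causal forests are one option for individual treatment effects; a simpler meta-learner conveys the same idea. The sketch below swaps in a T-learner (separate outcome models for treated and untreated patients, differenced per patient) using scikit-learn random forests on simulated data in which only biomarker-positive patients benefit. Like any RWD analysis, it assumes no unmeasured confounding; the simulation, features, and effect sizes are hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(7)
n = 4000
biomarker = rng.integers(0, 2, size=n)           # hypothetical binary biomarker
age = rng.normal(60, 10, size=n)
X = np.column_stack([biomarker, age])
# Older patients are more likely to be treated (measured confounding).
treated = rng.random(n) < 1 / (1 + np.exp(-(age - 60) / 10))
true_effect = 2.0 * biomarker                    # benefit only when biomarker-positive
outcome = 0.05 * age + true_effect * treated + rng.normal(size=n)


def fit_arm(mask):
    model = RandomForestRegressor(n_estimators=100, min_samples_leaf=20, random_state=0)
    return model.fit(X[mask], outcome[mask])


model_treated, model_control = fit_arm(treated), fit_arm(~treated)
cate = model_treated.predict(X) - model_control.predict(X)  # per-patient effect estimates
effect_pos = cate[biomarker == 1].mean()
effect_neg = cate[biomarker == 0].mean()
print(f"estimated effect, biomarker+: {effect_pos:.2f}; biomarker-: {effect_neg:.2f}")
```

The same per-patient estimates are what a subgroup search would scan; a drug with a modest average effect can show a clear benefit concentrated in one stratum.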
Regulators increasingly accept RWE as part of evidence, especially for label expansions and post-market studies. In 2023, the FDA approved a new indication for a cancer drug partly based on RWE (data showing it worked similarly in the broader population as in trials). They scrutinize the methods, which is where transparent AI methods are key. Properly done, AI analysis of RWD can meet regulatory standards – for example, using a predefined protocol and statistical analysis plan with ML on claims data to confirm a drug’s outcome benefit.
Public health: RWE analysis by AI also helps in signals like identifying comorbidities or long-term disease outcomes. A high-profile case is using ML on health records to understand long COVID – by clustering symptoms and lab results, AI helped define what long COVID looks like and what factors predict it. Another example: AI analysis of UK Biobank (a large population study) found novel risk factors and gene-lifestyle interactions that inform disease prevention strategies.
We should not forget privacy and bias issues: RWE is often observational and may reflect healthcare disparities. AI can inadvertently learn those biases (e.g., if certain minority groups got less access to a drug, the AI might misinterpret outcomes). So careful bias analysis and fairness adjustments are needed – something that’s a topic of discussion in 2025 when deploying AI on RWD.
Real-world data is messy and not randomized, so drawing causal conclusions from it is like “divining truth from an imperfect mirror” – requiring statistical wizardry (which AI provides, with caution).
All in all, the use of AI in real-world evidence analysis is about bridging the gap between trial efficacy (does it work in ideal settings?) and real-world effectiveness (does it work in actual practice?). It’s adding a layer of evidence that is more representative of everyday patients, which regulators and payers care about. By intelligently analyzing RWD, AI provides insights that can refine treatment guidelines, improve drug safety, and demonstrate drug value beyond the controlled bubble of clinical trials.
In pragmatic terms, as of 2025, every major pharma company has an RWE analytics group, and those groups are heavy users of AI/ML. They mine vast insurance claims and EHR datasets via partnerships (like Flatiron Health data for oncology, or Optum’s claims data) to support their products’ lifecycle. For example, after launch, they will track with AI if any unexpected safety issues pop up, or if patient outcomes align with trial results, adjusting strategy if not.
To tie it up: AI in RWE turns the messy mosaic of routine healthcare into actionable insights, making drug development and use not just about the pristine lab and trial results, but about how interventions truly play out in the doctor’s office and patient’s life. It’s converting what was once anecdotal or fragmented evidence into quantifiable, large-scale evidence, thereby completing the evidence continuum from bench to bedside and beyond.
4. Getting Started with AI in Pharma: Datasets, Models, and Toolkits
For researchers and innovators looking to dive into AI for drug discovery, there is now a rich ecosystem of public datasets, pre-trained models, and open-source toolkits to leverage. Unlike a decade ago when one had to scrape together data and write algorithms from scratch, in 2025 much of the infrastructure is readily available – democratizing the field. This section provides a guided tour of key resources to get started.
4.1 Public Datasets for AI in Pharma
Molecule datasets
A foundational resource is MoleculeNet, a curated collection of datasets for molecular property prediction tasks (part of the DeepChem project). MoleculeNet includes classic datasets like QM9 (quantum chemistry properties for small molecules), ESOL (solubilities), FreeSolv, Lipophilicity, HIV activity, BACE, and larger ones like Tox21 (toxicology outcomes) and SIDER (drug side effects). These datasets come in standardized format and with benchmark splits, providing an easy starting point for training or evaluating models (deepchem.readthedocs.io).
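Benchmark splits matter because random splits can overstate performance on new chemotypes. A scaffold split keeps every molecule sharing a Bemis-Murcko scaffold in the same partition. The sketch below assumes scaffolds were already computed (for example with RDKit's MurckoScaffold module) and uses hypothetical scaffold strings; the greedy largest-group-first assignment is similar in spirit to the scaffold splitters in common toolkits, not a copy of any one implementation.

```python
from collections import defaultdict


def scaffold_split(scaffolds, frac_train=0.8, frac_valid=0.1):
    """Assign whole scaffold groups to train/valid/test, largest groups first."""
    groups = defaultdict(list)
    for idx, scaffold in enumerate(scaffolds):
        groups[scaffold].append(idx)
    # Largest scaffold families go to training; rare scaffolds end up in valid/test.
    ordered = sorted(groups.values(), key=lambda g: (-len(g), g[0]))
    n = len(scaffolds)
    train, valid, test = [], [], []
    for group in ordered:
        if len(train) + len(group) <= frac_train * n:
            train.extend(group)
        elif len(valid) + len(group) <= frac_valid * n:
            valid.extend(group)
        else:
            test.extend(group)
    return train, valid, test


# Hypothetical precomputed scaffold SMILES for 20 molecules.
scaffolds = (["c1ccccc1"] * 8 + ["c1ccncc1"] * 5 + ["C1CCNCC1"] * 3
             + ["c1ccc2ccccc2c1"] * 2 + ["c1ccoc1", "C1CC1"])
train, valid, test = scaffold_split(scaffolds)
print(len(train), len(valid), len(test))
```

The rare scaffolds land in the test set, which is what makes this split a tougher, more realistic check of generalization to new chemical series.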
For chemical structures, ZINC is a go-to dataset: a library of ~1 billion purchasable compounds (with subsets of various sizes). Widely used subsets include ZINC250k and the ZINC20 collections, often used for generative model training. ChemBERTa checkpoints have likewise been pre-trained on large SMILES corpora, drawn from ZINC for the original zinc-base model and from PubChem (up to 77 million SMILES) for ChemBERTa-2 (deepchem.io).
ChEMBL is another crucial dataset: a database of >2 million compounds with bioactivity data against thousands of targets (IC50s, Ki, etc.). ChEMBL is essentially the bread-and-butter for QSAR modelers; it’s publicly accessible and regularly updated.
Protein and genomics datasets
For protein modeling tasks, the AlphaFold Protein Structure Database (from DeepMind and EMBL-EBI) provides predicted structures for virtually all catalogued proteins (over 200 million), an unprecedented resource for those doing structure-based design or protein property prediction. On the genomics side, TCGA (The Cancer Genome Atlas) pairs tumor -omics data with clinical outcomes, and GTEx links genetic variation to gene expression across normal human tissues; both are useful for multi-omics integration modeling.
Drug-target interaction datasets
BindingDB, STITCH, and the Davis and KIBA datasets are commonly used for training DTI predictive models. These contain measured binding affinities of drug-like molecules to protein targets. The Drug Target Commons (DTC) is another emerging crowd-sourced database of bioactivity.
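Affinities in these datasets span several orders of magnitude, so DTI and QSAR models are usually trained on a negative log scale (pKd, pKi, pIC50). A minimal conversion, assuming values are reported in nM as in the Davis set or in ChEMBL's standardized units (ChEMBL also ships a precomputed pChEMBL value); aggregating replicate measurements by their median on the log scale is a common but not universal choice.

```python
import math
from statistics import median


def to_p_affinity(value_nM):
    """Convert Kd/Ki/IC50 in nM to -log10(molar): 1 nM -> 9.0, 1 uM -> 6.0."""
    if value_nM <= 0:
        raise ValueError("affinity must be positive")
    return 9.0 - math.log10(value_nM)


def aggregate_replicates(values_nM):
    """Median of replicate measurements, taken on the log scale."""
    return median(to_p_affinity(v) for v in values_nM)


print(to_p_affinity(1.0), to_p_affinity(100.0), to_p_affinity(10_000.0))
print(aggregate_replicates([10.0, 100.0, 1000.0]))
```

Working on the log scale turns a 1000-fold potency difference into a 3-unit difference, which keeps regression losses from being dominated by weak binders.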
ADMET datasets
There are compilations like ADMETlab’s open dataset (as referenced in their publications), as well as Tox21 and ToxCast for toxicity. ClinTox (from MoleculeNet) contrasts FDA-approved drugs with drugs that failed clinical trials for toxicity reasons, making it a neat classification set for toxicity prediction (pmc.ncbi.nlm.nih.gov).
Clinical and RWE datasets
These can be trickier due to privacy, but some de-identified datasets are available for research. For example, MIMIC-III/IV (ICU records) and eICU are EHR datasets freely available to credentialed researchers through PhysioNet and widely used for developing healthcare ML models (pmc.ncbi.nlm.nih.gov). OHDSI provides some datasets in the OMOP common data model format for observational studies. And FDA’s FAERS database of adverse events is public (one can build AI models to detect safety signals from it).
Knowledge graphs and ontologies
Datasets like OpenBioLink (academic.oup.com) or BioKG are available for benchmarking link prediction and multi-modal reasoning. Also, ontologies like Gene Ontology, Disease Ontology, and DrugBank knowledge can be pulled in to add structured information to models.
In summary, newcomers don’t need to spend months collecting data – they can tap into these public troves. The DeepChem library even provides easy loaders for many of these datasets (MoleculeNet is integrated, etc.).
4.2 Models and Pre-trained AI in Pharma
The rise of pre-trained models has accelerated AI projects. Rather than training from scratch, researchers can fine-tune existing models that have already learned relevant patterns.
For molecules, ChemBERTa is a prime example: a transformer model trained on millions of SMILES, available on the Hugging Face Hub (deepchem.io). You can download “seyonec/ChemBERTa-zinc-base-v1” and fine-tune it on your specific task (e.g., toxicity prediction). Others include MolBERT from BenevolentAI and Graphormer (a transformer that handles molecular graphs, from Microsoft).
The HuggingFace Hub now hosts a section for chemistry models (huggingface.co), including not just ChemBERTa but also models like GPT4Mol (a GPT-2 style model for molecule generation) and DiffStereCNN for stereochemistry.
For proteins, ESM-2 (from Meta AI) is a state-of-the-art transformer pre-trained on protein sequences of massive scale. It can be fine-tuned for tasks like predicting mutation effects or annotating proteins. Similarly, models like ProtBERT and TAPE are accessible. And of course, AlphaFold2 model weights are available, so one can run their own structure predictions or fine-tune (though AF2 is a complex model not trivial to fine-tune).
For knowledge and text, BioBERT and PubMedBERT are BERT models pre-trained on biomedical literature. These are great for tasks like extracting relations or classifying text (e.g., classifying sentences as “drug-target interaction” vs not). More advanced, GPT-3-like models (e.g., Meta’s Galactica trained on scientific text, or Google’s SciML models) can be used with prompting for tasks like summarizing papers or generating hypotheses, though these are mostly research prototypes as of 2025.
One should not overlook classical baselines: e.g., RDKit’s descriptors + random forests often perform decently on many chem tasks – a reminder to compare fancy models with simple ones. But if you want deep learning, these pre-trained models are big time-savers.
Reinforcement learning frameworks
Open-source, Gym-style environments for molecules (e.g., MolGym) let you treat molecule construction as a reinforcement learning environment. There’s also BenevolentAI’s GuacaMol: an open benchmarking framework for generative chemistry, which includes a set of goal-directed generation tasks and even some reference implementations of various generative models and RL methods to test.
4.3 Open-Source Toolkits and Libraries
The ecosystem of libraries specialized for chemistry and biology has grown:
| Category | Tools & Frameworks | Purpose / Use |
|---|---|---|
| Protein Folding | AlphaFold, OpenFold | Predict 3D structures of proteins |
| Molecular Embeddings | ChemBERTa, MolBERT, Mol2Vec | Molecular representations using NLP techniques |
| Cheminformatics | RDKit | Molecular fingerprints, similarity, structure manipulation |
| Drug Discovery Libraries | DeepChem | Open-source ML toolkit for chemistry |
| Chemical & Patent Data | ChEMBL, SureChEMBL | Bioactivity and patent-linked compound databases |
| Biomedical NLP | BioGPT, PubMedBERT | Understanding and generating biomedical text |
DeepChem
A comprehensive Python toolkit for machine learning in drug discovery (huggingface.co). It provides ready-to-use functions for featurizing molecules (SMILES to fingerprints, graph convolutions, etc.), loading datasets (MoleculeNet), and modeling (various neural nets, including graph neural nets, multitask networks, etc.). DeepChem’s documentation and community make it a great starting point for students and professionals alike.
RDKit
The Swiss army knife of cheminformatics (in C++ with Python bindings). RDKit can do almost everything with chemical structures: parse SMILES, generate 2D & 3D structures, compute descriptors and fingerprints, perform substructure searches, and even handle 3D shape alignment and pharmacophore features. It’s essential for data preprocessing, e.g., sanitizing molecules, enumerating stereochemistry, etc. Many AI workflows will use RDKit under the hood (e.g., to get input features or to validate generated molecules). Given its importance, RDKit is actively maintained and free (pmc.ncbi.nlm.nih.gov).
PyTorch Geometric (PyG)
A library extending PyTorch for graph-based deep learning. It’s widely used for building and training GNNs on molecular graphs. With a few lines, you can implement a Graph Convolutional Network (GCN), Graph Attention Network (GAT), or Message Passing Neural Network (MPNN) for molecular property prediction. PyG also has some built-in datasets and example scripts specific to chemistry.
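Under the hood, a GCN layer of the kind PyG provides as GCNConv mixes each atom's features with its neighbors' using a normalized adjacency matrix. The dense NumPy sketch below writes that propagation rule out for a three-atom molecule; the one-hot atom features and random weights are illustrative assumptions, and PyG itself uses sparse message passing and learned parameters rather than this code.

```python
import numpy as np


def gcn_layer(adj, features, weight):
    """One graph-convolution layer (Kipf & Welling form): ReLU(D^-1/2 (A + I) D^-1/2 H W)."""
    a_hat = adj + np.eye(adj.shape[0])            # add self-loops so atoms keep their own features
    d_inv_sqrt = np.diag(1.0 / np.sqrt(a_hat.sum(axis=1)))
    norm_adj = d_inv_sqrt @ a_hat @ d_inv_sqrt    # symmetric degree normalization
    return np.maximum(norm_adj @ features @ weight, 0.0)


# Ethanol's heavy atoms C-C-O as a path graph; features are one-hot [is_C, is_O].
adj = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
features = np.array([[1, 0], [1, 0], [0, 1]], dtype=float)
weight = np.random.default_rng(0).normal(size=(2, 4))  # untrained, random weights

node_embeddings = gcn_layer(adj, features, weight)
graph_embedding = node_embeddings.mean(axis=0)  # mean-pool readout for a molecule-level prediction
print(node_embeddings.shape, graph_embedding.round(3))
```

Because the layer only uses the adjacency structure, renumbering the atoms simply reorders the output rows, which is why GNNs do not depend on how a SMILES string happens to enumerate atoms.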
DGL (Deep Graph Library)
Another library for graph neural networks (with a focus on scalability, also has chemistry extensions). DGL’s life science package (DGL-LifeSci) provides pre-trained GNNs and functionalities specifically for molecular graphs, protein-ligand complexes, etc.
Knowledge graph tools
If you want to construct and query knowledge graphs, graph databases such as Neo4j or GraphDB (an RDF store), or other OWL/RDF frameworks, can be useful, and the OpenBioLink framework provides benchmark KG data. For machine learning on KGs, libraries like PyTorch-BigGraph or DGL-KE (for knowledge graph embeddings) are available.
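Knowledge graph embedding libraries such as DGL-KE learn a vector per entity and relation and score triples with functions like TransE, which treats a relation as a translation (head + relation ≈ tail). The sketch shows only the scoring function and a margin ranking loss in NumPy with hand-set, hypothetical embeddings; it does not reproduce any library's API or training loop.

```python
import numpy as np


def transe_score(h, r, t, norm=1):
    """TransE plausibility: the negative distance ||h + r - t||; higher is more plausible."""
    return -np.linalg.norm(h + r - t, ord=norm)


def margin_ranking_loss(pos_score, neg_score, margin=1.0):
    """Hinge loss pushing a true triple to outscore a corrupted one by at least `margin`."""
    return max(0.0, margin - pos_score + neg_score)


# Hand-set embeddings for illustration; training would learn these from the KG.
emb = {
    "DrugA": np.array([0.1, 0.2, 0.3, 0.4]),
    "inhibits": np.array([0.5, -0.1, 0.0, 0.2]),
    "GeneY": np.array([1.0, 1.0, -1.0, 0.0]),
}
emb["GeneX"] = emb["DrugA"] + emb["inhibits"]  # a perfectly "translated" true tail

# Rank candidate tails for the query (DrugA, inhibits, ?).
candidates = ["GeneX", "GeneY"]
ranking = sorted(
    candidates, key=lambda g: transe_score(emb["DrugA"], emb["inhibits"], emb[g]), reverse=True
)
pos = transe_score(emb["DrugA"], emb["inhibits"], emb["GeneX"])
neg = transe_score(emb["DrugA"], emb["inhibits"], emb["GeneY"])
print(ranking, pos, neg, margin_ranking_loss(pos, neg))
```

During training, corrupted triples (like DrugA inhibits GeneY) are sampled as negatives, and the loss is zero once the true triple wins by the margin, as it does here.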
Hugging Face Transformers
Not just for text – HF’s framework makes it easy to fine-tune models like BERT or GPT-2 on custom data. As mentioned, they host chem and bio models. They also have the datasets library which might already contain some biomedical datasets packaged conveniently.
BioPython / BioJava
Classic libraries for bioinformatics (sequence analysis, etc.) which can complement AI tasks by handling file formats (FASTA, PDB) and doing basic bio-calculations (like sequence alignment).
scikit-learn
For simpler machine learning tasks or prototyping, scikit-learn is still extremely handy (and fast in many cases). Many times a quick random forest or SVM baseline from scikit-learn on Morgan fingerprints (from RDKit) is a wise first attempt.
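A baseline of this kind takes only a few lines. To keep the sketch self-contained, a random 0/1 matrix stands in for Morgan fingerprints (which in a real project would be computed with RDKit), and activity is driven by two hypothetical substructure bits plus label noise; the point is the workflow and the metric, not the numbers.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_mols, n_bits = 1000, 128
# Stand-in for a fingerprint matrix: each bit is "on" in ~10% of molecules.
X = (rng.random((n_mols, n_bits)) < 0.1).astype(np.uint8)
# Hypothetical rule: active if either of two substructure bits is set, with 5% label noise.
y = ((X[:, 10] | X[:, 100]).astype(bool) ^ (rng.random(n_mols) < 0.05)).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)
model = RandomForestClassifier(n_estimators=300, random_state=0).fit(X_train, y_train)
auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
print(f"random forest ROC AUC: {auc:.2f}")
```

Any deep model trained on the same split should beat this number before its extra complexity is worth carrying.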
Keras/TensorFlow
If one prefers that over PyTorch, there are libraries like KerasChem or examples in the community for chem applications. But PyTorch has become more popular in research.
Reinforcement Learning libraries
e.g., OpenAI Gym-style environments for molecules (such as MolGym), or your own environment that wraps RDKit molecule modifications. Also, rlpyt or Ray RLlib are general RL libraries that can be adapted for molecule generation tasks.
Federated learning frameworks
For those dealing with sensitive data (like multi-site hospital data), frameworks like TensorFlow Federated or PySyft (OpenMined) exist to do privacy-preserving ML. This is an emerging area, relevant if one tries to train models on clinical data across multiple institutions without sharing patient-level data.
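The core aggregation step behind many of these frameworks is federated averaging (FedAvg): each site trains on its own data and only model weights are sent to a server, which averages them in proportion to site size. The sketch simulates three hypothetical hospitals in one process with a plain logistic regression; it omits the secure aggregation, differential privacy, and networking that real frameworks provide.

```python
import numpy as np

rng = np.random.default_rng(1)
TRUE_W = np.array([2.0, -1.0, 0.5])  # hypothetical ground truth (last entry is the intercept)


def make_site(n):
    """Simulate one hospital's private data: two features plus an intercept column."""
    X = np.column_stack([rng.normal(size=(n, 2)), np.ones(n)])
    y = (rng.random(n) < 1 / (1 + np.exp(-X @ TRUE_W))).astype(float)
    return X, y


def local_update(w, X, y, lr=0.1, epochs=50):
    """Full-batch gradient descent on the logistic loss, run locally at one site."""
    w = w.copy()
    for _ in range(epochs):
        preds = 1 / (1 + np.exp(-X @ w))
        w -= lr * X.T @ (preds - y) / len(y)
    return w


def fed_avg(sites, rounds=20):
    """Server loop: broadcast weights, collect local updates, average weighted by site size."""
    w = np.zeros(sites[0][0].shape[1])
    sizes = np.array([len(y) for _, y in sites], dtype=float)
    for _ in range(rounds):
        updates = np.stack([local_update(w, X, y) for X, y in sites])
        w = np.average(updates, axis=0, weights=sizes)
    return w


sites = [make_site(n) for n in (200, 300, 500)]
global_w = fed_avg(sites)
print("federated weights:", np.round(global_w, 2))
```

Patient-level rows never leave `make_site`'s output inside each "hospital"; only the three-element weight vectors are pooled.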
Visualization tools
Essential for interpreting results. Desktop tools like ChemDraw or Marvin support manual structure drawing; RDKit has simple plotting for molecules; PyMOL or UCSF Chimera handle 3D protein-ligand visualization (useful when analyzing docking or structure-based model outputs).
MOSES
For generative model evaluation, MOSES is a benchmarking platform for comparing molecule generators, which comes with metrics and a dataset.
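MOSES reports, among other metrics, the fractions of generated molecules that are valid, unique, and novel. The sketch below computes simplified versions of those three fractions. The toy validity check and the SMILES strings are placeholders: real evaluation parses and canonicalizes SMILES (e.g., with RDKit) so that different spellings of the same molecule count once, and MOSES's own definitions differ in details such as computing uniqueness over the first k samples.

```python
def toy_is_valid(smiles):
    """Placeholder validity check (non-empty, balanced parentheses). Use a real SMILES parser in practice."""
    depth = 0
    for ch in smiles:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return False
    return bool(smiles) and depth == 0


def generation_metrics(generated, training_set, is_valid=toy_is_valid):
    """Validity, uniqueness among valid strings, and novelty of unique valid strings vs. training."""
    valid = [s for s in generated if is_valid(s)]
    unique = set(valid)
    return {
        "validity": len(valid) / len(generated) if generated else 0.0,
        "uniqueness": len(unique) / len(valid) if valid else 0.0,
        "novelty": len(unique - set(training_set)) / len(unique) if unique else 0.0,
    }


generated = ["CCO", "CCO", "c1ccccc1", "CC(C", "CCN", ""]
training = ["CCO", "CCC"]
metrics = generation_metrics(generated, training)
print(metrics)
```

Each metric guards against a different failure: invalid strings, mode collapse (repeats), and memorization of the training set.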
Informatics platforms
If not coding from scratch, there are also GUI-based or workflow-based tools like KNIME with AI nodes (some use RDKit and TensorFlow inside), which allow assembling ML pipelines visually – a good option for those less comfortable coding, or for rapid prototyping.
4.4 Communities and Learning Resources
Getting started also means tapping into knowledge. The DeepChem community (via forum and Slack) is very welcoming and regularly hosts tutorial sessions. Likewise, Kaggle has had competitions on drug discovery (Merck’s 2012 one, more recently a few on mechanisms of action, etc.) and hosts notebooks where one can see example solutions.
The literature is rich with tutorials: e.g., “Hands-On Graph Neural Networks for Cheminformatics” is a blog series, and “Deep Learning for the Life Sciences” (an O’Reilly book co-authored by DeepChem’s creator) provides an excellent practical introduction. Many conferences (NeurIPS, ICML) have workshops on AI for science where tutorial materials are shared.
In summary, for a newcomer, a feasible path is:
- Set up environment: e.g., install RDKit and DeepChem (conda makes this easy).
- Pick a dataset (maybe from MoleculeNet) and a target property.
- Try out a baseline model in DeepChem (they have examples, e.g., training a GraphConvModel on Tox21).
- Explore pre-trained models: e.g., load ChemBERTa via Hugging Face and fine-tune it on your dataset.
- Evaluate and iterate: compare with a simple random forest as a sanity check.
- Use toolkits: if it’s a graph model, try PyTorch Geometric for more flexibility or to test new architecture ideas.
- Visualize: use RDKit to inspect molecules that the model gets right vs. wrong.
By leveraging this ecosystem of data and tools, even small teams or academic labs can contribute meaningfully to AI in pharma. The barrier to entry has lowered significantly. It’s akin to how in the software world, open-source libraries allow startups to build complex systems quickly; in AI-driven drug discovery, these datasets and models let researchers build on prior successes rather than reinventing the wheel. The result is faster progress – which is much needed, given the high stakes and high costs in this industry.
5. Leading Companies and Players in AI-Driven Pharma
The AI in pharma landscape is populated by a mix of nimble startups and big pharmaceutical companies (often in partnership with tech firms), all vying to translate AI capabilities into real drugs and tangible ROI. In this section, we survey some of the leading companies up to 2025, categorizing them by their primary domain of focus, and include a look at the international scene (particularly China and Japan). Rather than an exhaustive directory, we’ll highlight representative players and what sets them apart (or not).
5.1 Generative Drug Design Startups
Exscientia (UK)
Often mentioned as a trailblazer, Exscientia combines deep learning models (like their Centaur Chemist platform) with a strong human-AI iterative design loop. They have had multiple AI-designed molecules reach clinical trials, including a Phase I immuno-oncology candidate and others in immunology. Exscientia’s strategy involves partnerships with big pharma (Sanofi, Bayer, BMS) and applying their platform to both small molecules and bispecific small molecules (a unique angle).
They exemplify the “AI-first biotech” that can deliver drug candidates faster; one partnership announced that Exscientia delivered a candidate to IND in <12 months, which traditional methods rarely achieve (labiotech.eu).
Insilico Medicine (US/HK)
Insilico is known for its generative chemistry platform (Chemistry42) and multi-omics target discovery engine (PandaOmics). They made headlines by nominating an AI-designed fibrosis drug (targeting TNIK kinase) and taking it to clinical trials in ~30 months (nature.com). They also recently announced an AI-discovered target for chronic disease and have a pipeline of around 20 programs. Insilico straddles East and West, with operations in both North America and China. It raised substantial funding (over $400M cumulatively) and collaborates with both western pharmas and Chinese firms. Insilico’s CEO has claimed their AI reduces cost by “up to 60%” in preclinical stages (likely optimistic, but indicative of their ambition).
Atomwise (US)
One of the earliest AI drug startups (founded 2012) focusing on structure-based design. Their forte is a CNN model for binding (“AtomNet”). They have dozens of discovery collaborations (including with big names like Eli Lilly) and typically work on challenging protein-protein interaction targets. Atomwise raised >$170M and built a pipeline with partners. They lean on AI-driven virtual screening and optimization, essentially docking on steroids.
BenevolentAI (UK)
Initially focused on knowledge graphs and target identification (they famously identified the approved rheumatoid arthritis drug baricitinib as a COVID-19 repurposing candidate), Benevolent also built generative chemistry capabilities. They partnered with AstraZeneca on several targets (one in chronic kidney disease). However, Benevolent experienced ups and downs: after a high-profile SPAC listing in 2022, their valuation dropped and they underwent restructuring. Still, they have a sizable pipeline (including a Phase II program in atopic dermatitis) and an AI platform, the Benevolent Platform™, that integrates data from target to drug. They are a case study in how hard it is to convert AI hype into clinical success: as of 2025 no AI-designed drug of theirs is approved, although they have some mid-stage trials.
Schrödinger (US)
A bit unique: an established computational chemistry firm that IPO’d in 2020. They blend physics-based modeling with machine learning. Schrödinger’s software is widely used (for docking, molecular dynamics, etc.), and they also run their own drug pipeline (often in partnerships). They have advanced a CDC7 inhibitor and other programs into early trials. While not “AI-only”, they epitomize the synergy of AI + physics. They emphasize that their platform can design better drug candidates by enumerating and scoring massive libraries (billions of compounds) with ML-augmented physics (biospectrumasia.com).
Recursion (US)
Recursion started with phenotypic screening (using computer vision on cell images to find drug effects) and has built a massive dataset of cellular images under perturbations. They now incorporate transcriptomics and generative models (they also built LOWE, an LLM-based workflow engine, to query their internal data). Recursion’s bold move was acquiring two chemistry AI startups (Cyclica and Valence) in 2023 to beef up its generative chemistry. They also formed partnerships (Bayer, Roche/Genentech).
In 2024 Recursion went further and merged with Exscientia in an all-stock deal (labiotech.eu), bringing Exscientia’s generative chemistry under the same roof as its phenotypic platform. Recursion’s pipeline includes candidates in oncology and rare diseases (REC-994 for cerebral cavernous malformation is in Phase II; labiotech.eu). They’re a public company with significant funding and one of the few doing AI at scale in-house (over 500 people).
XtalPi (China/US)
XtalPi combines AI and quantum physics for drug design (notably for predicting crystal structures and other solid-state properties as well, which is key in formulation). They have partnerships with Pfizer, and some Chinese pharmas. XtalPi raised a whopping ~$400M in 2021, including from Tencent. Their platform is applied not just to binding affinity but also to things like solubility and polymorphism risk (hence the name Xtal = crystal). They exemplify Chinese tech integration into pharma – by 2025 they reportedly have multiple compounds in preclinical stages, and they partnered with EDDD in Singapore to leverage automation (biospectrumasia.com).
Other notable generative or discovery startups
Zebra Medical, Euretos, BioSymetrics, Cyclica (acquired by Recursion), Valence (also acquired by Recursion), Black Diamond (AI-driven allosteric modulator discovery), Deep Genomics (RNA therapies), PeptiDream (Japanese, peptide discovery with AI assistance), Standigm (Korea, target discovery and generative chemistry), and Syntekabio (Korea, with an AI platform and even an AI-designed immunotherapy in trials).
5.2 AI for Clinical Trials and Digital Health
Unlearn.AI (US) – Focused on creating “digital twin” control arms for clinical trials using AI (unlearn.ai). They have collaborations and regulatory interactions (EMA’s qualification opinion). Unlearn’s platform (called TwinRCT) was recently updated (v3.0) and aims to cut trial size/time by ~30%. They raised significant funds and are one of the leaders in applying AI to clinical trial design. Many mid-tier pharmas are testing their approach in Phase II trials to augment control data.
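The ~30% figure can be connected to a textbook result. Under a standard ANCOVA approximation (an assumption here; this is not Unlearn's published algorithm), adjusting the treatment comparison for a prognostic score, such as a digital-twin prediction of each patient's control outcome, whose correlation with the outcome is rho shrinks the residual variance, and hence the required sample size, by a factor of roughly 1 - rho^2. The sketch uses a two-sample normal approximation; the function names and the example effect size are hypothetical.

```python
import math
from statistics import NormalDist


def n_per_arm(delta, sigma, alpha=0.05, power=0.80):
    """Patients per arm to detect a mean difference `delta` with outcome SD `sigma`
    (two-sided z-test normal approximation)."""
    z = NormalDist().inv_cdf
    return 2 * ((z(1 - alpha / 2) + z(power)) * sigma / delta) ** 2


def n_per_arm_adjusted(delta, sigma, rho, alpha=0.05, power=0.80):
    """Same design after adjusting for a prognostic score correlated `rho` with the
    outcome: the residual SD becomes sigma * sqrt(1 - rho^2) (ANCOVA approximation)."""
    return n_per_arm(delta, sigma * math.sqrt(1 - rho ** 2), alpha, power)


def rho_for_reduction(fraction):
    """Correlation needed for a fractional sample-size cut, from 1 - rho^2 = 1 - fraction."""
    return math.sqrt(fraction)


if __name__ == "__main__":
    base = n_per_arm(delta=0.5, sigma=1.0)  # about 63 per arm
    adj = n_per_arm_adjusted(delta=0.5, sigma=1.0, rho=0.55)
    print(f"unadjusted: {math.ceil(base)} per arm, adjusted: {math.ceil(adj)} per arm")
    print(f"rho needed for a 30% cut: {rho_for_reduction(0.30):.2f}")
```

In words, under this approximation a prognostic score needs a correlation of about 0.55 with the outcome to deliver a ~30% reduction; weaker predictions buy proportionally less.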
Concerto HealthAI/ConcertAI (US)
Works on oncology real-world data and AI to inform trials and regulatory submissions. They partner with major pharma and have worked with the FDA and the NCI on synthetic control arms. They’re not about drug design but about evidence generation.
Tempus (US)
A company that initially focused on genomic sequencing of cancer patients but has grown into an AI-driven precision medicine company. They amass a huge clinico-genomic dataset and build AI models to stratify patients, match them to trials, etc. Tempus has deals with many pharmas for trial recruitment (predicting which patients are likely eligible for a given trial) and for identifying new targets. They have also made forays into drug discovery by creating targeted oncology spinoffs.
Verge Genomics (US)
Uses AI on human transcriptomic data (and in vitro models) to discover drugs for CNS diseases (ALS, etc.). They keep target discovery and drug development in-house. By 2025, they had an ALS drug in Phase I – notable because it’s an AI-identified target and compound for a disease with many failures. Their focus on human data (vs animal) and network-based target ID is a selling point.
Adaptive clinical trial tech
Companies like Phesi and Cytel (through acquisitions of AI startups) incorporate AI to simulate trials, predict enrollment, etc., though they are more service-oriented.
Digital therapeutics and AI
Not a drug per se, but some players like Biofourmis use AI to monitor patients and adjust therapy, sometimes as part of drug-plus-device combinations (like using AI to optimize heart failure therapy dosing). Pharma partners with such companies to extend their products’ value (e.g., adding an AI app to a drug to improve adherence or outcomes).
China’s clinical AI
Companies like Ping An Good Doctor or Yidu Cloud delve into AI for medical data in China. Xinhua (Shanghai) Hospital’s AI worked on trial patient stratification. These often partner with multinational pharmas for China-centric studies, given the large patient data volume there.
5.3 Knowledge and Data Platform Companies
IBM Watson Health (US)
IBM’s grand plans with Watson in oncology (Watson for Oncology) and drug discovery had mixed outcomes. Watson for Drug Discovery (target ID tool) did identify some targets (like p53 pathway regulators), but IBM scaled back after limited commercial uptake (pubmed.ncbi.nlm.nih.gov).
They sold parts of Watson Health in 2022. Still, IBM’s research wing is involved in partnerships (e.g., with Cleveland Clinic on modeling diseases, or with companies for materials discovery). IBM’s Project Debater technology also found its way into literature synthesis, though IBM is no longer a major player in pharma since the retrenchment.
OWKIN (France)
Focuses on federated learning on healthcare data and multimodal models. Partnered with major pharmas (e.g., a $180M deal with Sanofi in 2021 for cancer AI). Owkin developed HE2RNA (to predict gene expression from histology) and other models. They’re a leading European AI biotech, blending hospital data access with pharma R&D. Owkin also launched Bioptimus (foundation models for biology), reflecting an effort to create large models for biomedical data. Owkin shows how France and the EU are contributing via a federated approach that respects data privacy.
Healx (UK)
Uses knowledge graphs and AI to repurpose drugs for rare diseases. They notably progressed a repurposed combination therapy for Fragile X syndrome to Phase II. They use AI to predict synergistic effects of drug combinations. Healx’s platform (Healnet) ingests literature and omics for rare diseases largely neglected by others.
There are many others focusing on specific niches, e.g., Aria Pharmaceuticals (formerly twoXAR) for in silico repurposing and phenotypic screening, CytoReason (AI for systems immunology, partnered with Pfizer), Anodot Health (Israel, analyzing health data for drug patterns), and BioAge (AI on human aging data to find longevity drug targets, with some candidates in clinical trials for muscle aging).
5.4 Big Pharma’s Internal Efforts and Collaborations
It’s not just startups – major pharmaceutical companies have built significant internal AI teams and also invested in external collaborations or acquisitions:
Novartis established an AI innovation lab with Microsoft, focusing on projects like cell image analysis and NLP on documents (biospectrumasia.com). In 2024 Novartis also partnered with Isomorphic Labs, the Alphabet company spun out of DeepMind (the team behind AlphaFold), to apply AI structure prediction to drug discovery.
Pfizer was an early collaborator with XtalPi, IBM Watson, and invested in startup Atomwise. They also do a lot in-house (Pfizer’s ML group applied causal AI on their clinical data to find new indications for approved drugs, etc.).
Roche has been very active: it acquired Flatiron Health for real-world oncology data, partnered with Recursion (as mentioned, for phenotypic screening), and teamed up with NVIDIA on AI research and infrastructure for drug discovery. Roche’s Genentech unit collaborates with various AI firms (e.g., GNS Healthcare on causal modeling for cancer, and smaller ones for specific tasks).
GSK invested heavily in AI – notably hiring a Chief AI Officer in 2017 and partnering with companies like Insilico, Exscientia (GSK had a multi-target deal), and building their own knowledge graph platform (MeRLin). By 2025, GSK has several AI-informed targets in their pipeline (they publicly mentioned an AI-found lupus target entering trials).
AstraZeneca similarly partnered with BenevolentAI and invested in internal AI for chemistry and clinical data. They also open-sourced tools such as REINVENT (de novo molecular design) and AiZynthFinder (retrosynthesis), and co-developed the MegaMolBART chemistry model with NVIDIA.
Merck KGaA (Germany) offers Synthia, a retrosynthesis platform built on the Chematica software it acquired in 2017, and collaborated with Palantir on data integration. They have a research collaboration with BenevolentAI too.
Takeda (Japan) is interesting: they invested in startups (like Numerate, which they acquired, and partnership with Recursion for rare diseases). They also use AI for drug repurposing (with Healx). Takeda’s approach often integrates external innovation through its venture arm.
Sanofi (France) has big partnerships with Owkin for oncology and with Exscientia for up to 15 targets in oncology and immunology (the 2022 Exscientia deal was notable for its $100M upfront payment). Sanofi also has internal AI units working on omics and early discovery.
Bayer (Germany) partnered with Exscientia (since 2019) and Recursion (2020), and acquired an AI crop science company (not pharma, but it shows their interest). They also worked with Schrödinger on some targets.
Johnson & Johnson – through their Janssen unit – had collaborations with BenevolentAI (for data analysis in neurodegeneration), and was active in using AI for clinical trial operations (with companies like Anthem’s HealthCore). J&J also did much on the imaging side for surgery AI.
Eli Lilly – notably partnered with Atomwise and with XtalPi, and acquired a small AI biology startup (Protomer) for diabetes. They also invest via their venture arm in AI companies like Verge Genomics and others.
AstraZeneca, beyond its BenevolentAI partnership, has built graph neural network models for chemoinformatics in-house. They also partnered with Schrödinger on some targets and are heavy users of AI on clinical data (one example: using AI to analyze digital pathology in trials to identify responders).
Chinese and Japanese Firms
In China, tech giants like Tencent (via investment), Baidu (with their PaddleHelix AI drug platform), and Huawei (cloud computing for drug AI) are involved. More specifically:
Baidu has the PaddleHelix platform – they published a tool for predicting RNA secondary structure and a few compound generation models. They also in 2022 announced a collaboration to use their AI to optimize mRNA vaccine sequences for COVID.
Tencent invested in XtalPi and Atomwise, and has an AI healthcare lab that worked on protein structure predictions and medical imaging, but not a big drug pipeline of their own.
Insilico and XtalPi (already covered) are the flagbearers in China. Also, Pine Biotech and Galixir are Chinese startups focusing on retrosynthesis (Galixir was founded by Tsinghua alumni and published in Nature on AI retrosynthesis).
Chinese pharma companies like Jiangsu Hengrui have run internal AI collaborations, as has Tencent’s YouTu Lab on the tech side. Also, the startup Deep Intelligent Pharma focuses on NLP for literature and has patented an AI-driven clinical trial design system.
In Japan, apart from PeptiDream (more of a biotech, using directed evolution and AI for peptide drugs) and Preferred Networks (which explored LSTM-based neural nets for drug discovery with partners, though PFN has since pivoted away somewhat), there’s Fujitsu working on quantum-inspired AI for drug docking (including a COVID project) and Sony AI exploring molecule generation (it published a Graph MBRL method).
Exscientia even opened an office in Osaka early on, through its partnership with Sumitomo Dainippon Pharma, reflecting Japanese interest; that partnership delivered the molecule DSP-1181 to Phase I in record time.
Companies such as Sumitomo, Takeda, and Astellas have active AI evaluation units. For instance, Astellas partnered with the Israeli startup Quris to use AI and microfluidic “patient-on-chip” data for safety.
Given this landscape, it is evident that cross-pollination is the norm: almost every notable startup has at least one big pharma partnership or investment, and big pharmas often work with multiple AI partners on different programs (biospectrumasia.com). There’s been consolidation too (Recursion buying Valence and Cyclica, PerkinElmer buying Numerate, etc.), indicating the field’s maturation.
In sum: while a few years ago the space was crowded with hype-fueled startups, by 2025 we see winners emerging with validated pipelines (Exscientia, Insilico, Recursion, etc.), and pharma deeply embedding AI rather than treating it as a novelty.
There’s also a notable East-West convergence: Western companies bring algorithmic sophistication, while Asian companies offer scale and speed (with large data and often government support). It’s telling that South Korea and China each have dozens of AI drug companies and some of the largest drug candidate pipelines involving AI.
However, one might wryly note that not all is rosy: some AI drug companies have struggled with business models (Benevolent’s downsizing, Watson’s retreat). It underscores that delivering actual clinical success is the ultimate yardstick. As of 2025, no drug discovered solely by AI is yet approved (several in trials, none completed Phase III). The next few years will be crucial to see if these front-runners can convert AI-designed molecules into marketed medicines, truly fulfilling the promise that attracted so much investment.
For all the brainy algorithms, getting a pill to market still requires navigating the biological and regulatory realities – tasks for which even the smartest AI has more to learn. But given the alignment of big money, big data, and big compute, the companies highlighted are collectively pushing that frontier, perhaps making that first AI-discovered drug approval a matter of “when, not if”.
6. Investment Trends, Deals, and Collaborations (2023–2025)
The period 2023–2025 has seen dynamic activity in terms of venture capital (VC) funding, licensing deals, R&D collaborations, and government initiatives in AI-driven pharma. This reflects both the maturation of the field and some correction of initial overhype. Below, we outline key trends and notable events.
6.1 Venture Capital and Public Market Activity
In the venture space, after a fervent funding boom around 2018–2021, VC investment in AI drug discovery has become more selective by 2023. The total dollars remain high but are concentrated in proven players or specific hot areas (like generative AI for proteins). Some trends:
Mega-rounds for established startups
In 2023, Insilico Medicine raised a Series D extension ($60M) from notable pharma-backed funds, pushing its valuation into unicorn territory. XtalPi in late 2022 closed a big $250M+ round led by Asian tech giants (biospectrumasia.com). Recursion tapped the public markets with a follow-on offering after their 2021 IPO, partly to fund the acquisitions of Valence and Cyclica. Exscientia, which IPO’d on NASDAQ in 2021, saw its stock fluctuate but managed to bring in additional capital through partnerships (Sanofi’s 2022 deal included a $100M upfront payment).
SPACs and market corrections
2022–2023 brought a cautionary tale: BenevolentAI went public via SPAC at a valuation of roughly €1.5 billion, but by 2023 its market cap had shrunk dramatically (down ~90%). Other SPAC attempts, such as Valo Health’s, were shelved. Public market investors began demanding more concrete pipelines and nearer-term revenues. This led AI biotechs to emphasize their drug candidates, not just their platforms, to justify valuations.
New startups still emerging
Despite some cooling, new companies continue to be founded, often by pharma AI leaders spinning out. For instance, ex-GSK AI head started Charm Therapeutics in 2022 focusing on 3D structure-based diffusion models, which raised ~$50M. In 2023, Mosaic Therapeutics (a Cambridge UK spinout for cancer AI) raised £20M. So seed and Series A deals persist, but often with clear data or a narrow focus (e.g., one asset to develop rather than a platform promise).
Geographical trends
Asia’s share of funding has grown. South Korea’s Standigm raised significant funding (and Syntekabio listed on KOSDAQ). Chinese funds (e.g., BAI Capital, Tencent, Legend Capital) not only fund domestic players like XtalPi, Insilico, etc., but also sometimes invest in Western startups or set up joint ventures. The Middle East also dipped in – e.g., Insilico’s 2023 round included Abu Dhabi’s fund.
Analysts estimate the global AI drug discovery market was just over $1 billion in 2022 and is growing at ~30% CAGR, expected to reach perhaps ~$5–10B by the end of the decade. This is small relative to pharma R&D spending (~$200B/yr) but shows high growth and interest.
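These growth figures can be sanity-checked with compound-growth arithmetic. Assuming a $1B base in 2022 and taking "end of decade" to mean 2030 (eight years of growth; both are assumptions), the sketch projects the market at 30% CAGR and back-solves the growth rate implied by the $5B and $10B endpoints. The helper names are hypothetical.

```python
def project(value, cagr, years):
    """Compound growth: value * (1 + cagr) ** years."""
    return value * (1 + cagr) ** years


def implied_cagr(start, end, years):
    """Annual growth rate that turns `start` into `end` over `years` years."""
    return (end / start) ** (1 / years) - 1


if __name__ == "__main__":
    print(f"2030 at 30% CAGR: ${project(1.0, 0.30, 8):.1f}B")  # $8.2B
    print(f"CAGR for $5B:  {implied_cagr(1.0, 5.0, 8):.1%}")   # 22.3%
    print(f"CAGR for $10B: {implied_cagr(1.0, 10.0, 8):.1%}")  # 33.4%
```

So the quoted $5–10B range corresponds to roughly 22–33% annual growth, consistent with the ~30% figure.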
6.2 Pharma Partnerships and Licensing Deals
Nearly every major pharma has forged multiple partnerships in this space, as noted earlier. Some specifics from 2023–2025:
Sanofi–Exscientia (Jan 2022)
A $5.2B deal (including milestones) for up to 15 targets in oncology and immunology. By 2025, at least 2 molecules from this collaboration are in advanced lead optimization. Sanofi also did a $180M (including equity) partnership with Owkin to use federated learning for cancer.
Roche–Recursion (Aug 2023)
Roche’s Genentech division expanded its 2021 deal with Recursion to run phenotypic screens for additional neuroscience and cancer targets, increasing the potential value to over $300M in milestones. They also initiated a pilot using Recursion’s LLM-based workflow engine, LOWE, to query internal data.
Pfizer–XtalPi (2023)
Pfizer invested and partnered with XtalPi to apply its AI+quantum platform to an undisclosed target. A separate Pfizer collaboration with Insilico on drug design was also reported (Pfizer was an early investor in Insilico’s 2017 round).
AstraZeneca–BenevolentAI
Extended collaboration in 2022 after initial success identifying 2 novel chronic kidney disease targets (one entered AZ’s pipeline). The extension added new disease areas (systemic lupus erythematosus, heart failure), showing AZ’s confidence in AI-augmented target discovery (nature.com).
Bayer–Exscientia (2023)
Post-2019 deal, in 2023 Bayer opted to advance one of Exscientia’s AI-designed leads into formal preclinical development (milestone paid). Bayer also struck a new deal with UK’s Charm Therapeutics for AI-designed oncology drugs (up to $250M).
Merck & Co (MSD) – Absci (2022)
For biologics, Merck paid Absci to use AI to optimize a therapeutic protein’s design and expression. The deal signaled big pharma interest beyond small molecules.
Multiple deals for smaller/medium pharmas
E.g., Antiverse (UK) with Cyclerion for AI-designed antibodies, Helixon (China) with AbbVie for protein design, etc.
A trend is deals focusing not just on small molecules but applying AI to biologics (antibodies, RNA). For instance, Generate Biomedicines (US) got $50M upfront from Amgen in 2022 to use its generative protein platform on 5 targets.
Most deals are structured with modest upfront payments (a few million to low tens of millions) and large backloaded milestones (hundreds of millions), a sign that pharma is willing to pay if it works but wants to see delivered compounds first. Pharma has become a bit more cautious: many recall that early Watson or Numerate collaborations yielded little, so deals now often involve co-development, where the AI company does early work and the pharma leads later stages.
6.3 Mergers & Acquisitions (M&A)
We’ve seen some consolidation:
Recursion acquiring Cyclica and Valence (2023)
A notable sign of platform consolidation. Recursion spent $47M combined (mostly stock). This gave Recursion a presence in Canada (Cyclica’s base) and bolstered their chemistry capabilities – possibly indicating that having both phenotypic and generative chem under one roof is strategic.
Ginkgo Bioworks acquired Autodesk’s Molecule AI assets (2022) – to integrate small molecule design into their primarily synthetic biology platform.
BenevolentAI’s SPAC (2022) was effectively a reverse merger making it public, not an acquisition by another pharma though.
Rumors: talk of big pharma acquiring an AI darling has floated (e.g., “would Sanofi buy Exscientia outright?”), but so far pharmas have preferred partnerships; Exscientia instead merged with Recursion in 2024.
We might see some underperforming AI biotechs being bought for pennies on the dollar for their data or talent – e.g., if Benevolent’s fortunes don’t reverse, maybe a deep-pocket pharma could pick them up cheaply to get their knowledge graph and team.
6.4 Government and Academic Initiatives
Governments have recognized AI in pharma as strategically important:
USA
The FDA released multiple guidance documents (as noted) on AI and real-world data (mwe.com), indicating regulatory support for AI-derived evidence. The NIH launched programs like Bridge2AI (although more biomedical research-focused) and increased funding for AI-driven translational research (e.g., NCATS had an ASPIRE program encouraging AI for drug screening). In 2023, the White House OSTP held a roundtable on AI in biotech to formulate policy support.
EU
The European Commission’s Horizon Europe program allocated sizable grants for AI in health. Related efforts include the JUMP-Cell Painting consortium (led by the Broad Institute, with several European pharma companies among its members), which uses AI on cell images to predict drug properties. The EU also launched the European Health Data Space to facilitate data sharing, which indirectly fuels AI research. The UK has set up bodies like the London AI Centre for Valuing Health Data, and the NHS has partnered with companies (like Sensyne Health earlier, now closed, with others emerging) to use patient data ethically for drug discovery.
China
The Chinese government in its 14th Five-Year Plan (2021–2025) highlighted AI in drug discovery as a priority for national technological self-reliance. Thus, there’s support in the form of grants to companies (some local governments giving subsidies to XtalPi, etc.) and public-private partnerships. Also, China’s FDA (NMPA) has begun accepting AI-supported evidence e.g., for expanding indications, albeit cautiously.
Japan
The government’s SIP (Strategic Innovation Promotion) program had a focus on AI for biomedical innovation. A noteworthy consortium is the Japanese “AI Hospital” project linking hospital data with drug development, though more clinical AI than discovery.
Public-private consortia
MELLODDY (an EU project linking pharma companies for federated learning on chemical data) involved 10 pharma companies using a secure platform to train on their combined libraries without sharing the underlying data. It was completed in 2022 with some success, showing that federated models outperform single-company models for certain prediction tasks. Continuations of these collaborations are likely.
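The core idea of training on combined data without pooling it can be illustrated with federated averaging (FedAvg). This is a generic sketch, not MELLODDY's actual protocol; the toy linear model, learning rate, round count, and partner data are all hypothetical. Each partner runs a few gradient steps on its private data, and only the resulting weights, averaged in proportion to dataset size, leave the site.

```python
def local_update(weights, data, lr=0.05, epochs=5):
    """A few epochs of gradient descent on squared error for y ~ w0 + w1 * x,
    using only this partner's private (x, y) pairs."""
    w0, w1 = weights
    n = len(data)
    for _ in range(epochs):
        g0 = sum((w0 + w1 * x - y) for x, y in data) * 2 / n
        g1 = sum((w0 + w1 * x - y) * x for x, y in data) * 2 / n
        w0, w1 = w0 - lr * g0, w1 - lr * g1
    return (w0, w1)


def federated_average(updates):
    """Average partner models, weighting each by its number of training examples."""
    total = sum(n for _, n in updates)
    dim = len(updates[0][0])
    return tuple(sum(w[k] * n for w, n in updates) / total for k in range(dim))


def fedavg(partners, rounds=100):
    """partners: list of private datasets; only model weights cross the site boundary."""
    global_w = (0.0, 0.0)
    for _ in range(rounds):
        updates = [(local_update(global_w, data), len(data)) for data in partners]
        global_w = federated_average(updates)
    return global_w


if __name__ == "__main__":
    # Hypothetical partners whose private data both follow y = 1 + 2x.
    partners = [
        [(0.0, 1.0), (1.0, 3.0)],
        [(2.0, 5.0), (3.0, 7.0), (4.0, 9.0)],
    ]
    print(fedavg(partners))
```

Neither partner alone covers the full range of x, yet the averaged model recovers the shared relationship; real deployments add secure aggregation and far larger models.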
Regulatory innovation
FDA’s Project Frontier (2023) is exploring novel trial designs including external controls and AI. EMA in 2025 is expected to issue guidelines on AI in clinical trials (covering things like bias, transparency).
In a nutshell, investment is steady but more measured: the bubble of “AI will cut drug discovery to 1 year” has sobered into “AI can reduce timelines by ~30-50% in certain phases” – which is still very significant commercially. Partnerships are maturing from pilot projects to multi-year pipelines. And governments are building the rails (data infrastructure, guidance) to integrate AI into the normal course of R&D, rather than treating it as a fringe (nature.com).
One might liken the current phase to the dot-com era after 2000: after an initial boom and bust, the truly value-adding players are emerging, and incumbents (pharma) are fully embracing e-business (AI, in this case) in their core strategy.
Finally, an ironic observation: some of the biggest financial wins from AI in pharma so far haven’t been from drug approvals (none yet) but from deals and stock hype. However, those metrics are shifting – by 2025 investors ask companies, “How many clinical assets do you have? What’s your success rate so far?” not just “How fancy is your AI?”. This pressure is actually aligning interests: it pushes AI companies to prove their method by delivering actual drug candidates (not just papers and patents), and pushes pharma to provide the domain expertise and experimental validation to complement the tech.
If the investment and collaboration trends until 2025 are any indication, the next few years may well see the first fruits of these labors in terms of marketed AI-discovered therapies – which, if realized, would likely trigger another wave of investment, albeit a more justified one. As of now, the money is on the table and the partnerships are inked; the pharma-AI marriage has moved past the courtship phase into the quotidian work of co-parenting new drugs.
7. AI Architectures in Pharma: Beyond Transformers and Hype
The current AI architecture landscape in pharma R&D is rich and varied – it’s not one-size-fits-all. While transformers (the architecture behind large language models like ChatGPT) have stolen much limelight in AI at large, in drug discovery they are only one piece of a broader toolkit. Graph neural networks, variational autoencoders (VAEs), reinforcement learning algorithms, and diffusion models all play significant roles, often tackling different problems. And researchers are already looking at what comes beyond the current mainstream, especially as the limitations of generic large language models (LLMs) become apparent in this domain.
7.1 Transformers in Drug Discovery
Transformers – with their self-attention mechanism – have been game-changers in NLP, and they’ve found potent applications in biomedicine too. For example:
Sequence-based transformers
Models like ESM (proteins) or DNABERT (DNA) treat biological sequences like sentences, much as BioBERT does for biomedical text. They’ve achieved state-of-the-art results in protein property prediction and variant effect prediction (newsletter.kiin.bio, bigdatawire.com). A model such as ESM-2 (up to 15B parameters) “speaks the language of proteins” and can predict which mutations will destabilize structure, etc., without explicit structural input, a testament to the power of attention to capture biochemical context. There’s even an effort to create a “protein GPT” that could generate novel protein sequences for a desired function.
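How a protein language model scores a mutation without structure can be sketched with the masked-marginal heuristic described for ESM-style models (Meier et al., 2021): mask the position, read the model's predicted amino-acid distribution there, and compare the log-probability of the mutant with that of the wild type. The model call is deliberately left out; `probs_at` is a hypothetical stand-in for running a masked protein language model, and the toy distribution in the usage example is made up for illustration.

```python
import math


def masked_marginal_score(position_probs, wt_aa, mut_aa):
    """log p(mut) - log p(wt) at one masked position.

    position_probs: dict mapping amino acid -> probability, as predicted by a masked
    protein language model with this position masked. Negative scores mean the model
    finds the mutant residue less plausible than the wild type in this context.
    """
    return math.log(position_probs[mut_aa]) - math.log(position_probs[wt_aa])


def score_mutation(mutation, probs_at):
    """Score a mutation written like 'A42G' (wild type, 1-based position, mutant).

    probs_at(pos) is a hypothetical callback returning the model's distribution at pos.
    """
    wt, pos, mut = mutation[0], int(mutation[1:-1]), mutation[-1]
    return masked_marginal_score(probs_at(pos), wt, mut)


if __name__ == "__main__":
    # Toy distribution standing in for real model output at position 42.
    toy = {"A": 0.50, "G": 0.05, "S": 0.30, "P": 0.15}
    print(round(score_mutation("A42G", lambda pos: toy), 3))  # log(0.05 / 0.50)
```

Ranking all single mutants by this score gives a zero-shot estimate of which substitutions the model considers disruptive; it is a heuristic, not a stability calculation.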
SMILES transformers
We mentioned ChemBERTa and similar. These learn an embedding of chemical sequences. They can be fine-tuned for tasks like property prediction or reaction prediction. They excel at capturing some chemistry syntax (like closed rings, valences) implicitly, though pure SMILES has limitations (ordering and chirality issues).
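Before a transformer sees a SMILES string, the string is split into tokens. A common choice is the regular expression published with the Molecular Transformer (Schwaller et al.), which keeps bracket atoms such as `[C@@H]` and two-letter elements such as `Cl` and `Br` as single tokens. The version below is a sketch of that approach; `tokenize_smiles` is a hypothetical helper name.

```python
import re

# Regex in the style of Schwaller et al.'s Molecular Transformer tokenizer:
# bracket atoms, two-letter halogens, organic-subset atoms, bonds, branches, ring labels.
SMILES_TOKEN = re.compile(
    r"(\[[^\]]+]|Br?|Cl?|N|O|S|P|F|I|b|c|n|o|s|p|\(|\)|\.|=|#|-|\+|\\|\/|:|~|@|\?|>|\*|\$|\%[0-9]{2}|[0-9])"
)


def tokenize_smiles(smiles):
    """Split a SMILES string into tokens; fail loudly if any character is not covered."""
    tokens = SMILES_TOKEN.findall(smiles)
    if "".join(tokens) != smiles:
        raise ValueError(f"untokenizable characters in {smiles!r}")
    return tokens


if __name__ == "__main__":
    print(tokenize_smiles("CC(=O)Oc1ccccc1C(=O)O"))  # aspirin
    print(tokenize_smiles("[C@@H](Cl)(Br)N"))
```

The round-trip check matters because `re.findall` silently skips characters it cannot match, which would otherwise corrupt training data without warning.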
Reaction transformers
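In retrosynthesis (Section 3.2), sequence-to-sequence transformers have been very effective. For example, the Molecular Transformer of Schwaller et al., the model family behind IBM RXN for Chemistry, uses an encoder-decoder transformer to translate reactant SMILES into product SMILES, and retrosynthesis variants run the same translation in reverse, from “product” to “reactants”. They handle context like protecting groups in ways rule-based methods struggled with.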
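Framing retrosynthesis as translation largely comes down to how training pairs are built from reaction SMILES, which use a `reactants>agents>products` layout. The helper below is a hypothetical sketch: it drops the agents field (a simplification; some pipelines fold reagents into the source or target) and does not canonicalize SMILES, which real pipelines usually do with a cheminformatics toolkit.

```python
def reaction_to_pair(reaction_smiles, direction="retro"):
    """Turn 'reactants>agents>products' into a (source, target) pair for a seq2seq model.

    direction="forward": reactants -> products (reaction prediction)
    direction="retro":   products -> reactants (single-step retrosynthesis)
    Agents (the middle field) are discarded in this sketch.
    """
    parts = reaction_smiles.split(">")
    if len(parts) != 3:
        raise ValueError("expected 'reactants>agents>products'")
    reactants, _agents, products = parts
    if direction == "forward":
        return reactants, products
    if direction == "retro":
        return products, reactants
    raise ValueError("direction must be 'forward' or 'retro'")


if __name__ == "__main__":
    # Fischer esterification: acetic acid + ethanol -> ethyl acetate (acid catalyst as agent).
    rxn = "CC(=O)O.OCC>[H+]>CC(=O)OCC"
    print(reaction_to_pair(rxn, "retro"))  # ('CC(=O)OCC', 'CC(=O)O.OCC')
    print(reaction_to_pair(rxn, "forward"))
```

The same reaction record thus serves both tasks; only the direction of the pair changes.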
Transformers are also being used for multi-modal integration: e.g., a “MolTrans” model might take a protein sequence and a molecule string, pass each through its own transformer encoder, then a cross-attention to predict binding (pubmed.ncbi.nlm.nih.gov).
Similarly, Vision Transformers (ViT) can analyze cell images to derive features for phenotypic screening, which can then be combined with text or omics embeddings (some early works do this for MoA prediction).
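The cross-attention step in such a drug-target model can be sketched in a few lines. This is a generic scaled dot-product cross-attention, not MolTrans's published architecture: the learned query/key/value projections, multiple heads, and the prediction head are omitted, and the tiny embeddings in the usage example are hypothetical. Each molecule token builds a weighted summary of the protein residues it attends to, and the weights can be inspected as a rough map of which residues the model links to which parts of the ligand.

```python
import math


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Scaled dot-product attention of molecule-token queries over protein-token keys/values.

    queries: list of d-dim vectors (one per molecule token)
    keys, values: lists of vectors (one per protein residue); keys are d-dim
    Returns (outputs, weights): one attended vector and one weight row per query.
    """
    d = len(queries[0])
    outputs, weights = [], []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        w = softmax(scores)
        weights.append(w)
        outputs.append([sum(wj * v[c] for wj, v in zip(w, values)) for c in range(len(values[0]))])
    return outputs, weights


if __name__ == "__main__":
    mol = [[1.0, 0.0], [0.0, 1.0]]               # two toy molecule-token embeddings
    prot = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]  # three toy residue embeddings
    out, attn = cross_attention(mol, prot, prot)
    for row in attn:
        print([round(a, 3) for a in row])
```

In a trained model the attended vectors would be pooled and passed to a classifier or regressor that outputs a binding prediction.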
However, transformers alone are not a panacea. They require large training data to shine. In drug discovery, labeled data can be relatively sparse (e.g., only a few thousand known actives for a novel target). Pre-training helps, but models may still fall short without careful fine-tuning and augmentation. Also, transformers have no built-in chemistry knowledge like valency or stereochemistry; they may generate invalid SMILES if not constrained or supplemented with something like SELFIES or a validity check.
SELFIES tackles the problem from the representation side: it is a string format in which every sequence of tokens decodes to a valid molecular graph, so a transformer that emits SELFIES cannot produce syntactically broken or valence-violating molecules. This showcases the interplay between architecture and domain-specific representation.
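A validity check does not have to be elaborate to catch the most common failure modes of string generators. The crude pre-filter below is a hypothetical helper, not part of any library: it only verifies that branches and brackets are balanced and that every ring-closure label is paired. It does not check valence, aromaticity, or stereochemistry, which is what a full parser such as RDKit (or a representation like SELFIES) provides.

```python
import re

BRACKET_ATOM = re.compile(r"\[[^\[\]]*\]")
RING_LABEL = re.compile(r"%[0-9]{2}|[0-9]")


def smiles_syntax_ok(smiles):
    """Cheap structural sanity check for a generated SMILES string (no chemistry)."""
    # Bracket atoms may contain digits (isotopes, H counts, charges) that are not ring
    # labels, so collapse each one to a placeholder atom first.
    s = BRACKET_ATOM.sub("A", smiles)
    if "[" in s or "]" in s:
        return False  # an unmatched square bracket survived
    depth = 0
    for ch in s:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return False  # branch closed before it was opened
    if depth != 0:
        return False  # unclosed branch
    open_rings = set()
    for label in RING_LABEL.findall(s):
        open_rings ^= {label}  # first use opens a ring bond, second use closes it
    return not open_rings


if __name__ == "__main__":
    for smi in ["c1ccccc1", "c1ccccc", "CC(C", "[13CH3]C(=O)O"]:
        print(smi, smiles_syntax_ok(smi))
```

Such a filter is cheap enough to run on every sampled string before handing survivors to a full parser.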
7.2 Graph Neural Networks (GNNs)
If one had to pick the workhorse architecture of AI chemistry, it might well be the message-passing neural network (MPNN) and its variants. Molecules are naturally graphs, and GNNs respect that structure:
MPNNs
The Gilmer et al. (2017) neural message passing framework set a template – where atoms iteratively aggregate information from their neighbors to compute a molecular representation. Many models (GCN, GAT, SchNet, DimeNet, etc.) follow this. For property prediction on molecules, these often outperform or match transformers, especially for smaller datasets, because they encode known structure (adjacency) and are invariant to atom order.
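The message-passing template is short enough to write out. In the sketch below, atoms carry small feature vectors, each step sums neighbor states (the message) and combines them with the atom's own state (the update), and a sum over atoms gives the molecule-level readout. The fixed 0.5 weights and `tanh` stand in for the learned neural networks a real MPNN would use; they, and the `mpnn_readout` name, are illustrative assumptions. Because both aggregation and readout are sums, relabeling the atoms does not change the result, which is the order invariance mentioned above.

```python
import math


def mpnn_readout(node_feats, bonds, steps=2):
    """Toy message passing: sum-aggregate neighbor states, update, then sum-readout.

    node_feats: list of per-atom feature vectors (all the same length)
    bonds: list of (i, j) atom-index pairs, treated as undirected edges
    """
    n, dim = len(node_feats), len(node_feats[0])
    neighbors = [[] for _ in range(n)]
    for i, j in bonds:
        neighbors[i].append(j)
        neighbors[j].append(i)

    h = [list(f) for f in node_feats]
    for _ in range(steps):
        new_h = []
        for i in range(n):
            message = [sum(h[j][k] for j in neighbors[i]) for k in range(dim)]
            # Update: a fixed stand-in for a learned function of (own state, message).
            new_h.append([math.tanh(0.5 * h[i][k] + 0.5 * message[k]) for k in range(dim)])
        h = new_h
    # Readout: order-invariant sum over atoms.
    return [sum(h[i][k] for i in range(n)) for k in range(dim)]


if __name__ == "__main__":
    C, O = [1.0, 0.0], [0.0, 1.0]  # one-hot atom types
    ethanol = mpnn_readout([C, C, O], [(0, 1), (1, 2)])
    relabeled = mpnn_readout([O, C, C], [(0, 1), (1, 2)])  # same molecule, atoms reordered
    print(ethanol, relabeled)
```

Ethanol (C-C-O) and its relabeled copy give the same readout, while dimethyl ether (C-O-C), with the same atoms but different connectivity, does not.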
ADMET prediction tools like ADMETlab 2.0 switched to a graph attention network (GAT) approach and saw a performance leap (pmc.ncbi.nlm.nih.gov). GNNs also power many cloud prediction engines (e.g., Novartis’s in-house “GraphConv” models via DeepChem).
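What graph attention adds is a learned, per-neighbor weighting in place of a plain sum. The sketch follows the GAT scoring rule of Veličković et al. (2018), e_ij = LeakyReLU(a^T [W h_i || W h_j]) followed by a softmax over each atom's neighborhood (including a self-loop), with two simplifying assumptions: W is the identity, and the attention vector `a` is fixed rather than learned. Function names are hypothetical.

```python
import math


def leaky_relu(x, slope=0.2):
    return x if x > 0 else slope * x


def gat_attention(h, bonds, a):
    """Attention coefficients alpha[i][j] over each atom's neighborhood (including itself).

    h: per-atom feature vectors (W taken as the identity in this sketch)
    a: attention vector of length 2 * len(h[0]) (fixed here; learned in practice)
    """
    n, d = len(h), len(h[0])
    hood = [{i} for i in range(n)]  # GAT includes a self-loop
    for i, j in bonds:
        hood[i].add(j)
        hood[j].add(i)
    alpha = []
    for i in range(n):
        nbrs = sorted(hood[i])
        scores = [leaky_relu(sum(a[k] * h[i][k] for k in range(d))
                             + sum(a[d + k] * h[j][k] for k in range(d))) for j in nbrs]
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        z = sum(exps)
        alpha.append({j: e / z for j, e in zip(nbrs, exps)})
    return alpha


def gat_update(h, alpha):
    """New atom states as attention-weighted sums of neighborhood features."""
    d = len(h[0])
    return [[sum(w * h[j][k] for j, w in alpha[i].items()) for k in range(d)] for i in range(len(h))]


if __name__ == "__main__":
    h = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]  # C, C, O (one-hot)
    alpha = gat_attention(h, [(0, 1), (1, 2)], a=[0.0, 0.0, 0.5, 2.0])
    print({j: round(w, 3) for j, w in alpha[1].items()})  # how the middle carbon weighs C0, itself, O2
```

With this choice of `a`, the middle carbon puts more weight on the oxygen than on either carbon; in training, `a` and `W` are learned so that these weights reflect whatever the task rewards.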
Graph transformers
There is a trend to combine transformers and GNNs, e.g., a transformer that operates on graph nodes with attention, effectively learning which atoms influence which even if they’re not directly bonded (long-range interactions). Microsoft’s Graphormer is one such model; it won the quantum-chemistry track (PCQM4M) of the OGB Large-Scale Challenge at KDD Cup 2021.
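One way graph transformers keep molecular structure while letting every atom attend to every other is to add a structural bias to the attention scores. Graphormer, for instance, adds a learnable scalar indexed by the shortest-path distance between two atoms (its spatial encoding; it also uses degree and edge encodings, omitted here). The sketch computes all-pairs shortest-path distances by breadth-first search and applies a hypothetical, fixed bias table; `far_bias` is likewise an assumption.

```python
import math
from collections import deque


def shortest_path_distances(n_atoms, bonds):
    """All-pairs hop distances by breadth-first search from each atom (-1 if disconnected)."""
    nbrs = [[] for _ in range(n_atoms)]
    for i, j in bonds:
        nbrs[i].append(j)
        nbrs[j].append(i)
    dist = [[-1] * n_atoms for _ in range(n_atoms)]
    for src in range(n_atoms):
        dist[src][src] = 0
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in nbrs[u]:
                if dist[src][v] == -1:
                    dist[src][v] = dist[src][u] + 1
                    queue.append(v)
    return dist


def biased_attention(scores, dist, bias_table, far_bias=-5.0):
    """Add a distance-indexed bias to raw attention scores, then softmax each row.

    bias_table[d] plays the role of Graphormer's learned scalar for distance d;
    pairs beyond the table (or disconnected) get far_bias.
    """
    weights = []
    for row_scores, row_dist in zip(scores, dist):
        biased = [s + (bias_table[d] if 0 <= d < len(bias_table) else far_bias)
                  for s, d in zip(row_scores, row_dist)]
        m = max(biased)
        exps = [math.exp(b - m) for b in biased]
        z = sum(exps)
        weights.append([e / z for e in exps])
    return weights


if __name__ == "__main__":
    # 1-propanol heavy atoms C0-C1-C2-O3: a chain, so the ends are three bonds apart.
    dist = shortest_path_distances(4, [(0, 1), (1, 2), (2, 3)])
    print(dist)
    flat = [[0.0] * 4 for _ in range(4)]  # identical content scores, so only structure matters
    print([round(w, 3) for w in biased_attention(flat, dist, bias_table=[1.0, 0.5, 0.0])[0]])
```

Because the bias is learned per distance, the model can decide how much weight distant atoms receive instead of having that fixed by the bond graph.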
3D Graph networks
For structure-based tasks (like binding affinity given 3D poses, or protein structure modeling), geometric graph networks incorporate distances and angles and respect 3D symmetry: rotation-invariant models such as SchNet use only interatomic distances, while equivariant models such as EGNN and PaiNN also carry direction-dependent quantities, so scalar outputs (e.g., energies) are unchanged and vector outputs (e.g., forces) rotate along with the input. These architectures have enabled end-to-end models that take in raw 3D atomic clouds and predict properties. For instance, an equivariant GNN can predict a small molecule’s conformational stability or a protein-ligand binding energy with high fidelity, something older ML struggled with without heavy feature engineering.
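The equivariance property can be made concrete with a single layer in the style of EGNN (Satorras et al., 2021). Messages depend only on invariant quantities (atom features and squared distances), and coordinates move along relative position vectors, so rotating or translating the input rotates or translates the output coordinates and leaves the features unchanged. The fixed `tanh` expressions, the 0.1 step size, and the 1/(number of neighbors) normalizer stand in for learned MLPs and are assumptions of this sketch.

```python
import math


def egnn_layer(h, x, bonds, step=0.1):
    """One EGNN-style layer on scalar atom features h and 3D coordinates x.

    m_ij = phi_e(h_i, h_j, |x_i - x_j|^2)             (invariant message)
    x_i' = x_i + C * sum_j (x_i - x_j) * phi_x(m_ij)  (equivariant coordinate update)
    h_i' = phi_h(h_i, sum_j m_ij)                      (invariant feature update)
    Here phi_e, phi_x, phi_h are fixed toy functions, not learned networks.
    """
    n = len(h)
    nbrs = [[] for _ in range(n)]
    for i, j in bonds:
        nbrs[i].append(j)
        nbrs[j].append(i)
    new_h, new_x = [], []
    for i in range(n):
        agg, shift = 0.0, [0.0, 0.0, 0.0]
        for j in nbrs[i]:
            rel = [x[i][k] - x[j][k] for k in range(3)]
            d2 = sum(r * r for r in rel)
            m_ij = math.tanh(h[i] + h[j] - d2)  # phi_e
            agg += m_ij
            for k in range(3):
                shift[k] += rel[k] * step * m_ij  # phi_x(m) = step * m
        c = 1.0 / max(len(nbrs[i]), 1)
        new_x.append([x[i][k] + c * shift[k] for k in range(3)])
        new_h.append(math.tanh(h[i] + agg))  # phi_h
    return new_h, new_x


def rotate_z(p, theta):
    """Rotate a 3D point about the z-axis by angle theta (radians)."""
    c, s = math.cos(theta), math.sin(theta)
    return [c * p[0] - s * p[1], s * p[0] + c * p[1], p[2]]


if __name__ == "__main__":
    h = [0.2, 0.2, 0.8]                                      # toy scalar features for C, C, O
    x = [[0.0, 0.0, 0.0], [1.5, 0.0, 0.0], [2.0, 1.3, 0.2]]  # toy 3D coordinates
    bonds = [(0, 1), (1, 2)]
    t = [1.0, -2.0, 0.5]                                     # arbitrary translation
    moved = [[a + b for a, b in zip(rotate_z(p, 0.7), t)] for p in x]
    h1, x1 = egnn_layer(h, x, bonds)
    h2, x2 = egnn_layer(h, moved, bonds)
    expected = [[a + b for a, b in zip(rotate_z(p, 0.7), t)] for p in x1]
    print("feature change:", max(abs(a - b) for a, b in zip(h1, h2)))
    print("coordinate error:", max(abs(a - b) for p, q in zip(x2, expected) for a, b in zip(p, q)))
```

Both printed differences are at floating-point noise level: transforming the input and then applying the layer matches applying the layer and then transforming the output.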
One frontier challenge where GNNs meet transformers is in generating graphs (i.e., molecules) with specific conditions. Reinforcement learning aside, there are graph-based VAEs and now graph diffusion models (which we’ll get to) that generate molecules in 2D or even 3D form. They often rely on GNNs as the underlying architecture to represent intermediate states.
7.3 VAEs, GANs, and Diffusion Models – The Generative Trio
VAEs (Variational Autoencoders)
Popular around 2017-2019, e.g., Gómez-Bombarelli’s chemical VAE turned molecules to continuous vectors and back (pmc.ncbi.nlm.nih.gov). VAEs allowed optimization in a latent space (via gradient methods). They worked okay for small molecules, but often reconstructions failed for complex structures, and the latent space sometimes wasn’t smooth enough.
Nonetheless, innovations like the Junction Tree VAE (which generates molecules fragment by fragment) improved validity and novelty. Some platforms still incorporate VAEs (for instance, BenevolentAI’s early work or AstraZeneca’s VAE for de novo design published in 2020).
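Two ingredients make VAE-based optimization work, and both can be sketched briefly. The first is the reparameterization trick and KL penalty used during training. The second is gradient ascent on a property predictor in the learned latent space. This is a sketch under stated assumptions, not the method of any particular paper: the encoder and decoder are omitted, and `surrogate` is a hypothetical quadratic stand-in for a trained property predictor. In practice each optimized `z` would be decoded back into a molecule and checked for validity.

```python
import numpy as np

def reparameterize(mu, logvar, rng):
    """z = mu + sigma * eps, so sampling stays differentiable w.r.t. mu and logvar."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * logvar) * eps

def kl_to_standard_normal(mu, logvar):
    """KL(N(mu, sigma^2) || N(0, I)) summed over latent dims (the VAE regularizer)."""
    return 0.5 * np.sum(mu ** 2 + np.exp(logvar) - 1.0 - logvar, axis=-1)

# Hypothetical surrogate property predictor, maximized at z_star
z_star = np.array([1.0, -2.0, 0.5])

def surrogate(z):
    return -np.sum((z - z_star) ** 2)

def surrogate_grad(z):
    return -2.0 * (z - z_star)

def optimize_latent(z0, lr=0.1, steps=100):
    """Gradient ascent in latent space; each z would then be decoded to a molecule."""
    z = z0.copy()
    for _ in range(steps):
        z = z + lr * surrogate_grad(z)
    return z

rng = np.random.default_rng(0)
mu, logvar = np.zeros(3), np.zeros(3)
z0 = reparameterize(mu, logvar, rng)       # a latent point sampled from the posterior
z_opt = optimize_latent(z0)
print(kl_to_standard_normal(mu, logvar))   # 0.0: this posterior equals the prior
print(surrogate(z0), surrogate(z_opt))     # predicted property improves
```

The “not smooth enough” problem shows up exactly here: if nearby latent points decode to unrelated or invalid molecules, following the surrogate’s gradient does not translate into better chemistry.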
GANs
Generative Adversarial Networks were also tried (e.g., ORGAN, MolGAN). They can generate discrete objects directly (MolGAN outputs a graph adjacency tensor and a node-feature matrix). However, GAN training is notoriously finicky, and in chemistry GANs often mode-collapse onto a limited set of motifs. The literature saw some success with conditional GANs for property optimization (a 2018 paper reported a GAN-based approach that outperformed VAE and RL baselines on some metrics). But by 2025, GANs have somewhat fallen out of favor, supplanted by diffusion models.
Diffusion Models
The new rising star. Popularized by image generators such as DALL-E 2 and Stable Diffusion, diffusion models learn to reverse a gradual noising process. Training corrupts data with noise step by step and teaches a network to undo it, so new samples can be drawn by iteratively denoising random noise. For molecules, researchers have adapted them for both 2D and 3D generation. Diffusion models are powerful because they don’t require an a priori latent space and have shown excellent mode coverage. For instance, DiffDock achieved substantially better accuracy on docking benchmarks than classical docking methods (news.mit.edu).
The generative chemistry community is rallying around diffusion; a 2023 review called them “the latest generative modeling wave” in drug discovery (sciencedirect.com). They are not without challenges. Sampling can be slow because denoising proceeds step by step, but advances such as denoising in a latent space and more efficient samplers are speeding things up. Many expect diffusion models to deliver on generative-design promises that earlier models struggled with, producing not just one or two interesting candidates but a breadth of options that truly explore chemical space. Notable examples:
Fragment-based diffusion
Levy et al. 2022 used fragment-based diffusion to generate molecules; they reported it captured chemical diversity well (pmc.ncbi.nlm.nih.gov).
3D diffusion
Models such as DiffDock for pose generation (news.mit.edu) and EDM (Equivariant Diffusion Model) for small molecules in 3D.
Another, 3D-EDiffMG, generates drug-like molecules in 3D and was highlighted for creating stable, diverse leads (sciencedirect.com).
There’s also RFdiffusion for protein design (from David Baker’s lab) which uses diffusion to generate protein backbones fulfilling a binding constraint (bakerlab.org).
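The noising-and-denoising mechanics described for diffusion models can be illustrated with a minimal NumPy sketch of a DDPM-style sampler (the formulation of Ho et al., 2020). Here it is applied to the 3D coordinates of a hypothetical four-atom fragment. The `oracle_eps` denoiser is an assumption made for illustration. It cheats by knowing the target, whereas a real model (e.g., an equivariant GNN) learns to predict the noise from many training molecules. With this oracle, the reverse process lands exactly on the target.

```python
import numpy as np

T = 1000
betas = np.linspace(1e-4, 0.02, T)      # linear noise schedule, as in Ho et al. (2020)
alphas = 1.0 - betas
alpha_bar = np.cumprod(alphas)          # fraction of signal variance left after t steps

def q_sample(x0, t, eps):
    """Forward (noising) process in closed form."""
    return np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps

def p_sample_loop(eps_model, shape, rng):
    """Reverse process: start from pure noise and denoise one step at a time."""
    x = rng.standard_normal(shape)
    for t in reversed(range(T)):
        eps_hat = eps_model(x, t)       # the model's estimate of the noise in x
        mean = (x - betas[t] / np.sqrt(1.0 - alpha_bar[t]) * eps_hat) / np.sqrt(alphas[t])
        noise = rng.standard_normal(shape) if t > 0 else 0.0
        x = mean + np.sqrt(betas[t]) * noise
    return x

# Hypothetical target: 3D coordinates of a 4-atom fragment
x0 = np.array([[0.0, 0.0, 0.0], [1.5, 0.0, 0.0], [2.2, 1.2, 0.0], [3.7, 1.2, 0.0]])

def oracle_eps(x, t):
    """Stand-in for a trained denoiser: returns the exact noise relative to x0."""
    return (x - np.sqrt(alpha_bar[t]) * x0) / np.sqrt(1.0 - alpha_bar[t])

rng = np.random.default_rng(0)
sample = p_sample_loop(oracle_eps, x0.shape, rng)
print(alpha_bar[-1] < 1e-3, np.allclose(sample, x0))   # True True
```

The thousand sequential network calls in `p_sample_loop` are the slowness mentioned above. Faster samplers take far fewer, larger steps through the same process.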
7.4 Beyond LLMs: Specialized and Hybrid Models
Large language models like GPT-4 are amazing generalists (ChatGPT can produce a decent summary of a protein’s function from text, for example). But domain-specific requirements in pharma mean that simply throwing a general LLM at the problem may not work optimally:
Knowledge integration
Pharma requires integrating structured scientific knowledge (pathways, ontologies) with data-driven learning. Future AI architectures may tightly combine neural networks with symbolic reasoning or knowledge bases. We see early signs: models that can use a knowledge graph as an external memory to answer drug questions, or architectures that enforce biological constraints (like preserving mass balance in reactions or mechanistic consistency in causal models).
Causal and mechanistic modeling
As discussed, understanding cause and effect is crucial. A trending idea is causal deep learning, where the architecture encodes causal assumptions (for instance, a generative model that simulates disease progression and can be intervened on). Another is hybrid models that combine ODEs (ordinary differential equations) for kinetics with neural nets, an approach related to physics-informed neural networks. In pharmacology, one could imagine a neural PK/PD model that learns from data but respects mass conservation and other physical laws.
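Here is a minimal sketch of that hybrid idea, assuming a simple gut-to-central-to-eliminated structure. Absorption is a mechanistic first-order term. The elimination rate comes from a tiny neural network with random, untrained weights. The dose, volume, and rate constants are hypothetical. Because every flux is subtracted from one compartment and added to another, total drug amount is conserved by construction, whatever the network outputs. Fitting such a model to concentration data would also need a differentiable solver and an optimizer, both omitted here.

```python
import numpy as np

rng = np.random.default_rng(0)
# Tiny MLP with random (untrained) weights standing in for a learned elimination term
W1, b1 = 0.5 * rng.normal(size=(1, 8)), np.zeros(8)
W2, b2 = 0.5 * rng.normal(size=(8, 1)), np.zeros(1)

def softplus(z):
    """Numerically stable log(1 + exp(z)); output is always non-negative."""
    return np.log1p(np.exp(-np.abs(z))) + np.maximum(z, 0.0)

def learned_elim_rate(conc):
    """Neural elimination rate constant (1/h) as a function of plasma concentration."""
    hidden = np.tanh(np.array([[conc]]) @ W1 + b1)
    return float(softplus(hidden @ W2 + b2)[0, 0])

def simulate(dose_mg, ka=1.0, volume_l=10.0, dt=0.01, hours=24.0):
    """Gut -> central -> eliminated, integrated with explicit Euler steps."""
    gut, central, eliminated = dose_mg, 0.0, 0.0
    trajectory = []
    for _ in range(int(round(hours / dt))):
        absorbed = ka * gut * dt                                        # mechanistic part
        cleared = learned_elim_rate(central / volume_l) * central * dt  # learned part
        # Each flux leaves one compartment and enters another, so mass is conserved
        gut, central, eliminated = (gut - absorbed,
                                    central + absorbed - cleared,
                                    eliminated + cleared)
        trajectory.append((gut, central, eliminated))
    return np.array(trajectory)

traj = simulate(dose_mg=100.0)
print(traj[-1], traj[-1].sum())   # compartment amounts; their sum stays at the dose
```

The design choice is where the network sits. Letting it predict a non-negative rate *inside* a conservative flux structure constrains it physically. Letting it predict concentrations directly would not.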
Multi-modal foundation models
There’s interest in foundation models that simultaneously handle sequences, structures, and perhaps textual knowledge. For instance, a “Molecule & Text” model could be prompted with “Design a drug for this protein target with these properties” and output a molecule, combining an understanding of language (for target and property descriptions) with chemistry generation capabilities. Early glimmers are visible in the proliferation of “task-specific” prompts on ChatGPT (like people having it suggest syntheses, albeit with error-prone results).
Smaller, fine-tuned models
Another likely future is that, instead of one giant model doing everything, we’ll have swarms of specialized models (one for predicting binding, another for predicting solubility, etc.), all orchestrated by an intelligent agent that breaks tasks apart. This might work better given the variety of data types and scales in pharma (some tasks have millions of data points, others only hundreds).
Federated and privacy-preserving architectures
With RWE and patient data, architectures that can learn across silos without centralizing data will be key (biospectrumasia.com). This might mean more focus on secure multi-party computation integrated with model training, or on distributed training schemes (the MELLODDY project, for example, trained a shared multi-company model through federated learning, exchanging model updates rather than data).
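The core of federated training can be sketched with federated averaging (FedAvg) on a synthetic linear model. Each silo runs a few local gradient steps on its private data, and only the resulting weights are averaged centrally. This is a simplified illustration, not MELLODDY’s actual protocol, which reportedly involved considerably more machinery (for example, secure aggregation and company-specific model components). The three silos and their data are made up.

```python
import numpy as np

rng = np.random.default_rng(42)
true_w = np.array([2.0, -1.0, 0.5])

def make_silo(n):
    """Synthetic private dataset for one company (never pooled)."""
    X = rng.normal(size=(n, 3))
    y = X @ true_w + 0.1 * rng.normal(size=n)
    return X, y

silos = [make_silo(n) for n in (200, 500, 300)]

def local_update(w, X, y, lr=0.05, epochs=5):
    """A few full-batch gradient steps on the silo's mean-squared error."""
    w = w.copy()
    for _ in range(epochs):
        w -= lr * 2.0 * X.T @ (X @ w - y) / len(y)
    return w

def fed_avg(silos, rounds=50):
    w = np.zeros(3)
    sizes = np.array([len(y) for _, y in silos], dtype=float)
    for _ in range(rounds):
        local_ws = [local_update(w, X, y) for X, y in silos]   # runs inside each silo
        w = np.average(local_ws, axis=0, weights=sizes)        # server sees weights only
    return w

w_global = fed_avg(silos)
print(np.round(w_global, 2))   # close to true_w, learned without pooling any data
```

Weighting by silo size makes the average match what centralized training would optimize when silos share a data distribution. Heterogeneous silos, the realistic case across companies, are where the harder research problems lie.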
AutoML and AI-designed architectures
AI may even be used to design the best model architecture for a given pharma problem: neural architecture search (NAS) might yield, say, an optimal graph network block for toxicity prediction that human designers had not considered.
A recurring question is whether generalist LLMs like GPT-4 will make highly specialized models obsolete. Current evidence suggests domain-specific models still strongly outperform general models on scientific tasks. For example, a science-focused model like Galactica (despite its issues) or BioGPT handles biomedical jargon and logic better than base GPT-3, and for tasks like predicting a molecule’s property you certainly need a model trained on molecular data.
So, likely the future is domain adaptation of LLMs: e.g., fine-tuning or prompting large models with domain knowledge. An example trend: using GPT-style models to parse or write chemical patent text (something IBM did) – a large model can decode the nuance of patent claims better than a small one. But for predicting an assay outcome, a specialized GNN will be used.
In any case, the architecture landscape is pluralistic: Transformers for sequences and text; GNNs for structured biological networks and molecules; VAEs and diffusion for generation; RL for sequential decision tasks (like multi-step syntheses or experimental design). Each has its niche and often they’re combined in pipelines (e.g., generate molecules with diffusion, then evaluate with GNN predictors, then iterate with RL – an ensemble approach).
While ChatGPT dazzles at conversation, it doesn’t know chemistry rules or directly design drugs (ask it to design a drug and it may propose something nonsensical or simply reproduce a known compound). Progress beyond it relies on foundation models built specifically for science, which are under active development; as noted earlier, 2024 saw many foundation models for biology appear (newsletter.kiin.bio).
These include very large models for protein structures (Meta’s ESMFold), for cell imaging, and for multi-omics data (like Google’s MultiModal Genomics model). They may be far smaller than frontier LLMs like GPT-4, but they are trained on the “language of biology”.
There’s also a hint of integrating physics and domain knowledge beyond pure data-driven learning. AlphaFold’s success, for example, came from blending domain insight (evolutionary information from multiple sequence alignments, plus geometric constraints built into its architecture) with deep learning. Similarly, future architectures might integrate differentiable simulation (like docking or quantum calculations) as layers within neural nets, to enforce or learn from physical correctness. That is arguably “beyond current LLMs” in that it’s a hybrid intelligent system, not just statistical pattern completion.
To ground this in a final example: a 2025 Nature Medicine perspective titled “The AI drug revolution needs a revolution” argued that current AI efforts focus too much on language-like data and not enough on human physiological context (nature.com). The authors call for models that incorporate human variability earlier (like in silico patient avatars). Achieving that might require new architectures combining mechanistic disease modeling (differential equations, networks) with machine learning.
It’s quite possible that the next breakthroughs will come from causal modeling architectures, or hierarchical models that operate at multiple scales (molecular, cellular, patient-level concurrently).
In sum, the AI toolbox in pharma is diverse, and rightly so given the multifaceted nature of drug R&D. While transformers grabbed attention, GNNs and diffusion models are arguably the “secret sauce” in many successful pharma AI stories. The field is moving beyond any single architecture dominating; instead, the key is integrating the right model for the task and pushing into areas like causal reasoning and multi-scale modeling where new architectural innovation is needed.
The pharmaceutical industry, ever pragmatic, has learned that no single AI architecture will magically solve all its problems. Instead, success comes from a symphony of models – each addressing a piece of the puzzle, and orchestrated by scientific insight. ChatGPT might help write emails or summarize papers, but an MPNN might predict IC50s, a transformer might interpret genomic variants, and a diffusion model might imagine new molecular shapes – each “beyond” the other’s domain. The real excitement lies in how these components are starting to fit together, moving the field from isolated predictions to an AI-augmented drug discovery pipeline end-to-end.
8. Next Frontiers: Foundation Models, Digital Twins, and Causal Discovery
Looking forward, several frontiers loom on the horizon of AI in pharma – areas poised to redefine what’s possible in drug R&D over the coming decade. These include the development of foundation models for biology, the creation of sophisticated digital twins and in silico trials for patients, and the use of AI for causal discovery in complex biomedical data. Here we explore these nascent but rapidly evolving domains.
8.1 Foundation Models for Biology and Chemistry
Inspired by the success of foundation models in NLP (like GPT-3/4), researchers are building analogous large-scale models trained on vast amounts of biological data to serve as general-purpose tools. A foundation model is characterized by being trained on broad data (often self-supervised) and adaptable to a wide range of downstream tasks.
In biology, examples emerging by 2025 include:
Protein language models
Meta AI’s ESM-2 (Evolutionary Scale Modeling), whose largest version has 15 billion parameters, was trained on hundreds of millions of protein sequences. It can be probed to predict structure (via ESMFold) or function (it implicitly learned something like an ontology of protein motifs). Similarly, models like ProtGPT2 and OmegaFold are foundational in that a single training run on sequence databases unlocks multiple utilities (mutation effect prediction, novel protein generation, etc.).
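Mutation-effect prediction with a protein language model is often done by masked-marginal scoring. The mutated position is masked, and the model’s log-probability of the mutant residue is compared with that of the wild type. The sketch below shows the scoring logic only. `toy_masked_log_probs` is a hypothetical stand-in that estimates residue probabilities from a tiny made-up family of homologs. With a real model such as ESM-2, those probabilities would come from the network’s output at the masked position.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}

def toy_masked_log_probs(seq, pos, family):
    """Hypothetical stand-in for a protein language model's prediction at a masked
    position: residue frequencies at `pos` across a few homologs (add-one smoothed).
    A real model would condition on the rest of `seq` instead."""
    counts = np.ones(len(AMINO_ACIDS))
    for homolog in family:
        counts[AA_INDEX[homolog[pos]]] += 1
    return np.log(counts / counts.sum())

def masked_marginal_score(seq, mutation, family):
    """Score a substitution written like 'Y5F' (wild type, 1-based position, mutant)
    as log p(mutant) - log p(wild type) at the masked position."""
    wt, pos, mt = mutation[0], int(mutation[1:-1]) - 1, mutation[-1]
    if seq[pos] != wt:
        raise ValueError(f"{mutation}: sequence has {seq[pos]} at that position")
    log_p = toy_masked_log_probs(seq, pos, family)
    return float(log_p[AA_INDEX[mt]] - log_p[AA_INDEX[wt]])

family = ["MKTAYIA", "MKSAYIA", "MKTAFIA", "MRTAYLA"]   # made-up homologs
seq = "MKTAYIA"
for mutation in ["K2R", "Y5F", "Y5P"]:
    print(mutation, round(masked_marginal_score(seq, mutation, family), 2))
# Negative scores suggest a disfavoured substitution; Y5P scores lowest here
```

Part of the appeal is that the score needs no labelled mutation data. Everything comes from what the model absorbed during self-supervised training.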
Multi-omics foundation models
A 2024 wave of models like Geneformer (a transformer trained on 30 million single-cell gene expression profiles) learned patterns of gene regulation. It can be fine-tuned to, say, identify cell types or predict disease from gene expression in a sample – tasks it was not explicitly trained for but can perform because it captured fundamental structure in gene expression data. Likewise, BaseGraph by Basecamp (UK) is integrating 500 million knowledge triples from biomedical databases into a huge graph embedding model.
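Geneformer offers a concrete example of how such models turn data into tokens: it represents each cell as a ranked list of genes rather than raw counts. The sketch below is a simplified rendering of that rank value encoding as described in the Geneformer paper (Theodoris et al., 2023). Counts are normalized by the cell’s total, divided by each gene’s non-zero median across the corpus, and sorted. The gene names, counts, and medians are made up for illustration.

```python
import numpy as np

def rank_value_encode(counts, gene_ids, nonzero_medians, max_len=2048):
    """Encode one cell as genes ordered by normalized expression (highest first).

    counts:          raw counts for this cell, one entry per gene
    gene_ids:        gene identifiers, used as tokens
    nonzero_medians: each gene's median non-zero normalized expression
                     across the pretraining corpus
    """
    normalized = counts / counts.sum()          # correct for sequencing depth
    scaled = normalized / nonzero_medians       # demote ubiquitously high genes
    expressed = np.flatnonzero(counts > 0)      # undetected genes get no token
    order = expressed[np.argsort(-scaled[expressed], kind="stable")]
    return [gene_ids[i] for i in order[:max_len]]

gene_ids = ["GAPDH", "CD3E", "MS4A1", "ACTB", "NKG7"]
counts = np.array([50, 5, 0, 40, 2], dtype=float)          # made-up counts
nonzero_medians = np.array([0.2, 0.01, 0.01, 0.15, 0.05])  # made-up corpus statistics
print(rank_value_encode(counts, gene_ids, nonzero_medians))
# ['CD3E', 'ACTB', 'GAPDH', 'NKG7']: the T-cell marker outranks housekeeping genes
```

Dividing by corpus-wide medians lets a modestly expressed but cell-type-specific gene outrank highly expressed housekeeping genes. That ordering is what the transformer then learns from.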
Cheminformatics foundation models
Models such as MolT5 (a T5 model trained on molecules and text) or MegaMolBART (a large BART-style model pretrained on SMILES, developed by NVIDIA with AstraZeneca) aim to provide an all-purpose chemical reasoning engine. One can fine-tune them for tasks like retrosynthesis, or use few-shot learning (e.g., given a few examples of a new reaction, the model adapts because it has “seen” so many reaction types in unsupervised training).
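Chemical language models like these read molecules and reactions as token sequences. The sketch below shows a widely used regular-expression tokenizer for SMILES, popularized by Schwaller et al.’s Molecular Transformer. Individual models, including MolT5 and MegaMolBART, ship their own tokenizers, so treat this as an illustration rather than either model’s exact preprocessing.

```python
import re

# Regex tokenizer for SMILES popularized by the Molecular Transformer (Schwaller et al.)
SMILES_PATTERN = re.compile(
    r"(\[[^\]]+]|Br?|Cl?|N|O|S|P|F|I|b|c|n|o|s|p|\(|\)|\.|=|#|-|\+|\\|/|:|~|@|\?|>|\*|\$|%[0-9]{2}|[0-9])"
)

def tokenize_smiles(smiles):
    """Split a SMILES (or reaction SMILES) string into chemically meaningful tokens."""
    tokens = SMILES_PATTERN.findall(smiles)
    # Fail loudly instead of silently dropping characters the pattern misses
    if "".join(tokens) != smiles:
        raise ValueError(f"could not fully tokenize {smiles!r}")
    return tokens

print(tokenize_smiles("CC(=O)Oc1ccccc1C(=O)O"))   # aspirin
print(tokenize_smiles("CCO.CC(=O)O>>CC(=O)OCC"))  # an esterification, as reaction SMILES
```

Multi-character atoms such as `Cl`, `Br`, and bracketed atoms like `[nH]` stay whole. Character-level splitting would break them apart.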
The hope with foundation models is to achieve a kind of emergence: capabilities that arise from scale and diversity of training data. For instance, a protein foundation model might, without explicit instruction, learn that certain sequence motifs bind DNA (simply from patterns in the data) and thereby be able to generalize to novel cases. We saw hints: ESM models learned secondary structure propensity and even aspects of tertiary contacts purely from sequence training.
From a pharma viewpoint, foundation models could significantly accelerate the “data understanding” phase. Instead of training a fresh model for each new target or dataset, one could use a foundation model as a starting point. Imagine a model that ingested every known drug, target interaction, all biomedical literature, etc. Such a model might be able to answer complex queries or generate hypotheses that cross modalities (e.g., a prompt: “Propose a novel combination therapy for disease X involving pathway Y”, and the model outputs a plausible suggestion citing supporting mechanisms).
The trend is clearly toward interdisciplinary models. As mentioned, Galactica (Meta) and BioGPT (Microsoft) were trained on scientific text (papers, knowledge bases) to support scientific question answering and generation. While Galactica had issues (it sometimes “hallucinated” convincing nonsense), these issues are being addressed, e.g., by grounding responses in a knowledge graph, akin to how Bing’s AI chat cites sources.
So in a next-frontier scenario, a medicinal chemist could interact with a foundation model through natural language, effectively querying an AI that has “read” all of medicinal chemistry and can suggest designs or point out pitfalls, backed by evidence. This is starting to happen: IBM’s Project Rxn demo uses a model to let chemists ask questions like “how do I make this compound?” and get an answer with references, which is preliminary but points in that direction.
8.2 Digital Twins and In Silico Trials
We touched on digital twins in section 2.6 regarding synthetic control arms, but the vision extends further: creating comprehensive computational doppelgängers of patients (or even populations) that can simulate how a person’s health evolves and how they’d respond to various interventions.
The current cutting edge: Unlearn.AI’s digital twin for trial controls is a narrow twin – it predicts an outcome (e.g., disease progression on placebo). Future digital twins aim to be richer, incorporating multi-organ models, genomic and lifestyle data, etc., to simulate a range of outcomes (efficacy, side effects, comorbidities).
A “full” digital twin might involve:
- A personalized physiological model (like an expanded PBPK model covering individual organ function parameters tuned to a patient via their labs and images).
- Integration of that with AI predictions for high-level outcomes (like risk of hospitalization given current conditions).
- Continuous updating as new data comes (making it a living model of the patient).
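The “continuous updating” element can be sketched as Bayesian updating of a single patient-specific parameter. The assumptions here are all hypothetical: a one-compartment intravenous PK model with known dose and volume, a log-normal population prior on clearance, and log-normal assay noise. Each new concentration measurement sharpens the posterior over this patient’s clearance. That is what lets a twin drift away from the population average toward the individual. A real twin would track many parameters with far richer physiology.

```python
import numpy as np

DOSE_MG, VOLUME_L = 100.0, 40.0            # assumed known for this toy patient
cl_grid = np.linspace(0.5, 15.0, 2000)     # candidate clearance values (L/h)

def concentration(cl, t_h):
    """One-compartment IV bolus model: C(t) = Dose/V * exp(-(CL/V) * t)."""
    return DOSE_MG / VOLUME_L * np.exp(-cl / VOLUME_L * t_h)

def update(log_post, t_h, observed, sigma=0.15):
    """Bayes' rule on the grid with log-normal assay error (sigma on the log scale)."""
    resid = np.log(observed) - np.log(concentration(cl_grid, t_h))
    log_post = log_post - 0.5 * (resid / sigma) ** 2
    return log_post - log_post.max()       # rescale to avoid underflow

def summarize(log_post):
    w = np.exp(log_post)
    w /= w.sum()
    mean = np.sum(w * cl_grid)
    return mean, np.sqrt(np.sum(w * (cl_grid - mean) ** 2))

# Hypothetical population prior: log-normal around 5 L/h (-log(cl) is the Jacobian)
log_post = -np.log(cl_grid) - 0.5 * ((np.log(cl_grid) - np.log(5.0)) / 0.5) ** 2
rng = np.random.default_rng(1)
true_cl = 8.0                              # this virtual patient clears drug faster
history = [summarize(log_post)]
for t in [1.0, 4.0, 8.0, 12.0]:            # each new lab value refines the twin
    observed = concentration(true_cl, t) * np.exp(0.15 * rng.normal())
    log_post = update(log_post, t, observed)
    history.append(summarize(log_post))
for mean, sd in history:
    print(f"CL = {mean:.2f} +/- {sd:.2f} L/h")
```

Later samples are more informative about clearance than early ones, which is why the estimate moves most once the 8 h and 12 h values arrive.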
Projects like the EU’s Virtual Physiological Human initiative, along with device companies (Philips, Siemens, and others working on digital avatars for cardiology patients), feed into this. Pharma specifically could use such twins to:
- Run in silico trials: testing a drug on a cohort of virtual patients before real trials, to refine inclusion criteria or dosing. Regulatory bodies like FDA, while cautious, have begun to accept simulation data in specific cases (for medical devices, in silico trials have sometimes replaced an arm of a trial).
- Personalize treatment: In the clinic, a doctor might use a patient’s twin to try out different drugs and see which the model predicts as best (some oncology decision support systems attempt a rudimentary version of this with tumor organoids or computer models).
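A toy in silico trial might look like the following. It generates a virtual cohort with between-patient variability, then compares candidate doses by the predicted fraction of responders. The Emax model is a standard pharmacodynamic form. The cohort distributions, response threshold, and dose levels are hypothetical.

```python
import numpy as np

def make_virtual_cohort(n, rng):
    """Hypothetical virtual patients: EC50 (mg) and maximal effect vary log-normally."""
    return {
        "ec50": 50.0 * np.exp(0.6 * rng.normal(size=n)),
        "emax": 30.0 * np.exp(0.2 * rng.normal(size=n)),
    }

def response_rate(cohort, dose_mg, threshold=15.0):
    """Emax model; a virtual patient 'responds' if predicted effect exceeds threshold."""
    effect = cohort["emax"] * dose_mg / (cohort["ec50"] + dose_mg)
    return float(np.mean(effect > threshold))

rng = np.random.default_rng(7)
cohort = make_virtual_cohort(5000, rng)
doses = [25, 50, 100, 200]
rates = [response_rate(cohort, d) for d in doses]
for dose, rate in zip(doses, rates):
    print(f"{dose:>4} mg: {rate:.1%} predicted responders")
```

The same cohort is reused across doses, so differences between arms reflect the drug model rather than sampling noise. Real trials cannot do this, but simulations can.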
By 2025, examples include:
- Virtual diabetic patient cohorts to test insulin regimen changes (a combination of mechanistic glucose-insulin models with AI learning of patient behavior patterns).
- Digital twin of a cancer patient’s tumor: combining genomic, histological, and imaging data to predict growth and response. Some startups (like Cellarity or Turbine in Europe) simulate cell behavior to predict drug effects – a cell-level twin, if you will, of a tumor or tissue.
The next frontier here will be making these twins realistic enough that regulators trust them to partly substitute for real data. The EMA’s qualification of Unlearn’s approach in 2022 was one step (unlearn.ai). Perhaps by the late 2020s we’ll see a drug approved with significant in silico trial evidence augmenting the real evidence (especially in rare diseases, where recruiting large trials is hard).
8.3 Causal Discovery and Generative Biology
While we discussed causal inference (making use of known causal assumptions to estimate effects), causal discovery is about using AI to learn causal relationships from purely observational data. It’s a frontier because if solved, it would mean AI could identify new causal drivers of disease or unexpected causal connections that human analysis missed – essentially automating hypothesis generation at a mechanistic level.
Approaches here include:
- Causal structure learning algorithms (like NOTEARS, DAG-GNN) being applied to multi-omics time-series or perturbation data. For example, given gene expression data across thousands of conditions, can an AI infer which genes likely regulate others (producing a directed network)? If so, that reveals potential drug targets (key causal hubs).
- Counterfactual reasoning models: e.g., using generative models to simulate “If gene X were knocked down, how would the cell’s state change?” across the whole system, not just predicting one output. There are preliminary works – e.g., DeepNERF generating counterfactual gene expression. Tapping into these could drastically speed up target discovery by evaluating hypothetical interventions in silico before actual experiments.
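For the causal structure learning approaches listed above, a key trick in NOTEARS is to express “the graph has no cycles” as a smooth function of a weighted adjacency matrix W, which turns structure learning into continuous optimization. The sketch computes that acyclicity function, h(W) = tr(exp(W ∘ W)) - d, along with DAG-GNN’s polynomial variant. Both are zero exactly when W encodes a DAG. The toy gene graphs are made up, and the matrix exponential uses a truncated Taylor series for simplicity. The data-fitting loss and augmented-Lagrangian solver used in practice are omitted.

```python
import numpy as np

def expm_taylor(M, terms=30):
    """Matrix exponential by truncated Taylor series (fine for small, well-scaled M)."""
    result = np.eye(len(M))
    term = np.eye(len(M))
    for k in range(1, terms):
        term = term @ M / k
        result = result + term
    return result

def h_notears(W):
    """NOTEARS acyclicity measure: tr(exp(W * W)) - d, zero iff W is a DAG."""
    return np.trace(expm_taylor(W * W)) - len(W)

def h_daggnn(W):
    """DAG-GNN's polynomial variant: tr((I + W * W / d)^d) - d."""
    d = len(W)
    return np.trace(np.linalg.matrix_power(np.eye(d) + W * W / d, d)) - d

# Toy gene graphs: W[i, j] != 0 means gene i regulates gene j
dag = np.array([[0.0, 0.8, 0.0],
                [0.0, 0.0, -1.2],
                [0.0, 0.0, 0.0]])
cyclic = dag.copy()
cyclic[2, 0] = 0.5          # feedback edge closes the loop 0 -> 1 -> 2 -> 0
for name, W in [("DAG", dag), ("cyclic", cyclic)]:
    print(name, round(h_notears(W), 6), round(h_daggnn(W), 6))
```

Squaring the weights elementwise makes inhibitory (negative) edges count as edges too. Otherwise a positive and a negative cycle could cancel in the trace.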
One can envision a causal discovery AI that ingests all of a pharma’s internal data (assay results, trial outcomes, omics) and external data, and then outputs: “It appears that activating pathway A causes a compensatory change in pathway B that leads to toxicity; thus, a combination blocking both might succeed” – a non-obvious insight gleaned by analyzing dozens of related experiments and patient data.
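The counterfactual question in the list above (“if gene X were knocked down, how would the cell’s state change?”) has a textbook answer when the causal graph is known and linear: abduction, action, and prediction. The sketch below covers only that simplest case. It is not DeepNERF or any published model. It uses a hypothetical three-gene linear structural equation model whose weights would, in practice, come from structure learning or experiments.

```python
import numpy as np

def counterfactual_knockdown(W, x_obs, gene, value):
    """Counterfactual in a linear SEM x = W.T @ x + u, where W[i, j] is the
    effect of gene i on gene j (W must describe a DAG)."""
    d = len(W)
    u = (np.eye(d) - W.T) @ x_obs                   # 1. abduction: this cell's exogenous terms
    W_do = W.copy()
    W_do[:, gene] = 0.0                             # 2. action: cut the gene's incoming edges
    u_do = u.copy()
    u_do[gene] = value                              #    and clamp it to the knockdown level
    return np.linalg.solve(np.eye(d) - W_do.T, u_do)   # 3. prediction

# Hypothetical 3-gene network: A -> B (0.8), B -> C (-1.2), A -> C (0.3)
W = np.array([[0.0, 0.8, 0.3],
              [0.0, 0.0, -1.2],
              [0.0, 0.0, 0.0]])
x_obs = np.array([1.0, 1.5, -0.9])                  # observed expression in one cell
print(counterfactual_knockdown(W, x_obs, gene=1, value=0.0))   # approx. [1.0, 0.0, 0.9]
```

Knocking down B relieves its repression of C, so C rises. Upstream gene A is untouched because interventions only propagate downstream.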
There’s a whiff of this in a 2023 paper that applied causal AI on GWAS data to prioritize targets – they combined genetics and transcription data to pick out which genes truly drive disease vs. just correlate (nature.com). That led to identification of “causal variants” pointing to druggable targets with a higher success probability.
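One standard way to separate genes that drive disease from genes that merely correlate with it is Mendelian randomization, which uses a genetic variant as a natural experiment. The source does not say which method the cited paper used, so take this only as an illustration of the principle on simulated data. An unmeasured confounder inflates the naive observational estimate. The Wald ratio (variant-trait effect divided by variant-expression effect) recovers the true causal effect. All effect sizes below are invented.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200_000
true_effect = 0.3                                  # causal effect of expression on the trait

z = rng.binomial(2, 0.3, size=n)                   # genotype: 0, 1 or 2 copies of an allele
u = rng.normal(size=n)                             # unmeasured confounder
x = 0.5 * z + u + rng.normal(size=n)               # gene expression, partly set by genotype
y = true_effect * x + 2.0 * u + rng.normal(size=n) # disease liability

def slope(a, b):
    """Ordinary least-squares slope of b on a."""
    return np.cov(a, b)[0, 1] / np.var(a, ddof=1)

naive = slope(x, y)                                # confounded observational estimate
wald = slope(z, y) / slope(z, x)                   # Mendelian randomization (Wald ratio)
print(f"naive {naive:.2f}  MR {wald:.2f}  truth {true_effect}")
```

The method rests on assumptions the simulation satisfies by construction: the variant affects the trait only through the gene’s expression. Checking that on real data (pleiotropy, linkage) is most of the work.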
Another frontier element tied to causality is hybrid mechanistic-ML modeling (as earlier). For example, digital twins of organs often involve mechanistic ODEs for known physiology plus ML components for parts we don’t understand well. This combo can yield more interpretable and extrapolable models (the ML fills gaps but the overall model respects known causal mechanisms). By 2025, initial efforts like a “Causal AI kidney model” or “AI cardiology digital twin” are likely in research pipelines.
8.4 Beyond Molecules: AI for New Modalities and Synthetic Biology
As pharma expands beyond small molecules to modalities like gene therapies, siRNA, cell therapies, and even synthetic biology constructs, AI frontiers extend there too:
- Designing mRNA sequences with optimal protein yield but minimal immunogenicity: similar in spirit to codon optimization, but with ML that can account for the mRNA’s secondary structure. In fact, some companies have models for this (Meta’s ESM helped design a more stable Cas9 mRNA).
- Designing cell therapies – e.g., finding optimal CAR-T cell designs (AI to predict which antibody binder domain will work best, or using AI to analyze patient omics to pick the right neoantigen targets for a cancer vaccine).
- In synthetic biology, foundation models might help create novel enzymes or metabolic pathways to produce drugs sustainably (this overlaps with protein design foundation models).
Finally, a somewhat speculative but plausible frontier: AI-designed clinical trials where an AI not only simulates outcomes but actively designs a trial protocol (maybe even adjusting in real time using reinforcement learning to maximize knowledge gain subject to constraints). Already, AI is used to optimize dosage or allocation in adaptive trials in simulations; by late 2020s, regulatory bodies might approve AI-driven adaptive protocols that do mid-trial changes beyond pre-specified rules if done under an approved AI algorithm’s guidance.
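Response-adaptive allocation, one ingredient of such AI-guided designs, can be sketched with Beta-Bernoulli Thompson sampling. Each patient is assigned to the arm whose sampled response rate is highest, so allocation shifts toward better-performing arms as evidence accumulates. The three arms and their response rates are hypothetical. Real adaptive protocols add pre-specified burn-in periods, allocation caps, and stopping rules that this sketch omits.

```python
import numpy as np

def thompson_trial(true_rates, n_patients, rng):
    """Response-adaptive randomization with Beta-Bernoulli Thompson sampling:
    each new patient goes to the arm whose sampled response rate is highest."""
    k = len(true_rates)
    successes, failures = np.zeros(k), np.zeros(k)
    for _ in range(n_patients):
        draws = rng.beta(successes + 1.0, failures + 1.0)  # posterior draws, Beta(1, 1) prior
        arm = int(np.argmax(draws))
        responded = float(rng.random() < true_rates[arm])  # simulated patient outcome
        successes[arm] += responded
        failures[arm] += 1.0 - responded
    return successes, failures

rng = np.random.default_rng(11)
true_rates = [0.20, 0.35, 0.50]       # hypothetical arms (e.g., doses); unknown in practice
successes, failures = thompson_trial(true_rates, 600, rng)
allocated = successes + failures
print("patients per arm:", allocated.astype(int))
print("observed response rates:", np.round(successes / np.maximum(allocated, 1), 2))
```

The trade-off regulators scrutinize is visible here. Sending more patients to the apparent winner is good for participants but leaves the weaker arms with small samples and less precise estimates.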
In conclusion, the next frontiers of AI in pharma are about scale and integration: scaling models to encompass entire biological systems (foundation models for biology), integrating multi-scale data (digital twins from molecule to whole patient), and integrating causal reasoning into predominantly correlation-based AI. While the current (2025) AI in pharma toolkit mostly addresses individual steps (designing a molecule, predicting a property, etc.), the next generation aims to tackle the full complexity of disease and treatment.
It is an ambitious vision: a future where we can simulate a virtual patient, discover and test a therapy in silico that precisely targets their disease’s causal network, and do so largely with AI guidance before ever going into a human. Achieving that will require those foundation models fluent in biology, robust causal discovery AI, and validated digital twin platforms – all pieces in progress. If successful, it would indeed revolutionize pharma: reducing guesswork, allowing truly personalized medicine, and expanding our understanding of biology itself (as AI often discovers things humans overlook).
The tone for such frontiers is optimistic but grounded: these are in early stages, and caution is needed (garbage in, garbage out still applies – a digital twin is only as good as the data and assumptions behind it). Yet the momentum is undeniable – just as 5 years ago few imagined an AI could predict protein folding at near experimental accuracy (AlphaFold), in the next 5–10 years we may see achievements currently thought remote, such as an AI confidently explaining “here’s why this drug failed and how to fix it” by truly understanding patient biology – a profound shift from AI as an assistant to AI as a genuine scientific partner.
Conclusion: The ongoing marriage of artificial intelligence and pharmaceutical science is nuanced. We have surveyed its present capabilities, from model-driven chemistry to data-driven clinical insights, and projected how emerging advancements could further reshape the industry. AI is not a magic bullet – drug R&D remains constrained by the complexity of biology and the rigor of empirical validation – but it’s a powerful new tool that, when wielded with domain wisdom, can tilt odds in our favor.
After years of false starts, AI in pharma is evolving from youthful exuberance to seasoned productivity. Much as combinatorial chemistry and genomics changed drug discovery in earlier eras, so too is AI now maturing into an indispensable, if unsentimental, workhorse of the lab – accelerating some processes, illuminating others, and occasionally, with dry irony, reminding us of the limits of our data.
The true measure of its impact will be in molecules on pharmacy shelves and improved patient outcomes.