The Bottleneck That Has Defined Medicine
Drug discovery has always been a game of long odds. The human genome contains roughly 20,000 protein-coding genes, yet only about 4,500 are considered "druggable", meaning they can be modulated by a small molecule or biologic. Of those, approved drugs to date act on just 716 distinct targets. The gap between what biology offers and what medicine has achieved is vast, and a major reason is that identifying which of those 20,000 genes is the right target for a given disease has historically taken anywhere from months to decades of painstaking experimental work.
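To put those figures in proportion, here is a quick calculation using only the three numbers quoted above. The variable names are illustrative.

```python
# Figures quoted in the text above.
protein_coding_genes = 20_000
druggable_genes = 4_500
targets_of_approved_drugs = 716

# Share of each pool that survives to the next stage of the funnel.
druggable_share = druggable_genes / protein_coding_genes
drugged_share_of_druggable = targets_of_approved_drugs / druggable_genes
drugged_share_of_genome = targets_of_approved_drugs / protein_coding_genes

print(f"Druggable share of protein-coding genes: {druggable_share:.1%}")
print(f"Druggable genes hit by approved drugs:   {drugged_share_of_druggable:.1%}")
print(f"Protein-coding genes hit by approved drugs: {drugged_share_of_genome:.1%}")
```

On these figures, roughly 22.5% of protein-coding genes count as druggable, about 15.9% of those are targets of an approved drug, and approved drugs reach only about 3.6% of protein-coding genes overall.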
A landmark review published in Nature Reviews Drug Discovery on 20 April 2026 — authored by researchers at Insilico Medicine and collaborating institutions — maps how artificial intelligence is systematically closing that gap. The implications for drug development timelines, costs, and ultimately patient outcomes are profound.
The Data Problem AI Was Built to Solve
The core challenge in target identification is not a lack of data. It is an excess of it. Modern biology generates multimodal datasets of staggering complexity:
- Genomics and transcriptomics — which genes are expressed, in which cells, under which conditions
- Proteomics and metabolomics — which proteins and metabolites are present and at what levels
- Epigenetics — how gene expression is regulated without changing the DNA sequence itself
- Clinical data — electronic health records, trial outcomes, imaging, and patient demographics
- Scientific literature — tens of millions of papers, patents, and regulatory submissions
No human expert, or team of experts, can synthesise these sources simultaneously. AI can.
The Algorithmic Toolkit
The review details the machine learning frameworks now driving target discovery:
Supervised learning trains models on confirmed drug-target pairs to predict novel interactions. Insilico's own PandaOmics platform integrates multi-omic and published text data to nominate disease targets; TargetPro learns features characteristic of clinical-stage targets to prioritise candidates most likely to succeed in trials.
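As a rough illustration of the supervised idea, the sketch below trains a tiny logistic regression on labelled drug-target pairs and then scores unseen pairs. It is not how PandaOmics or TargetPro work internally. The two features, every number in `TRAIN`, and the names `pair_A` and `pair_B` are invented for this example.

```python
import math

# Hypothetical toy data: each pair is described by two made-up features,
# [binding-pocket score, expression-in-disease-tissue score],
# with label 1 for a confirmed interaction and 0 for none known.
TRAIN = [
    ([0.9, 0.8], 1),
    ([0.8, 0.9], 1),
    ([0.7, 0.7], 1),
    ([0.2, 0.1], 0),
    ([0.1, 0.3], 0),
    ([0.3, 0.2], 0),
]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(weights, bias, features):
    """Return the predicted probability that a pair interacts."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)

def train(data, lr=0.5, epochs=500):
    """Fit logistic regression with plain stochastic gradient descent."""
    weights = [0.0] * len(data[0][0])
    bias = 0.0
    for _ in range(epochs):
        for features, label in data:
            error = predict(weights, bias, features) - label
            weights = [w - lr * error * x for w, x in zip(weights, features)]
            bias -= lr * error
    return weights, bias

weights, bias = train(TRAIN)

# Score unseen (hypothetical) pairs; higher means more likely to interact.
novel_pairs = {"pair_A": [0.85, 0.75], "pair_B": [0.15, 0.25]}
scores = {name: predict(weights, bias, f) for name, f in novel_pairs.items()}
for name, score in scores.items():
    print(f"{name}: {score:.3f}")
```

Production systems use far richer features and models, but the pattern is the same: learn from confirmed pairs, then rank unconfirmed ones by predicted probability.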
Graph Neural Networks (GNNs) exploit the inherent structure of biological knowledge graphs — networks encoding relationships between proteins, genes, pathways, and diseases. GNNs can predict novel protein-protein interactions, identify synthetic lethality relationships, and uncover target combinations capable of reversing complex disease phenotypes.
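The core operation behind a GNN is message passing: each node updates its representation from its neighbours. The sketch below shows a stripped-down version, assuming plain mean aggregation with no learned weights or non-linearities (real GNNs have both). The graph, node names, and the single "disease relevance" feature are all hypothetical.

```python
# Hypothetical knowledge graph as undirected edges between entities.
EDGES = [
    ("GENE_A", "GENE_B"),
    ("GENE_B", "GENE_C"),
    ("GENE_C", "DISEASE_X"),
]

# One made-up feature per node: 1.0 marks the disease node, 0.0 everything else.
FEATURES = {
    "GENE_A": [0.0],
    "GENE_B": [0.0],
    "GENE_C": [0.0],
    "DISEASE_X": [1.0],
}

def build_adjacency(edges):
    """Map each node to the list of its neighbours (undirected)."""
    adjacency = {}
    for left, right in edges:
        adjacency.setdefault(left, []).append(right)
        adjacency.setdefault(right, []).append(left)
    return adjacency

def message_pass(features, adjacency):
    """One round: replace each node's vector with the mean of itself and its neighbours."""
    updated = {}
    for node, own in features.items():
        vectors = [own] + [features[n] for n in adjacency.get(node, [])]
        updated[node] = [sum(column) / len(vectors) for column in zip(*vectors)]
    return updated

adjacency = build_adjacency(EDGES)
round_1 = message_pass(FEATURES, adjacency)
round_2 = message_pass(round_1, adjacency)
round_3 = message_pass(round_2, adjacency)

for name, state in [("round 1", round_1), ("round 3", round_3)]:
    print(name, {node: round(vec[0], 3) for node, vec in state.items()})
```

After one round only GENE_C, the disease node's direct neighbour, picks up any signal. After three rounds the signal has reached GENE_A, three hops away. This is how information about a disease can propagate to genes with no direct recorded link to it.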
Generative AI and foundation models represent the newest frontier. Models pre-trained on tens of millions of single-cell transcriptomes — such as Geneformer and scGPT — can simulate how a cell responds to genetic perturbations, effectively running in silico experiments that would take months in a wet laboratory. Insilico's PreciousGPT series generates synthetic multi-omics data to augment sparse datasets and facilitate target discovery in rare diseases where patient samples are limited.
Domain-specific LLMs and AI agent frameworks — including BioGPT and OriGene — act as "virtual biologists", processing vast repositories of biomedical literature, synthesising information across disparate databases, and autonomously generating and refining therapeutic hypotheses.

From Hypothesis to Clinical Candidate: A Compressed Timeline
The practical impact of these tools is already visible. Insilico Medicine made headlines in 2023 when it nominated the first AI-designed drug candidate for idiopathic pulmonary fibrosis — ISM001-055 — using its end-to-end AI platform, compressing a process that typically takes four to five years into less than 18 months. On 23 April 2026, the company announced a further landmark: the nomination of the first generative AI–designed drug candidate for a new indication, demonstrating that the pipeline is not a one-time achievement but a repeatable process.
The Multimodal Advantage
What distinguishes the current generation of AI tools from earlier computational approaches is their ability to integrate heterogeneous data types. A GNN trained on a biological knowledge graph can simultaneously consider:
- A gene's expression pattern across 50 tissue types
- Its known protein-protein interactions
- Its association with disease phenotypes in genome-wide association studies
- The competitive landscape of existing drugs targeting related pathways
- Trial results posted to clinical trial registries but not yet published in journals
This multimodal integration is precisely what human researchers struggle to perform at scale. The result is a systematic, data-driven approach to therapeutic hypothesis generation that complements — and in some domains now surpasses — traditional experimental biology.
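To make the idea of combining evidence concrete, here is a deliberately simple sketch. It assumes a fixed weighted sum over four normalised evidence scores rather than a trained GNN, so it illustrates integration in general, not the models the review describes. The gene names, scores, and weights are all invented.

```python
from dataclasses import dataclass, asdict

@dataclass
class Evidence:
    """Hypothetical per-gene evidence, each value normalised to the range 0..1."""
    expression: float    # disease-relevant tissue expression
    interactions: float  # connectivity to known disease proteins
    gwas: float          # strength of genetic association
    novelty: float       # 1.0 means no competing drugs on related pathways

# Assumed weights; a real system would learn these rather than fix them by hand.
WEIGHTS = {"expression": 0.3, "interactions": 0.2, "gwas": 0.4, "novelty": 0.1}

CANDIDATES = {
    "GENE_A": Evidence(expression=0.9, interactions=0.5, gwas=0.8, novelty=0.2),
    "GENE_B": Evidence(expression=0.4, interactions=0.9, gwas=0.3, novelty=0.9),
    "GENE_C": Evidence(expression=0.2, interactions=0.2, gwas=0.1, novelty=1.0),
}

def target_score(evidence):
    """Weighted sum of all evidence types for one candidate gene."""
    values = asdict(evidence)
    return sum(WEIGHTS[name] * values[name] for name in WEIGHTS)

# Rank candidates from strongest to weakest combined evidence.
ranked = sorted(CANDIDATES, key=lambda gene: target_score(CANDIDATES[gene]), reverse=True)
for gene in ranked:
    print(f"{gene}: {target_score(CANDIDATES[gene]):.2f}")
```

With these made-up inputs, GENE_A ranks first (0.71), followed by GENE_B (0.51) and GENE_C (0.24). The point is that no single data type decides the ranking: GENE_C is the most novel candidate but has weak biological support, so it falls to the bottom.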
Challenges and Limitations
The review is candid about the field's limitations. The quality of AI predictions is bounded by the quality of training data, and biological datasets are frequently incomplete, biased toward well-studied organisms and diseases, and inconsistently annotated. The "black box" problem — the difficulty of explaining why a model nominated a particular target — remains a barrier to regulatory acceptance and scientific trust.
There is also the question of wet-lab validation. A 2026 survey by the Pistoia Alliance found that only 1% of laboratory professionals report that AI delivers direct value in the wet lab, a reminder that computational predictions must ultimately be tested in cells, animals, and humans. AI accelerates the front end of drug discovery; it does not replace experimental biology.
Implications for Biosecurity and Biosafety
The same AI tools that identify therapeutic targets can, in principle, identify targets for harm — proteins whose disruption would be maximally damaging to human physiology, or pathogen genes whose enhancement would increase transmissibility or virulence. Governance responses are developing but lag behind the technology. The Biosecurity Commission published a comprehensive report in April 2025 recommending AI-specific biosecurity screening for drug discovery platforms.
Conclusion
The transformation of drug target identification by AI is not a future prospect — it is the present reality of pharmaceutical R&D. The 716 targets that underpin all of modern medicine may soon look like a modest foundation. AI is systematically expanding the map of druggable biology, compressing timelines, and surfacing targets that human intuition alone would never have reached.
Sources: Insilico Medicine / Nature Reviews Drug Discovery, 20 April 2026 (DOI: 10.1038/s41573-026-01412-8); Pistoia Alliance survey, April 2026.