AI, GIGO, and the Biopharmaceutical Venture
By Marcel V. Alavi
May 13, 2025
Garbage In, Garbage Out (GIGO). Image generated by Google's Gemini language model on May 12, 2025.
Artificial Intelligence (AI) is actively permeating and reshaping nearly every facet of our lives, presenting both tremendous opportunities and significant challenges. AI is also profoundly transforming the pharmaceutical industry, potentially reducing costs, accelerating development timelines, and improving the success rates of bringing critical therapies to patients.
Yet in computer science, the quality of the input directly dictates the quality of the output. As the old saying goes: Garbage In, Garbage Out (GIGO). Asked for comment, Gemini rightly concludes: “GIGO isn't just an AI-specific problem; it's a basic truth in all computing. However, its implications are amplified in AI due to the data-driven nature of learning algorithms.”
AI is currently being deployed by the industry in many areas, such as target identification and validation; small- and large-molecule drug discovery and development; prediction of drug absorption, distribution, metabolism, excretion, and toxicity (ADMET); personalized medicine; drug repurposing; patient recruitment; and clinical trial design.
However, the overall success rate of any given drug development program is quite low, at about 10%, which raises questions about the quality of the training data for AI algorithms. This humiliatingly low success rate is mirrored in the lack of reproducibility of as much as 90% of peer-reviewed preclinical research studies (Prinz et al. 2011, Begley & Ellis 2012, Baker 2016). The fact is, there is just an awful lot of garbage out there.
In a recent op-ed, “How to stop the shift of drug discovery from the U.S. to China,” former FDA Commissioner Scott Gottlieb recognized the prime opportunity to employ artificial intelligence, but he also acknowledged: “Mice and even primates are often poor proxies for many of the remote toxicities the FDA is trying to test for.”
Another important aspect is the need for causally linked data to understand cause-and-effect relationships. Keyword: AI hallucinations. Many publicly available datasets focus merely on correlations and associations. Identifying or annotating datasets to facilitate causal inference is much more challenging. But how else can one test and validate causal-inference algorithms?
Modeling biology is a complex problem, not just a complicated one. Unlike a complicated machine, which can still be disassembled and reassembled to learn its function and understand the interactions of its individual parts, biology requires approaches that go beyond simple reductionism. Biological interactions are context-dependent and highly dynamic, which makes them difficult to predict. Even worse, we do not know which parts we are still missing.
Maybe gone are the days when simply throwing a million stock options into a random San Francisco coffee shop could readily yield a team of programmers to code groundbreaking AI algorithms. (A warm meal might do the trick today.) Yet, the drug development bottleneck is not the lack of AI algorithms. The main bottleneck is the lack of clean and robust training data.
I cannot offer a faster solution to the specific challenges of the life sciences than cleaning up the garbage. Nevertheless, I do see a pertinent use for AI in biomedical research: bringing down general and administrative (G&A) expenses. Drawing upon my experience as a scientist and entrepreneur, I am always puzzled by the numerous managerial layers governing research and development (R&D) programs, so common in industry and academia. One can leverage AI to eliminate these layers, substantially reducing G&A costs. In view of this, I believe the National Institutes of Health is heading in the right direction by imposing a cap on overhead costs for research grants.
I would greatly enjoy hearing your perspectives on this critical topic.
#artificialintelligence #biotech #bigdata
Prinz F, Schlange T, Asadullah K. Believe it or not: how much can we rely on published data on potential drug targets? Nat Rev Drug Discov. 2011; 10(9):712. doi: 10.1038/nrd3439-c1. PMID: 21892149.
Begley CG, Ellis LM. Drug development: Raise standards for preclinical cancer research. Nature. 2012; 483(7391):531-3. doi: 10.1038/483531a. PMID: 22460880.
Baker M. 1,500 scientists lift the lid on reproducibility. Nature. 2016; 533(7604):452-4. doi: 10.1038/533452a. PMID: 27225100.