Where Drugs Win or Lose: Mapping the Competitive Bar in Biopharma with AI

How Formation Bio uses Pareto frontiers to analyze the high-water marks for drug development


7 min. read · Feb. 13, 2026

The potential for a pharmaceutical drug to positively impact the lives of patients does not hinge solely on whether it works in clinical trials. A new pharmaceutical product must prove itself against both existing treatments and competing drugs in development across efficacy, safety, tolerability, and fit for different patient populations. Drug developers who misjudge these competitive dynamics make costly mistakes: they invest in uncompetitive clinical programs while undervaluing promising assets that could succeed with better positioning. Getting these judgments right can mean the difference between delivering a transformative therapy to patients and shelving it entirely.

Formation Bio’s mission is to bring new treatments to patients faster and more efficiently, and we pursue first-in-class and best-in-class drugs across multiple therapeutic areas. To ensure we invest in the most competitive treatments, we developed an intelligence tool that benchmarks drugs in development against real-world success metrics. This helps us identify overlooked opportunities and anticipate how standards of care will evolve. While competitive landscapes span many dimensions, from patient populations to dosing convenience to treatment sequencing, this post focuses on the critical safety-efficacy tradeoff that defines clinical benchmarks in drug development.

Building a Flexible Intelligence Framework

To help define what "good" looks like across therapeutic areas, we built a framework that collects and structures comprehensive competitive data—efficacy and safety metrics, patient populations, and dosing regimens. One of the most powerful visualizations this enables is a Pareto frontier, which helps us understand the fundamental tradeoff between safety and efficacy.

In drug development, efficacy gains often come with safety tradeoffs—a more potent treatment may carry higher risks of adverse events. Since not all outcome measures or adverse events carry equal clinical weight, the ability to explore different metric combinations is essential for drawing meaningful conclusions. By mapping how drugs both in development and commercially available perform across these dimensions, we can quantitatively identify the current best-in-class profiles and spot where opportunities exist for meaningful improvement. This is what a Pareto frontier visualizes: the set of drugs where improving efficacy would require compromising safety and vice versa. The frontier therefore shows the current clinical high-water marks that a new drug would need to surpass to be considered a meaningful advance.
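To make the frontier concrete, here is a minimal sketch of how the non-dominated set can be computed, assuming both axes are oriented so that higher is better (e.g., safety expressed as 1 − SAE rate). This is illustrative only, not our production implementation:

```python
def pareto_frontier(drugs):
    """Return the non-dominated subset of (name, efficacy, safety) tuples.

    Both axes are oriented so higher is better. A drug is on the frontier
    if no other drug is at least as good on both axes and strictly better
    on at least one.
    """
    frontier = []
    for name, eff, saf in drugs:
        dominated = any(
            (e >= eff and s >= saf) and (e > eff or s > saf)
            for n, e, s in drugs
            if n != name
        )
        if not dominated:
            frontier.append((name, eff, saf))
    # Sort so the frontier reads left-to-right along the safety axis
    return sorted(frontier, key=lambda d: d[2])

# Hypothetical drugs: (name, clinical remission rate, 1 - SAE rate)
drugs = [
    ("Drug A", 0.45, 0.88),  # high efficacy, moderate safety
    ("Drug B", 0.30, 0.97),  # modest efficacy, very safe
    ("Drug C", 0.28, 0.90),  # dominated by Drug B on both axes
]
print(pareto_frontier(drugs))  # Drug C drops out
```

A new entrant only counts as a meaningful advance, in this simplified view, if adding it to the list changes the frontier.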

A Pareto frontier visualization where we map clinical efficacy (y-axis, higher is more effective) to a safety metric (x-axis, rightward is safer) of hypothetical drugs.

Conducting this type of analysis and creating these visualizations in practice requires solving a significant data challenge. For any given indication (e.g., ulcerative colitis), we need efficacy data for relevant endpoints, safety profiles broken down by adverse event severity, patient population characteristics, and dosing regimens, all reported inconsistently across trials and publications. This is where we believe AI can be a massive lever, enabling us to systematically compile and structure data that would otherwise take weeks of manual work.

We designed a three-step workflow that mirrors how a human expert would approach this task, but with AI acceleration:

Human workflow and AI-based workflow parallels

The workflow combines AI automation with human oversight at critical decision points. Involving a human in the process provides two key advantages: 1) it ensures AI is grounded with expert judgment, and 2) it forces explicit decisions about how to structure each indication. Even for well-understood therapeutic areas, humans need to make nuanced decisions about which outcome measures to group together and what timeframes are clinically meaningful—decisions requiring domain expertise that AI cannot reliably make on its own.

Step 1: Selecting the Relevant Drugs

We first select the drugs relevant to the competitive landscape for an indication. Because the goal is to collect objective evidence, we defined three primary avenues of relevance:

  1. A regulatory health authority, such as the FDA, has explicitly approved the drug for an indication.

  2. A sponsor has run a trial for the drug targeting an indication.

  3. Physicians prescribe the drug in off-label usage.

The first two avenues have strong representation in authoritative data sources like FDA product labels and clinical trial databases. For the last avenue, we found that standard of care practices outlining off-label usage are often covered in freely available resources such as treatment guidelines from professional associations and meta-analyses, although claims data would be the most canonical source.

Step 2: Understanding Relevant Outcome Measures

Compiling a list of relevant outcome measures for an indication is more difficult than selecting the relevant drugs, given that there’s no singular or obvious standard for relevance. Humans often look at trial information, sponsor websites targeting healthcare professionals, or journal articles to build an intuition for what the common outcome measures are. AI can help us do this at a much larger scale by:

  • Reading the corresponding product labels and supplementary materials for each explicitly indicated drug to understand what the health authority evaluated at the time of approval.

  • Collecting all endpoints from the study plans of all of the trials associated with the set of drugs in the previous step.
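As an illustration, the endpoint-collection step can be sketched as a parser over registry study records. The field names below mirror the general shape of ClinicalTrials.gov API v2 records (`protocolSection.outcomesModule`), but the record shown is a simplified, hypothetical excerpt, not a real trial:

```python
def collect_endpoints(study_record):
    """Pull all endpoint descriptions (primary and secondary) out of a
    ClinicalTrials.gov-style study record (API v2 field names assumed)."""
    outcomes = study_record.get("protocolSection", {}).get("outcomesModule", {})
    endpoints = []
    for key in ("primaryOutcomes", "secondaryOutcomes"):
        for outcome in outcomes.get(key, []):
            endpoints.append(
                {"measure": outcome.get("measure"), "timeFrame": outcome.get("timeFrame")}
            )
    return endpoints

# Simplified, hypothetical excerpt of a study record
record = {
    "protocolSection": {
        "outcomesModule": {
            "primaryOutcomes": [
                {
                    "measure": "Proportion of subjects who achieve clinical remission per Modified Mayo Score",
                    "timeFrame": "Week 8",
                }
            ],
            "secondaryOutcomes": [
                {"measure": "Endoscopic improvement", "timeFrame": "Week 8"}
            ],
        }
    }
}
print(collect_endpoints(record))
```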

Our system then takes the literal strings associated with the endpoints (e.g., “Proportion of subjects who achieve clinical remission per Modified Mayo Score at week 8”, “Clinical remission at Week 8 and Week 52 (total Mayo score ≤ 2 with no subscore > 1)”) and suggests a standardized form (e.g., “Clinical remission (Mayo-based)”).

A human reviews this list to ensure the outcome measures and terminology match how they want to frame the indication. We then force the system to only recognize and map data to this validated set of measures.
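A simplified sketch of that constrained mapping step is below; simple keyword rules stand in for the AI suggestion, and the validated set is a hypothetical one for ulcerative colitis. Anything outside the human-approved set is rejected rather than guessed:

```python
# Hypothetical human-validated outcome measures for one indication
VALIDATED_MEASURES = {
    "Clinical remission (Mayo-based)",
    "Endoscopic improvement",
}

# Keyword rules standing in for the AI suggestion step
RULES = [
    ("clinical remission", "Clinical remission (Mayo-based)"),
    ("endoscopic", "Endoscopic improvement"),
]

def map_endpoint(raw):
    """Map a raw endpoint string onto the validated set, or return None
    when it falls outside the measures a human approved."""
    lowered = raw.lower()
    for keyword, canonical in RULES:
        if keyword in lowered and canonical in VALIDATED_MEASURES:
            return canonical
    return None

print(map_endpoint("Clinical remission at Week 8 per Modified Mayo Score"))
print(map_endpoint("Change in fecal calprotectin"))  # outside the validated set
```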

Step 3: Populating the Data Matrix

At this point, we have the structure for a large table with the drugs we’re interested in running down the rows, and the outcome measures running across the columns:

We iterate through the entire table and systematically attempt to fill in information for each cell with individual AI calls – managing this process ourselves helps ensure that we can be exhaustive with searching for relevant data. Each invocation involves a complex prompt where we provide details such as the drug name, trial IDs, the indication, and target outcome measures. We provide explicit guidance for source hierarchy, treating product labels and peer-reviewed journal articles as more authoritative than clinicaltrials.gov, and for how to handle pooled results. We also consider that a single trial (even if pivotal) may report outcome measure data at different timeframes (e.g., an induction vs. maintenance time frame) and with different doses (if there is more than one treatment arm), so we take care to represent this accurately. Below is a truncated version of the final table representation:
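The per-cell iteration can be sketched as follows. Here `ask_model`, the trial IDs, and the indication string are placeholders for the real AI invocation and inputs, and the prompt is a much-condensed stand-in for our actual one:

```python
# Source hierarchy from most to least authoritative (per the post)
SOURCE_HIERARCHY = ["product label", "peer-reviewed article", "clinicaltrials.gov"]

def build_cell_prompt(drug, trial_ids, indication, measure):
    """Assemble a per-cell extraction prompt (heavily simplified)."""
    return (
        f"For {drug} in {indication} (trials: {', '.join(trial_ids)}), "
        f"find reported results for the outcome measure '{measure}'. "
        f"Prefer sources in this order: {' > '.join(SOURCE_HIERARCHY)}. "
        "Report each result separately by dose, treatment arm, and "
        "timeframe (e.g., induction vs. maintenance)."
    )

def populate_matrix(drugs, measures, ask_model):
    """Fill one cell per (drug, measure) pair with an individual model
    call; `ask_model` is a stand-in for the real AI invocation."""
    matrix = {}
    for drug, trial_ids in drugs.items():
        for measure in measures:
            prompt = build_cell_prompt(drug, trial_ids, "ulcerative colitis", measure)
            matrix[(drug, measure)] = ask_model(prompt)
    return matrix

# Placeholder drugs and trial IDs, with an echo stub in place of the model
drugs = {"Drug A": ["NCT-PLACEHOLDER-1"], "Drug B": ["NCT-PLACEHOLDER-2"]}
measures = ["Clinical remission (Mayo-based)"]
matrix = populate_matrix(drugs, measures, ask_model=lambda p: p)
```

Driving the loop ourselves, one cell per call, is what makes it easy to verify that every drug/measure combination was actually searched rather than trusting a single monolithic query.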

Beyond just the outcome measures themselves, we also capture traceability back to the underlying trial data, linking each metric to relevant context such as baseline characteristics of that trial arm. This allows users to build holistic context around any comparison: when evaluating why one drug shows superior efficacy, they can immediately see whether it was tested in a different patient population, making the comparison more nuanced than the numbers alone would suggest. While it can easily take days to weeks for a human to manually collect the relevant information for all of the drugs and outcome measures, our workflow can complete it in about 30 minutes. This speed and scalability means we can maintain comprehensive, current views across dozens of therapeutic areas, ensuring we don’t miss opportunities to bring better treatments to patients.
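One way to represent that traceability is a record that carries its trial context alongside the metric value. The field names and example values below are illustrative, not our actual schema:

```python
from dataclasses import dataclass, field

@dataclass
class MetricRecord:
    """One cell of the matrix, traceable back to its trial arm.
    All fields here are illustrative placeholders."""
    drug: str
    outcome_measure: str
    value: float
    trial_id: str
    arm: str
    dose: str
    timeframe: str
    baseline_characteristics: dict = field(default_factory=dict)

record = MetricRecord(
    drug="Drug A",
    outcome_measure="Clinical remission (Mayo-based)",
    value=0.26,
    trial_id="NCT-PLACEHOLDER-1",
    arm="treatment",
    dose="10 mg daily",
    timeframe="induction (week 8)",
    baseline_characteristics={"prior_biologic_exposure_rate": 0.48},
)
```

Keeping baseline characteristics attached to each value is what lets a user ask, for any point on the frontier, whether an apparent efficacy edge reflects the drug or the population it was tested in.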

Visualizing the Data

Once all metrics are collected, users can select a safety and efficacy metric to visualize the Pareto frontier. The figure below shows this frontier for Ulcerative Colitis, plotting Serious Adverse Event Rate (SAE Rate) against Clinical Remission at 6-8 weeks (the standard metrics for evaluating induction therapy). SAE Rate is a useful starting point as a broad safety signal, but users can swap in more specific metrics, such as rates of serious infections, to explore tradeoffs that matter most for a given therapeutic decision:

A frontier visualization of a safety metric (SAE Rate) vs. an efficacy metric (Clinical Remission in Induction) in Ulcerative Colitis

From this visualization, we can see that the major drug classes used in UC (TNF inhibitors, JAK inhibitors, IL-23 inhibitors, and 5-ASAs) define the current frontier where tradeoffs between safety and efficacy must be navigated. The visualization also shows why obefazimod, a novel miR-124 upregulator, has generated significant excitement: its results would push the frontier outward, establishing a new benchmark for efficacy at comparable safety levels and opening up a new class of treatments to patients who may not have had success with other drugs. This example illustrates the core value proposition: turning subjective assessments of competitive positioning into quantitative insights that directly inform strategic decisions.

From Visualization to Decision-Making

By systematically mapping competitive landscapes, we can ground our assessments in comprehensive data alongside expert judgment. The framework provides quantitative benchmarks that complement clinical expertise about patient preferences, treatment sequencing, and real-world use.

This visualization, while powerful, represents one slice of a richer picture; trials differ in study design, patient populations, and endpoint definitions. Our framework accounts for this by maintaining traceability from each outcome measure back to the underlying trial characteristics, so users can assess how much of an apparent difference reflects a true therapeutic advantage, versus differences in how the data was generated. Critically, these benchmarks help us understand not just which drugs perform best overall, but which patient populations might benefit most from specific therapeutic profiles, even when a drug isn’t universally best-in-class. This allows us to pinpoint specific gaps where no current therapy achieves the desired balance of safety and efficacy, helping us identify which development opportunities have the greatest potential to deliver meaningful improvements for patients.

While this post focused on the safety-efficacy tradeoff, the framework's real power lies in its flexibility. We're expanding the framework's capabilities to incorporate more granular safety signals, mapping patient population characteristics to better contextualize comparisons, and building tools that allow rapid exploration across different metric combinations. The goal is to make it progressively easier for our experts to interrogate the landscape and surface insights that drive better investment and development decisions.

In an upcoming post, we'll dive into the technical implementation: how we optimized web search at scale, managed workflow durability with Temporal, and balanced automation with human oversight.

© Formation Bio 2026