Searching Across Silence

How Formation looks for “dark assets”: silently deprioritized candidates that may reveal hidden opportunity

by Sujitt Rameshkumar

Software Engineer

8 min. read•Nov. 11, 2025

Pharma pipelines are constantly changing. Companies regularly conduct strategic reassessments of their portfolios, deciding which products to keep and which to cut. These decisions are rarely announced the moment they are made, if ever. A once-promising program might fade from the public record - unmentioned on earnings calls, missing from R&D updates, and omitted from SEC filings. Such programs suddenly fall silent, with little public information on what happened.

Some assets go dark for scientific reasons, such as problematic safety data or lack of efficacy. These are challenging to rescue. However, some assets may be deprioritized for other reasons, such as loss of funding, strategic shifts, or the absence of an internal champion. Those silent but potentially transformational “dark assets” can be a goldmine of opportunities, leading to new, more innovative therapies for patients around the world.

The question is how to find them. Traditional business-development workflows only surface what’s visible: mentions in company statements, active conference promotion, news articles, and press releases. They don’t capture what’s not being said. With thousands of active programs across the industry, it’s nearly impossible for teams to manually sift through data and track the quiet disappearances that happen every day.

This challenge is simple to describe, but difficult to solve. At Formation, we spent time building a system to canvass and monitor the world of potentially deprioritized assets and turn them into concrete signals for our BD team to act on.

Step 1: Seeing Silence as a Signal

To address this challenge, our team needed to define silence itself as a signal. Silence is not just the absence of information, but the first sign of change.

When a company stops talking about a program, that decision ripples through its filings, press releases, and investor updates long before it ever shows up in databases or deal trackers. Each omission leaves clues as to the underlying strategic reasons, such as shifting focus, tightening budgets, or potential future divestiture.

Our first step became quite clear - to deploy an analytical method that could capture the absence of asset mentions in financial reports, news articles, and so forth as a distinct signal.

Step 2: Building the Construct

To do so, we built an automated system that listens across the public record, from SEC filings to earnings-call transcripts and press releases, and transforms both mentions and the absence of mentions into data. The system captures when and how an asset is discussed, while also tracking when a tonal shift occurs in communications (e.g. from confident to cautious), when an asset stops being mentioned (‘the silence’), and when that pattern begins to appear across multiple sources.

At its core, our system streamlines a process that would be impossible to manage manually at scale, continuously scanning three major forms of public disclosure: SEC filings (10-Ks, 10-Qs, and 8-Ks), earnings call transcripts, and press releases, to track how companies talk about their drug programs.

For each asset owned by a public company, the system asks three questions:

Is it being mentioned?
How is it being described? (positive, neutral, or negative tone)
Has that tone or frequency changed over time?

Using those inputs, each asset is assessed based on whether it is active and ‘noisy’, or whether it has fallen silent. The result is a dynamic, auditable list of assets that may be quietly falling out of focus, which our business development team can review for potential in-licensing opportunities.

Our ‘silence as a signal’ asset search process can be represented as a three-layer data pipeline, described in Step 3 below.

Step 3: Refining the Architecture

This three-layer data pipeline brings structure to unstructured public data and enriches it with scientific and transactional context.

Layer 1 – Creating a Structured Asset Ownership Dataset

Layer 1 answers the question of who actually controls each drug program, and when. We unify two pharma vendor data sources with company tickers and SEC identifiers to create a definitive ownership map. Marketed products and expired patents are filtered out so the focus remains on assets in development. This prevents false positives and ensures every subsequent signal is grounded in reality.
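A minimal Python sketch of this kind of ownership map follows. It is an illustration under stated assumptions, not the production pipeline: the `VendorRecord` fields, the `build_ownership_map` function, the "first record wins" merge rule, and the `"Marketed"` stage label are all hypothetical names and choices introduced for the example.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, Iterable, Optional, Tuple


@dataclass(frozen=True)
class VendorRecord:
    """One row from a vendor feed (hypothetical schema)."""
    asset_id: str                  # assumed to be already normalized across vendors
    owner_name: str
    stage: str                     # e.g. "Phase 1", "Phase 2", "Marketed"
    patent_expiry: Optional[date]  # None when unknown


def build_ownership_map(
    vendor_a: Iterable[VendorRecord],
    vendor_b: Iterable[VendorRecord],
    company_ids: Dict[str, Tuple[str, str]],  # owner name -> (ticker, SEC identifier)
    as_of: date,
) -> Dict[str, dict]:
    """Merge two vendor feeds and keep only in-development assets of known public companies."""
    merged: Dict[str, VendorRecord] = {}
    for record in list(vendor_a) + list(vendor_b):
        # Assumption: the first record seen for an asset wins.
        # A real system would need explicit conflict resolution between vendors.
        merged.setdefault(record.asset_id, record)

    ownership: Dict[str, dict] = {}
    for asset_id, record in merged.items():
        ids = company_ids.get(record.owner_name)
        if ids is None:
            continue  # owner cannot be tied to a ticker / SEC identifier
        if record.stage == "Marketed":
            continue  # marketed products are out of scope
        if record.patent_expiry is not None and record.patent_expiry <= as_of:
            continue  # expired patents are out of scope
        ownership[asset_id] = {
            "owner": record.owner_name,
            "ticker": ids[0],
            "sec_id": ids[1],
            "stage": record.stage,
        }
    return ownership
```

Filtering before any signal is computed keeps later layers from scoring assets that were never candidates in the first place.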

After a first pass, our pipeline output included over 8,000 distinct drug assets spanning approximately 1,100 public pharma companies. The programs covered a wide range of therapeutic areas, with oncology, immunology, and neurology together accounting for more than half of all identified assets. The breadth of data provided scale and diversity to the analysis, enabling the system to capture deprioritization trends across modalities, from small molecules to advanced biologics.

Layer 2 – Listening for Silence

Layer 2 captures what’s being said (or not) across public communications and defines when absence becomes meaningful. Assets are considered ‘silent’ after roughly 12–18 months without mention across major disclosure types, a threshold chosen to balance typical reporting cycles with genuine disengagement. Natural-language and LLM-based models detect mentions, measure tone (positive, neutral, or negative), and tag context such as pipeline cut or under review. Importantly, not all signals are silent - some programs remain in discussion, but their tone has shifted. Our models flag these subtler changes in sentiment and language, as such tonal softening can often foreshadow future silence. Temporal analysis then tracks how enthusiasm rises or fades over time, turning both explicit absences and early tonal drift into quantifiable signals of shifting corporate focus. The output is a signal score that measures how each asset’s visibility and tone evolve - a proxy for where attention is moving next.
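The two ideas in this layer, a silence threshold and tonal drift, can be sketched in a few lines of Python. This is a simplified illustration, not the actual model: the function names, the numeric tone mapping, the whole-month arithmetic, and the default 12-month threshold (the low end of the range above) are assumptions made for the example.

```python
from datetime import date
from typing import List, Sequence, Tuple

# Assumed numeric encoding of the three tone labels.
TONE_VALUE = {"positive": 1, "neutral": 0, "negative": -1}


def months_between(earlier: date, later: date) -> int:
    """Whole calendar months from `earlier` to `later` (day of month ignored)."""
    return (later.year - earlier.year) * 12 + (later.month - earlier.month)


def is_silent(mention_dates: Sequence[date], as_of: date, threshold_months: int = 12) -> bool:
    """True when the most recent mention is at least `threshold_months` old, or there are none."""
    if not mention_dates:
        return True
    return months_between(max(mention_dates), as_of) >= threshold_months


def tone_drift(mentions: List[Tuple[date, str]]) -> float:
    """Mean tone of the later half of mentions minus mean tone of the earlier half.

    Negative values indicate tonal softening over time.
    """
    ordered = sorted(mentions, key=lambda m: m[0])
    if len(ordered) < 2:
        return 0.0
    mid = len(ordered) // 2
    early = [TONE_VALUE[tone] for _, tone in ordered[:mid]]
    late = [TONE_VALUE[tone] for _, tone in ordered[mid:]]
    return sum(late) / len(late) - sum(early) / len(early)
```

An asset can then be flagged either because `is_silent` returns true or because `tone_drift` falls below some cutoff, which mirrors the distinction between explicit absence and early tonal softening.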


Layer 3 – Interpreting Silence

The final layer identifies why assets go silent. We define silence as zero mentions across SEC filings, earnings calls, and press releases within a phase-adjusted look-back window (18-36 months). Each source contributes to a weighted deprioritization score, with SEC filings weighted most heavily for their regulatory relevance. To interpret the absence of communication, GPT-4o is used as a classifier and extractor - parsing disclosure tone (positive, neutral, negative), tagging context (pipeline cut, resource shift, under review), and normalizing event dates. This LLM layer transforms raw text into structured, explainable signals that can be cross-checked against trial outcomes, drug profiles, and web-search-based termination rationales. External web data further fills information gaps, capturing credible third-party reporting where vendor datasets are silent. The result is a unified deprioritization record that not only flags silence but interprets its cause - distinguishing between strategic pauses, funding constraints, and genuine scientific dead-ends, allowing us to truly identify valuable “dark assets.”
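To make the weighted score concrete, here is a small Python sketch. The specific weights, the phase-to-window mapping, and the function names are hypothetical; the text above only establishes that SEC filings carry the most weight and that the look-back window is phase-adjusted within 18 to 36 months.

```python
from datetime import date
from typing import Dict, Optional

# Hypothetical weights: only the ordering (SEC filings heaviest) comes from the text.
SOURCE_WEIGHTS = {"sec_filing": 0.5, "earnings_call": 0.3, "press_release": 0.2}

# Hypothetical phase-adjusted look-back windows, in months, within the 18-36 range.
LOOKBACK_MONTHS = {"Phase 1": 36, "Phase 2": 24, "Phase 3": 18}
DEFAULT_LOOKBACK_MONTHS = 24  # assumed fallback for unlisted phases


def months_between(earlier: date, later: date) -> int:
    """Whole calendar months from `earlier` to `later` (day of month ignored)."""
    return (later.year - earlier.year) * 12 + (later.month - earlier.month)


def deprioritization_score(
    last_mention: Dict[str, Optional[date]],  # source -> date of latest mention, or None
    phase: str,
    as_of: date,
) -> float:
    """Sum the weights of every source with no mention inside the look-back window.

    Returns a value between 0.0 (mentioned everywhere) and 1.0 (silent everywhere).
    """
    window = LOOKBACK_MONTHS.get(phase, DEFAULT_LOOKBACK_MONTHS)
    score = 0.0
    for source, weight in SOURCE_WEIGHTS.items():
        last = last_mention.get(source)
        if last is None or months_between(last, as_of) >= window:
            score += weight
    return score
```

Under this sketch, a score of 1.0 corresponds to the strict definition of silence (zero mentions in all three sources), while intermediate values show which sources have gone quiet first.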

This layered design allows the system to move from detection to interpretation, transforming raw disclosure text into actionable intelligence for business development.

Deploying with our BD teams

With this system, we’ve turned what was once a manual, document-by-document search into a scalable discovery process. Instead of combing through filings and transcripts line by line, BD teams can now surface potentially available assets through structured, time-aware filters.

This acceleration gives BD teams a critical timing advantage: the ability to identify possible out-licensing or partnership candidates slightly earlier than competitors, when discussions are more open-ended.

The framework doesn’t replace human judgment; it expands bandwidth. Analysts can now focus on high-context follow-up rather than initial screening, moving from thousands of assets to a short list of silence-defined leads in minutes.

For each candidate, the system also provides an initial LLM-generated hypothesis - a lightweight explanation of what might be driving the pause, drawn from trial outcomes, public tone, and web evidence. These hypotheses aren’t conclusions but starting points, helping BD prioritize where to look next.

Case Studies: AZD2693 and BMS-986406

A concrete example of predictive signal detection comes from AZD2693, an AstraZeneca asset in NASH (nonalcoholic steatohepatitis). During a run of the pipeline in August, this asset was flagged for having zero explicit mentions in earnings call transcripts and press releases, and our system identified it as a potentially “dark” asset.

On November 6th, AstraZeneca announced it was discontinuing the program following disappointing Phase 2 data. This example underscores how the deprioritization pipeline provides alpha, giving us early intel into assets and portfolios that can be valuable in a number of different contexts.

Another example is BMS-986406, an oncology asset. At time of writing, major industry databases still list the program as being in Phase 1 development, and even a basic web search for “BMS-986406 discontinuation” shows no indication that the program has ended. However, our system detected silence - no recent mentions in SEC filings, press releases, or earnings-call transcripts - and follow-up review confirmed that BMS’s own public pipeline listings make no reference to the asset, nor does it appear on any active development page. BMS has not declared a discontinuation of the asset explicitly, but the absence of acknowledgement may give us hints of what’s to come.

When taken in aggregate, these quiet mismatches between explicit announcements and corporate reality may reflect upcoming portfolio divestitures or reprioritizations, or give us an updated view of company valuations at a given point in time. As you can imagine, there are many assets such as these, all of which provide useful signals for our team to further evaluate.

Learnings and further refinement

The system is by no means “complete.” Building the deprioritization pipeline surfaced several lessons that shaped its evolution and will continue to be refined:

Ownership Accuracy: Life science data on asset ownership often lacks precision, given real-time changes and the complexity of M&A deal contracts. We addressed this by triangulating data sets from different vendors, which significantly improved accuracy.
Asset Aliases: Drugs often appear under multiple names - research codes, brand names, and generics. Standardizing across these identifiers reduced duplicate records and ensured consistent tracking.
Model Reliability: LLM prompts were refined in collaboration with human life science experts, such as our BD team, to better capture the nuances of what “deprioritized” means in practice, improving accuracy across tone and context classifications.

Each of these lessons tightened the system’s precision, reliability, and speed - turning an experimental search process into a scalable intelligence framework. We’re excited to continue improving on it and iterating from future learnings.
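The alias lesson above lends itself to a short illustration. The Python sketch below shows one simple way to standardize names: strip case, punctuation, and whitespace, then map every known alias to a canonical identifier. The function names and the example identifiers are invented for illustration, and real alias resolution would likely need curated synonym lists rather than string cleanup alone.

```python
import re
from typing import Dict, Iterable, Optional


def normalize_name(name: str) -> str:
    """Lowercase and drop anything that is not a letter or digit.

    This makes variants such as 'XYZ-1234' and 'xyz 1234' compare equal.
    """
    return re.sub(r"[^a-z0-9]", "", name.lower())


def build_alias_index(alias_groups: Dict[str, Iterable[str]]) -> Dict[str, str]:
    """Map every normalized alias (and the canonical ID itself) to its canonical ID."""
    index: Dict[str, str] = {}
    for canonical, names in alias_groups.items():
        for name in list(names) + [canonical]:
            index[normalize_name(name)] = canonical
    return index


def resolve(name: str, index: Dict[str, str]) -> Optional[str]:
    """Return the canonical ID for a mention, or None if the name is unknown."""
    return index.get(normalize_name(name))
```

For example, with a made-up group such as `{"ASSET-001": ["XYZ-1234", "examplimab", "Brandexa"]}`, mentions written as "xyz 1234" or "EXAMPLIMAB" would both resolve to the same record, so they count toward a single asset's mention history.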

AI-enabled asset search, now and into the future

With this system, the bottom line is simple - we turned corporate silence into a measurable indicator of deprioritization, giving our asset search teams a new way to see what others may overlook and to act before the market catches up.

Our work here is just beginning. There are many more opportunities ahead, and we’re quickly building toward a future where we can continuously monitor the entire landscape of assets in real time - more to come.

Note: Deprioritized asset search is just one of the many capabilities we’ve built in Atlas, our proprietary AI Drug Hunting platform. To learn more about our platforms, see https://www.formation.bio/technology.
