Bioinfohazards: Jassi Pannu on Controlling Dangerous Data from which AI Models Learn - "The Cognitive Revolution" | AI Builders, Researchers, and Live Player Analysis Recap
Podcast: "The Cognitive Revolution" | AI Builders, Researchers, and Live Player Analysis
Published: 2026-03-11
Duration: 1 hr 43 min
Guests: Jassi Pannu
Summary
Johns Hopkins Assistant Professor Jassi Pannu discusses the urgent need for access-control systems that prevent AI models from learning dangerous biological capabilities, such as designing transmissible and virulent pathogens. She proposes a data-level framework to safeguard critical datasets while advancing beneficial research.
What Happened
Johns Hopkins Assistant Professor Jassi Pannu explains the biosecurity risks posed by AI models capable of leveraging biological data to design dangerous pathogens. She emphasizes the growing threat from extremist groups, lone actors, and even autonomous AI agents, which could exploit open-source data to create transmissible and deadly viruses.
Pannu highlights historical incidents, like the 2012 gain-of-function studies in which researchers made bird flu transmissible between mammals, and current concerns about AI's ability to automate and enhance such experiments. She notes that while these experiments are no longer federally funded, private labs operate with limited visibility and regulation.
The discussion covers how governments and researchers can mitigate risks by implementing a biosecurity data-level framework. This system would classify datasets into tiers of sensitivity, ensuring that only a small fraction of crucial, functional data—such as information linking viral mutations to transmissibility or virulence—requires controlled access.
Pannu explains that most biological data, like raw DNA sequences, can remain open access, while security measures can focus on functional data that connects pathogen traits to pandemic potential. She cites recent successes in data holdout experiments with AI models like Evo2 and ESM3, which filtered training data to eliminate dangerous capabilities while retaining performance for beneficial tasks.
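The holdout idea described above can be sketched in code. The taxonomy labels, blocklist, and record format below are hypothetical illustrations; the actual Evo2 and ESM3 filtering pipelines are more involved and work against curated viral taxonomy databases.

```python
# Illustrative sketch of a training-data holdout filter in the spirit of the
# Evo2-style experiments described above. All labels and data are made up.

def filter_training_set(records, blocked_taxa):
    """Split records into kept vs held-out, based on taxonomy lineage overlap."""
    kept, held_out = [], []
    for rec in records:
        lineage = set(rec["lineage"])   # e.g. {"Viruses", "Orthomyxoviridae", ...}
        if lineage & blocked_taxa:
            held_out.append(rec)        # excluded from training entirely
        else:
            kept.append(rec)
    return kept, held_out

# Hypothetical blocklist: viral families containing human-infecting viruses.
BLOCKED = {"Orthomyxoviridae", "Coronaviridae", "Filoviridae"}

records = [
    {"id": "seq1", "lineage": ["Bacteria", "Escherichia"]},
    {"id": "seq2", "lineage": ["Viruses", "Orthomyxoviridae", "Influenza A"]},
    {"id": "seq3", "lineage": ["Eukaryota", "Homo sapiens"]},
]

kept, held_out = filter_training_set(records, BLOCKED)
print([r["id"] for r in kept])      # the bacterial and human records survive
print([r["id"] for r in held_out])  # the influenza record is held out
```

The point of the sketch is that the filter operates on metadata before training ever starts, so the model never sees the held-out sequences at all.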
The episode explores the potential for integrating robotics and AI to accelerate drug and vaccine development, while emphasizing physical-world bottlenecks like manufacturing and distribution. Pannu advocates for systematic data generation through robotic automation to better understand causal relationships in biology.
She introduces a defense-in-depth strategy involving delay, deter, detect, and defend mechanisms. These include mandatory DNA synthesis screening, global pathogen surveillance systems, and built-environment defenses like far-UVC sterilization to prevent airborne transmission.
Pannu underscores the importance of international cooperation and private-sector engagement to implement biosecurity measures effectively. She also calls for better information-sharing systems among gene synthesis companies to prevent malicious actors from exploiting current gaps.
The conversation ends with optimism about how a layered approach, combining data controls, surveillance, and defenses, could help mitigate catastrophic biosecurity risks while fostering safe scientific progress.
Key Insights
- In 2012, researchers made bird flu transmissible between mammals in gain-of-function studies—experiments so risky that federal funding was pulled. Yet private labs continue to operate with limited oversight, raising questions about how to regulate such dangerous capabilities.
- AI models like Evo2 and ESM3 have shown that filtering out 'functional data'—specific links between viral mutations and traits like transmissibility—can prevent models from gaining capabilities to design deadly pathogens while preserving their usefulness for safe research.
- Mandatory DNA synthesis screening is a critical defense against biosecurity threats. By flagging and blocking the creation of DNA sequences linked to dangerous pathogens, this measure aims to prevent malicious actors from engineering viruses in labs.
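The screening concept in the bullet above can be illustrated with a toy example. Real synthesis screeners use curated databases of sequences of concern and protein-level homology search over long windows; the exact k-mer match, window size, and sequences below are invented purely for illustration.

```python
# Toy sketch of sequence-of-concern screening. Production systems are far more
# sophisticated (homology search, curated databases, long windows); this only
# shows the shape of the idea: flag orders that overlap a watchlist.

K = 8  # toy window size; real screening covers much longer windows

def build_index(sequences_of_concern, k=K):
    """Index every k-mer appearing in any flagged sequence."""
    index = set()
    for seq in sequences_of_concern:
        for i in range(len(seq) - k + 1):
            index.add(seq[i:i + k])
    return index

def screen_order(order_seq, index, k=K):
    """Return True if a synthesis order shares any k-mer with a flagged sequence."""
    return any(order_seq[i:i + k] in index
               for i in range(len(order_seq) - k + 1))

flagged = build_index(["ATGCGTACGTTAGCAT"])     # made-up "sequence of concern"
print(screen_order("GGGTACGTTAGGG", flagged))   # overlaps the watchlist -> True
print(screen_order("AAAAAAAAAAAAAA", flagged))  # no overlap -> False
```

A flagged order would then be routed to human review rather than silently rejected, which is where the information-sharing systems Pannu calls for come in.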
- Robotic automation could generate systematic biological data to uncover causal relationships, speeding up drug and vaccine development. However, real-world bottlenecks like manufacturing and distribution remain hurdles, even if AI accelerates discovery.
Key Questions Answered
What is Jassi Pannu's biosecurity data-level framework from The Cognitive Revolution?
Pannu's framework proposes classifying biological data into five tiers based on sensitivity, limiting access to datasets linking pathogen traits to pandemic potential. It aims to preserve open access for most data while securing functional data through trusted research environments and controlled sharing.
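A tiered access check like the one described could be sketched as follows. The five tier names and the access rule here are invented for illustration; the episode says the framework has five sensitivity tiers but does not enumerate labels like these.

```python
# Hedged sketch of a five-tier data-access check. Tier names and the rule
# below are hypothetical, not Pannu's actual tier definitions.
from enum import IntEnum

class Tier(IntEnum):
    OPEN = 1          # e.g. raw DNA sequences, fully open access
    REGISTERED = 2    # open after identity registration
    REVIEWED = 3      # released after review of research purpose
    TRUSTED_ENV = 4   # usable only inside a trusted research environment
    CONTROLLED = 5    # e.g. mutation-to-transmissibility links, tightly held

def may_access(dataset_tier: Tier, clearance: Tier) -> bool:
    """A requester may access any dataset at or below their clearance tier."""
    return clearance >= dataset_tier

print(may_access(Tier.OPEN, Tier.REGISTERED))      # True
print(may_access(Tier.CONTROLLED, Tier.REVIEWED))  # False
```

Encoding tiers as an ordered enum keeps the rule to a single comparison: most data sits in the low tiers and stays open, while only the top tier needs controlled-sharing infrastructure.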
How do AI models like Evo2 address biosecurity risks?
Evo2 filtered out human-infecting viral sequences from its training data, significantly reducing its ability to perform harmful tasks like viral protein design while maintaining performance on other biological tasks. This approach demonstrates how data controls can limit dangerous capabilities.
What are the risks of AI in gain-of-function research as discussed on The Cognitive Revolution?
AI could automate and enhance gain-of-function-style research, lowering barriers for bad actors to design transmissible and deadly pathogens. Pannu warns that current gaps in data regulation and lab oversight exacerbate these risks.