Founder Insight

Why Science Only Publishes 5% and Why That's Killing Progress

Jorge Colindres, Cofounder at Radical AI


There’s a fundamental asymmetry built into how science works today, and it’s invisible to most researchers. When a materials scientist runs 100 experiments and 5 work, they publish a paper about the 5. The 95 failures? They live in a lab notebook. They disappear. They never enter the scientific record.

This bias isn’t malicious. It’s structural. Journal editors want impact. Researchers want to advance their careers. Publishing failures doesn’t move either needle. So the entire field of materials science — and most fields, really — is operating on 5% of its actual data.

Jorge Colindres, cofounder of Radical AI, sees this as the core bottleneck. His company has built a robotic lab that doesn’t publish papers. It captures everything. Every failed hypothesis, every material that didn’t have the right oxidation resistance, every experiment that taught the system what not to try. And that’s where the power lies.

The hidden cost of ignoring failure

In traditional materials science, a researcher designs an experiment, waits weeks for results, publishes (if it works), and repeats. The knowledge loop is slow, and the record it leaves is truncated. You only see the winners.

“Science publishes only wins. When a researcher publishes a paper about the one experiment that worked, nobody talks about the 95 that didn’t,” Jorge explains. “That failure data never enters the scientific record. It lives in the mind of the scientist.”

The impact compounds. When the next researcher tackles a similar problem, they can’t learn from those 95 failures. They start from scratch. They’ll likely fail in the same ways. And when their few successes come, they publish those and hide the failures too.

This is how a field stays stuck. The accumulated knowledge of what doesn’t work — which is arguably more valuable than what does — never leaves the lab. Each generation rediscovers the same dead ends.

Why failure data is more valuable than success

Machine learning engineers understand this intuitively. Every classification model needs negative samples. If you only train on examples of “cat” without ever showing “not cat,” the model learns nothing about where the boundary actually is.

Materials science has been doing the equivalent for decades: training on only the winners.
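To make that concrete, here’s a minimal sketch with made-up numbers: a one-feature “composition” dataset where the 5 wins alone give a classifier nothing to separate, and the 95 recorded failures are what make the boundary learnable. Nothing here is Radical’s actual code; it’s the cat/not-cat point in materials clothing.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Synthetic "experiments": one feature, say a composition ratio.
successes = rng.normal(loc=0.7, scale=0.05, size=(5, 1))   # the 5 that worked
failures = rng.normal(loc=0.3, scale=0.15, size=(95, 1))   # the 95 that didn't

X = np.vstack([successes, failures])
y = np.array([1] * 5 + [0] * 95)

# The boundary is only learnable because the failures are in the data.
# With successes alone there is a single class and nothing to separate.
clf = LogisticRegression(class_weight="balanced").fit(X, y)

# Score candidate compositions across the whole range.
candidates = np.linspace(0, 1, 11).reshape(-1, 1)
print(clf.predict_proba(candidates)[:, 1].round(2))
```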

“It’s one of the most important things that you could capture as a scientist,” Jorge says. “So much of what we do in the lab is through iteration. It is through the failures that we learn how to eventually come up with good experiments.”

The negative data tells you where the boundaries are. It tells you which material properties are incompatible. It shows you the directionality. If your goal is hardness but your material keeps corroding, the failure data reveals the trade-off. It’s the map of the landscape, not just the peak.

When you feed a machine learning model this full spectrum — wins and losses, properties that work together and those that fight each other — something shifts. The model stops just predicting isolated wins. It understands the topology. It can see a path through the landscape instead of just naming the hilltops.
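One way to see the “path through the landscape” effect is search-space pruning. Here’s a toy sketch, with every name and threshold assumed: each recorded failure rules out a neighborhood of composition space, so the next round of candidates skips regions already known not to work.

```python
import numpy as np

rng = np.random.default_rng(1)

# Compositions (two assumed features each) that already failed.
failures = rng.uniform(0.0, 1.0, size=(95, 2))
radius = 0.05  # assumed "too close to a known failure" cutoff

def worth_trying(candidate, failed, r):
    """True if the candidate is farther than r from every known failure."""
    return bool(np.all(np.linalg.norm(failed - candidate, axis=1) > r))

# Propose candidates, then drop everything inside a known dead end.
candidates = rng.uniform(0.0, 1.0, size=(1000, 2))
survivors = [c for c in candidates if worth_trying(c, failures, radius)]

print(f"{len(survivors)} of {len(candidates)} candidates left after pruning")
```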

Radical captures all of it: every test result, every material that failed a thermal stress test, every composition that had the wrong oxidation profile. The human scientists augment the data with context, and then the ML models iterate based on the full picture. That’s how they validated hundreds of high-entropy alloys in months while academia spent 40 years validating 3,500.
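At the data level, “captures all of it” implies a record that treats failures as first-class results. A hypothetical schema sketch (field names and values are illustrative, not Radical’s actual format):

```python
from dataclasses import dataclass

@dataclass
class ExperimentRecord:
    composition: dict[str, float]    # element -> atomic fraction
    measured: dict[str, float]       # every measured property, pass or fail
    passed: bool                     # did it meet the target spec?
    failure_mode: str | None = None  # e.g. "excessive_oxidation"
    notes: str = ""                  # human-added context

# A failed experiment is stored with the same fidelity as a win.
record = ExperimentRecord(
    composition={"Fe": 0.2, "Cr": 0.2, "Ni": 0.2, "Co": 0.2, "Mn": 0.2},
    measured={"hardness_GPa": 5.1, "oxide_mass_gain_mg_cm2": 2.3},
    passed=False,
    failure_mode="excessive_oxidation",
    notes="Oxide scale spalled after 10 thermal cycles.",
)
print(record.failure_mode)
```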

What changes when science captures failure

If this seems obvious, it’s because it is. But it’s not how the incentive system works. Researchers don’t get tenure for failed experiments. Journals don’t publish negative results. Funding agencies don’t reward “here’s what doesn’t work.”

But what if they did? What if the scientific record included failure data by default?

Jorge sees this as the foundation of a new scientific method: “If scientific papers included failure data, the entire field would move faster because each researcher could see where the actual boundaries are.”

The first person to tackle a problem would run 100 experiments. They’d publish all 100 results. The next researcher wouldn’t start from zero. They’d start from the landscape map. They’d know which combinations are dead ends. They could focus on the 5% frontier instead of rediscovering the 95% swamp.

This is what Radical is building: a company where the scientific method doesn’t hide its losses. Everything gets captured. Everything gets learned from. And the feedback loop compresses decades into months.

The practical implication

For materials science, this matters acutely. Materials are the bottleneck on fusion, hypersonic flight, next-gen batteries, better solar cells. We’re not stuck because the theory is wrong. We’re stuck because the iteration is slow. And the iteration is slow because we refuse to learn from failure at scale.

When science finally embraces the full data set — when failure data is as publishable as success — the entire field accelerates. Not because the science changes, but because the learning loop tightens.

That’s why Jorge read 200+ research papers and concluded: AI isn’t the missing ingredient in materials science. The missing ingredient is how we do science. Fix that, and everything compounds.

FAQ

Why don’t scientists publish their failure data today?

The academic incentive system rewards novel results, not comprehensive datasets. Publishing 95 failures alongside 5 successes doesn’t improve a researcher’s reputation. Journals have limited space. Funding agencies care about outcomes. The system is built to highlight winners, not expose the learning process.

What’s the difference between Radical’s lab data and data from a traditional materials lab?

Traditional labs are manual and lossy. A scientist takes notes, waits weeks for results, publishes the win. Radical’s lab is fully automated and systematic. Every piece of information — every microscopy scan, every composition, every property measurement — is captured instantly and fed directly to ML models. There’s no data loss, no manual transcription, no selection bias toward wins.

How much faster can you iterate with failure data in the loop?

The difference is dramatic. Radical validated hundreds of high-entropy alloys in months, where academia validated 3,500 total over 40 years. That’s not because Radical’s lab is faster at running tests — though it is. It’s because the ML models understand the full landscape. They’re not just guessing. They’re learning from the 95%.

Does adding failure data make machine learning models less accurate?

The opposite. Negative examples are just as important as positive ones for training any classifier. A model trained on only winners doesn’t understand the boundary. A model trained on the full distribution — successes and failures — learns where the constraints actually are and adjusts accordingly.

Can traditional labs adopt this without building a Radical-style robotic system?

Partially. Any lab that commits to capturing and publishing comprehensive failure data will see immediate benefits in its own iteration. But systematic capture is hard to do manually. Humans are slow and selective. You’d need better data infrastructure just to keep up with the volume.

How does this apply beyond materials science?

This is a general principle. Drug discovery, semiconductor design, battery chemistry, protein folding — any domain where experimentation generates tons of data and most experiments fail. If the field captures and learns from failure at scale, it compresses timelines dramatically.

Doesn’t more data make the problem harder to solve, not easier?

It does add complexity if you’re doing this manually. But for machine learning, negative data is clarifying. It shrinks the search space. It tells you where not to look. Adding 10,000 failure points helps the model understand the topology better than having only 100 success points.

If Radical captured all this data, isn’t that a competitive advantage they’d want to keep secret?

Yes. The data itself is proprietary and valuable. But the principle — that failure data matters more than success data — is not secret. It’s a structural insight about how science should work. That principle is worth more than any single dataset because it applies across every scientific domain.


Full episode coming soon

This conversation with Jorge Colindres is on its way. Check out other episodes in the meantime.

Visit the Channel
