Stu Mason

When 97% Is Good Enough

4 min read

Software engineers have been spoiled by deterministic systems. AI is dragging us into the world every other engineer already lives in.

I can seed an Elasticsearch index with some documents, write a function that queries it, and assert that I'll get back a result with a specific title. Every time. 100%. I can push that to production knowing it works.

Now let's add a hybrid RAG endpoint: embeddings for semantic search, an LLM in the loop.

Same test. Same assertion. And... it fails 3% of the time.

What the fuck is broken?

Nothing's Broken

That 3% failure rate isn't a bug. It's just... how these things work. The model interprets the query slightly differently sometimes. The embedding lands in a slightly different spot in vector space. The result is still good, just not the exact one I asserted on.

And here's what's been doing my head in: every other type of engineer has been dealing with this shit forever.

Civil engineers don't expect concrete to be exactly 4000 PSI. They know it'll be somewhere around there - 3800, 4200, whatever. They design for that uncertainty. Chemical engineers celebrate 95% yield. That's a good day.

But us? Software engineers? We've been living in deterministic paradise. 2 + 2 always equals 4. A function with the same inputs gives the same outputs. Tests are pass/fail, green/red, works/doesn't.

We're spoiled.

We're the Weird Ones

Process engineers deal with humans in the loop. Humans! With error rates! And they just design around it. They know operators will fuck up sometimes, so they build systems that catch errors, reduce them, work despite them.

Meanwhile I'm having an existential crisis because my AI search returned the second-best result instead of the first.

Testing Gets Weird

Traditional test:

it('returns the auth guide', () => {
  const results = search('user authentication');
  expect(results[0].title).toBe('Authentication Guide');
});

Pass or fail. Beautiful.

New reality:

it('returns relevant auth content', () => {
  const results = search('user authentication');
  const relevance = assessRelevance(results[0], 'authentication');
  expect(relevance).toBeGreaterThan(0.8);
});

That shift from toBe to toBeGreaterThan is a whole different philosophy.
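One way that shift plays out in practice: stop asserting on a single run and assert on the pass rate across many runs. This is a sketch, not my actual test suite; the `search` function here is a hypothetical stand-in that fails ~3% of the time, standing in for a real RAG endpoint.

```typescript
// Statistical testing: run the flaky check many times and assert on the
// pass *rate*, not on any single run.

type Result = { title: string };

// Hypothetical flaky search: returns the best result ~97% of the time.
function search(query: string): Result[] {
  return Math.random() < 0.97
    ? [{ title: 'Authentication Guide' }]
    : [{ title: 'OAuth Cheatsheet' }];
}

function passRate(runs: number, check: () => boolean): number {
  let passes = 0;
  for (let i = 0; i < runs; i++) {
    if (check()) passes++;
  }
  return passes / runs;
}

const rate = passRate(500, () =>
  search('user authentication')[0].title === 'Authentication Guide'
);

// The assertion is on the distribution, with a tolerance built in.
console.assert(rate > 0.9, `pass rate ${rate} below tolerance`);
```

The tolerance (here 0.9 against an expected 0.97) is doing the same job as a civil engineer's safety margin: it absorbs the variance you know is there.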

And here's where it gets properly weird: I've started using AI to test AI. Running the search results through Claude to ask "are these results good enough for this query?" and asserting on that.

Using a probabilistic system to validate a probabilistic system.
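The shape of that looks something like this. The judge here is just an async function that scores results for a query; in my setup that function wraps a Claude call, but the stub below stands in for it so the sketch is self-contained. All the names (`Judge`, `judgeAndAssert`) are illustrative, not a real library API.

```typescript
// AI judging AI: an LLM scores the search results, and the test asserts
// on that score against a threshold.

type Judge = (query: string, titles: string[]) => Promise<number>; // 0..1

async function judgeAndAssert(
  query: string,
  titles: string[],
  judge: Judge,
  threshold = 0.8
): Promise<boolean> {
  const score = await judge(query, titles);
  return score >= threshold;
}

// Stub judge: pretend the model scored the results 0.92 for relevance.
const stubJudge: Judge = async () => 0.92;

judgeAndAssert('user authentication', ['Authentication Guide'], stubJudge)
  .then(ok => console.assert(ok, 'judge said results were not good enough'));
```

Note the judge itself is probabilistic too, so in practice you end up applying the same pass-rate thinking to the judge's verdicts.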

But Some Shit Has to Be Exact

You can't be probabilistic with payment processing. Or medical dosing. Or authentication.

"You're probably logged in" doesn't work. "We're pretty sure we sent the money to the right account" doesn't work. "The patient will receive approximately 10ml" really doesn't work.

So we're not replacing deterministic engineering. We're adding a new track alongside it:

Must Be Exact: Financial transactions, authentication/authorization, data integrity, medical systems.

Good Enough Is Fine: Search results, content recommendations, natural language interfaces, summarization.

The skill is knowing which track you're on.

The Contamination Problem

This is what scares me:

const extractedDate = aiModel.parseDate(document); // 95% confident
if (extractedDate < deadline) {
  approveTransaction(); // PERMANENT
}

The probabilistic output flows into deterministic code. The if statement doesn't know it's working with a "probably". It treats that 95%-confident date as gospel truth.

Your database happily stores the maybe-correct value. Your audit logs record the possibly-wrong date. Everything downstream is built on sand but thinks it's on bedrock.
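One way to stop the "probably" from crossing the boundary silently: wrap every probabilistic output with its confidence, and make deterministic code state its tolerance before it can touch the raw value. This is a sketch of the idea under made-up names (`Confident`, `unwrap`), not an existing library.

```typescript
// Tag probabilistic values with their confidence at the AI boundary.
type Confident<T> = { value: T; confidence: number };

// The AI boundary returns tagged values instead of bare ones.
function parseDate(raw: string, confidence: number): Confident<Date> {
  return { value: new Date(raw), confidence };
}

// The only way to get the bare value out is to state your tolerance.
function unwrap<T>(c: Confident<T>, minConfidence: number): T {
  if (c.confidence < minConfidence) {
    throw new Error(
      `confidence ${c.confidence} below required ${minConfidence}`
    );
  }
  return c.value;
}

const extracted = parseDate('2024-03-01', 0.95);

// Deterministic code now has to acknowledge the uncertainty explicitly:
const date = unwrap(extracted, 0.9); // passes: 0.95 >= 0.9
console.assert(date instanceof Date);
```

Now the if statement can't pretend. Either the confidence clears your threshold, or the value never reaches `approveTransaction()` at all.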

What We Do About It

I think we need to get over ourselves. Other engineers have dealt with this forever. They have tolerances, safety margins, confidence intervals, "good enough" thresholds.

They don't pretend the uncertainty doesn't exist. They measure it, plan for it, work with it.

Maybe that's what AI is forcing on us. We're finally joining the rest of engineering where nothing is certain, everything has error bars, and somehow things still work.

Once you accept that 97% is exactly what you need, it's actually liberating.

Welcome to real engineering.
