Fundamentals

The Two Types of Loop

Deterministic vs. LLM-as-judge — and which one to reach for

The Idea

You've heard about loops by now — give Claude a goal and it works on its own until it hits it. But there are two main types, and they work in completely different ways. Knowing which is which before you start is the difference between a loop that finishes and one that runs for three hours and never lands.

The whole thing comes down to one question: can the goal be measured, or does it need judgment?

Type 1 — Deterministic Loops

The ones you can measure. You give the loop a target it can check for itself — a number — so it knows, with no ambiguity, exactly when it's done:

✓ Every page loads in under 50ms
✓ Zero errors in the logs
✓ 100% test coverage
✓ All 100 test scenarios pass

There's a hard finish line, so the loop can't kid itself. It measures, sees it's not there yet, fixes something, measures again — and stops the moment the number is hit. This is the reliable kind. If you can possibly turn your goal into one of these, do.
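Here's a rough sketch of that measure, fix, measure-again cycle in Python. The names (`run_deterministic_loop`, `measure`, `improve`, `max_iterations`) are made up for illustration, the "page" is a toy stand-in for a real measurement, and the iteration cap is an added assumption so a loop that can't reach the target still stops.

```python
from typing import Callable, Tuple


def run_deterministic_loop(
    measure: Callable[[], float],
    improve: Callable[[], None],
    target: float,
    max_iterations: int = 20,
) -> Tuple[bool, int]:
    """Measure, fix, and measure again until the value drops below target.

    Returns (reached_target, fixes_applied). The cap is an assumption added
    here so the loop cannot run forever.
    """
    for fixes_applied in range(max_iterations):
        if measure() < target:
            return True, fixes_applied  # hard finish line: stop immediately
        improve()
    return measure() < target, max_iterations


# Toy stand-in for a real measurement: a page that starts at 120 ms and
# gets 20 ms faster with each (pretend) fix.
page = {"load_ms": 120.0}


def measure_load_ms() -> float:
    return page["load_ms"]


def apply_fix() -> None:
    page["load_ms"] -= 20.0


reached, fixes = run_deterministic_loop(measure_load_ms, apply_fix, target=50.0)
print(reached, fixes)  # True 4
```

In a real loop, `measure` would run your benchmark or test suite and `improve` would be the step where the AI changes something. The point is only that the stop condition is a comparison, not an opinion.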

Type 2 — LLM as a Judge

For the fuzzy stuff, where there's no clean number to hit. "Refactor until the code's clean." "Improve the docs until they actually make sense." You can't put a number on "clean" or "makes sense" — so instead you let the AI judge its own work against a rubric, and decide when it's good enough.

It's far more powerful for quality, clarity, and design, the things that genuinely need taste. But it's less reliable, because you're trusting its judgment. Give it a vague goal and it'll either stop too early or polish forever. Which is exactly why the rubric matters.
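The shape of a judge loop can be sketched the same way. Everything named here is hypothetical: `run_judge_loop`, the `judge` and `revise` callables, and the two-item rubric. The stub judge below is a keyword check standing in for a real model call, so this shows the control flow only, not how a model actually grades.

```python
from typing import Callable, Dict, List, Tuple

Verdict = Dict[str, bool]  # criterion -> did the work meet it?


def run_judge_loop(
    draft: str,
    judge: Callable[[str, List[str]], Verdict],
    revise: Callable[[str, List[str]], str],
    rubric: List[str],
    max_iterations: int = 5,
) -> Tuple[str, bool, int]:
    """Grade the draft against the rubric and revise until every item passes.

    Returns (final_draft, accepted, revisions_made). The cap guards against
    the "polish forever" failure mode.
    """
    for revisions_made in range(max_iterations):
        verdict = judge(draft, rubric)
        failed = [c for c in rubric if not verdict.get(c, False)]
        if not failed:
            return draft, True, revisions_made
        draft = revise(draft, failed)
    final = judge(draft, rubric)
    return draft, all(final.get(c, False) for c in rubric), max_iterations


RUBRIC = ["has an install section", "has a usage example"]


def stub_judge(draft: str, rubric: List[str]) -> Verdict:
    # Stand-in for a model call: a real judge would read the draft and the
    # rubric and answer yes or no per criterion.
    checks = {
        "has an install section": "Install" in draft,
        "has a usage example": "Usage" in draft,
    }
    return {c: checks.get(c, False) for c in rubric}


def stub_revise(draft: str, failed: List[str]) -> str:
    # Stand-in for the AI's revision step: fix one failed criterion per pass.
    additions = {
        "has an install section": "\nInstall: run the setup script.",
        "has a usage example": "\nUsage: my-tool --help",
    }
    return draft + additions[failed[0]]


final_draft, accepted, revisions = run_judge_loop(
    "My tool.", stub_judge, stub_revise, RUBRIC
)
print(accepted, revisions)  # True 2
```

Notice that the loop only stops early when the list of failed criteria is empty. With a vague rubric, that list is whatever the judge feels like on the day, which is where the unreliability comes from.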

Side By Side

Deterministic

Finish line is a number
It checks itself — no opinion involved
Reliable; knows exactly when to stop
Best for: speed, errors, tests, coverage

LLM as a judge

Finish line is a rubric
It grades its own work against your criteria
More powerful, but less reliable
Best for: refactors, docs, writing, design

So What's a Rubric?

A rubric is the scorecard you hand the AI so it can judge its own work — a written list of what "good" actually looks like, broken into specific, checkable points. It's what turns "make it better" (which is just vibes — it'll stop wherever) into something the loop can grade itself against on every pass.

The art is writing criteria that are concrete enough to judge. "Clean code" is useless as a rubric. Here's the same goal written so the AI can actually score itself against it:

Rubric: "the code is clean"

☐ No file is longer than ~300 lines
☐ No function is doing more than one job
☐ Names say what things actually do — no cryptic abbreviations
☐ No duplicated logic — shared code lives in one place
☐ A new developer could find where to add a feature in under a minute
☐ All the tests still pass

Each line is something the AI can check against a file and honestly answer yes or no. Describe the bar, not the steps: what "done" looks like, not how to get there. The more specific each criterion, the more reliable the judge.
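One way to make those yes-or-no answers usable by a loop is to keep the rubric as plain data and ask for one answer per line. This is a sketch under assumptions: the `<number>: yes/no` reply format, `build_judge_prompt`, and `parse_verdict` are all invented here, and the sample reply is hand-written rather than real model output.

```python
from typing import Dict, List

CLEAN_CODE_RUBRIC: List[str] = [
    "No file is longer than ~300 lines",
    "No function is doing more than one job",
    "Names say what things actually do, with no cryptic abbreviations",
    "No duplicated logic; shared code lives in one place",
    "A new developer could find where to add a feature in under a minute",
    "All the tests still pass",
]


def build_judge_prompt(rubric: List[str], work: str) -> str:
    """Turn the rubric into a prompt that asks for one yes/no per criterion."""
    lines = [
        "Grade the work below against each criterion.",
        "Reply with one line per criterion: '<number>: yes' or '<number>: no'.",
        "",
    ]
    for number, criterion in enumerate(rubric, start=1):
        lines.append(f"{number}. {criterion}")
    lines += ["", "Work:", work]
    return "\n".join(lines)


def parse_verdict(reply: str, rubric: List[str]) -> Dict[str, bool]:
    """Parse '<number>: yes/no' lines; anything unanswered counts as a fail."""
    verdict = {criterion: False for criterion in rubric}
    for line in reply.splitlines():
        number, separator, answer = line.partition(":")
        number = number.strip()
        if not separator or not number.isdecimal():
            continue  # ignore chatter that isn't a numbered answer
        index = int(number) - 1
        if 0 <= index < len(rubric):
            verdict[rubric[index]] = answer.strip().lower() == "yes"
    return verdict


# Hand-written sample reply, standing in for what a judge might send back.
sample_reply = "1: yes\n2: no\n3: yes\n4: yes\n5: yes\n6: yes"
verdict = parse_verdict(sample_reply, CLEAN_CODE_RUBRIC)
unmet = [c for c, passed in verdict.items() if not passed]
print(unmet)  # ['No function is doing more than one job']
```

Treating a missing answer as a fail is a deliberate choice in this sketch: a judge that skips a criterion shouldn't be able to end the loop early.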

The Rule

If you can measure it, measure it

Always try to distill the goal down to something deterministic that the loop can check for itself. A number is more reliable than a judgment, every time, so reach for the rubric only when you genuinely can't draw a clean line.
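Some rubric items turn out to be measurable once you look at them closely. As an example, "no file is longer than ~300 lines" can be checked by counting rather than judging. The sketch below assumes a hard cutoff of 300 (the original says roughly 300), Python source files, and a hypothetical helper name `oversized_files`; the temporary directory is only there to make the demo self-contained.

```python
import tempfile
from pathlib import Path
from typing import List


def oversized_files(root: Path, max_lines: int = 300, pattern: str = "*.py") -> List[str]:
    """Return the files under root (relative paths) with more than max_lines lines."""
    too_long = []
    for path in sorted(root.rglob(pattern)):
        line_count = len(path.read_text(encoding="utf-8").splitlines())
        if line_count > max_lines:
            too_long.append(path.relative_to(root).as_posix())
    return too_long


# Demo on a throwaway directory with one short file and one long file.
with tempfile.TemporaryDirectory() as tmp:
    demo_root = Path(tmp)
    (demo_root / "small.py").write_text("x = 1\n" * 10, encoding="utf-8")
    (demo_root / "big.py").write_text("x = 1\n" * 301, encoding="utf-8")
    demo_result = oversized_files(demo_root)

print(demo_result)  # ['big.py']
```

The finish line for that criterion is then simply an empty list, which leaves the judge free to spend its attention on the items that really do need taste.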

If it needs judgment, give it a rubric

For fuzzy goals, let the AI act as the judge, but never send it in empty-handed. The more specific your rubric, the closer an LLM-judge loop gets to behaving like a deterministic one.

Want to actually run some? The 7 loop examples are mostly deterministic; the Verify Loop skill builds the verification (number or rubric) in for you.
