The Two Types of Loop
Deterministic vs. LLM-as-judge — and which one to reach for
The Idea
You've heard about loops by now — give Claude a goal and it works on its own until it hits it. But there are two main types, and they work in completely different ways. Knowing which is which before you start is the difference between a loop that finishes and one that runs for three hours and never lands.
The whole thing comes down to one question: can the goal be measured, or does it need judgment?
Type 1 — Deterministic Loops
The ones you can measure. You give the loop a target it can check for itself — a number — so it knows, with no ambiguity, exactly when it's done:
- ✓ Every page loads in under 50ms
- ✓ Zero errors in the logs
- ✓ 100% test coverage
- ✓ All 100 test scenarios pass
There's a hard finish line, so the loop can't kid itself. It measures, sees it's not there yet, fixes something, measures again — and stops the moment the number is hit. This is the reliable kind. If you can possibly turn your goal into one of these, do.
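Here's roughly what that measure, fix, measure-again cycle looks like as code. Treat it as a minimal sketch, not a real harness: `run_deterministic_loop`, `measure_load_ms`, and `apply_fix` are hypothetical names, the "page load time" is a toy number standing in for a real measurement, and the `max_iterations` cap is an added safety net rather than something the loop strictly needs.

```python
from typing import Callable


def run_deterministic_loop(
    measure: Callable[[], float],
    improve: Callable[[], None],
    target: float,
    max_iterations: int = 20,
) -> tuple[bool, int, float]:
    """Measure, fix, and re-measure until the metric drops below target.

    Returns (reached_target, fixes_applied, last_measurement).
    """
    value = measure()
    fixes = 0
    # Keep going only while the number says we are not there yet.
    while value >= target and fixes < max_iterations:
        improve()
        fixes += 1
        value = measure()
    return value < target, fixes, value


# Toy stand-in for a real measurement: each fix shaves 30 ms off the load time.
state = {"load_ms": 140.0}


def measure_load_ms() -> float:
    return state["load_ms"]


def apply_fix() -> None:
    state["load_ms"] -= 30.0


result = run_deterministic_loop(measure_load_ms, apply_fix, target=50.0)
print(result)
```

The stop condition is a plain comparison against the target, so there's nothing for the loop to argue with: either the number is under 50 or it isn't.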
Type 2 — LLM as a Judge
For the fuzzy stuff, where there's no clean number to hit. "Refactor until the code's clean." "Improve the docs until they actually make sense." You can't put a number on "clean" or "makes sense" — so instead you let the AI judge its own work against a rubric, and decide when it's good enough.
It's far more powerful for quality, clarity, and design: the things that genuinely need taste. But it's less reliable, because you're trusting its judgment. Give it a vague goal and it'll either stop too early or polish forever, which is exactly why the rubric matters.
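The judge version has the same shape, except the check is an opinion instead of a number. A minimal sketch under some loud assumptions: `judge` and `revise` are hypothetical callables (in practice each would be a model call), the toy versions below are simple string checks so the example runs on its own, and `max_passes` is an assumed budget to guard against the "polish forever" failure.

```python
from typing import Callable

Verdict = dict[str, bool]  # criterion text -> did the judge say it is met?


def run_judge_loop(
    draft: str,
    judge: Callable[[str], Verdict],
    revise: Callable[[str, list[str]], str],
    max_passes: int = 5,
) -> tuple[str, bool, int]:
    """Revise until the judge passes every criterion or the budget runs out.

    Returns (final_draft, all_criteria_met, revisions_made).
    """
    revisions = 0
    while True:
        verdict = judge(draft)
        failed = [criterion for criterion, met in verdict.items() if not met]
        if not failed:
            return draft, True, revisions
        if revisions >= max_passes:
            # Stop instead of polishing forever; report that the bar was not met.
            return draft, False, revisions
        draft = revise(draft, failed)
        revisions += 1


# Toy stand-ins so the sketch runs without a model behind it.
def toy_judge(text: str) -> Verdict:
    return {
        "Includes an example": "example" in text.lower(),
        "Ends with a full stop": text.endswith("."),
    }


def toy_revise(text: str, failed: list[str]) -> str:
    if "Includes an example" in failed:
        return text + ". For example, run the quickstart"
    return text + "."


final, ok, revisions = run_judge_loop("Install the package", toy_judge, toy_revise)
print(ok, revisions, final)
```

Notice that the loop only knows when to stop because the judge returns a yes or no per criterion. Hand it a single vague "is this good?" and there's nothing concrete to fail on.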
Side By Side
Deterministic
- Finish line is a number
- It checks itself — no opinion involved
- Reliable; knows exactly when to stop
- Best for: speed, errors, tests, coverage
LLM as a judge
- Finish line is a rubric
- It grades its own work against your criteria
- More powerful, but less reliable
- Best for: refactors, docs, writing, design
So What's a Rubric?
A rubric is the scorecard you hand the AI so it can judge its own work — a written list of what "good" actually looks like, broken into specific, checkable points. It's what turns "make it better" (which is just vibes — it'll stop wherever) into something the loop can grade itself against on every pass.
The art is writing criteria that are concrete enough to judge. "Clean code" is useless as a rubric. Here's the same goal written so the AI can actually score itself against it:
- ☐ No file is longer than ~300 lines
- ☐ No function is doing more than one job
- ☐ Names say what things actually do — no cryptic abbreviations
- ☐ No duplicated logic — shared code lives in one place
- ☐ A new developer could find where to add a feature in under a minute
- ☐ All the tests still pass
Each line is something the AI can look at a file and honestly say yes or no to. Describe the bar, not the steps — what "done" looks like, not how to get there. The more specific each criterion, the more reliable the judge.
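One way to keep a rubric honest is to store it as data, with every criterion phrased as a yes/no question. This is only a sketch: the `Criterion` class, the keys, and the `key: answer` reply format are all assumptions made up for illustration, not a format any particular tool expects.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Criterion:
    key: str
    question: str  # phrased so the honest answer is yes or no


# The checklist above, restated as questions (keys are hypothetical).
CLEAN_CODE_RUBRIC = [
    Criterion("file_length", "Is every file roughly 300 lines or shorter?"),
    Criterion("single_job", "Does every function do exactly one job?"),
    Criterion("clear_names", "Do names say what things do, with no cryptic abbreviations?"),
    Criterion("no_duplication", "Is shared logic kept in one place, with no duplicates?"),
    Criterion("findable", "Could a new developer find where to add a feature in under a minute?"),
    Criterion("tests_pass", "Do all the tests still pass?"),
]


def build_judge_prompt(rubric: list[Criterion]) -> str:
    """Render the rubric as one question per line for the judge to answer."""
    header = "Answer every question with yes or no, one per line, as 'key: answer'."
    return "\n".join([header] + [f"{c.key}: {c.question}" for c in rubric])


def parse_verdict(rubric: list[Criterion], reply: str) -> dict[str, bool]:
    """Turn the judge's reply into a yes/no per criterion."""
    answers: dict[str, bool] = {}
    for line in reply.splitlines():
        key, separator, value = line.partition(":")
        if separator:
            answers[key.strip()] = value.strip().lower() == "yes"
    # A missing or malformed answer counts as "no", so nothing passes by omission.
    return {c.key: answers.get(c.key, False) for c in rubric}


def is_done(verdict: dict[str, bool]) -> bool:
    return all(verdict.values())


# A made-up reply that skips one question and fails another.
sample_reply = """file_length: yes
single_job: no
clear_names: yes
no_duplication: yes
tests_pass: yes"""
verdict = parse_verdict(CLEAN_CODE_RUBRIC, sample_reply)
print(is_done(verdict), [key for key, met in verdict.items() if not met])
```

Treating a skipped question as a "no" is a deliberate choice here: the loop should only finish when every criterion has an explicit yes.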
The Rule
If you can measure it, measure it
Always try to distill the goal down to something deterministic that the loop can check for itself. A number is more reliable than a judgment, so reach for the rubric only when you genuinely can't draw a clean line.
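Even inside a fuzzy goal there are often pieces you can measure. A file-length limit, for instance, doesn't need a judge at all. The sketch below counts lines directly; the function names, the `*.py` pattern, and the hard cutoff at exactly 300 lines are assumptions for illustration.

```python
from pathlib import Path


def files_over_limit(root: Path, max_lines: int = 300, pattern: str = "*.py") -> dict[str, int]:
    """Return {relative_path: line_count} for every matching file over max_lines."""
    offenders: dict[str, int] = {}
    for path in sorted(root.rglob(pattern)):
        if not path.is_file():
            continue
        with path.open(encoding="utf-8", errors="replace") as handle:
            line_count = sum(1 for _ in handle)
        if line_count > max_lines:
            offenders[path.relative_to(root).as_posix()] = line_count
    return offenders


def goal_met(root: Path, max_lines: int = 300) -> bool:
    """The finish line: zero files over the limit."""
    return not files_over_limit(root, max_lines)
```

Calling `goal_met(Path("src"))` (with `src` as a placeholder directory) gives the loop a plain true or false to stop on, and `files_over_limit` tells it which files to fix next.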
If it needs judgment, give it a rubric
For the fuzzy goals, hand it to the AI as a judge — but never empty-handed. The more specific your rubric, the closer an LLM-judge loop gets to behaving like a deterministic one.
Want to actually run some? The 7 loop examples are mostly deterministic; the Verify Loop skill builds the verification (number or rubric) in for you.