
Measuring code coherence: what a Vibe Drift Score actually tells you

June 11, 2026 · 8 min read

Code coherence is how much a codebase agrees with itself. It is not the same as "clean" and not the same as "complex", and the Vibe Drift Score puts a number on it. This post explains what the 0-100 score and its letter grade measure, the five detectors behind it, what one finding looks like, and how to read your own result over time.

Most quality tools answer a different question. A linter asks "does this line break a rule?" A complexity check asks "how tangled is this function?" A coverage report asks "how much is tested?" None of them ask whether a new file contradicts the fifty files before it. That gap is where AI-written code drifts, and a coherence metric is what closes it.

What code coherence is, and what it isn't

Three things get mixed up all the time, so it helps to pull them apart:

+ Clean is about readability: clear names, small functions, no dead branches. A single file can be clean on its own.
+ Complex is about intricacy: branches, nesting, coupling. You can also measure it file by file.
+ Coherent is about agreement between files: does this module handle errors, auth, and data access the same way the rest of the project does? Coherence only exists in relation to the whole.

A codebase can be clean and incoherent at once: every file is tidy, but half throw typed errors and half return plain { status, error } objects. It can be coherent and a bit complex: dense, but every file solves its problem the same way. Coherence is the axis AI coding tools quietly erode, because each session re-decides conventions it has no memory of.
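To make the "clean but incoherent" case concrete, here is a minimal sketch of two tidy functions that disagree about how to report the same kind of failure. Every name in it (NotFoundError, loadUser, loadOrder, describeFailure) is hypothetical and exists only for illustration.

```typescript
// Hypothetical example: two readable functions, two error conventions.

class NotFoundError extends Error {
  resource: string;
  constructor(resource: string) {
    super(`${resource} not found`);
    this.name = "NotFoundError";
    this.resource = resource;
  }
}

type Result = { status: number; error: string | null };

// Convention A: signal failure by throwing a typed error.
function loadUser(id: string): { id: string } {
  if (id === "") throw new NotFoundError("user");
  return { id };
}

// Convention B: signal the same kind of failure by returning a plain object.
function loadOrder(id: string): Result {
  if (id === "") return { status: 404, error: "order not found" };
  return { status: 200, error: null };
}

// A caller now has to handle both conventions for one kind of problem.
function describeFailure(run: () => unknown): string {
  try {
    const value = run();
    if (typeof value === "object" && value !== null && "error" in value) {
      const err = (value as Result).error;
      return err === null ? "ok" : `returned: ${err}`;
    }
    return "ok";
  } catch (e) {
    return e instanceof Error ? `threw: ${e.message}` : "threw: unknown";
  }
}
```

Neither function is wrong on its own. The cost shows up in describeFailure, which needs both a try/catch and a return-value check to cover one kind of failure.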

Coherence isn't how good any one file is. It's whether the files agree with each other about how the app should behave.

What the 0-100 score and grade mean

The Vibe Drift Score is one number from 0 to 100, paired with a letter grade, that says how coherent your codebase is. Higher means the codebase agrees with itself more. Lower means more files break from the patterns the rest of the project set. The grade is a quick band on top of the number, so one glance tells you whether you are looking at a settled codebase or a fracturing one.

Grade   Score    Meaning
A       90-100   settled
B       80-89    mostly aligned
C       70-79    some drift
D       60-69    real contradictions
F       0-59     fracturing

Each grade is a band on the 0-100 score, not a pass/fail verdict.
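The bands above translate directly into a lookup. This is a sketch of the mapping as the post describes it, not VibeDrift's own code. The gradeFor name is made up, and treating fractional scores by threshold (89.5 is still a B) is an assumption, since the post only lists whole-number bands.

```typescript
type Grade = "A" | "B" | "C" | "D" | "F";

// Map a 0-100 score onto the letter bands listed in this post.
// Assumption: a fractional score falls into the band whose lower bound it has reached.
function gradeFor(score: number): Grade {
  if (!Number.isFinite(score) || score < 0 || score > 100) {
    throw new RangeError(`score must be between 0 and 100, got ${score}`);
  }
  if (score >= 90) return "A";
  if (score >= 80) return "B";
  if (score >= 70) return "C";
  if (score >= 60) return "D";
  return "F";
}
```

For example, gradeFor(64) returns "D" and gradeFor(90) returns "A".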

The key point: a low score does not mean your code is broken. Every drifting file can compile, pass tests, pass the linter, and serve real users. The score measures contradiction, not correctness. A 58 doesn't say "42% of your code is wrong"; it says "a fair share of your files behave differently from their peers, and here is where."

Read the raw number for one thing: to find your worst contradictions on the first scan. After that, the direction matters far more than the digit, which is why the score is built to be tracked, not framed.

The five detectors behind the score

The score isn't one heuristic. It rolls up five detectors, each looking at a different way the codebase can disagree with itself:

Architectural consistency — do files that solve the same kind of problem solve it the same way, or are there competing approaches (the repository pattern in seven services, raw SQL in the eighth)?
Security posture — are auth, input validation, and rate limiting applied everywhere, or did some routes get built in a session where the security pattern wasn't in context?
Redundancy — is there duplicate logic: functions the AI rebuilt without knowing an equivalent already existed?
Convention adherence — do naming, imports, error shapes, and async patterns stay the same across the project?
Scaffolding hygiene — is there generated code that does nothing: phantom CRUD handlers that are never routed, tested, or called?

Each detector feeds the score, and the report breaks the score out by category so you can see which one is dragging it down. Two codebases can score low for completely different reasons: one has three competing data-access patterns; the other is missing auth on a couple of admin routes. Same low number, different fix.

What a single finding looks like

The score is the headline; the findings are where the value is. Every finding has the same three parts, and that shape is the whole point of a coherence metric:

+ The dominant behavior: what the rest of the codebase already does. This comes from your code, not from an outside rulebook.
+ The deviating files: the exact files and lines that break from that behavior.
+ A targeted fix: what to change to bring the outliers back in line.

For example: "14 of 16 handlers call requireAuth(req) before touching the database. orderHandler.ts and reportHandler.ts don't. Add the auth check to match." That is something you can act on; a raw score isn't. The number tells you something is off; the finding tells you what, where, and what to do about it.
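The three-part shape can be sketched as a small data type plus a majority count. This is an illustration of the idea only. The Finding interface, the findDeviation function, and the pattern labels are all hypothetical, and a plain majority vote is an assumption about how a "dominant behavior" might be picked, not a description of how VibeDrift does it.

```typescript
// Hypothetical shape of a finding: dominant behavior, deviating files, fix.
interface Finding {
  dominant: string;     // what most files already do
  deviating: string[];  // files that do something else
  fix: string;          // suggested change
}

// Input: one observed pattern label per file, e.g. "requireAuth" or "none".
// Returns null when there is nothing to compare or every file agrees.
function findDeviation(patternByFile: Record<string, string>): Finding | null {
  const counts = new Map<string, number>();
  for (const pattern of Object.values(patternByFile)) {
    counts.set(pattern, (counts.get(pattern) ?? 0) + 1);
  }

  // Pick the most common pattern; on a tie, the first one seen wins.
  let dominant: string | null = null;
  let best = 0;
  for (const [pattern, n] of Array.from(counts.entries())) {
    if (n > best) {
      dominant = pattern;
      best = n;
    }
  }
  if (dominant === null) return null;

  const deviating = Object.keys(patternByFile)
    .filter((file) => patternByFile[file] !== dominant)
    .sort();
  if (deviating.length === 0) return null;

  return {
    dominant,
    deviating,
    fix: `Align ${deviating.join(", ")} with "${dominant}"`,
  };
}
```

Given three handlers labeled "requireAuth" and orderHandler.ts labeled "none", the result names "requireAuth" as dominant and orderHandler.ts as the only deviating file.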

VibeDrift doesn't enforce rules you wrote down. It learns the conventions your code already follows and flags where they break. The "dominant pattern" in every finding comes from your own files, which is why it survives refactors that a static .eslintrc would not.

Local vs deep: two ways to compute the score

There are two ways to score, and they differ in how deeply they read your code.

Local scan

The local scan reads structure only. It fingerprints your code's patterns (its "Code DNA") and compares files against each other, without ever sending anything off your machine. It is one command, runs in a couple of seconds, and is free, open source, and unlimited:

npx @vibedrift/cli .

That gives you a real score from the same five detectors, built from what is visible in the code's structure. For most day-to-day work, it is the number you will look at.

Deep scan

The deep scan adds Claude-checked meaning on top. It catches drift you can't see from structure alone: two functions that look different but do the same thing, or a function whose name promises one thing while its body does another. You get a fuller report, with these duplicate and name-vs-behavior findings and their fixes:

vibedrift . --deep

The deep scan uses your deep-scan budget (the free tier includes one a month). Only function snippets are read, never whole files and never git history. Use the local scan as your fast, always-on signal, and the deep scan when you want the score checked by something that understands what the code actually does, not just how it looks.
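To picture the two kinds of drift described above, here is a small, made-up example: a pair of helpers that look different but compute the same thing, and a function whose name promises validation while its body only normalizes. The function names are hypothetical, and whether a given scan would flag these exact cases is an assumption.

```typescript
// Duplicate in disguise: a loop and a reduce that both sum a list of numbers.
function sumPrices(prices: number[]): number {
  let total = 0;
  for (const price of prices) {
    total += price;
  }
  return total;
}

function orderTotal(items: number[]): number {
  return items.reduce((acc, n) => acc + n, 0);
}

// Name-vs-behavior mismatch: the name says "validate", but nothing is checked.
// The body only trims and lowercases, so an invalid address passes straight through.
function validateEmail(email: string): string {
  return email.trim().toLowerCase();
}
```

A structural comparison sees a for loop in one helper and a reduce call in the other, so they do not look alike. Spotting that they are the same function, or that validateEmail validates nothing, takes a reading of what the code means.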

How to read your own score

Here is what a first run looks like. The numbers below are illustrative, not a benchmark — your codebase will land wherever it lands:

Scanning 63 files · 7,420 LOC · TypeScript
✓ Static analysis .............. 0.9s
✓ Cross-file drift ............. 0.5s
✓ Code DNA ..................... 0.04s

58/100 · Grade F · 7 findings

  Architectural consistency .... 51   3 competing data-access patterns
  Security posture ............. 62   auth missing on 2 admin routes
  Redundancy ................... 70   2 near-duplicate helpers
  Convention adherence ......... 49   error shapes split 2 ways
  Scaffolding hygiene .......... 66   1 unrouted CRUD module

Report: ./vibedrift-report.html

Read it in this order:

Find the lowest category, not the lowest line. Here the score is 58, but the real story is architectural consistency at 51 and convention adherence at 49. Those two are dragging the number; fixing them moves it the most.
Open the worst findings. "3 competing data-access patterns" means the same problem is being solved three different ways across the codebase. The finding names the files involved and names the one pattern to collapse onto.
Treat the grade as a band, not a verdict. A low grade means the codebase contains real contradictions, not "bad engineers." Plenty of shipping, profitable codebases sit in the D-to-C range simply because they were built across many AI sessions.
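The first step in that reading order is a sort. Here is a minimal sketch using the category names and scores from the illustrative run above. The CategoryScore type and worstFirst helper are hypothetical and say nothing about how the report stores its data.

```typescript
interface CategoryScore {
  name: string;
  score: number;
}

// Return a copy of the categories ordered from lowest score to highest.
function worstFirst(categories: CategoryScore[]): CategoryScore[] {
  return categories.slice().sort((a, b) => a.score - b.score);
}

// Scores copied from the illustrative run in this post.
const sampleRun: CategoryScore[] = [
  { name: "Architectural consistency", score: 51 },
  { name: "Security posture", score: 62 },
  { name: "Redundancy", score: 70 },
  { name: "Convention adherence", score: 49 },
  { name: "Scaffolding hygiene", score: 66 },
];

const ranked = worstFirst(sampleRun);
```

Here ranked starts with convention adherence (49) and architectural consistency (51), the two categories to open first.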

Don't chase 100. A perfect score on a large, growing codebase isn't the goal and often isn't worth the churn. The goal is knowing where the contradictions are and keeping the number from sliding.

Using the score over time

A single reading is a snapshot. The metric earns its keep when you watch it move.

Track the change between scans. After each scan, watch whether the number went up or down since last time. A score that slides from 64 to 58 over a sprint is telling you new work is contradicting old work faster than you are cleaning it up — long before any of it turns into a visible bug. The free tier shows the change between scans, and Pro adds the trend over time so you can see the curve, not just the last two points.
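If you keep your own record of scores, the scan-over-scan change is a subtraction. The sketch below assumes you have collected the scores into a plain array, oldest first. The scanDeltas and isSliding helpers are hypothetical and not part of the CLI.

```typescript
// Differences between consecutive scans, oldest first.
// A negative entry means coherence dropped between those two scans.
function scanDeltas(scores: number[]): number[] {
  const deltas: number[] = [];
  for (let i = 1; i < scores.length; i++) {
    deltas.push(scores[i] - scores[i - 1]);
  }
  return deltas;
}

// True when the most recent scan scored lower than the one before it.
function isSliding(scores: number[]): boolean {
  const deltas = scanDeltas(scores);
  return deltas.length > 0 && deltas[deltas.length - 1] < 0;
}
```

For the slide described above, scanDeltas([64, 58]) returns [-6] and isSliding([64, 58]) returns true.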

Gate CI on a threshold. The most lasting use is making the score a merge condition. Run it as a JSON check that fails the build when coherence drops below a floor you set:

npx @vibedrift/cli . --json --fail-on-score 70

With VIBEDRIFT_TOKEN in your CI secrets, this blocks any pull request that would push the codebase below 70. New code now has to be at least as coherent as what is already there to merge, so drift detection turns into drift prevention. The full workflow, including how to pick a sensible threshold, is in "how to gate CI on a drift score".

Pick a floor at or just below your current score, then raise it as you clean things up. Setting --fail-on-score far above where you are today just blocks every merge until the backlog is paid off — start where you are and climb.
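One way to follow that advice is a ratchet: the floor sits slightly below the latest score and only ever moves up. The sketch below is an assumption about how you might pick the number you pass to --fail-on-score. The nextFloor helper and its default margin of 2 points are hypothetical choices, not tool defaults.

```typescript
// Suggest the next CI floor: slightly below the latest score, never lower than the current floor.
// The margin is an illustrative assumption, not a documented default.
function nextFloor(currentFloor: number, latestScore: number, margin: number = 2): number {
  const candidate = Math.floor(latestScore) - margin;
  const clamped = Math.min(100, Math.max(0, candidate));
  return Math.max(currentFloor, clamped);
}
```

With a floor of 56 and a latest score of 66, nextFloor(56, 66) returns 64. If the score later dips to 60, nextFloor(64, 60) still returns 64, so the gate never loosens on its own.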

The point of a number

Code coherence used to be a feeling: that vague sense that a codebase has stopped agreeing with itself, the thing you notice when one handler looks nothing like the one beside it. The Vibe Drift Score turns that feeling into a metric you can read, compare, and gate on. It won't tell you your code is clean or fast. It tells you something no linter or complexity tool does: whether your codebase still behaves like one codebase. For the longer story on why AI-written code drifts in the first place, see "your AI codebase is drifting".

Get your own number in a couple of seconds. No signup, and the code never leaves your machine:

npx @vibedrift/cli .

See the per-detector breakdown and CI setup in the guide, or compare plans on pricing.

Frequently asked questions

What is code coherence?

Code coherence is how much a codebase agrees with itself: whether files solving the same kind of problem solve it the same way, whether conventions, error shapes, and security checks are applied uniformly, and whether the structure follows one philosophy instead of several. It is distinct from 'clean' (readable) and 'complex' (how intricate) code. A coherent codebase can be terse or verbose; what matters is that it is internally consistent.

What is a Vibe Drift Score?

The Vibe Drift Score is VibeDrift's composite 0-100 code consistency metric plus a letter grade. It is built from five detectors (architectural consistency, security posture, redundancy, convention adherence, and scaffolding hygiene) and measures coherence rather than cleanliness or complexity. A higher score means the codebase agrees with itself more; a lower score means more files contradict the patterns the rest of the project established.

Does a low Vibe Drift Score mean my code is bad?

No. A low score means inconsistent, not broken. Code can pass every test, ship to production, and still score poorly because different AI sessions made different but reasonable decisions about auth, error handling, or data access. The score points you at the contradictions so you can collapse the codebase back onto one pattern.

How should I use the score over time?

Read the absolute number once to find your worst contradictions, then track the scan-over-scan delta to see whether new work is raising or lowering coherence. In CI, gate merges with a threshold so a pull request can't drop the score below a floor you set. The trend matters more than any single reading.

Local scans and the MCP tools are free and open source, forever. The free tier includes 1 deep scan a month; Pro is $15/mo for 12, and you can top up 5 more for $10 on any plan. Credits never expire.
