You can be bad at this

I've worked with AI tooling daily for a few months, cross-repo features, developer tooling, docs. The most useful thing I can tell you about it is that you can be bad at it. What follows is the system I use to be less bad, deliberately.

A talk about catching your own blind spots that doesn't catch its own is just hype. So this page does the thing it describes: the two reviewers I ran on the files that produced it are reading along with you. Their notes land in the margins. I didn't soften them.

01I keep notes on what works and what doesn't

The core of it is a mistake log. Append-only, seven weeks old, 54 entries, two working cycles of rule promotions. Alongside it sit 45 memory files, each naming a specific failure mode with a why and a how-to-apply.

Seven weeks isn't a long-running practice. That's the point, it paid off this fast because the logging is disciplined, not because I've been at it for years.

02the loop: plan → check → review → do

Most of my time now sits at the planning end of that loop. I plan maybe 90% of the time now; the implementation falls out of a good plan and has become the cheap part.

The counter-example keeps me honest: the one time I skipped the planning step on a rewrite, I paid a whole wrong-architecture round for it. Planning isn't immunity to a bad assumption. It's just where the leverage is.

so how do we actually do it? Two reviewers. One of them a bastard. Watch.

03the bastard review

That "review" stage is two independent passes on the same work: a competent peer who checks every fact, and an annoying colleague briefed to catch what I'm hiding. Neither sees the other's notes. Here's that pair, run live on the four files that produced this talk.

exhibit a · a post-mortem

self-review · draftv1

Across 32 daily notes this quarter I consistently caught my own mistakes early. This isn't a highlight reel, I included the failures too. The process is working as designed.

③ meta-review · the reckoning

The peer caught 2 things that are wrong: a count, a word. The bastard caught 3 things that aren't wrong, they're hollow.

they overlapped on 0

↳ Near-zero overlap is usually the signal, each catches what the other can't. The margins of this page are that same pair, running on me.

The value is usually the gap between them, each catches what the other can't. But sometimes the peer and the bastard land on the same finding, and that isn't the process failing. It's the loudest signal there is: the thing is so plainly broken that both of them caught it. Fix that one first.

It cuts both ways though: on one morning my own bastard pass got a few things wrong and the fix was eight grounding gates, now baked into the skill.

04where the methodology stops

If a methodology talk has no section on where it stops working, it's a sales pitch. Mine has one because the data has one. Even hook-blocked rules recur: one habit fired eight times in five weeks despite a hook that catches every single instance. The hook fixes the symptom; the reflex underneath is unchanged.

the meta · it demonstrates itselfThree of those eight fired while this talk was being written. The claim proved itself three times during prep. About as clean as evidence gets.

05what I haven't measured

Cost in tokens and money: unmeasured. A real non-AI baseline: directional only. The false-negative rate of the mistake log: unknown, five self-catches went unlogged on one day alone.

I'd rather name the gaps than have the room name them for me. And the deepest gap is the one this page is built on, an audit produced by the method can only ever catch what the method already sees. Saying that out loud is what turns the weakness into the frame.

THE BASTARD REVIEW · a reusable prompt Why: one review pass misses one of two failure shapes. A careful peer catches line-level mistakes (a wrong number, broken logic, a claim that doesn't match the source) but is blind to systemic ones (the argument grading its own homework, the missing context, the assumption nobody named). An adversarial reviewer catches the shape but skims the detail. Run both, independently. The value is in how little they overlap: if both find the same things, the bastard wasn't bastard enough. How: paste everything below into an AI, then paste in the thing you want reviewed. An essay, a plan, a design, some code. .......................................................... Review the artefact I give you in two independent passes, then reconcile them. Do the passes in strict order, and don't let a later pass soften an earlier one. PASS 1 · THE PEER. You are a sharp, experienced reviewer who checks claims against the source instead of taking the artefact's word for itself. Find: - Correctness errors: wrong logic, off-by-one, broken steps, contradictions. - Factual errors: numbers, dates, or claims that don't match the source. - Structure: redundancy, a buried headline, things it promises but never delivers. - Gaps: the case or branch that nothing here actually covers. Sample specific claims and verify them. Grade each finding [MAJOR] (blocks it), [SUGGESTION] (worth fixing), or [NITPICK] (taste). Quote the exact line and say how you would fix it. PASS 2 · THE BASTARD. Now become a different reviewer: an annoying colleague who does you the favour of being harsh now rather than letting you ship the wrong-shaped thing. Hard to please, looking to catch you out, not interested in being nice. Find what the careful peer cannot see by construction: - What the artefact is NOT saying. Honesty used as a credibility device is not honesty. "I am being self-critical" is itself a performance. Flag sentences that narrate a virtue instead of demonstrating it. - Methodology grading its own homework: does it use the very process it is selling to justify itself? - Selection bias the author did not own (a big number, but only the flattering few quoted). - Numbers that look precise but are not. - What is missing from the picture entirely: the cost, the baseline, the risk, the other people, the worst case. - For code: what it is NOT changing that it should; failure branches with no test; assumptions about input; downstream callers left unupdated. Then write 5 hostile questions a sceptic would ask that the author could not answer in 30 seconds as it stands. Be specific. Quote the offending line. "The framing is performative" is useless; name the sentence and say why. GROUNDING (both passes): every quote, number, and reference must come from the artefact in front of you, not from memory. Do not manufacture findings. Tie severity to how likely the problem is to actually bite, not how bad the hypothetical sounds. If you are stacking up MAJORs, your bar is probably wrong. PASS 3 · RECONCILE. Compare the two sets: - Overlap (both caught it): usually the highest-confidence findings. If it is most of them, the bastard pass was not harsh enough. - Only the peer: the "right in detail" findings. - Only the bastard: the "right in shape" findings, the blind spots. - Either reviewer wrong: both passes can be wrong. Check against the source and strike anything you cannot stand behind. Output one ranked list ([MAJOR], then [SUGGESTION], then [NITPICK]), each with its location, a plain-spoken description, and the fix. End with the single most important thing the double pass caught that one pass would have missed. Tip: for true independence, run Pass 2 in a fresh chat that never saw Pass 1.