How I Audit My AI Agents Each Quarter
Written by Adam Bair. Published 2026-05-29. Business of Law

When I first put AI agents in charge of recurring work in my practice, I assumed the hard part was the setup. Write the instructions, point the agent at the work, walk away. The agents would run, I would supervise lightly, and the system would compound.
That is not how it has played out. The setup is the easy part. The harder part is the quarterly review pass that catches drift, retires agents that have outlived their usefulness, and adjusts instructions to match a practice that has moved on. Without that review, a multi-agent system rots quietly. The output keeps shipping. The quality keeps slipping. By the time the slippage shows up in a published piece, three months of unreviewed output is already out in the world.
I am a Florida trial lawyer. I am non-technical. The system I run is not a research project; it is the marketing, content, brand, audience-research, and ORM layer of a working solo practice. This article is the audit pass I run every ninety days.
Why an audit is needed at all
Three forces push an AI team out of alignment over time.
The first is practice drift. The cases I take change. The audience I serve shifts. Topics that mattered last quarter stop mattering. An agent operating off six-month-old instructions is producing for a practice that no longer exists.
The second is voice drift. Even with a written voice spec, output gradually pulls toward generic AI register. A sentence here, a phrase there. The drift is invisible piece by piece and obvious in aggregate. By month three the work product reads less like me and more like a competent stranger.
The third is tool drift. The underlying models change. New capabilities open up. Old workflows that depended on a specific model behavior break or become inefficient. What was the right architecture six months ago is rarely the right architecture today.
The audit catches all three before they compound.
What I review, in order
The audit takes me about four hours spread over a week. I do it the same way each quarter so that nothing important gets skipped.
Agent roster. Every agent gets a yes-or-no question: is this agent producing work the practice still needs? The answer is sometimes no. An agent that wrote intake-funnel copy when I was building a funnel may have nothing to do six months later. Retire it. The roster gets shorter, not longer, in most quarters.
Output samples. I pull a representative slice of each surviving agent's recent output. Not the highlight reel. A random ten or twenty pieces. I read them aloud against the voice spec. Pieces that do not sound like me get flagged. Patterns in the flagged pieces tell me where the instructions have to tighten.
Instructions. Every agent's written instructions get a line-by-line read. The question is not whether the instructions are correct in the abstract. The question is whether they still describe the work the practice actually needs and the voice the practice actually has. Stale instructions get rewritten. Contradictions get resolved. Examples that no longer apply get replaced.
Hand-offs between agents. Agents that feed each other are where breakage hides. The marketing agent produces a brief. The content agent works from the brief. The image agent generates from the content. If any link in the chain has drifted, the whole chain produces something off. I trace one or two end-to-end runs each quarter to confirm the chain is intact.
Errors I have not caught. I read recent corrections. Pieces I edited heavily after the fact. Topics where the agent missed the point. Each of those is an instruction gap. Most of them get addressed by tightening the relevant agent's spec.
Cost and volume. What did the system produce, what did it cost, what was the marginal cost per piece. The numbers are not the goal of the practice, but they tell me whether the system is paying for itself or quietly bloating.
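For readers who keep their agents' output in a spreadsheet or export, the sampling and cost steps above can be sketched in a few lines of Python. This is a minimal illustration, not the system described in this article: the `Piece` record, its fields, and both function names are hypothetical, and the cost figure is a simple average per piece standing in for marginal cost.

```python
import random
from dataclasses import dataclass


@dataclass
class Piece:
    # Hypothetical record for one shipped piece of output.
    agent: str        # which agent produced it
    title: str
    cost_usd: float   # assumed per-piece cost, however it is tracked


def sample_for_review(pieces, agent, k=10, seed=None):
    """Pick up to k pieces at random for one agent, not the highlight reel."""
    pool = [p for p in pieces if p.agent == agent]
    rng = random.Random(seed)
    return rng.sample(pool, min(k, len(pool)))


def cost_per_piece(pieces):
    """Average cost per piece for each agent (a stand-in for marginal cost)."""
    totals = {}
    counts = {}
    for p in pieces:
        totals[p.agent] = totals.get(p.agent, 0.0) + p.cost_usd
        counts[p.agent] = counts.get(p.agent, 0) + 1
    return {agent: totals[agent] / counts[agent] for agent in totals}
```

The random draw is the point of the first function: choosing pieces by hand tends to produce the highlight reel. The reading itself still happens aloud, against the voice spec.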
What changes after the audit
Each quarter produces a short list of edits. Most are small. A paragraph rewritten in an agent's instructions. A new example added. An obsolete topic removed from a backlog. A retired agent. A new specialist added to fill a gap that the practice has grown into.
The cumulative effect over a year is large. The system after four audits is not the same system it was at the start of the year. It is leaner, more specialized, more aligned with the practice as it actually is, and producing output that reads more like the practice's voice than it did before.
The skill is the same skill a senior lawyer uses to manage associates. Read the work. Note what is off. Have the conversation. Update the standing instructions. The mechanics are different because the associate is software, but the discipline is the same.
The audit catches things I would not have caught otherwise
Two examples from a recent cycle.
An agent producing audience-research summaries had drifted into citing the same three sources every week. The variety was an illusion. Reading the recent output in one sitting made the pattern obvious in fifteen minutes; reading the pieces individually as they shipped, week by week, hid it for two months. The instructions got a sourcing-diversity rule. The next month's output looked entirely different.
A content agent had started concluding articles with a hedge that I would never write. “If you have questions, consult an attorney.” The instructions did not require it; the agent had picked it up from somewhere and stuck with it. Reading thirty articles in a row turned the tic into a flashing light. The instructions got an explicit “do not write closing CYA paragraphs” rule. The closings cleaned up the next week.
Neither problem would have surfaced from reading single pieces. Both were obvious from a batch read. The quarterly audit creates the batch read.
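Both patterns above are the kind of thing a simple count can surface before the batch read, as a pointer to where to look rather than a substitute for reading. The sketch below is an assumption-laden illustration in Python: the function names, the 30 percent threshold, and the idea that each summary carries a list of source names are all hypothetical, and the sentence splitting is deliberately crude.

```python
import re
from collections import Counter


def closing_sentence(text):
    """Return the last sentence of a piece, lowercased (crude punctuation split)."""
    parts = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s.strip()]
    return parts[-1].lower() if parts else ""


def repeated_closings(texts, min_share=0.3):
    """Closing sentences that recur in at least min_share of the batch."""
    counts = Counter(closing_sentence(t) for t in texts if t.strip())
    total = sum(counts.values())
    return {s: n for s, n in counts.items() if n > 1 and n / total >= min_share}


def source_concentration(source_lists, top_n=3):
    """Share of all citations that go to the top_n most-cited sources."""
    counts = Counter(src for sources in source_lists for src in sources)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return sum(n for _, n in counts.most_common(top_n)) / total
```

A concentration near 1.0 for the top three sources, or a single closing sentence that recurs across a third of the batch, is a cue to read that agent's output in one sitting.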
What the audit is not
It is not a rebuild. The whole point of running the system on written instructions is that I do not have to reinvent it every ninety days. The audit is a maintenance pass, not a redesign.
It is not the only oversight. Each agent's output also goes through a per-piece review pass before it ships. The quarterly audit is the longer wavelength on top of the daily one. Both layers are needed.
It is not free. Four hours a quarter is not nothing. But four hours of audit time pays for itself in caught errors, reclaimed practice fit, and avoided embarrassment. The alternative is letting the system rot quietly until a published piece forces an emergency.
Why this is worth saying
A lot of lawyers I talk to who have started experimenting with AI fall into the same trap I almost did. They get the agent running, they like the early output, they assume the maintenance will be as light as the setup, and they walk away. Three months later the output is mediocre and they cannot say exactly when it became mediocre.
Treating an AI team like a team that needs management is the small mental shift that turns the system from a one-off experiment into a sustainable layer of the practice. The shift is not technical. It is operational discipline of the kind that lawyers already apply to associates, paralegals, and outside vendors.
The maintenance is the work. The work is the system. The audit is how the work stays honest.
Frequently Asked Questions
How often should a solo with a smaller AI footprint do this kind of audit?
The smaller the footprint, the faster the pass. A two- or three-agent setup is probably an hour every couple of months. The principle is the same: read the recent output in batch, against the original instructions, and ask what has drifted.
What if the audit finds the system is mostly fine?
Good. Write that down. Move on. Not every cycle produces a long list of edits, and a quarter where the output is on-voice and on-topic is a win, not a missed opportunity to find problems.
Do you log the audits?
Yes. Each cycle ends in a short note describing what was changed and why. The notes are useful for the next cycle and for the long-running question of how the practice and the agents have evolved together.
Is there a tool that automates this?
Not really, for the substantive review. Reading the output and judging whether it matches the practice is the lawyer's job. Some logging and metrics can be automated, and I do automate them. Reading the output aloud against the voice spec is not delegable.
How is this different from just supervising AI output piece by piece?
Per-piece supervision catches obvious errors. The quarterly audit catches the slow drifts that look fine in any single piece and bad in aggregate. Both are needed. Neither replaces the other.
Adam Bair is a Florida trial lawyer pivoting into AI applied to legal work. He is a non-technical lawyer running a multi-agent AI system end to end. He writes about verification-first AI workflows for solo and small-firm practice. Verify his Florida Bar standing.
This article is general information about AI in legal practice and the business of law. It is not legal advice and does not create an attorney-client relationship.