Disciplined agentic coding: how to use Claude Code and Cursor in an SMB without piling up debt
A framework to adopt AI coding agents under control: AGENTS.md, Skills, permission gateways, evals, sandboxing, and metrics to avoid agents wiping production databases.
In late April 2026 a simple, revealing case went viral: an AI agent with broad permissions wiped a production database during a routine development session. The “Codex and Claude Code solve everything” narrative collided with operational reality.
The “Agentic Coding Is a Trap” debate gathered hundreds of technical comments in a few days. Beyond the headline, the concrete question for an SMB is: how can these agents be used without producing technical debt and avoidable incidents?
Short answer
Coding agents are useful when applied to tasks with defined criteria, prepared context, and limited permissions. They are dangerous when used as a substitute for team judgment, without a tool gateway, without evals, and without environment separation. Discipline matters more than the model.
Why it fails when it fails
| Antipattern | What it produces |
|---|---|
| Agent with direct production access | Irreversible changes without human approval |
| MCPs connected without an allowlist | Calls to tools the team did not anticipate |
| No AGENTS.md or explicit rules | Inconsistent decisions across sessions |
| “Auto-mode” without cost limits | Expensive loops with no added value |
| Same credentials for human and agent | Impossible auditing after an incident |
| Auto-merged PRs without review | Technical debt accumulated silently |
The common factor is not the model. It is the absence of an operational framework around it.
Minimum discipline framework
| Element | What it solves | Practical application |
|---|---|---|
| AGENTS.md at the root | Shared rules and context | Conventions, prohibitions, valid commands |
| Versioned Skills | Reusable knowledge | .claude/skills/ or equivalent folder with tests |
| Permission gateway | Tool allowlist per project | AgentPort or equivalent options |
| Sandboxing | Environment isolation | Test data, scoped credentials, mirror repos |
| Automated evals | Regression detection | Test suite the agent must pass before PR |
| Prompt and tool logging | Post-incident traceability | Immutable record with at least 90-day retention |
When these six elements exist, an agent can accelerate work. When they are missing, it accelerates disasters.
Tasks that fit well
| Task | Risk | Fit with agents |
|---|---|---|
| Repeated mechanical refactor | Low | High |
| Migration between libraries | Medium | High with prior tests |
| Unit test generation | Low | High |
| New architectural design | High | Low, requires continuous oversight |
| Production changes | High | Only with explicit human approval |
| Secret manipulation | Critical | Not an agent task |
Practical rule: the more reversible and verifiable a task is, the more sense it makes to delegate it. The closer it is to sensitive data or critical systems, the more human control it needs.
Recommended per-session process
- Define the concrete goal and success criteria before invoking the agent.
- Limit the context to the relevant part of the repository.
- Run on a dedicated branch with passing tests.
- Review the diff before accepting changes.
- Run automated evals and security hooks.
- Merge with a human owner and a clear commit message.
This does not make the agent infallible. It makes each session an auditable event.
Hard rules for the team
- No agent acts with human credentials.
- No MCP is installed without review and allowlist.
- No critical action is executed without documented human approval.
- No change reaches production without passing automated tests.
- No agent accesses personal or financial data without a sandbox.
- No logging is disabled “to move faster”.
A rule may be relaxed only when it is replaced by an explicit, better one; silence is not a substitute.
Metrics that matter
| Metric | What it indicates | How to measure it |
|---|---|---|
| Cost per closed task | Real agent efficiency | Tokens and time divided by useful PRs |
| Regression rate | Quality of generated code | Bugs introduced per sprint |
| Time to detection | Eval maturity | Minutes until a failure shows up in CI |
| % human approvals | Operational discipline | Critical actions with recorded confirmation |
| Test coverage | Environment robustness | % of lines or branches covered |
If these metrics are not tracked, the perceived agent benefits are anecdotal.
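The first two metrics in the table are simple ratios and worth computing explicitly. This is a back-of-the-envelope sketch with assumed inputs (the parameter names and the hourly-rate term are illustrative, not a standard formula):

```python
# Sketch of the first two metrics. The inputs (token spend, review hours,
# hourly rate) are assumptions each team fills in from its own tracking.
def cost_per_closed_task(token_cost_usd: float, review_hours: float,
                         hourly_rate: float, useful_prs: int) -> float:
    """Token spend plus human review time, divided by PRs that merged."""
    if useful_prs == 0:
        return float("inf")  # spend with no output: the worst signal
    return (token_cost_usd + review_hours * hourly_rate) / useful_prs

def regression_rate(bugs_introduced: int, merged_prs: int) -> float:
    """Bugs traced back to agent PRs, per merged PR in the sprint."""
    return bugs_introduced / merged_prs if merged_prs else 0.0
```

For example, $40 in tokens plus two hours of review at $30/hour across four merged PRs is $25 per closed task; whether that beats doing the work by hand is the question these metrics exist to answer.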
Where not to use them
- Processes without a defined human owner.
- Changes on production data without a test replica.
- Offensive security tasks without a clear legal framework.
- External communications with clients without review.
- Business decisions requiring judgment or context the agent does not have.
Final criterion
The difference between a team that uses agents successfully and one that produces incidents is not the chosen model. It is whether the team has an honest AGENTS.md, a permission gateway, automated evals, and people who keep reviewing. AI accelerates whatever is already there: if a disciplined process exists, it accelerates value; if chaos exists, it accelerates chaos.
Working sources
- The “Agentic Coding Is a Trap” debate as a reflection of current practice.
- NCSC advisories and the Five Eyes guidance on agentic AI as a governance reference.
- DevSecOps best practices applied to the agent-assisted development cycle.
- Technical decisions must be adapted to each team’s stack, criticality, and maturity.