darxai: engineering, AI, and cybersecurity
Disciplined agentic coding: how to use Claude Code and Cursor in an SMB without piling up debt

AI 4 min read

A framework to adopt AI coding agents under control: AGENTS.md, Skills, permission gateways, evals, sandboxing, and metrics to avoid agents wiping production databases.

In late April 2026 a simple, revealing case went viral: an AI agent with broad permissions wiped a production database during a routine development session. The “Codex and Claude Code solve everything” narrative collided with operational reality.

The “Agentic Coding Is a Trap” debate gathered hundreds of technical comments in a few days. Beyond the headline, the concrete question for an SMB is: how can these agents be used without producing technical debt and avoidable incidents?

Short answer

Coding agents are useful when applied to tasks with defined criteria, prepared context, and limited permissions. They are dangerous when used as a substitute for team judgment, without a tool gateway, without evals, and without environment separation. Discipline matters more than the model.

Why it fails when it fails

| Antipattern | What it produces |
| --- | --- |
| Agent with direct production access | Irreversible changes without human approval |
| MCPs connected without an allowlist | Calls to tools the team did not anticipate |
| No AGENTS.md or explicit rules | Inconsistent decisions across sessions |
| "Auto-mode" without cost limits | Expensive loops with no added value |
| Same credentials for human and agent | Impossible auditing after an incident |
| Auto-merged PRs without review | Technical debt accumulated silently |

The common factor is not the model. It is the absence of an operational framework around it.
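The "MCPs connected without an allowlist" antipattern has a direct remedy: route every tool call through a check that rejects anything not explicitly permitted. A minimal sketch of such a gateway check is below; the tool names and deny substrings are illustrative, not any specific product's configuration.

```python
# Minimal per-project tool gateway sketch (illustrative names).
# Anything not explicitly allowlisted is rejected, so new tools
# become opt-in decisions instead of silent surprises.

ALLOWED_TOOLS = {"read_file", "run_tests", "git_commit"}

# Crude substring screen for obviously dangerous arguments.
DENY_SUBSTRINGS = ("drop table", "rm -rf", "prod")

def check_call(tool: str, args: str) -> bool:
    """Return True only for allowlisted tools with arguments
    that do not match any deny pattern."""
    if tool not in ALLOWED_TOOLS:
        return False
    lowered = args.lower()
    return not any(s in lowered for s in DENY_SUBSTRINGS)
```

A real gateway would also enforce rate limits and log every decision, but even this two-list shape already turns "the agent called something unexpected" into "the call was rejected and recorded".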

Minimum discipline framework

| Element | What it solves | Practical application |
| --- | --- | --- |
| AGENTS.md at the root | Shared rules and context | Conventions, prohibitions, valid commands |
| Versioned Skills | Reusable knowledge | `.claude/skills/` or equivalent folder with tests |
| Permission gateway | Tool allowlist per project | AgentPort or equivalent options |
| Sandboxing | Environment isolation | Test data, scoped credentials, mirror repos |
| Automated evals | Regression detection | Test suite the agent must pass before PR |
| Prompt and tool logging | Post-incident traceability | Immutable record with at least 90-day retention |

When these six elements exist, an agent can accelerate work. When they are missing, it accelerates disasters.
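The logging element deserves a concrete shape. One simple way to make a session log tamper-evident is to hash-chain its entries, so editing any record after an incident breaks verification. This is a sketch under the assumption of a plain in-memory list; field names are illustrative, not any tool's actual log format.

```python
import hashlib
import json

def append_entry(log: list, event: dict) -> None:
    """Append an event whose hash covers the previous entry's hash,
    forming a chain that makes after-the-fact edits detectable."""
    prev_hash = log[-1]["hash"] if log else "0" * 64
    payload = json.dumps({"prev": prev_hash, "event": event}, sort_keys=True)
    log.append({
        "prev": prev_hash,
        "event": event,
        "hash": hashlib.sha256(payload.encode()).hexdigest(),
    })

def verify_chain(log: list) -> bool:
    """Recompute every hash from the start; any tampering breaks the chain."""
    prev = "0" * 64
    for entry in log:
        payload = json.dumps({"prev": prev, "event": entry["event"]}, sort_keys=True)
        if entry["prev"] != prev or entry["hash"] != hashlib.sha256(payload.encode()).hexdigest():
            return False
        prev = entry["hash"]
    return True
```

In production you would append to write-once storage with timestamps, but the chaining idea is the part that makes the 90-day record trustworthy rather than merely present.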

Tasks that fit well

| Task | Risk | Fit with agents |
| --- | --- | --- |
| Repeated mechanical refactor | Low | High |
| Migration between libraries | Medium | High with prior tests |
| Unit test generation | Low | High |
| New architectural design | High | Low, requires continuous oversight |
| Production changes | High | Only with explicit human approval |
| Secret manipulation | Critical | Not an agent task |

Practical rule: the more reversible and verifiable a task is, the more sense it makes to delegate it. The closer it is to sensitive data or critical systems, the more human control it needs.
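The practical rule above can be written down as a crude triage heuristic, useful as a shared default when the team debates whether to delegate a task. The three inputs and the thresholds are an illustrative simplification, not a formal policy.

```python
def delegation_fit(reversible: bool, verifiable: bool,
                   touches_sensitive: bool) -> str:
    """Crude triage of whether a task suits an agent.

    Sensitive data or critical systems always win: no amount of
    reversibility makes those an agent task.
    """
    if touches_sensitive:
        return "human-only"
    if reversible and verifiable:
        return "delegate"
    return "delegate-with-review"
```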

Session workflow

  1. Define the concrete goal and success criteria before invoking the agent.
  2. Limit the context to the relevant part of the repository.
  3. Run on a dedicated branch with passing tests.
  4. Review the diff before accepting changes.
  5. Run automated evals and security hooks.
  6. Merge with a human owner and a clear commit message.

This does not make the agent infallible. It makes each session an auditable event.
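The six steps above can be enforced mechanically as a merge gate: the merge only proceeds if every condition in the checklist is recorded as satisfied. The flag names below are hypothetical; map them to whatever your CI actually tracks.

```python
def merge_allowed(session: dict) -> bool:
    """Gate from the session checklist: all six conditions must hold,
    and a missing flag counts as a failure, not a pass."""
    required = (
        "goal_defined",      # step 1: goal and success criteria
        "scoped_context",    # step 2: limited repository context
        "dedicated_branch",  # step 3: branch with passing tests
        "diff_reviewed",     # step 4: human reviewed the diff
        "evals_passed",      # step 5: automated evals and hooks
        "human_owner",       # step 6: named owner on the merge
    )
    return all(session.get(key, False) for key in required)
```

Defaulting missing keys to `False` is the important design choice: forgetting to record a step blocks the merge instead of silently waving it through.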

Hard rules for the team

  1. No agent acts with human credentials.
  2. No MCP is installed without review and allowlist.
  3. No critical action is executed without documented human approval.
  4. No change reaches production without passing automated tests.
  5. No agent accesses personal or financial data without a sandbox.
  6. No logging is disabled “to move faster”.

These rules are negotiable only if each is replaced by an explicit, better rule, never by silence.

Metrics that matter

| Metric | What it indicates | How to measure it |
| --- | --- | --- |
| Cost per closed task | Real agent efficiency | Tokens and time divided by useful PRs |
| Regression rate | Quality of generated code | Bugs introduced per sprint |
| Time to detection | Eval maturity | Minutes until a failure shows up in CI |
| % human approvals | Operational discipline | Critical actions with recorded confirmation |
| Test coverage | Environment robustness | % of lines or branches covered |

If these metrics are not tracked, the perceived agent benefits are anecdotal.
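The first metric is the one teams most often hand-wave, so here is one way to pin it down. The formula, currency, and inputs are assumptions to be adapted: the point is to count human review time alongside token spend, and to divide by merged PRs rather than attempts.

```python
def cost_per_closed_task(token_cost_eur: float, hours_spent: float,
                         hourly_rate_eur: float, merged_prs: int) -> float:
    """Cost per useful PR: token spend plus human time (review,
    correction, supervision) divided by PRs that actually merged."""
    if merged_prs == 0:
        # Spend with zero merged output is infinitely expensive.
        return float("inf")
    return (token_cost_eur + hours_spent * hourly_rate_eur) / merged_prs
```

For example, 30 EUR in tokens plus two hours of review at 60 EUR/h over five merged PRs gives 30 EUR per closed task, which you can then compare against doing the same work by hand.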

Where not to use them

  • Processes without a defined human owner.
  • Changes on production data without a test replica.
  • Offensive security tasks without a clear legal framework.
  • External communications with clients without review.
  • Business decisions requiring judgment or context the agent does not have.

Final criterion

The difference between a team that uses agents successfully and one that produces incidents is not the chosen model. It is whether there is an honest AGENTS.md, a permission gateway, automated evals, and a team that keeps reviewing. AI accelerates whatever is already there: if a disciplined process exists, it accelerates value; if chaos exists, it accelerates chaos.

Working sources

  • The “Agentic Coding Is a Trap” debate as a reflection of current practice.
  • NCSC advisories and the Five Eyes guidance on agentic AI as a governance reference.
  • DevSecOps best practices applied to the agent-assisted development cycle.
  • Technical decisions must be adapted to each team’s stack, criticality, and maturity.

Next step

Want to apply AI automation in your company?

We automate repetitive processes with applied AI, agents, RAG, and integrations so your team works with less friction and more control.