DEV Community

Helder Burato Berto

Originally published at helderberto.com

AI Writes Code. You Own Quality.

The more I use AI tools like Claude Code, the clearer it becomes: engineering skills are what make AI output worth shipping.

AI makes writing code faster. But shipping good software still requires the same judgment it always did. Speed without engineering discipline just means shipping bugs faster.

You Own the Code

AI is a tool in your toolset. Like a compiler, a linter, or a test runner. It doesn't own the code. You do.

When something breaks in production, nobody asks "which AI generated this?" They ask who shipped it. The PR has your name on it. The review was your responsibility. The decision to merge was yours.

AI is a multiplier. If your engineering skills are weak, it multiplies that too.

What AI Can't Do For You

  • Think about edge cases. AI covers the happy path. You guide it to the edges.
  • Understand the system. AI sees the file. You see the architecture.
  • Make tradeoffs. AI doesn't know your team's priorities, deadlines, or tech debt tolerance.
  • Carry team context. You were in the meeting where the team decided to deprecate that service. You know the naming conventions, the architectural decisions, the "we tried X and it didn't work" history. AI has none of that unless you provide it.

Guide AI With Tests

Red-Green-Refactor

TDD becomes even more powerful with AI. The engineer defines WHAT to test. AI handles the HOW.

Red. Write failing tests that cover expected behavior and edge cases:

import { describe, it, expect, vi } from 'vitest';
import { render, screen, waitFor } from '@testing-library/react';
import userEvent from '@testing-library/user-event';
import { SearchFilter } from './SearchFilter'; // path illustrative

describe('SearchFilter', () => {
  it('renders input with placeholder', () => {
    render(<SearchFilter onSearch={vi.fn()} />);
    expect(screen.getByPlaceholderText('Search products...')).toBeInTheDocument();
  });

  it('calls onSearch after user stops typing', async () => {
    const onSearch = vi.fn();
    render(<SearchFilter onSearch={onSearch} debounceMs={300} />);

    await userEvent.type(screen.getByRole('searchbox'), 'shoes');

    expect(onSearch).not.toHaveBeenCalled();
    await waitFor(() => expect(onSearch).toHaveBeenCalledWith('shoes'));
  });

  it('does not call onSearch for empty input', async () => {
    const onSearch = vi.fn();
    render(<SearchFilter onSearch={onSearch} debounceMs={300} />);

    await userEvent.type(screen.getByRole('searchbox'), 'a');
    await userEvent.clear(screen.getByRole('searchbox'));

    await waitFor(() => expect(onSearch).not.toHaveBeenCalled());
  });

  it('shows loading spinner while searching', () => {
    render(<SearchFilter onSearch={vi.fn()} isLoading />);
    expect(screen.getByRole('status')).toBeInTheDocument();
  });

  it('trims whitespace before calling onSearch', async () => {
    const onSearch = vi.fn();
    render(<SearchFilter onSearch={onSearch} debounceMs={300} />);

    await userEvent.type(screen.getByRole('searchbox'), '  shoes  ');
    await waitFor(() => expect(onSearch).toHaveBeenCalledWith('shoes'));
  });
});

You wrote zero implementation. But you defined the component's contract, its edge cases, and its behavior. That's engineering.

Green. AI implements the minimal code to pass all tests.

Refactor. You guide AI to clean up. Extract helpers, apply single responsibility, name things clearly. The goal: make it easy for the next engineer who touches this code.

Without test discipline, AI gives you untested code that "looks right." With TDD, AI works within constraints you defined.

Cover Entire Flows With E2E Tests

Unit tests verify pieces. E2E tests verify the whole flow works together.

AI can scaffold E2E tests, but you define which flows are critical. A checkout flow, an authentication sequence, a data export pipeline. These are decisions that require understanding the business, not just the code.

import { test, expect } from '@playwright/test';

test('user completes checkout flow', async ({ page }) => {
  await page.goto('/products');
  await page.click('[data-testid="add-to-cart"]');
  await page.click('[data-testid="checkout"]');
  await page.fill('#email', 'test@example.com');
  await page.fill('#card-number', '4242424242424242');
  await page.click('[data-testid="place-order"]');
  await expect(page.locator('.confirmation')).toBeVisible();
});

You defined the critical path. AI can fill in the details, add assertions, handle setup/teardown. But the decision of WHAT to test end-to-end is yours. The same applies to edge cases: what happens when payment fails? When the session expires mid-checkout? When the cart is empty? You define those scenarios. AI writes the assertions.

Enforce Standards Before Code Ships

Standards only matter if they're enforced. Three layers:

Linting rules. Create rules that encode team conventions. AI follows them when configured, but you need to know which rules matter for your codebase.

Git hooks. Pre-push hooks that run linting and tests. Code that doesn't pass doesn't ship. No exceptions, not even for AI-generated code.

AI tool hooks. Tools like Claude Code support hooks that intercept actions and enforce standards automatically. Run lint before every commit. Run tests before every push. The AI operates within guardrails you defined.
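As one example, Claude Code reads hook definitions from its settings file. A sketch along these lines runs the linter every time the agent edits or writes a file — treat the exact schema as an assumption and check the current docs, since it may differ by version:

```json
{
  "hooks": {
    "PostToolUse": [
      {
        "matcher": "Edit|Write",
        "hooks": [
          { "type": "command", "command": "npm run lint" }
        ]
      }
    ]
  }
}
```

The effect is that lint failures surface inside the AI session itself, so the agent sees the error and fixes it before you ever review the diff.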

The engineer's job: define the guardrails. AI works within them.

Close the Loop With Verification Tools

Every AI-generated change needs verification. The faster you catch problems, the faster AI can fix them.

Verification feedback loops exist at every level: CI pipelines that run visual regression tests, browser automation that captures screenshots, performance audits that flag regressions. The principle is the same. Every output becomes input for the next iteration.

In my workflow, I use Playwright MCP and Chrome DevTools MCP to close this loop directly inside the AI session:

  • Screenshot shows a broken layout? AI fixes the CSS.
  • Console error from a missing prop? AI adds the prop.
  • Lighthouse audit flags an accessibility issue? AI adds the missing `aria-label`.
  • Network tab shows a redundant API call? AI refactors the data fetching.

This turns AI from "generate and hope" into "generate, verify, iterate." The engineer who sets up this loop gets better results than one who just prompts and ships.

The skill isn't writing the code. It's knowing what to verify and how to feed that information back.

Review AI Output Like You'd Review a Junior's PR

AI-generated code compiles. It passes the tests you wrote. It looks reasonable. But that doesn't mean it's good.

Read the diff. Every line. Look for:

  • Unnecessary complexity. AI loves abstractions. Does this need a factory pattern, or would a plain function do?
  • Subtle bugs. Off-by-one errors, missing null checks, race conditions. AI generates plausible code, not provably correct code.
  • Deviations from patterns. Your codebase uses a specific error handling pattern. AI might invent a different one.
  • Security holes. Unsanitized input, exposed secrets, missing auth checks. AI doesn't think adversarially by default.

The skill of reading code critically matters more when someone (or something) else writes it. You can't review what you don't understand. Invest in reading code as much as writing it.
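A tiny illustration of the "missing null check" case, with hypothetical names: the unguarded version compiles, reads fine, and crashes the first time a user without a profile shows up.

```typescript
interface User {
  profile?: { displayName: string };
}

// Plausible AI output: looks reasonable, throws when profile is missing.
// function greet(user: User) { return `Hi, ${user.profile.displayName}`; }

// The version a careful review insists on: guard the optional field.
function greet(user: User): string {
  return `Hi, ${user.profile?.displayName ?? 'there'}`;
}
```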

Comment the Why, Not the How

AI tends to over-comment. It explains the obvious:

// Loop through the array and filter items
const filtered = items.filter(item => item.active);

// Set the state with filtered items
setItems(filtered);

These comments add noise. The code already says HOW it works.

Good comments explain why something exists or what business logic it represents:

// Archived items are processed by a nightly batch job, not shown in the UI
const filtered = items.filter(item => item.active);

But before writing any comment, ask: can the code explain itself? A well-named function or variable often eliminates the need for a comment entirely. Comments should exist only when the code can't tell the full story on its own.
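As a hypothetical illustration, a well-chosen name can absorb the "how" entirely, leaving any remaining comment to carry only the business context:

```typescript
interface Item {
  name: string;
  active: boolean;
}

// Archived items are handled by a nightly batch job, so the UI skips them.
const isVisibleInUi = (item: Item) => item.active;

const items: Item[] = [
  { name: 'shoes', active: true },
  { name: 'old catalog', active: false },
];
const visibleItems = items.filter(isVisibleInUi);
```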

That's the difference between AI-generated noise and engineering judgment.

Small Chunks for the Next Engineer

AI generates 500 lines in seconds. Your job: break it into reviewable, understandable pieces.

Extract functions. Apply single responsibility. Name things clearly. The next engineer (or the next AI session) touching this code benefits from clean structure.

This is a human judgment call. AI optimizes for the current prompt. You optimize for the project's lifetime. A function that does one thing well is easier to test, easier to reuse, and easier to replace than a monolith that "works."
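In miniature, the difference looks like this (names hypothetical): each extracted helper has one job and can be tested on its own, which a single 200-line inline blob can't offer.

```typescript
// Small, single-purpose helpers instead of one inline blob (illustrative only).
function toCsvField(value: string): string {
  // Quote fields containing commas, quotes, or newlines, per basic CSV rules
  return /[",\n]/.test(value) ? `"${value.replace(/"/g, '""')}"` : value;
}

function toCsvRow(fields: string[]): string {
  return fields.map(toCsvField).join(',');
}

function toCsv(rows: string[][]): string {
  return rows.map(toCsvRow).join('\n');
}
```

Each piece is independently testable and replaceable; swapping the quoting rules later touches one five-line function instead of a monolith.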

Small PRs > big PRs, even when AI writes them.

Takeaways

  1. You own the code. AI is a tool, not an excuse. Your name is on the PR.
  2. TDD guides AI. Write failing tests first, let AI implement, then refactor.
  3. E2E tests catch what unit tests miss. Define the critical flows and edge cases.
  4. Enforce standards with linting and hooks. Guardrails before code ships.
  5. Close the loop. Feed screenshots, errors, and audits back to AI for better iterations.
  6. Review like it's a junior's PR. Read every line. Question every abstraction.
  7. Comment the why, not the how. Business context over implementation details.
  8. Small chunks > big dumps. Break AI output into pieces the next engineer can follow.

Wrapping Up

AI didn't make engineering skills optional. It made them the differentiator.

AI is a tool. A powerful one. But tools don't ship software, engineers do. Study the fundamentals. Master testing. Understand your architecture. Define your standards. Then use AI to execute at a speed you never could alone.

The code is AI's job. The quality is yours.

Top comments (24)

Mark

Does AI take pride in the learning, problem solving, and iterations it can take to land on a solid solution? Call me old fashioned, but I do enjoy coding for the artistic and aha moments when the next piece of the puzzle is laid. I get that we can speed up coding with AI, and it has its uses, but humans do need to exercise their brains and not just be fed by the man behind the curtain.

Max

This matches our experience exactly. We run PHPStan (level 9), PHPMD, and Rector in CI — and the AI's code goes through the same pipeline as everyone else's. No special treatment, no "AI-generated" label that lowers the bar.

The insight we landed on: static analysis is the AI's self-awareness. The agent can't tell when its own quality is dropping — context fills up, confidence stays high, output gets worse. But PHPStan doesn't care who wrote the code. It catches the type mismatch the agent introduced at 2 AM the same way it catches a human's Friday afternoon mistake.

One addition to your point about hooks: we inject context rules per file path and per tool call, so the agent loads coding conventions only when it's actually editing PHP — not burning tokens on rules it doesn't need. Keeps the context budget for the work that matters.

Helder Burato Berto

"The insight we landed on: static analysis is the AI's self-awareness."

This is so true!

Kalpaka

The "comment the why" principle points at something the article doesn't fully follow through on. When a junior submits a PR, you can ask "why this approach over that one?" and the answer tells you whether they understood the problem or got lucky. With AI output, the decision path is gone. You see the solution but not what was considered and rejected.

TDD constrains the output. Code review validates the result. But the reasoning that produced both is invisible. That's a quality layer that tests can't cover and reviews can only guess at.

Pixeliro

I agree AI is a multiplier, but I think there’s a new problem emerging:

If AI starts outperforming engineers in certain parts (like implementation speed or edge case coverage), there can be a gap between the quality of the output and the reviewer’s ability to fully understand it.

At that point, engineers are responsible for code they don’t completely grasp, which makes quality assurance much harder.

It feels like the role of engineering may shift — from writing code to designing constraints (tests, specs, guardrails) that control AI output.

Not about being better than AI, but about controlling the system it operates in.

TMD • Edited

LLMs have made adding test harnesses so incredibly easy, it would be unwise not to use them, especially given how they can re-write business logic.

I've been super critical and "human in the loop" with AI, and it's been working well so far.

Nova Elvaris

The "AI is a multiplier" framing is spot on, and I think it extends further than most people realize. I've found that the developers who get the most out of AI coding tools aren't the ones who prompt best — they're the ones who already had strong review habits. They catch the subtle issues (cached error responses, missing edge cases, wrong assumptions about data formats) because they were already looking for those things in human-written code.

One concrete practice that's worked well for me: treating AI output the same way you'd treat a junior developer's PR. You wouldn't merge it without reviewing. You wouldn't skip the tests. And you definitely wouldn't let "it looks clean" substitute for "I understand what it does." The moment you start skimming AI output because it looks competent is exactly when the production bugs start sneaking through.

Harsh

The "AI is a multiplier" framing is the most honest way I've seen this described. It doesn't sugarcoat or fear-monger; it just puts the responsibility exactly where it belongs: with the engineer.

Your point about carrying team context is what resonates most for me. That institutional memory of "we tried X and it broke under load" is invisible to AI no matter how good your prompt is. It lives in the engineer's head, in old Slack threads, in postmortems nobody archived properly. You can't inject judgment you haven't earned yet.

One thing I'd add to the review section: I've noticed that AI-generated code is often too clean, and that cleanliness creates a false sense of safety. Messy handwritten code with a `// TODO: fix this` makes me slow down and ask questions. Polished AI code triggers a subconscious "this looks fine" response even when the logic is subtly wrong. The halo effect is real, and I think it's underappreciated in most AI + code review discussions.

Great read. This is the kind of grounded take the dev community needs more of.

Helder Burato Berto

I like your take "I've noticed that AI-generated code is often too clean and that cleanliness creates a false sense of safety". Thanks for commenting!

Daniel Balcarek

Nice post! Just a small addition: I try to avoid comments in production apps about 99% of the time. In my experience, writing comments to explain the code itself is risky: when the code changes, authors rarely update the comments, making them obsolete and confusing. With AI, it's often better to focus on making the code self-explanatory, since developers coming back after a few weeks can easily get lost otherwise.

Helder Burato Berto

I agree this is the best solution in the majority of situations

Sushil Kulkarni

Great points — but I'd push it further: we don't just own the quality, we own everything around the code too.
Planning, design, security — those are ours. AI can assist, but the direction and judgment? That's on us.
And there's stuff specific to working with AI that people overlook:
→ Token optimization — bloated context = worse output
→ Accurate framing — if you describe the problem wrong, AI solves the wrong thing confidently
→ Fallback plans — when AI hallucinates or misses a business rule, you need a recovery, not just a merge
AI writes the code. We own the whole road — not just the last mile.

Mykola Kondratiuk

The ownership point is underrated. Shipping AI-written code without understanding it is like signing a contract you haven't read: the liability is still yours when something breaks. I've started treating AI output the way I treat code review: I'm responsible for everything I let through.
