From Vibe Coding to Production: Our AI Development Playbook
By: Bryan Vann


We work in AI to transform industries like security, which makes it fitting, and a little ironic, that software development itself was one of the first disciplines on our own team to be reshaped by AI. We’ve long used AI for everyday work, including document review and generation, email drafting, spreadsheet analysis, and other repeatable tasks. But the real challenge, and the real payoff, was taking vibe coding from rapid prototyping to enterprise-grade reliability and security.

This is the story of how we rebuilt our development process to make that possible.

The early lesson: ambiguity doesn’t scale

Our first experiments were wide-ranging: various VS Code plugins, Windsurf, and other tools that have come and gone. We quickly discovered that vague guidance produces plausible-looking code that is often wrong. For prototypes, that’s acceptable. For production systems with strict security and reliability requirements, it creates rework and risk.

The mindset shift was clear: keep the creativity, but move precision up front. We still “vibe,” but now we do it in the design and decision stages rather than while the AI is generating code.

The tools that stuck

Today our stack centers on three tools:

  • Cursor, our daily driver IDE. Its agent mode is powerful, and its visual diffs make large changes easy to understand.

  • Claude Code, our terminal-centric workhorse for implementing well-specified features across multiple files.

  • Devin, connected to Slack with access to our monorepo, which we use for small, autonomous tasks and for quick codebase Q&A. Devin produces PRs that we review like any other change.

Other tools pop up and get tested, but these three carry the bulk of our production work.

The Workflow We Follow at Kindo

We start with discovery and decision-making, using a top model (Claude Opus, GPT-5, Gemini 2.5) with research enabled. We describe product goals, the user experience we want, and constraints, then let the AI surface industry best practices, risks, and alternative approaches. This phase is iterative, with deliberate choices made about architecture, contracts, and security.

The output becomes a Markdown spec: a concise, reviewable document that captures objectives, decisions, and non-functional requirements. From there, we move into a repo-anchored implementation plan. Using Claude Code, we map the spec to our actual codebase, including what files to create or modify, which models and migrations to add, what test cases to write, and what rollout steps to follow. Reviewing this plan up front makes it easy to catch incorrect assumptions while they’re still cheap to fix.
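
To make that concrete, a spec for a small feature might look something like the following. This is a simplified, hypothetical outline rather than one of our actual documents; real specs run longer and are specific to the feature at hand.

  • Objective: add per-workspace rate limiting to the public API.

  • Decisions: token-bucket limits stored in Redis; limits configurable per plan tier; rejected requests return a clear error with retry guidance.

  • Non-functional requirements: minimal latency overhead on the request path; every rejected request is audit-logged; no secrets in log lines.

  • Out of scope: per-endpoint overrides.

  • Rollout: behind a feature flag, enabled by workspace cohort, with a documented rollback step.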

Finally comes incremental implementation. With the spec and plan as inputs, Claude Code writes code in manageable steps, adds the tests described in the plan, runs them, and validates changes against the broader system to avoid regressions. It’s not that “tests come first” in every case; it’s that tests are always part of the deliverable, and they’re validated continuously in our flow.

Guardrails at Generation and Review

One of the most important refinements in our process is how we layer checks. CursorRules and Claude.md files are two sides of the same coin: they encode our standards, including style, security practices, error handling, and logging expectations, so that the AI writes code in the shape we want, with fewer gaps to fix later.
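
For a sense of what those files contain, here is a simplified, hypothetical excerpt (our actual rule files are longer and tied to our codebase):

  • Validate all external input at the API boundary; never trust client-supplied identifiers.

  • Return typed errors rather than raw strings; log failures with structured fields, and never log secrets, tokens, or PII.

  • Every new endpoint ships with authentication and authorization checks, plus a test that exercises the unauthorized path.

  • Prefer utilities that already exist in the repo over new dependencies, and call out any new dependency in the PR description.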

On top of that, we run AI reviewers with their own tailored prompts. They provide a second layer of protection regardless of whether the code was generated by Cursor, Claude Code, or Devin. The reviewers scan for issues like auth flow errors, input validation, concurrency pitfalls, performance traps, and other risks we’ve learned to watch for.
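
The prompts themselves are short and pointed. A condensed, hypothetical example: “You are reviewing a pull request for a production service. Ignore style nits. Flag missing or weakened authorization checks, unvalidated external input, race conditions around shared state, unbounded queries or loops on request paths, and error handling that swallows failures silently. For each finding, cite the file and line and describe the failure scenario.”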

Together, this gives us a two-step double check: first during code generation (via CursorRules or Claude.md), and then again at review time with AI reviewers. Humans still read every PR, but our focus is on the core logic, architectural soundness, and whether tests cover what matters, not on boilerplate or naming conventions.

How We Go to Production

Every pull request triggers our CI pipeline, which runs linting, unit and integration tests, and other required checks. Merges only happen when everything is green. Once merged, the pipeline deploys automatically. This keeps quality attached directly to the flow of work, and it means nobody has to wonder whether something “slipped through.”

Why it matters

This system lets us ship faster, not because we cut corners, but because we moved decisions earlier and encoded our standards into the AI itself. That reduces errors, shrinks review time, and makes code easier to reason about. Specs and plans now explain not just what was built, but why, which makes the system more teachable for new contributors.

We’re always refining this process: adding new patterns to CursorRules and Claude.md, improving AI reviewer prompts, and tightening the chain from spec → plan → implementation → tests. But we’ve already made substantial progress, and we intend to stay on the bleeding edge without ever compromising the security or reliability of the software we deliver.

By the way, we’re GA'ing a new release next week: Chat Actions. Chat Actions is Kindo’s new execution model for technical operations: it turns a conversation into a plan, tool calls, and a verified result. Agents handle repeatable jobs, while Chat Actions drives the complex, cross-system investigations that cannot be scripted. Sign up for the live webinar here.