The model and the prompt and the dashboard get most of the attention. What decides whether the system is still running after we leave is the operating handoff: six documents a supervisor can read in an afternoon and use to run the system without us.
What is included
Every handoff carries six documents: a workflow map, an owner list, an admin guide, the exception-handling rules, a testing checklist, and the exact metric that tells you whether the system is working. All six are written in plain language.
We also write down what the system should not do. That boundary earns its place: an AI tool left without one expands until nobody can say who owns the decision, and that is the failure we watch for most.
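For teams that want to track this in a script rather than a spreadsheet, here is a minimal sketch of a completeness check for the handoff package. The names `REQUIRED_DOCUMENTS`, `HandoffPackage`, and `gaps` are hypothetical and not part of any tool we ship; the sketch only assumes the six documents and the written boundary described above.

```python
from dataclasses import dataclass, field

# The six documents named above. The labels are illustrative, not a fixed schema.
REQUIRED_DOCUMENTS = (
    "workflow map",
    "owner list",
    "admin guide",
    "exception-handling rules",
    "testing checklist",
    "success metric",
)


@dataclass
class HandoffPackage:
    """Hypothetical record of what has been written so far."""

    documents: set[str] = field(default_factory=set)
    # Things the system should NOT do, written down explicitly.
    out_of_scope: list[str] = field(default_factory=list)

    def gaps(self) -> list[str]:
        """Return what is still missing before the handoff is complete."""
        missing = [d for d in REQUIRED_DOCUMENTS if d not in self.documents]
        if not self.out_of_scope:
            missing.append("out-of-scope boundary")
        return missing


package = HandoffPackage(
    documents={
        "workflow map",
        "owner list",
        "admin guide",
        "exception-handling rules",
        "testing checklist",
    }
)
remaining = package.gaps()
```

An empty list from `gaps()` would mean all six documents exist and at least one boundary has been written down. It says nothing about whether the documents are any good, which is what the testing step is for.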
How we test it
Before we leave, the internal owner runs the whole workflow with us sitting on our hands. We watch for missing context and unclear failure states. We watch for the spot where a human is still quietly doing the old manual step nobody mentioned.
Say the owner hits a bad input, or the system spits out a bad output, or an integration drops. If they cannot recover on their own, we are not done and the build stays open.
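The sign-off rule above can be sketched as a small gate. This is an illustration under stated assumptions, not our actual tooling: `FAILURE_SCENARIOS`, `DrillResult`, and `open_items` are hypothetical names, and the three scenarios are simply the ones listed in the paragraph above.

```python
from dataclasses import dataclass

# The three failure cases named above. The labels are illustrative.
FAILURE_SCENARIOS = ("bad input", "bad output", "dropped integration")


@dataclass
class DrillResult:
    """One observed attempt by the internal owner to recover from a failure."""

    scenario: str
    recovered_unaided: bool


def open_items(results: list[DrillResult]) -> list[str]:
    """Return the scenarios that still block sign-off.

    A scenario blocks sign-off if it was never exercised or if the owner
    needed help. If a scenario was run more than once, the latest run counts.
    """
    latest = {r.scenario: r.recovered_unaided for r in results}
    return [s for s in FAILURE_SCENARIOS if not latest.get(s, False)]


results = [
    DrillResult("bad input", recovered_unaided=True),
    DrillResult("bad output", recovered_unaided=False),
]
blocking = open_items(results)
build_stays_open = bool(blocking)
```

In this sketch an untested scenario counts the same as a failed one, which matches the rule: the build closes only when the owner has recovered from every case without us.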
What remains
After the build, maintenance is supposed to be boring. The owner reviews the metric, checks the exceptions, and updates the source material when something changes. We hold that workflow owner accountable for this loop by name.
Skip that loop and the system becomes another experiment that runs for two months, then sits in a tab nobody opens.
