Every escalation or refusal is a signal. The Improve tab turns those signals into action: Convoship reads the conversations your agent couldn't handle, groups them into recurring themes, and drafts concrete improvements — a new capability, a clearer instruction, a fresh test. You review each suggestion and apply it with one click. The agent never changes itself.

How it works

Convoship looks at recent conversations that ended in an escalation or refusal.
Each one gets a root cause: was a capability missing, did a tool fail, was knowledge lacking, were instructions unclear, did a guardrail refuse — or was escalating actually the right call?
Similar failures are grouped into themes, so thirty versions of “how do I cancel?” show up as one pattern, not thirty rows.
For each theme, Convoship drafts a proposal — the smallest change that would fix the pattern — backed by the real conversations behind it.
You review: apply the proposal to the agent's draft configuration, or dismiss it. Dismissed suggestions are never re-filed.
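
The classify-then-group steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Convoship's implementation: the `RootCause` enum mirrors the root causes listed above, while `FailedConversation`, its `intent` label, and `group_into_themes` are hypothetical names invented for the example.

```python
from collections import defaultdict
from dataclasses import dataclass
from enum import Enum


class RootCause(Enum):
    MISSING_CAPABILITY = "missing capability"
    TOOL_FAILURE = "tool failure"
    KNOWLEDGE_GAP = "knowledge gap"
    UNCLEAR_INSTRUCTIONS = "unclear instructions"
    GUARDRAIL_REFUSAL = "guardrail refusal"
    CORRECT_ESCALATION = "correct escalation"


@dataclass(frozen=True)
class FailedConversation:
    conversation_id: str
    intent: str  # hypothetical normalized label, e.g. "cancel subscription"
    root_cause: RootCause


def group_into_themes(failures):
    """Group failures by (intent, root cause), largest theme first."""
    themes = defaultdict(list)
    for failure in failures:
        # Escalations that were the right call are not failures to fix.
        if failure.root_cause is RootCause.CORRECT_ESCALATION:
            continue
        themes[(failure.intent, failure.root_cause)].append(failure.conversation_id)
    return sorted(themes.items(), key=lambda item: len(item[1]), reverse=True)


# Thirty "how do I cancel?" conversations collapse into a single theme.
failures = [
    FailedConversation(f"c{i}", "cancel subscription", RootCause.MISSING_CAPABILITY)
    for i in range(30)
]
failures.append(FailedConversation("c30", "legal complaint", RootCause.CORRECT_ESCALATION))
themes = group_into_themes(failures)
```

Here `themes` holds one entry backed by thirty conversation IDs, which is the evidence a proposal for that theme would cite.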

You stay in control

Proposals only ever touch the draft configuration. Going live still requires publishing, which runs your eval gate, plus publish approval if your workspace requires sign-off. Capabilities flagged as sensitive are never edited by a suggestion.
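
As a rough sketch of these two guarantees (draft-only changes, and no edits to sensitive capabilities), the Python below models applying a capability edit. All names here (`Capability`, `AgentConfig`, `CapabilityEdit`, `apply_to_draft`) are hypothetical and chosen for illustration; the real configuration format may look quite different.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class Capability:
    name: str
    instructions: str
    sensitive: bool = False


@dataclass
class AgentConfig:
    capabilities: dict = field(default_factory=dict)


@dataclass
class CapabilityEdit:
    capability_name: str
    new_instructions: str


def apply_to_draft(draft: AgentConfig, proposal: CapabilityEdit) -> AgentConfig:
    """Return an updated copy of the draft; the published config is never passed in."""
    target = draft.capabilities.get(proposal.capability_name)
    if target is None:
        raise KeyError(f"unknown capability: {proposal.capability_name}")
    if target.sensitive:
        # Assumed behavior: suggestions never edit sensitive capabilities.
        raise PermissionError(f"{target.name} is sensitive and cannot be edited by a suggestion")
    updated = copy.deepcopy(draft)
    updated.capabilities[proposal.capability_name].instructions = proposal.new_instructions
    return updated


published = AgentConfig({
    "refund": Capability("refund", "Issue refunds.", sensitive=True),
    "cancel": Capability("cancel", "Cancel a plan."),
})
draft = copy.deepcopy(published)  # the draft starts as a copy of what is live
draft = apply_to_draft(draft, CapabilityEdit("cancel", "Cancel a plan, including 'stop my subscription'."))
```

After this runs, `published` is unchanged; only a separate publish step (not modeled here) would promote the draft.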

What Convoship can suggest

Suggestion	When it appears
New capability	Users keep asking for something none of the agent's capabilities covers.
Capability edit	A capability exists but its example or required details miss how users actually phrase the request.
Mission clarification	The agent behaves inconsistently because its instructions are ambiguous.
New guardrail	A recurring failure should be handled by an explicit rule instead of judgement calls.
Regression test	A real failed conversation becomes a permanent eval, so the same mistake can't quietly come back.

Failures become tests

Applying a regression-test proposal adds the failed conversation to the agent's evals. Over time the eval suite grows from real traffic, so every publish is checked against the exact situations that used to break the agent.
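
One way to picture this is a suite that gains one case per applied regression-test proposal. The sketch below is an assumption about shape only: `EvalCase`, its fields, and `add_regression_test` are hypothetical names, and the actual eval format is not described here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EvalCase:
    source_conversation_id: str
    user_messages: tuple
    expected_outcome: str  # hypothetical label, e.g. "resolved"


def add_regression_test(suite, conversation_id, user_messages, expected_outcome="resolved"):
    """Return a new suite that includes the failed conversation exactly once."""
    if any(case.source_conversation_id == conversation_id for case in suite):
        return suite  # already covered; do not add a duplicate case
    return suite + [EvalCase(conversation_id, tuple(user_messages), expected_outcome)]


suite = []
suite = add_regression_test(suite, "c17", ["How do I cancel?"])
suite = add_regression_test(suite, "c17", ["How do I cancel?"])  # no-op the second time
```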

Run it on demand or nightly

Open the agent's Improve tab and choose Run analysis to analyze the last two weeks of conversations immediately. Or switch on Nightly suggestions: Convoship analyzes new failures every night and queues proposals for your team to review in the morning. The toggle is per agent, and it's off until you enable it.

See whether changes helped

Every conversation records which published version served it. The Impact by version table shows conversations, resolutions, and the failure rate per version side by side — so after you apply a proposal and publish, you can see in numbers whether the new version actually reduced escalations. If it didn't, roll back from the agent's version history.
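
The per-version comparison amounts to a simple aggregation. The sketch below assumes the failure rate is the share of a version's conversations that did not end in a resolution; the exact definition the table uses is not stated here, and `impact_by_version` and the outcome labels are hypothetical.

```python
from collections import Counter


def impact_by_version(conversations):
    """conversations: iterable of (version, outcome) pairs.

    outcome is assumed to be "resolved", "escalated", or "refused".
    """
    totals, resolved = Counter(), Counter()
    for version, outcome in conversations:
        totals[version] += 1
        if outcome == "resolved":
            resolved[version] += 1
    return {
        version: {
            "conversations": totals[version],
            "resolutions": resolved[version],
            "failure_rate": (totals[version] - resolved[version]) / totals[version],
        }
        for version in sorted(totals)
    }


# Illustrative data only: v1 resolves 6 of 10 conversations, v2 resolves 8 of 10.
log = (
    [("v1", "resolved")] * 6 + [("v1", "escalated")] * 4
    + [("v2", "resolved")] * 8 + [("v2", "refused")] * 2
)
impact = impact_by_version(log)
```

With this made-up data the failure rate drops from 0.4 on `v1` to 0.2 on `v2`, the kind of change the table is meant to make visible after a publish.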

Escalations that should stay

Not every hand-off is a failure. When escalating was the right outcome — a sensitive request, or a hand-off your design calls for — the analysis labels it “correct escalation” and never proposes a fix for it. The goal is an agent that resolves more of what it should resolve, not one that stops asking for help.