Testing with `convoy-mock`

Convoy ships a synthetic test model called convoy-mock that exercises the full submit → batch → callback pipeline without ever calling a real provider. Use it whenever you’re building, debugging, or demoing: it is never billed, consumes no quota, and delivers its callback in 60 to 90 seconds.

TL;DR — Set "model": "convoy-mock" in any request and Convoy will generate a synthetic Claude-shaped response, deliver it to your callback_url, and skip every billing-related side effect. Switch to a production model (e.g. claude-3-haiku) once your integration is wired up correctly.

Why a test model exists

Real Convoy models batch based on per-model thresholds (for example, a 100-request threshold), so latency depends on each model’s configuration — that is by design and gives you the cheapest possible inference. But it is awful for testing: a single request would sit pending for up to an hour before you saw a callback.

convoy-mock is the carve-out. When the scheduler sees pending convoy-mock requests it forms a batch immediately (threshold of 1), runs the synthetic adapter, waits 60 seconds to simulate a (very fast) real batch, and then delivers the callback. The whole pipeline (token tracking, callback retries, status transitions) runs through the same code paths as production. Only two things are different:

- The model output is synthetic (no real provider is called)
- The org is not billed and quota is not consumed
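
The threshold rule described above can be pictured with a toy model. This is only an illustration of the behavior this page describes, not Convoy's scheduler code. The function name, the threshold table, and the rule that a real-model batch forms once its oldest request has waited an hour are all assumptions made for the sketch; the 100-request threshold and the one-hour wait are the example figures quoted above.

```python
# Toy model of threshold-based batching. All names are hypothetical and
# do not come from Convoy's codebase.
BATCH_THRESHOLDS = {
    "convoy-mock": 1,       # from the text: the mock batches immediately
    "claude-3-haiku": 100,  # example threshold from the text, assumed for this model
}
MAX_WAIT_S = 3600  # assumption: a batch forms once the oldest request waited 1 hour


def should_form_batch(model: str, pending: int, oldest_wait_s: float) -> bool:
    """Return True when a batch for `model` would form under this toy rule."""
    if pending <= 0:
        return False
    if pending >= BATCH_THRESHOLDS[model]:
        return True
    return oldest_wait_s >= MAX_WAIT_S
```

Under this rule a single convoy-mock request forms a batch on the next tick, while a single real-model request keeps waiting until more requests arrive or the wait limit is reached.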

When to use it

✅ First time you wire up a new integration (Zapier, Make, Lambda, Slack, Sheets, your own backend)
✅ End-to-end smoke test in CI
✅ Debugging callback delivery, JSON parsing, or DB writes downstream of Convoy
✅ Demos and screenshots
❌ Production traffic — the response text is canned, not generated

How fast is it really?

| Step | Latency |
| --- | --- |
| `POST /cargo/load` accepted | < 200 ms |
| Scheduler tick picks up pending `convoy-mock` | up to 30 s |
| Mock adapter “runs” the batch | 60 s (fixed) |
| Callback fires to your `callback_url` | < 1 s |
| Total round-trip | 60 – 90 s |

Compare to a real model, where the same request waits up to 1 hour before the batch even forms.
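
For a CI smoke test, these figures suggest a timeout somewhat above the worst case. The sketch below simply adds up the per-step maxima from the latency breakdown and pads the result; the helper names and the 1.5x safety factor are assumptions, not Convoy recommendations.

```python
import math

# Per-step worst-case latencies from the breakdown above, in seconds.
ACCEPT_MAX_S = 0.2          # POST /cargo/load accepted
SCHEDULER_TICK_MAX_S = 30   # scheduler tick picks up the pending request
MOCK_BATCH_S = 60           # fixed delay of the mock adapter
CALLBACK_MAX_S = 1          # callback delivery


def worst_case_round_trip_s() -> float:
    """Sum of the per-step maxima, roughly the top of the 60-90 s range."""
    return ACCEPT_MAX_S + SCHEDULER_TICK_MAX_S + MOCK_BATCH_S + CALLBACK_MAX_S


def ci_timeout_s(safety_factor: float = 1.5) -> int:
    """Hypothetical helper: pad the worst case so a slow tick does not flake the test."""
    return math.ceil(worst_case_round_trip_s() * safety_factor)
```

With the default factor this gives a timeout a little over two minutes, which leaves room for a late scheduler tick without letting a genuinely stuck request hang the build for long.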

Example request


curl -X POST https://api.cnvy.ai/cargo/load \
  -H "Content-Type: application/json" \
  -H "X-API-Key: convoy_sk_your_key_here" \
  -d '{
    "params": {
      "model": "convoy-mock",
      "max_tokens": 100,
      "messages": [{ "role": "user", "content": "ping" }]
    },
    "callback_url": "https://your-server.com/callback"
  }'
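
The same request can be built from Python using only the standard library. This is a minimal sketch that mirrors the curl call above; the function name is hypothetical, and because this page does not document the response body of `POST /cargo/load`, the commented usage only prints the raw reply.

```python
import json
import urllib.request

API_URL = "https://api.cnvy.ai/cargo/load"  # endpoint from the curl example


def build_load_request(api_key: str, callback_url: str, prompt: str = "ping",
                       model: str = "convoy-mock") -> urllib.request.Request:
    """Build (but do not send) the POST request shown in the curl example."""
    body = {
        "params": {
            "model": model,
            "max_tokens": 100,
            "messages": [{"role": "user", "content": prompt}],
        },
        "callback_url": callback_url,
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "X-API-Key": api_key},
        method="POST",
    )


# To submit, pass the request to urllib.request.urlopen:
# req = build_load_request("convoy_sk_your_key_here", "https://your-server.com/callback")
# with urllib.request.urlopen(req, timeout=10) as resp:
#     print(resp.status, resp.read().decode("utf-8"))
```

Keeping construction separate from sending makes the request easy to inspect in a unit test without touching the network.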

The callback arrives 60 to 90 seconds later and has the same shape as a real Claude response: a response.content[0].text field, usage.input_tokens and usage.output_tokens, and a model of convoy-mock. Parse it with the same code path you use for real models.
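
A callback handler might extract those fields as follows. This is a sketch under assumptions: the text only names the fields, so the parser assumes that usage and model sit inside the response object (as they do in a Claude message) and falls back to the top level of the payload otherwise. The `CallbackResult` type and `parse_callback` function are hypothetical names.

```python
import json
from dataclasses import dataclass


@dataclass
class CallbackResult:
    text: str
    model: str
    input_tokens: int
    output_tokens: int


def parse_callback(raw_body: bytes) -> CallbackResult:
    """Extract the fields named above from a callback body."""
    payload = json.loads(raw_body)
    response = payload["response"]
    # Assumption: usage and model live inside `response`; if they are
    # absent there, look at the top level of the payload instead.
    usage = response.get("usage") or payload.get("usage") or {}
    model = response.get("model") or payload.get("model") or ""
    return CallbackResult(
        text=response["content"][0]["text"],
        model=model,
        input_tokens=int(usage.get("input_tokens", 0)),
        output_tokens=int(usage.get("output_tokens", 0)),
    )
```

Because mock token counts are simulated, a test built on this parser should check that the counts are present rather than compare them with a specific value.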

Switching to a real model

Once your integration is verified end-to-end, change one line in your request body:


-    "model": "convoy-mock"
+    "model": "claude-3-haiku"

That’s it — same request shape, same callback shape, same tracking endpoint. See Supported Models for the full list of production model IDs.
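
One way to make that a configuration change rather than a code change is to read the model ID from the environment. The variable name `CONVOY_MODEL` and the helper below are hypothetical; Convoy itself does not read this variable, your own client code would.

```python
import os
from typing import Mapping

TEST_MODEL = "convoy-mock"


def resolve_model(env: Mapping[str, str] = os.environ) -> str:
    """Return the model ID from CONVOY_MODEL, defaulting to the test model."""
    return env.get("CONVOY_MODEL", "").strip() or TEST_MODEL
```

Local runs and CI then use the mock by default, and a deployment opts in to a production model by setting the variable to an ID such as claude-3-haiku.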

Limits and caveats

- Synthetic content. The mock returns canned text. Don’t grade it for quality, don’t show it to end users, don’t use it for prompt engineering.
- No quota / no margin. convoy-mock does not consume your monthly token quota and does not contribute to billing or margin reports.
- Token counts are simulated. Each mock response reports a fixed input/output token count (configurable per-deployment), not a real tokenization of your prompt.
- Single model per batch. Mock requests are batched separately from real-model requests, so mixing both in the same workflow run is fine.

Don’t ship convoy-mock to production. Always swap to a real model before turning your integration on for real users. The mock will happily accept requests forever, and you’ll just get canned responses with no useful AI output.
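
Since the mock never rejects requests, a startup check in your own code can catch a forgotten swap. A minimal sketch, assuming your deployment exposes an environment label such as "staging" or "production"; the function and exception names are hypothetical.

```python
TEST_MODEL = "convoy-mock"


class MockModelInProductionError(RuntimeError):
    """Raised when the test model is configured for a production deployment."""


def ensure_real_model(model: str, environment: str) -> str:
    """Return `model` unchanged, or raise if the mock would reach production.

    `environment` is a hypothetical deployment label, e.g. "staging" or "production".
    """
    if environment.strip().lower() in {"prod", "production"} and model == TEST_MODEL:
        raise MockModelInProductionError(
            f"{TEST_MODEL} returns canned text; configure a production model ID instead"
        )
    return model
```

Calling this once at startup fails the deploy early instead of letting canned responses reach real users.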
