Testing with convoy-mock
Convoy ships a synthetic test model called convoy-mock that exercises the full submit → batch → callback pipeline without ever calling a real provider. Use it whenever you’re building, debugging, or demoing: it is never billed, and a full round trip takes about 60 to 90 seconds.
TL;DR — Set "model": "convoy-mock" in any request and Convoy will generate a synthetic Claude-shaped response, deliver it to your callback_url, and skip every billing-related side effect. Switch to a production model (e.g. claude-3-haiku) once your integration is wired up correctly.
Why a test model exists
Real Convoy models batch based on per-model thresholds (for example, a 100-request threshold), so latency depends on each model’s configuration — that is by design and gives you the cheapest possible inference. But it is awful for testing: a single request would sit pending for up to an hour before you saw a callback.
convoy-mock is the carve-out. When the scheduler sees pending convoy-mock requests it forms a batch immediately (threshold of 1), runs the synthetic adapter, waits 60 seconds to simulate a (very fast) real batch, and then delivers the callback. The whole pipeline — token tracking, callback retries, status transitions — is exercised exactly as in production. Only two things are different:
- The model output is synthetic (no real provider is called)
- The org is not billed and quota is not consumed
When to use it
- ✅ First time you wire up a new integration (Zapier, Make, Lambda, Slack, Sheets, your own backend)
- ✅ End-to-end smoke test in CI
- ✅ Debugging callback delivery, JSON parsing, or DB writes downstream of Convoy
- ✅ Demos and screenshots
- ❌ Production traffic — the response text is canned, not generated
How fast is it really?
| Step | Latency |
|---|---|
| POST /cargo/load accepted | < 200 ms |
| Scheduler tick picks up pending convoy-mock | up to 30 s |
| Mock adapter “runs” the batch | 60 s (fixed) |
| Callback fires to your callback_url | < 1 s |
| Total round-trip | 60–90 s |
Compare to a real model, where the same request waits up to 1 hour before the batch even forms.
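The rows of the table can be summed to pick a timeout for a smoke test. The following is a minimal sketch: the step names and the 1.5x safety factor are illustrative assumptions, and only the per-step upper bounds come from the table.

```python
# Upper bounds in seconds, copied from the latency table.
# The dictionary keys are illustrative labels, not Convoy identifiers.
STEP_UPPER_BOUNDS_S = {
    "submit_accepted": 0.2,     # POST /cargo/load accepted: < 200 ms
    "scheduler_tick": 30.0,     # scheduler picks up the request: up to 30 s
    "mock_adapter_run": 60.0,   # mock adapter delay: 60 s (fixed)
    "callback_delivery": 1.0,   # callback fires: < 1 s
}


def worst_case_round_trip_s(steps=STEP_UPPER_BOUNDS_S):
    """Sum the per-step upper bounds to get a worst-case round trip."""
    return sum(steps.values())


def suggested_timeout_s(safety_factor=1.5):
    """Pad the worst case so a slow tick does not fail the test (assumed factor)."""
    return worst_case_round_trip_s() * safety_factor
```

The summed worst case lands just above the table's rounded 90 s total, so a test that waits for the callback should allow comfortably more than 90 seconds.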
Example request
curl -X POST https://api.cnvy.ai/cargo/load \
  -H "Content-Type: application/json" \
  -H "X-API-Key: convoy_sk_your_key_here" \
  -d '{
    "params": {
      "model": "convoy-mock",
      "max_tokens": 100,
      "messages": [{ "role": "user", "content": "ping" }]
    },
    "callback_url": "https://your-server.com/callback"
  }'

The callback you receive 60 to 90 seconds later has the same shape as a real Claude response: a response.content[0].text field, usage.input_tokens / output_tokens, and a model of convoy-mock. Parse it with the same code path you use for real models.
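A callback parser along these lines should work for both mock and real responses. This is a sketch under assumptions: the exact nesting of the callback body is not specified here, so the code assumes a top-level "response" object and looks for usage and model inside it first, then at the top level. The function name parse_callback and the returned keys are hypothetical.

```python
import json


def parse_callback(body: bytes) -> dict:
    """Extract the fields described above from a callback body.

    Assumption: the payload has a top-level "response" object; "usage" and
    "model" are looked up inside it first, then at the top level.
    """
    payload = json.loads(body)
    response = payload["response"]
    usage = response.get("usage") or payload.get("usage") or {}
    model = response.get("model") or payload.get("model")
    return {
        "text": response["content"][0]["text"],
        "input_tokens": usage.get("input_tokens"),
        "output_tokens": usage.get("output_tokens"),
        "model": model,
        # Lets downstream code avoid showing canned text to end users.
        "is_mock": model == "convoy-mock",
    }
```

Keeping an is_mock flag on the parsed result makes it easy to route synthetic responses away from anything user-facing while still exercising the same parsing path.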
Switching to a real model
Once your integration is verified end-to-end, change one line in your request body:
- "model": "convoy-mock"
+ "model": "claude-3-haiku"

That’s it: same request shape, same callback shape, same tracking endpoint. See Supported Models for the full list of production model IDs.
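Because only the model field changes, one way to make the switch without touching code is to read the model ID from configuration. The sketch below assumes a hypothetical CONVOY_MODEL environment variable and a hypothetical helper named build_load_request. Neither is part of Convoy. The request body mirrors the example request shown earlier.

```python
import os

MOCK_MODEL = "convoy-mock"


def build_load_request(prompt, callback_url, model=None, max_tokens=100):
    """Build a /cargo/load request body.

    The model comes from the explicit argument, else from the CONVOY_MODEL
    environment variable (a hypothetical name), else falls back to the mock.
    """
    chosen = model or os.environ.get("CONVOY_MODEL", MOCK_MODEL)
    return {
        "params": {
            "model": chosen,
            "max_tokens": max_tokens,
            "messages": [{"role": "user", "content": prompt}],
        },
        "callback_url": callback_url,
    }
```

For example, build_load_request("ping", "https://your-server.com/callback", model="claude-3-haiku") produces the same body as the mock request with only the model value changed.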
Limits and caveats
- Synthetic content. The mock returns canned text. Don’t grade it for quality, don’t show it to end users, don’t use it for prompt engineering.
- No quota / no margin. convoy-mock does not consume your monthly token quota and does not contribute to billing or margin reports.
- Token counts are simulated. Each mock response reports a fixed input/output token count (configurable per-deployment), not a real tokenization of your prompt.
- Single model per batch. Mock requests are batched separately from real-model requests, so mixing both in the same workflow run is fine.
Don’t ship convoy-mock to production. Always swap to a real model before turning your integration on for real users. The mock will happily accept requests forever, and you’ll just get canned responses with no useful AI output.
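Since the mock accepts requests indefinitely, a client-side guard can catch a forgotten swap before it reaches users. This is a minimal sketch: the function name assert_model_allowed and the "production" environment label are assumptions about your own deployment setup, not Convoy features.

```python
def assert_model_allowed(model: str, environment: str) -> None:
    """Raise if the mock model is configured for a production deployment.

    "production" is an assumed label for your own environment naming.
    """
    if environment == "production" and model == "convoy-mock":
        raise ValueError(
            "convoy-mock returns canned responses; "
            "configure a real model before serving production traffic"
        )
```

Calling this once at startup, with the model ID your integration is about to use, turns a silent misconfiguration into an immediate failure.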