Prompts
This page covers what actually goes inside a Convoy request — how to structure the messages array, when to use a system prompt, and what proven prompt patterns look like end-to-end.
Anatomy of a Request
Every request to POST /cargo/load has the same shape:
```json
{
  "params": {
    "model": "claude-3-haiku",
    "max_tokens": 1024,
    "messages": [
      { "role": "user", "content": "Your prompt here" }
    ]
  },
  "callback_url": "https://your-server.com/callback"
}
```

| Field | What it does |
|---|---|
| `params.model` | Which model Convoy should route to. See Supported Models. |
| `params.messages` | The conversation. This is where the actual prompt lives. |
| `params.system` | Optional. Persistent instructions that shape how the model responds. |
| `params.max_tokens` | Maximum output length. |
| `params.temperature` | How “creative” the response is (0.0–1.0). |
| `callback_url` | Where Convoy delivers the result when processing finishes. |
Convoy is batch-first. Requests are queued and processed in cost-efficient batches. Results are delivered to your callback_url minutes to hours after submission (24-hour SLA), not synchronously.
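As a rough illustration, the Python sketch below assembles a request body in this shape and POSTs it using only the standard library. `build_load_request` and `submit` are hypothetical helper names, and the base URL and any authentication headers are placeholders: this page does not specify them, so check the Load Cargo API reference. Because Convoy is batch-first, the HTTP response is presumably only an acknowledgement; the model output arrives later at `callback_url`.

```python
import json
import urllib.request


def build_load_request(model, messages, max_tokens, callback_url, system=None, **sampling):
    """Assemble a POST /cargo/load body in the documented shape."""
    params = {"model": model, "max_tokens": max_tokens, "messages": messages}
    if system is not None:
        params["system"] = system  # optional persistent instructions
    params.update(sampling)  # e.g. temperature=0.0, top_p=..., stop_sequences=[...]
    return {"params": params, "callback_url": callback_url}


def submit(base_url, body, headers=None):
    """POST the body to /cargo/load and return the HTTP status and raw response text.

    base_url and any auth headers are placeholders (assumptions, not documented here).
    The model's result is delivered later to callback_url, not in this response.
    """
    req = urllib.request.Request(
        base_url.rstrip("/") + "/cargo/load",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", **(headers or {})},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status, resp.read().decode("utf-8")


body = build_load_request(
    model="claude-3-haiku",
    messages=[{"role": "user", "content": "Your prompt here"}],
    max_tokens=1024,
    callback_url="https://your-server.com/callback",
)
print(json.dumps(body, indent=2))
```

Keeping body construction separate from the network call makes it easy to inspect or log a payload before it is queued.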
The messages Array
messages is a list of turns in a conversation. Each turn has a role and content.
Single-turn (most common)
The simplest form — just ask the model something.
```json
{
  "params": {
    "model": "claude-3-haiku",
    "max_tokens": 500,
    "messages": [
      { "role": "user", "content": "Write a one-sentence summary of photosynthesis." }
    ]
  },
  "callback_url": "https://example.com/callback"
}
```

Multi-turn (provide conversation history)
You can include prior assistant responses to give the model context for a follow-up. This is useful when you’ve already had an exchange and want the model to continue.
```json
{
  "params": {
    "model": "claude-3-haiku",
    "max_tokens": 500,
    "messages": [
      { "role": "user", "content": "List three good names for a coffee shop." },
      { "role": "assistant", "content": "1. Brew & Bean\n2. The Daily Grind\n3. Cup of Joe" },
      { "role": "user", "content": "Now give me three for a tea shop in the same style." }
    ]
  },
  "callback_url": "https://example.com/callback"
}
```

Convoy doesn’t store conversation state for you. If you want multi-turn context, you must include the prior turns in every request.
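Because the history lives on your side, a small client-side helper can accumulate turns and produce a fresh request body each time. The `Conversation` class below is a hypothetical sketch, not part of any Convoy SDK; in practice the assistant text would come from an earlier callback result.

```python
import json


class Conversation:
    """Keeps conversation history on the client, because Convoy stores none."""

    def __init__(self):
        self.messages = []

    def add_user(self, text):
        self.messages.append({"role": "user", "content": text})

    def add_assistant(self, text):
        self.messages.append({"role": "assistant", "content": text})

    def request_body(self, model, max_tokens, callback_url):
        # Copy the turns so later appends do not alter a body you already sent.
        return {
            "params": {
                "model": model,
                "max_tokens": max_tokens,
                "messages": [dict(m) for m in self.messages],
            },
            "callback_url": callback_url,
        }


convo = Conversation()
convo.add_user("List three good names for a coffee shop.")
# This assistant turn would normally be copied from the previous callback result.
convo.add_assistant("1. Brew & Bean\n2. The Daily Grind\n3. Cup of Joe")
convo.add_user("Now give me three for a tea shop in the same style.")
body = convo.request_body("claude-3-haiku", 500, "https://example.com/callback")
print(json.dumps(body, indent=2))
```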
Roles
| Role | Used for |
|---|---|
| `user` | What the human is asking or providing as input |
| `assistant` | A prior response from the model (when supplying conversation history) |
There is no `system` role inside `messages`. System instructions go in the separate `params.system` field.
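If you are porting prompts from a chat format that puts system instructions inside the message list as a `system` role, a hypothetical helper like `split_system` can move them into `params.system`. It assumes Convoy accepts only the two roles in the table above and raises on anything else.

```python
ALLOWED_ROLES = {"user", "assistant"}


def split_system(messages):
    """Separate system-style entries from conversation turns.

    Returns (system, turns): system is the joined text for params.system (or None),
    turns holds only user/assistant messages in their original order.
    """
    system_parts = []
    turns = []
    for message in messages:
        role = message["role"]
        if role == "system":
            system_parts.append(message["content"])
        elif role in ALLOWED_ROLES:
            turns.append(message)
        else:
            raise ValueError(f"unsupported role: {role!r}")
    system = "\n\n".join(system_parts) if system_parts else None
    return system, turns


system, turns = split_system([
    {"role": "system", "content": "Respond in British English."},
    {"role": "user", "content": "Explain what a webhook is."},
])
print(system)  # Respond in British English.
print(turns)   # [{'role': 'user', 'content': 'Explain what a webhook is.'}]
```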
System Prompts
The system field sets persistent instructions for how the model should behave — its role, tone, output format, or constraints. It applies to the entire request and is the most reliable way to steer model behavior.
```json
{
  "params": {
    "model": "claude-3-haiku",
    "max_tokens": 800,
    "system": "You are a senior technical writer. Respond in clear, concise British English. Never use marketing jargon.",
    "messages": [
      { "role": "user", "content": "Explain what a webhook is for a non-technical reader." }
    ]
  },
  "callback_url": "https://example.com/callback"
}
```

Good uses for `system`:
- Defining a persona (“You are a customer support agent for an e-commerce store…”)
- Enforcing output format (“Always respond with valid JSON matching this schema…”)
- Setting tone and style (“Use a friendly, casual tone. Avoid technical jargon.”)
- Establishing rules (“Never make up facts. If you don’t know, say so.”)
Prompt Patterns
These are battle-tested patterns for the kinds of jobs Convoy is good at. Each example is a complete, copy-paste-ready payload.
Classifier
Categorize an input into one of a fixed set of labels. Keep temperature low for consistent output.
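The label list ends up in two places: the system prompt and the code that reads the reply. A minimal sketch of keeping them in sync follows, using hypothetical helper names. It assumes you have already pulled the model's text reply out of the callback payload (whose format this page does not cover) and maps any unexpected reply to `OTHER` rather than failing. The prompt it builds matches the one in this pattern's example payload.

```python
LABELS = ["BILLING", "BUG", "FEATURE_REQUEST", "ACCOUNT", "OTHER"]


def classifier_system_prompt(labels):
    """Build the system prompt from the same label list the parser checks against."""
    return ("You are a support ticket classifier. Reply with exactly one of these "
            "labels and nothing else: " + ", ".join(labels) + ".")


def parse_label(text, labels, fallback="OTHER"):
    """Normalize the reply (whitespace, trailing period, case); unknown replies get the fallback."""
    candidate = text.strip().strip(".").upper()
    return candidate if candidate in labels else fallback


print(classifier_system_prompt(LABELS))
print(parse_label(" billing.\n", LABELS))      # BILLING
print(parse_label("Probably a bug?", LABELS))  # OTHER
```

Falling back to a catch-all label keeps a batch moving; if you would rather review odd replies by hand, raise an error instead.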
```json
{
  "params": {
    "model": "claude-3-haiku",
    "max_tokens": 50,
    "temperature": 0.0,
    "system": "You are a support ticket classifier. Reply with exactly one of these labels and nothing else: BILLING, BUG, FEATURE_REQUEST, ACCOUNT, OTHER.",
    "messages": [
      { "role": "user", "content": "I was charged twice for my subscription this month, can you refund the duplicate?" }
    ]
  },
  "callback_url": "https://example.com/callback"
}
```

Extractor (structured JSON output)
Pull structured fields out of unstructured text. Always tell the model the exact schema you expect.
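One way to make sure the schema in the prompt matches what your code expects is to generate both from a single field list. The sketch below uses hypothetical helper names: it builds a schema description in the same style as the extractor prompt and validates a reply against it. The sample reply is hand-written for illustration, not real model output.

```python
import json

FIELDS = ["name", "email", "phone", "company"]


def extractor_system_prompt(fields):
    """Describe the expected JSON object, with one nullable string per field."""
    schema_lines = ",\n".join(f'  "{f}": string | null' for f in fields)
    return ("Extract contact details from the user message. Respond with valid JSON "
            "matching this schema and nothing else:\n{\n" + schema_lines + "\n}")


def parse_contact(text, fields):
    """Parse the reply and check it has exactly the expected keys with str/None values."""
    data = json.loads(text)  # json.JSONDecodeError (a ValueError) if the reply is not JSON
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    if set(data) != set(fields):
        raise ValueError(f"unexpected keys: {sorted(data)}")
    for key, value in data.items():
        if value is not None and not isinstance(value, str):
            raise ValueError(f"{key} should be a string or null")
    return data


print(extractor_system_prompt(FIELDS))
reply = ('{"name": "Jordan Reyes", "email": "jordan@northwind.example", '
         '"phone": "555-204-1188", "company": "Northwind Logistics"}')
contact = parse_contact(reply, FIELDS)
print(contact["company"])  # Northwind Logistics
```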
```json
{
  "params": {
    "model": "claude-3-haiku",
    "max_tokens": 400,
    "temperature": 0.0,
    "system": "Extract contact details from the user message. Respond with valid JSON matching this schema and nothing else:\n{\n \"name\": string | null,\n \"email\": string | null,\n \"phone\": string | null,\n \"company\": string | null\n}",
    "messages": [
      { "role": "user", "content": "Hey, this is Jordan Reyes from Northwind Logistics. You can reach me at jordan@northwind.example or 555-204-1188." }
    ]
  },
  "callback_url": "https://example.com/callback"
}
```

Summarizer
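Condense long input into a shorter form. A low temperature such as 0.3 keeps the wording natural without drifting far from the source; the system prompt's instruction to use only information in the source is what discourages invented facts.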
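Since Convoy is batch-first, summarizing a collection usually means queuing one request per document. The sketch below builds those bodies and tags each `callback_url` with a hypothetical `doc_id` query parameter so results can be matched back to their source. That tagging assumes Convoy calls the URL exactly as given, query string included; the Callbacks page is the place to confirm how results are identified.

```python
from urllib.parse import urlencode

SUMMARY_SYSTEM = ("Summarize the user's input in 3 short bullet points. Use only information "
                  "present in the source. Do not add commentary.")


def summary_requests(documents, callback_url, model="claude-3-haiku"):
    """Build one load request per document.

    documents: dict mapping your own document ID to its text.
    """
    bodies = []
    for doc_id, text in documents.items():
        bodies.append({
            "params": {
                "model": model,
                "max_tokens": 250,
                "temperature": 0.3,
                "system": SUMMARY_SYSTEM,
                "messages": [{"role": "user", "content": text}],
            },
            # Assumption: Convoy calls this URL exactly as given, query string included.
            "callback_url": f"{callback_url}?{urlencode({'doc_id': doc_id})}",
        })
    return bodies


bodies = summary_requests({"a1": "First article...", "b2": "Second transcript..."},
                          "https://example.com/callback")
print(bodies[0]["callback_url"])  # https://example.com/callback?doc_id=a1
```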
```json
{
  "params": {
    "model": "claude-3-haiku",
    "max_tokens": 250,
    "temperature": 0.3,
    "system": "Summarize the user's input in 3 short bullet points. Use only information present in the source. Do not add commentary.",
    "messages": [
      { "role": "user", "content": "<paste the article or transcript here>" }
    ]
  },
  "callback_url": "https://example.com/callback"
}
```

Persona / Style
Generate content in a specific voice. Use a higher temperature for creative output.
```json
{
  "params": {
    "model": "claude-3-sonnet",
    "max_tokens": 800,
    "temperature": 0.8,
    "system": "You are a marketing copywriter for a small-batch coffee roaster. Write in a warm, conversational tone. Keep sentences short. Avoid generic marketing clichés like 'unleash' or 'game-changer'.",
    "messages": [
      { "role": "user", "content": "Write an Instagram caption for our new Ethiopian Yirgacheffe single-origin." }
    ]
  },
  "callback_url": "https://example.com/callback"
}
```

Multi-turn follow-up
Reuse a prior exchange to refine or extend the answer.
```json
{
  "params": {
    "model": "claude-3-haiku",
    "max_tokens": 600,
    "messages": [
      { "role": "user", "content": "Outline a blog post about reducing AWS costs." },
      { "role": "assistant", "content": "1. Right-size EC2 instances\n2. Use Savings Plans\n3. Delete unattached EBS volumes\n4. Move cold data to S3 Glacier\n5. Use spot for non-critical workloads" },
      { "role": "user", "content": "Expand point 2 into 3 paragraphs aimed at a startup CTO." }
    ]
  },
  "callback_url": "https://example.com/callback"
}
```

Tuning Parameters
These parameters control sampling and output length, and small changes to them can noticeably affect output quality.
| Parameter | Range | When to use |
|---|---|---|
| `temperature` | 0.0 – 1.0 | 0.0–0.2 for classification, extraction, and anything that should be as consistent as possible. 0.7–0.9 for creative writing. |
| `max_tokens` | integer | Cap on output length. Set it realistically: too low truncates the response, too high wastes budget. |
| `top_p` | 0.0 – 1.0 | Nucleus sampling. Usually leave at the default and tune `temperature` instead. |
| `top_k` | integer | Limits sampling to the k most likely tokens at each step. Rarely needed. |
| `stop_sequences` | array of strings | Stops generation when any of these strings appears. Useful for structured output (e.g., `["</response>"]`). |
Rule of thumb: start with temperature: 0 for any task where you want a predictable answer. Only raise it for genuinely creative or open-ended work.
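To apply these ranges consistently, you might keep per-task presets and check values before queuing a request. The presets below reuse the settings from the pattern examples on this page; `check_sampling` is a hypothetical client-side check based only on the ranges in the table, so the server may enforce additional limits.

```python
PRESETS = {
    # Values taken from the pattern examples above.
    "classify": {"temperature": 0.0, "max_tokens": 50},
    "extract": {"temperature": 0.0, "max_tokens": 400},
    "summarize": {"temperature": 0.3, "max_tokens": 250},
    "creative": {"temperature": 0.8, "max_tokens": 800},
}


def _is_int(value):
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, int) and not isinstance(value, bool)


def check_sampling(params):
    """Return a list of problems; an empty list means the values fit the documented ranges."""
    problems = []
    for key in ("temperature", "top_p"):
        if key in params:
            value = params[key]
            if isinstance(value, bool) or not isinstance(value, (int, float)) or not 0.0 <= value <= 1.0:
                problems.append(f"{key} must be a number between 0.0 and 1.0")
    if "max_tokens" in params and not (_is_int(params["max_tokens"]) and params["max_tokens"] >= 1):
        problems.append("max_tokens must be a positive integer")
    if "top_k" in params and not _is_int(params["top_k"]):
        problems.append("top_k must be an integer")
    stops = params.get("stop_sequences", [])
    if not isinstance(stops, list) or not all(isinstance(s, str) for s in stops):
        problems.append("stop_sequences must be a list of strings")
    return problems


print(check_sampling(PRESETS["classify"]))  # []
print(check_sampling({"temperature": 1.5, "max_tokens": 0}))
```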
Tips
- Be specific. “Summarize in 3 bullets, max 15 words each” beats “summarize this”.
- Show, don’t just tell. If you want a specific output format, include an example in the system prompt.
- Set `temperature: 0` for anything machine-readable. JSON, labels, codes, IDs: keep them as consistent as possible.
- Cap `max_tokens`. It bounds cost and prevents runaway responses.
- Test the cheapest model first. `claude-3-haiku` and similar small models often handle classification, extraction, and summarization at a fraction of the cost of larger models.
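For the "show, don't just tell" tip, a small helper can append worked input/output pairs to an instruction before it goes into `params.system`. This is a hypothetical sketch: the example inputs are made up for illustration, and the instruction and labels are borrowed from the classifier pattern.

```python
def few_shot_system(instruction, examples):
    """Append input/output pairs to an instruction so the model sees the target format."""
    parts = [instruction, "", "Examples:"]
    for source, expected in examples:
        parts.append(f"Input: {source}")
        parts.append(f"Output: {expected}")
        parts.append("")  # blank line between examples
    return "\n".join(parts).rstrip()


system = few_shot_system(
    "You are a support ticket classifier. Reply with exactly one of these labels "
    "and nothing else: BILLING, BUG, FEATURE_REQUEST, ACCOUNT, OTHER.",
    [("I was charged twice for my subscription this month.", "BILLING"),
     ("The export button crashes the app.", "BUG")],
)
print(system)
```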
Next Steps
- Supported Models — choose the right model for your prompt
- Callbacks — how Convoy delivers the result back to you
- Load Cargo API — full request/response reference