Prompts

This page covers what actually goes inside a Convoy request — how to structure the messages array, when to use a system prompt, and what proven prompt patterns look like end-to-end.

Anatomy of a Request

Every request to POST /cargo/load has the same shape:


{
  "params": {
    "model": "claude-3-haiku",
    "max_tokens": 1024,
    "messages": [
      { "role": "user", "content": "Your prompt here" }
    ]
  },
  "callback_url": "https://your-server.com/callback"
}

Field	What it does
`params.model`	Which model Convoy should route to. See Supported Models.
`params.messages`	The conversation. This is where the actual prompt lives.
`params.system`	Optional. Persistent instructions that shape how the model responds.
`params.max_tokens`	Maximum output length.
`params.temperature`	How “creative” the response is (0.0–1.0).
`callback_url`	Where Convoy delivers the result when processing finishes.

Convoy is batch-first. Requests are queued and processed in cost-efficient batches. Results are delivered to your callback_url minutes to hours after submission (24-hour SLA), not synchronously.

The `messages` Array

messages is a list of turns in a conversation. Each turn has a role and content.

Single-turn (most common)

The simplest form — just ask the model something.


{
  "params": {
    "model": "claude-3-haiku",
    "max_tokens": 500,
    "messages": [
      { "role": "user", "content": "Write a one-sentence summary of photosynthesis." }
    ]
  },
  "callback_url": "https://example.com/callback"
}

Multi-turn (provide conversation history)

You can include prior assistant responses to give the model context for a follow-up. This is useful when you’ve already had an exchange and want the model to continue.


{
  "messages": [
    { "role": "user", "content": "List three good names for a coffee shop." },
    { "role": "assistant", "content": "1. Brew & Bean\n2. The Daily Grind\n3. Cup of Joe" },
    { "role": "user", "content": "Now give me three for a tea shop in the same style." }
  ]
}

Convoy doesn’t store conversation state for you. If you want multi-turn context, you must include the prior turns in each request.

Roles

Role	Used for
`user`	What the human is asking or providing as input
`assistant`	A prior response from the model (when supplying conversation history)

The system role is supplied as a top-level system field rather than inside messages.

System Prompts

The system field sets persistent instructions for how the model should behave — its role, tone, output format, or constraints. It applies to the entire request and is the most reliable way to steer model behavior.


{
  "params": {
    "model": "claude-3-haiku",
    "max_tokens": 800,
    "system": "You are a senior technical writer. Respond in clear, concise British English. Never use marketing jargon.",
    "messages": [
      { "role": "user", "content": "Explain what a webhook is for a non-technical reader." }
    ]
  }
}

Good uses for system:

Defining a persona (“You are a customer support agent for an e-commerce store…”)
Enforcing output format (“Always respond with valid JSON matching this schema…”)
Setting tone and style (“Use a friendly, casual tone. Avoid technical jargon.”)
Establishing rules (“Never make up facts. If you don’t know, say so.”)

Prompt Patterns

These are battle-tested patterns for the kinds of jobs Convoy is good at. Each example is a complete, copy-paste-ready payload.

Classifier

Categorize an input into one of a fixed set of labels. Keep temperature low for consistent output.


{
  "params": {
    "model": "claude-3-haiku",
    "max_tokens": 50,
    "temperature": 0.0,
    "system": "You are a support ticket classifier. Reply with exactly one of these labels and nothing else: BILLING, BUG, FEATURE_REQUEST, ACCOUNT, OTHER.",
    "messages": [
      { "role": "user", "content": "I was charged twice for my subscription this month, can you refund the duplicate?" }
    ]
  },
  "callback_url": "https://example.com/callback"
}

Extractor (structured JSON output)

Pull structured fields out of unstructured text. Always tell the model the exact schema you expect.


{
  "params": {
    "model": "claude-3-haiku",
    "max_tokens": 400,
    "temperature": 0.0,
    "system": "Extract contact details from the user message. Respond with valid JSON matching this schema and nothing else:\n{\n  \"name\": string | null,\n  \"email\": string | null,\n  \"phone\": string | null,\n  \"company\": string | null\n}",
    "messages": [
      { "role": "user", "content": "Hey, this is Jordan Reyes from Northwind Logistics. You can reach me at jordan@northwind.example or 555-204-1188." }
    ]
  },
  "callback_url": "https://example.com/callback"
}

Summarizer

Condense long input into a shorter form. A medium temperature keeps the language readable without inventing facts.


{
  "params": {
    "model": "claude-3-haiku",
    "max_tokens": 250,
    "temperature": 0.3,
    "system": "Summarize the user's input in 3 short bullet points. Use only information present in the source. Do not add commentary.",
    "messages": [
      { "role": "user", "content": "<paste the article or transcript here>" }
    ]
  },
  "callback_url": "https://example.com/callback"
}

Persona / Style

Generate content in a specific voice. Use a higher temperature for creative output.


{
  "params": {
    "model": "claude-3-sonnet",
    "max_tokens": 800,
    "temperature": 0.8,
    "system": "You are a marketing copywriter for a small-batch coffee roaster. Write in a warm, conversational tone. Keep sentences short. Avoid generic marketing clichés like 'unleash' or 'game-changer'.",
    "messages": [
      { "role": "user", "content": "Write an Instagram caption for our new Ethiopian Yirgacheffe single-origin." }
    ]
  },
  "callback_url": "https://example.com/callback"
}

Multi-turn follow-up

Reuse a prior exchange to refine or extend the answer.


{
  "params": {
    "model": "claude-3-haiku",
    "max_tokens": 600,
    "messages": [
      { "role": "user", "content": "Outline a blog post about reducing AWS costs." },
      { "role": "assistant", "content": "1. Right-size EC2 instances\n2. Use Savings Plans\n3. Delete unattached EBS volumes\n4. Move cold data to S3 Glacier\n5. Use spot for non-critical workloads" },
      { "role": "user", "content": "Expand point 2 into 3 paragraphs aimed at a startup CTO." }
    ]
  },
  "callback_url": "https://example.com/callback"
}

Tuning Parameters

Small tweaks to these parameters have a big effect on output quality.

Parameter	Range	When to use
`temperature`	`0.0` – `1.0`	`0.0`–`0.2` for classification, extraction, anything that should be deterministic. `0.7`–`0.9` for creative writing.
`max_tokens`	integer	Cap on output length. Set this realistically — too low truncates, too high wastes budget.
`top_p`	`0.0` – `1.0`	Nucleus sampling. Usually leave at default; tune `temperature` instead.
`top_k`	integer	Limits the model to the top-k most likely tokens. Rarely needed.
`stop_sequences`	array of strings	Stop generation when any of these strings appear. Useful for structured output (e.g., `["</response>"]`).

Rule of thumb: start with temperature: 0 for any task where you want a predictable answer. Only raise it for genuinely creative or open-ended work.

Tips

Be specific. “Summarize in 3 bullets, max 15 words each” beats “summarize this”.
Show, don’t just tell. If you want a specific output format, include an example in the system prompt.
Set temperature: 0 for anything machine-readable. JSON, labels, codes, IDs — keep them deterministic.
Cap max_tokens. It bounds cost and prevents runaway responses.
Test the cheapest model first. claude-3-haiku and similar small models often handle classification, extraction, and summarization at a fraction of the cost of larger models.

Next Steps

Supported Models — choose the right model for your prompt
Callbacks — how Convoy delivers the result back to you
Load Cargo API — full request/response reference