Skip to Content
ConceptsPrompts

Prompts

This page covers what actually goes inside a Convoy request — how to structure the messages array, when to use a system prompt, and what proven prompt patterns look like end-to-end.

Anatomy of a Request

Every request to POST /cargo/load has the same shape:

{ "params": { "model": "claude-3-haiku", "max_tokens": 1024, "messages": [ { "role": "user", "content": "Your prompt here" } ] }, "callback_url": "https://your-server.com/callback" }
FieldWhat it does
params.modelWhich model Convoy should route to. See Supported Models.
params.messagesThe conversation. This is where the actual prompt lives.
params.systemOptional. Persistent instructions that shape how the model responds.
params.max_tokensMaximum output length.
params.temperatureHow “creative” the response is (0.0–1.0).
callback_urlWhere Convoy delivers the result when processing finishes.

Convoy is batch-first. Requests are queued and processed in cost-efficient batches. Results are delivered to your callback_url minutes to hours after submission (24-hour SLA), not synchronously.


The messages Array

messages is a list of turns in a conversation. Each turn has a role and content.

Single-turn (most common)

The simplest form — just ask the model something.

{ "params": { "model": "claude-3-haiku", "max_tokens": 500, "messages": [ { "role": "user", "content": "Write a one-sentence summary of photosynthesis." } ] }, "callback_url": "https://example.com/callback" }

Multi-turn (provide conversation history)

You can include prior assistant responses to give the model context for a follow-up. This is useful when you’ve already had an exchange and want the model to continue.

{ "messages": [ { "role": "user", "content": "List three good names for a coffee shop." }, { "role": "assistant", "content": "1. Brew & Bean\n2. The Daily Grind\n3. Cup of Joe" }, { "role": "user", "content": "Now give me three for a tea shop in the same style." } ] }

Convoy doesn’t store conversation state for you. If you want multi-turn context, you must include the prior turns in each request.

Roles

RoleUsed for
userWhat the human is asking or providing as input
assistantA prior response from the model (when supplying conversation history)

The system role is supplied as a top-level system field rather than inside messages.


System Prompts

The system field sets persistent instructions for how the model should behave — its role, tone, output format, or constraints. It applies to the entire request and is the most reliable way to steer model behavior.

{ "params": { "model": "claude-3-haiku", "max_tokens": 800, "system": "You are a senior technical writer. Respond in clear, concise British English. Never use marketing jargon.", "messages": [ { "role": "user", "content": "Explain what a webhook is for a non-technical reader." } ] } }

Good uses for system:

  • Defining a persona (“You are a customer support agent for an e-commerce store…”)
  • Enforcing output format (“Always respond with valid JSON matching this schema…”)
  • Setting tone and style (“Use a friendly, casual tone. Avoid technical jargon.”)
  • Establishing rules (“Never make up facts. If you don’t know, say so.”)

Prompt Patterns

These are battle-tested patterns for the kinds of jobs Convoy is good at. Each example is a complete, copy-paste-ready payload.

Classifier

Categorize an input into one of a fixed set of labels. Keep temperature low for consistent output.

{ "params": { "model": "claude-3-haiku", "max_tokens": 50, "temperature": 0.0, "system": "You are a support ticket classifier. Reply with exactly one of these labels and nothing else: BILLING, BUG, FEATURE_REQUEST, ACCOUNT, OTHER.", "messages": [ { "role": "user", "content": "I was charged twice for my subscription this month, can you refund the duplicate?" } ] }, "callback_url": "https://example.com/callback" }

Extractor (structured JSON output)

Pull structured fields out of unstructured text. Always tell the model the exact schema you expect.

{ "params": { "model": "claude-3-haiku", "max_tokens": 400, "temperature": 0.0, "system": "Extract contact details from the user message. Respond with valid JSON matching this schema and nothing else:\n{\n \"name\": string | null,\n \"email\": string | null,\n \"phone\": string | null,\n \"company\": string | null\n}", "messages": [ { "role": "user", "content": "Hey, this is Jordan Reyes from Northwind Logistics. You can reach me at jordan@northwind.example or 555-204-1188." } ] }, "callback_url": "https://example.com/callback" }

Summarizer

Condense long input into a shorter form. A medium temperature keeps the language readable without inventing facts.

{ "params": { "model": "claude-3-haiku", "max_tokens": 250, "temperature": 0.3, "system": "Summarize the user's input in 3 short bullet points. Use only information present in the source. Do not add commentary.", "messages": [ { "role": "user", "content": "<paste the article or transcript here>" } ] }, "callback_url": "https://example.com/callback" }

Persona / Style

Generate content in a specific voice. Use a higher temperature for creative output.

{ "params": { "model": "claude-3-sonnet", "max_tokens": 800, "temperature": 0.8, "system": "You are a marketing copywriter for a small-batch coffee roaster. Write in a warm, conversational tone. Keep sentences short. Avoid generic marketing clichés like 'unleash' or 'game-changer'.", "messages": [ { "role": "user", "content": "Write an Instagram caption for our new Ethiopian Yirgacheffe single-origin." } ] }, "callback_url": "https://example.com/callback" }

Multi-turn follow-up

Reuse a prior exchange to refine or extend the answer.

{ "params": { "model": "claude-3-haiku", "max_tokens": 600, "messages": [ { "role": "user", "content": "Outline a blog post about reducing AWS costs." }, { "role": "assistant", "content": "1. Right-size EC2 instances\n2. Use Savings Plans\n3. Delete unattached EBS volumes\n4. Move cold data to S3 Glacier\n5. Use spot for non-critical workloads" }, { "role": "user", "content": "Expand point 2 into 3 paragraphs aimed at a startup CTO." } ] }, "callback_url": "https://example.com/callback" }

Tuning Parameters

Small tweaks to these parameters have a big effect on output quality.

ParameterRangeWhen to use
temperature0.01.00.00.2 for classification, extraction, anything that should be deterministic. 0.70.9 for creative writing.
max_tokensintegerCap on output length. Set this realistically — too low truncates, too high wastes budget.
top_p0.01.0Nucleus sampling. Usually leave at default; tune temperature instead.
top_kintegerLimits the model to the top-k most likely tokens. Rarely needed.
stop_sequencesarray of stringsStop generation when any of these strings appear. Useful for structured output (e.g., ["</response>"]).

Rule of thumb: start with temperature: 0 for any task where you want a predictable answer. Only raise it for genuinely creative or open-ended work.


Tips

  • Be specific. “Summarize in 3 bullets, max 15 words each” beats “summarize this”.
  • Show, don’t just tell. If you want a specific output format, include an example in the system prompt.
  • Set temperature: 0 for anything machine-readable. JSON, labels, codes, IDs — keep them deterministic.
  • Cap max_tokens. It bounds cost and prevents runaway responses.
  • Test the cheapest model first. claude-3-haiku and similar small models often handle classification, extraction, and summarization at a fraction of the cost of larger models.

Next Steps

Last updated on