Convoy
Convoy simplifies batch processing for AI inference. Send individual requests and Convoy automatically groups them into batches.
How It Works
- Submit requests via the /cargo/load endpoint
- Convoy batches them automatically (100 requests or 1 hour, whichever comes first)
- Provider processes the batch (AWS Bedrock or Anthropic)
- Results delivered to your callback URL
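The batching rule in the steps above (flush at 100 requests or after 1 hour, whichever comes first) can be sketched roughly as follows. This is an illustrative sketch only, not Convoy's actual implementation: the `BatchBuffer` class, its method names, and the choice to start the one-hour timer at the first request in a batch are all assumptions made for the example.

```python
import time
from typing import Any, Callable, List, Optional

MAX_BATCH_SIZE = 100          # from the text: 100 requests
MAX_WAIT_SECONDS = 60 * 60    # from the text: 1 hour


class BatchBuffer:
    """Hypothetical buffer illustrating size-or-time batching."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._items: List[Any] = []
        self._first_at: Optional[float] = None

    def add(self, request: Any) -> Optional[List[Any]]:
        """Store a request; return a full batch once the size limit is hit."""
        if not self._items:
            # Assumption: the timer starts when the first request arrives.
            self._first_at = self._clock()
        self._items.append(request)
        if len(self._items) >= MAX_BATCH_SIZE:
            return self._flush()
        return None

    def poll(self) -> Optional[List[Any]]:
        """Return a partial batch if the oldest request has waited 1 hour."""
        if self._items and self._clock() - self._first_at >= MAX_WAIT_SECONDS:
            return self._flush()
        return None

    def _flush(self) -> List[Any]:
        batch, self._items, self._first_at = self._items, [], None
        return batch
```

In this sketch a caller would invoke `add()` for each incoming request and `poll()` periodically, sending any returned batch to the provider.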
Key Features
- Automatic batching - No manual batch management needed
- Multiple providers - AWS Bedrock and Anthropic support
- Reliable delivery - Callbacks with exponential backoff retry
- Status tracking - Monitor your requests through the lifecycle
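As a rough illustration of the "exponential backoff retry" mentioned for callbacks, the sketch below retries a delivery with a wait that doubles after each failure. The function names, the 1-second base delay, the doubling factor, and the attempt limit are hypothetical; the source does not state the values Convoy uses.

```python
import time
from typing import Callable, Iterator


def backoff_delays(base: float = 1.0, factor: float = 2.0,
                   max_attempts: int = 5) -> Iterator[float]:
    """Yield the wait before each retry: base, base*factor, base*factor^2, ..."""
    # max_attempts counts the first try, so there are max_attempts - 1 retries.
    for attempt in range(max_attempts - 1):
        yield base * factor ** attempt


def deliver_with_retry(send: Callable[[], bool],
                       sleep: Callable[[float], None] = time.sleep,
                       **backoff) -> bool:
    """Call send() until it returns True or the attempts are exhausted."""
    if send():
        return True
    for delay in backoff_delays(**backoff):
        sleep(delay)          # wait longer after every failed attempt
        if send():
            return True
    return False
```

Here `send` stands in for whatever performs the HTTP POST to the callback URL and reports success. Production retry loops often add random jitter to the delays; that is omitted here for brevity.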
Quick Example
curl -X POST http://localhost:8000/cargo/load \
  -H "Content-Type: application/json" \
  -d '{
    "params": {
      "model": "claude-sonnet-4-5",
      "max_tokens": 1024,
      "messages": [{"role": "user", "content": "Hello"}]
    },
    "callback_url": "https://your-server.com/callback"
  }'
Response:
{
  "cargo_id": "crg_abc123",
  "status": "success",
  "message": "Cargo loaded successfully"
}
Next Steps
- Getting Started - Install and run Convoy
- API Reference - Full API documentation
- Concepts - Understand how Convoy works