Performance Optimization
Optimize workflow execution, reduce API costs, and improve throughput.
Cost Optimization
Model Selection
Choose the right model for each task:
| Task Type | Recommended Model | Rationale |
|---|---|---|
| Simple formatting | GPT-3.5-Turbo | Fast, cheap, sufficient quality |
| Content generation | GPT-4-Turbo | Better quality, reasonable cost |
| Complex reasoning | GPT-4 | Highest quality for critical content |
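The table above can be expressed as a simple lookup. The following is a minimal sketch, not part of the product's API: the task-type keys, the lowercase model identifiers, the `select_model` helper, and the choice of fallback model are all assumptions made for illustration.

```python
# Hypothetical mapping that mirrors the table above.
# The task-type keys and lowercase model identifiers are illustrative assumptions.
MODEL_BY_TASK = {
    "simple_formatting": "gpt-3.5-turbo",
    "content_generation": "gpt-4-turbo",
    "complex_reasoning": "gpt-4",
}

# Assumption: unknown task types fall back to the cheapest option.
DEFAULT_MODEL = "gpt-3.5-turbo"


def select_model(task_type: str) -> str:
    """Return the model name for a task type, or the default if it is unknown."""
    return MODEL_BY_TASK.get(task_type, DEFAULT_MODEL)
```

Keeping the mapping in one place makes it easy to move a task type to a cheaper model later without touching the code that calls it.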
Prompt Efficiency
- Keep prompts concise: every token in the prompt and the response is billed
- Don't repeat information that is already in the system prompt
- Use variables instead of hardcoding long text
- Set an appropriate max_tokens value to cap response size
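To see how prompt length and max_tokens affect spend, a rough upper bound per request can be computed as below. This is only a sketch: the four-characters-per-token ratio is a common rule of thumb for English text rather than an exact tokenizer, and the helper names and per-1,000-token prices are hypothetical parameters you would supply from your provider's current price list.

```python
def estimate_tokens(text: str) -> int:
    """Rough estimate: about 4 characters per token for English text (heuristic only)."""
    return max(1, len(text) // 4)


def estimate_max_cost(prompt: str, max_tokens: int,
                      price_in_per_1k: float, price_out_per_1k: float) -> float:
    """Upper-bound cost of one request.

    Counts the estimated prompt tokens at the input price plus at most
    `max_tokens` of output at the output price. Prices are per 1,000 tokens.
    """
    prompt_tokens = estimate_tokens(prompt)
    return (prompt_tokens * price_in_per_1k + max_tokens * price_out_per_1k) / 1000
```

Because max_tokens bounds the output term, lowering it reduces the worst-case cost of a request even when the prompt stays the same.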
Parallel Processing
Process independent tasks in parallel to reduce total workflow time.
Python
# Configure stage for parallel task processing
stage.write({
    'parallel_processing': True,
    'max_concurrent_tasks': 5,  # Process up to 5 tasks simultaneously
    'batch_size': 10,  # Group tasks in batches
})
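For intuition, the two settings above can be modeled with Python's standard `concurrent.futures` module. This is a sketch of one plausible reading (tasks are split into batches, and each batch runs with a bounded number of workers), not the stage's actual implementation; `chunked`, `run_tasks`, and `handler` are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor


def chunked(items, size):
    """Yield consecutive batches of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def run_tasks(tasks, handler, max_concurrent_tasks=5, batch_size=10):
    """Run handler(task) for every task, batch by batch, preserving input order."""
    results = []
    with ThreadPoolExecutor(max_workers=max_concurrent_tasks) as pool:
        for batch in chunked(tasks, batch_size):
            # pool.map returns results in the same order as its inputs,
            # and at most max_concurrent_tasks handlers run at once.
            results.extend(pool.map(handler, batch))
    return results
```

Threads suit this kind of work because API calls spend most of their time waiting on the network. The tasks must be independent, since the order in which they finish within a batch is not guaranteed.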
Caching
Cache identical requests to avoid redundant API calls.
Python
# Enable caching on provider
provider.write({
    'enable_cache': True,
    'cache_ttl': 3600,  # Cache for 1 hour
})
# Cache key is computed from:
# - Template ID
# - All template variables
# - Model name
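The cache key described above could be derived along the following lines. This is a hedged sketch rather than the provider's real code: the `cache_key` function, the `TTLCache` class, and the use of SHA-256 over sorted JSON are assumptions chosen to show how identical requests map to the same key and how a TTL expires entries.

```python
import hashlib
import json
import time


def cache_key(template_id, variables, model_name):
    """Build a deterministic key from template ID, variables, and model name."""
    payload = json.dumps(
        {"template_id": template_id, "variables": variables, "model": model_name},
        sort_keys=True,  # same variables in a different order give the same key
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class TTLCache:
    """Minimal in-memory cache whose entries expire after `ttl` seconds."""

    def __init__(self, ttl=3600):
        self.ttl = ttl
        self._store = {}

    def get(self, key, now=None):
        now = time.time() if now is None else now
        entry = self._store.get(key)
        if entry is None:
            return None
        value, stored_at = entry
        if now - stored_at >= self.ttl:
            del self._store[key]  # expired: drop it and report a miss
            return None
        return value

    def set(self, key, value, now=None):
        now = time.time() if now is None else now
        self._store[key] = (value, now)
```

Changing any one of the three inputs, for example switching the model, yields a different key, so a cached response from one model is never returned for another.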