Performance Optimization

Optimize workflow execution, reduce API costs, and improve throughput.

Cost Optimization

Model Selection

Choose the right model for each task:

Task Type	Recommended Model	Rationale
Simple formatting	GPT-3.5-Turbo	Fast, cheap, sufficient quality
Content generation	GPT-4-Turbo	Better quality, reasonable cost
Complex reasoning	GPT-4	Highest quality for critical content

Prompt Efficiency

Keep prompts concise - tokens cost money
Don't repeat information already in system prompt
Use variables instead of hardcoded long text
Set appropriate max_tokens to limit response size

Parallel Processing

Process independent tasks in parallel to reduce total workflow time.

Python

# Configure stage for parallel task processing
stage.write({
    'parallel_processing': True,
    'max_concurrent_tasks': 5,  # Process up to 5 tasks simultaneously
    'batch_size': 10,           # Group tasks in batches
})

Caching

Cache identical requests to avoid redundant API calls.

Python

# Enable caching on provider
provider.write({
    'enable_cache': True,
    'cache_ttl': 3600,  # Cache for 1 hour
})

# Cache key is computed from:
# - Template ID
# - All template variables
# - Model name