Performance Optimization

Optimize workflow execution, reduce API costs, and improve throughput.

Cost Optimization

Model Selection

Choose the right model for each task:

Task Type           Recommended Model   Rationale
------------------  ------------------  ------------------------------------
Simple formatting   GPT-3.5-Turbo       Fast, cheap, sufficient quality
Content generation  GPT-4-Turbo         Better quality, reasonable cost
Complex reasoning   GPT-4               Highest quality for critical content
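
The recommendations above can be expressed as a small lookup. This is a minimal sketch, not part of the product's API: the task-type keys, the lowercase model identifiers, and the pick_model helper are all hypothetical names chosen for illustration.

```python
# Hypothetical routing table mirroring the recommendations above.
# The keys and model identifier strings are assumptions for illustration.
MODEL_BY_TASK = {
    "simple_formatting": "gpt-3.5-turbo",   # fast and cheap
    "content_generation": "gpt-4-turbo",    # better quality, reasonable cost
    "complex_reasoning": "gpt-4",           # highest quality
}


def pick_model(task_type: str, default: str = "gpt-4-turbo") -> str:
    """Return the recommended model for a task type, or a default."""
    return MODEL_BY_TASK.get(task_type, default)


if __name__ == "__main__":
    print(pick_model("simple_formatting"))
    print(pick_model("unlisted_task"))  # falls back to the default
```

Keeping the mapping in one place makes it easy to move a task type to a cheaper model once its output quality has been checked.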

Prompt Efficiency

  • Keep prompts concise; every token adds cost
  • Don't repeat information that is already in the system prompt
  • Use variables instead of long hardcoded text
  • Set an appropriate max_tokens value to cap response size
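
As a rough illustration of the points above, the sketch below builds a prompt from variables and estimates its size. It uses only the Python standard library; the template text, the helper names, and the "about 4 characters per token" rule of thumb are assumptions, and a real tokenizer will give different counts.

```python
from string import Template

# Hypothetical template: reusable wording stays short, and the variable
# parts are filled in per task instead of being hardcoded.
PROMPT = Template("Summarize the following $doc_type in $max_words words:\n$body")


def build_prompt(doc_type: str, max_words: int, body: str) -> str:
    """Fill the template, trimming surrounding whitespace from the body."""
    return PROMPT.substitute(
        doc_type=doc_type, max_words=max_words, body=body.strip()
    )


def rough_token_count(text: str) -> int:
    """Very rough size estimate, assuming about 4 characters per token."""
    return max(1, len(text) // 4)


if __name__ == "__main__":
    prompt = build_prompt("article", 50, "  Long article text goes here.  ")
    print(prompt)
    print("approx. tokens:", rough_token_count(prompt))
```

An estimate like this is only useful for spotting prompts that are much larger than expected before they are sent.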

Parallel Processing

Process independent tasks in parallel to reduce total workflow time.

Python
# Configure stage for parallel task processing
stage.write({
    'parallel_processing': True,
    'max_concurrent_tasks': 5,  # Process up to 5 tasks simultaneously
    'batch_size': 10,           # Group tasks in batches
})
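
To make the two settings concrete, here is a minimal sketch of what a concurrency limit and a batch size can mean in practice. It uses Python's standard concurrent.futures module and is not the module's actual implementation; batched and run_in_parallel are hypothetical helpers.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterator, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def batched(items: List[T], batch_size: int) -> Iterator[List[T]]:
    """Yield consecutive slices of at most batch_size items."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def run_in_parallel(
    tasks: List[T],
    worker: Callable[[T], R],
    max_concurrent_tasks: int = 5,
    batch_size: int = 10,
) -> List[R]:
    """Run worker over tasks batch by batch, with a bounded thread pool."""
    results: List[R] = []
    with ThreadPoolExecutor(max_workers=max_concurrent_tasks) as pool:
        for batch in batched(tasks, batch_size):
            # pool.map returns results in input order; at most
            # max_concurrent_tasks workers run at the same time.
            results.extend(pool.map(worker, batch))
    return results


if __name__ == "__main__":
    print(run_in_parallel(list(range(25)), lambda n: n * 2))
```

Threads suit this kind of work because API calls spend most of their time waiting on the network. Only tasks that do not depend on each other's output should be run this way.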

Caching

Cache identical requests to avoid redundant API calls.

Python
# Enable caching on provider
provider.write({
    'enable_cache': True,
    'cache_ttl': 3600,  # Cache for 1 hour
})

# Cache key is computed from:
# - Template ID
# - All template variables
# - Model name
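
The cache key description above can be sketched as follows. This is an illustrative assumption about how such a key and a time-limited cache could work, not the provider's actual code; cache_key and TTLCache are hypothetical names, and hashing uses the standard hashlib and json modules.

```python
import hashlib
import json
import time
from typing import Any, Callable, Dict, Optional, Tuple


def cache_key(template_id: int, variables: Dict[str, Any], model: str) -> str:
    """Hash the template ID, all template variables, and the model name."""
    payload = json.dumps(
        {"template_id": template_id, "variables": variables, "model": model},
        sort_keys=True,  # same variables in a different order give the same key
        default=str,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class TTLCache:
    """In-memory cache whose entries expire after ttl_seconds."""

    def __init__(self, ttl_seconds: float = 3600,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._store: Dict[str, Tuple[float, Any]] = {}

    def get(self, key: str) -> Optional[Any]:
        entry = self._store.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._store[key]  # drop the expired entry
            return None
        return value

    def set(self, key: str, value: Any) -> None:
        self._store[key] = (self._clock() + self._ttl, value)
```

Because the key covers every template variable, changing any single variable or the model produces a different key, so the request is treated as a cache miss.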