# Performance Optimization

Optimize your SwiftClaw agents for maximum performance and efficiency in production workloads.
## Model Selection

Choose the right model for your workload.

### Task Complexity
```python
# Simple tasks: use a faster, cheaper model
classifier = Agent(model="llama-3")

# Complex reasoning: use a more capable model
analyst = Agent(model="gpt-4")

# Hybrid approach: route based on task complexity
agent = Agent(
    model={
        "simple": "llama-3",
        "complex": "gpt-4"
    }
)
```

### Response Time vs. Quality
| Model | Response Time | Quality | Cost |
|---|---|---|---|
| llama-3 | ~500ms | Good | $ |
| gpt-3.5-turbo | ~1s | Better | $$ |
| claude-3-sonnet | ~2s | Great | $$$ |
| gpt-4 | ~3s | Best | $$$$ |
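The hybrid approach above implies some routing logic that classifies each task before choosing a model. A minimal sketch of such a router is shown below; the `route_model` function, the keyword heuristic, and the length threshold are all illustrative assumptions, not part of the SwiftClaw API:

```python
# Hypothetical complexity router; thresholds and hint words are illustrative.
MODEL_TIERS = {"simple": "llama-3", "complex": "gpt-4"}

COMPLEX_HINTS = ("analyze", "compare", "explain why", "step by step")

def route_model(prompt: str) -> str:
    """Return a model name based on a rough task-complexity heuristic."""
    text = prompt.lower()
    # Long prompts or reasoning-style keywords go to the powerful model
    if len(text.split()) > 100 or any(hint in text for hint in COMPLEX_HINTS):
        return MODEL_TIERS["complex"]
    return MODEL_TIERS["simple"]
```

In practice you would tune the heuristic (or use a small classifier model) against your own traffic, since misrouting complex tasks to the cheap tier costs quality while the reverse costs money.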
## Memory Optimization

### Short-Term Memory

Use short-term memory for session-specific data:
```json
{
  "memory": {
    "shortTerm": {
      "ttl": "1h",
      "maxSize": "10MB"
    }
  }
}
```

### Long-Term Memory
Optimize for frequently accessed data:
```json
{
  "memory": {
    "longTerm": {
      "ttl": "30d",
      "maxSize": "100MB",
      "cache": {
        "enabled": true,
        "strategy": "lru"
      }
    }
  }
}
```

### Memory Search
Use hybrid search for best performance:
```json
{
  "memory": {
    "search": {
      "type": "hybrid",
      "vectorWeight": 0.7,
      "textWeight": 0.3,
      "maxResults": 10
    }
  }
}
```

## Caching Strategies
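To make the caching decorators in this section concrete, here is a minimal sketch of how a TTL cache can work under the hood, using a plain in-memory dict with expiry timestamps. This is an assumption-laden illustration, not SwiftClaw's actual `@agent.cache` implementation:

```python
import time
from functools import wraps

def ttl_cache(ttl_seconds: float):
    """Minimal TTL cache sketch for positional-arg functions.

    Illustrative only; not the SwiftClaw @agent.cache implementation.
    """
    def decorator(fn):
        store = {}  # args -> (expires_at, value)

        @wraps(fn)
        def wrapper(*args):
            now = time.monotonic()
            hit = store.get(args)
            if hit is not None and hit[0] > now:
                return hit[1]  # still fresh: return cached value
            value = fn(*args)
            store[args] = (now + ttl_seconds, value)
            return value
        return wrapper
    return decorator
```

A real implementation would also handle keyword arguments, bound the cache size (e.g. LRU eviction, as in the long-term memory config above), and support async functions.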
### Response Caching
Cache common responses:
```python
@agent.cache(ttl="1h")
async def get_product_info(product_id: str):
    return await fetch_product(product_id)
```

### Tool Result Caching
Cache expensive tool calls:
```python
@agent.tool
@cache(ttl="30m")
async def search_database(query: str):
    return await db.search(query)
```

## Parallel Processing
### Concurrent Tool Calls
Execute tools in parallel:
```python
async def process_request(message):
    # Execute tools concurrently
    results = await asyncio.gather(
        agent.call_tool("search_docs", message),
        agent.call_tool("search_database", message),
        agent.call_tool("fetch_user_data", message),
    )
    return combine_results(results)
```

### Batch Processing
Process multiple requests together:
```python
@agent.batch(max_size=10, max_wait="100ms")
async def process_messages(messages):
    return await agent.generate_batch(messages)
```

## Request Optimization
### Prompt Engineering
Optimize prompts for efficiency:
```python
# Bad: verbose prompt
prompt = """
Please analyze the following text and provide a detailed
summary including key points, sentiment analysis, and
recommendations for improvement...
"""

# Good: concise prompt
prompt = "Summarize: key points, sentiment, recommendations"
```

### Token Management
Reduce token usage:
```python
# Limit the context window
agent = Agent(
    model="gpt-4",
    max_tokens=2000,
    context_window=8000
)

# Truncate long inputs
def truncate_context(text, max_tokens=4000):
    tokens = tokenize(text)
    if len(tokens) > max_tokens:
        return detokenize(tokens[:max_tokens])
    return text
```

### Streaming Responses
Enable streaming for better UX:
```python
@agent.on_message
async def handle_message(message):
    async for chunk in agent.generate_stream(message):
        yield chunk
```

### Connection Pooling
Optimize external connections:
```python
# Database connection pool
db = Database(
    pool_size=10,
    max_overflow=20,
    pool_timeout=30
)

# HTTP connection pool
http = HTTPClient(
    pool_connections=10,
    pool_maxsize=20
)
```

## Monitoring Performance
### Key Metrics
Track these metrics:
- **Response Time**: P50, P95, P99 latency
- **Throughput**: Requests per second
- **Error Rate**: Failed requests percentage
- **Token Usage**: Tokens per request
- **Memory Usage**: RAM consumption
- **CPU Usage**: Processor utilization
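Latency percentiles like P50, P95, and P99 are computed from raw latency samples. A simple nearest-rank sketch is shown below (this is generic Python, not a SwiftClaw API):

```python
def percentile(samples, pct):
    """Nearest-rank percentile: pct in (0, 100], samples non-empty."""
    ordered = sorted(samples)
    # Rank is ceil(pct/100 * n); ceiling division via negation trick
    rank = max(1, -(-pct * len(ordered) // 100))
    return ordered[int(rank) - 1]

# Example latency samples in milliseconds
latencies_ms = [120, 180, 200, 250, 300, 450, 500, 800, 950, 1200]
p50 = percentile(latencies_ms, 50)
p95 = percentile(latencies_ms, 95)
```

Note how a single slow outlier dominates P95/P99 while barely moving P50, which is why tail percentiles, not averages, are the standard latency targets.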
### Performance Dashboard
```shell
# View real-time metrics
swiftclaw metrics my-agent --live

# Generate a performance report
swiftclaw report my-agent --period 7d
```

## Load Testing
Test agent performance:
```shell
# Install the load testing tool
npm install -g @swiftclaw/load-test

# Run a load test
swiftclaw load-test my-agent \
  --requests 1000 \
  --concurrency 50 \
  --duration 5m
```

## Performance Benchmarks
Typical performance targets:
| Metric | Target | Excellent |
|---|---|---|
| Response Time (P95) | <2s | <1s |
| Throughput | >100 req/s | >500 req/s |
| Error Rate | <1% | <0.1% |
| Availability | >99.9% | >99.99% |
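These targets can be checked programmatically against measured metrics, for example as a gate in a deployment pipeline. The sketch below mirrors the baseline column of the table; the metric keys and the `meets_targets` function are illustrative assumptions:

```python
# Baseline targets from the table above; keys are illustrative.
TARGETS = {"p95_ms": 2000, "throughput_rps": 100, "error_rate": 0.01}

def meets_targets(metrics: dict) -> bool:
    """Return True when measured metrics satisfy every baseline target."""
    return (
        metrics["p95_ms"] < TARGETS["p95_ms"]
        and metrics["throughput_rps"] > TARGETS["throughput_rps"]
        and metrics["error_rate"] < TARGETS["error_rate"]
    )
```

The "Excellent" column would simply use tighter thresholds (e.g. `p95_ms < 1000`).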
> **Continuous Optimization:** Monitor metrics regularly and optimize based on actual usage patterns.