
Building Agent Products

Ship production-ready agent systems that users love and pay for

3-4 hours
3 min read
Updated January 15, 2026

You can build agents. Now let's ship products. This guide covers architecture, deployment, UX, and monetization for agent-powered applications.

Product Architecture
The full stack

Architecture Patterns

Choose based on your use case:

Synchronous (Simple)

Best for quick tasks (< 30 seconds):

Python
@app.post("/analyze")
async def analyze(request: AnalyzeRequest):
result = agent.run(request.query)
return {"result": result}

Pros: Simple, easy to debug.
Cons: User waits; timeout risk.

Asynchronous (Scalable)

Best for longer tasks:

Python
@app.post("/analyze")
async def start_analysis(request: AnalyzeRequest):
job_id = create_job(request)
queue.enqueue(run_agent_job, job_id)
return {"job_id": job_id}
@app.get("/jobs/{job_id}")
async def get_job(job_id: str):
return get_job_status(job_id)

Pros: Handles long tasks, scalable.
Cons: More complex; needs job management.
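
A minimal sketch of the queued job itself, following on from the snippet above. The agent call matches the synchronous example; the job-store helpers (`load_job`, `update_job_status`) are illustrative names, not a specific framework API.

Python
def run_agent_job(job_id: str) -> None:
    # Worker-side counterpart to queue.enqueue(run_agent_job, job_id) above.
    update_job_status(job_id, status="running")
    try:
        job = load_job(job_id)                 # fetch the stored request
        result = agent.run(job["query"])       # same agent as the sync example
        update_job_status(job_id, status="done", result=result)
    except Exception as exc:
        update_job_status(job_id, status="failed", error=str(exc))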

Streaming (Real-time)

Best for interactive experiences:

Python
@app.post("/chat/stream")
async def stream_chat(request: ChatRequest):
async def generate():
async for chunk in agent.stream(request.message):
yield f"data: {json.dumps(chunk)}\n\n"
return StreamingResponse(generate(), media_type="text/event-stream")

Pros: Great UX, immediate feedback.
Cons: More complex client handling.

When to Use What

Pattern | Best for | Trade-off
Synchronous | Quick tasks (< 30 seconds) | User waits; timeout risk
Asynchronous | Long-running tasks | Needs job management
Streaming | Interactive, real-time experiences | More complex client handling

Real-World Product Examples

1. Code Review Bot

Product: Automated code review for GitHub PRs

Code Review Flow

Key Features:

  • Webhook receives PR events
  • Agent analyzes diff for bugs, security, style
  • Posts inline comments on GitHub
  • Approves or requests changes
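
A minimal sketch of the webhook end of this flow, assuming FastAPI, a GitHub token, and a hypothetical `review_agent`; the result is posted through GitHub's pull request reviews endpoint.

Python
import os

import httpx
from fastapi import FastAPI, Request

app = FastAPI()
GITHUB_TOKEN = os.environ["GITHUB_TOKEN"]

@app.post("/webhooks/github")
async def handle_pr_event(request: Request):
    event = await request.json()
    if event.get("action") not in {"opened", "synchronize"}:
        return {"status": "ignored"}

    pr = event["pull_request"]
    async with httpx.AsyncClient() as client:
        # Fetch the diff, run the (assumed) review agent, post the result as a review.
        diff = (await client.get(pr["diff_url"])).text
        findings = review_agent.run(diff)      # hypothetical agent
        await client.post(
            f"{pr['url']}/reviews",
            headers={"Authorization": f"Bearer {GITHUB_TOKEN}"},
            json={
                "body": findings.summary,
                "event": findings.verdict,     # e.g. "APPROVE" or "REQUEST_CHANGES"
            },
        )
    return {"status": "reviewed"}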

2. Customer Support Agent

Product: AI-powered ticket handling

Support Flow

Key Features:

  • Multi-channel intake (email, chat, form)
  • Knowledge base search
  • CRM integration for context
  • Human escalation workflow
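
One way to wire the escalation step is a confidence threshold: the agent drafts a reply using the knowledge base and CRM context listed above, and anything below the bar goes to a human. All helper names here are illustrative.

Python
CONFIDENCE_THRESHOLD = 0.7   # assumption; tune against your own escalation data

def handle_ticket(ticket: dict) -> dict:
    # Gather context before the agent sees the ticket (KB search + CRM lookup).
    context = {
        "kb_articles": search_kb(ticket["subject"]),
        "customer": crm_lookup(ticket["customer_id"]),
    }
    draft = support_agent.run(ticket["body"], context=context)

    if draft.confidence < CONFIDENCE_THRESHOLD:
        escalate_to_human(ticket, draft)       # human agent takes over the thread
        return {"status": "escalated"}

    send_reply(ticket["id"], draft.text)
    return {"status": "resolved"}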

3. Research Assistant

Product: Automated research and reports

Key Features:

  • Multi-source research
  • Fact verification
  • Citation management
  • Multiple output formats

4. Data Analysis Platform

Product: Natural language data exploration

Key Features:

  • Natural language to SQL
  • Chart generation
  • Dashboard builder
  • Scheduled reports
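
The "natural language to SQL" step is usually a constrained prompt over a known schema plus a read-only execution path. A sketch using the Anthropic Python SDK; the model name and the SELECT-only guardrail are assumptions.

Python
from anthropic import Anthropic

client = Anthropic()

def question_to_sql(question: str, schema_ddl: str) -> str:
    """Translate an analytics question into a single read-only SQL statement."""
    message = client.messages.create(
        model="claude-sonnet-4-20250514",      # assumed model name; use whatever you deploy
        max_tokens=1024,
        system=(
            "You translate analytics questions into one SQLite SELECT statement.\n"
            f"Schema:\n{schema_ddl}\n"
            "Return only SQL, with no commentary."
        ),
        messages=[{"role": "user", "content": question}],
    )
    sql = message.content[0].text.strip()
    if not sql.lower().startswith("select"):
        raise ValueError("Refusing to run non-SELECT SQL")   # cheap safety check
    return sql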

User Experience Design

Progressive Disclosure

Don't overwhelm users with agent complexity:

Bash
Level 1: Simple Chat
└── User asks, agent responds
Level 2: Show Thinking
└── Display reasoning steps
Level 3: Tool Visibility
└── Show which tools are used
Level 4: Full Control
└── User guides decisions

Real-time Feedback

Keep users informed during long operations:

TypeScript
import { useEffect, useState } from "react";

function AgentStatus({ jobId, totalSteps }) {
  const [status, setStatus] = useState("starting");
  const [steps, setSteps] = useState([]);

  useEffect(() => {
    const events = new EventSource(`/jobs/${jobId}/stream`);
    events.onmessage = (e) => {
      const data = JSON.parse(e.data);
      setStatus(data.status);
      setSteps(data.steps);
    };
    return () => events.close();
  }, [jobId]);

  return (
    <div>
      <StatusBadge status={status} />
      <StepsList steps={steps} />
      <ProgressBar progress={steps.length / totalSteps} />
    </div>
  );
}

Error States

Design for failures:

State | Message | Actions
Timeout | "Taking longer than expected" | Wait, Cancel
Tool failure | "Couldn't access X" | Retry, Skip
Uncertain | "Found conflicting info" | Show both
Can't help | "Outside my capabilities" | Contact human
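
Keeping these states in one place lets the API and the UI stay in sync; a small sketch of that mapping (names and shape are illustrative).

Python
# Map internal failure modes to the user-facing copy and actions in the table above.
ERROR_STATES = {
    "timeout":      {"message": "Taking longer than expected", "actions": ["wait", "cancel"]},
    "tool_failure": {"message": "Couldn't access {tool}",      "actions": ["retry", "skip"]},
    "uncertain":    {"message": "Found conflicting info",      "actions": ["show_both"]},
    "cannot_help":  {"message": "Outside my capabilities",     "actions": ["contact_human"]},
}

def error_payload(state: str, **details) -> dict:
    spec = ERROR_STATES[state]
    return {
        "state": state,
        "message": spec["message"].format(**details),   # e.g. error_payload("tool_failure", tool="GitHub")
        "actions": spec["actions"],
    }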

Trust Building

Help users trust agent outputs:

  1. Show sources: Link to where information came from
  2. Confidence indicators: Signal when uncertain
  3. Edit suggestions: Let users modify outputs
  4. Explain reasoning: Show why decisions were made
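
In practice this means returning sources, a confidence signal, and a short rationale alongside the answer instead of bare text. A sketch of one possible response shape; the field names and the 0.6 threshold are assumptions.

Python
from dataclasses import dataclass, field

@dataclass
class AgentAnswer:
    text: str                                          # the answer shown to the user
    sources: list[str] = field(default_factory=list)   # URLs or document IDs the answer drew on
    confidence: float = 1.0                            # 0-1 signal surfaced in the UI
    reasoning: str = ""                                # short explanation of key decisions

def render(answer: AgentAnswer) -> dict:
    payload = {"text": answer.text, "sources": answer.sources, "reasoning": answer.reasoning}
    if answer.confidence < 0.6:                        # assumed threshold; tune per product
        payload["notice"] = "I'm not fully certain about this; please verify."
    return payload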

Deployment

Docker Setup

dockerfile
FROM python:3.11-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"]
YAML
# docker-compose.yml
version: '3.8'
services:
  api:
    build: .
    ports:
      - "8000:8000"
    environment:
      - ANTHROPIC_API_KEY=${ANTHROPIC_API_KEY}
      - REDIS_URL=redis://redis:6379
    depends_on:
      - redis
  worker:
    build: .
    command: celery -A tasks worker --loglevel=info
    environment:
      - ANTHROPIC_API_KEY=${ANTHROPIC_API_KEY}
      - REDIS_URL=redis://redis:6379
  redis:
    image: redis:7-alpine

Serverless

For variable workloads:

Python
# AWS Lambda handler
import json

def handler(event, context):
    body = json.loads(event['body'])
    result = agent.run(body['query'])
    return {
        'statusCode': 200,
        'body': json.dumps({'result': result})
    }

Kubernetes

For scale:

YAML
apiVersion: apps/v1
kind: Deployment
metadata:
  name: agent-service
spec:
  replicas: 3
  selector:
    matchLabels:
      app: agent-service
  template:
    metadata:
      labels:
        app: agent-service
    spec:
      containers:
        - name: agent
          image: your-registry/agent:latest
          resources:
            requests:
              memory: "512Mi"
              cpu: "500m"
            limits:
              memory: "1Gi"
              cpu: "1000m"
          env:
            - name: ANTHROPIC_API_KEY
              valueFrom:
                secretKeyRef:
                  name: api-keys
                  key: anthropic

Monitoring

Key Metrics

Metric | Description | Alert If
Completion rate | % of tasks completed | < 95%
Latency (p95) | Time to response | > 30s
Tool failure rate | % of tool calls failed | > 5%
Token usage | Tokens per task | > budget
User satisfaction | Ratings | < 4/5
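
To produce these metrics (and feed the alert rules below), instrument the agent loop. A sketch with prometheus_client; metric names are chosen to line up with the alert expressions, everything else is an assumption.

Python
from prometheus_client import Counter, Histogram

AGENT_DURATION = Histogram("agent_duration_seconds", "End-to-end agent latency")
TASKS = Counter("agent_tasks", "Agent tasks by outcome", ["status"])
TOOL_CALLS = Counter("tool_calls", "Tool invocations")
TOOL_FAILURES = Counter("tool_failures", "Failed tool invocations")

def run_instrumented(task: str):
    with AGENT_DURATION.time():                  # feeds the p95 latency alert below
        try:
            result = agent.run(task)             # `agent` as elsewhere in this guide
            TASKS.labels(status="completed").inc()
            return result
        except Exception:
            TASKS.labels(status="failed").inc()
            raise

def record_tool_call(success: bool) -> None:
    # Call this from your tool-execution wrapper to drive the failure-rate alert.
    TOOL_CALLS.inc()
    if not success:
        TOOL_FAILURES.inc()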

Structured Logging

Python
import time
from uuid import uuid4

import structlog

logger = structlog.get_logger()

def run_with_logging(user_id: str, task: str):
    log = logger.bind(user_id=user_id, task_id=str(uuid4()))
    log.info("agent_start", task=task)
    try:
        start = time.time()
        result = agent.run(task)
        duration = time.time() - start
        log.info("agent_done",
                 duration=duration,
                 tokens=result.tokens,
                 tools=result.tools)
        return result
    except Exception as e:
        log.error("agent_failed", error=str(e))
        raise

Alerting

YAML
# Prometheus alert rules
groups:
  - name: agent-alerts
    rules:
      - alert: HighLatency
        expr: histogram_quantile(0.95, rate(agent_duration_seconds_bucket[5m])) > 30
        for: 5m
        labels:
          severity: warning
      - alert: HighToolFailures
        expr: rate(tool_failures_total[5m]) / rate(tool_calls_total[5m]) > 0.05
        for: 5m
        labels:
          severity: critical

Cost Management

Token Budgeting

Python
class BudgetExceededError(Exception):
    """Raised when an agent exceeds its token budget."""

class BudgetedAgent:
    def __init__(self, max_tokens: int = 100000):
        self.max_tokens = max_tokens
        self.tokens_used = 0

    def run(self, task: str):
        if self.tokens_used >= self.max_tokens:
            raise BudgetExceededError()
        response = self._call_api(task)
        self.tokens_used += response.usage.total_tokens
        return response

Caching

Python
import hashlib

class CachedAgent:
    def __init__(self):
        self.cache = {}

    def run(self, task: str) -> str:
        key = hashlib.md5(task.encode()).hexdigest()
        if key in self.cache:
            return self.cache[key]
        result = self._run(task)
        self.cache[key] = result
        return result

Usage Tiers

Python
TIERS = {
    "free": {"tokens_day": 10000, "tasks_day": 10},
    "pro": {"tokens_day": 100000, "tasks_day": 100},
    "enterprise": {"tokens_day": None, "tasks_day": None},
}

def check_usage(user_id: str, tier: str) -> bool:
    usage = get_daily_usage(user_id)
    limits = TIERS[tier]
    if limits["tokens_day"] and usage.tokens > limits["tokens_day"]:
        return False
    if limits["tasks_day"] and usage.tasks > limits["tasks_day"]:
        return False
    return True

Safety and Guardrails

Input Validation

Python
from pydantic import BaseModel, validator

class AgentRequest(BaseModel):
    task: str
    max_tokens: int = 4096

    @validator('task')
    def validate_task(cls, v):
        if len(v) > 10000:
            raise ValueError("Task too long")
        if contains_pii(v):
            raise ValueError("Remove personal information")
        return v

Output Filtering

Python
def filter_output(response: str) -> str:
    response = remove_system_paths(response)
    response = redact_secrets(response)
    if contains_harmful_content(response):
        return "I cannot provide that information."
    return response

Action Approval

Python
SENSITIVE_ACTIONS = ["delete_file", "send_email", "make_payment"]

def execute_with_approval(tool: str, params: dict, user_id: str) -> str:
    if tool in SENSITIVE_ACTIONS:
        request_id = create_approval_request(user_id, tool, params)
        approved = wait_for_approval(request_id, timeout=300)
        if not approved:
            return "Action not approved."
    return execute_tool(tool, params)

Rate Limiting

Python
from slowapi import Limiter
from slowapi.util import get_remote_address

limiter = Limiter(key_func=get_remote_address)

@app.post("/agent")
@limiter.limit("10/minute")
async def run_agent(request: Request):
    ...  # agent call goes here

Monetization Models

Usage-Based

Bash
Pay per task:
- Free: 10 tasks/month
- Basic: $0.10/task (up to 1000)
- Pro: $0.05/task (unlimited)
Pay per token:
- $0.01 per 1000 tokens
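
A sketch of how the per-task pricing above turns into a monthly charge; the prices come from the list, the function itself is illustrative.

Python
PRICE_PER_TASK = {"basic": 0.10, "pro": 0.05}   # from the pricing list above

def monthly_bill(plan: str, tasks_used: int) -> float:
    if plan == "free":
        # Free usage (10 tasks/month) is capped rather than billed; enforce at request time.
        return 0.0
    if plan == "basic" and tasks_used > 1000:
        raise ValueError("Basic is capped at 1000 tasks/month; upgrade to Pro")
    return round(tasks_used * PRICE_PER_TASK[plan], 2)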

Subscription Tiers

Bash
Free Tier:
- 100 tasks/month
- Basic tools
- Community support
Pro ($29/month):
- 1000 tasks/month
- All tools
- Priority support
Enterprise (Custom):
- Unlimited
- Custom tools
- SLA guarantee

Value-Based

Price by outcome:

  • Research reports: $5-50/report
  • Code reviews: $1/PR
  • Support tickets: $0.50/resolution

Launch Checklist

Technical

  • [ ] Load testing completed
  • [ ] Error handling covers edge cases
  • [ ] Monitoring and alerts configured
  • [ ] Backup and recovery tested
  • [ ] Security audit completed
  • [ ] API documentation published

Product

  • [ ] Onboarding flow tested
  • [ ] Help documentation written
  • [ ] Feedback mechanism in place
  • [ ] Pricing page live
  • [ ] Terms of service updated

Operations

  • [ ] Support channels ready
  • [ ] Escalation procedures defined
  • [ ] Cost monitoring active
  • [ ] Analytics tracking verified

Case Study: Content Agent Product

Let's walk through building a complete product.

Product: ContentBot

AI content creation for marketers.

Features:

  • Blog post generation
  • Social media content
  • Email campaigns
  • SEO optimization
  • Brand voice matching

MVP Timeline

Week | Focus
1 | Core agent + basic API
2 | User system + tracking
3 | Templates + formats
4 | Deploy + monitor

Architecture

Python
# content_agent.py
from anthropic import Anthropic

class ContentAgent:
    def __init__(self, brand_voice: str = None):
        self.client = Anthropic()
        self.brand_voice = brand_voice

    async def generate_blog(self, topic: str, keywords: list) -> str:
        # Research topic
        research = await self._research(topic)
        # Generate outline
        outline = await self._create_outline(topic, research)
        # Write content
        content = await self._write_content(outline, keywords)
        # Apply brand voice
        if self.brand_voice:
            content = await self._apply_voice(content)
        return content

Next Steps

  • Review Building Agents for implementation
  • Study successful products in your domain
  • Join AI product communities for feedback

Practice Projects

Project | Goal
Ship an MVP | Pick an idea, deploy in 4 weeks
Add monetization | Implement usage-based pricing
Scale to 100 users | Handle concurrent load
Reduce costs 50% | Optimize token usage

