
Using Agents Effectively

Master prompting, debugging, and getting reliable results from AI agents

Updated January 15, 2026


Building an agent is one thing. Getting reliable, high-quality results from it is another. This guide covers the techniques that separate frustrating agents from useful ones.

Figure: The Reliability Stack. Each layer improves agent performance.

Prompting for Agents

Agent prompts need more structure than chatbot prompts. They define personality, capabilities, and constraints.

System Prompts

Tell the agent who it is and what it can do:

Python
SYSTEM_PROMPT = """You are a research assistant.
CAPABILITIES:
- Search the web for information
- Read and analyze documents
- Take notes and organize findings
- Generate comprehensive reports
GUIDELINES:
- Always cite your sources
- Verify facts from multiple sources
- Acknowledge uncertainty
- Ask for clarification when needed
LIMITATIONS:
- Cannot access private or paywalled content
- Cannot make changes to external systems
- Must respect rate limits on searches
"""

Task Prompts

Be specific about what you want:
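A vague request leaves the agent guessing about scope, depth, and format. As a rough illustration (the topic and requirements below are made up), compare a vague prompt with a specific one:

Python
# Illustrative only: the same request, phrased vaguely and then specifically.
VAGUE_TASK = "Research electric vehicles."

SPECIFIC_TASK = """
Research the current state of electric vehicle battery technology.
SCOPE:
- Developments from the last two years
- At least three manufacturers
DELIVERABLE:
- One-page summary with cited sources
- Short list of open questions
"""

The second prompt tells the agent what counts as done, which keeps it from wandering.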

Few-Shot Examples

Show, don't just tell:

Python
PROMPT = """
Analyze customer feedback and extract themes.
EXAMPLE INPUT:
"The app is fast but crashes when I try to export. Also wish it had dark mode."
EXAMPLE OUTPUT:
{
"sentiment": "mixed",
"themes": [
{"topic": "performance", "sentiment": "positive", "detail": "app speed"},
{"topic": "stability", "sentiment": "negative", "detail": "export crashes"},
{"topic": "features", "sentiment": "neutral", "detail": "dark mode request"}
]
}
NOW ANALYZE:
"{user_input}"
"""

Tool Selection Strategies

Agents pick tools based on descriptions. Write them carefully.

Write Clear Descriptions

Python
# Good: specific about when to use
{
"name": "search_knowledge_base",
"description": """Search company knowledge base.
USE FOR:
- Company policies
- Product documentation
- How-to questions
DO NOT USE FOR:
- General knowledge
- External information
- Real-time data
Returns: List of relevant articles with titles and snippets"""
}
# Bad: vague
{
"name": "search",
"description": "Search for information"
}

Guide Tool Choice

When you have multiple similar tools, help the agent decide:

Python
TOOL_GUIDE = """
Available tools and when to use them:
1. **web_search**: Current events, general knowledge, external info
2. **knowledge_base**: Company-specific info, policies, products
3. **database_query**: User data, analytics, specific records
4. **calculator**: Any mathematical computation
Decision tree:
- About our company? → knowledge_base
- About user data? → database_query
- Needs math? → calculator
- Otherwise → web_search
"""

Design Composable Tools

Tools that work together:

Python
RESEARCH_TOOLS = [
{"name": "search_sources", "description": "Find relevant sources"},
{"name": "read_source", "description": "Extract info from a source"},
{"name": "take_note", "description": "Save a note with attribution"},
{"name": "get_notes", "description": "Retrieve all notes"},
{"name": "write_report", "description": "Generate report from notes"}
]

The agent can naturally flow: search → read → note → repeat → report.
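A rough sketch of that flow, with the tool calls wired by hand to show how the pieces compose; in practice the agent chooses this sequence itself, and the handler functions here are hypothetical stand-ins for the tools above:

Python
# search → read → note → repeat → report, spelled out explicitly.
# The functions are hypothetical handlers matching the tool names above.
def run_research(topic: str) -> str:
    for source in search_sources(topic):
        info = read_source(source)
        take_note(info, source=source)
    return write_report(get_notes())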


Output Handling

Request Structured Outputs

Python
STRUCTURED_PROMPT = """
Return your analysis in this exact JSON format:
{
"summary": "One paragraph overview",
"key_findings": [
{"finding": "string", "confidence": "high|medium|low", "evidence": "string"}
],
"recommendations": ["string"],
"limitations": ["string"]
}
Return ONLY the JSON, no additional text.
"""

Validate Outputs

Python
from pydantic import BaseModel, validator
from typing import List, Literal
import json

class Finding(BaseModel):
    finding: str
    confidence: Literal["high", "medium", "low"]
    evidence: str

class Analysis(BaseModel):
    summary: str
    key_findings: List[Finding]
    recommendations: List[str]
    limitations: List[str]

    @validator('key_findings')
    def needs_findings(cls, v):
        if len(v) < 1:
            raise ValueError("Must have at least one finding")
        return v

def parse_output(response: str) -> Analysis:
    try:
        data = json.loads(response)
        return Analysis(**data)
    except json.JSONDecodeError:
        raise ValueError("Invalid JSON")
    except Exception as e:
        raise ValueError(f"Invalid format: {e}")

Handle Streaming

Python
async def stream_response(task: str):
    async with client.messages.stream(
        model=MODEL,
        messages=[{"role": "user", "content": task}],
        tools=TOOLS
    ) as stream:
        async for event in stream:
            if event.type == "content_block_delta":
                if event.delta.type == "text_delta":
                    yield {"type": "text", "content": event.delta.text}
            elif event.type == "content_block_start":
                if event.content_block.type == "tool_use":
                    yield {"type": "tool_start", "tool": event.content_block.name}

Debugging Agents

Agents misbehave. Here's how to fix common problems.

Problem: Agent Not Using Tools

Symptom: Agent makes up information instead of searching.

Fixes:

Python
# 1. Stronger system prompt
SYSTEM = """You MUST use tools to gather information.
Do NOT rely on training data for facts."""
# 2. Explicit task instructions
TASK = """
Research {topic}.
IMPORTANT: Use web_search for EVERY fact you include.
Do not include any information that doesn't come from a tool.
"""
# 3. Require citations
"""After each fact, cite the tool call that provided it."""

Problem: Infinite Loops

Symptom: Same tool called repeatedly with same inputs.

Fix:

Python
seen_calls = set()

for call in tool_calls:
    key = (call.name, json.dumps(call.input, sort_keys=True))
    if key in seen_calls:
        messages.append({
            "role": "user",
            "content": "You already tried this. Try something different or provide your answer."
        })
    else:
        seen_calls.add(key)
        # Execute tool...

# Also: hard limit on iterations
if iteration_count >= MAX_ITERATIONS:
    break
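Putting both guards together, a minimal sketch of the loop they sit in; `call_model` and `execute_tool` are hypothetical placeholders for your model call and tool executor:

Python
import json

MAX_ITERATIONS = 10

def guarded_run(task: str) -> str:
    messages = [{"role": "user", "content": task}]
    seen_calls = set()
    for _ in range(MAX_ITERATIONS):
        response = call_model(messages)          # hypothetical model call
        if not response.tool_calls:
            return response.text                 # no more tool calls: agent is done
        for call in response.tool_calls:
            key = (call.name, json.dumps(call.input, sort_keys=True))
            if key in seen_calls:
                messages.append({
                    "role": "user",
                    "content": "You already tried this. Try something different or provide your answer."
                })
                continue
            seen_calls.add(key)
            result = execute_tool(call)          # hypothetical tool executor
            messages.append({"role": "user", "content": f"Tool result: {result}"})
    return "Stopped: iteration limit reached."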

Problem: Hallucinated Tool Results

Symptom: Agent references results that don't exist.

Fix:

Python
# Use explicit markers
RESULT_FORMAT = """
[TOOL RESULT: {tool_name}]
{result}
[END TOOL RESULT]
Only use information within TOOL RESULT markers.
"""
# Validate citations reference actual results
def validate_citations(response: str, tool_results: list) -> bool:
    # Check all cited facts appear in tool_results
    pass

Debugging Tools

Logging:

Python
import structlog
from uuid import uuid4

logger = structlog.get_logger()

def debug_run(task: str):
    log = logger.bind(task_id=str(uuid4()))
    log.debug("agent_start", task=task)
    for i, step in enumerate(steps):
        log.debug("step", num=i, tool=step.tool, input=step.input)
        result = execute(step)
        log.debug("result", num=i, result=result[:200])
    log.debug("agent_done", total_steps=len(steps))

Visual trace:

Python
import json

def print_trace(trace: list):
    for i, step in enumerate(trace):
        print(f"""
Step {i + 1}:
  Thinking: {step.thinking[:100]}...
  Tool: {step.tool}
  Input: {json.dumps(step.input, indent=2)}
  Result: {step.result[:100]}...
  Duration: {step.duration:.2f}s
""")

Advanced Prompting Patterns

Chain of Thought

Make the agent show its work:

Python
COT_PROMPT = """
Solve this step by step:
{problem}
Process:
1. Understand what's being asked
2. Identify what information you need
3. Gather information using tools
4. Analyze the information
5. Form your conclusion
6. Verify your answer
Show your reasoning at each step.
"""

Self-Critique

Have the agent check its own work:

Python
SELF_CRITIQUE = """
After completing the task, critique your work:
1. Did you fully address the question?
2. Are your sources reliable and cited?
3. Are there gaps in your analysis?
4. What could be improved?
Based on your critique, revise if needed.
"""

Reflection on Errors

Help the agent learn from mistakes:

Python
REFLECTION = """
The previous attempt had issues:
{error_description}
Before trying again:
1. What went wrong?
2. Why did it happen?
3. How will you avoid it this time?
Now try again with these learnings.
"""

Advanced Patterns

Iterative Refinement

Python
def iterative_refine(task: str, max_rounds: int = 3) -> str:
    output = agent.run(task)
    for _ in range(max_rounds):
        critique_prompt = f"""
Task: {task}
Attempt:
{output}
Critique this and provide an improved version.
"""
        response = agent.run(critique_prompt)
        if "no improvements needed" in response.lower():
            break
        output = extract_improved_version(response)
    return output

Parallel Exploration

Python
import asyncio

async def parallel_explore(task: str, num_approaches: int = 3) -> str:
    # Generate different approaches
    approach_prompt = f"Generate {num_approaches} different ways to solve: {task}"
    approach_list = await agent.run(approach_prompt)
    # Execute in parallel
    results = await asyncio.gather(*[
        agent.run(f"Execute this approach: {a}")
        for a in parse_approaches(approach_list)
    ])
    # Select best
    selection_prompt = f"""
Task: {task}
Results from {len(results)} approaches:
{format_results(results)}
Select the best and explain why.
"""
    return await agent.run(selection_prompt)

Human-in-the-Loop

Python
SENSITIVE_ACTIONS = ["delete_file", "send_email", "make_payment"]

def execute_with_approval(tool: str, params: dict) -> str:
    if tool in SENSITIVE_ACTIONS:
        print(f"Agent wants to: {tool}")
        print(f"Parameters: {params}")
        approved = input("Approve? (y/n): ")
        if approved.lower() != 'y':
            return "Action not approved by user."
    return execute_tool(tool, params)

Best Practices Checklist

Before Running

  • [ ] Clear, specific task description
  • [ ] Appropriate tools available
  • [ ] System prompt defines constraints
  • [ ] Output format specified
  • [ ] Error handling in place

During Execution

  • [ ] Monitor for loops
  • [ ] Track token usage
  • [ ] Log all tool calls
  • [ ] Handle timeouts
  • [ ] Validate tool results

After Completion

  • [ ] Validate output format
  • [ ] Check source citations
  • [ ] Review for hallucinations
  • [ ] Measure quality metrics
  • [ ] Log for analysis

Common Recipes

Research Question

Python
def research(question: str) -> str:
    return agent.run(f"""
Research: {question}
Process:
1. Search for 3-5 authoritative sources
2. Read and extract key information
3. Identify agreement and disagreement
4. Synthesize into comprehensive answer
Requirements:
- Cite every factual claim
- Note confidence levels
- Include diverse perspectives
- Acknowledge limitations
Format: Summary → Detailed findings → Sources
""")

Code Generation

Python
def generate_code(spec: str) -> str:
    return agent.run(f"""
Generate code for: {spec}
Process:
1. Clarify requirements
2. Design approach
3. Write code with comments
4. Add error handling
5. Write example usage
Output:
- Full implementation
- Approach explanation
- Example usage
- Dependencies/limitations
""")

Data Analysis

Python
def analyze_data(question: str, context: str) -> str:
    return agent.run(f"""
Analyze data to answer: {question}
Context: {context}
Process:
1. Understand question and available data
2. Write and execute analysis
3. Interpret results
4. Generate visualizations if helpful
5. Form conclusions
Output:
- Direct answer
- Supporting analysis
- Visualizations
- Caveats
""")

Next Steps

  1. Building Agents — Create custom agents
  2. Agent Products — Ship production systems

Practice

  • Optimize prompts on an existing agent
  • Add logging and analyze behavior
  • Implement output validation
  • Build a self-critiquing agent
