Using Agents Effectively
Master prompting, debugging, and output handling to get reliable results from AI agents
Building an agent is one thing. Getting reliable, high-quality results from it is another. This guide covers the techniques that separate frustrating agents from useful ones.
Prompting for Agents
Agent prompts need more structure than chatbot prompts. They define personality, capabilities, and constraints.
System Prompts
Tell the agent who it is and what it can do:
```python
SYSTEM_PROMPT = """You are a research assistant.

CAPABILITIES:
- Search the web for information
- Read and analyze documents
- Take notes and organize findings
- Generate comprehensive reports

GUIDELINES:
- Always cite your sources
- Verify facts from multiple sources
- Acknowledge uncertainty
- Ask for clarification when needed

LIMITATIONS:
- Cannot access private or paywalled content
- Cannot make changes to external systems
- Must respect rate limits on searches"""
```

Task Prompts
Be specific about what you want:
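For example, one way a specific task prompt might look; the `TASK_PROMPT` name and the topic here are purely illustrative:

```python
TASK_PROMPT = """Write a competitive overview of the top 3 project-management tools.

SCOPE:
- Pricing, core features, and target audience for each tool
- Prefer sources published within the last 12 months

OUTPUT:
- A comparison table followed by a one-paragraph recommendation
- A citation for every factual claim

DO NOT:
- Cover tools outside project management
- Speculate where sources are missing"""
```

Notice that it pins down scope, output format, and exclusions rather than just naming a topic.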
Few-Shot Examples
Show, don't just tell:
```python
PROMPT = """Analyze customer feedback and extract themes.

EXAMPLE INPUT:
"The app is fast but crashes when I try to export. Also wish it had dark mode."

EXAMPLE OUTPUT:
{
  "sentiment": "mixed",
  "themes": [
    {"topic": "performance", "sentiment": "positive", "detail": "app speed"},
    {"topic": "stability", "sentiment": "negative", "detail": "export crashes"},
    {"topic": "features", "sentiment": "neutral", "detail": "dark mode request"}
  ]
}

NOW ANALYZE:
"{user_input}"
"""
```

Tool Selection Strategies
Agents pick tools based on descriptions. Write them carefully.
Write Clear Descriptions
```python
# Good: specific about when to use
{
    "name": "search_knowledge_base",
    "description": """Search company knowledge base.

    USE FOR:
    - Company policies
    - Product documentation
    - How-to questions

    DO NOT USE FOR:
    - General knowledge
    - External information
    - Real-time data

    Returns: List of relevant articles with titles and snippets"""
}

# Bad: vague
{
    "name": "search",
    "description": "Search for information"
}
```

Guide Tool Choice
When you have multiple similar tools, help the agent decide:
```python
TOOL_GUIDE = """Available tools and when to use them:

1. **web_search**: Current events, general knowledge, external info
2. **knowledge_base**: Company-specific info, policies, products
3. **database_query**: User data, analytics, specific records
4. **calculator**: Any mathematical computation

Decision tree:
- About our company? → knowledge_base
- About user data? → database_query
- Needs math? → calculator
- Otherwise → web_search"""
```

Design Composable Tools
Tools that work together:
```python
RESEARCH_TOOLS = [
    {"name": "search_sources", "description": "Find relevant sources"},
    {"name": "read_source", "description": "Extract info from a source"},
    {"name": "take_note", "description": "Save a note with attribution"},
    {"name": "get_notes", "description": "Retrieve all notes"},
    {"name": "write_report", "description": "Generate report from notes"},
]
```

The agent can naturally flow: search → read → note → repeat → report.
Output Handling
Request Structured Outputs
```python
STRUCTURED_PROMPT = """Return your analysis in this exact JSON format:

{
  "summary": "One paragraph overview",
  "key_findings": [
    {"finding": "string", "confidence": "high|medium|low", "evidence": "string"}
  ],
  "recommendations": ["string"],
  "limitations": ["string"]
}

Return ONLY the JSON, no additional text."""
```

Validate Outputs
```python
from pydantic import BaseModel, validator
from typing import List, Literal
import json

class Finding(BaseModel):
    finding: str
    confidence: Literal["high", "medium", "low"]
    evidence: str

class Analysis(BaseModel):
    summary: str
    key_findings: List[Finding]
    recommendations: List[str]
    limitations: List[str]

    @validator('key_findings')
    def needs_findings(cls, v):
        if len(v) < 1:
            raise ValueError("Must have at least one finding")
        return v

def parse_output(response: str) -> Analysis:
    try:
        data = json.loads(response)
        return Analysis(**data)
    except json.JSONDecodeError:
        raise ValueError("Invalid JSON")
    except Exception as e:
        raise ValueError(f"Invalid format: {e}")
```

Handle Streaming
```python
async def stream_response(task: str):
    async with client.messages.stream(
        model=MODEL,
        max_tokens=4096,
        messages=[{"role": "user", "content": task}],
        tools=TOOLS,
    ) as stream:
        async for event in stream:
            if event.type == "content_block_delta":
                if event.delta.type == "text_delta":
                    yield {"type": "text", "content": event.delta.text}
            elif event.type == "content_block_start":
                if event.content_block.type == "tool_use":
                    yield {"type": "tool_start", "tool": event.content_block.name}
```
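The caller can then consume events as they arrive; a minimal sketch, assuming `stream_response` as defined above:

```python
import asyncio

async def main():
    # Print text as it streams and announce each tool call.
    async for event in stream_response("Summarize this week's release notes"):
        if event["type"] == "text":
            print(event["content"], end="", flush=True)
        elif event["type"] == "tool_start":
            print(f"\n[calling {event['tool']}...]")

asyncio.run(main())
```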
Debugging Agents

Agents misbehave. Here's how to fix common problems.
Problem: Agent Not Using Tools
Symptom: Agent makes up information instead of searching.
Fixes:
```python
# 1. Stronger system prompt
SYSTEM = """You MUST use tools to gather information.
Do NOT rely on training data for facts."""

# 2. Explicit task instructions
TASK = """Research {topic}.

IMPORTANT: Use web_search for EVERY fact you include.
Do not include any information that doesn't come from a tool."""

# 3. Require citations
"""After each fact, cite the tool call that provided it."""
```

Problem: Infinite Loops
Symptom: Same tool called repeatedly with same inputs.
Fix:
```python
seen_calls = set()

for call in tool_calls:
    key = (call.name, json.dumps(call.input, sort_keys=True))
    if key in seen_calls:
        messages.append({
            "role": "user",
            "content": "You already tried this. Try something different or provide your answer."
        })
    else:
        seen_calls.add(key)
        # Execute tool...

# Also: hard limit on iterations
if iteration_count >= MAX_ITERATIONS:
    break
```

Problem: Hallucinated Tool Results
Symptom: Agent references results that don't exist.
Fix:
```python
# Use explicit markers
RESULT_FORMAT = """[TOOL RESULT: {tool_name}]
{result}
[END TOOL RESULT]

Only use information within TOOL RESULT markers."""

# Validate that citations reference actual results
def validate_citations(response: str, tool_results: list) -> bool:
    # Check all cited facts appear in tool_results
    pass
```

Debugging Tools
Logging:
```python
import structlog
from uuid import uuid4

logger = structlog.get_logger()

def debug_run(task: str):
    log = logger.bind(task_id=str(uuid4()))
    log.debug("agent_start", task=task)

    for i, step in enumerate(steps):
        log.debug("step", num=i, tool=step.tool, input=step.input)
        result = execute(step)
        log.debug("result", num=i, result=result[:200])

    log.debug("agent_done", total_steps=len(steps))
```

Visual trace:
```python
def print_trace(trace: list):
    for i, step in enumerate(trace):
        print(f"""Step {i + 1}:
  Thinking: {step.thinking[:100]}...
  Tool: {step.tool}
  Input: {json.dumps(step.input, indent=2)}
  Result: {step.result[:100]}...
  Duration: {step.duration:.2f}s""")
```

Advanced Prompting Patterns
Chain of Thought
Make the agent show its work:
```python
COT_PROMPT = """Solve this step by step:

{problem}

Process:
1. Understand what's being asked
2. Identify what information you need
3. Gather information using tools
4. Analyze the information
5. Form your conclusion
6. Verify your answer

Show your reasoning at each step."""
```

Self-Critique
Have the agent check its own work:
```python
SELF_CRITIQUE = """After completing the task, critique your work:

1. Did you fully address the question?
2. Are your sources reliable and cited?
3. Are there gaps in your analysis?
4. What could be improved?

Based on your critique, revise if needed."""
```

Reflection on Errors
Help the agent learn from mistakes:
```python
REFLECTION = """The previous attempt had issues:
{error_description}

Before trying again:
1. What went wrong?
2. Why did it happen?
3. How will you avoid it this time?

Now try again with these learnings."""
```

Advanced Patterns
Iterative Refinement
```python
def iterative_refine(task: str, max_rounds: int = 3) -> str:
    output = agent.run(task)

    for _ in range(max_rounds):
        critique_prompt = f"""
        Task: {task}
        Attempt: {output}

        Critique this and provide an improved version.
        """
        response = agent.run(critique_prompt)

        if "no improvements needed" in response.lower():
            break

        output = extract_improved_version(response)

    return output
```

Parallel Exploration
```python
import asyncio

async def parallel_explore(task: str, approaches: int = 3) -> str:
    # Generate different approaches
    approach_prompt = f"Generate {approaches} different ways to solve: {task}"
    approach_text = await agent.run(approach_prompt)

    # Execute in parallel
    results = await asyncio.gather(*[
        agent.run(f"Execute this approach: {a}")
        for a in parse_approaches(approach_text)
    ])

    # Select best
    selection_prompt = f"""
    Task: {task}
    Results from {len(results)} approaches:
    {format_results(results)}

    Select the best and explain why.
    """
    return await agent.run(selection_prompt)
```

Human-in-the-Loop
```python
SENSITIVE_ACTIONS = ["delete_file", "send_email", "make_payment"]

def execute_with_approval(tool: str, params: dict) -> str:
    if tool in SENSITIVE_ACTIONS:
        print(f"Agent wants to: {tool}")
        print(f"Parameters: {params}")
        approved = input("Approve? (y/n): ")
        if approved.lower() != 'y':
            return "Action not approved by user."
    return execute_tool(tool, params)
```

Best Practices Checklist
Before Running
- [ ] Clear, specific task description
- [ ] Appropriate tools available
- [ ] System prompt defines constraints
- [ ] Output format specified
- [ ] Error handling in place
During Execution
- [ ] Monitor for loops
- [ ] Track token usage
- [ ] Log all tool calls
- [ ] Handle timeouts
- [ ] Validate tool results
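A minimal run-monitor sketch that covers several of these checks at once; the `RunMonitor` class and the limit constants are illustrative, not part of any particular framework:

```python
import time

MAX_STEPS = 20             # hard cap on agent-loop iterations
MAX_SECONDS = 120          # wall-clock timeout for the whole run
MAX_TOKEN_BUDGET = 50_000  # rough token spend ceiling per run

class RunMonitor:
    """Tracks steps, token usage, and elapsed time for one agent run."""

    def __init__(self):
        self.start = time.monotonic()
        self.steps = 0
        self.tokens = 0

    def record(self, tool: str, tool_input: dict, usage_tokens: int) -> None:
        # Log every tool call and accumulate token usage.
        self.steps += 1
        self.tokens += usage_tokens
        print(f"step={self.steps} tool={tool} input={tool_input} tokens={self.tokens}")

    def should_stop(self) -> str | None:
        # Return a reason to stop, or None to keep going.
        if self.steps >= MAX_STEPS:
            return "step limit reached"
        if self.tokens >= MAX_TOKEN_BUDGET:
            return "token budget exhausted"
        if time.monotonic() - self.start >= MAX_SECONDS:
            return "timeout"
        return None
```

Call `record()` after each tool execution and check `should_stop()` at the top of every loop iteration.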
After Completion
- [ ] Validate output format
- [ ] Check source citations
- [ ] Review for hallucinations
- [ ] Measure quality metrics
- [ ] Log for analysis
Common Recipes
Research Question
```python
def research(question: str) -> str:
    return agent.run(f"""
    Research: {question}

    Process:
    1. Search for 3-5 authoritative sources
    2. Read and extract key information
    3. Identify agreement and disagreement
    4. Synthesize into comprehensive answer

    Requirements:
    - Cite every factual claim
    - Note confidence levels
    - Include diverse perspectives
    - Acknowledge limitations

    Format: Summary → Detailed findings → Sources
    """)
```

Code Generation
```python
def generate_code(spec: str) -> str:
    return agent.run(f"""
    Generate code for: {spec}

    Process:
    1. Clarify requirements
    2. Design approach
    3. Write code with comments
    4. Add error handling
    5. Write example usage

    Output:
    - Full implementation
    - Approach explanation
    - Example usage
    - Dependencies/limitations
    """)
```

Data Analysis
```python
def analyze_data(question: str, context: str) -> str:
    return agent.run(f"""
    Analyze data to answer: {question}

    Context: {context}

    Process:
    1. Understand question and available data
    2. Write and execute analysis
    3. Interpret results
    4. Generate visualizations if helpful
    5. Form conclusions

    Output:
    - Direct answer
    - Supporting analysis
    - Visualizations
    - Caveats
    """)
```

Next Steps
- Building Agents — Create custom agents
- Agent Products — Ship production systems
Practice
- Optimize prompts on an existing agent
- Add logging and analyze behavior
- Implement output validation
- Build a self-critiquing agent
Success
Master the craft! Good agent use is about clear communication, proper tooling, and continuous refinement. The more you work with agents, the better you'll get at extracting reliable results.