Building AI Agents: Designing the user experience
To design an agent, you (the developer) have three levers to consider:
What should the model control?
What should the developer (you) control?
What should the end-user control?
Model: How can you shape the model’s personality, reasoning and acting capability?
Tools: System instructions, Reinforcement learning
Key metric: Consistency ( e.g., clarifies assumptions?, selects the right set of tools?)
Type of evals: Human and LLM Graders (since the metric is subjective)
Key problems to solve:
Which model should I select?
How do I curate datasets with golden user prompts and reference answers for constructing baseline evals?
How do I tune system instructions to guide the model reasoning ? How do I analyze and trace model reasoning?
Can I improve model consistency by modifying its weights? Does my use-case rely on datasets that the general purpose models are not trained on?
How do I collect user traces or generate synthetic data to apply reinforcement learning to tune the model weights?
Developer: What agent architecture and set of tools will best process the user intent, and enable the model to complete actions?
Tools: Agent architecture and tools design including indexed search, MCP, native tools, memory
Key metric: Accuracy
Type of evals: objective scores (precision@K, recall@K, passed unit test) and LLM graders (e.g., is the PR summary accurate?)
Key problems to solve:
Which agent architecture is best suited for the use-case? How should I break down the workflow and define sub-agents with its own set of tools and system instructions to insert most relevant information at each point?
How do I iterate on available tools and tools definition to improve accuracy?
How do I define metrics and scoring rubrics or objective functions to automate evals?
User experience: What modalities are available to the user? Is the end-user experience responsive or proactive?
Tools: Inputs via speaking (voice), writing (inline, chat, keywords, prompt or rule files), selecting or loading data (files, MCP tool, image upload)
Key metric: Simplicity
Key problems to solve:
Which inputs should the user be able control?
Can I deliver a proactive agent with guardrails given my use-case? For example: a code reviewer agent that automatically gets triggered upon each PR is a proactive agent.
How to solve a MVP use-case completely while keeping the number of input modalities, and the number of user actions required minimal?
Are the user controls intuitive and visible, or hidden under hard to find settings?
Does this summary match your own experience or learnings while designing AI agents?
