Building AI Agents: Designing the user experience

Jun 23, 2025

To design an agent, you (the developer) have three levers to consider:

Model: How can you shape the model’s personality and its ability to reason and act?

Tools: System instructions, Reinforcement learning
Key metric: Consistency (e.g., does the model clarify assumptions? Does it select the right set of tools?)
Type of evals: Human and LLM Graders (since the metric is subjective)
Key problems to solve:
- Which model should I select?
- How do I curate datasets with golden user prompts and reference answers for constructing baseline evals?
- How do I tune system instructions to guide the model’s reasoning? How do I analyze and trace that reasoning?
- Can I improve model consistency by modifying its weights? Does my use-case rely on datasets that general-purpose models are not trained on?
- How do I collect user traces or generate synthetic data to apply reinforcement learning to tune the model weights?
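
As a rough illustration of the consistency metric above, the sketch below reruns one golden prompt several times and reports how often the agent selects exactly the expected set of tools. Everything here is an assumption made for illustration: `run_agent` stands in for whatever call returns the tools your agent chose for a prompt, and `fake_agent`, the prompt, and the tool names are made up.

```python
from itertools import cycle
from typing import Callable, Iterable


def tool_selection_consistency(
    run_agent: Callable[[str], Iterable[str]],
    prompt: str,
    expected_tools: Iterable[str],
    n_runs: int = 5,
) -> float:
    """Return the fraction of runs that selected exactly the expected tools."""
    if n_runs <= 0:
        raise ValueError("n_runs must be positive")
    expected = frozenset(expected_tools)
    matches = sum(
        1 for _ in range(n_runs) if frozenset(run_agent(prompt)) == expected
    )
    return matches / n_runs


# Hypothetical stand-in for a real agent: it drops a tool on every third run.
_fake_outputs = cycle([["search", "read_file"], ["search", "read_file"], ["search"]])


def fake_agent(prompt: str) -> list[str]:
    """Return the next canned tool selection, ignoring the prompt."""
    return next(_fake_outputs)


score = tool_selection_consistency(
    fake_agent, "Summarize the open PR", ["search", "read_file"], n_runs=6
)
print(f"tool-selection consistency: {score:.2f}")
```

An exact-match check like this only covers the objective part of consistency. Subjective behaviors, such as whether the model clarified assumptions, would still need human or LLM graders.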

Developer: What agent architecture and set of tools will best process the user’s intent and enable the model to complete actions?

Tools: Agent architecture and tool design, including indexed search, MCP, native tools, and memory
Key metric: Accuracy
Type of evals: Objective scores (precision@K, recall@K, passed unit tests) and LLM graders (e.g., is the PR summary accurate?)
Key problems to solve:
- Which agent architecture is best suited to the use-case? How should I break down the workflow and define sub-agents, each with its own set of tools and system instructions, so that the most relevant information is inserted at each point?
- How do I iterate on the available tools and their definitions to improve accuracy?
- How do I define metrics and scoring rubrics or objective functions to automate evals?
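
The objective scores mentioned above, precision@K and recall@K, can be computed with a few lines of code. This is a minimal sketch using the standard definitions. The function names and document IDs are made up, and it assumes the retrieved list contains no duplicate IDs.

```python
from typing import Sequence, Set


def precision_at_k(retrieved: Sequence[str], relevant: Set[str], k: int) -> float:
    """Share of the top-k retrieved items that are relevant.

    Divides by k even if fewer than k items were retrieved (a common convention).
    """
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for item in retrieved[:k] if item in relevant)
    return hits / k


def recall_at_k(retrieved: Sequence[str], relevant: Set[str], k: int) -> float:
    """Share of all relevant items that appear in the top-k retrieved items."""
    if k <= 0:
        raise ValueError("k must be positive")
    if not relevant:
        return 0.0  # Nothing to find; treating this case as 0.0 is a choice.
    hits = len(set(retrieved[:k]) & relevant)
    return hits / len(relevant)


# Made-up example: ranked search results versus a hand-labeled relevant set.
retrieved_docs = ["doc_a", "doc_b", "doc_c", "doc_d"]
relevant_docs = {"doc_a", "doc_c", "doc_e"}

print(f"precision@3: {precision_at_k(retrieved_docs, relevant_docs, 3):.2f}")
print(f"recall@3:    {recall_at_k(retrieved_docs, relevant_docs, 3):.2f}")
```

With this sample data, two of the top three results are relevant and two of the three relevant documents are found, so both scores come to 2/3.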

User experience: What modalities are available to the user? Is the end-user experience responsive or proactive?

Tools: Inputs via speaking (voice), writing (inline, chat, keywords, prompt or rule files), selecting or loading data (files, MCP tool, image upload)
Key metric: Simplicity
Key problems to solve:
- Which inputs should the user be able to control?
- Can I deliver a proactive agent with guardrails given my use-case? For example, a code-review agent that is triggered automatically on each PR is a proactive agent.
- How do I solve an MVP use-case completely while keeping the number of input modalities and required user actions to a minimum?
- Are the user controls intuitive and visible, or hidden under hard-to-find settings?
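
One way to picture a proactive agent with guardrails is as a small gate that runs before the agent is triggered. The sketch below is purely illustrative: `PullRequestEvent`, `Guardrails`, `should_auto_review`, and the specific rules (skip drafts, cap the number of changed files, honor an opt-out) are assumptions, not part of any real code-review product or API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PullRequestEvent:
    """Hypothetical, simplified view of a PR event."""
    is_draft: bool
    changed_files: int
    author_opted_out: bool


@dataclass(frozen=True)
class Guardrails:
    """Hypothetical limits that keep the proactive agent from overreaching."""
    max_changed_files: int = 50
    skip_drafts: bool = True


def should_auto_review(event: PullRequestEvent, guardrails: Guardrails) -> tuple[bool, str]:
    """Decide whether to trigger the review agent, and give the reason."""
    if event.author_opted_out:
        return False, "author opted out"
    if guardrails.skip_drafts and event.is_draft:
        return False, "draft PR"
    if event.changed_files > guardrails.max_changed_files:
        return False, "too many changed files"
    return True, "ok"


decision, reason = should_auto_review(
    PullRequestEvent(is_draft=False, changed_files=12, author_opted_out=False),
    Guardrails(),
)
print(decision, reason)
```

Returning a reason alongside the decision makes it possible to show users why the agent did or did not act, which helps keep the controls visible.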

Does this summary match your own experience or learnings while designing AI agents?