Assessing AI Agents

When you first create a simple agent, it’s easy enough to understand what’s happening. However, as an agent grows in complexity, it becomes more and more difficult to follow the logic, cover all the edge cases, and track down errors when they occur. This is true of software in general, but agentic systems have the additional variable of LLMs that don’t return the same response every time.

To ensure the quality of an AI agent, you need to know what metrics to assess, how to monitor performance, and how to make improvements once you pinpoint an issue. First, you’ll look at what to measure.

Developing Assessment Metrics

What do you look for when evaluating an AI agent’s quality? Some areas to consider are accuracy, user satisfaction, and efficiency. Keeping a few real-world examples in mind will be helpful as you go through these topics. Remember the localizer app you’ve been building throughout the module. Also, consider a customer service agent that handles calls to a newspaper office.

Accuracy

Accuracy refers to how often the AI agent completes a task successfully. This can be thought of in terms of the agent’s success or error rate.
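
As a sketch of what tracking accuracy could look like in practice, here's a small Python example that computes a success rate over an evaluation set. The EvalCase class, the exact-match check, and the stubbed agent are assumptions for illustration; real agent outputs usually need a more forgiving comparison than string equality.

```python
# Minimal sketch: accuracy as a success rate over an evaluation set.
# EvalCase and fake_agent are hypothetical placeholders, not a real API.
from dataclasses import dataclass
from typing import Callable

@dataclass
class EvalCase:
    prompt: str
    expected: str

def success_rate(cases: list[EvalCase], run_agent: Callable[[str], str]) -> float:
    """Run every case through the agent and return the fraction that match."""
    successes = sum(1 for case in cases if run_agent(case.prompt) == case.expected)
    return successes / len(cases)

# Example with a fake agent that always gives the same answer:
cases = [
    EvalCase("Translate 'Hello' to Spanish", "Hola"),
    EvalCase("Translate 'Goodbye' to Spanish", "Adiós"),
]

def fake_agent(prompt: str) -> str:
    return "Hola"  # always answers "Hola", right or wrong

rate = success_rate(cases, fake_agent)
print(f"Success rate: {rate:.0%}")  # 50%
```

The error rate is simply one minus the success rate.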

User Satisfaction

User satisfaction is closely related to accuracy. An agent might technically complete a task successfully, yet the user can still come away unsatisfied. For example, your application strings might all convey the correct meaning, but if the application feels “translated” rather than native, user satisfaction drops.
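
One simple way to quantify this is to collect explicit feedback, such as thumbs-up/thumbs-down ratings, and track the share of positive responses. The sketch below assumes that kind of feedback; the sample data is made up for illustration.

```python
# Minimal sketch: user satisfaction as the share of positive ratings.
# The ratings list below is made-up sample data.

def satisfaction_rate(ratings: list[bool]) -> float:
    """Fraction of interactions the user rated positively (thumbs up)."""
    if not ratings:
        return 0.0
    return sum(ratings) / len(ratings)

ratings = [True, True, False, True]  # three thumbs up, one thumbs down
print(f"Satisfaction: {satisfaction_rate(ratings):.0%}")  # 75%
```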

Efficiency

Another metric to measure the quality of your AI agent is efficiency. Time and resources are both issues here.

Time

An agent might successfully complete a task, but if it takes a long time, it’s a lower-quality agent. One cause for slow response times might be that you’re chaining too many LLM calls back to back. Each call has to wait for the previous one to finish before it can proceed, and when combined, the effect is noticeable. Another cause for a slow response might be server overload.
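
To see the effect of chained calls on response time, you can time a task end to end. The sketch below uses a fake call_llm function with an artificial delay as a stand-in for a real client; the numbers are illustrative, not benchmarks.

```python
# Minimal sketch: measuring total latency of a chain of LLM calls.
# call_llm is a hypothetical stand-in for a real LLM client.
import time

def call_llm(prompt: str) -> str:
    time.sleep(0.5)  # simulate network and model latency
    return f"response to: {prompt}"

def run_chain(prompts: list[str]) -> tuple[list[str], float]:
    """Run the calls sequentially and return the responses and elapsed seconds."""
    start = time.perf_counter()
    responses = [call_llm(p) for p in prompts]  # each call waits for the previous one
    return responses, time.perf_counter() - start

_, elapsed = run_chain(["plan", "translate", "review"])
print(f"Total latency: {elapsed:.2f}s")  # roughly 1.5s for three chained calls
```

When calls don't depend on one another's output, running them concurrently (for example, with asyncio.gather) avoids paying each call's latency in sequence.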

Resources

Resources for an AI agent largely refer to the number of tokens a task uses. More tokens mean more money. The cost per million tokens is decreasing, but it can still be significant for certain applications. That means you don’t want to waste tokens unnecessarily.
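
A rough cost estimate just multiplies token counts by the provider's per-million-token prices. The prices and request volumes in this sketch are placeholder values, not any specific provider's rates.

```python
# Minimal sketch: estimating token cost per request and per day.
# Prices are placeholders; substitute your provider's actual rates.

def estimate_cost(input_tokens: int, output_tokens: int,
                  input_price_per_m: float = 3.00,
                  output_price_per_m: float = 15.00) -> float:
    """Estimated cost in dollars for a single request."""
    return (input_tokens / 1_000_000) * input_price_per_m \
         + (output_tokens / 1_000_000) * output_price_per_m

per_request = estimate_cost(input_tokens=2_000, output_tokens=500)
print(f"Per request: ${per_request:.4f}")                          # $0.0135
print(f"Per day (10,000 requests): ${per_request * 10_000:.2f}")   # $135.00
```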
