Assessing AI Agents

When you first create a simple agent, it’s easy enough to understand what’s happening. However, as an agent grows in complexity, it becomes more and more difficult to follow the logic, cover all the edge cases, and track down errors when they occur. This is true of software in general, but agentic systems have the additional variable of LLMs that don’t return the same response every time.

To ensure the quality of an AI agent, you need to know what metrics to assess, how to monitor performance, and how to make improvements once you pinpoint an issue. First, you’ll look at what to measure.

Developing Assessment Metrics

What do you look for when evaluating an AI agent’s quality? Some areas to consider are accuracy, user satisfaction, and efficiency. Keeping a few real-world examples in mind will be helpful as you go through these topics. Remember the localizer app you’ve been building throughout the module. Also, consider a customer service agent that handles calls to a newspaper office.

Accuracy

Accuracy refers to how often the AI agent completes a task successfully. This can be thought of in terms of the agent’s success or error rate.
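
As a sketch of what tracking accuracy could look like in practice, here's a small Python example that computes a success rate over an evaluation set. The EvalCase class, the exact-match check, and the stubbed agent are assumptions for illustration; real agent outputs usually need a more forgiving comparison than string equality.

```python
# Minimal sketch: accuracy as a success rate over an evaluation set.
# EvalCase and fake_agent are hypothetical placeholders, not a real API.
from dataclasses import dataclass
from typing import Callable

@dataclass
class EvalCase:
    prompt: str
    expected: str

def success_rate(cases: list[EvalCase], run_agent: Callable[[str], str]) -> float:
    """Run every case through the agent and return the fraction that match."""
    successes = sum(1 for case in cases if run_agent(case.prompt) == case.expected)
    return successes / len(cases)

# Example with a fake agent that always gives the same answer:
cases = [
    EvalCase("Translate 'Hello' to Spanish", "Hola"),
    EvalCase("Translate 'Goodbye' to Spanish", "Adiós"),
]

def fake_agent(prompt: str) -> str:
    return "Hola"  # always answers "Hola", right or wrong

rate = success_rate(cases, fake_agent)
print(f"Success rate: {rate:.0%}")  # 50%
```

The error rate is simply one minus the success rate.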

User Satisfaction

User satisfaction is closely related to accuracy. An agent might technically complete a task successfully, yet the user can still come away unsatisfied. For example, your application strings might all convey the correct meaning, but if the application feels “translated” rather than native, user satisfaction drops.
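
One simple way to quantify this is to collect explicit feedback, such as thumbs-up/thumbs-down ratings, and track the share of positive responses. The sketch below assumes that kind of feedback; the sample data is made up for illustration.

```python
# Minimal sketch: user satisfaction as the share of positive ratings.
# The ratings list below is made-up sample data.

def satisfaction_rate(ratings: list[bool]) -> float:
    """Fraction of interactions the user rated positively (thumbs up)."""
    if not ratings:
        return 0.0
    return sum(ratings) / len(ratings)

ratings = [True, True, False, True]  # three thumbs up, one thumbs down
print(f"Satisfaction: {satisfaction_rate(ratings):.0%}")  # 75%
```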

Efficiency

Another metric to measure the quality of your AI agent is efficiency. Time and resources are both issues here.

Time

An agent might successfully complete a task, but if it takes a long time, it’s a lower-quality agent. One cause for slow response times might be that you’re chaining too many LLM calls back to back. Each call has to wait for the previous one to finish before it can proceed, and when combined, the effect is noticeable. Another cause for a slow response might be server overload.
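
To see the effect of chained calls on response time, you can time a task end to end. The sketch below uses a fake call_llm function with an artificial delay as a stand-in for a real client; the numbers are illustrative, not benchmarks.

```python
# Minimal sketch: measuring total latency of a chain of LLM calls.
# call_llm is a hypothetical stand-in for a real LLM client.
import time

def call_llm(prompt: str) -> str:
    time.sleep(0.5)  # simulate network and model latency
    return f"response to: {prompt}"

def run_chain(prompts: list[str]) -> tuple[list[str], float]:
    """Run the calls sequentially and return the responses and elapsed seconds."""
    start = time.perf_counter()
    responses = [call_llm(p) for p in prompts]  # each call waits for the previous one
    return responses, time.perf_counter() - start

_, elapsed = run_chain(["plan", "translate", "review"])
print(f"Total latency: {elapsed:.2f}s")  # roughly 1.5s for three chained calls
```

When calls don't depend on one another's output, running them concurrently (for example, with asyncio.gather) avoids paying each call's latency in sequence.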

Resources

Resources for an AI agent largely refer to the number of tokens a task uses. More tokens mean more money. The cost per million tokens is decreasing, but it can still be significant for certain applications. That means you don’t want to waste tokens unnecessarily.
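
A rough cost estimate just multiplies token counts by the provider's per-million-token prices. The prices and request volumes in this sketch are placeholder values, not any specific provider's rates.

```python
# Minimal sketch: estimating token cost per request and per day.
# Prices are placeholders; substitute your provider's actual rates.

def estimate_cost(input_tokens: int, output_tokens: int,
                  input_price_per_m: float = 3.00,
                  output_price_per_m: float = 15.00) -> float:
    """Estimated cost in dollars for a single request."""
    return (input_tokens / 1_000_000) * input_price_per_m \
         + (output_tokens / 1_000_000) * output_price_per_m

per_request = estimate_cost(input_tokens=2_000, output_tokens=500)
print(f"Per request: ${per_request:.4f}")                          # $0.0135
print(f"Per day (10,000 requests): ${per_request * 10_000:.2f}")   # $135.00
```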
