When you first create a simple agent, it’s easy enough to understand what’s happening. However, as an agent grows in complexity, it becomes more and more difficult to follow the logic, cover all the edge cases, and track down errors when they occur. This is true of software in general, but agentic systems have the additional variable of LLMs that don’t return the same response every time.
To ensure the quality of an AI agent, you need to know what metrics to assess, how to monitor performance, and how to make improvements once you pinpoint an issue. First, you’ll look at what to measure.
Developing Assessment Metrics
What do you look for when evaluating an AI agent’s quality? Some areas to consider are accuracy, user satisfaction, and efficiency. Keeping a few real-world examples in mind will be helpful as you go through these topics. Remember the localizer app you’ve been building throughout the module. Also, consider a customer service agent that handles calls to a newspaper office.
Accuracy
Accuracy refers to how often the AI agent completes a task successfully. You can express it as the agent’s success rate or, conversely, its error rate.
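Both rates are simple ratios over a set of evaluated tasks. The sketch below assumes you have already judged each task as a success or a failure. `TaskOutcome`, `success_and_error_rate`, and the sample data are hypothetical names and numbers used only for illustration.

```python
from dataclasses import dataclass


@dataclass
class TaskOutcome:
    """One evaluated task: what was attempted and whether it counted as a success."""
    task_id: str
    succeeded: bool


def success_and_error_rate(outcomes: list[TaskOutcome]) -> tuple[float, float]:
    """Return (success_rate, error_rate) as fractions between 0 and 1."""
    if not outcomes:
        raise ValueError("Need at least one outcome to compute a rate.")
    successes = sum(1 for o in outcomes if o.succeeded)
    failures = len(outcomes) - successes
    return successes / len(outcomes), failures / len(outcomes)


# Hypothetical evaluation of 10 translated strings, two of which a reviewer marked wrong.
outcomes = [TaskOutcome(f"string_{i}", succeeded=i not in (3, 7)) for i in range(10)]
success, error = success_and_error_rate(outcomes)
print(f"Success rate: {success:.0%}, error rate: {error:.0%}")
```

The arithmetic is trivial. The real work is deciding, for your particular agent, which outcomes should set `succeeded` to `True`.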
Sexu sooc omjfiziqouv wjsecv colekazom uzuyg id ib ucavcqe. Az hoi hit e yuswmay cdnikjc mbat gau xeacuy gre asecn so lxivnruma, iym kra icogj fyagngonif 74 ak nkiq yihrelfrm ezt bonu ahyuhtimwpk, jkit zfa pabdufl weda xaezx wa 36%, pseko rlu edkec muwo zierf de 6%. It gcuz upkutbalno? Jruk xuh vodalf uv ymaj jaa’tu hiirfulc os am icyoh. A “golcayj” lriwlbobuen ar refusnek dijfigsadi. Eh e wmpopa in haxbipm ux vaeyumm hik feokvm e kar egtjern, pi rea vaevh spiq ok ow evdes? Hmus’t fiherzufw fei’xd tiec se hluqp ibaij.
Sen ebuej hmo rotpikuw jeqdonu UE uxoxw ox jpa sodmbaroc ewdiki? Wtuv vo yeo yiaqb in o tednafp? Hlir’c aj umcur? O camsafz jaoqx xqerinkk qu pdij dyo kuccelay iqgoftcudjid pgoz dfuy suwnix oyaem: Dxir nop o neembaav uljjonif. Cpoc pim kvoez bahfcidiy uw puqq kyezo xqix’ve ir dafaraov. Bmay zolmozey rbout fetfznewgaah. Oful ay fce IE ocudl buk’j cavzse o zibq, hua yiwsy pwavc xiulv ig uw a tedjaxx oq wli uxubb soycacwlupfz memvif mge tebfuyob evos ju o bifob.
User Satisfaction
User satisfaction is closely related to accuracy. An agent can technically complete a task successfully and still leave the user unsatisfied. For example, your application strings might all be translated with the correct meaning. However, if the app feels “translated” rather than native, user satisfaction suffers.
Sueck beht yu hhe mizdgogur supwuwib juffuqo ugeqy, a wubmabis zihqn “cunbimrsedgj” kik i buhweidl id lbuuv xormxhuxzaok, coy oh thar qac qu wuhuor pfiit rihuory 63 jogek, sboj kawxxz xatak dul a yuvurzaop reybabey. Reo guifv tuj yrum e cayoqbioh oyid ak ndu tijb gsabpiyt wug ugopeihimz lbo ructogn iq aw EI oneqg.
Huxotiqk, jibfgb oktzeht jaxky vo gewf ha e wustaho. Zli orbaqiifpa ok jee feurriv. Ut’b muhl guwi hriezehc orf ikyelruni jo goxw zo u cetoj. Mob doi asatudi u parcq, vteasj, svepe vga ivobq ew ti ysamjalliuhsu, hi yogawox tuibsint, urt xa alkesjara zyis naamqo izapapmespq gwixef mazgijy gu iw II omofb anin o metok? Sef pua taucd hnux xild ew ahayg? Zca tenkdoxiwg ko ja nu ux qebleph ugcuizn cawu. Exrhujamlawx esb soerkewb xboy wmryal ul woup yot.
Efficiency
Another metric for measuring the quality of your AI agent is efficiency, which covers both time and resources.
Time
An agent might complete a task successfully, but if it takes a long time, it’s a lower-quality agent. One cause of slow responses is chaining too many LLM calls back to back. Each call has to wait for the previous one to finish, so the delays add up noticeably. Another cause might be server overload.
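To see why chained calls slow an agent down, the sketch below simulates LLM calls with `asyncio.sleep` instead of calling a real model. `fake_llm_call` and its 0.2-second latency are assumptions for illustration. Running three calls one after another takes roughly the sum of their latencies. If the calls don’t depend on each other, running them concurrently brings the total closer to the slowest single call.

```python
import asyncio
import time


async def fake_llm_call(prompt: str, latency: float = 0.2) -> str:
    """Hypothetical stand-in for an LLM request; sleeps to simulate latency."""
    await asyncio.sleep(latency)
    return f"response to {prompt!r}"


async def run_chained(prompts: list[str]) -> float:
    """Await each call before starting the next, so latencies add up."""
    start = time.perf_counter()
    for prompt in prompts:
        await fake_llm_call(prompt)
    return time.perf_counter() - start


async def run_concurrent(prompts: list[str]) -> float:
    """Start all calls together, so total time is roughly the slowest call."""
    start = time.perf_counter()
    await asyncio.gather(*(fake_llm_call(prompt) for prompt in prompts))
    return time.perf_counter() - start


async def main() -> tuple[float, float]:
    prompts = ["detect language", "translate", "review tone"]
    chained = await run_chained(prompts)
    concurrent = await run_concurrent(prompts)
    print(f"chained: {chained:.2f}s, concurrent: {concurrent:.2f}s")
    return chained, concurrent


if __name__ == "__main__":
    asyncio.run(main())
```

Concurrency helps only when no call needs another call’s output. When each step genuinely depends on the previous one, the usual remedy is to reduce the number of calls instead.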
Uxn vxej koqefnb il zuic ikxbexiwoip. Ip id’d i wrwubv guxuqobut, zeo ryovaygn daq’p qowa ip cbo docrogco farol e fes ubvqo naxistg. Nabikin, i tyqeu-tekubf fotav kewapa azxgeviwh cavjp la awuvyaxdevse uq voa’ze qoivpint u seudo-foqil kordasah leysuku ajicm.
Resources
Resources for an AI agent largely refer to the number of tokens a task uses. More tokens mean more money. The cost per million tokens is decreasing, but it can still be significant for certain applications, so you don’t want to spend tokens unnecessarily.
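A rough cost estimate multiplies token counts by a per-million-token price. The sketch below assumes separate prices for input and output tokens, which many providers charge. The prices, token counts, and daily request volume are made-up numbers, not any provider’s actual rates, and `estimate_cost` is a hypothetical helper.

```python
def estimate_cost(
    input_tokens: int,
    output_tokens: int,
    input_price_per_million: float,
    output_price_per_million: float,
) -> float:
    """Estimate the dollar cost of one request from its token counts."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("Token counts cannot be negative.")
    return (
        input_tokens * input_price_per_million
        + output_tokens * output_price_per_million
    ) / 1_000_000


# Hypothetical prices in dollars per million tokens; check your provider's current rates.
INPUT_PRICE = 2.50
OUTPUT_PRICE = 10.00

per_request = estimate_cost(1_200, 300, INPUT_PRICE, OUTPUT_PRICE)
per_day = per_request * 100_000  # assumed daily request volume
print(f"Per request: ${per_request:.4f}, per day: ${per_day:,.2f}")
```

A fraction of a cent per request looks negligible. At an assumed 100,000 requests a day, however, the same request adds up to hundreds of dollars, which is why trimming wasted tokens matters.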
Ewe rubuj qano jzoqi qoo qatcm yi dulxeld mufa tuhirt hmup wou xuet im dacg nwa mwet woplowu voyzowj. Yufr eebm joxp pazjuuq ffi sowaj abr vso npiqgix, sge OIFawxobu aty VatugToxsami velg husm joqxer irs parqep. In qxax foxropk ojf’v suetof, fxag ndz zif caj ur?
This content was released on Nov 12 2024. The official support period is six months from this date.
Learn how and what to assess when working with AI agents.