AI Agents

AI Agents

There are all kinds of agents in the world: travel agents, real estate agents, insurance agents, even secret agents. Back in the old days, if you wanted to book a trip to Paris, you’d call a travel agent. They would ask you some questions about when you wanted to leave, how many people were going, and so on. Then the travel agent would book you a ticket with the airline. The agent was an assistant that helped you accomplish a task: booking a flight.

This aspect of assistance is also true for other kinds of agents. A real estate agent is an assistant who helps connect house sellers and house buyers. An insurance agent is an assistant who helps insurance companies find customers and helps customers know what kind of insurance to buy. A secret agent is an assistant who helps governments learn classified information about another country.

In the same way, AI agents are also assistants that help to accomplish some task. You’ve probably used AI Agents many times without knowing that that’s what they were:

  • “Alexa, play me some classical music.”
  • “Hey Siri, set a timer for 30 minutes.”
  • “OK, Google, tell me the weather forecast for Chicago tomorrow.”

These virtual assistants process spoken language to determine the speaker’s intent and then perform some action based on that intent. You could turn on classical music, set a timer, or look up a forecast yourself, but these assistants do it for you.

An AI Agent, as the term is most commonly used today, is an autonomous system that uses a large language model (LLM) to make decisions and perform actions. Below are a few keywords that are often associated with AI Agents. They aren’t all true of every agent but are generally applicable:

  • Reasoning: Agents reason about the input they receive from the outside environment and plan how to respond.
  • Action: Agents can access external tools and use them to perform actions.
  • Memory: Agents often “remember” historical state by storing past input and output as a list of messages.
  • State: Agents act like state machines and transition from one state to another until their task is complete.
  • Autonomous: Agents make decisions on their own but still have the option to consult a human.
  • Cyclic: Agents perform actions repeatedly to optimize the results.

Note: Words like “reasoning” and “planning” are perhaps too strong for what is essentially a glorified text-prediction mechanism, but you can leave semantics to the philosophers for now.

The Need For a Better Assistant

There’s massive room for improvement in the automated customer service market. While hiring and training an adequate team of humans to handle customer questions and complaints may be prohibitively expensive, the automated services that companies have employed are woefully lacking.

You’ve probably been one of the countless multitudes who have been frustrated by the old-fashioned Interactive Voice Response (IVR) systems on telephones:

  • Machine: For English, press 1. Para Español, oprima el dos.
  • Human: 1.
  • Machine: If you’d like to know your balance, press 1. If you’d like to report a fraud, press 2. If you’d like to order a new card, press 3. If you’d like to change your PIN, press 4. If you’d like to talk to a representative, press 5. If you’d like to hear the options again, press 6.
  • Human: 6.
  • Human: 6.
  • Human: 6.
  • Human: 5.
  • Machine: We’re sorry. All representatives are busy now. Please stay on the line, and a representative will be with you shortly.
  • Machine: Plays music for an hour.
  • Human: Click.

The spoken version isn’t any better:

  • Machine: If you’d like to cancel your subscription, say “Cancel”. If you’d like to renew your subscription, say “Renew”.
  • Human: “Cancel”.
  • Machine: “I’m sorry, I didn’t catch that. If you’d like to cancel your subscription, say “Cancel”. If you’d like to renew your subscription, say “Renew”.
  • Human: CANCEL!
  • Machine: “I’m sorry, I didn’t catch…”
  • Human: Aaaagh!!! …click.

Similarly annoying experiences are also easy to find on the web. Many websites prominently display chatbots that can’t do anything more than point you to the FAQ page you already read.

Recent advances in natural language processing and the advent of readily available large language models provide a huge opportunity to improve customer support across a wide range of industries.

Opportunities For AI Agents

Opportunities for AI Agents aren’t limited to voice assistants and chatbots. There are many applications for AI Agents. Here are just a few:

  • Programming assistants
  • Autonomous robots
  • Tutors
  • Financial traders or fraud detectors
  • Marketing content creators
  • Threat detectors
  • NPCs in video games

The list could go on and on.

You may not take on anything as ambitious as building an autonomous robot. Still, once you understand how to harness the power of large language models, there are many simple but powerful applications that you can make.

Usually, when you use an LLM application, like ChatGPT, the input is text, and the output is also text. However, the essence of an AI Agent is that the input is text, and the result is an action.

  • Input: “Book me a flight to Chicago.” (text)

  • Output: AI agent books a flight to Chicago. (action)

  • Input: “Find the cosine of 2.5.” (text)

  • Output: AI agent uses a calculator to find the precise answer. (action)

  • Input: “Who won the soccer match in Buenos Aires yesterday?” (text)

  • Output: AI agent does a web search for the answer. (action)

You’ve learned that AI agents fundamentally are an LLM that can perform an action. But how do AI Agents actually take actions rather than just produce text? The answer is through function calling.

Function Calling

Functions are a basic component of most programming languages. Here’s a simple function in Python:

def do_something():
  return 2 + 3

This is how you would call that function:

response = do_something()
print(response) // 5

The function do_something added two numbers, but it could have done anything. It could have called another function, printed a document, fetched some data from an external API, or launched a rocket to Mars. Since functions can do anything, if you give an LLM the power to call a function, then the LLM can do anything. That’s the power of an AI agent.

The secret to giving an LLM the power to call a function is knowing how to write a good prompt. Take the following one, for example. Feel free to try it for yourself in ChatGPT or another LLM:

  • Your role is an AI Agent. If the user wants to know anything about weather conditions, respond with the exact string “do_something”. Otherwise, respond with the word “error”. No other responses are allowed.

Now, when you ask questions like:

  • Is it going to rain tomorrow?
  • What is the temperature in France?
  • How much is it likely to snow in Lagos on Friday?

You should get the response do_something. However, if you ask, “Can pigs fly?” you’ll get error back.

What used to be an extremely difficult string parsing problem just became a simple flow-control exercise:

if response == "do_something":
  do_something()
else:
  raise ValueError("error")

The code takes the string output "do_something" from the LLM and translates it into the action of calling the function do_something. That’s how you get an LLM to call a function. You teach it to produce a structured output and then match it to an appropriate function.

Demo Project

As you work through the module this week, you’ll create an AI Agent to assist mobile app developers in localizing their app strings. For example, given the UI in the screenshot below, it’d be nice to have an automated way to translate all those strings into Spanish accurately:

The demo will assume you’re familiar with the OpenAI APIs. If you’re not, go back and review the module Text Generation with OpenAI.

You’ll benefit most from the video demo if you don’t just watch it but follow along.

See forum comments
Download course materials from Github
Previous: Introduction Next: AI Agent Translation Demo