2️⃣ Building a Simple Arithmetic Agent

Learning Objectives
  • Learn how to use function calling to allow LLMs to use external tools.
  • Understand the main functionalities of an LLM agent.

In general, most LLM agents share these core components:

  1. LLM API interface: A basic function that makes API calls (e.g. get_response()).
  2. Actions/Tools: A set of actions the agent can take.
  3. Task State Management: Keeping track of the current state of the task and any relevant context.
  4. Memory: A way to store and retrieve information from past interactions (i.e. chat history). The simplest implemention is usually to store the list of past chat messages in a chat_history class attribute.
  5. Observation Parser: Functions to parse and interpret the results of actions and update the state.
  6. Decision/Execution Logic: The rules or algorithms used to choose actions based on the current state and LLM output.

These components are implemented across the Task, Agent, and Tool classes. However, the specific breakdown of these components in our implementation is a design choice and can vary depending on the task. While some are very natural (e.g. LLM API interface goes into Agent, task state management goes into Task), others can vary (e.g. Tools could be implemented and handled entirely within the Task or Agent class, as opposed to being separate classes; observation parsing could be in the Task or the Agent class). In general, we want to maximize separability and minimize interfaces/dependencies, so that we can easily swap out different agents for the same task, or vice versa.

Task

In an LLM agent eval, there will usually be a Task class that interacts with the Agent. In general, the Task will:

  • Prepare and provide the task instruction (and necessary files, functions etc) to the agent
  • Parse and score the agent's output
  • Update the task state accordingly (e.g. proceeds onto the next step of the task, ends the task).

Exercise - Build a simple arithmetic task