[3.4] - LLM Agents

Colab: exercises | solutions

Please send any problems / bugs on the #errata channel in the Slack group, and ask any questions on the dedicated channels for this chapter of material.

If you want to change to dark mode, you can do this by clicking the three horizontal lines in the top-right, then navigating to Settings → Theme.

Links to all other chapters: (0) Fundamentals, (1) Transformer Interpretability, (2) RL.

Introduction

This set of exercises spans 2 days and involves building and working with LLM agents. An LLM agent consists of a scaffolding program interacting with an LLM API. We'll also build and analyse two tasks for LLM agents to complete (one simple, one complex), in order to see how LLM agents behave in practice.

We'll begin by building a simple Arithmetic Task and Arithmetic Agent. This should teach you the basics of function calling via the OpenAI API (Anthropic's API has minor differences, but operates in essentially the same way). Then, once we're comfortable with function calling and the general setup of LLM agents and tasks, we'll move on to building a more complex agent that plays the Wikipedia Game.
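
To give a flavour of what function calling looks like before we build it properly in the exercises, here's a minimal sketch using the OpenAI chat completions API. The calculate tool, its schema, and the model name are illustrative choices, not fixed by the exercises:

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# A tool is described to the model as a JSON schema. The tool name,
# description and parameters here are made up for illustration.
tools = [
    {
        "type": "function",
        "function": {
            "name": "calculate",
            "description": "Evaluate an arithmetic expression, e.g. '2 + 3 * 4'.",
            "parameters": {
                "type": "object",
                "properties": {
                    "expression": {
                        "type": "string",
                        "description": "The expression to evaluate.",
                    }
                },
                "required": ["expression"],
            },
        },
    }
]

response = client.chat.completions.create(
    model="gpt-4o-mini",  # assumption: any tool-capable chat model works here
    messages=[{"role": "user", "content": "What is 17 * 24?"}],
    tools=tools,
)

# If the model decided to use the tool, it returns a tool call (name plus
# JSON-encoded arguments) instead of a plain text answer; the scaffolding
# is then responsible for running the tool and sending the result back.
tool_calls = response.choices[0].message.tool_calls
if tool_calls:
    print(tool_calls[0].function.name, tool_calls[0].function.arguments)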

Then we'll explore a variety of elicitation methods. These are methods for getting the best capabilities out of models, and are crucial for evaluating LLM agents. Looking at model performance under different elicitation methods helps us answer the question "Can the model do this?" Unfortunately, we'll almost never be able to prove that a model doesn't have a capability; we can only say that, despite our best efforts, we couldn't get the model to demonstrate it. This means we'll have to put a lot of effort into trying to elicit the behavior from the model (so that we can be as confident as possible when we claim the model can't exhibit it). This will involve:

  • Improving our prompting
  • Improving our tools
  • Improving the way the relevant information is stored
  • Ensuring the model can access good information

Each exercise will have a difficulty and importance rating out of 5, as well as an estimated maximum time you should spend on it, and sometimes a short annotation. You should interpret the ratings & time estimates relatively (e.g. if you find yourself spending about 50% longer on the exercises than the time estimates, adjust accordingly). Please do skip exercises / look at solutions if you don't feel like they're important enough to be worth doing, and you'd rather get to the good stuff!

For a lecture on today's material, which provides some high-level understanding before you dive in, watch the video below:

Content & Learning Objectives

1️⃣ Intro to LLM Agents

Learning Objectives
  • Understand why we want to evaluate LLM agents.
  • Read resources about LLM agent evaluations to understand the current state of the field.
  • Understand the common failure modes of LLM agents.

2️⃣ Building a Simple Arithmetic Agent

Learning Objectives
  • Understand that an LLM agent is just a "glorified for-loop" (of the scaffolding program interacting with the LLM API); see the sketch after this list.
  • Learn how to use function calling to allow LLMs to use external tools.
  • Understand the main functionalities of an LLM agent.
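
To make the "glorified for-loop" point concrete, here's a rough sketch of the control flow. The names get_model_response, execute_tool_call, and task.instruction are placeholders for illustration, not part of any library or of the exercise code:

def run_agent(task, max_steps: int = 10):
    """Sketch of the agent loop: call the model, run any tools it asks for,
    feed the results back, and repeat until it stops calling tools."""
    messages = [{"role": "user", "content": task.instruction}]
    for _ in range(max_steps):
        response = get_model_response(messages)  # hypothetical: one call to the LLM API
        messages.append(response)
        if response.tool_calls:
            for tool_call in response.tool_calls:
                result = execute_tool_call(tool_call, task)  # hypothetical: run the requested tool
                messages.append({"role": "tool", "tool_call_id": tool_call.id, "content": result})
        else:
            break  # no tool call means the model has given its final answer
    return messages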

3️⃣ Building a more Complex Agent: WikiGame

Learning Objectives
  • Get comfortable building a more complex task, with noisy and imperfect tool outputs
  • Understand how to build a more complex agent that implements dynamic decision-making
  • Observe the failure modes of a more complex agent

4️⃣ Elicitation

Learning Objectives
  • Understand the importance of elicitation in evaluating LLM agents
  • Understand the different methods of elicitation
  • Understand how to improve prompting, tools, history storage, and information access in LLM agents

Setup code

import json
import math
import os
import re
import sys
from pathlib import Path
from typing import Any, Literal, Optional

import wikipedia
from anthropic import Anthropic
from dotenv import load_dotenv
from openai import BadRequestError, OpenAI
from openai.types.chat.chat_completion_message import ChatCompletionMessage
from openai.types.chat.chat_completion_message_tool_call import (
    ChatCompletionMessageToolCall,
)
from wikipedia import DisambiguationError, PageError, WikipediaPage

# Make sure exercises are in the path
chapter = "chapter3_llm_evals"
section = "part4_llm_agents"
root_dir = next(p for p in Path.cwd().parents if (p / chapter).exists())
exercises_dir = root_dir / chapter / "exercises"
section_dir = exercises_dir / section
if str(exercises_dir) not in sys.path:
    sys.path.append(str(exercises_dir))

import part4_llm_agents.tests as tests
from part1_intro_to_evals.solutions import retry_with_exponential_backoff
from utils import countrylist, evaluate_expression, wiki_pairs

MAIN = __name__ == "__main__"

Reminder - how to set up your API keys before running the code below

  • OpenAI: If you haven't already, go to https://platform.openai.com/ to create an account, then create a key in 'Dashboard' → 'API keys'.
  • Anthropic: If you haven't already, go to https://console.anthropic.com/ to create an account, then select 'Get API keys' and create a key.

If you're in Google Colab, you should be able to set API keys from the "Secrets" tab on the left-hand side of the screen (the key icon). If you're in VSCode, you can create a file called ARENA_3.0/.env containing the following:

OPENAI_API_KEY = "your-openai-key"
ANTHROPIC_API_KEY = "your-anthropic-key"

In the latter case, you'll also need to run the load_dotenv() function, which will load the API keys from the .env file & set them as environment variables.

Once you've done this (either the Secrets-tab method for Colab or the .env-based method for VSCode), you can get the keys as os.getenv("OPENAI_API_KEY") and os.getenv("ANTHROPIC_API_KEY") in the code below. Note that both OpenAI() and Anthropic() accept an api_key parameter, but in the absence of this parameter they'll look for environment variables named OPENAI_API_KEY and ANTHROPIC_API_KEY - which is why it's important to get the names exactly right when you save your keys!

load_dotenv()

assert os.getenv("OPENAI_API_KEY") is not None, (
    "You must set your OpenAI API key - see instructions in dropdown"
)

openai_client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))
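
If you also plan to use Anthropic models in later parts (this is optional, and assumes you've saved an ANTHROPIC_API_KEY as described above), you can construct the client in the same way:

# Assumption: only needed if you want to run Anthropic models; requires ANTHROPIC_API_KEY.
anthropic_client = Anthropic(api_key=os.getenv("ANTHROPIC_API_KEY"))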