3️⃣ Building a more complex agent: WikiGame
Learning Objectives
- Understand how to build a more complex agent that implements dynamic decision-making
- Observe the failure modes of a more complex agent
The task in this section simulates the Wikipedia game, where the agent starts on one Wikipedia page and attempts to navigate to a goal page using only links found in the main content of Wikipedia pages. Compared to the previous sections, the main challenge here is to implement dynamic decision making while parsing noisy and imperfect outputs.
Quick intro to the Wikipedia API
Our agent will interact with Wikipedia by making tool calls to the Wikipedia API. We will only need to learn the following key functions from the Wikipedia API to be able to implement the basic dynamics of the game.
- wikipedia.page() - Returns a WikipediaPage object, which contains various attributes and methods to access page content. (See the page docs for these attributes.)
- wikipediaPage.title - Returns the title of the page
- wikipediaPage.content - Returns the full text content of the page (this can be very long, so take snippets where possible to avoid using up the LLM's context window)
- wikipediaPage.summary - Returns a summary of the page (i.e. the introductory text of the Wikipedia page before the first section title)
- wikipediaPage.links - Returns a list of all links as strings
Aside: Wikipedia API content can be weird!
The Wikipedia API often outputs content in unintuitive ways. For example, articles that are essentially just a big list become near useless, since the content usually omits the list (for example, see the API content for List of countries and dependencies by population). Another issue you might encounter is that the API formats mathematical expressions in $\LaTeX$ quite poorly (for example, see the API content for Kullback-Leibler divergence). This is why it's important to check what the API actually produces when .content is called, and why you should test a wide diversity of Wikipedia articles.
Aside: Wikipedia "summaries" can be long!
The wikipedia API builds summaries of pages from all the information before the first titled section. For certain (generally obscure) wikipedia pages, this summary itself can be extremely long, and contain lots of information that is unnecessary for determining the key facts about the page. We'll handle this when it comes up by truncating the summary to roughly its first 500 characters.
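A minimal sketch of that truncation (this mirrors the get_page_summary() method provided later; the helper name here is just for illustration):

```python
def truncate_summary(text: str, max_chars: int = 500) -> str:
    # Keep at most max_chars characters, then cut back to the last full stop
    snippet = text[:max_chars]
    last_period = snippet.rfind(".")
    return snippet[: last_period + 1] if last_period != -1 else snippet

print(truncate_summary("First sentence. Second sentence. Third sentence.", 20))
# → First sentence.
```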
Run the following code to see how these wikipedia API functions work!
# Retrieve a Wikipedia page from its title
page = wikipedia.page("Large language model")
# Access basic page information
print("Title:", page.title)
print("\nURL", page.url)
print(f"\nSummary (word count {len(page.summary.split())}):", page.summary)
print(
    f"\nContent (word count {len(page.content.split())}):",
    page.content[:1000],
    "......",
)
print(f"""\nLinks (link count {len(page.links)}): [{", ".join(page.links[:7])}, ......]""")
Click to see the output of this code (the wikipedia page might have changed slightly)
Title: Large language model

URL https://en.wikipedia.org/wiki/Large_language_model

Summary (word count 95): A large language model (LLM) is a type of machine learning model designed for natural language processing tasks such as language generation. LLMs are language models with many parameters, and are trained with self-supervised learning on a vast amount of text. The largest and most capable LLMs are generative pretrained transformers (GPTs). Modern models can be fine-tuned for specific tasks or guided by prompt engineering. These models acquire predictive power regarding syntax, semantics, and ontologies inherent in human language corpora, but they also inherit inaccuracies and biases present in the data they are trained in.

Content (word count 6887): A large language model (LLM) is a type of machine learning model designed for natural language processing tasks such as language generation. LLMs are language models with many parameters, and are trained with self-supervised learning on a vast amount of text. The largest and most capable LLMs are generative pretrained transformers (GPTs). Modern models can be fine-tuned for specific tasks or guided by prompt engineering. These models acquire predictive power regarding syntax, semantics, and ontologies inherent in human language corpora, but they also inherit inaccuracies and biases present in the data they are trained in.

== History ==

Before 2017, there were a few language models that were large as compared to capacities then available. In the 1990s, the IBM alignment models pioneered statistical language modelling. A smoothed n-gram model in 2001 trained on 0.3 billion words achieved state-of-the-art perplexity at the time. In the 2000s, as Internet use became prevalent, some rese ......

Links (link count 524): [15.ai, AI-complete, AI explainability, API, Action selection, Activation function, Active learning (machine learning), ......]
The following two cell blocks cause an error when run (you should see a DisambiguationError for the first, and a PageError for the second). These are common errors that LLMs can encounter when moving between wikipedia pages, and so we'll need to find a way to handle them:
try:
    page = wikipedia.page("Python")
except DisambiguationError as e:
    print(type(e), "\n\n", e)

try:
    page = wikipedia.page("Animalss", auto_suggest=False)
except Exception as e:
    print(type(e), "\n\n", e)
Click to see the output of this code
<class 'wikipedia.exceptions.PageError'> Page id "Animalss" does not match any pages. Try another id!
# Fixes the PageError: redirect=True plus the default auto_suggest=True lets the API resolve "Animalss"
page = wikipedia.page("Animalss", redirect=True)
print(page.title)
# Fixes DisambiguationError by selecting the first option
try:
    page = wikipedia.page("Python")
except DisambiguationError as e:
    page = wikipedia.page(e.options[0])
print(page.title)
The errors above are:

- DisambiguationError: This was raised because the title "Python" can correspond to multiple pages. Whenever this error is raised, we get a list of options that Wikipedia suggests we could mean, and so we choose the first.
- PageError: This was raised for "Animalss" as there is no Wikipedia page with that title (note that we set auto_suggest=False above). We can usually avoid these errors by leaving redirect = True and, as a last resort, letting auto_suggest correct the title.
We have implemented a simple function get_page() for you to get the WikipediaPage object for a particular page title with error handling.
def get_page(title: str) -> WikipediaPage:
    """
    Get a Wikipedia page object given a title. If the title is ambiguous, choose the first option.
    If the title is not found, try to find a similar title.

    Args:
        title (str): The title of the Wikipedia page

    Returns:
        WikipediaPage: The Wikipedia page
    """
    try:
        return wikipedia.page(title, auto_suggest=False, redirect=True)
    except DisambiguationError as e:
        return wikipedia.page(e.options[0], auto_suggest=False, redirect=True)
    except PageError:
        return wikipedia.page(title, auto_suggest=True, redirect=True)
What do the kwargs redirect and auto_suggest in wikipedia.page() do?
redirect
- This kwarg enables redirecting when you reference an article title with slight differences from how it is stored in Wikipedia. For example, the Wikipedia API will generally access the correct page if there is a capitalization error on the first letter, but not for capitalization errors in the middle of the word if redirect = False:

# This returns a WikipediaPage object for the "Human" page
page = wikipedia.page("huMan", redirect=True, auto_suggest=False)
# This raises a PageError since there is no page called "huMan"
page = wikipedia.page("huMan", redirect=False, auto_suggest=False)

- By default, we should set redirect = True in the wikipedia.page() function.
auto_suggest
- This kwarg enables the API to provide suggestions. This allows a lot more than redirect does, since redirect only handles the "obvious" cases (e.g. "huMan" → "Human", "U.S. President" → "President of the United States", etc.). When auto_suggest is true, it allows things like "president of states" → "President of the United States" or "gogle" → "Google", both of which would raise an error if redirect = True, auto_suggest = False.
- However, auto_suggest can sometimes be too permissive and lead to errors. For example, the code below returns a WikipediaPage object for the "Man" page. This is clearly not what we were trying to access; auto_suggest has gotten carried away in this case:

page = wikipedia.page("Human", redirect=False, auto_suggest=True)

- If redirect = True and auto_suggest = True, then auto_suggest takes priority.
- By default, we should set auto_suggest to False unless it is used as a last resort to resolve an error!
Exercise - Implement get_permitted_links()
This is a quick exercise to familiarize you with the Wikipedia API.
When you get the links from a page using page.links, this will include every possible Wikipedia link that is accessible from the HTML on that page, including those that are not in the main page content (e.g. links in sidebars, links in footnotes etc.), which are irrelevant or not permitted by the rules of the Wiki game.
Write a simple get_permitted_links() function. This should only return the links that can be found inside the main content. The resulting list of permitted links should be about a third as long as the list of links from page.links (although it varies slightly by page).
def get_permitted_links(current_page: WikipediaPage) -> list[str]:
    """
    Get "permitted" links (i.e. links that are in the content of the page) from a Wikipedia page.

    Args:
        current_page (WikipediaPage): The current Wikipedia page

    Returns:
        list[str]: A list of permitted links from current_page
    """
    raise NotImplementedError("You need to implement the get_permitted_links function")

tests.test_get_permitted_links(get_permitted_links)
Solution
def get_permitted_links(current_page: WikipediaPage) -> list[str]:
    """
    Get "permitted" links (i.e. links that are in the content of the page) from a Wikipedia page.

    Args:
        current_page (WikipediaPage): The current Wikipedia page

    Returns:
        list[str]: A list of permitted links from current_page
    """
    all_links = current_page.links
    content_lower = current_page.content.lower()
    permitted_links = [link for link in all_links if link.lower() in content_lower]
    if current_page.title in permitted_links:
        permitted_links.remove(current_page.title)
    return permitted_links
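You can sanity-check this filtering logic offline by running it on a small stub standing in for a real WikipediaPage (the stub page and helper name below are made up for illustration):

```python
from types import SimpleNamespace

def permitted(page) -> list[str]:
    # Same filtering logic as the solution above
    content_lower = page.content.lower()
    links = [link for link in page.links if link.lower() in content_lower]
    if page.title in links:
        links.remove(page.title)
    return links

# A stub page: only "Python" and "Guido van Rossum" appear in the main content
stub = SimpleNamespace(
    title="Example",
    content="See the Python article and the Guido van Rossum biography.",
    links=["Python", "Guido van Rossum", "Template:Infobox", "Example"],
)
print(permitted(stub))  # → ['Python', 'Guido van Rossum']
```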
LLM Agent for WikiGame
The WikiGame class
Below is the WikiGame class that instantiates the Wikipedia game. This class contains the following functionalities:
- Keeps track of task state variables (e.g. the current page, the page history)
- Task-specific helper functions for calling the Wikipedia API.
The implementation of this class has been provided for you, but you should read it and understand how the methods are implemented and what they do:
WikiGame initialises with 4 variables:

- starting_page, which gives the page that the agent should begin on.
- goal_page, which gives the page that the agent should aim to get to.
- current_page, which tracks the current page that the agent is on.
- page_history, which tracks all the pages the agent has visited in the game (initially only the starting_page).

It also comes with 4 methods:
- get_page() which takes the title of a Wikipedia page as a string, and returns the WikipediaPage object associated with this title (this is the same as the get_page() function we introduced earlier).
- get_permitted_links() which gets the permitted links from WikiGame.current_page. You can replace this function with your solution to the get_permitted_links exercise earlier if you prefer.
- is_permitted_link() which takes a link name, and returns True if this link is a permitted link, and False otherwise.
- check_win() which returns self.current_page == self.goal_page, corresponding to whether the agent has won the game.
Providing information to the agent
While models are trained on most Wikipedia content, a particular page may still be confused with something else, or be an article added after the training cutoff. Models also generally can't recall information from their training data if it only comes up once or twice (as is often the case for obscure wikipedia articles). So you should use the game's get_page_summary() method to provide details of the goal page to the agent in its initial message.
class WikiGame:
    def __init__(
        self,
        starting_page: str,
        goal_page: str,
    ):
        """
        This task simulates the Wikipedia game, where the agent starts on one Wikipedia page and
        attempts to navigate to a goal page using only links found in the main content of Wikipedia
        pages.

        Args:
            starting_page (str): The page the agent starts on.
            goal_page (str): The page the agent is trying to reach.

        Attributes:
            page_history (list[str]): The history of pages visited by the agent.
            starting_page (WikipediaPage): The starting page of the game.
            goal_page (WikipediaPage): The goal page of the game.
            current_page (WikipediaPage): The current page the agent is on.
        """
        self.page_history: list[str] = [starting_page]
        self.starting_page: WikipediaPage = self.get_page(starting_page)
        self.goal_page: WikipediaPage = self.get_page(goal_page)
        self.current_page: WikipediaPage = self.starting_page

    # ========================= Helper Functions (given) =========================

    # Get page and page summary
    @staticmethod
    def get_page(title: str) -> WikipediaPage:
        """
        Get a Wikipedia page object given a title. If the title is ambiguous, choose the first
        option. If the title is not found, try to find a similar title.

        Args:
            title (str): The title of the Wikipedia page

        Returns:
            WikipediaPage: The Wikipedia page
        """
        try:
            return wikipedia.page(title, auto_suggest=False, redirect=True)
        except DisambiguationError as e:
            return wikipedia.page(e.options[0], auto_suggest=False, redirect=True)
        except PageError:
            return wikipedia.page(title, auto_suggest=True, redirect=True)

    def get_page_summary(self, page: WikipediaPage | None = None) -> str:
        """
        Get the summary of a wikipedia page, to the last full stop within the first 500 characters.
        This can be used to give a brief overview of a page to the agent.

        Args:
            page (WikipediaPage): The Wikipedia page object.

        Returns:
            str: The summary of the Wikipedia page.
        """
        page = page if page else self.goal_page
        summary = page.content[:500]
        last_period_index = summary.rfind(".")
        return summary[: last_period_index + 1] if last_period_index != -1 else summary

    # Get and check permitted links
    def get_permitted_links(self) -> list[str]:
        """
        Returns a list of permitted links (i.e. links in the main page content) for the current page.

        Returns:
            list[str]: The permitted links.
        """
        all_links = self.current_page.links
        content_lower = self.current_page.content.lower()
        permitted_links = [link for link in all_links if link.lower() in content_lower]
        if self.current_page.title in permitted_links:
            permitted_links.remove(self.current_page.title)
        return permitted_links

    def is_permitted_link(self, link: str) -> bool:
        """
        Returns True if the link is in the permitted links for the current page, False otherwise.

        Args:
            link (str): The link to check.

        Returns:
            bool: True if the link is permitted, False otherwise
        """
        return link.lower() in (x.lower() for x in self.get_permitted_links())

    # ========================= Task State Management (given) =========================
    def check_win(self) -> bool:
        return self.current_page == self.goal_page
Exercise - Build tools for the WikiGame
The basic WikiAgent will need these two tools to play the game:
1. GetContentTool: This returns the full content of the current page, with all the wikipedia links wrapped in <link></link> tags (as otherwise they are presented as plain strings and are indistinguishable from normal text). Implementing this involves dealing with annoying regex, so we've provided the regex necessary to wrap links with link tags in the hint below. If you'd like an extra challenge, you can try to work it out yourself, but it's really not crucial to understanding any of the content today.
2. MovePageTool: This executes a move to a given new page when called, and updates the WikiGame task state if successful. You should implement the execute() function (its docstring serves as the tool description).

When formatting this tool list, refer back to your code for the arithmetic game, or the OpenAI function-calling docs here.
Why not just use WikipediaPage.links to get a list of links directly?
We don't just present a list of the accessible links, as this would not be very faithful to the wikipedia game. The agent does perform somewhat better if we just give it a list of links, but parsing the content of wikipedia pages and isolating the most important links is a big part of the challenge of the wikipedia game.
Caveat for the GetContentTool
The GetContentTool wraps all the text that corresponds to links in <link></link> tags. However, since we identify links in the text via their names on wikipedia pages, there are certain articles that will never (or only very rarely) get flagged as links. For example, the page "Python (programming language)" is almost never referenced by its full title; it's almost always referenced as just "Python". The same is true for cities and towns, which often have titles such as "Juneau, Alaska" but are almost always referred to as just "Juneau" in the articles where they appear. For this reason, you should avoid goal pages that are likely to be referenced by a different string than their title.
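A quick illustration of this failure mode under exact-title matching (the sentence and page title here are made up for the example):

```python
import re

content = "The capital of Alaska is Juneau."
title = "Juneau, Alaska"  # the actual page title, which never appears verbatim in articles

# The same pattern shape as the link-wrapping regex used in the hint below
pattern = r"""(\s|[,.)!?;:'"])(""" + re.escape(title) + r""")(\s|[,.)!?;:'"s])"""
print(re.search(pattern, content, flags=re.IGNORECASE))  # → None, so this link is never wrapped
```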
@tool
def GetContentTool(game: WikiGame) -> Tool:
    async def execute() -> str:
        """
        Get all the content for the wikipedia page you are currently on. Anything which corresponds to a link is wrapped in <link></link> tags.

        Args:
            None

        Returns:
            str: The content of the page with any accessible links wrapped in <link></link> tags
        """
        raise NotImplementedError("You need to implement the GetContentTool")

    return execute

@tool
def MovePageTool(game: WikiGame) -> Tool:
    async def execute(page: str) -> str:
        """
        Move to a new wikipedia page by clicking on a link in the current page content. Modifies the game state in place.

        Args:
            page: The title of the page you want to move to. This must be accessible from the current page (and be a different page), or the move will fail.

        Returns:
            str: A message indicating whether the move was successful
        """
        raise NotImplementedError("You need to implement the MovePageTool")

    return execute
Hint: Regex for wrapping links
The code below describes how to wrap links, where content is the wikipedia page content, and permitted_links is the list of permitted links returned by our function earlier.
for word in sorted(permitted_links, key=len, reverse=True):
    content = re.sub(
        r"""(\s|[,.)!?;:'"])(""" + re.escape(word) + r""")(\s|[,.)!?;:'"s])""",
        r"\1<link>\2</link>\3",
        content,
        count=1,
        flags=re.IGNORECASE,
    )
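To see what this regex does, you can run it on a toy string (the page names are invented for the example; note that count=1 means only the first occurrence of each link gets wrapped):

```python
import re

content = "Norway borders Sweden, and Sweden borders Finland."
permitted_links = ["Sweden", "Finland"]

# Longest links first, so a shorter link can't break up a longer one
for word in sorted(permitted_links, key=len, reverse=True):
    content = re.sub(
        r"""(\s|[,.)!?;:'"])(""" + re.escape(word) + r""")(\s|[,.)!?;:'"s])""",
        r"\1<link>\2</link>\3",
        content,
        count=1,
        flags=re.IGNORECASE,
    )

print(content)
# → Norway borders <link>Sweden</link>, and Sweden borders <link>Finland</link>.
```

Only the first "Sweden" is wrapped; the surrounding whitespace and punctuation captured by groups 1 and 3 are preserved by the replacement.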
Solution
@tool
def GetContentTool(game: WikiGame) -> Tool:
    async def execute() -> str:
        """
        Get all the content for the wikipedia page you are currently on. Anything which corresponds to a link is wrapped in <link></link> tags.

        Args:
            None

        Returns:
            str: The content of the page with any accessible links wrapped in <link></link> tags
        """
        content = game.current_page.content
        permitted_links = get_permitted_links(game.current_page)
        for word in sorted(permitted_links, key=len, reverse=True):
            content = re.sub(
                r"""(\s|[,.)!?;:'"])(""" + re.escape(word) + r""")(\s|[,.)!?;:'"s])""",
                r"\1<link>\2</link>\3",
                content,
                count=1,
                flags=re.IGNORECASE,
            )
        return content

    return execute
@tool
def MovePageTool(game: WikiGame) -> Tool:
    async def execute(page: str) -> str:
        """
        Move to a new wikipedia page by clicking on a link in the current page content. Modifies the game state in place.

        Args:
            page: The title of the page you want to move to. This must be accessible from the current page (and be a different page), or the move will fail.

        Returns:
            str: A message indicating whether the move was successful
        """
        page_no_underscore = page.replace("_", " ")
        if game.is_permitted_link(page):
            game.current_page = game.get_page(page)
            game.page_history.append(game.current_page.title)
            return "Move successful"
        elif game.is_permitted_link(page_no_underscore):
            # Make sure we actually update the game state for underscore-style titles too
            game.current_page = game.get_page(page_no_underscore)
            game.page_history.append(game.current_page.title)
            return "Move successful"
        else:
            return "Move failed, link not permitted. Remember you can only move to pages which are wrapped in <link></link> tags in the content you retrieved using the GetContentTool."

    return execute
Exercise - Build a WikiAgent
We will now build a WikiAgent that can use these tools to solve the WikiGame. We want to build the agent so that it can be called via an agent loop using the execute() method, similar to the one we had for the arithmetic game.
There are a few further considerations in this case that we didn't have for the arithmetic game.
Context window constraint
Since Wikipedia articles could be very long, the length of the LLM's context window becomes a constraint. GPT-4o and GPT-4o-mini both have context windows of 128k tokens (which corresponds to ~96k words). For reference, the wikipedia page for the United States has around 10k words alone and the agent will often need to visit more than 10 articles in one run of the game, not counting its own output, which can add up to be significant.
We'll solve this for now by simply resetting the agent's messages every time it reaches a new wikipedia page, and providing an updated set of instructions so the agent can locate itself in the game. (We'll cover other methods for addressing this issue later; you can probably already think of some.) Be careful to include the current page and the goal page in the instructions. For this reason you should write a _reset_history() function in the agent's definition.
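The reset idea can be sketched with plain dicts standing in for inspect's message classes (names and message texts here are illustrative only):

```python
def reset_history(current_title: str, goal_title: str) -> list[dict]:
    # Throw away the old transcript and rebuild just the orienting messages
    return [
        {"role": "system", "content": "You are playing the Wikipedia game."},
        {"role": "user", "content": f"You are on {current_title}. Your goal page is {goal_title}."},
    ]

messages = [{"role": "user", "content": "a very long transcript..."}] * 50
messages = reset_history("Python (programming language)", "Artificial intelligence")
print(len(messages))  # → 2
```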
Variables and methods of this agent

We've implemented one method for you, which isn't very conceptually important:

- _reset_history() - this is useful for dealing with the context window constraints mentioned above. This method should be called each time the model successfully moves to a new page, to clear its current chat history.
You'll need to implement the following variables:

- system_instruction - this should be the system message as a ChatMessageSystem which we'll give to the model as it attempts the task. It should tell the model what the WikiGame is, and only the basics of how to play it.
- on_page_instruction - this should be a user message as a ChatMessageUser telling the model the specific page it is on, and the page it should try to reach.
- next_step_instruction - this should be a user message, also as a ChatMessageUser, that prompts the model to take its next action. This should be passed to the model after each tool call the model makes.

You'll also need to implement the following methods:

- _start() - This should load the initial system_instruction and on_page_instruction into state.messages, so that the model can start the WikiGame with the necessary instructions to know what to do.
- instruction_refresh() - this function should reset the instruction variables we defined, so we can call it whenever we move page and the WikiGame state updates.
- execute() - This function should contain the loop that makes up our main agent logic. Within this loop, we should call the LLM API, and use the methods from inspect, or from the rest of the function, to handle the response, whether it is a tool call or a purely text response.
- _handle_tool_calls() - This function should execute the model's tool calls, append the results to the message history, and perform any necessary post-tool processing (like refreshing instructions after a page move).
@agent
def WikiAgent(tools: list[Tool], game: WikiGame):
    system_instruction = ...
    on_page_instruction = ...
    next_step_instruction = ...
    raise NotImplementedError("You need to implement the prompts for the WikiAgent")

    async def instruction_refresh() -> None:
        nonlocal system_instruction, on_page_instruction, next_step_instruction
        raise NotImplementedError("You need to implement the instruction_refresh function")

    async def _reset_history(state: AgentState):
        state.messages = []
        state = await _start(state)
        return state

    async def _start(state: AgentState) -> AgentState:
        raise NotImplementedError("You need to implement the _start function")

    async def _handle_tool_calls(state: AgentState) -> AgentState:
        raise NotImplementedError("You need to implement the _handle_tool_calls function")

    async def execute(state: AgentState) -> AgentState:
        raise NotImplementedError("You need to implement the execute function")

    return execute
Solution
@agent
def WikiAgent(tools: list[Tool], game: WikiGame):
    system_instruction = ChatMessageSystem(
        content="You are a wikipedia-racing AI. Your aim is to reach the goal page by accessing links from a series of wikipedia pages."
    )
    on_page_instruction = ChatMessageUser(
        content=f"You are currently on page: {game.current_page.title}. Your goal page is {game.goal_page.title}."
    )
    next_step_instruction = ChatMessageUser(content="What will you do next?")

    async def instruction_refresh() -> None:
        nonlocal system_instruction, on_page_instruction, next_step_instruction
        system_instruction = ChatMessageSystem(
            content="You are a wikipedia-racing AI. Your aim is to reach the goal page by accessing links from a series of wikipedia pages."
        )
        on_page_instruction = ChatMessageUser(
            content=f"You are currently on page: {game.current_page.title}. Your goal page is {game.goal_page.title}."
        )
        next_step_instruction = ChatMessageUser(content="What will you do next?")

    async def _reset_history(state: AgentState):
        state.messages = []
        state = await _start(state)
        return state

    async def _start(state: AgentState) -> AgentState:
        state.messages.append(system_instruction)
        state.messages.append(on_page_instruction)
        return state

    async def _handle_tool_calls(state: AgentState) -> AgentState:
        messages, state.output = await execute_tools(messages=state.messages, tools=tools)
        state.messages.extend(messages)
        # If the model just moved page successfully, refresh the prompts and reset history
        if (
            state.output.message.tool_calls
            and state.output.message.tool_calls[0].function == "MovePageTool"
            and "success" in messages[-1].content.lower()
        ):
            await instruction_refresh()
            state = await _reset_history(state)
        return state

    async def execute(state: AgentState) -> AgentState:
        success = False
        state = await _start(state)
        while not success:
            state.messages.append(next_step_instruction)
            state.output = await get_model().generate(
                input=state.messages,
                tools=tools,
            )
            state.messages.append(state.output.message)
            if state.output.message.tool_calls:
                state = await _handle_tool_calls(state)
            if game.check_win():
                success = True
        return state

    return execute
Exercise - Run the task
Now, similar to how we ran the arithmetic_agent, use the eval function to run the wikipedia_agent on the task below. This time is slightly different, as we need to define our tool_list, since our tools take our game as an argument (and we need to make sure they're accessing the same game as the agent, otherwise their execution will be incorrect). We also include a message_limit (40 should be a fine limit to start with) so that the agent doesn't run forever.
game = WikiGame("Python (programming language)", "Artificial intelligence")
# Use the eval function to evaluate your WikiAgent on a task where it has to get from the "Python (programming language)" page to the "Artificial intelligence" page.
Solution
game = WikiGame("Python (programming language)", "Artificial intelligence")
tool_list = [GetContentTool(game), MovePageTool(game)]

@task
def wiki_task() -> Task:
    return Task(dataset=[Sample(input="", target="")], message_limit=20)

eval(
    solver=as_solver(WikiAgent(tools=tool_list, game=game)),
    tasks=wiki_task(),
)
Your agent should be able to accomplish the following tasks. If the agent fails on the first try, then run the agent again (we've tried to cut down on random behaviour by the agents by setting the temperature to 0, however OpenAI's models retain some randomness at temperature 0 which compounds as they proceed through the task).
game_1 = WikiGame("Elizabeth I", "United States")
tool_list = [GetContentTool(game_1), MovePageTool(game_1)]

@task
def wiki_task() -> Task:
    return Task(dataset=[Sample(input="", target="")], message_limit=80)

eval(
    solver=as_solver(WikiAgent(tools=tool_list, game=game_1)),
    tasks=wiki_task(),
)
game_2 = WikiGame("County Seat", "Saint Pierre and Miquelon")
tool_list = [GetContentTool(game_2), MovePageTool(game_2)]

@task
def wiki_task() -> Task:
    return Task(dataset=[Sample(input="", target="")], message_limit=80)

eval(
    solver=as_solver(WikiAgent(tools=tool_list, game=game_2)),
    tasks=wiki_task(),
)