3️⃣ Building a more complex agent: WikiGame
Learning Objectives
- Understand how to build a more complex agent that implements dynamic decision-making
- Observe the failure modes of a more complex agent
The task in this section simulates the Wikipedia game, where the agent starts on one Wikipedia page and attempts to navigate to a goal page using only links found in the main content of Wikipedia pages. Compared to the previous sections, the main challenge here is to implement dynamic decision making while parsing noisy and imperfect outputs.
Quick intro to the Wikipedia API
Our agent will interact with Wikipedia by making tool calls to the Wikipedia API. We will only need to learn the following key functions from the Wikipedia API to be able to implement the basic dynamics of the game.
- wikipedia.page() - Returns a WikipediaPage object, which contains various attributes and methods to access page content. (See page docs for these attributes.)
- WikipediaPage.title - Returns the title of the page.
- WikipediaPage.content - Returns the full text content of the page (this can be very long, so take snippets where possible to avoid using up the context window of the LLM).
- WikipediaPage.summary - Returns a summary of the page (i.e. the introductory text of the page before the first section title).
- WikipediaPage.links - Returns a list of all links as strings.
Aside: Wikipedia API content can be weird!
The wikipedia API often outputs content in unintuitive ways. For example, articles that are essentially just a big list become near useless, since the content usually omits the list (for example, see the wikipedia API content for List of countries and dependencies by population). Another issue that you might encounter is that the API formats mathematical expressions in $\LaTeX$ quite poorly (for example, see the wikipedia API content for Kullback-Leibler divergence). This is why it's important to determine what content the wikipedia API produces when .content is called — and why you want to make sure you're testing a large diversity of wikipedia articles.
Aside: Wikipedia "summaries" can be long!
The wikipedia API builds the summary of a page from all the text before the first titled section. For certain (generally obscure) wikipedia pages, this summary can itself be extremely long, and contain far more detail than the model needs to understand what the page is about. We'll handle this later, when it comes up, by truncating wikipedia's summary to roughly the first 500 characters.
Run the following code to see how these wikipedia API functions work!
import wikipedia
from wikipedia import DisambiguationError, PageError, WikipediaPage

# Retrieve a Wikipedia page from its title
page = wikipedia.page("Large language model")

# Access basic page information
print("Title:", page.title)
print("\nURL", page.url)
print(f"\nSummary (word count {len(page.summary.split())}):", page.summary)
print(
    f"\nContent (word count {len(page.content.split())}):",
    page.content[:1000],
    "......",
)
print(f"""\nLinks (link count {len(page.links)}): [{", ".join(page.links[:7])}, ......]""")
Click to see the output of this code (the wikipedia page might have changed slightly)
Title: Large language model

URL https://en.wikipedia.org/wiki/Large_language_model

Summary (word count 95): A large language model (LLM) is a type of machine learning model designed for natural language processing tasks such as language generation. LLMs are language models with many parameters, and are trained with self-supervised learning on a vast amount of text. The largest and most capable LLMs are generative pretrained transformers (GPTs). Modern models can be fine-tuned for specific tasks or guided by prompt engineering. These models acquire predictive power regarding syntax, semantics, and ontologies inherent in human language corpora, but they also inherit inaccuracies and biases present in the data they are trained in.
Content (word count 6887): A large language model (LLM) is a type of machine learning model designed for natural language processing tasks such as language generation. LLMs are language models with many parameters, and are trained with self-supervised learning on a vast amount of text. The largest and most capable LLMs are generative pretrained transformers (GPTs). Modern models can be fine-tuned for specific tasks or guided by prompt engineering. These models acquire predictive power regarding syntax, semantics, and ontologies inherent in human language corpora, but they also inherit inaccuracies and biases present in the data they are trained in.
== History ==
Before 2017, there were a few language models that were large as compared to capacities then available. In the 1990s, the IBM alignment models pioneered statistical language modelling. A smoothed n-gram model in 2001 trained on 0.3 billion words achieved state-of-the-art perplexity at the time. In the 2000s, as Internet use became prevalent, some rese ......
Links (link count 524): [15.ai, AI-complete, AI explainability, API, Action selection, Activation function, Active learning (machine learning), ......]
The following two cell blocks cause an error when run (you should see a DisambiguationError for the first, and a PageError for the second). These are common errors that LLMs can encounter when moving between wikipedia pages, and so we'll need to find a way to handle them:
try:
    page = wikipedia.page("Python")
except DisambiguationError as e:
    print(type(e), "\n\n", e)
Click to see the output of this code
<class 'wikipedia.exceptions.DisambiguationError'>

"Python" may refer to:
Pythonidae
Python (genus)
Python (mythology)
Python (programming language)
CMU Common Lisp
PERQ 3
Python of Aenus
Python (painter)
Python of Byzantium
Python of Catana
Python Anghelo
Python (Efteling)
Python (Busch Gardens Tampa Bay)
Python (Coney Island, Cincinnati, Ohio)
Python (automobile maker)
Python (Ford prototype)
Python (missile)
Python (nuclear primary)
Colt Python
Python (codename)
Python (film)
Monty Python
Python (Monty) Pictures
Timon of Phlius
Pithon
Pyton

c:\Users\calsm\anaconda3\envs\arena\Lib\site-packages\wikipedia\wikipedia.py:389: GuessedAtParserWarning: No parser was explicitly specified, so I'm using the best available HTML parser for this system ("lxml"). This usually isn't a problem, but if you run this code on another system, or in a different virtual environment, it may use a different parser and behave differently.
The code that caused this warning is on line 389 of the file c:\Users\calsm\anaconda3\envs\arena\Lib\site-packages\wikipedia\wikipedia.py. To get rid of this warning, pass the additional argument 'features="lxml"' to the BeautifulSoup constructor.
lis = BeautifulSoup(html).find_all('li')
try:
    page = wikipedia.page("Animalss", auto_suggest=False)
except Exception as e:
    print(type(e), "\n\n", e)
Click to see the output of this code
<class 'wikipedia.exceptions.PageError'>

Page id "Animalss" does not match any pages. Try another id!
We can handle these errors using the following code:
# Fixes PageError by allowing redirects (auto_suggest also stays at its default of True here)
page = wikipedia.page("Animalss", redirect=True)
print(page.title)

# Fixes DisambiguationError by selecting the first option
try:
    page = wikipedia.page("Python")
except DisambiguationError as e:
    page = wikipedia.page(e.options[0])
print(page.title)
The errors above are:
- DisambiguationError: This was raised because the title "Python" can correspond to multiple pages. Whenever this error is raised, we get a list of options that Wikipedia suggests we could mean, and so we choose the first.
- PageError: This was raised for "Animalss" as there is no Wikipedia page with that title. We can usually avoid these by setting redirect = True and allowing Wikipedia to redirect us.
We have implemented a simple function get_page() for you to get the WikipediaPage object for a particular page title with error handling.
def get_page(title: str) -> WikipediaPage:
    """
    Get a Wikipedia page object given a title. If the title is ambiguous, choose the first option.
    If the title is not found, try to find a similar title.

    Args:
        title (str): The title of the Wikipedia page

    Returns:
        WikipediaPage: The Wikipedia page
    """
    try:
        return wikipedia.page(title, auto_suggest=False, redirect=True)
    except DisambiguationError as e:
        return wikipedia.page(e.options[0], auto_suggest=False, redirect=True)
    except PageError:
        return wikipedia.page(title, auto_suggest=True, redirect=True)
What do the kwargs redirect and auto_suggest in wikipedia.page() do?
redirect
- This kwarg enables redirecting when you reference an article title with slight differences to how it is stored in Wikipedia. For example, the Wikipedia API will generally access the correct page if there is a capitalization error on the first letter, but not for capitalization errors in the middle of the word if redirect = False:
# This returns a WikipediaPage object for the "Human" page
page = wikipedia.page("huMan", redirect=True, auto_suggest=False)
# This raises a PageError since there is no page called "huMan"
page = wikipedia.page("huMan", redirect=False, auto_suggest=False)
- By default, we should set redirect = True in the wikipedia.page() function.
auto_suggest
- This kwarg enables the API to provide suggestions. This allows a lot more than redirect does, since redirect is only for the "obvious" cases (e.g. "huMan" → "Human", "U.S. President" → "President of the United States", etc.). When auto_suggest is true, it would allow something like "president of states" → "President of the United States", "gogle" → "Google"; both of which would raise an error if redirect = True, auto_suggest = False.
- However, auto_suggest can sometimes be too permissive and lead to errors. For example, the below code will return a WikipediaPage object for the "Man" page. This is clearly not what we were trying to access, and the auto_suggest has gotten carried away in this case:
page = wikipedia.page("Human", redirect=False, auto_suggest=True)
- If redirect = True and auto_suggest=True, then auto_suggest takes priority.
- By default, we should set auto_suggest to False unless it is used as a last resort to resolve an error!
Exercise - Implement get_permitted_links()
```yaml
Difficulty: 🔴⚪⚪⚪⚪
Importance: 🔵🔵⚪⚪⚪

You should spend up to ~10 mins on this exercise.
```
This is a quick exercise to familiarize you with the Wikipedia API.
When you get the links from a page using page.links, this will include every possible Wikipedia link that is accessible from the HTML on that page, including those that are not in the main page content (e.g. links in sidebars, links in footnotes etc.), which are irrelevant or not permitted by the rules of the Wiki game.
Write a simple get_permitted_links() function. This should only return the links that can be found inside the main content. The resulting list of permitted links should be about a third as long as the list of links from page.links (although it varies slightly by page).
def get_permitted_links(current_page: WikipediaPage) -> list[str]:
    """
    Get "permitted" links (i.e. links that are in the content of the page) from a Wikipedia page.

    Args:
        current_page (WikipediaPage): The current Wikipedia page

    Returns:
        list[str]: A list of permitted links from current_page
    """
    raise NotImplementedError("You need to implement the get_permitted_links function")


tests.test_get_permitted_links(get_permitted_links)
Solution
def get_permitted_links(current_page: WikipediaPage) -> list[str]:
    """
    Get "permitted" links (i.e. links that are in the content of the page) from a Wikipedia page.

    Args:
        current_page (WikipediaPage): The current Wikipedia page

    Returns:
        list[str]: A list of permitted links from current_page
    """
    all_links = current_page.links
    content_lower = current_page.content.lower()
    permitted_links = [link for link in all_links if link.lower() in content_lower]
    if current_page.title in permitted_links:
        permitted_links.remove(current_page.title)
    return permitted_links
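To check the filtering logic without calling the Wikipedia API, the sketch below runs the same substring test on a stand-in page object. FakePage and its contents are made up for illustration; a real WikipediaPage would supply title, content and links from the API.

```python
from dataclasses import dataclass


@dataclass
class FakePage:
    """Hypothetical stand-in for WikipediaPage (not part of the wikipedia library)."""

    title: str
    content: str
    links: list[str]


def permitted_links(page: FakePage) -> list[str]:
    # Keep only links whose title appears (case-insensitively) in the main content
    content_lower = page.content.lower()
    permitted = [link for link in page.links if link.lower() in content_lower]
    # A page should not count as a link to itself
    if page.title in permitted:
        permitted.remove(page.title)
    return permitted


page = FakePage(
    title="Barack Obama",
    content="Barack Obama was born in Hawaii and later studied at Harvard Law School.",
    links=["Barack Obama", "Hawaii", "Harvard Law School", "Nobel Peace Prize"],
)
print(permitted_links(page))
```

Note that this is a plain substring test, so a short title such as "Art" would also count as permitted on any page whose text contains the word "article".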
LLM Agent for WikiGame

The WikiGame class
Below is the WikiGame class that instantiates the Wikipedia game. This class contains the following functionalities:
- Keeps track of task state variables (e.g. the current page, the page history)
- Task-specific helper functions for calling the Wikipedia API.
The implementation of this class has been provided for you, but you should read it and understand how its methods are implemented and what they do:
WikiGame initialises with 4 variables:
- starting_page, which gives the page that the agent should begin on.
- goal_page, which gives the page that the agent should aim to get to.
- current_page, which tracks the current page that the agent is on.
- page_history, which tracks all the pages the agent has visited in the game (initially only the starting_page).
It also comes with 5 methods:
- get_page() which takes the title of a Wikipedia page as a string, and returns the WikipediaPage object associated with this title (this is the same as the get_page() function we introduced earlier).
- get_page_summary() which returns a short summary of a page (the goal page by default), cut off at the last full stop within the first 500 characters.
- get_permitted_links() which gets the permitted links from WikiGame.current_page. You can replace this function with your solution to the get_permitted_links exercise earlier if you prefer.
- is_permitted_link() which takes a link name, and returns True if this link is a permitted link, and False otherwise.
- check_win() which returns self.current_page == self.goal_page, corresponding to whether the agent has won the game.
Providing information to the agent
While models are trained on most of the Wikipedia content, a particular page may still be confused with something else, or be an article that was added after the training cutoff. Models also generally can't recall information from their training data if it only came up once or twice (as is often the case for obscure wikipedia articles). So you should use the game's get_page_summary() function to provide details of the goal page to the agent in its initial message.
class WikiGame:
    def __init__(
        self,
        starting_page: str,
        goal_page: str,
    ):
        """
        This task simulates the Wikipedia game, where the agent starts on one Wikipedia page and
        attempts to navigate to a goal page using only links found in the main content of Wikipedia
        pages.

        Args:
            starting_page (str): The page the agent starts on.
            goal_page (str): The page the agent is trying to reach.

        Attributes:
            page_history (list[str]): The history of pages visited by the agent.
            starting_page (WikipediaPage): The starting page of the game.
            goal_page (WikipediaPage): The goal page of the game.
            current_page (WikipediaPage): The current page the agent is on.
        """
        self.page_history: list[str] = [starting_page]
        self.starting_page: WikipediaPage = self.get_page(starting_page)
        self.goal_page: WikipediaPage = self.get_page(goal_page)
        self.current_page: WikipediaPage = self.starting_page

    # ========================= Helper Functions (given) =========================

    # Get page and page summary
    @staticmethod
    def get_page(title: str) -> WikipediaPage:
        """
        Get a Wikipedia page object given a title. If the title is ambiguous, choose the first
        option. If the title is not found, try to find a similar title.

        Args:
            title (str): The title of the Wikipedia page

        Returns:
            WikipediaPage: The Wikipedia page
        """
        try:
            return wikipedia.page(title, auto_suggest=False, redirect=True)
        except DisambiguationError as e:
            return wikipedia.page(e.options[0], auto_suggest=False, redirect=True)
        except PageError:
            return wikipedia.page(title, auto_suggest=True, redirect=True)

    def get_page_summary(self, page: WikipediaPage | None = None) -> str:
        """
        Get summary of a wikipedia page, to the last full stop within the first 500 characters.
        This can be used to give a brief overview of a page to the agent.

        Args:
            page (WikipediaPage): The Wikipedia page object.

        Returns:
            str: The summary of the Wikipedia page.
        """
        page = page if page else self.goal_page
        summary = page.content[:500]
        last_period_index = summary.rfind(".")
        return summary[: last_period_index + 1] if last_period_index != -1 else summary

    # Get and check permitted links
    def get_permitted_links(self) -> list[str]:
        """
        Returns a list of permitted links (i.e. links in the main page content) for the current page.

        Returns:
            list[str]: The permitted links.
        """
        all_links = self.current_page.links
        content_lower = self.current_page.content.lower()
        permitted_links = [link for link in all_links if link.lower() in content_lower]
        if self.current_page.title in permitted_links:
            permitted_links.remove(self.current_page.title)
        return permitted_links

    def is_permitted_link(self, link: str) -> bool:
        """
        Returns True if the link is in the permitted links for the current page, False otherwise.

        Args:
            link (str): The link to check.

        Returns:
            bool: True if the link is permitted, False otherwise
        """
        return link.lower() in (x.lower() for x in self.get_permitted_links())

    # ========================= Task State Management (given) =========================

    def check_win(self) -> bool:
        return self.current_page == self.goal_page
Exercise - Build tools for the WikiGame
```yaml
Difficulty: 🔴🔴⚪⚪⚪
Importance: 🔵🔵🔵⚪⚪

You should spend up to 15-20 mins on this exercise.
```
The basic WikiAgent will need these two tools to play the game:
1. GetContentTool: This returns the full content of the current page, with all the wikipedia links wrapped in <link></link> tags (as otherwise they are presented as strings and indistinguishable from normal text). As implementing this involves dealing with annoying regex, we have done this for you, but you should fill in the description() property.
2. MovePageTool: This executes moving to a given new page when called and updates the WikiGame task state if successful. You should implement both the execute() function and the description() property.
When formatting this tool list, refer back to your code for the arithmetic game, or the OpenAI function-calling docs here.
Why not just use WikipediaPage.links to get a list of links directly?
We don't just present a list of the accessible links, as this is not very faithful to the wikipedia game. The agent does perform somewhat better if we just give it a list of links, but parsing the content of wikipedia pages and isolating the most important links is a big part of the challenge of the wikipedia game.
Caveat for the GetContentTool
The GetContentTool wraps all the text that corresponds to links in <link></link> tags. However, since we identify links in the text via their names on wikipedia pages, there are certain articles that will never (or only very rarely) get flagged as links. For example, the page "Python (programming language)" is almost never referenced by its title; instead, it's almost always referenced as just "Python". The same is true for cities and towns, which often have titles such as "Juneau, Alaska" but are almost always referred to as just "Juneau" in the articles where they appear. For this reason, you should avoid having goal pages which are likely to be referenced by a different string than their title.
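The caveat can be reproduced offline with a small sketch that applies the same substitution used by GetContentTool to a made-up sentence. The wrap_links helper, the sentence and the link list are assumptions for illustration only; the regex itself is copied from the tool.

```python
import re


def wrap_links(content: str, links: list[str]) -> str:
    """Hypothetical helper: wrap the first occurrence of each link title in <link> tags."""
    # Longest titles first, so a long title is tried before any shorter title it contains
    for word in sorted(links, key=len, reverse=True):
        content = re.sub(
            r"""(\s|[,.)!?;:'"])(""" + re.escape(word) + r""")(\s|[,.)!?;:'"s])""",
            r"\1<link>\2</link>\3",
            content,
            count=1,
            flags=re.IGNORECASE,
        )
    return content


text = "Guido van Rossum created Python in the Netherlands."
links = ["Netherlands", "Python (programming language)"]
wrapped = wrap_links(text, links)
print(wrapped)
```

Here "Netherlands" is tagged because the title appears verbatim, while the word "Python" stays untagged because the full title "Python (programming language)" never occurs in the text.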
class GetContentTool:
    """
    The GetContentTool retrieves the full content of the current Wikipedia page, marking all
    accessible links within the main content by wrapping them in <link></link> tags.

    This is an example of a tool that provides the agent with detailed page content to enable
    reasoning about possible next steps in the Wikipedia game.
    """

    name = "get_content"

    @staticmethod
    def execute(task: WikiGame) -> str:
        """
        Get all the content for the wikipedia page you are currently on. Anything which corresponds
        to a link is wrapped in <link></link> tags.

        Args:
            task (WikiGame): The current task object.

        Returns:
            str: The content of the page with links wrapped
        """
        content = task.current_page.content
        permitted_links = get_permitted_links(task.current_page)
        for word in sorted(permitted_links, key=len, reverse=True):
            content = re.sub(
                r"""(\s|[,.)!?;:'"])(""" + re.escape(word) + r""")(\s|[,.)!?;:'"s])""",
                r"\1<link>\2</link>\3",
                content,
                count=1,
                flags=re.IGNORECASE,
            )
        return content

    @property
    def description(self):
        """
        Provides the description of the get_content tool

        Returns:
            dict: The description of the tool for the API
        """
        raise NotImplementedError("You need to implement the description property for the GetContentTool")


class MovePageTool:
    """
    The MovePageTool allows the agent to navigate to a different Wikipedia page using a valid link
    found in the current page content.

    This is an example of a tool that modifies the task state dynamically based on inputs from the
    agent.
    """

    name = "move_page"

    @staticmethod
    def execute(new_page: str, task: WikiGame) -> str:
        """
        Changes your current page to a specified new page which is accessible via a link from the
        current page. You can only call this function once at a time, as it will take you to a
        different page.

        Args:
            task (WikiGame): The current task object.
            new_page (str): The title of the new page to move to.

        Returns:
            str: A message indicating the result of the move
        """
        raise NotImplementedError("You need to implement the execute method for the MovePageTool")

    @property
    def description(self):
        """
        Provides the description of the move_page tool

        Returns:
            dict: The description of the move_page tool for the API
        """
        raise NotImplementedError("You need to implement the description property for the MovePageTool")


tests.test_get_content_tool(GetContentTool)
tests.test_move_page_tool(MovePageTool)

GetContentTool_inst = GetContentTool()
MovePageTool_inst = MovePageTool()
wiki_game_tools = [GetContentTool_inst, MovePageTool_inst]
Solution
class GetContentTool:
    """
    The GetContentTool retrieves the full content of the current Wikipedia page, marking all
    accessible links within the main content by wrapping them in <link></link> tags.

    This is an example of a tool that provides the agent with detailed page content to enable
    reasoning about possible next steps in the Wikipedia game.
    """

    name = "get_content"

    @staticmethod
    def execute(task: WikiGame) -> str:
        """
        Get all the content for the wikipedia page you are currently on. Anything which corresponds
        to a link is wrapped in <link></link> tags.

        Args:
            task (WikiGame): The current task object.

        Returns:
            str: The content of the page with links wrapped
        """
        content = task.current_page.content
        permitted_links = get_permitted_links(task.current_page)
        for word in sorted(permitted_links, key=len, reverse=True):
            content = re.sub(
                r"""(\s|[,.)!?;:'"])(""" + re.escape(word) + r""")(\s|[,.)!?;:'"s])""",
                r"\1<link>\2</link>\3",
                content,
                count=1,
                flags=re.IGNORECASE,
            )
        return content

    @property
    def description(self):
        """
        Provides the description of the get_content tool

        Returns:
            dict: The description of the tool for the API
        """
        return {
            "type": "function",
            "function": {
                "name": self.name,
                "description": "Get all the content for the wikipedia page you are currently on. Anything which corresponds to a link you can select to move to will be wrapped in <link></link> tags.",
                "parameters": {
                    "type": "object",
                    "properties": {},
                    "required": [],
                },
            },
        }


class MovePageTool:
    """
    The MovePageTool allows the agent to navigate to a different Wikipedia page using a valid link
    found in the current page content.

    This is an example of a tool that modifies the task state dynamically based on inputs from the
    agent.
    """

    name = "move_page"

    @staticmethod
    def execute(new_page: str, task: WikiGame) -> str:
        """
        Changes your current page to a specified new page which is accessible via a link from the
        current page. You can only call this function once at a time, as it will take you to a
        different page.

        Args:
            task (WikiGame): The current task object.
            new_page (str): The title of the new page to move to.

        Returns:
            str: A message indicating the result of the move
        """
        new_page_normalized = new_page.replace("_", " ")
        if (
            task.is_permitted_link(new_page_normalized)
            and get_page(new_page_normalized).title != task.current_page.title
        ):
            task.current_page = task.get_page(new_page_normalized)
            return f"Moving page to {task.current_page.title}"
        else:
            return f"Couldn't move page to {new_page}. This is not a valid link."

    @property
    def description(self):
        """
        Provides the description of the move_page tool

        Returns:
            dict: The description of the move_page tool for the API
        """
        return {
            "type": "function",
            "function": {
                "name": self.name,
                "description": "Changes your current page to a specified new page which is accessible via a link from the current page. You can only call this function once at a time, as it will take you to a different page.",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "new_page": {
                            "type": "string",
                            "description": 'The title of the new page you want to move to. This should be formatted the way the title appears on wikipedia (e.g. to move to the wikipedia page for the United States of America, you should enter "United States"). Underscores are not necessary.',
                        }
                    },
                    "required": ["new_page"],
                },
            },
        }
Exercise - Build a WikiAgent
```yaml
Difficulty: 🔴🔴🔴🔴⚪
Importance: 🔵🔵🔵🔵🔵

You should spend up to 30-60 mins on this exercise.
```
We will now build a WikiAgent that can use these tools to solve the WikiGame. We want to build the agent so that it can be called via an agent loop using the run() method, similar to the one we had for the arithmetic game.
There are a few further considerations in this case that we didn't have for the arithmetic game.
Context window constraint
Since Wikipedia articles could be very long, the length of the LLM's context window becomes a constraint. GPT-4o and GPT-4o-mini both have context windows of 128k tokens (which corresponds to ~96k words). For reference, the wikipedia page for the United States has around 10k words alone and the agent will often need to visit more than 10 articles in one run of the game, not counting its own output, which can add up to be significant.
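As a rough back-of-the-envelope check of these figures, the sketch below works out how many long articles fit in one context window. It assumes every article is as long as the United States page and ignores prompts and the agent's own output, so it is only an order-of-magnitude estimate; the constant names are illustrative.

```python
# Rough context-budget arithmetic using the figures quoted above
CONTEXT_TOKENS = 128_000          # context window of GPT-4o / GPT-4o-mini
WORDS_PER_TOKEN = 0.75            # implied by 128k tokens ~ 96k words
WORDS_PER_LONG_ARTICLE = 10_000   # roughly the length of the United States page

context_words = CONTEXT_TOKENS * WORDS_PER_TOKEN
articles_that_fit = int(context_words // WORDS_PER_LONG_ARTICLE)

print(f"~{context_words:,.0f} words of context, about {articles_that_fit} long articles")
```

Under these assumptions the window holds about nine long articles, fewer than the ten or more pages a single game can involve.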
We'll solve this for now by simply resetting the messages of the agent every time it reaches a new wikipedia page, and providing an updated set of instructions, so the agent can locate itself in the game. Because the earlier messages are gone after each reset, be careful to include the current page and goal page in those instructions. We'll cover other methods for addressing this issue later (you can probably already think of some).
Since we'll reset the chat_history attribute of the agent class each time it reaches a new page, we'll also store a full_chat_history attribute that won't get reset, so we can access the entire run of the game.
Printing output
The WikiGame is a lot longer than the ArithmeticTask, with a much larger volume of agent, task and tool messages. If you don't want to see this output, you should use the verbose parameter to determine whether to print the output or not.
Methods of this class
We've implemented two methods for this class which aren't conceptually very important:
- handle_refusal() - this only gets called if the ChatCompletionMessage has a non-empty refusal field. It adds the refusal to the chat history, and prints that the model has refused the request (in practice this virtually never happens).
- reset_history() - this is useful for dealing with context window constraints as mentioned above. This method should be called each time the model moves to a new page, to clear its current chat history.
You'll need to implement the following methods:
- system_instruction - this should return the system message we give to the model as it attempts the task. It should tell the model what the WikiGame is, and only the basics of how to play it.
- on_page_instruction - this should return a user message telling the model the specific page it is on, and the page it should try to reach.
- next_step_instruction - this should return a user message that prompts the model to take its next action. This should be passed to the model after each time the model makes a tool call.
- update_history - this should update the WikiAgent.chat_history with a message or list of messages (recall, a message can be a dict if it's generated by us, or a ChatCompletionMessage if it's generated by the OpenAI API).
- start - This should load the initial system_instruction and on_page_instruction into the chat_history, so that the model can start the WikiGame with the necessary instructions to know what to do.
- handle_tool_calls - This function should be used whenever the model makes a tool call. You should use the execute_tool_calls function that WikiAgent inherits from SimpleAgent to execute the tool calls, then add the results to the chat history.
- run - This function should run "one loop" of the Wikipedia game. This should be making a call to the LLM API, and using the methods in the rest of the class to handle the response, whether the response is a tool call, or purely a text response.
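One possible way to picture the branching inside run is the sketch below, which decides which handler a response should go to. It is only an illustration: classify_response is a hypothetical helper, and the SimpleNamespace objects stand in for ChatCompletionMessage responses, exposing just the tool_calls, refusal and content fields.

```python
from types import SimpleNamespace


def classify_response(response) -> str:
    """Return the name of the method that should handle this response (hypothetical helper)."""
    if response.tool_calls:
        return "handle_tool_calls"
    if response.refusal:
        return "handle_refusal"
    # Plain text response: just record it in the history
    return "update_history"


# Stand-ins for ChatCompletionMessage objects, with only the fields used above
tool_call = SimpleNamespace(tool_calls=[{"name": "move_page"}], refusal=None, content=None)
refusal = SimpleNamespace(tool_calls=None, refusal="I can't help with that.", content=None)
plain = SimpleNamespace(tool_calls=None, refusal=None, content="I should read the page first.")

for response in (tool_call, refusal, plain):
    print(classify_response(response))
```

In a real agent each branch would call the corresponding method rather than return its name; the order of the checks here is a design choice, not a requirement.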
class WikiAgent(SimpleAgent):
    """
    WikiAgent is an LLM-based agent designed to navigate the Wikipedia game by integrating decision-
    making & tool use. It inherits from SimpleAgent, but should be modified to work effectively in
    the Wikipedia game context.

    Attributes:
        model (str): The model used for generating responses (inherited)
        tools (list[Any]): List of tools (inherited)
        client (OpenAI): OpenAI client for API calls (inherited)
        task (WikiGame): The current task being executed
        chat_history (list[dict]): History of interactions (inherited)

    Inherited Methods:
        get_response(use_tool: bool = True) -> ChatCompletionMessage:
            Get response from the model (inherited)
        execute_tool_calls(message: ChatCompletionMessage) -> list[str]:
            Execute tool calls from the model's response (inherited)

    Methods:
        handle_refusal(response: ChatCompletionMessage):
            Handle refusals from the model response (implemented below)
        reset_history():
            Empty self.chat_history of the agent (implemented below)
        system_instruction() -> dict:
            Generate instructions for the game. Formatted as a system prompt (to be implemented)
        on_page_instruction() -> dict:
            Tell the agent what page they are on, and what page they are trying to get to. Formatted
            as a user prompt (to be implemented)
        next_step_instruction() -> dict:
            Ask the agent "What's your next step?" after making a tool call. Formatted as a user
            prompt (to be implemented)
        handle_tool_calls(response: ChatCompletionMessage):
            Handle tool calls from the model response (to be implemented)
        start():
            Put the starting instructions in agent.chat_history when the agent starts a new page or
            starts the game (to be implemented)
        run(with_tool: bool = True) -> bool:
            Run one loop of the Wikipedia agent (to be implemented)
    """

    def __init__(
        self,
        task: WikiGame,
        tools: list[Any],
        model="gpt-4o-mini",
        chat_history: list[dict] | None = None,
        verbose: bool = True,
        temperature=1,
    ):
        super().__init__(model=model, tools=tools, task=task, temperature=temperature)
        self.chat_history = chat_history if chat_history else []
        # Copy the list so the two histories never share the same list object
        self.full_chat_history = list(chat_history) if chat_history else []
        # ^ All messages that have been sent in the chat history.
        # We have to erase chat_history each time a new page is reached for context window reasons.
        self.verbose = verbose
        self.start()

    def handle_refusal(self, response: ChatCompletionMessage):
        """
        Handles refusals in the wikipedia game context:

        Args:
            response (ChatCompletionMessage): The response from the model
        """
        self.update_history({"role": "assistant", "content": response.refusal})
        if self.verbose:
            print(f"\nMODEL REFUSAL: {response.refusal}")

    def reset_history(self):
        """
        Empty self.chat_history of the agent.
        """
        self.chat_history = []

    # ========================= Prompting (to implement) =========================

    @property
    def system_instruction(self) -> dict:
        """
        Generate the starting instructions for the game, formatted as a system prompt.

        Returns:
            dict: The starting instructions.
        """
        raise NotImplementedError("You need to implement the system_instruction property")

    @property
    def on_page_instruction(self) -> dict:
        """
        Tell the agent what page they are on and give a summary of the page, formatted as a user prompt.

        Returns:
            dict: The instructions for the current page.
        """
        raise NotImplementedError("You need to implement the on_page_instruction property")

    @property
    def next_step_instruction(self) -> dict:
        """
        Ask the agent "What's your next step?" after making a tool call, formatted as a user prompt.

        Returns:
            dict: The instructions for the next step.
        """
        raise NotImplementedError("You need to implement the next_step_instruction property")

    # ========================= Chat History Management (to implement) =========================

    def update_history(
        self,
        message: dict[str, str]
        | ChatCompletionMessage
        | list[dict[str, str] | ChatCompletionMessage],
    ):
        """
        Update self.chat_history and self.full_chat_history with a message or list of messages.

        Args:
            message: The message(s) to add to the chat history
        """
        raise NotImplementedError("You need to implement the update_history method")

    def start(self):
        """
        A function to put the starting instructions in agent.chat_history when the agent starts a
        new page or starts the game.
        """
        raise NotImplementedError("You need to implement the start method")

    # ========================= Task Execution (to implement) =========================

    def handle_tool_calls(self, response: ChatCompletionMessage):
        """
        Handles tool_calls in the wikipedia game context:
            - Adds the model response to the chat_history
            - Executes the tool calls using execute_tool_calls
            - Appends the tool responses to the chat_history
            - If the agent has moved to a new page:
                - Reset the chat_history, and call start()
            - Otherwise
                - Get the next_step_instruction and append it to chat_history

        Args:
            response (ChatCompletionMessage): The response from the model
        """
        raise NotImplementedError("You need to implement the handle_tool_calls method")

    def run(self):
        """
        This function runs the agent in the wikipedia game context. It:
            - Gets the current task instruction
            - Gets the response from the model
            - Handles the response in the cases:
                - tool calls (using handle_tool_calls)
                - refusals (using handle_refusal)
                - no tool calls (using update_history)
        """
        raise NotImplementedError("You need to implement the run method")


tests.test_wiki_agent(WikiAgent)
Solution
class WikiAgent(SimpleAgent):
    """
    WikiAgent is an LLM-based agent designed to navigate the Wikipedia game by integrating
    decision-making & tool use. It inherits from SimpleAgent, but should be modified to work
    effectively in the Wikipedia game context.

    Attributes:
        model (str): The model used for generating responses (inherited)
        tools (list[Any]): List of tools (inherited)
        client (OpenAI): OpenAI client for API calls (inherited)
        task (WikiGame): The current task being executed
        chat_history (list[dict]): History of interactions (inherited)

    Inherited Methods:
        get_response(use_tool: bool = True) -> ChatCompletionMessage:
            Get response from the model (inherited)
        execute_tool_calls(message: ChatCompletionMessage) -> list[str]:
            Execute tool calls from the model's response (inherited)

    Methods:
        handle_refusal(response: ChatCompletionMessage):
            Handle refusals from the model response (implemented below)
        reset_history():
            Empty self.chat_history of the agent (implemented below)
        system_instruction() -> dict:
            Generate instructions for the game. Formatted as a system prompt (to be implemented)
        on_page_instruction() -> dict:
            Tell the agent what page they are on, and what page they are trying to get to. Formatted
            as a user prompt (to be implemented)
        next_step_instruction() -> dict:
            Ask the agent "What's your next step?" after making a tool call. Formatted as a user
            prompt (to be implemented)
        handle_tool_calls(response: ChatCompletionMessage):
            Handle tool calls from the model response (to be implemented)
        start():
            Put the starting instructions in agent.chat_history when the agent starts a new page or
            starts the game (to be implemented)
        run():
            Run one loop of the Wikipedia agent (to be implemented)
    """

    def __init__(
        self,
        task: WikiGame,
        tools: list[Any],
        model="gpt-4o-mini",
        chat_history: list[dict] | None = None,
        verbose: bool = True,
        temperature=1,
    ):
        super().__init__(model=model, tools=tools, task=task, temperature=temperature)
        self.chat_history = chat_history if chat_history else []
        # All messages that have been sent in the chat history. We keep them separately because
        # chat_history is erased each time a new page is reached, for context window reasons.
        # Copy the list so the two histories never share (and double-append to) one list object.
        self.full_chat_history = list(self.chat_history)
        self.verbose = verbose
        self.start()

    def handle_refusal(self, response: ChatCompletionMessage):
        """
        Handles refusals in the Wikipedia game context by adding the refusal to the chat history.

        Args:
            response (ChatCompletionMessage): The response from the model
        """
        self.update_history({"role": "assistant", "content": response.refusal})
        if self.verbose:
            print(f"\nMODEL REFUSAL: {response.refusal}")

    def reset_history(self):
        """
        Empty self.chat_history of the agent.
        """
        self.chat_history = []

    # ========================= Prompting (to implement) =========================

    @property
    def system_instruction(self) -> dict:
        """
        Generate the starting instructions for the game, formatted as a system prompt.

        Returns:
            dict: The starting instructions.
        """
        return {
            "role": "system",
            "content": "You are a wikipedia-racing AI. Your aim is to reach the goal page by accessing links from a series of wikipedia pages.",
        }

    @property
    def on_page_instruction(self) -> dict:
        """
        Tell the agent what page they are on and what page they are trying to get to, formatted
        as a user prompt.

        Returns:
            dict: The instructions for the current page.
        """
        return {
            "role": "user",
            "content": f"You are currently on page: {self.task.current_page.title}. Your goal page is {self.task.goal_page.title}.",
        }

    @property
    def next_step_instruction(self) -> dict:
        """
        Ask the agent "What's your next step?" after making a tool call, formatted as a user prompt.

        Returns:
            dict: The instructions for the next step.
        """
        return {"role": "user", "content": "What's your next step?"}

    # ========================= Chat History Management (to implement) =========================

    def update_history(
        self,
        message: dict[str, str]
        | ChatCompletionMessage
        | list[dict[str, str] | ChatCompletionMessage],
    ):
        """
        Update self.chat_history and self.full_chat_history with a message or list of messages.

        Args:
            message: The message(s) to add to the chat history
        """
        if isinstance(message, list):
            self.chat_history.extend(message)
            self.full_chat_history.extend(message)
        else:
            self.chat_history.append(message)
            self.full_chat_history.append(message)

    def start(self):
        """
        A function to put the starting instructions in agent.chat_history when the agent starts a
        new page or starts the game.
        """
        instruction_messages = [
            self.system_instruction,
            self.on_page_instruction,
        ]
        self.update_history(instruction_messages)
        if self.verbose:
            print(
                f"\nSYSTEM: \n{instruction_messages[0]['content']} \n\nUSER: \n{instruction_messages[1]['content']}"
            )

    # ========================= Task Execution (to implement) =========================

    def handle_tool_calls(self, response: ChatCompletionMessage):
        """
        Handles tool_calls in the wikipedia game context:
            - Adds the model response to the chat_history
            - Executes the tool calls using execute_tool_calls
            - Appends the tool responses to the chat_history
            - If the agent has moved to a new page:
                - Reset the chat_history, and call start()
            - Otherwise
                - Append the next_step_instruction message to chat_history

        Args:
            response (ChatCompletionMessage): The response from the model
        """
        # Update history
        self.update_history(response)
        if self.verbose:
            print(f"\nAssistant: \n{response.content}")

        # Execute the tool calls
        tool_responses = self.execute_tool_calls(response)

        # Add tool calls and responses to the history
        for tool_call, tool_response in zip(response.tool_calls, tool_responses):
            self.update_history(apply_tool_call_format(tool_call, tool_response))
            if self.verbose:
                print(
                    f"\nTOOL CALL: \nTool = {tool_call.function.name}, Args = {tool_call.function.arguments} \nTOOL RESPONSE:\n {tool_response[:300]}"
                )

        # Move to new page if necessary
        if any("Moving page" in tool_response for tool_response in tool_responses):
            self.reset_history()
            self.task.page_history.append(self.task.current_page.title)
            if self.verbose:
                print(
                    f"""{("-" * 50)} \n\nMOVED PAGE \n\nPATH HISTORY (N={len(self.task.page_history)}): {" -> ".join(self.task.page_history)} \n\n{("-" * 50)}"""
                )
            # Give starting instructions if moved to a new page
            self.start()

        # Otherwise ask the agent what the next step is
        else:
            next_step_message = self.next_step_instruction
            self.update_history(next_step_message)
            if self.verbose:
                print(f"""\nUSER: \n{next_step_message["content"]}""")

    def run(self):
        """
        This function runs the agent in the wikipedia game context. It:
            - Gets the current task instruction
            - Gets the response from the model
            - Handles the response in the cases:
                - tool calls (using handle_tool_calls)
                - refusals (using handle_refusal)
                - no tool calls (using update_history)
        """
        # Get the response from the model
        response = self.get_response()

        # Handle the response
        ## If tool calls, handle_tool_calls
        if response.tool_calls:
            self.handle_tool_calls(response)

        ## If no tool call: Handle edge cases
        ### Check if there's a refusal to answer:
        elif response.refusal:
            self.handle_refusal(response)

        # Else response content does not contain tool calls or refusal, and we add it to the
        # chat_history in an assistant format.
        else:
            self.update_history({"role": "assistant", "content": response.content})
            if self.verbose:
                print(f"\nMODEL RESPONSE: \n{response.content}")
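The split between `chat_history` (cleared on every page move) and `full_chat_history` (never cleared) can be checked in isolation. The following is a minimal sketch rather than the course implementation: `HistoryTracker` is a hypothetical stand-in that mirrors only `update_history` and `reset_history`, and the messages are made-up plain dicts.

```python
from typing import Optional, Union


class HistoryTracker:
    """Hypothetical stand-in mirroring WikiAgent's two-history bookkeeping."""

    def __init__(self, chat_history: Optional[list[dict]] = None):
        self.chat_history = chat_history if chat_history else []
        # Separate copy, so appending to one list never also appends to the other
        self.full_chat_history = list(self.chat_history)

    def update_history(self, message: Union[dict, list[dict]]) -> None:
        messages = message if isinstance(message, list) else [message]
        self.chat_history.extend(messages)
        self.full_chat_history.extend(messages)

    def reset_history(self) -> None:
        # Rebinds chat_history only; full_chat_history keeps everything
        self.chat_history = []


tracker = HistoryTracker()
tracker.update_history(
    [
        {"role": "system", "content": "You are a wikipedia-racing AI."},
        {"role": "user", "content": "You are currently on page: A."},
    ]
)
tracker.update_history({"role": "assistant", "content": "Moving page to B."})
tracker.reset_history()  # simulate arriving on a new page
tracker.update_history({"role": "user", "content": "You are currently on page: B."})

print(len(tracker.chat_history))  # what the model would see on its next call
print(len(tracker.full_chat_history))  # everything sent so far
```

After the reset, the short history holds only the new page instruction, while the full history still holds all four messages.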
Exercise - Run the task
```yaml
Difficulty: 🔴🔴⚪⚪⚪
Importance: 🔵🔵⚪⚪⚪

You should spend up to 10-15 mins on this exercise.
```
Similar to the ArithmeticAgent, write an agent loop for the WikiAgent. You may want to use a try/except block in this loop, as an error is occasionally raised when the length of the messages exceeds the context window of gpt-4o-mini.
def agent_loop(agent, num_loops=10):
    """
    Run the agent loop for a given number of loops

    Args:
        agent (WikiAgent): The agent to run (the game is available as agent.task)
        num_loops (int): The number of loops to run
    """
    raise NotImplementedError("You need to implement the agent_loop function")
Solution
def agent_loop(agent, num_loops=10):
    """
    Run the agent loop for a given number of loops

    Args:
        agent (WikiAgent): The agent to run (the game is available as agent.task)
        num_loops (int): The number of loops to run
    """
    for _ in range(num_loops):
        # Stop as soon as the agent has reached the goal page
        if agent.task.check_win():
            print("Success!")
            return
        try:
            agent.run()
        except Exception as e:
            # For example, the messages no longer fit in the model's context window
            print(f"Error: {e}")
            break
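Note that the loop above checks for a win only at the top of each iteration, so a win reached on the very last step is not reported. A possible variant, sketched here under assumptions, returns a boolean and performs one final check after the loop. `StubTask` and `StubAgent` are hypothetical stand-ins used only so the example runs without API calls; they expose the same `task`, `check_win()` and `run()` interface as the real classes.

```python
class StubTask:
    """Hypothetical task: the game is won once `steps_to_win` steps have been taken."""

    def __init__(self, steps_to_win: int):
        self.steps_to_win = steps_to_win
        self.steps_taken = 0

    def check_win(self) -> bool:
        return self.steps_taken >= self.steps_to_win


class StubAgent:
    """Hypothetical agent with the same `task` / `run()` interface as WikiAgent."""

    def __init__(self, task: StubTask):
        self.task = task

    def run(self) -> None:
        self.task.steps_taken += 1


def agent_loop_with_result(agent, num_loops: int = 10) -> bool:
    """Run up to num_loops steps; return True if the game was won, False otherwise."""
    for _ in range(num_loops):
        if agent.task.check_win():
            return True
        try:
            agent.run()
        except Exception as e:  # e.g. a context-window error from the API
            print(f"Error: {e}")
            return False
    # Final check, so that a win on the last permitted step still counts
    return agent.task.check_win()


won_on_last_step = agent_loop_with_result(StubAgent(StubTask(steps_to_win=3)), num_loops=3)
ran_out_of_steps = agent_loop_with_result(StubAgent(StubTask(steps_to_win=5)), num_loops=3)
print(won_on_last_step, ran_out_of_steps)
```

Returning a boolean also makes it easier to tally success rates when the same game is run several times.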
Your agent should be able to accomplish the following tasks. If the agent fails on the first try, run it again: we've tried to cut down on random behaviour by setting the temperature to 0, but OpenAI's models retain some randomness at temperature 0, and this compounds as the agent proceeds through the task.
game_1 = WikiGame("Elizabeth I", "United States")
agent = WikiAgent(task=game_1, tools=wiki_game_tools, model="gpt-4o-mini", temperature=0)
agent_loop(agent, 30)
game_2 = WikiGame("County Seat", "Saint Pierre and Miquelon")
agent = WikiAgent(task=game_2, tools=wiki_game_tools, model="gpt-4o-mini", temperature=0)
agent_loop(agent, 30)
Once you've seen that the agent can accomplish the above, try out some different articles and spot the common failure modes the agent falls into. Try to think of and discuss some ways you could mitigate these failure modes.
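As one illustration of a mitigation, suppose the agent tends to wander back to pages it has already visited (an assumed failure mode for the sake of the example, not a result reported here). A minimal sketch of detecting this from a path like `task.page_history` follows; `find_revisited_pages` and `loop_warning` are hypothetical helpers, and the example path is made up.

```python
from collections import Counter
from typing import Optional


def find_revisited_pages(page_history: list[str]) -> dict[str, int]:
    """Return each page that appears more than once, mapped to its visit count."""
    counts = Counter(page_history)
    return {page: n for page, n in counts.items() if n > 1}


def loop_warning(page_history: list[str]) -> Optional[str]:
    """Build a short warning that could be appended to the user prompt, or None."""
    revisited = find_revisited_pages(page_history)
    if not revisited:
        return None
    pages = ", ".join(sorted(revisited))
    return f"You have already visited these pages more than once: {pages}. Try a different link."


# Made-up path in which the agent returns to a page it has already seen
example_path = ["Elizabeth I", "England", "United Kingdom", "England"]
print(find_revisited_pages(example_path))
print(loop_warning(example_path))
```

Whether such a warning actually helps is an empirical question; it is worth comparing runs with and without it on the same start and goal pages.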
We should also ensure that the messages printed while the agent runs are faithful to the actual chat history. It is easy to make minor mistakes, either in the run() logic or spread across the other methods we're using, that corrupt the agent's chat_history and so change what the agent sees. To check this, you can run the following code to print the full_chat_history of the agent, which should contain every message the agent encountered as it worked through the task.
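for message in agent.full_chat_history:
    try:
        if message["role"] == "tool" and message["name"] == "get_content":
            # Page content can be very long, so only print the first 200 characters
            print(f"{message['role']}:\n {message['content'][:200]} ...")
        else:
            print(f"{message['role']}:\n {message['content']}")
    except TypeError:
        # ChatCompletionMessage objects are not subscriptable, so use attribute access instead
        print(f"{message.role}: {message.content}")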
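Because the history mixes plain dicts with message objects, it can be convenient to normalise each entry before printing. The following is a sketch under assumptions: `role_and_content` and `format_message` are hypothetical helpers, and `SimpleNamespace` stands in for a `ChatCompletionMessage` so the example runs without calling the API.

```python
from types import SimpleNamespace
from typing import Any


def role_and_content(message: Any) -> tuple[str, str]:
    """Return (role, content) for a dict message or an object with those attributes."""
    if isinstance(message, dict):
        return message["role"], message.get("content") or ""
    # Assistant messages that only contain tool calls may have content set to None
    return message.role, message.content or ""


def format_message(message: Any, max_tool_chars: int = 200) -> str:
    """Format one message for printing, truncating long tool outputs."""
    role, content = role_and_content(message)
    if role == "tool" and len(content) > max_tool_chars:
        content = content[:max_tool_chars] + " ..."
    return f"{role}:\n {content}"


# Made-up history mixing both message shapes
history = [
    {"role": "user", "content": "You are currently on page: A."},
    SimpleNamespace(role="assistant", content=None),
    {"role": "tool", "name": "get_content", "content": "x" * 500},
]
for message in history:
    print(format_message(message))
```

Unlike the loop above, this truncates every long tool response rather than only those from get_content, which is a design choice you may or may not want.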