
4️⃣ Elicitation

Learning Objectives
  • Understand the different methods of elicitation
  • Understand how to improve prompting, tools, history storage, and information access in LLM agents

You may have observed that while our initial implementation of WikiAgent succeeds at these relatively challenging games, if we increase the difficulty slightly, the agent will fail (one example is the game Joinery → Amethyst, which our agent will usually fail at). However, this doesn't mean that GPT-4o-mini lacks the capability to perform better on this task; the capability might be blocked because we:

  • Prompted the model poorly or ineffectively.
  • Stored and presented the task history poorly.
  • Didn't give the model sufficient tools to accomplish the task.

In general, it is hard to show that a model does not have a certain capability, even if we've failed to demonstrate this capability. For example, it took 3.5 years after the release of GPT-2 (and 1.5 years after the release of GPT-3) for people to discover that chain-of-thought reasoning massively improves model performance, which enabled the same models to complete significantly harder tasks. Dangerous capability evaluations for LLM agents require us to elicit the best capabilities possible, until we feel we've managed to gain evidence of absence, not just absence of evidence.

Broadly speaking, there are two categories of elicitation:

  1. Narrow elicitation: Task-specific methods that improve model performance on a particular task or small class of tasks, but likely won't impact model performance in general across many tasks.
    • E.g. A tool that gives the model access to the content of arbitrary wikipedia articles. This will improve performance on this task significantly, but wouldn't generalize to other tasks.
  2. General elicitation: Task-agnostic methods that improve model performance on a wide array of possible tasks.
    • E.g. Chain-of-thought prompting: This tends to improve model performance on a wide array of tasks. These sorts of elicitation methods are the ones we're most interested in. If researchers find an improvement to models that is roughly as easy and effective as chain-of-thought prompting, then we would see a very rapid increase in risk from AI.

The elicitation methods we'll try in this section will mostly revolve around prompting to obtain better performance, including chain-of-thought prompting and the ReAct framework, as well as some more exotic methods, like a lookahead tool.

Tip - How to find wikipedia pages to test on

You might start having a hard time coming up with wikipedia pages to test on. Luckily, there are websites which generate random pages for this purpose; one good option is https://wikispeedruns.com/ (you may want to change the "Random Article Generator Settings" to sample from the most popular 100,000 wikipedia pages, as the default setting of 3,000 will generally generate paths that are too easy for our agent). We've also provided you with a list of 18 wikipedia page pairs, stored as wiki_pairs. These are ordered approximately by increasing difficulty.

To test whether two pages are connected via links, use this free online tool to see the possible paths between pages: https://www.sixdegreesofwikipedia.com/ (be somewhat careful with this though, as the paths that this website believes are accessible may not be accessible to our agent).

In this section, we'll pin the model to gpt-4o-mini-2024-07-18 when gauging whether our elicitation methods are effective, since OpenAI occasionally releases small updates to gpt-4o-mini which change its behaviour. However, if you're curious, you can try testing your elicitation methods on the newest gpt-4o-mini model. You'll most likely notice that your elicitation methods improve the model significantly less, and that the model performs much better at the task without needing as much elicitation. This is because OpenAI's most recent models are generally more capable, and so saturate the evaluation of "how well can a model play the Wikipedia game?" faster. For a real agent evaluation, you'd want increasingly difficult tasks, so that even as models improve, we can find tasks that are too difficult for them to achieve (e.g. the 16-hour tasks in METR's Measuring AI Ability to Complete Long Tasks).

os.environ["INSPECT_EVAL_MODEL"] = "openai/gpt-4o-mini-2024-07-18"

As you should already know, prompting can have a large impact on model performance. There are many changes you could make to the prompts in this task. You should first experiment with more general elicitation methods, such as getting the agent to think more deeply and output plans in different ways. After this, you might try more narrow elicitation methods, such as:

  • Telling the agent how many pages it's visited.
  • Telling the agent if it's already visited the page it's on (and how many times).
  • Schedule different prompts and planning methods for the "zoom out" and "zoom in" sections of the game, since we know that a good general strategy for playing the wikipedia game is:

Narrow article (with few links) -> General article (with many links) -> Narrow article (with few links)

Exercise - Engineer prompts

Difficulty: 🔴🔴⚪⚪⚪
Importance: 🔵🔵🔵⚪⚪
You should spend up to 20-35 mins on this exercise.

Try to design prompts that improve the performance of the wikipedia agent. You may have to do a decent amount of experimentation here. Remember that your prompts will have to be robust to:

  • Different tasks within the wikipedia game,
  • Different states within those tasks,
  • Different failure modes the agent could encounter.

The rest of your agent should be defined the same way.

See if you can significantly improve performance. There's a test task below that you should aim to be able to solve with improved prompting. You'll need to modify the system_instruction, on_page_instruction and next_step_instruction, and don't forget to also modify the instruction_refresh() function, otherwise your prompts will revert to their old version every time the agent visits a new page.

@agent
def WikiAgentPrompting(tools: list[Tool], game: WikiGame) -> Agent:
    system_instruction = None
    on_page_instruction = None
    next_step_instruction = None
    raise NotImplementedError("You need to implement the prompts for the WikiAgentPrompting")

    async def _reset_history(state: AgentState):
        state.messages = []
        state = await _start(state)
        return state

    async def instruction_refresh() -> None:
        nonlocal system_instruction, on_page_instruction, next_step_instruction
        raise NotImplementedError("You need to reimplement the instruction_refresh function")

    async def _start(state: AgentState) -> AgentState:
        raise NotImplementedError("You need to implement the _start function (copy your solution from before)")

    async def _handle_tool_calls(state: AgentState) -> AgentState:
        raise NotImplementedError("You need to implement the _handle_tool_calls function (copy your solution from before)")

    async def execute(state: AgentState) -> AgentState:
        raise NotImplementedError("You need to implement the execute function (copy your solution from before)")

    return execute
Solution

This isn't a perfect solution, but is an example of improved prompting compared to that in the original WikiGame class solution code. You may be able to do even better!

@agent
def WikiAgentPrompting(tools: list[Tool], game: WikiGame) -> Agent:
    system_instruction = ChatMessageSystem(
        content=f"You are a wikipedia-racing AI. Your goal is to reach {game.goal_page.title} by accessing links from wikipedia pages. Your current page is {game.current_page.title}."
    )

    on_page_instruction = ChatMessageUser(
        content=f"""You are currently on page: {game.current_page.title}. Make sure you start by reasoning about what steps you should take to get to the article on {game.goal_page.title}. When coming up with a strategy, make sure to pay attention to the path you have already taken, and if your current strategy doesn't seem to be working out, try something else. In case you're unsure, {game.goal_page.title} has the following summary:\n\n[Begin Summary]\n{game.get_page_summary(game.goal_page)}\n[End Summary]\n\nThe path you have taken so far is {" -> ".join(game.page_history)}.
            """
    )

    next_step_instruction = ChatMessageUser(
        content=f"""What's your next step to reach {game.goal_page.title}? Make sure to think carefully about what steps you should take to get there."""
    )

    async def _reset_history(state: AgentState):
        state.messages = []
        state = await _start(state)
        return state

    async def instruction_refresh() -> None:
        nonlocal system_instruction, on_page_instruction, next_step_instruction
        system_instruction = ChatMessageSystem(
            content=f"You are a wikipedia-racing AI. Your goal is to reach {game.goal_page.title} by accessing links from wikipedia pages. Your current page is {game.current_page.title}."
        )

        on_page_instruction = ChatMessageUser(
            content=f"""You are currently on page: {game.current_page.title}. Make sure you start by reasoning about what steps you should take to get to the article on {game.goal_page.title}. When coming up with a strategy, make sure to pay attention to the path you have already taken, and if your current strategy doesn't seem to be working out, try something else. In case you're unsure, {game.goal_page.title} has the following summary:\n\n[Begin Summary]\n{game.get_page_summary(game.goal_page)}\n[End Summary]\n\nThe path you have taken so far is {" -> ".join(game.page_history)}.
                """
        )

        next_step_instruction = ChatMessageUser(
            content=f"""What's your next step to reach {game.goal_page.title}? Make sure to think carefully about what steps you should take to get there."""
        )

    async def _start(state: AgentState) -> AgentState:
        state.messages.append(system_instruction)
        state.messages.append(on_page_instruction)
        return state

    async def _handle_tool_calls(state: AgentState) -> AgentState:
        messages, state.output = await execute_tools(messages=state.messages, tools=tools)
        state.messages.extend(messages)
        if state.output.message.tool_calls[0].function == "MovePageTool" and "success" in messages[-1].content.lower():
            await instruction_refresh()
            state = await _reset_history(state)
        return state

    async def execute(state: AgentState) -> AgentState:
        success = False
        state = await _start(state)
        while not success:
            state.messages.append(next_step_instruction)
            state.output = await get_model().generate(
                input=state.messages,
                tools=tools,
            )
            state.messages.append(state.output.message)
            if state.output.message.tool_calls:
                state = await _handle_tool_calls(state)
            if game.check_win():
                success = True
        return state

    return execute

As you might have noticed, LLM agents can be quite random, both because the default temperature is 1 and because agents operate over a much longer horizon than is typical for LLMs. So we'll do our testing at temperature = 0. This impacts performance, but better elicitation methods still make a noticeable difference.

Your original WikiAgent may not reliably be able to solve the example path Mandate of Heaven -> Doric Greek at temperature 0 (although it may occasionally get lucky). However, with sufficiently improved prompting, you should be able to get the agent to solve this task reliably.

# Run the game with the original WikiAgent
game = WikiGame("Mandate of Heaven", "Doric Greek")
tool_list = [GetContentTool(game), MovePageTool(game)]

@task
def wiki_task() -> Task:
    return Task(dataset=[Sample(input="", target="")], message_limit=80)

eval(
    solver=as_solver(WikiAgent(tools=tool_list, game=game)),
    tasks=wiki_task(),
)

# Run the game with WikiAgentPrompting
game = WikiGame("Mandate of Heaven", "Doric Greek")
tool_list = [GetContentTool(game), MovePageTool(game)]

@task
def wiki_task() -> Task:
    return Task(dataset=[Sample(input="", target="")], message_limit=80)

eval(
    solver=as_solver(WikiAgentPrompting(tools=tool_list, game=game)),
    tasks=wiki_task(),
)

Exercise - Implement the ReAct framework

Difficulty: 🔴🔴⚪⚪⚪
Importance: 🔵🔵🔵⚪⚪
You should spend up to 15-20 mins on this exercise.

The ReAct framework is an extension of chain-of-thought reasoning. Instead of prompting the model to think step-by-step, it separates this into two steps, designed especially for agent-based tasks:

  • Reasoning: The model is asked to reason about its current situation and what sort of actions it should consider taking.
  • Action: Then, the model is asked to perform an action based on the reasoning it has output.

We'll need to write new prompts to indicate that the model should think during the reasoning step, and that the model should act during the action step. During the reasoning step, we don't want the model to make any tool calls, so we can use the tool_choice argument of generate() -- by default this is set to "auto", but since we want no tools to be called, we should set tool_choice="none".
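
To make the shape of this loop concrete, here's a minimal sketch using plain strings and a stand-in `model` function (both hypothetical simplifications; the real implementation uses inspect_ai message objects and `get_model().generate()`):

```python
def react_step(history: list[str], model) -> list[str]:
    """One reason-then-act cycle. `model(messages, tool_choice)` stands in for
    get_model().generate(); tool_choice="none" forbids tool calls, so the
    reasoning turn is pure text."""
    history = history + ["Think carefully about your next step."]
    history.append(model(history, tool_choice="none"))  # reasoning turn: no tools
    history.append("Now, based on your reasoning, take an action.")
    history.append(model(history, tool_choice="auto"))  # action turn: tools allowed
    return history
```

The only structural difference from the previous agent is that each iteration makes two model calls, with different `tool_choice` settings.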

Here you'll need to implement the generate_reason and generate_action functions, and reimplement execute() to work in the ReAct framework. Otherwise you should copy your solutions from before.

@agent
def WikiAgentReAct(tools: list[Tool], game: WikiGame) -> Agent:
    system_instruction = None
    on_page_instruction = None
    raise NotImplementedError("You need to implement the prompts for the WikiAgentReAct (copy your solutions from before). You likely won't need the next_step_instruction this time.")

    async def _reset_history(state: AgentState):
        state.messages = []
        state = await _start(state)
        return state

    async def instruction_refresh() -> None:
        nonlocal system_instruction, on_page_instruction
        raise NotImplementedError("You need to reimplement the instruction_refresh function (copy your solution from before)")

    async def generate_reason(state: AgentState) -> AgentState:
        raise NotImplementedError("You need to implement the generate_reason function")

    async def generate_action(state: AgentState) -> AgentState:
        raise NotImplementedError("You need to implement the generate_action function")

    async def _start(state: AgentState) -> AgentState:
        raise NotImplementedError("You need to implement the _start function (copy your solution from before)")

    async def _handle_tool_calls(state: AgentState) -> AgentState:
        raise NotImplementedError("You need to implement the _handle_tool_calls function (copy your solution from before)")

    async def execute(state: AgentState) -> AgentState:
        raise NotImplementedError("You need to reimplement the execute function")

    return execute
Solution
@agent
def WikiAgentReAct(tools: list[Tool], game: WikiGame) -> Agent:
    system_instruction = ChatMessageSystem(
        content=f"You are a wikipedia-racing AI. Your goal is to reach {game.goal_page.title} by accessing links from wikipedia pages. Your current page is {game.current_page.title}."
    )

    on_page_instruction = ChatMessageUser(
        content=f"""You are currently on page: {game.current_page.title}. Make sure you start by reasoning about what steps you should take to get to the article on {game.goal_page.title}. When coming up with a strategy, make sure to pay attention to the path you have already taken, and if your current strategy doesn't seem to be working out, try something else. In case you're unsure, {game.goal_page.title} has the following summary:\n\n[Begin Summary]\n{game.get_page_summary(game.goal_page)}\n[End Summary]\n\nThe path you have taken so far is {" -> ".join(game.page_history)}.
            """
    )

    async def _reset_history(state: AgentState):
        state.messages = []
        state = await _start(state)
        return state

    async def instruction_refresh() -> None:
        nonlocal system_instruction, on_page_instruction
        system_instruction = ChatMessageSystem(
            content=f"You are a wikipedia-racing AI. Your goal is to reach {game.goal_page.title} by accessing links from wikipedia pages. Your current page is {game.current_page.title}."
        )

        on_page_instruction = ChatMessageUser(
            content=f"""You are currently on page: {game.current_page.title}. Make sure you start by reasoning about what steps you should take to get to the article on {game.goal_page.title}. When coming up with a strategy, make sure to pay attention to the path you have already taken, and if your current strategy doesn't seem to be working out, try something else. In case you're unsure, {game.goal_page.title} has the following summary:\n\n[Begin Summary]\n{game.get_page_summary(game.goal_page)}\n[End Summary]\n\nThe path you have taken so far is {" -> ".join(game.page_history)}.
                """
        )

    async def generate_reason(state: AgentState) -> AgentState:
        state.messages.append(
            ChatMessageUser(
                content=f"""Before you decide on your next step, think carefully about what steps you should take to get to {game.goal_page.title}. When coming up with a strategy, make sure to pay attention to the path you have already taken, and if your current strategy doesn't seem to be working out, try something else."""
            )
        )
        state.output = await get_model().generate(input=state.messages, tools=tools, tool_choice="none")
        state.messages.append(state.output.message)
        return state

    async def generate_action(state: AgentState) -> AgentState:
        state.messages.append(
            ChatMessageUser(
                content=f"""Now based on your reasoning above, what action will you take to reach {game.goal_page.title}?"""
            )
        )
        state.output = await get_model().generate(input=state.messages, tools=tools, tool_choice="auto")
        state.messages.append(state.output.message)
        return state

    async def _start(state: AgentState) -> AgentState:
        state.messages.append(system_instruction)
        state.messages.append(on_page_instruction)
        return state

    async def _handle_tool_calls(state: AgentState) -> AgentState:
        messages, state.output = await execute_tools(messages=state.messages, tools=tools)
        state.messages.extend(messages)
        if state.output.message.tool_calls[0].function == "MovePageTool":
            await instruction_refresh()
            state = await _reset_history(state)
        return state

    async def execute(state: AgentState) -> AgentState:
        success = False
        state = await _start(state)
        while not success:
            state = await generate_reason(state)
            state = await generate_action(state)
            if state.output.message.tool_calls:
                state = await _handle_tool_calls(state)
            if game.check_win():
                success = True
        return state

    return execute

Now run your Wikipedia ReAct agent. We haven't provided tests to check that the agent works, since your precise implementation may deviate from ours; but by running the agent and checking its chat_history, you should be able to tell whether your ReAct framework is operating correctly. You should notice an improved reasoning process each time the model runs, and might notice that on some paths the model performs more effectively (although this is hard to demonstrate conclusively).

However, you'll also likely notice that the difference between effective prompting and the ReAct method we've implemented here isn't massive. ReAct is similar to chain-of-thought prompting in this case, and prompting can only make a difference up to a point. However, ReAct does tend to make the agent more reliable at higher temperatures (although this is impossible to identify from just a single run).

# Run the game with WikiAgentPrompting
game = WikiGame("Balto-Slavic languages", "Netscape Navigator 9")
tool_list = [GetContentTool(game), MovePageTool(game)]


@task
def wiki_task() -> Task:
    return Task(dataset=[Sample(input="", target="")], message_limit=80)


eval(
    solver=as_solver(WikiAgentPrompting(tools=tool_list, game=game)),
    tasks=wiki_task(),
)
# Run the game with WikiAgentReAct
game = WikiGame("Balto-Slavic languages", "Netscape Navigator 9")
tool_list = [GetContentTool(game), MovePageTool(game)]


@task
def wiki_task() -> Task:
    return Task(dataset=[Sample(input="", target="")], message_limit=80)


eval(
    solver=as_solver(WikiAgentReAct(tools=tool_list, game=game)),
    tasks=wiki_task(),
)

Exercise - Let the LLM see its entire chat history

Difficulty: 🔴🔴⚪⚪⚪
Importance: 🔵🔵⚪⚪⚪
You should spend up to 10-15 mins on this exercise.

You may have noticed that the agent performs significantly worse because we reset the chat history every time the agent encounters a new page. For example, it will occasionally come up with good plans and then fail to follow through on them, since its in-context memory has been erased. We can fix this by letting the agent see its entire chat history.

The main obstacle to letting the agent see its entire history is the capacity of its context window, specifically the length of the wikipedia articles that the agent has to retrieve in order to play the game. We can address this by resetting only the outputs of the GetContentTool each time the agent moves to a new page, instead of resetting the entire chat history.

When we reset this content, we should still let the agent know that Wikipedia content was output in that location, as otherwise the agent may be confused about whether the get_content tool worked. You should replace the content with something like "Wikipedia content was output here. Wikipedia page: {page_title}", so that the agent knows that the get_content tool works as expected.

Modify the _reset_history function in the WikiAgentHistory function below to accomplish this. You may also need to modify the _handle_tool_calls function to keep track of the previous page.
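
The scrubbing logic itself is simple. Here's a sketch using plain dicts in place of inspect_ai's ChatMessageTool objects (a simplification: the real messages carry more fields, and the exact placeholder wording is up to you, as long as the agent can tell the tool ran):

```python
def scrub_page_content(messages: list[dict], previous_page: str) -> list[dict]:
    """Replace verbose GetContentTool outputs with a short placeholder, keeping
    the rest of the history intact so the agent retains its past reasoning."""
    placeholder = f"Wikipedia content was output here. Wikipedia page: {previous_page}"
    for message in messages:
        # Only scrub tool outputs from the content tool, and skip messages
        # that were already scrubbed on a previous move
        if message.get("function") == "GetContentTool" and not message["content"].startswith(
            "Wikipedia content was output here"
        ):
            message["content"] = placeholder
    return messages
```

Because the scrub runs after every move, only the most recently fetched page content is ever unscrubbed, so labelling it with the previous page's title is safe.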

@agent
def WikiAgentHistory(tools: list[Tool], game: WikiGame, verbose: bool = True):
    system_instruction = None
    on_page_instruction = None
    raise NotImplementedError("You need to implement the prompts for the WikiAgentHistory (copy your solutions from before).")

    async def instruction_refresh() -> None:
        nonlocal system_instruction, on_page_instruction
        raise NotImplementedError("You need to reimplement the instruction_refresh function (copy your solution from before)")

    async def _reset_history(state: AgentState, previous_page: str) -> AgentState:
        raise NotImplementedError("You need to implement the new _reset_history function")

    async def generate_reason(state: AgentState) -> AgentState:
        raise NotImplementedError("You need to implement the generate_reason function")

    async def generate_action(state: AgentState) -> AgentState:
        raise NotImplementedError("You need to implement the generate_action function")

    async def _start(state: AgentState) -> AgentState:
        raise NotImplementedError("You need to implement the _start function (copy your solution from before)")

    async def _handle_tool_calls(state: AgentState) -> AgentState:
        raise NotImplementedError("You need to reimplement the _handle_tool_calls function")

    async def execute(state: AgentState) -> AgentState:
        raise NotImplementedError("You need to implement the execute function (copy your solution from before)")

    return execute
Solution
@agent
def WikiAgentHistory(tools: list[Tool], game: WikiGame, verbose: bool = True):
    system_instruction = ChatMessageSystem(
        content=f"You are a wikipedia-racing AI. Your goal is to reach {game.goal_page.title} by accessing links from wikipedia pages. Your current page is {game.current_page.title}."
    )

    on_page_instruction = ChatMessageUser(
        content=f"""You are currently on page: {game.current_page.title}. Make sure you start by reasoning about what steps you should take to get to the article on {game.goal_page.title}. When coming up with a strategy, make sure to pay attention to the path you have already taken, and if your current strategy doesn't seem to be working out, try something else. In case you're unsure, {game.goal_page.title} has the following summary:\n\n[Begin Summary]\n{game.get_page_summary(game.goal_page)}\n[End Summary]\n\nThe path you have taken so far is {" -> ".join(game.page_history)}.
            """
    )

    async def instruction_refresh() -> None:
        nonlocal system_instruction, on_page_instruction
        system_instruction = ChatMessageSystem(
            content=f"You are a wikipedia-racing AI. Your goal is to reach {game.goal_page.title} by accessing links from wikipedia pages. Your current page is {game.current_page.title}."
        )
        on_page_instruction = ChatMessageUser(
            content=f"""You are currently on page: {game.current_page.title}. Make sure you start by reasoning about what steps you should take to get to the article on {game.goal_page.title}. When coming up with a strategy, make sure to pay attention to the path you have already taken, and if your current strategy doesn't seem to be working out, try something else. In case you're unsure, {game.goal_page.title} has the following summary:\n\n[Begin Summary]\n{game.get_page_summary(game.goal_page)}\n[End Summary]\n\nThe path you have taken so far is {" -> ".join(game.page_history)}.
                """
        )

    async def _reset_history(state: AgentState, previous_page: str) -> AgentState:
        for message in state.messages:
            if (
                isinstance(message, ChatMessageTool)
                and message.function == "GetContentTool"
                and "Wikipedia page content for page" not in message.content
            ):
                message.content = f"Wikipedia page content for page {previous_page} was output here, but has been removed for brevity."
        return state

    async def generate_reason(state: AgentState) -> AgentState:
        state.messages.append(
            ChatMessageUser(
                content=f"""Before you decide on your next step, think carefully about what steps you should take to get to {game.goal_page.title}. When coming up with a strategy, make sure to pay attention to the path you have already taken, and if your current strategy doesn't seem to be working out, try something else."""
            )
        )
        state.output = await get_model().generate(input=state.messages, tools=tools, tool_choice="none")
        state.messages.append(state.output.message)
        return state

    async def generate_action(state: AgentState) -> AgentState:
        state.messages.append(
            ChatMessageUser(
                content=f"""Now based on your reasoning above, what action will you take to reach {game.goal_page.title}?"""
            )
        )
        state.output = await get_model().generate(input=state.messages, tools=tools, tool_choice="auto")
        state.messages.append(state.output.message)
        return state

    async def _start(state: AgentState) -> AgentState:
        state.messages.append(system_instruction)
        state.messages.append(on_page_instruction)
        return state

    async def _handle_tool_calls(state: AgentState) -> AgentState:
        messages, state.output = await execute_tools(messages=state.messages, tools=tools)
        state.messages.extend(messages)
        if state.output.message.tool_calls[0].function == "MovePageTool" and "success" in messages[-1].content.lower():
            previous_page = game.current_page.title
            await instruction_refresh()
            state = await _reset_history(state, previous_page)
        return state

    async def execute(state: AgentState) -> AgentState:
        success = False
        state = await _start(state)
        while not success:
            state = await generate_reason(state)
            state = await generate_action(state)
            if state.output.message.tool_calls:
                state = await _handle_tool_calls(state)
            if game.check_win():
                success = True
        return state

    return execute

Now let's test our new agent on a different path. We've proposed one below, but feel free to try an example path of your own choosing, and see how it performs compared to our previous WikiAgentReAct.

# Run the game with WikiAgentReAct
game = WikiGame("Blavatnik School of Government", "Free Thai Movement")
tool_list = [GetContentTool(game), MovePageTool(game)]

@task
def wiki_task() -> Task:
    return Task(dataset=[Sample(input="", target="")], message_limit=120)

eval(
    solver=as_solver(WikiAgentReAct(tools=tool_list, game=game)),
    tasks=wiki_task(),
)

# Run the game with WikiAgentHistory
game = WikiGame("Blavatnik School of Government", "Free Thai Movement")
tool_list = [GetContentTool(game), MovePageTool(game)]

@task
def wiki_task() -> Task:
    return Task(dataset=[Sample(input="", target="")], message_limit=120)

eval(
    solver=as_solver(WikiAgentHistory(tools=tool_list, game=game)),
    tasks=wiki_task(),
)

Exercise - Implement a reflexion tool

Difficulty: 🔴🔴🔴⚪⚪
Importance: 🔵🔵🔵⚪⚪
You should spend up to 25-35 mins on this exercise.

The Reflexion paper proposes a method that improves performance by getting LLMs to perform self-reflection. The original paper looks at LLM agents in an RL setup, where getting a reward signal on the agent's behaviour is slow and expensive. The key idea is to get quick, cheap feedback from an evaluator on every proposed action, and then reflect on this feedback before taking the next action, instead of waiting for the final outcome. In their case, the evaluator was a heuristic function that estimated the reward function.

We will borrow and modify this idea by building a tool that allows our agent to perform a lookahead, and then gives feedback on our agent's proposed actions. We allow the agent to suggest candidate paths, then the tool will check if these paths work and inform the model either that the path works, or where the path goes wrong.

We don't want to provide the agent the links or content of every page when it does this lookahead, as then we'd just be reimplementing a smaller version of the game inside the game. Instead, we'll let the agent suggest paths without seeing any content or links, and then let it know if the path works. It's very likely that a suggested path will at some point include a link that isn't accessible from one of its pages, but this tool will still be useful to help the agent plan.

@tool
def TestPathTool(game: WikiGame) -> Tool:
    async def execute(path: str) -> str:
        """
        Test a path of wikipedia pages to see if it leads to the goal page. The path should be a series of page titles separated by '->'.

        Args:
            path (str): The path to test formatted as a series of wikipedia page titles separated by '->'. The path must start with the current page title. The path doesn't have to end with the goal page title.

        Returns:
            str: The result of the test. Success if the path leads to the goal page. Otherwise returns failure, and where the path failed.
        """
        raise NotImplementedError("You need to implement the TestPathTool function")

    return execute

game = WikiGame("Blavatnik School of Government", "Free Thai Movement")
tool_list = [GetContentTool(game), MovePageTool(game), TestPathTool(game)]

@task
def wiki_task() -> Task:
    return Task(dataset=[Sample(input="", target="")], message_limit=120)

eval(solver=as_solver(WikiAgentHistory(tools=tool_list, game=game)), tasks=wiki_task())
Solution
@tool
def TestPathTool(game: WikiGame) -> Tool:
    async def execute(path: str) -> str:
        """
        Test a path of wikipedia pages to see if it leads to the goal page. The path should be a series of page titles separated by '->'.

        Args:
            path (str): The path to test formatted as a series of wikipedia page titles separated by '->'. The path must start with the current page title. The path doesn't have to end with the goal page title.

        Returns:
            str: The result of the test. Success if the path leads to the goal page. Otherwise returns failure, and where the path failed.
        """
        # Split the path into individual page titles, dropping empty segments
        # (without the filter, "".split("->") would give [""], so the empty-path
        # check below would never trigger)
        path_nodes = [node.strip() for node in path.split("->") if node.strip()]
        if not path_nodes:
            return "Failure: Path is empty."
        if len(path_nodes) == 1:
            return "Failure: Path must contain at least two pages (current and goal)."
        if path_nodes[0].lower() != game.current_page.title.lower():
            return f"Failure: Path must start with the current page '{game.current_page.title}'."
        # Check that each consecutive pair of pages is connected by a link
        for i in range(len(path_nodes) - 1):
            current_node = path_nodes[i]
            next_node = path_nodes[i + 1]
            permitted_links = {link.lower() for link in get_permitted_links(get_page(current_node))}
            if next_node.lower() not in permitted_links:
                return f"This path works until {next_node}, which is not a permitted link on the page {current_node}."
        return "Success! Following this path will work successfully."

    return execute

You'll likely see that the agent often doesn't use this tool effectively, and when it does, it will often make suboptimal decisions based on the tool's output:

  • One common failure mode is that the model will try a promising path, be told by the tool that it goes wrong somewhere, and then abandon the entire path for a much less promising path (without doing any further testing).
  • Another common issue is that the agent will only use the tool to test whether it is possible to move a single page ahead, which is not the intended use of the tool (as the agent should be able to work out which pages it can move to in one step by looking at the page's content).

Although it may be tempting to keep adding tools to agents, if they're not capable of using them correctly and effectively, these tools may actually harm performance. There are tasks where a 'lookahead' tool could be used effectively; however, it turns out that the Wikipedia game isn't one of them.