5️⃣ Bonus
In this bonus section, we suggest some modifications to the Wikipedia game that make it more difficult, so that you can then go on and try further elicitation methods of your own.
Alternatively, if you're tired of the Wikipedia game and feeling ambitious, you might want to design your own agent task and quantify performance on that task instead.
Exercise - Implement additional rules
Allow the game to have additional rules. Some suggestions are a "No country pages" rule and a "No articles above a given length" rule, but feel free to add more. With all of our elicitation methods so far, the agent generally fails only if the path is impossible or unreasonably hard, so extra rules are a good way to raise the difficulty. To implement the "no country pages" rule, you may want to use the `categories` attribute that the Wikipedia API exposes on `WikipediaPage` objects.
First, let's modify the WikiGame task to store the rules for the Wikipedia game. We've modified the class for you to allow for the rules we described above.
```python
class WikiGameRules(WikiGame):
    def __init__(
        self,
        starting_page: str,
        goal_page: str,
        rules: list[Literal["no countries", "no pages with length above 30000"]] | None = None,
    ):
        super().__init__(starting_page, goal_page)
        self.rules = rules
```
Now let's modify the prompts given to the LLM API in the Agent class, so that the agent is informed of any additional rules it has to abide by. We should keep the option of using our original prompts (in case we decide to run the WikiAgent without any rules), so the new system_instruction method should first check whether there are any additional rules, and return the modified system prompt only if there are.
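The shape of that check might look like the sketch below. The class name `WikiAgentRules`, the base prompt wording, and the bare-bones constructor are all our own placeholders; the real agent in this material has more state, but the branching logic in `system_instruction` is the part that matters:

```python
class WikiAgentRules:
    """Minimal sketch showing only the prompt logic.

    Assumes `task` is a WikiGameRules instance, i.e. it has a
    .rules attribute that is None when no extra rules are active.
    """

    def __init__(self, task):
        self.task = task

    @property
    def system_instruction(self) -> dict:
        # Placeholder base prompt; substitute the one your agent already uses
        base = (
            "You are a wikipedia-racing AI. Your goal is to reach the goal "
            "page by accessing links from wikipedia pages."
        )
        if not self.task.rules:
            # No additional rules: fall back to the original system prompt
            return {"role": "system", "content": base}
        rules_text = (
            " The following additional rules are in effect: "
            + "; ".join(self.task.rules)
            + "."
        )
        return {"role": "system", "content": base + rules_text}
```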
Now you'll have to enforce these rules by modifying the MovePageTool class, so that the agent can only move to a page if doing so is within the rules of the game. If you're running the agent with the reflexion tool, you may also want to modify that tool's logic to abide by the rules.
```python
# Implement modified MovePageTool to enforce rules here.
```
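One possible shape for the modified tool is sketched below. The class name `MovePageToolRules` is our own, and we assume (as a simplification) that `task.get_page(title)` returns a `WikipediaPage`-like object with `categories` and `content` attributes; the original tool's link-validity checks would still apply and are omitted here:

```python
class MovePageToolRules:
    """Sketch of a rules-aware move tool (names and signatures assumed)."""

    name = "move_page"

    @staticmethod
    def _breaks_rules(page, rules) -> bool:
        # Mirror the rule options stored on WikiGameRules
        if not rules:
            return False
        if "no countries" in rules and any(
            "countries" in cat.lower() for cat in page.categories
        ):
            return True
        if "no pages with length above 30000" in rules and len(page.content) > 30000:
            return True
        return False

    def execute(self, new_page: str, task) -> str:
        page = task.get_page(new_page)
        if self._breaks_rules(page, task.rules):
            # Refuse the move and tell the agent why, so it can replan
            return (
                f"Couldn't move to {new_page}: this page is not allowed "
                "under the current rules."
            )
        task.current_page = page
        return f"Moving page to {new_page}."
```

Returning an explanatory failure message (rather than raising an error) matters here: the agent sees the tool output and can use it to choose a different link.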
Try further elicitation methods
Read some further resources on building and eliciting behaviour from LLM agents, and try implementing some methods of your own to elicit improved performance on the task. If you start seeing diminishing returns from elicitation (because performance on the task is saturating), come up with new ways to make the task harder. Alternatively, if you're feeling particularly ambitious, you can come up with your own, more difficult task and build an agent from scratch to accomplish it.