Learn to build and run evaluations for large language models, including dataset generation and LLM agents.