## 🌐 AssistantBench: Can Web Agents Solve Realistic and Time-Consuming Tasks?

[**AssistantBench**](https://oriyor.github.io/AssistantBench) evaluates the ability of AI agents to solve realistic and time-consuming web tasks, such as "Which gyms near me have fitness classes on the weekend, before 7AM?".

### ⛰️ Dataset and leaderboard

To start working on AssistantBench, check out our HuggingFace [dataset](https://huggingface.co/datasets/AssistantBench/AssistantBench) and [leaderboard](https://huggingface.co/spaces/AssistantBench/leaderboard), where you can also make new submissions.

### 🤖 SPA

We also introduce SeePlanAct (SPA), a new web agent built to tackle tasks in AssistantBench. Code to run SPA and additional resources will be released soon!

### ✍ Citation

```
@misc{yoran2024assistantbenchwebagentssolve,
      title={AssistantBench: Can Web Agents Solve Realistic and Time-Consuming Tasks?},
      author={Ori Yoran and Samuel Joseph Amouyal and Chaitanya Malaviya and Ben Bogin and Ofir Press and Jonathan Berant},
      year={2024},
      eprint={2407.15711},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2407.15711},
}
```