xiangyi-li/webarena
mirrored 13 minutes ago
__init__.py (181 B)
evaluators.py (13.3 kB)
helper_functions.py (7.57 kB)
add comment
a year ago
use fuzzy_match for UA tasks and update ua eval prompt
a year ago
main
evaluation_harness
release commit
2 years ago
Shuyan Zhou
Update README.md
daee18d