webarena/xiangyi-li · BenchFlow

mirrored 13 minutes ago

__init__.py (181 B)
evaluators.py (13.3 kB)
helper_functions.py (7.57 kB)

add comment

a year ago

use fuzzy_match for UA tasks and update ua eval prompt

a year ago

evaluation_harness

release commit

2 years ago

Shuyan Zhou · Update README.md · daee18d