Benchmark Runtime

Run out-of-the-box evals and benchmarks in the cloud. Save weeks of setup and development by running your evals on our platform.

Python
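from benchflow import load_benchmark, BaseAgent

# Load a hosted benchmark by name.
bench = load_benchmark(benchmark_name="cmu/webarena")

# Subclass BaseAgent and implement your agent's logic here.
class YourAgent(BaseAgent):
    pass

your_agents = YourAgent()

# Start a run of the agent on the selected tasks.
run_id = bench.run(
    task_id=[1, 2, 3],
    agents=your_agents,
)

# Fetch the result of the run by its ID.
result = bench.get_result(run_id)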

Backed By

Jeff Dean
Chief Scientist, Google
Arash Ferdowsi
Founder/CTO of Dropbox
+ more
$1M+ raised