BenchFlow builds the environments AI agents learn in.

A frontier environment lab for AI agents. We ship SkillsBench, ClawsBench, and the BenchFlow runtime.

Thesis

Data is the bottleneck. Environments are the new data.

AI data went from labels to post-training trajectories to environments. Models in 2026 don’t get better from more static prompts; they get better from running through realistic environments and being judged on the whole workflow.

  1. 1.0

    Labels

    Image tags, span annotations, yes/no labels.

  2. 2.0

    Post-training

    SFT, preferences, reward labels, short trajectories.

  3. 3.0 · we’re here

    Environments

    Stateful workplaces with services, files, tools, verifiers, replay.
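
To make the 3.0 idea concrete, here is a minimal sketch of a stateful environment with files, tools, a verifier, and replay. It is an illustration only, not the BenchFlow runtime or its API: `ToyEnvironment`, `write_file`, `append_file`, `replay`, and `verify` are hypothetical names invented for this example, and services are left out to keep it short.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class ToyEnvironment:
    """Hypothetical stateful workplace: files plus a log of tool calls."""
    files: Dict[str, str] = field(default_factory=dict)
    log: List[Tuple[str, str, str]] = field(default_factory=list)

    def write_file(self, path: str, text: str) -> None:
        """Tool: create or overwrite a file and record the call."""
        self.files[path] = text
        self.log.append(("write_file", path, text))

    def append_file(self, path: str, text: str) -> None:
        """Tool: append to a file (creating it if missing) and record the call."""
        self.files[path] = self.files.get(path, "") + text
        self.log.append(("append_file", path, text))


def replay(log: List[Tuple[str, str, str]]) -> ToyEnvironment:
    """Replay: rebuild the environment state from a recorded tool-call log."""
    env = ToyEnvironment()
    for tool, path, text in log:
        getattr(env, tool)(path, text)
    return env


def verify(env: ToyEnvironment) -> bool:
    """Verifier: judge the final state of the whole workflow, not one step."""
    return env.files.get("report.txt") == "total=3\n" and "notes.txt" in env.files


# A stand-in for an agent run: three tool calls that together finish the task.
env = ToyEnvironment()
env.write_file("notes.txt", "a=1, b=2\n")
env.write_file("report.txt", "total=")
env.append_file("report.txt", "3\n")

passed = verify(env)        # checks the end state after all steps
replayed = replay(env.log)  # reconstructs the same state from the log
```

The verifier inspects the end state rather than any single action, which is what distinguishes this setup from a static prompt with a label. The recorded log is what makes a run reproducible.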

Ecosystem

  1. May 26 · CAIS · San Jose

    Agent Skills ’26 workshop

    First workshop on agent skills. Speakers: Dawn Song, Ross Taylor, Kanav Garg (DeepMind), Yu Su. Live SkillsBench design challenge.

    agentskills-workshop.org
  2. May 27 · San Francisco

    SkillsBench 1.0 launch party

    Launch party for SkillsBench 1.0. Details coming soon.