BenchFlow builds the environments AI agents learn in.

A frontier environment lab for AI agents. We ship SkillsBench, ClawsBench, and the BenchFlow runtime.

Thesis

Data is the bottleneck. Environments are the new data.

AI data went from labels to post-training trajectories to environments. Models in 2026 don’t get better from more static prompts — they get better from running through realistic environments and being judged on the whole workflow.

  1. 1.0

    Labels

    Image tags, span annotations, yes/no labels.

  2. 2.0

    Post-training

    SFT, preferences, reward labels, short trajectories.

  3. 3.0 (we’re here)

    Environments

    Stateful workplaces with services, files, tools, verifiers, replay.
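To make the 3.0 stage concrete, here is a minimal sketch of what "stateful, with tools, a verifier, and replay" can mean in practice. It is an illustration only, not the BenchFlow runtime: every name in it (Workplace, call, replay, verify, the file names) is hypothetical, and services are left out to keep it short.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Hypothetical names throughout; this is not the BenchFlow runtime API.
Action = Tuple[str, str, str]  # (tool name, file path, content)


@dataclass
class Workplace:
    """A toy stateful environment: files persist across tool calls."""
    files: Dict[str, str] = field(default_factory=dict)
    log: List[Action] = field(default_factory=list)  # recorded for replay

    def call(self, tool: str, path: str, content: str = "") -> str:
        """Record the action, then run the tool against the shared state."""
        self.log.append((tool, path, content))
        if tool == "write":
            self.files[path] = content
            return "ok"
        if tool == "read":
            return self.files.get(path, "")
        raise ValueError(f"unknown tool: {tool}")


def replay(log: List[Action]) -> Workplace:
    """Rebuild an environment by re-running a recorded action log."""
    env = Workplace()
    for tool, path, content in log:
        env.call(tool, path, content)
    return env


def verify(env: Workplace) -> bool:
    """Judge the end state of the whole workflow, not a single reply."""
    return env.files.get("report.txt", "").startswith("Q3 revenue")


# A three-step workflow: the last step depends on state left by the first.
env = Workplace()
env.call("write", "notes.txt", "draft figures")
draft = env.call("read", "notes.txt")
env.call("write", "report.txt", f"Q3 revenue summary based on {draft}")

passed = verify(env)        # verdict on the final state
replayed = replay(env.log)  # same state, rebuilt from the log alone
```

The verifier looks only at the final files, so an agent is scored on whether the whole workflow produced the right outcome, and the action log lets the same run be reconstructed later for inspection.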

Ecosystem

  1. May 26 · CAIS · San Jose

    Agent Skills ’26 workshop

    First workshop on agent skills. Speakers: Dawn Song, Ross Taylor, Kanav Garg (DeepMind), Yu Su. Live SkillsBench design challenge.

    agentskills-workshop.org
  2. May 27 · Google DeepMind · Mountain View

    BenchFlow / SkillsBench / ClawsEnv 1.0

    Co-launch with Kaggle and Google DeepMind. Kaggle competition with $50k prize pool already secured.

Backed by

Angels

Firms

+ more across Anthropic, Google DeepMind, and the frontier AI ecosystem.