# Exploring WebArena Results with Zeno 


[Zeno](https://zenoml.com/) provides an interactive interface for exploring the results of your agents in WebArena. You can easily:
* Visualize the trajectories
* Compare the performance of different agents
* Interactively select and analyze trajectories with various filters such as trajectory length 

In [None]:
!pip install zeno_client pandas python-dotenv

In [None]:
import pandas as pd
import json
import os
from dotenv import load_dotenv

import zeno_client

We first need to convert and combine the output `HTML` trajectories into a single `JSON` file using the `html2json` script.
Remember to change `result_folder` to the folder where your `render_*.html` files are saved. The results will be saved to `{result_folder}/json_dump.json`. For example:

In [None]:
!python html2json.py --result_folder ../cache/918_text_bison_001_cot --config_json ../config_files/test.raw.json
!python html2json.py --result_folder ../cache/919_gpt35_16k_cot --config_json ../config_files/test.raw.json
!python html2json.py --result_folder ../cache/919_gpt35_16k_cot_na --config_json ../config_files/test.raw.json
!python html2json.py --result_folder ../cache/919_gpt35_16k_direct --config_json ../config_files/test.raw.json
!python html2json.py --result_folder ../cache/919_gpt35_16k_direct_na --config_json ../config_files/test.raw.json
!python html2json.py --result_folder ../cache/919_gpt4_8k_cot --config_json ../config_files/test.raw.json

Next, record the JSON file paths in `RESULT_JSONS` and the matching model tags, in the same order, in `RESULT_NAMES`:

In [None]:
RESULT_JSONS = [
    "../cache/918_text_bison_001_cot/json_dump.json",
    "../cache/919_gpt35_16k_cot/json_dump.json",
    "../cache/919_gpt35_16k_cot_na/json_dump.json",
    "../cache/919_gpt35_16k_direct/json_dump.json",
    "../cache/919_gpt35_16k_direct_na/json_dump.json",
    "../cache/919_gpt4_8k_cot/json_dump.json",
]
RESULT_NAMES = ["palm-2-cot-uahint", "gpt35-cot", "gpt35-cot-uahint", "gpt35-direct", "gpt35-direct-uahint", "gpt4-cot"]
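
Because `RESULT_JSONS` and `RESULT_NAMES` are paired by position, a quick sanity check before uploading can catch a mismatch early. The following is a minimal sketch; `check_result_lists` is a hypothetical helper written for this notebook, not part of WebArena or `zeno_client`.

```python
import os


def check_result_lists(result_jsons, result_names):
    """Return a list of human-readable problems; an empty list means no issue was found."""
    problems = []
    # The two lists are matched by index, so their lengths must agree.
    if len(result_jsons) != len(result_names):
        problems.append(
            f"{len(result_jsons)} result files but {len(result_names)} names"
        )
    # Duplicate tags would make the models hard to tell apart.
    if len(set(result_names)) != len(result_names):
        problems.append("model names are not unique")
    # Every json_dump.json should already exist on disk.
    for path in result_jsons:
        if not os.path.isfile(path):
            problems.append(f"missing file: {path}")
    return problems
```

In the notebook, `check_result_lists(RESULT_JSONS, RESULT_NAMES)` should return an empty list before you continue.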

## Obtaining Data

We can use the first results file to create the base `dataset` we'll upload to Zeno with just the initial prompt intent.

In [None]:
with open(RESULT_JSONS[0], "r") as f:
    raw_json: dict = json.load(f)

In [None]:
df = pd.DataFrame(
    {
        "example_id": list(raw_json.keys()),
        "site": [", ".join(x["sites"]) for x in raw_json.values()],
        "eval_type": [", ".join(x["eval_types"]) for x in raw_json.values()],
        "achievable": [x["achievable"] for x in raw_json.values()],
        "context": [
            json.dumps(
                [
                    {
                        "role": "system",
                        "content": row["intent"],
                    }
                ]
            )
            for row in raw_json.values()
        ],
    }
)
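
The cells in this notebook read a fixed set of keys from every entry of `json_dump.json`. A minimal sketch for spotting incomplete entries before building the DataFrames is shown below. `find_incomplete_records` is a hypothetical helper, and the sample records are made up for illustration; they are not real WebArena results.

```python
# Keys that the cells in this notebook read from each entry of json_dump.json.
REQUIRED_KEYS = ("sites", "eval_types", "achievable", "intent", "messages", "success")


def find_incomplete_records(raw_json):
    """Map example_id -> list of missing keys, for entries that lack a required key."""
    incomplete = {}
    for example_id, record in raw_json.items():
        missing = [key for key in REQUIRED_KEYS if key not in record]
        if missing:
            incomplete[example_id] = missing
    return incomplete


# Hypothetical records, made up for illustration only.
sample = {
    "0": {
        "sites": ["shopping_admin"],
        "eval_types": ["string_match"],
        "achievable": True,
        "intent": "Example task description",
        "messages": [],
        "success": 0,
    },
    "1": {"intent": "Record with missing fields"},
}
print(find_incomplete_records(sample))
```

In the notebook, `find_incomplete_records(raw_json)` returning an empty dict suggests every entry has the fields used here.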

## Authenticate and Create a Project

We can now create a new [Zeno](https://zenoml.com) project and upload this data.

Create an account and API key by signing up at [Zeno Hub](https://hub.zenoml.com) and going to your [Account page](http://hub.zenoml.com/account). Save the API key in a `.env` file under the name `ZENO_API_KEY`.

In [None]:
# read ZENO_API_KEY from .env file
load_dotenv(override=True)

client = zeno_client.ZenoClient(os.environ.get("ZENO_API_KEY"))
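
If the key is missing from the environment, `os.environ.get` returns `None` and the failure only surfaces later. A minimal sketch of an early check follows; `require_api_key` is a hypothetical helper, and it assumes the `.env` file uses the usual `KEY=value` line format, for example `ZENO_API_KEY=<your key>`.

```python
import os


def require_api_key(env=os.environ, name="ZENO_API_KEY"):
    """Return the API key from `env`, raising a clear error if it is missing or empty."""
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(
            f"{name} is not set; add a line like {name}=<your key> to the .env file"
        )
    return key
```

Calling `require_api_key()` after `load_dotenv(override=True)` and passing the result to `zeno_client.ZenoClient` would stop the notebook at this cell rather than at the first upload.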

In [None]:
project = client.create_project(
    name="WebArena Tester",
    view={
        "data": {
            "type": "list",
            "elements": {"type": "message", "content": {"type": "markdown"}},
            "collapsible": "top",
        },
        "label": {"type": "markdown"},
        "output": {
            "type": "list",
            "elements": {
                "type": "message",
                "highlight": True,
                "content": {"type": "markdown"},
            },
            "collapsible": "top",
        },
    },
    metrics=[
        zeno_client.ZenoMetric(name="success", type="mean", columns=["success"]),
        zeno_client.ZenoMetric(
            name="# of go backs", type="mean", columns=["# of go_backs"]
        ),
        zeno_client.ZenoMetric(name="# of steps", type="mean", columns=["# of steps"]),
    ],
)

In [None]:
project.upload_dataset(df, id_column="example_id", data_column="context")

## Uploading Model Outputs

We can now upload the full trajectory outputs for our models.

To display the screenshots, upload the images to a publicly accessible location and set `image_base_url` to its URL. Leave it as `None` to upload the text of each trajectory only.

In [None]:
image_base_url = None

In [None]:
def format_message(row):
    # Convert one trajectory into the list of {"role", "content"} messages shown in Zeno.
    return_list = []
    for message in row["messages"]:
        # Each message dict is keyed by its role: "user" or "assistant".
        role = "user" if "user" in message else "assistant"

        if role == "user":
            if image_base_url:
                # Prepend a clickable image built from the last two path components.
                content = (
                    "[![image](%s/%s)](%s/%s)\n%s"
                    % (
                        image_base_url,
                        "/".join(message["image"].split("/")[-2:]),
                        image_base_url,
                        "/".join(message["image"].split("/")[-2:]),
                        message[role],
                    )
                )
            else:
                content = message[role]
        else:
            content = message[role]
        return_list.append({"role": role, "content": content})
    return return_list
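
The image link built in `format_message` keeps only the last two components of each local image path and appends them to `image_base_url`. The sketch below isolates that step so you can check the resulting URL against where your images are hosted. `build_image_markdown` is a hypothetical helper, and the base URL and path in the usage note are made up. Unlike the cell above, it strips a trailing slash from the base URL to avoid a double slash.

```python
def build_image_markdown(image_base_url, image_path, text):
    """Return a clickable markdown image followed by the message text."""
    # Keep only the last two path components, e.g. "images/step_0.png".
    relative = "/".join(image_path.split("/")[-2:])
    # Assumption: a trailing slash on the base URL is not significant.
    url = f"{image_base_url.rstrip('/')}/{relative}"
    return f"[![image]({url})]({url})\n{text}"
```

For instance, with a base URL of `https://example.com/webarena` and a local path of `../cache/run/images/step_0.png`, the image would need to be reachable at `https://example.com/webarena/images/step_0.png`.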

In [None]:
def get_system_df(result_path: str):
    # Load one json_dump.json and compute per-example statistics and outputs.
    with open(result_path, "r") as f:
        json_input: dict = json.load(f)
    return pd.DataFrame(
        {
            "example_id": list(json_input.keys()),
            # Count assistant messages that contain each backtick-prefixed action.
            "# of clicks": [
                sum(
                    [
                        1
                        for x in r["messages"]
                        if "assistant" in x and "`click" in x["assistant"]
                    ]
                )
                for r in json_input.values()
            ],
            "# of types": [
                sum(
                    [
                        1
                        for x in r["messages"]
                        if "assistant" in x and "`type" in x["assistant"]
                    ]
                )
                for r in json_input.values()
            ],
            "# of go_backs": [
                sum(
                    [
                        1
                        for x in r["messages"]
                        if "assistant" in x and "`go_back" in x["assistant"]
                    ]
                )
                for r in json_input.values()
            ],
            # Steps count every message in the trajectory, user and assistant alike.
            "# of steps": [len(r["messages"]) for r in json_input.values()],
            "context": [json.dumps(format_message(row)) for row in json_input.values()],
            "success": [r["success"] for r in json_input.values()],
        }
    )
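
The action counts above rely on substring matching: an assistant message is counted when it contains a backtick immediately followed by the action name. The sketch below shows this on a tiny trajectory. `count_action` is a hypothetical helper, and the messages are made up for illustration.

```python
def count_action(messages, action):
    """Count assistant messages whose text contains a backtick followed by `action`."""
    marker = "`" + action
    return sum(1 for m in messages if "assistant" in m and marker in m["assistant"])


# Hypothetical trajectory, made up for illustration only.
messages = [
    {"user": "observation 1"},
    {"assistant": "I will open the menu. ```click [12]```"},
    {"user": "observation 2"},
    {"assistant": "Wrong page. ```go_back```"},
]
print(count_action(messages, "click"), count_action(messages, "go_back"), len(messages))
```

Note that each message is counted at most once per action, even if the action appears several times in its text, and that `# of steps` is the total number of messages rather than the number of assistant actions.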

In [None]:
for i, system in enumerate(RESULT_JSONS):
    output_df = get_system_df(system)
    project.upload_system(
        output_df, name=RESULT_NAMES[i], id_column="example_id", output_column="context"
    )
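
The base dataset is built from the first results file only, so the other files are assumed to cover the same example IDs. A minimal sketch for checking that assumption before each upload is shown below; `compare_example_ids` is a hypothetical helper, not part of `zeno_client`.

```python
def compare_example_ids(dataset_ids, system_ids):
    """Return (ids missing from the system output, ids not present in the dataset)."""
    dataset_ids, system_ids = set(dataset_ids), set(system_ids)
    return sorted(dataset_ids - system_ids), sorted(system_ids - dataset_ids)
```

In the loop above, `compare_example_ids(df["example_id"], output_df["example_id"])` returning two empty lists suggests the system output lines up with the uploaded dataset.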