# SWE-bench Datasets
SWE-bench offers multiple datasets for evaluating language models on software engineering tasks. This guide explains the different datasets and how to use them.
## Available Datasets
SWE-bench provides several dataset variants:
| Dataset | Description | Size | Use Case |
|---------|-------------|------|----------|
| **SWE-bench** | Full benchmark with diverse repositories | 2,294 instances | Comprehensive evaluation |
| **SWE-bench Lite** | Smaller subset for quick evaluations | 300 instances | Faster iteration, development |
| **SWE-bench Verified** | Expert-verified solvable problems | 500 instances | High-quality evaluation |
| **SWE-bench Multimodal** | Includes screenshots and UI elements | 100 dev instances (500 test) | Testing multimodal capabilities |
## Accessing Datasets
All datasets are available on Hugging Face:
```python
from datasets import load_dataset

# Load the full benchmark (omitting `split` returns a DatasetDict
# containing every available split)
sbf = load_dataset('princeton-nlp/SWE-bench')

# Load the Lite variant
sbl = load_dataset('princeton-nlp/SWE-bench_Lite')

# Load the Verified variant (published with a test split only)
sbv = load_dataset('princeton-nlp/SWE-bench_Verified', split='test')

# Load the Multimodal variant
sbm_dev = load_dataset('princeton-nlp/SWE-bench_Multimodal', split='dev')
sbm_test = load_dataset('princeton-nlp/SWE-bench_Multimodal', split='test')
```
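As a quick sanity check, you can inspect the size and fields of a loaded split (the Lite test split here); the field names match the structure described in the next section:
```python
# Inspect the Lite test split loaded above
lite_test = sbl['test']
print(len(lite_test))               # number of task instances
print(lite_test.column_names)       # available fields
print(lite_test[0]['instance_id'])  # e.g. 'astropy__astropy-12907'
```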
## Dataset Structure
Each instance in the datasets has the following structure:
```python
{
"instance_id": "owner__repo-pr_number",
"repo": "owner/repo",
"issue_id": issue_number,
"base_commit": "commit_hash",
"problem_statement": "Issue description...",
"version": "Repository package version",
"issue_url": "GitHub issue URL",
"pr_url": "GitHub pull request URL",
"patch": "Gold solution patch (don't look at this if you're trying to solve the problem)",
"test_patch": "Test patch",
"created_at": "Date of creation",
"FAIL_TO_PASS": "Fail to pass test cases",
"PASS_TO_PASS": "Pass test cases"
}
```
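Note that in the Hugging Face releases the list-valued fields are typically stored as JSON-encoded strings rather than Python lists; a minimal decoding sketch, reusing the `sbl` dataset loaded above:
```python
import json

inst = sbl['test'][0]
# FAIL_TO_PASS / PASS_TO_PASS are JSON-encoded string lists in the
# Hugging Face release; decode them into Python lists.
fail_to_pass = json.loads(inst['FAIL_TO_PASS'])
pass_to_pass = json.loads(inst['PASS_TO_PASS'])
print(f"{len(fail_to_pass)} fail-to-pass, {len(pass_to_pass)} pass-to-pass tests")
```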
SWE-bench Verified also includes:
```python
{
# ... standard fields above ...
"difficulty": "Difficulty level"
}
```
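The `difficulty` field makes it easy to slice Verified by estimated effort. A minimal sketch, assuming the time-to-fix label strings used in the release (e.g. `<15 min fix`); check `sbv.unique('difficulty')` for the exact values:
```python
# Keep only instances annotated as quick fixes; the exact label string
# is an assumption -- inspect sbv.unique('difficulty') first.
easy = sbv.filter(lambda x: x['difficulty'] == '<15 min fix')
print(f"{len(easy)} of {len(sbv)} instances")
```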
The multimodal dataset also includes:
```python
{
# ... standard fields above ...
"image_assets": {
"problem_statement": ["url1", "url2", ...],
"patch": ["url1", "url2", ...],
"test_patch": ["url1", "url2", ...]
}
}
```
Note that for the `test` split of the multimodal dataset, the `patch`, `test_patch`, and `image_assets` fields will be empty.
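To pass the screenshots to a multimodal model, fetch the URLs listed under `image_assets`. A minimal sketch using `requests`, assuming (as with the other list-valued fields) that the asset map may be JSON-encoded in the Hugging Face release:
```python
import json
import requests

inst = sbm_dev[0]
assets = inst['image_assets']
# The asset map may be stored as a JSON-encoded string; decode if needed.
if isinstance(assets, str):
    assets = json.loads(assets)

# Download the problem-statement screenshots to the working directory
for url in assets['problem_statement']:
    resp = requests.get(url, timeout=30)
    resp.raise_for_status()
    filename = url.rsplit('/', 1)[-1]
    with open(filename, 'wb') as f:
        f.write(resp.content)
```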
## Retrieval Datasets from the Paper
The SWE-bench paper also reports results with retrieved code context: "oracle" retrieval (the files edited by the gold patch) and BM25 retrieval at several maximum context lengths. These preprocessed datasets can be loaded as follows:
```python
# "Oracle" retrieval: context is the files edited by the gold patch
oracle_retrieval = load_dataset('princeton-nlp/SWE-bench_oracle', split='test')

# BM25 retrieval at different maximum context lengths (in tokens)
sbf_bm25_13k = load_dataset('princeton-nlp/SWE-bench_bm25_13K', split='test')
sbf_bm25_27k = load_dataset('princeton-nlp/SWE-bench_bm25_27K', split='test')
sbf_bm25_40k = load_dataset('princeton-nlp/SWE-bench_bm25_40K', split='test')
```
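Each instance in these retrieval datasets pairs the task with a preassembled prompt containing the issue and the retrieved code context. A minimal sketch of reading it, assuming the prompt lives in a `text` column (check `oracle_retrieval.column_names` if it differs):
```python
example = oracle_retrieval[0]
prompt = example['text']   # issue plus retrieved code context (assumed column name)
print(example['instance_id'])
print(prompt[:500])        # preview the start of the prompt
```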