# Docker Setup Guide

## Introduction

SWE-bench uses a Docker-based evaluation harness to ensure consistent, reproducible results across different platforms. This containerized approach eliminates environment discrepancies and provides an isolated environment for each evaluation task.

## Prerequisites

Before setting up Docker for SWE-bench, ensure you have:

- Docker installed on your system ([Docker installation guide](https://docs.docker.com/engine/install/))
  - On Linux, also complete the [post-installation steps](https://docs.docker.com/engine/install/linux-postinstall/)
- Sufficient disk space (at least 120GB free)
- Adequate system resources (16GB+ RAM recommended)

## Docker Installation

### macOS

1. Download and install Docker Desktop for Mac from the [official website](https://www.docker.com/products/docker-desktop)
2. Increase resource allocation in Docker Desktop settings:
   - Open Docker Desktop preferences
   - Go to Resources > Advanced
   - Allocate at least 8 CPUs and 16GB RAM
   - Set disk image size to at least 120GB

### Linux

1. Install Docker using your distribution's package manager or follow the [official guide](https://docs.docker.com/engine/install/)
2. Add your user to the docker group to run Docker without sudo:

   ```bash
   sudo groupadd docker          # Create the docker group (it may already exist)
   sudo usermod -aG docker $USER # Add the current user to the group
   newgrp docker                 # Apply changes without logging out
   ```

### Windows

1. Install Docker Desktop for Windows from the [official website](https://www.docker.com/products/docker-desktop)
2. Ensure WSL 2 is installed and configured
3. Increase resource allocation in Docker Desktop settings:
   - Open Docker Desktop settings
   - Go to Resources > Advanced
   - Allocate at least 8 CPUs and 16GB RAM
   - Set disk image size to at least 120GB

## Testing Your Docker Installation

Verify your Docker installation with these commands:

```bash
# Check Docker version
docker --version

# Run a simple test container
docker run hello-world

# Check Docker disk usage
docker system df
```

## Docker Resource Management

### Understanding SWE-bench's Docker Usage

The SWE-bench evaluation harness builds Docker images in three layers:

1. **Base image**: Common dependencies for all evaluations
2. **Environment images**: Python environments for different configurations (~60 images)
3. **Instance images**: Specific dependencies for each evaluation task

These images require significant disk space, so it's important to understand how to manage them.

### Resource Management Commands

Useful commands for managing Docker resources:

```bash
# View Docker disk usage
docker system df

# Remove all stopped containers
docker container prune

# Remove dangling images (add -a to remove all unused images)
docker image prune

# Remove stopped containers, unused networks, dangling images, and unused build cache
# (volumes are only removed if --volumes is passed)
docker system prune
```

## Cache Level Configuration

SWE-bench provides different caching options to balance speed vs. storage:

| Cache Level | Description | Storage Impact | Performance |
|-------------|-------------|----------------|-------------|
| `none` | No image caching | Minimal (~120GB during run) | Slowest |
| `base` | Cache only base image | Minimal (~120GB during run) | Slow |
| `env` (default) | Cache base and environment images | Moderate (~100GB) | Moderate |
| `instance` | Cache all images | High (~2,000GB) | Fastest |

Set the cache level when running the evaluation:

```bash
python -m swebench.harness.run_evaluation \
    --predictions_path <path_to_predictions> \
    --cache_level env \
    --clean True
```

For most users, the default `env` setting provides a good balance between evaluation speed and disk usage.

## Performance Optimization

### Setting the Right Number of Workers

The optimal number of workers depends on your system resources:

- Use no more than `min(0.75 * os.cpu_count(), 24)` workers
- For an 8-core machine, 6 workers is typically appropriate
- For a 16-core machine, 12 workers is typically appropriate

```bash
python -m swebench.harness.run_evaluation \
    --predictions_path <path_to_predictions> \
    --max_workers 8
```

Increasing the worker count beyond your system's capabilities can slow down evaluation due to resource contention.

## Troubleshooting Docker Issues

### Common Problems and Solutions

1. **Insufficient disk space**:
   - Free up disk space or increase Docker Desktop's disk image size
   - Use `--cache_level=env` or `--cache_level=base` to reduce storage needs

2. **Docker build failures**:
   - Check network connectivity
   - Inspect build logs in `logs/build_images`

3. **Permission issues**:
   - Ensure your user is in the docker group (Linux)
   - Run with elevated privileges if necessary

4. **Slow evaluation times**:
   - Reduce the number of parallel workers
   - Check CPU and memory usage during evaluation
   - Consider using a more powerful machine

5. **Network-related issues**:
   - Check Docker network settings:

     ```bash
     docker network ls
     docker network inspect bridge
     ```

## Cleaning Up After Evaluation

To reclaim disk space after running evaluations:

```bash
# Remove stopped containers, unused networks, all unused images, and build cache
docker system prune -a

# Or for more control, remove specific resources
docker container prune  # Remove all stopped containers
docker image prune      # Remove dangling images
```

You can also set `--clean=True` when running the evaluation to automatically clean up instance-specific resources.
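
Before starting a fresh run after cleanup, it can help to check the two numbers this guide keeps returning to: free disk space (the 120GB prerequisite) and a worker count based on the `min(0.75 * os.cpu_count(), 24)` guideline. The following is a minimal sketch, not part of SWE-bench: the function names `recommended_workers` and `free_gb` are hypothetical, and the worker calculation treats the guideline as an inclusive upper bound so that it matches the 8-core and 16-core examples above. It also checks the filesystem of the current directory by default, which is an assumption; Docker may store images elsewhere (for example `/var/lib/docker` on Linux, or a virtual disk image under Docker Desktop), so adjust the path for your setup.

```python
import os
import shutil

MIN_FREE_GB = 120  # free-space figure from the Prerequisites section
WORKER_CAP = 24    # upper cap from the worker guideline


def recommended_workers(cpu_count=None):
    """Return min(0.75 * cpu_count, 24) rounded down, but never below 1."""
    if cpu_count is None:
        cpu_count = os.cpu_count() or 1  # os.cpu_count() can return None
    return max(1, min(int(0.75 * cpu_count), WORKER_CAP))


def free_gb(path="."):
    """Return free space in GB (10**9 bytes) on the filesystem holding `path`."""
    return shutil.disk_usage(path).free / 1e9


if __name__ == "__main__":
    workers = recommended_workers()
    space = free_gb()
    print(f"Suggested --max_workers: {workers}")
    print(f"Free disk space: {space:.0f} GB")
    if space < MIN_FREE_GB:
        print(f"Warning: under {MIN_FREE_GB} GB free; consider pruning Docker resources first.")
```

The suggested value can then be passed to `--max_workers` in the evaluation command. Treat it as a starting point: memory pressure or slow disks may call for a lower number than the CPU count alone suggests.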