SWE Task Setup

The SWE-bench / R2E-Gym training and evaluation paths in AgentFly run patched repositories inside enroot containers and grade them with swebench's harness. Because these dependencies are heavy and only required for SWE work, they are not part of the default install. This page covers what to install and how to point AgentFly at your container images.

Affected scripts and tests

These are the entry points that need this setup:

  • examples/eval_scripts/swebench/run_swebench_verified.sh — SWE-bench Verified evaluation.
  • examples/train_scripts/swe/train_swe_r2e_gym.slurm
  • examples/train_scripts/swe/train_swe_r2e_gym_tools.slurm
  • examples/train_scripts/swe/train_swe_r2e_gym_tools_32B.slurm
  • examples/train_scripts/context/context_run_swe.sh
  • tests/unit/rewards/swe/test_eval_r2e_gym.py
  • tests/unit/rewards/swe/test_get_patch_from_runtime.py

1. Extra Python dependencies

Install these into the same conda environment used for AgentFly:

pip install swebench
pip install git+https://github.com/R2E-Gym/R2E-Gym.git

swebench provides the grading harness (swebench.harness.*) and r2egym provides the trajectory utilities and log parsers used by the R2E-Gym reward. Both are imported lazily, so the rest of AgentFly works without them, but any SWE reward call will fail with ImportError until they are installed.
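
To see which of the two packages is still missing before launching a job, a small check along these lines can help. This is a minimal sketch, not part of AgentFly: the helper name `missing_swe_packages` is hypothetical, and it assumes the import names are `swebench` and `r2egym`, as referenced above.

```python
import importlib.util


def missing_swe_packages(names=("swebench", "r2egym")):
    """Return the subset of `names` that cannot be found on the import path.

    Hypothetical helper for illustration; not an AgentFly API.
    """
    # find_spec returns None for a top-level module that is not installed.
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_swe_packages()
    if missing:
        print("Missing SWE dependencies:", ", ".join(missing))
    else:
        print("swebench and r2egym are importable.")
```

An empty list means both packages can be located; it does not verify that their own dependencies import cleanly.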

2. enroot

The SWE rewards launch sandboxed shells inside enroot containers. Install the enroot system package as described in Installation; the install step requires sudo.
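
A quick way to confirm that the install succeeded is to check that an `enroot` executable is on `PATH`. The sketch below is illustrative only; the function name `find_executable` is hypothetical, and it assumes the package installs a binary named `enroot`.

```python
import shutil


def find_executable(name="enroot"):
    """Return the absolute path of `name` on PATH, or None if it is absent.

    Hypothetical helper for illustration; not an AgentFly API.
    """
    return shutil.which(name)


if __name__ == "__main__":
    path = find_executable()
    print(f"enroot found at {path}" if path else "enroot is not on PATH")
```

This only checks that the binary is discoverable; it does not start a container.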

3. Fetch the container images

Each task uses a different image set. Place all of them under one directory and remember the path:

Task                             Image set
SWE-bench Verified evaluation    swe-bench-verified
R2E-Gym training (lite)          r2e-gym-lite

Example layout:

/some/big/disk/enroot/images/
├── r2e-gym-lite/
│   ├── <instance>.sqsh
│   └── ...
└── swe-bench-verified/
    ├── <instance>.sqsh
    └── ...

The build/import commands depend on which image source you use (Docker Hub, your registry, or pre-built .sqsh files); refer to the upstream R2E-Gym and SWE-bench docs for image provisioning. AgentFly only needs the resulting directory to exist and contain the per-instance images.
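
Once the images are in place, a short script can confirm that each image-set directory actually contains `.sqsh` files. This is a rough sketch that assumes the layout shown above (one subdirectory per image set, one `.sqsh` file per instance); the function name `count_sqsh_images` is hypothetical.

```python
from pathlib import Path


def count_sqsh_images(images_root):
    """Map each image-set subdirectory of `images_root` to its number of .sqsh files.

    Hypothetical helper for illustration; not an AgentFly API.
    """
    root = Path(images_root)
    return {
        sub.name: sum(1 for _ in sub.glob("*.sqsh"))
        for sub in sorted(root.iterdir())
        if sub.is_dir()
    }


if __name__ == "__main__":
    import sys

    # Pass the parent directory that holds the image sets as the first argument.
    if len(sys.argv) > 1:
        for image_set, count in count_sqsh_images(sys.argv[1]).items():
            print(f"{image_set}: {count} image(s)")
```

A count of zero for an image set suggests that the images were not fetched or were placed one level too deep.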

4. Set ENROOT_IMAGES_PATH

The example scripts and tests read ENROOT_IMAGES_PATH from the environment and fall back to a placeholder if it is unset. Export it in your shell before launching, pointing it at the image-set directory for the task you are running, e.g.:

# R2E-Gym training
export ENROOT_IMAGES_PATH=/some/big/disk/enroot/images/r2e-gym-lite
sbatch examples/train_scripts/swe/train_swe_r2e_gym.slurm

# SWE-bench Verified evaluation
export ENROOT_IMAGES_PATH=/some/big/disk/enroot/images/swe-bench-verified
bash examples/eval_scripts/swebench/run_swebench_verified.sh
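
Because the scripts fall back to a placeholder, a forgotten export may only surface later, when a container fails to start. A fail-fast check such as the following can catch that earlier. It is a sketch under stated assumptions: `resolve_images_path` is a hypothetical helper, not an AgentFly function, and it treats an empty value the same as an unset one.

```python
import os
from pathlib import Path


def resolve_images_path(env=None):
    """Return ENROOT_IMAGES_PATH as a Path, or raise if it is unset or not a directory.

    Hypothetical helper for illustration; not an AgentFly API.
    """
    env = os.environ if env is None else env
    value = env.get("ENROOT_IMAGES_PATH")
    if not value:
        raise RuntimeError("ENROOT_IMAGES_PATH is not set")
    path = Path(value)
    if not path.is_dir():
        raise RuntimeError(f"ENROOT_IMAGES_PATH is not a directory: {path}")
    return path
```

Calling `resolve_images_path()` at the top of a launcher script raises immediately with a readable message instead of letting the placeholder propagate.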

For the CLI evaluator you can also pass it inline:

python -m agentfly.cli swebench \
    --enroot-images-path /some/big/disk/enroot/images/swe-bench-verified \
    ...

5. Smoke test

With the environment variable set and images present:

pytest tests/unit/rewards/swe/test_eval_r2e_gym.py -x

A successful run confirms that r2egym, swebench, enroot, and your image directory are all wired up correctly.