Predefined Tools

The following are predefined tools that can be used directly with agents.

Code Interpreter

code_interpreter(code: str, context: Context) async

Run the code in a Docker container and return the output from stdout or stderr.

Uses one Python sandbox per rollout (acquired via Context); the sandbox is released when the rollout ends. Warm the pool at training start with ResourceEngine.start(python_sandbox_spec(), size=32, backend="local") if needed.

Parameters:

  • code (str) –

    The code to run.

  • context (Context) –

    Injected rollout context; used to acquire the sandbox resource.

Returns:

  • str –

    The output from stdout or stderr.
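To make the "stdout or stderr" return value concrete, here is a minimal local sketch. It runs the snippet in a plain subprocess rather than a Docker sandbox, `run_snippet` is a hypothetical helper that is not part of the library, and the rule "stdout on success, stderr on failure" is an assumption about how the two streams are chosen.

```python
import subprocess
import sys


def run_snippet(code: str) -> str:
    """Run `code` in a fresh Python process and return its text output.

    Local stand-in only: the documented tool executes inside a sandbox
    container acquired through the rollout Context.
    """
    proc = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
    )
    # Assumption: return stdout when the process succeeds, stderr otherwise.
    return proc.stdout if proc.returncode == 0 else proc.stderr


print(run_snippet("print(6 * 7)"))
print(run_snippet("1 / 0"))
```

The second call returns the traceback text instead of raising, which is the behavior an agent needs in order to read the error and retry.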

Calculator

calculator(expression: str)

Calculate the result of a mathematical expression.

Parameters:

  • expression (str) –

    The mathematical expression to calculate

Returns:

  • str –

    The result of the expression
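The page does not say how the expression is evaluated. One common way to build such a tool without calling `eval` is to walk the parsed syntax tree and allow only arithmetic nodes. The sketch below takes that approach; `calculate` and the supported operator set are assumptions for illustration, not the library's implementation.

```python
import ast
import operator

# Arithmetic operators the sketch accepts; everything else is rejected.
_BINARY = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}
_UNARY = {ast.USub: operator.neg, ast.UAdd: operator.pos}


def _evaluate(node):
    """Recursively evaluate a restricted arithmetic syntax tree."""
    if isinstance(node, ast.Constant):
        # bool is a subclass of int, so exclude it explicitly.
        if isinstance(node.value, (int, float)) and not isinstance(node.value, bool):
            return node.value
    elif isinstance(node, ast.BinOp) and type(node.op) in _BINARY:
        return _BINARY[type(node.op)](_evaluate(node.left), _evaluate(node.right))
    elif isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY:
        return _UNARY[type(node.op)](_evaluate(node.operand))
    raise ValueError("unsupported expression")


def calculate(expression: str) -> str:
    """Return the result as a string, or an error message as a string."""
    try:
        tree = ast.parse(expression, mode="eval")
        return str(_evaluate(tree.body))
    except (SyntaxError, ValueError, ZeroDivisionError) as exc:
        return f"Error: {exc}"


print(calculate("2 * (3 + 4)"))
print(calculate("__import__('os')"))
```

Returning the error as a string keeps the return type uniform, matching the `str` return documented above.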

ALFWorld

alfworld_step(action: str, context: Context) async

Take an action in the ALFWorld environment and return the resulting observation together with the reward, done flag, and info.

Parameters:

  • action (str) –

    The action to take in the environment

  • context (Context) –

    Injected rollout context; used to acquire the ALFWorld resource.

Returns:

  • dict –

    A dictionary containing the observation, reward, done, and info
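A caller typically loops until the returned dictionary reports that the episode is done. The sketch below shows that loop against a stub: `fake_step` is a hypothetical stand-in for the real tool, and the key names `observation`, `reward`, `done`, and `info` are assumed from the description above.

```python
import asyncio


async def fake_step(action: str) -> dict:
    """Hypothetical stand-in for alfworld_step; 'stop' ends the episode."""
    finished = action == "stop"
    return {
        "observation": f"You {action}.",
        "reward": 1.0 if finished else 0.0,
        "done": finished,
        "info": {},
    }


async def run_episode(actions):
    """Apply actions in order and stop as soon as the episode is done."""
    total_reward = 0.0
    observations = []
    for action in actions:
        result = await fake_step(action)
        observations.append(result["observation"])
        total_reward += result["reward"]
        if result["done"]:
            break
    return total_reward, observations


total_reward, observations = asyncio.run(run_episode(["look", "stop", "look"]))
print(total_reward, observations)
```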

ScienceWorld

scienceworld_explorer(action: str, context: Context) async

Executes an action in the ScienceWorld environment and returns the resulting observation.

Parameters:

  • action (str) –

    The action to perform in the environment. Valid actions include commands like 'look around', 'inventory', 'open ', etc.

  • context (Context) –

    Injected rollout context; used to acquire the ScienceWorld resource.

Returns:

  • str –

    The observation returned by the environment after performing the action, or an error message if the action is invalid or an exception occurs.

SWE (container workspace)

Tools for SWE-bench style rollouts. They are stateful and expect a rollout Context with context.metadata["image_id"] set to the task container image. File tools mount file_manager.py into the container and delegate reads and edits to that helper. run_shell_command runs in the container with working directory /testbed. The first time a shell is acquired for a command, if context.metadata includes git_commit_hash, the shell tool runs git fetch and git checkout for that commit under /testbed (used, for example, by swe-smith style tasks).
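The metadata contract described above can be sketched as a small planning function. Everything here is illustrative: `plan_shell_setup` is a hypothetical helper, the metadata values are placeholders, and the exact git arguments the shell tool uses are not documented on this page, so the commands shown are an assumption.

```python
def plan_shell_setup(metadata: dict) -> list:
    """Return the git commands to run before the first shell command.

    Assumption: a plain fetch followed by a checkout of the given commit,
    both executed against /testbed.
    """
    if "image_id" not in metadata:
        raise KeyError("context.metadata must contain 'image_id'")
    commit = metadata.get("git_commit_hash")
    if not commit:
        return []  # nothing to check out; use the image as shipped
    return [
        ["git", "-C", "/testbed", "fetch"],
        ["git", "-C", "/testbed", "checkout", commit],
    ]


# Placeholder values, not real image names or commits.
metadata = {"image_id": "example/task-image:latest", "git_commit_hash": "abc123"}
for command in plan_shell_setup(metadata):
    print(" ".join(command))
```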

File workspace

Defined in agentfly.tools.src.file.tools (also re-exported from agentfly.tools).

read_file(path: str, start_line: Optional[int] = None, end_line: Optional[int] = None, context: Context = None) async

Reads a file from the workspace with line numbers.

Parameters:

  • path (str) –

    The relative path to the file (under the container workspace).

  • start_line (Optional[int]) –

    Start line number to read from.

  • end_line (Optional[int]) –

    End line number to read to.
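The line-numbered output can be pictured with a short sketch. `number_lines` is a hypothetical helper, and three details are assumptions because the page does not state them: line numbers start at 1, `end_line` is inclusive, and each line is rendered as `N: text`.

```python
from typing import Optional


def number_lines(
    text: str,
    start_line: Optional[int] = None,
    end_line: Optional[int] = None,
) -> str:
    """Return the selected lines of `text`, each prefixed with its number."""
    lines = text.splitlines()
    # Clamp the requested range to the lines that actually exist.
    first = 1 if start_line is None else max(start_line, 1)
    last = len(lines) if end_line is None else min(end_line, len(lines))
    return "\n".join(f"{n}: {lines[n - 1]}" for n in range(first, last + 1))


source = "import os\n\ndef main():\n    pass\n"
print(number_lines(source, start_line=3, end_line=4))
```

Numbered output lets the agent refer to exact locations when it follows up with an edit.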

Search for a regex pattern across all files in a directory.

Parameters:

  • pattern –

    The regex/string to search for.

  • path –

    Directory to search in.

  • include –

    Glob filter, e.g. "*.py" for Python files only.
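A directory search with a glob filter can be sketched with the standard library alone. `search_files` is a hypothetical name (the tool's signature does not appear on this page), and the `(file, line number, line)` result shape is an assumption.

```python
import fnmatch
import os
import re
import tempfile


def search_files(pattern: str, path: str, include=None) -> list:
    """Return (file path, line number, line text) for every matching line."""
    regex = re.compile(pattern)
    hits = []
    for root, _dirs, files in os.walk(path):
        for name in sorted(files):
            # Apply the glob filter to the file name, if one was given.
            if include and not fnmatch.fnmatch(name, include):
                continue
            full_path = os.path.join(root, name)
            try:
                with open(full_path, encoding="utf-8", errors="ignore") as handle:
                    for number, line in enumerate(handle, start=1):
                        if regex.search(line):
                            hits.append((full_path, number, line.rstrip("\n")))
            except OSError:
                continue  # unreadable file: skip it rather than abort
    return hits


# Demo in a throwaway directory: only the .py file passes the filter.
with tempfile.TemporaryDirectory() as demo_root:
    with open(os.path.join(demo_root, "a.py"), "w", encoding="utf-8") as fh:
        fh.write("import os\ndef foo():\n    pass\n")
    with open(os.path.join(demo_root, "notes.txt"), "w", encoding="utf-8") as fh:
        fh.write("def foo is mentioned here\n")
    hits = [
        (os.path.basename(p), n, text)
        for p, n, text in search_files(r"def \w+", demo_root, include="*.py")
    ]

print(hits)
```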

create_file(path: str, content: str = '', context: Context = None) async

Create a new file under the workspace. Fails if the path already exists.

Parameters:

  • path (str) –

    Relative path for the new file.

  • content (str) –

    Initial file body (default empty).

edit_file(path: str, search_block: str, replace_block: str, context: Context) async

Surgically replaces a block of text in a file. Only replaces the first occurrence of search_block.

Parameters:

  • path (str) –

    Path to the file.

  • search_block (str) –

    Exact text to find.

  • replace_block (str) –

    Text to insert.
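The first-occurrence rule matters when the same text appears more than once. The sketch below shows it on an in-memory string. `replace_first` is a hypothetical helper, and raising when the block is missing is an assumption, since the page does not say what the tool does in that case.

```python
def replace_first(text: str, search_block: str, replace_block: str) -> str:
    """Replace only the first exact occurrence of `search_block`."""
    if search_block not in text:
        # Assumption: treat a missing block as an error, not a silent no-op.
        raise ValueError("search_block not found")
    return text.replace(search_block, replace_block, 1)


original = "x = 1\ny = 0\nx = 1\n"
print(replace_first(original, "x = 1", "x = 2"))
```

Because later occurrences are left untouched, including a line or two of surrounding context in search_block is the usual way to target a specific one.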

undo_edit(path: str, context: Context) async

Reverts the last modification made to a specific file.

Parameters:

  • path (str) –

    The path of the file to revert.
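One way to support a per-file undo is to keep a stack of previous contents for each path. The sketch below uses an in-memory store for illustration. `EditHistory` is hypothetical, and whether the real helper keeps one snapshot or many is not stated here.

```python
class EditHistory:
    """In-memory file store that remembers earlier versions per path."""

    def __init__(self):
        self._files = {}     # path -> current content
        self._previous = {}  # path -> stack of earlier contents

    def write(self, path: str, content: str) -> None:
        """Store new content, saving the old version for a later undo."""
        if path in self._files:
            self._previous.setdefault(path, []).append(self._files[path])
        self._files[path] = content

    def undo(self, path: str) -> bool:
        """Restore the previous version; return False if none is saved."""
        stack = self._previous.get(path)
        if not stack:
            return False
        self._files[path] = stack.pop()
        return True

    def read(self, path: str) -> str:
        return self._files[path]


history = EditHistory()
history.write("app.py", "v1")
history.write("app.py", "v2")
history.undo("app.py")
print(history.read("app.py"))
```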

run_python(path: str, timeout: int = 60, context: Context = None) async

Executes a Python script file under the workspace (runs python3 path with the workspace root as the working directory). It runs Python scripts only.

Parameters:

  • path (str) –

    Relative path to a .py file inside the workspace.

  • timeout (int) –

    Maximum number of seconds for the subprocess (default 60).
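The timeout and working-directory behavior can be sketched locally. `run_script` is a hypothetical helper that runs on the host rather than in the task container, and both the timeout message and the way stdout and stderr are combined are assumptions.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_script(workspace: str, path: str, timeout: int = 60) -> str:
    """Run `path` (relative to `workspace`) and return its combined output."""
    try:
        proc = subprocess.run(
            [sys.executable, path],
            cwd=workspace,  # relative paths resolve against the workspace root
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising this exception.
        return f"Timed out after {timeout} seconds"
    return proc.stdout + proc.stderr


with tempfile.TemporaryDirectory() as workspace:
    Path(workspace, "hello.py").write_text("print('hi')\n")
    Path(workspace, "slow.py").write_text("import time\ntime.sleep(5)\n")
    quick_output = run_script(workspace, "hello.py")
    slow_output = run_script(workspace, "slow.py", timeout=1)

print(quick_output, slow_output)
```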

Shell

run_shell_command(cmd: str, context: Context) async

Runs a shell command in the container workspace.

Commands are checked by CommandFilter (command_filter.CommandFilter) before execution; blocked commands return the filter reason (e.g. Blocked: ...) without running.

Parameters:

  • cmd (str) –

    The shell command to run (e.g. "ls -la", "cat file.txt", "pwd"). For multi-line python -c code, use bash ANSI-C quoting so newlines work: python3 -c $'line1\nline2\nline3' (the \n are real newlines inside the container).

  • context (Context) –

    Injected rollout context; used to acquire the container resource.
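Building the ANSI-C quoted form by hand is error-prone once the code contains quotes or backslashes. The sketch below shows one way to produce the `cmd` string for multi-line `python3 -c` code. `ansi_c_quote` is a hypothetical helper, not part of the library.

```python
def ansi_c_quote(text: str) -> str:
    """Wrap `text` in bash $'...' quoting, escaping what bash would interpret."""
    escaped = (
        text.replace("\\", "\\\\")  # escape backslashes first
        .replace("'", "\\'")        # then single quotes
        .replace("\n", "\\n")       # newlines become the two characters \n
        .replace("\t", "\\t")
    )
    return "$'" + escaped + "'"


code = "import sys\nprint('ok', file=sys.stdout)"
cmd = "python3 -c " + ansi_c_quote(code)
print(cmd)
```

The resulting string is a single line that can be passed as `cmd`; bash expands each `\n` back into a newline before Python sees the code.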

Retrieval

async_dense_retrieve(query: str) async