Code Execution Reward

async code_reward_test(prediction: str, context: Context) -> dict

Run code in the rollout's Python sandbox and return reward. Uses the same sandbox as the code tool (context.acquire_resource with scope=rollout). Caller must pass context when invoking this reward.

Source code in src/agentfly/rewards/code_reward.py
@reward(name="code_reward_test")
async def code_reward_test(prediction: str, context: Context) -> dict:
    """
    Run code in the rollout's Python sandbox and return reward.
    Uses the same sandbox as the code tool (context.acquire_resource with scope=rollout).
    Caller must pass context when invoking this reward.
    """
    try:
        env = await context.acquire_resource(
            spec=PythonSandboxSpec, scope="rollout", backend="local"
        )
        result = await env.step(prediction)
        return {"reward": 1.0, "output": result}
    except Exception as e:
        return {"reward": 0.0, "output": str(e)}

Description

Evaluates code execution in a sandboxed Python environment, providing binary success/failure feedback. Returns a dict with reward (1.0 on success, 0.0 on error) and output (stdout/stderr or exception message).
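Because every call returns the same two-key dict, results from several rollouts are easy to aggregate. The following is a minimal sketch of that, not part of the library: `summarize_rewards` is a hypothetical helper, and the sample dicts are hand-written to match the shape described above.

```python
from typing import Iterable


def summarize_rewards(results: Iterable[dict]) -> dict:
    """Aggregate dicts shaped like {"reward": float, "output": ...}.

    Hypothetical helper for illustration; it only relies on the
    "reward" and "output" keys described in this section.
    """
    results = list(results)
    # Collect the output of every result that did not earn the full reward.
    failures = [str(r["output"]) for r in results if r["reward"] != 1.0]
    passed = len(results) - len(failures)
    pass_rate = passed / len(results) if results else 0.0
    return {"pass_rate": pass_rate, "failures": failures}


# Hand-written sample results in the documented shape.
example = [
    {"reward": 1.0, "output": "Hello, World!"},
    {"reward": 0.0, "output": "NameError: name 'undefined_variable' is not defined"},
]
summary = summarize_rewards(example)
print(summary["pass_rate"])
print(summary["failures"])
```

Keeping the failure outputs alongside the pass rate preserves the debugging detail that the output field is meant to carry.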

Configuration:
- Decorator: @reward(name="code_reward_test")
- Resource spec: PythonSandboxSpec (passed to context.acquire_resource)
- Backend: "local" (passed to context.acquire_resource)

Technical Details

Implementation:
- Executes code in isolated Python sandbox
- Captures both successful outputs and exceptions
- Returns binary reward based on execution success
- Provides detailed output for debugging

Error Handling:
- Catches all exceptions during code execution
- Returns error details in output field
- Ensures safe evaluation without affecting host system
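The catch-all pattern above can be tried without the library. The sketch below is a stand-in under stated assumptions: `ToySandbox` and `toy_code_reward` are hypothetical names, and `ToySandbox` runs code in-process with `exec`, so it offers none of the isolation of the real sandbox. It only illustrates how an exception raised by `step` becomes a 0.0 reward with the error text in output.

```python
import asyncio
import contextlib
import io


class ToySandbox:
    """Hypothetical stand-in for a sandbox. NOT isolated; illustration only."""

    async def step(self, code: str) -> str:
        buffer = io.StringIO()
        # Capture anything the code prints; exec raises on syntax or runtime errors.
        with contextlib.redirect_stdout(buffer):
            exec(code, {"__name__": "__toy_sandbox__"})
        return buffer.getvalue()


async def toy_code_reward(prediction: str, sandbox: ToySandbox) -> dict:
    """Mirror the try/except shape of the reward: 1.0 on success, 0.0 on any exception."""
    try:
        output = await sandbox.step(prediction)
        return {"reward": 1.0, "output": output}
    except Exception as e:
        # The error detail goes into the output field rather than propagating.
        return {"reward": 0.0, "output": f"{type(e).__name__}: {e}"}


ok = asyncio.run(toy_code_reward("print('Hello, World!')", ToySandbox()))
bad = asyncio.run(toy_code_reward("print(undefined_variable)", ToySandbox()))
print(ok)
print(bad)
```

Note that this toy version decides success purely by whether `step` raises; how the real sandbox reports errors from executed code is not specified here.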

Example Usage:

from agentfly.core import Context
from agentfly.rewards.code_reward import code_reward_test

# Inside a rollout, `context` is injected and passed through to rewards.
result = await code_reward_test(
    prediction="print('Hello, World!')",
    context=context,
)
print(result)
# {"reward": 1.0, "output": "Hello, World!"}

result = await code_reward_test(
    prediction="print(undefined_variable)",
    context=context,
)
print(result)
# {"reward": 0.0, "output": "NameError: name 'undefined_variable' is not defined"}

Use Cases:
- Code generation evaluation
- Programming task assessment
- Syntax and runtime error detection
- Training code-writing agents

Environment Integration:
- Requires active Python sandbox environment
- Isolated execution prevents system interference
- Supports concurrent code evaluation through pooling
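How the sandbox pool is implemented is not described here, so the following sketch only illustrates the general idea of bounded concurrent evaluation. It assumes a hypothetical `stub_reward` coroutine in place of the real reward, and uses an `asyncio.Semaphore` as a stand-in for a fixed-size pool; `evaluate_batch` and `max_concurrent` are illustrative names, not library API.

```python
import asyncio


async def stub_reward(prediction: str) -> dict:
    """Hypothetical stand-in for a reward call; fails if the text contains 'fail'."""
    await asyncio.sleep(0.01)  # simulate sandbox latency
    ok = "fail" not in prediction
    return {"reward": 1.0 if ok else 0.0, "output": prediction}


async def evaluate_batch(predictions: list[str], max_concurrent: int = 2) -> list[dict]:
    """Score predictions concurrently, with at most max_concurrent in flight."""
    semaphore = asyncio.Semaphore(max_concurrent)  # stand-in for a fixed-size pool

    async def bounded(prediction: str) -> dict:
        async with semaphore:
            return await stub_reward(prediction)

    # gather returns results in the same order as the inputs.
    return await asyncio.gather(*(bounded(p) for p in predictions))


batch_results = asyncio.run(evaluate_batch(["print(1)", "fail()", "print(2)"]))
rewards = [r["reward"] for r in batch_results]
print(rewards)
```

Capping the number of in-flight evaluations keeps a large batch from requesting more sandboxes than are available at once.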