
BaseReward

BaseReward(name: str | None = None, func: Callable | None = None, auto_register: bool = True)

Base class for reward functions.

Rewards can be defined either with a decorator or by inheritance. As with tools, every argument must be passed as a keyword argument; positional arguments are not accepted.

  1. Decorator-based (existing pattern):
@reward(name="my_reward")
def my_reward_function(prediction: str, answer: str) -> float:
    return 1.0 if prediction == answer else 0.0

# Call the decorated function with keyword arguments
result = await my_reward_function(prediction="hello", answer="hello")
  2. Inheritance-based (new pattern):
class MyReward(BaseReward):
    # Class-level metadata (shared across all instances)
    name = "my_reward"

    def __init__(self, threshold: float = 0.5):
        super().__init__()  # Class attributes are set at class definition time
        self.threshold = threshold  # Instance data

    def call(self, prediction: str, answer: str) -> float:
        """
        Calculate reward based on prediction and answer.

        Args:
            prediction (str): The predicted answer
            answer (str): The correct answer

        Returns:
            float: The reward value
        """
        return 1.0 if prediction == answer else 0.0

# Create instance and call with keyword arguments
my_reward = MyReward()
result = await my_reward(prediction="hello", answer="hello")
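
The keyword-only calling convention used in both patterns can be illustrated without the library. The sketch below is a hypothetical stand-in (KeywordOnlyReward and exact_match are invented for illustration and are not part of BaseReward); it assumes only that __call__ declares **kwargs, as the signature in this reference shows, so Python itself rejects positional arguments.

```python
import asyncio


class KeywordOnlyReward:
    """Hypothetical stand-in for BaseReward; not the library's implementation."""

    def __init__(self, func):
        self.func = func

    async def __call__(self, **kwargs):
        # Only **kwargs is declared, so a positional call fails at bind time.
        return self.func(**kwargs)


def exact_match(prediction: str, answer: str) -> float:
    return 1.0 if prediction == answer else 0.0


demo_reward = KeywordOnlyReward(exact_match)

# Keyword call: works.
keyword_result = asyncio.run(demo_reward(prediction="hello", answer="hello"))

# Positional call: raises TypeError before any coroutine is created.
try:
    demo_reward("hello", "hello")
    positional_error = None
except TypeError as exc:
    positional_error = exc
```

In this sketch the positional call fails immediately, which is why the examples above always spell out prediction= and answer=.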

Note: Metadata such as name is stored in class attributes. Stateful rewards obtain their resources through a Context that the caller injects: call context.acquire_resource(spec=..., scope="rollout", backend="local").

Parameters:

  • name (str | None, default: None ) –

    Optional name override (for decorator-based rewards)

  • func (Callable | None, default: None ) –

    Optional function (for decorator-based rewards)

  • auto_register (bool, default: True ) –

    Whether to automatically register this reward instance (defaults to True)

Note
  • For stateful rewards, accept context: Context and call context.acquire_resource(spec=..., scope="rollout", backend="local").
  • The caller must pass context (and env, if the reward uses one) as keyword arguments when invoking the reward.
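
The shape of a stateful reward might look like the sketch below. It does not import the library: StubContext is a hypothetical stand-in for Context, PLACEHOLDER_SPEC stands in for the spec value that this reference leaves unspecified, and treating acquire_resource as awaitable is an assumption (drop the await if the real method is synchronous).

```python
import asyncio
from typing import Any

# Hypothetical placeholder; the real spec type is not described in this reference.
PLACEHOLDER_SPEC = object()


class StubContext:
    """Hypothetical stand-in for Context; the real class may differ."""

    def __init__(self) -> None:
        self.requests: list = []

    async def acquire_resource(self, *, spec: Any, scope: str, backend: str) -> dict:
        # Assumption: the call is awaitable and returns some resource handle.
        self.requests.append((scope, backend))
        return {"spec": spec, "scope": scope, "backend": backend}


async def stateful_reward(context: StubContext, prediction: str, answer: str) -> float:
    # Acquire the per-rollout resource through the injected context.
    resource = await context.acquire_resource(
        spec=PLACEHOLDER_SPEC, scope="rollout", backend="local"
    )
    assert resource["scope"] == "rollout"
    return 1.0 if prediction == answer else 0.0


context = StubContext()
# The caller supplies context alongside the other keyword arguments.
stateful_result = asyncio.run(
    stateful_reward(context=context, prediction="hello", answer="hello")
)
```

The point of the sketch is the division of labour: the reward only declares a context parameter, and the caller is responsible for supplying it.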

Functions

__call__(**kwargs)

Allow rewards to be called directly as functions.

Parameters:

  • **kwargs –

    All arguments as keyword arguments (e.g. context, prediction, env).

Returns:

  • –

    The reward value or a dict, returned directly for a synchronous reward or as a coroutine for an asynchronous one.
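
Because the return value may be either a direct result or a coroutine, calling code may want to handle both cases uniformly. The helper below is a minimal sketch using only the standard library; resolve_reward, sync_reward, and async_reward are hypothetical names and not part of this API.

```python
import asyncio
import inspect


async def resolve_reward(value):
    """Await the value if it is awaitable; otherwise return it unchanged."""
    if inspect.isawaitable(value):
        return await value
    return value


def sync_reward(**kwargs) -> float:
    # Stand-in for a reward whose call returns a direct result.
    return 1.0 if kwargs["prediction"] == kwargs["answer"] else 0.0


async def async_reward(**kwargs) -> float:
    # Stand-in for a reward whose call returns a coroutine.
    return 1.0 if kwargs["prediction"] == kwargs["answer"] else 0.0


async def main():
    direct = await resolve_reward(sync_reward(prediction="a", answer="a"))
    awaited = await resolve_reward(async_reward(prediction="a", answer="b"))
    return direct, awaited


results = asyncio.run(main())
```

With a helper like this, the caller does not need to know in advance whether a given reward was written as a sync or async function.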

reward(name: Optional[str] = None, auto_register: bool = True)

Decorator that creates a BaseReward and registers it.

Stateful rewards should accept context: Context and call context.acquire_resource(spec=..., scope="rollout", backend="local"). The caller must pass context (and any other args) when invoking the reward.

Parameters:

  • name (Optional[str], default: None ) –

    The name of the reward (defaults to function name)

  • auto_register (bool, default: True ) –

    Whether to automatically register in REWARD_REGISTRY

Returns:

  • –

    A BaseReward class (callable)
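
To make the roles of name and auto_register concrete, here is a minimal sketch of a decorator factory that follows the same pattern. It is not the library's implementation: DemoReward, demo_reward, and DEMO_REGISTRY are hypothetical stand-ins for BaseReward, reward, and REWARD_REGISTRY, and the registry is assumed to be a plain dict keyed by name.

```python
from typing import Callable, Optional

# Hypothetical stand-in for REWARD_REGISTRY.
DEMO_REGISTRY: dict = {}


class DemoReward:
    """Hypothetical stand-in for BaseReward."""

    def __init__(self, name: str, func: Callable) -> None:
        self.name = name
        self.func = func

    def __call__(self, **kwargs):
        return self.func(**kwargs)


def demo_reward(name: Optional[str] = None, auto_register: bool = True):
    def decorator(func: Callable) -> DemoReward:
        # The name falls back to the function name when none is given.
        wrapped = DemoReward(name or func.__name__, func)
        if auto_register:
            DEMO_REGISTRY[wrapped.name] = wrapped
        return wrapped

    return decorator


@demo_reward(name="exact_match")
def exact_match_fn(prediction: str, answer: str) -> float:
    return 1.0 if prediction == answer else 0.0


@demo_reward(auto_register=False)
def unregistered_fn(prediction: str, answer: str) -> float:
    return 0.0
```

In this sketch, exact_match_fn is stored under the key "exact_match", while unregistered_fn keeps its function name as its reward name but never enters the registry.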