BaseReward

`BaseReward(name: str | None = None, func: Callable | None = None, auto_register: bool = True)`

Base class for reward functions.

Supports both decorator-based and inheritance-based definitions. As with tools, all arguments must be passed as keyword arguments (that is, as a dictionary of named arguments).

Decorator-based (existing pattern):

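@reward(name="my_reward")
async def my_reward_function(prediction: str, answer: str) -> float:
    return 1.0 if prediction == answer else 0.0

# Call with keyword arguments (from inside an async function).
# The Python name bound by the decorator is my_reward_function; "my_reward" is the reward's name.
result = await my_reward_function(prediction="hello", answer="hello")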

Inheritance-based (new pattern):

class MyReward(BaseReward):
    # Class-level metadata (shared across all instances)
    name = "my_reward"

    def __init__(self, threshold: float = 0.5):
        super().__init__()  # Class attributes are set at class definition time
        self.threshold = threshold  # Instance data

    async def call(self, prediction: str, answer: str) -> float:
        """
        Calculate reward based on prediction and answer.

        Args:
            prediction (str): The predicted answer
            answer (str): The correct answer

        Returns:
            float: The reward value
        """
        return 1.0 if prediction == answer else 0.0

# Create instance and call with keyword arguments
my_reward = MyReward()
result = await my_reward(prediction="hello", answer="hello")

Note: Metadata (name, etc.) is stored in class attributes. Stateful rewards obtain their resources through a Context injected by the caller: call `context.acquire_resource(spec=..., scope="rollout", backend="local")`.

Parameters:

name (str | None, default: None) – Optional name override (for decorator-based rewards).

func (Callable | None, default: None) – Optional function (for decorator-based rewards).

auto_register (bool, default: True) – Whether to automatically register this reward instance.

Note

For stateful rewards, accept a `context: Context` parameter and call `context.acquire_resource(spec=..., scope="rollout", backend="local")`.
The caller must pass `context` (and any `env`) as keyword arguments when invoking the reward.
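
A minimal sketch of the stateful pattern described in this note. `StubContext` and the `"scorer"` spec value are hypothetical stand-ins so the snippet runs on its own; only the `acquire_resource(spec=..., scope="rollout", backend="local")` call shape comes from the text above. Treating `acquire_resource` as a coroutine is also an assumption, not something this page confirms.

```python
import asyncio
from typing import Any


class StubContext:
    """Hypothetical stand-in for the library's Context (assumption, not the real class)."""

    def __init__(self) -> None:
        self.acquired: list[dict[str, Any]] = []

    async def acquire_resource(self, spec: Any, scope: str, backend: str) -> dict[str, Any]:
        # Record the request and hand back a fake resource handle.
        self.acquired.append({"spec": spec, "scope": scope, "backend": backend})
        return {"handle": f"{backend}:{scope}", "spec": spec}


async def stateful_reward(context: StubContext, prediction: str, answer: str) -> float:
    # The reward receives the context from the caller and asks it for a resource.
    await context.acquire_resource(
        spec="scorer",  # placeholder spec value, purely illustrative
        scope="rollout",
        backend="local",
    )
    return 1.0 if prediction == answer else 0.0


async def main() -> tuple[float, StubContext]:
    context = StubContext()
    # The caller passes context (and everything else) as keyword arguments.
    value = await stateful_reward(context=context, prediction="hello", answer="hello")
    return value, context


stateful_value, used_context = asyncio.run(main())
```

The point of the sketch is the division of responsibility: the reward only declares that it needs a `context`, and the caller decides which context to supply.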

Functions

`__call__(**kwargs)`

Allow rewards to be called directly as functions.

Parameters:

**kwargs – All arguments, passed by keyword (e.g. context, prediction, env).

Returns:

reward_value_or_dict or coroutine – A coroutine if the reward is async; otherwise the reward value (or dict) directly.
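
Because a call may return either a direct result or a coroutine, calling code that does not know which kind of reward it holds can check the result before awaiting it. The helper below is a sketch of that idea, not part of the library: `resolve_reward`, the two stand-in rewards, and the `"reward"` dict key are all hypothetical names chosen for illustration.

```python
import asyncio
import inspect
from typing import Any, Callable


async def resolve_reward(reward_fn: Callable[..., Any], **kwargs: Any) -> Any:
    """Call a reward and await the result only if it is awaitable."""
    result = reward_fn(**kwargs)
    if inspect.isawaitable(result):
        result = await result
    return result


# Stand-in rewards used only for this illustration.
def sync_exact_match(prediction: str, answer: str) -> float:
    return 1.0 if prediction == answer else 0.0


async def async_exact_match(prediction: str, answer: str) -> dict:
    return {"reward": 1.0 if prediction == answer else 0.0}


async def demo() -> tuple:
    direct = await resolve_reward(sync_exact_match, prediction="a", answer="a")
    awaited = await resolve_reward(async_exact_match, prediction="a", answer="b")
    return direct, awaited


direct_value, awaited_value = asyncio.run(demo())
```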

`reward(name: Optional[str] = None, auto_register: bool = True)`

Decorator that creates a BaseReward and registers it.

Stateful rewards should accept a `context: Context` parameter and call `context.acquire_resource(spec=..., scope="rollout", backend="local")`. The caller must pass `context` (and any other arguments) by keyword when invoking the reward.

Parameters:

name (Optional[str], default: None) – The name of the reward (defaults to the function name).

auto_register (bool, default: True) – Whether to automatically register the reward in REWARD_REGISTRY.

Returns:

A BaseReward class (callable)
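
To illustrate how a decorator with `name` and `auto_register` options can wrap a function and optionally record it in a registry, here is a self-contained sketch of the general pattern. It is not the library's implementation: `SketchReward`, `sketch_reward`, and `SKETCH_REGISTRY` are hypothetical stand-ins for `BaseReward`, `reward`, and `REWARD_REGISTRY`.

```python
from typing import Callable, Optional

SKETCH_REGISTRY: dict = {}  # hypothetical stand-in for REWARD_REGISTRY


class SketchReward:
    """Hypothetical callable wrapper; not the library's BaseReward."""

    def __init__(self, name: str, func: Callable[..., float]) -> None:
        self.name = name
        self.func = func

    def __call__(self, **kwargs) -> float:
        # Keyword-only invocation, mirroring the convention described above.
        return self.func(**kwargs)


def sketch_reward(name: Optional[str] = None, auto_register: bool = True):
    def decorator(func: Callable[..., float]) -> SketchReward:
        # Fall back to the function's own name when no name is given.
        wrapped = SketchReward(name or func.__name__, func)
        if auto_register:
            SKETCH_REGISTRY[wrapped.name] = wrapped
        return wrapped

    return decorator


@sketch_reward()
def exact_match(prediction: str, answer: str) -> float:
    return 1.0 if prediction == answer else 0.0


@sketch_reward(name="length_ok", auto_register=False)
def short_enough(prediction: str, limit: int) -> float:
    return 1.0 if len(prediction) <= limit else 0.0
```

In this sketch `exact_match` is registered under its function name, while `short_enough` is wrapped under the name `length_ok` but left out of the registry because `auto_register=False`.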
