BaseReward
BaseReward(name: str | None = None, func: Callable | None = None, auto_register: bool = True)
Base class for reward functions.
Supports both decorator-based and inheritance-based definitions. All arguments must be passed as keyword arguments (a dictionary of named arguments), as with tools.
- Decorator-based (existing pattern):

      @reward(name="my_reward")
      def my_reward_function(prediction: str, answer: str) -> float:
          return 1.0 if prediction == answer else 0.0

      # The decorator binds the reward object to the function's name.
      # Call it with keyword arguments (inside an async function):
      result = await my_reward_function(prediction="hello", answer="hello")
- Inheritance-based (new pattern):

      class MyReward(BaseReward):
          # Class-level metadata (shared across all instances)
          name = "my_reward"

          def __init__(self, threshold: float = 0.5):
              super().__init__()  # Class attributes are set at class definition time
              self.threshold = threshold  # Instance data

          def call(self, prediction: str, answer: str) -> float:
              """
              Calculate reward based on prediction and answer.

              Args:
                  prediction (str): The predicted answer
                  answer (str): The correct answer

              Returns:
                  float: The reward value
              """
              return 1.0 if prediction == answer else 0.0

      # Create an instance and call it with keyword arguments (inside an async function)
      my_reward = MyReward()
      result = await my_reward(prediction="hello", answer="hello")
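To illustrate how one base class can serve both patterns, here is a self-contained sketch. It is hypothetical and not this library's source: `MiniReward`, `mini_reward`, `exact_match`, and `ThresholdReward` are invented names, and letting `call` be either a plain function or a coroutine is our assumption. The decorator wraps a plain function, the subclass overrides `call`, and `__call__` accepts keyword arguments only.

```python
import asyncio
import inspect
from typing import Any, Callable, Optional


class MiniReward:
    """Hypothetical stand-in for BaseReward (illustration only)."""

    # Class-level metadata, shared by all instances unless overridden.
    name: Optional[str] = None

    def __init__(
        self,
        name: Optional[str] = None,
        func: Optional[Callable[..., Any]] = None,
    ) -> None:
        self.func = func
        if name is not None:
            self.name = name  # explicit override (decorator case)
        elif func is not None and self.name is None:
            self.name = func.__name__  # fall back to the function's name

    def call(self, **kwargs: Any) -> Any:
        # Decorator-based rewards delegate to the wrapped function;
        # subclasses override this method instead.
        if self.func is None:
            raise NotImplementedError("subclasses must implement call()")
        return self.func(**kwargs)

    async def __call__(self, **kwargs: Any) -> Any:
        # Accepting only **kwargs means positional arguments raise TypeError.
        result = self.call(**kwargs)
        if inspect.isawaitable(result):
            result = await result
        return result


def mini_reward(name: Optional[str] = None) -> Callable[[Callable[..., Any]], MiniReward]:
    # Decorator factory: wraps a plain function in a MiniReward instance.
    def decorator(func: Callable[..., Any]) -> MiniReward:
        return MiniReward(name=name, func=func)

    return decorator


@mini_reward(name="exact_match")
def exact_match(prediction: str, answer: str) -> float:
    return 1.0 if prediction == answer else 0.0


class ThresholdReward(MiniReward):
    name = "threshold_reward"

    def __init__(self, threshold: float = 0.5) -> None:
        super().__init__()
        self.threshold = threshold  # instance data

    def call(self, prediction: float, answer: float) -> float:
        return 1.0 if abs(prediction - answer) <= self.threshold else 0.0


async def main() -> None:
    print(await exact_match(prediction="hello", answer="hello"))  # 1.0
    print(await ThresholdReward()(prediction=0.9, answer=1.0))  # 1.0


asyncio.run(main())
```

Routing both patterns through a single async `__call__` keeps the calling convention identical for callers, regardless of how the reward was defined.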
Note: Metadata (name, etc.) is stored in class attributes. Stateful rewards obtain resources through a Context injected by the caller: call context.acquire_resource(spec=..., scope="rollout", backend="local").
Parameters:
- name (str | None, default: None) – Optional name override (for decorator-based rewards).
- func (Callable | None, default: None) – Optional function to wrap (for decorator-based rewards).
- auto_register (bool, default: True) – Whether to automatically register this reward instance.
Note:
- For stateful rewards, accept context: Context and call context.acquire_resource(spec=..., scope="rollout", backend="local").
- The caller must pass context (and any env) in kwargs when invoking the reward.
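The calling convention for a stateful reward can be sketched as follows. This assumes a stand-in `FakeContext` class (hypothetical) in place of the real Context. The stub's `acquire_resource` is synchronous and returns a dict; the real method's return type, the format of `spec`, and whether it must be awaited are not documented here. The `{"kind": "sandbox"}` spec and the `sandboxed_match` reward are likewise invented for illustration.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class FakeContext:
    """Hypothetical stand-in for Context; records each acquire_resource call."""

    calls: List[Dict[str, Any]] = field(default_factory=list)

    def acquire_resource(self, spec: Any, scope: str, backend: str) -> Dict[str, Any]:
        # Stub: returns a dict describing the request instead of a real resource.
        request = {"spec": spec, "scope": scope, "backend": backend}
        self.calls.append(request)
        return request


async def sandboxed_match(prediction: str, answer: str, context: FakeContext) -> float:
    # The reward acquires its resource from the injected context
    # rather than creating it itself.
    resource = context.acquire_resource(spec={"kind": "sandbox"}, scope="rollout", backend="local")
    # A real reward would use `resource` here; this sketch only compares strings.
    del resource
    return 1.0 if prediction.strip() == answer.strip() else 0.0


async def main() -> None:
    ctx = FakeContext()
    # The caller passes `context` alongside the other keyword arguments.
    score = await sandboxed_match(prediction=" 42", answer="42", context=ctx)
    print(score, len(ctx.calls))  # 1.0 1


asyncio.run(main())
```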
reward(name: Optional[str] = None, auto_register: bool = True)
Decorator that creates a BaseReward and, unless auto_register is False, registers it in REWARD_REGISTRY.
Stateful rewards should accept context: Context and call context.acquire_resource(spec=..., scope="rollout", backend="local"). The caller must pass context (and any other args) when invoking the reward.
Parameters:
- name (Optional[str], default: None) – The name of the reward (defaults to the function name).
- auto_register (bool, default: True) – Whether to automatically register the reward in REWARD_REGISTRY.
Returns:
- A BaseReward class (callable).
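The effect of the decorator's `name` and `auto_register` parameters can be pictured with the following hypothetical sketch. `RewardWrapper`, `demo_reward`, and `DEMO_REGISTRY` are invented stand-ins for BaseReward, `reward`, and REWARD_REGISTRY; keying the registry by reward name is an assumption, not documented behavior.

```python
import asyncio
from typing import Any, Callable, Dict, Optional


class RewardWrapper:
    """Hypothetical stand-in for the BaseReward object the decorator returns."""

    def __init__(self, name: str, func: Callable[..., Any]) -> None:
        self.name = name
        self.func = func

    async def __call__(self, **kwargs: Any) -> Any:
        return self.func(**kwargs)


# Hypothetical registry keyed by reward name (stand-in for REWARD_REGISTRY).
DEMO_REGISTRY: Dict[str, RewardWrapper] = {}


def demo_reward(
    name: Optional[str] = None, auto_register: bool = True
) -> Callable[[Callable[..., Any]], RewardWrapper]:
    def decorator(func: Callable[..., Any]) -> RewardWrapper:
        # `name` defaults to the decorated function's name.
        wrapped = RewardWrapper(name=name if name is not None else func.__name__, func=func)
        if auto_register:
            DEMO_REGISTRY[wrapped.name] = wrapped
        return wrapped

    return decorator


@demo_reward()
def short_answer(prediction: str, max_len: int = 10) -> float:
    return 1.0 if len(prediction) <= max_len else 0.0


@demo_reward(name="scratch", auto_register=False)
def scratch_reward(prediction: str) -> float:
    return 0.0


print(sorted(DEMO_REGISTRY))  # ['short_answer']
print(asyncio.run(short_answer(prediction="short")))  # 1.0
```

In this sketch, `auto_register=False` keeps a reward out of the shared registry, which can be handy for throwaway or test rewards.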