Math Reward Functions
Math reward functions evaluate agent performance on mathematical problem-solving tasks with various behavioral requirements.
math_equal_reward
math_equal_reward(final_response: str, answer: str, **kwargs) -> float
Calculate the reward for the agent's response based on mathematical equality:
- 1.0 if the agent's predicted answer is mathematically equal to the correct answer
- 0.0 otherwise
Parameters:
- final_response (str) – The agent's predicted answer
- answer (str) – The correct answer
- **kwargs – Additional arguments (ignored)
Returns:
- float – The reward value
Source code in src/agentfly/rewards/math_reward.py
Returns 1.0 if the agent's answer is mathematically equivalent to the gold answer, 0.0 otherwise.
math_equal_reward_tool
math_equal_reward_tool(final_response: str, answer: str, trajectory: List[Dict]) -> float
Calculate the reward for the agent's response based on mathematical equality and tool usage: 0.0 if no tool was used, 0.1 if a tool was used but the answer is incorrect, and 1.0 if a tool was used and the answer is correct.
Parameters:
- final_response (str) – The agent's predicted answer
- answer (str) – The correct answer
- trajectory (List[Dict]) – The agent's conversation trajectory
Returns:
- dict – A dictionary containing the reward and accuracy:
  - reward (float): The reward value
  - acc (float): The accuracy value
Source code in src/agentfly/rewards/math_reward.py
Like math_equal_reward but gated on tool usage:
- 0.0 if no tool used
- 0.1 if tool used but answer incorrect
- 1.0 if tool used and answer correct
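The three tiers above can be sketched as a small gating function. This is an illustrative sketch only, not the code in math_reward.py: `used_tool` and `tool_gated_reward` are hypothetical names, and the correctness check is passed in as a plain boolean rather than computed.

```python
from typing import Dict, List


def used_tool(trajectory: List[Dict]) -> bool:
    """Return True if any message in the trajectory has the "tool" role."""
    return any(message.get("role") == "tool" for message in trajectory)


def tool_gated_reward(is_correct: bool, trajectory: List[Dict]) -> float:
    """Hypothetical helper mirroring the three documented reward tiers."""
    if not used_tool(trajectory):
        return 0.0  # no tool used, regardless of correctness
    return 1.0 if is_correct else 0.1


trajectory = [
    {"role": "assistant", "content": "I need to calculate..."},
    {"role": "tool", "content": "calculation result"},
    {"role": "assistant", "content": "The answer is \\boxed{42}"},
]

print(tool_gated_reward(True, trajectory))       # tool used, correct answer
print(tool_gated_reward(False, trajectory))      # tool used, wrong answer
print(tool_gated_reward(True, trajectory[:1]))   # no tool message at all
```

Note that in this sketch a correct answer without any tool call still scores 0.0, which is what makes the reward process-based rather than purely outcome-based.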
math_equal_reward_think
math_equal_reward_think(final_response: str, answer: str, trajectory: List[Dict]) -> float
This reward function is used to reward the agent for using thinking and tool calling. The reward contains two parts:
format: if there is
Source code in src/agentfly/rewards/math_reward.py
Adds a structured-thinking requirement on top of math_equal_reward_tool: assistant turns must use <think>...</think> tags.
Technical Details
Symbolic Math Comparison:
- Uses sympy for mathematical equivalence checking
- Handles LaTeX expressions via parse_latex
- Supports boxed answers: \boxed{expression}
- Robust to formatting differences
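To make the boxed-answer handling concrete, here is a minimal standard-library sketch of extracting the contents of the last \boxed{...} and comparing two answers. It is a simplified stand-in, not the library's implementation: the real check is described as using sympy and parse_latex, whereas this sketch only compares plain numbers via `fractions.Fraction`. The helper names `extract_boxed` and `numeric_equal` are hypothetical.

```python
from fractions import Fraction
from typing import Optional


def extract_boxed(text: str) -> Optional[str]:
    """Return the contents of the last \\boxed{...} in text, or None if absent or unbalanced."""
    start = text.rfind("\\boxed{")
    if start == -1:
        return None
    i = start + len("\\boxed{")
    depth = 1  # we are inside the opening brace of \boxed{
    chars = []
    while i < len(text):
        ch = text[i]
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return "".join(chars)
        chars.append(ch)
        i += 1
    return None  # closing brace never found


def numeric_equal(prediction: str, answer: str) -> bool:
    """Compare two answers as strings first, then as exact rational numbers."""

    def normalize(s: str) -> str:
        boxed = extract_boxed(s)
        s = boxed if boxed is not None else s
        return s.replace("$", "").replace(" ", "").strip()

    p, a = normalize(prediction), normalize(answer)
    if p == a:
        return True
    try:
        return Fraction(p) == Fraction(a)
    except (ValueError, ZeroDivisionError):
        return False  # not parseable as plain numbers


print(extract_boxed("The answer is \\boxed{\\frac{1}{2}}"))
print(numeric_equal("\\boxed{0.5}", "\\boxed{1/2}"))
```

Brace counting is used instead of a regular expression so that nested LaTeX such as \frac{1}{2} inside the box is returned whole.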
Trajectory Analysis:
- Parses conversation messages by role
- Detects tool usage through "tool" role messages
- Analyzes assistant responses for format compliance
- Supports both string and list content formats
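As an illustration of handling both content formats, the sketch below flattens a message's content to text and groups messages by role. It assumes that list-style content is a list of parts shaped like {"type": "text", "text": ...}; that shape, and the helper names `content_to_text` and `messages_by_role`, are assumptions for this example rather than something the page specifies.

```python
from typing import Dict, List, Optional, Union

Content = Optional[Union[str, List[Dict]]]


def content_to_text(content: Content) -> str:
    """Flatten string or list-style message content into plain text."""
    if content is None:
        return ""
    if isinstance(content, str):
        return content
    parts = []
    for part in content:
        # Assumed part shape: {"type": "text", "text": "..."}; other parts are skipped.
        if isinstance(part, dict) and part.get("type") == "text":
            parts.append(part.get("text", ""))
    return "".join(parts)


def messages_by_role(trajectory: List[Dict], role: str) -> List[str]:
    """Return the text of every message in the trajectory with the given role."""
    return [
        content_to_text(message.get("content"))
        for message in trajectory
        if message.get("role") == role
    ]


trajectory = [
    {"role": "assistant", "content": [{"type": "text", "text": "I need to calculate..."}]},
    {"role": "tool", "content": "calculation result"},
    {"role": "assistant", "content": "The answer is \\boxed{42}"},
]

print(messages_by_role(trajectory, "assistant"))
print(len(messages_by_role(trajectory, "tool")) > 0)  # tool usage detected
```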
Think Tag Parsing:
- Extracts content from <think>...</think> tags
- Handles incomplete or malformed thinking patterns
- Requires both thinking and answer components
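A minimal sketch of this kind of check, using only the standard library, might look as follows. The exact rules applied by math_equal_reward_think are not shown on this page, so two assumptions are made here: an unclosed <think> tag counts as no thinking at all, and any non-empty text outside the tags counts as the answer component. `split_think` and `has_think_and_answer` are hypothetical names.

```python
import re
from typing import Optional, Tuple

# Non-greedy match so only the first complete <think>...</think> pair is captured.
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_think(text: str) -> Tuple[Optional[str], str]:
    """Return (thinking, remainder); thinking is None without a complete tag pair."""
    match = THINK_RE.search(text)
    if match is None:
        return None, text.strip()
    thinking = match.group(1).strip()
    remainder = (text[: match.start()] + text[match.end():]).strip()
    return thinking, remainder


def has_think_and_answer(text: str) -> bool:
    """True only if both a non-empty thinking block and non-empty other text exist."""
    thinking, remainder = split_think(text)
    return bool(thinking) and bool(remainder)


print(has_think_and_answer("<think>2 + 2 = 4</think>The answer is \\boxed{4}"))
print(has_think_and_answer("<think>never closed... The answer is \\boxed{4}"))
```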
Common Usage Patterns:
from agentfly.rewards import math_equal_reward, math_equal_reward_tool

# Basic correctness
result = math_equal_reward(
    final_response="\\boxed{42}",
    answer="\\boxed{42}",
)
print(result)  # {"reward": 1.0}

# Tool usage requirement
trajectory = [
    {"role": "assistant", "content": "I need to calculate..."},
    {"role": "tool", "content": "calculation result"},
    {"role": "assistant", "content": "The answer is \\boxed{42}"},
]
result = math_equal_reward_tool(
    final_response="\\boxed{42}",
    answer="\\boxed{42}",
    trajectory=trajectory,
)
print(result)
Use Cases:
- Mathematical problem solving evaluation
- Process-based reward for tool usage
- Training structured reasoning behaviors
- Multi-step mathematical reasoning assessment