Math Reward Functions

Math reward functions score an agent's answers to mathematical problems. Beyond correctness, some variants also require specific behaviors, such as calling a tool or reasoning inside <think>...</think> tags.

math_equal_reward

`math_equal_reward(final_response: str, answer: str, **kwargs) -> float`

Calculate the reward for the agent's response based on mathematical equality:

- 1.0 if the agent's predicted answer is mathematically equal to the correct answer
- 0.0 otherwise

Parameters:

final_response (str) – The agent's predicted answer
answer (str) – The correct answer
**kwargs – Additional arguments (ignored)

Returns:

float – The reward value

Source code in src/agentfly/rewards/math_reward.py

```python
@reward(name="math_equal_reward")
def math_equal_reward(final_response: str, answer: str, **kwargs) -> float:
    """
    Calculate the reward for the agent's response based on mathematical equality.
    - 1.0 if the agent's predicted answer is mathematically equal to the correct answer
    - 0.0 otherwise

    Args:
        final_response (str): The agent's predicted answer
        answer (str): The correct answer
        **kwargs: Additional arguments (ignored)

    Returns:
        float: The reward value
    """
    if symbolic_math_equal(final_response, answer):
        return 1.0
    else:
        return 0.0
```

Returns 1.0 if the agent's answer is mathematically equivalent to the gold answer, 0.0 otherwise.
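The key point is that equivalence is checked on values, not on strings. The sketch below is a hypothetical, numbers-only stand-in for symbolic_math_equal (named toy_math_equal here) built on Python's fractions.Fraction. The real helper is described as using sympy and also handles symbolic and LaTeX expressions, which this sketch does not.

```python
from fractions import Fraction


def toy_math_equal(prediction: str, answer: str) -> bool:
    """Hypothetical stand-in for symbolic_math_equal: plain numbers only."""
    try:
        # Fraction parses integers, decimals, exponents and "p/q" exactly,
        # so "0.5" and "1/2" map to the same rational number.
        return Fraction(prediction.strip()) == Fraction(answer.strip())
    except (ValueError, ZeroDivisionError):
        # Anything non-numeric falls back to exact string comparison.
        return prediction.strip() == answer.strip()


def toy_equal_reward(final_response: str, answer: str) -> float:
    """Binary reward: 1.0 when equivalent, 0.0 otherwise."""
    return 1.0 if toy_math_equal(final_response, answer) else 0.0


print(toy_equal_reward("0.5", "1/2"))   # 1.0
print(toy_equal_reward("0.50", "0.5"))  # 1.0
print(toy_equal_reward("2/3", "0.66"))  # 0.0
```

With this stand-in, "x+1" and "1+x" would score 0.0 because they fall back to string comparison; recognizing such equivalences is what a symbolic backend like sympy is for.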

math_equal_reward_tool

`math_equal_reward_tool(final_response: str, answer: str, trajectory: List[Dict]) -> Dict[str, float]`

Calculate the reward for the agent's response based on mathematical equality and tool usage:

- 0.0 if no tool used
- 0.1 if tool used but answer incorrect
- 1.0 if tool used and answer correct

Parameters:

final_response (str) – The agent's predicted answer
answer (str) – The correct answer
trajectory (List[Dict]) – The agent's conversation trajectory

Returns:

dict – A dictionary containing the reward and accuracy:
- reward (float): The reward value
- acc (float): The accuracy value (1.0 if the answer is correct, 0.0 otherwise)

Source code in src/agentfly/rewards/math_reward.py

```python
from typing import Dict, List


@reward(name="math_equal_reward_tool")
def math_equal_reward_tool(
    final_response: str, answer: str, trajectory: List[Dict]
) -> Dict[str, float]:
    """
    Calculate the reward for the agent's response based on mathematical equality and tool usage.
    - 0.0 if no tool used
    - 0.1 if tool used but answer incorrect
    - 1.0 if tool used and answer correct

    Args:
        final_response (str): The agent's predicted answer
        answer (str): The correct answer
        trajectory (List[Dict]): The agent's conversation trajectory

    Returns:
        dict: A dictionary containing the reward and accuracy
            - reward (float): The reward value
            - acc (float): The accuracy value
    """
    # Any message with role "tool" counts as a tool call.
    has_called_tool = any(msg["role"] == "tool" for msg in trajectory)
    answer_correct = symbolic_math_equal(final_response, answer)

    if not has_called_tool:
        reward = 0.0
    elif answer_correct:
        reward = 1.0
    else:
        reward = 0.1
    # acc reflects correctness alone, independent of the tool requirement.
    return {
        "reward": reward,
        "acc": 1.0 if answer_correct else 0.0,
    }
```

Like math_equal_reward but gated on tool usage:

0.0 if no tool used
0.1 if tool used but answer incorrect
1.0 if tool used and answer correct
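The reward table above can be restated as a small pure function. This is a hedged sketch rather than the library code: tool_gated_reward and used_tool are hypothetical names, and correctness is passed in as a boolean instead of being computed by symbolic_math_equal.

```python
from typing import Dict, List


def used_tool(trajectory: List[Dict]) -> bool:
    # A tool call is detected by any message whose role is "tool".
    return any(msg.get("role") == "tool" for msg in trajectory)


def tool_gated_reward(trajectory: List[Dict], answer_correct: bool) -> Dict[str, float]:
    """Hypothetical restatement of the math_equal_reward_tool table."""
    if not used_tool(trajectory):
        reward = 0.0  # no tool call: no reward, even if the answer is correct
    elif answer_correct:
        reward = 1.0  # tool used and answer correct
    else:
        reward = 0.1  # tool used but answer wrong: small shaping bonus
    # "acc" tracks correctness independently of the tool requirement.
    return {"reward": reward, "acc": 1.0 if answer_correct else 0.0}


no_tool = [{"role": "assistant", "content": "\\boxed{42}"}]
with_tool = [
    {"role": "assistant", "content": "Let me compute."},
    {"role": "tool", "content": "42"},
    {"role": "assistant", "content": "\\boxed{42}"},
]
print(tool_gated_reward(no_tool, True))     # {'reward': 0.0, 'acc': 1.0}
print(tool_gated_reward(with_tool, False))  # {'reward': 0.1, 'acc': 0.0}
print(tool_gated_reward(with_tool, True))   # {'reward': 1.0, 'acc': 1.0}
```

Note that a correct answer without a tool call still reports acc 1.0: the tool gate affects only the training signal, not the accuracy metric.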

math_equal_reward_think

`math_equal_reward_think(final_response: str, answer: str, trajectory: List[Dict]) -> Dict[str, float]`

This reward function rewards the agent for structured thinking and tool calling. The reward depends on two conditions: every assistant turn must contain a <think>...</think> block, and the agent must have called a tool. If both hold, the reward is 1.0 when the answer is correct and 0.1 otherwise. If either condition fails, the reward is 0.0.

Source code in src/agentfly/rewards/math_reward.py

```python
from typing import Dict, List


@reward(name="math_equal_reward_think")
def math_equal_reward_think(
    final_response: str, answer: str, trajectory: List[Dict]
) -> Dict[str, float]:
    """
    Reward the agent for structured thinking, tool calling, and a correct answer.
    - 0.0 if any assistant turn lacks a <think>...</think> block or no tool was called
    - 0.1 if both requirements are met but the answer is incorrect
    - 1.0 if both requirements are met and the answer is correct
    """
    has_called_tool = any(msg["role"] == "tool" for msg in trajectory)

    all_have_thinking = True
    for msg in trajectory:
        if msg["role"] != "assistant":
            continue
        # Content may be a plain string or a list of parts; use the last part's text.
        if isinstance(msg["content"], str):
            content = msg["content"]
        elif isinstance(msg["content"], list):
            content = msg["content"][-1]["text"]
        else:
            raise ValueError(f"Invalid content type: {type(msg['content'])}")
        extracted_thinking, _ = parse_thinking_response(content)
        if extracted_thinking is None:
            all_have_thinking = False

    # Format or tool requirement failed: reward and acc are both 0.0.
    if not all_have_thinking or not has_called_tool:
        return {"reward": 0.0, "acc": 0.0}

    if symbolic_math_equal(final_response, answer):
        return {"reward": 1.0, "acc": 1.0}
    return {"reward": 0.1, "acc": 0.0}
```

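Adds a structured-thinking requirement on top of math_equal_reward_tool: every assistant turn must contain a <think>...</think> block. Unlike math_equal_reward_tool, a failed format or tool requirement also sets acc to 0.0, even when the final answer is correct.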
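The final decision in math_equal_reward_think can be isolated the same way. In this hypothetical sketch, think_gated_reward takes the three flags the function computes (all assistant turns have thinking, a tool was called, the answer is correct) and returns the same dictionaries as the source listing.

```python
from typing import Dict


def think_gated_reward(all_have_thinking: bool, has_called_tool: bool,
                       answer_correct: bool) -> Dict[str, float]:
    """Hypothetical restatement of the decision in math_equal_reward_think."""
    if not all_have_thinking or not has_called_tool:
        # Format or tool requirement failed: reward and acc are both zeroed,
        # even when the final answer happens to be right.
        return {"reward": 0.0, "acc": 0.0}
    if answer_correct:
        return {"reward": 1.0, "acc": 1.0}
    return {"reward": 0.1, "acc": 0.0}


print(think_gated_reward(True, False, True))  # {'reward': 0.0, 'acc': 0.0}
print(think_gated_reward(True, True, False))  # {'reward': 0.1, 'acc': 0.0}
```

For the first call, math_equal_reward_tool would still report acc 1.0, since its accuracy ignores the tool requirement.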

Technical Details

Symbolic Math Comparison:

- Uses sympy for mathematical equivalence checking
- Handles LaTeX expressions via parse_latex
- Supports boxed answers: \boxed{expression}
- Robust to formatting differences
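Boxed-answer support requires finding the matching closing brace, because answers such as \boxed{\frac{1}{2}} contain nested braces. The brace-counting extractor below is an illustrative sketch: last_boxed is a hypothetical name, and treating the last \boxed{} as the final answer is an assumption, not documented agentfly behavior.

```python
from typing import Optional

BOX = r"\boxed{"


def last_boxed(text: str) -> Optional[str]:
    """Return the contents of the last boxed answer in text, or None."""
    start = text.rfind(BOX)
    if start == -1:
        return None
    i = start + len(BOX)
    depth = 1
    # Walk forward, tracking nested braces such as those in \frac{1}{2}.
    # Note: escaped braces like \{ are counted as ordinary braces here.
    for j in range(i, len(text)):
        if text[j] == "{":
            depth += 1
        elif text[j] == "}":
            depth -= 1
            if depth == 0:
                return text[i:j]
    return None  # unbalanced braces: no complete boxed answer


print(last_boxed(r"so the answer is \boxed{\frac{1}{2}}."))  # \frac{1}{2}
print(last_boxed(r"\boxed{1} then \boxed{42}"))              # 42
print(last_boxed("no box here"))                             # None
```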

Trajectory Analysis:

- Parses conversation messages by role
- Detects tool usage through messages with role "tool"
- Checks assistant responses for format compliance
- Supports both string and list content formats
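The content handling mirrors the branch in math_equal_reward_think: string content is used directly, and for list content the text of the last part is taken. The sketch below isolates that logic; message_text and summarize are hypothetical helpers, and the {"type": "text", ...} part shape in the sample trajectory is an assumption (the source only reads the "text" key).

```python
from typing import Dict, List


def message_text(msg: Dict) -> str:
    """Normalize message content to a string, as math_equal_reward_think does."""
    content = msg["content"]
    if isinstance(content, str):
        return content
    if isinstance(content, list):
        # List content: only the last part's "text" field is used.
        return content[-1]["text"]
    raise ValueError(f"Invalid content type: {type(content)}")


def summarize(trajectory: List[Dict]) -> Dict[str, object]:
    """Collect the two signals the rewards rely on: tool usage and assistant text."""
    return {
        "has_called_tool": any(m["role"] == "tool" for m in trajectory),
        "assistant_texts": [message_text(m) for m in trajectory
                            if m["role"] == "assistant"],
    }


trajectory = [
    {"role": "assistant", "content": "I will use the calculator."},
    {"role": "tool", "content": "42"},
    # Assumed multi-part content shape; only "text" is read.
    {"role": "assistant", "content": [{"type": "text", "text": "\\boxed{42}"}]},
]
print(summarize(trajectory))
```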

Think Tag Parsing:

- Extracts content from <think>...</think> tags
- Handles incomplete or malformed thinking patterns
- Requires both thinking and answer components
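A think-tag parser in the spirit of parse_thinking_response can be sketched with a non-greedy regular expression. The real helper's rules are not shown in this document, so split_thinking and its behavior (returning None for a missing or unterminated block, and None for an empty answer) are assumptions.

```python
import re
from typing import Optional, Tuple

# Hypothetical pattern: a complete <think>...</think> block, then the answer.
THINK_RE = re.compile(r"<think>(.*?)</think>(.*)", re.DOTALL)


def split_thinking(text: str) -> Tuple[Optional[str], Optional[str]]:
    """Return (thinking, answer); (None, None) if no complete think block."""
    match = THINK_RE.search(text)
    if match is None:
        # Missing or unterminated <think> block counts as malformed.
        return None, None
    thinking = match.group(1).strip()
    answer = match.group(2).strip()
    return thinking, answer or None


print(split_thinking("<think>6*7=42</think>The answer is 42"))  # ('6*7=42', 'The answer is 42')
print(split_thinking("<think>unterminated reasoning"))         # (None, None)
```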

Common Usage Patterns:

```python
from agentfly.rewards import math_equal_reward, math_equal_reward_tool

# Basic correctness
result = math_equal_reward(
    final_response="\\boxed{42}",
    answer="\\boxed{42}",
)
print(result)  # {"reward": 1.0}

# Tool usage requirement
trajectory = [
    {"role": "assistant", "content": "I need to calculate..."},
    {"role": "tool", "content": "calculation result"},
    {"role": "assistant", "content": "The answer is \\boxed{42}"},
]
result = math_equal_reward_tool(
    final_response="\\boxed{42}",
    answer="\\boxed{42}",
    trajectory=trajectory,
)
print(result)
```

Use Cases:

- Mathematical problem-solving evaluation
- Process-based rewards for tool usage
- Training structured reasoning behaviors
- Multi-step mathematical reasoning assessment
