Predefined Rewards

The following are predefined reward functions.

qa_f1_reward(final_response: str, answer: str, trajectory: List[str]) -> float

Calculate the reward for the agent's response based on the F1 score.

Parameters:

  • final_response (str) –

    The agent's predicted answer

  • answer (str) –

    The correct answer

  • trajectory (List[str]) –

    The agent's conversation trajectory

Returns:

  • dict –

    A dictionary with the following keys:

      - reward (float): the reward value
      - f1 (float): the F1 score
      - em (float): the exact-match (EM) score
      - precision (float): the precision score
      - recall (float): the recall score
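The F1/EM computation can be sketched as below. This is a hypothetical re-implementation, not the library's code. It assumes SQuAD-style answer normalization (lowercasing, stripping punctuation and the articles a/an/the) and assumes the reward equals the F1 score. `normalize_answer` and `qa_f1_sketch` are illustrative names.

```python
import re
import string
from collections import Counter
from typing import Dict, List

_PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def normalize_answer(text: str) -> str:
    # Assumed SQuAD-style normalization: lowercase, drop punctuation and articles, collapse spaces.
    text = text.lower().translate(_PUNCT_TABLE)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def qa_f1_sketch(final_response: str, answer: str, trajectory: List[str]) -> Dict[str, float]:
    # Hypothetical re-implementation; `trajectory` is accepted but unused here.
    pred_tokens = normalize_answer(final_response).split()
    gold_tokens = normalize_answer(answer).split()
    em = float(pred_tokens == gold_tokens)
    # Multiset intersection counts the tokens shared by prediction and answer.
    num_same = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if num_same == 0:
        precision = recall = f1 = 0.0
    else:
        precision = num_same / len(pred_tokens)
        recall = num_same / len(gold_tokens)
        f1 = 2 * precision * recall / (precision + recall)
    # Assumption: the reward is the F1 score itself.
    return {"reward": f1, "f1": f1, "em": em, "precision": precision, "recall": recall}
```

For example, `qa_f1_sketch("Paris, France", "Paris", [])` shares one token with the answer, giving precision 0.5, recall 1.0, and F1 of about 0.667, while `"The Eiffel Tower"` against `"Eiffel Tower"` is an exact match once the article is dropped.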

qa_f1_reward_tool(final_response: str, answer: str, trajectory: List[str]) -> float

Calculate the reward for the agent's response based on tool usage and answer correctness (F1 and EM scores):

  - 0.0 if no tool was used
  - 0.1 if a tool was used but the answer is incorrect
  - 1.0 if a tool was used and the answer is correct

Parameters:

  • final_response (str) –

    The agent's predicted answer

  • answer (str) –

    The correct answer

  • trajectory (List[str]) –

    The agent's conversation trajectory

Returns:

  • dict –

    A dictionary with the following keys:

      - reward (float): the reward value
      - f1 (float): the F1 score
      - em (float): the exact-match (EM) score
      - precision (float): the precision score
      - recall (float): the recall score
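The three-tier scheme can be sketched as follows. It hypothetically assumes that tool use shows up as a `<tool_call>` marker inside a trajectory string and that "correct" means a normalized exact match; the real function's detection rule and correctness threshold are not documented here. The sketch returns only the tiered reward value rather than the full dictionary, and `tiered_tool_reward` is an illustrative name.

```python
import re
import string
from typing import List

TOOL_MARKER = "<tool_call>"  # hypothetical marker; the real detection logic is not documented


def _normalize(text: str) -> str:
    # Lowercase, strip punctuation and articles, collapse whitespace.
    text = text.lower().translate(str.maketrans("", "", string.punctuation))
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def tiered_tool_reward(final_response: str, answer: str, trajectory: List[str]) -> float:
    used_tool = any(TOOL_MARKER in step for step in trajectory)
    if not used_tool:
        return 0.0  # no tool used
    # Assumption: "correct" means a normalized exact match.
    correct = _normalize(final_response) == _normalize(answer)
    return 1.0 if correct else 0.1
```

The small 0.1 tier rewards the agent for calling a tool even when the final answer is wrong, while a correct answer reached without any tool earns nothing.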

math_equal_reward(final_response: str, answer: str, **kwargs) -> float

Calculate the reward for the agent's response based on mathematical equality:

  - 1.0 if the agent's predicted answer is mathematically equal to the correct answer
  - 0.0 otherwise

Parameters:

  • final_response (str) –

    The agent's predicted answer

  • answer (str) –

    The correct answer

  • **kwargs –

    Additional arguments (ignored)

Returns:

  • float –

    The reward value
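A stdlib-only stand-in for the equality check is sketched below. It is an assumption about how "mathematically equal" might be tested: numeric answers such as `0.5` and `1/2` are compared exactly as `fractions.Fraction` values, and anything else falls back to a whitespace-insensitive string comparison. A production checker would more likely parse LaTeX and compare expressions symbolically (for example with SymPy). `math_equal_sketch` is a hypothetical name.

```python
from fractions import Fraction
from typing import Optional


def _to_number(text: str) -> Optional[Fraction]:
    # Drop surrounding whitespace, "$" delimiters and thousands separators, then parse exactly.
    cleaned = text.strip().replace("$", "").replace(",", "")
    try:
        return Fraction(cleaned)
    except (ValueError, ZeroDivisionError):
        return None


def math_equal_sketch(final_response: str, answer: str, **kwargs) -> float:
    # Extra keyword arguments are ignored, mirroring the documented signature.
    pred, gold = _to_number(final_response), _to_number(answer)
    if pred is not None and gold is not None:
        return 1.0 if pred == gold else 0.0
    # Fallback: whitespace-insensitive string comparison for non-numeric answers.
    return 1.0 if "".join(final_response.split()) == "".join(answer.split()) else 0.0
```

Using `Fraction` instead of `float` avoids rounding surprises, so `0.1 + 0.2`-style precision issues never turn an equal answer into a mismatch.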

math_equal_reward_tool(final_response: str, answer: str, trajectory: List[Dict]) -> float

Calculate the reward for the agent's response based on mathematical equality and tool usage:

  - 0.0 if no tool was used
  - 0.1 if a tool was used but the answer is incorrect
  - 1.0 if a tool was used and the answer is correct

Parameters:

  • final_response (str) –

    The agent's predicted answer

  • answer (str) –

    The correct answer

  • trajectory (List[Dict]) –

    The agent's conversation trajectory

Returns:

  • dict –

    A dictionary with the following keys:

      - reward (float): the reward value
      - acc (float): the accuracy value
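Combining the equality check with tool detection might look like the sketch below. It assumes the trajectory is a list of OpenAI-style chat messages, where a message with role `"tool"` or a non-empty `"tool_calls"` field indicates tool use, and that `acc` is computed independently of tool use. Both are assumptions, and `math_tool_reward_sketch` is a hypothetical name.

```python
from fractions import Fraction
from typing import Any, Dict, List


def _equal(a: str, b: str) -> bool:
    # Exact numeric comparison when both parse as numbers, else plain string equality.
    try:
        return Fraction(a.strip()) == Fraction(b.strip())
    except (ValueError, ZeroDivisionError):
        return a.strip() == b.strip()


def _used_tool(trajectory: List[Dict[str, Any]]) -> bool:
    # Assumption: OpenAI-style messages; a tool was used if any message has
    # role "tool" or carries a non-empty "tool_calls" list.
    return any(m.get("role") == "tool" or m.get("tool_calls") for m in trajectory)


def math_tool_reward_sketch(
    final_response: str, answer: str, trajectory: List[Dict[str, Any]]
) -> Dict[str, float]:
    acc = 1.0 if _equal(final_response, answer) else 0.0
    if not _used_tool(trajectory):
        reward = 0.0  # no tool used
    else:
        reward = 1.0 if acc else 0.1
    return {"reward": reward, "acc": acc}
```

Keeping `acc` separate from `reward` lets training logs track raw answer accuracy even when the tool-usage tiers drive the reward to zero.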

webshop_reward(final_response: str, context: Context, task_id: int) -> dict async

Calculates the reward for the WebShop environment based on the environment state. Uses the same rollout resource as the webshop tools (context.acquire_resource).

Parameters:

  • final_response (str) –

    The agent's final response. Not used in this reward function.

  • context (Context) –

    Injected rollout context; used to acquire the WebShop resource.

  • task_id (int) –

    The identifier for the current task. Used to match the episode with its golden answer.

Returns:

  • dict –

    A dictionary containing the reward (float) and output (str) from the environment step.

alfworld_episode_reward(context: Context) -> Dict[str, Any] async

A simple ALFWorld episode reward that checks whether the episode is done. Uses the same rollout resource as the alfworld tools (context.acquire_resource).
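The pattern shared by the environment-backed rewards (acquire the rollout resource from the context, then read its state) is sketched below for ALFWorld. Everything here is hypothetical: `FakeContext`, `FakeAlfworldEnv`, the awaitable zero-argument `acquire_resource()`, the `done` attribute, and the 1.0/0.0 reward mapping stand in for interfaces this page does not document.

```python
import asyncio
from dataclasses import dataclass
from typing import Any, Dict


@dataclass
class FakeAlfworldEnv:
    """Hypothetical stand-in for the rollout resource; only tracks a done flag."""
    done: bool = False


class FakeContext:
    """Hypothetical Context; the real acquire_resource signature is not documented here."""

    def __init__(self, env: FakeAlfworldEnv) -> None:
        self._env = env

    async def acquire_resource(self) -> FakeAlfworldEnv:
        return self._env


async def episode_reward_sketch(context: FakeContext) -> Dict[str, Any]:
    env = await context.acquire_resource()
    # Assumption: a finished episode earns 1.0, anything else 0.0.
    return {"reward": 1.0 if env.done else 0.0}


if __name__ == "__main__":
    print(asyncio.run(episode_reward_sketch(FakeContext(FakeAlfworldEnv(done=True)))))
```

Because the reward reads the same resource the tools mutated during the rollout, it needs no arguments beyond the injected context.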

scienceworld_reward(final_response: str, context: Context) -> dict async

Computes the reward for a given prediction in the ScienceWorld environment. Uses the same rollout resource as the scienceworld tools (context.acquire_resource).

Parameters:

  • final_response (str) –

    The agent's final response. Not used in this reward function.

  • context (Context) –

    Injected rollout context; used to acquire the ScienceWorld resource.

Returns:

  • dict –

    A dictionary containing the reward and the observation output after taking the 'get_reward' step.
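A similarly hedged sketch of the ScienceWorld reward: acquire the resource, take the `'get_reward'` step, and return the reward together with the resulting observation. The `step()` return shape `(observation, reward)`, the zero-argument awaitable `acquire_resource()`, and the dictionary keys `reward`/`output` are assumptions made for illustration; the fake classes are hypothetical.

```python
import asyncio
from typing import Any, Dict, List, Tuple


class FakeScienceWorldEnv:
    """Hypothetical rollout resource; step() returns (observation, reward)."""

    def __init__(self, score: float) -> None:
        self.score = score
        self.actions: List[str] = []

    def step(self, action: str) -> Tuple[str, float]:
        self.actions.append(action)  # record the action for inspection
        return f"Current score: {self.score}", self.score


class FakeContext:
    """Hypothetical Context exposing an awaitable acquire_resource()."""

    def __init__(self, env: FakeScienceWorldEnv) -> None:
        self._env = env

    async def acquire_resource(self) -> FakeScienceWorldEnv:
        return self._env


async def scienceworld_reward_sketch(final_response: str, context: FakeContext) -> Dict[str, Any]:
    # final_response is ignored, as in the documented function.
    env = await context.acquire_resource()
    observation, reward = env.step("get_reward")  # assumed step interface
    return {"reward": reward, "output": observation}


if __name__ == "__main__":
    env = FakeScienceWorldEnv(score=0.75)
    print(asyncio.run(scienceworld_reward_sketch("unused", FakeContext(env))))
```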