Predefined Rewards¶

The following are predefined reward functions.

`qa_f1_reward(final_response: str, answer: str, trajectory: List[str]) -> float` ¶

Calculate the reward for the agent's response based on the F1 score.

Parameters:

prediction (str) –

The agent's predicted answer
answer (str) –

The correct answer
trajectory (List[str]) –

The agent's conversation trajectory

Returns:

dict ( float ) –

A dictionary containing the reward, F1 score, EM score, precision, and recall - reward (float): The reward value - f1 (float): The F1 score - em (float): The EM score - precision (float): The precision score - recall (float): The recall score

`qa_f1_reward_tool(final_response: str, answer: str, trajectory: List[str]) -> float` ¶

Calculate the reward for the agent's response based on the F1 score and EM score. - 0.0 if no tool used - 0.1 if tool used but answer incorrect - 1.0 if tool used and answer correct

Parameters:

prediction (str) –

The agent's predicted answer
answer (str) –

The correct answer
trajectory (List[str]) –

The agent's conversation trajectory

Returns:

dict ( float ) –

A dictionary containing the reward, F1 score, EM score, precision, and recall - reward (float): The reward value - f1 (float): The F1 score - em (float): The EM score - precision (float): The precision score - recall (float): The recall score

`math_equal_reward(final_response: str, answer: str, **kwargs) -> float` ¶

Calculate the reward for the agent's response based on the mathematical equality. - 1.0 if the agent's predicted answer is mathematically equal to the correct answer - 0.0 otherwise

Parameters:

prediction (str) –

The agent's predicted answer
answer (str) –

The correct answer
**kwargs –

Additional arguments (ignored)

Returns:

float ( float ) –

The reward value

`math_equal_reward_tool(final_response: str, answer: str, trajectory: List[Dict]) -> float` ¶

Calculate the reward for the agent's response based on the mathematical equality and tool usage. - 0.0 if no tool used - 0.1 if tool used but answer incorrect - 1.0 if tool used and answer correct

Parameters:

prediction (str) –

The agent's predicted answer
answer (str) –

The correct answer
trajectory (List[Dict]) –

The agent's conversation trajectory

Returns:

dict ( float ) –

A dictionary containing the reward and accuracy - reward (float): The reward value - acc (float): The accuracy value

`webshop_reward(final_response: str, context: Context, task_id: int) -> dict` `async` ¶

Calculates the reward for the WebShop environment based on the environment state. Uses the same rollout resource as the webshop tools (context.acquire_resource).

Parameters:

final_response (str) –

The agent's final response. Not used in this reward function.
context (Context) –

Injected rollout context; used to acquire the WebShop resource.
task_id (int) –

The identifier for the current task. Used to match with golden answer.

Returns:

dict ( dict ) –

A dictionary containing the reward (float) and output (str) from the environment step.

`alfworld_episode_reward(context: Context) -> Dict[str, Any]` `async` ¶

Simple ALFWorld episode reward that checks if the episode is done. Uses the same rollout resource as the alfworld tools (context.acquire_resource).

`scienceworld_reward(final_response: str, context: Context) -> dict` `async` ¶

Computes the reward for a given prediction in the ScienceWorld environment. Uses the same rollout resource as the scienceworld tools (context.acquire_resource).

Parameters:

final_response (str) –

The agent's final response. Not used in this reward function.
context (Context) –

Injected rollout context; used to acquire the ScienceWorld resource.

Returns:

dict ( dict ) –

A dictionary containing the reward and the observation output after taking the 'get_reward' step.

Predefined Rewards¶

qa_f1_reward(final_response: str, answer: str, trajectory: List[str]) -> float ¶

qa_f1_reward_tool(final_response: str, answer: str, trajectory: List[str]) -> float ¶

math_equal_reward(final_response: str, answer: str, **kwargs) -> float ¶

math_equal_reward_tool(final_response: str, answer: str, trajectory: List[Dict]) -> float ¶

webshop_reward(final_response: str, context: Context, task_id: int) -> dict async ¶

alfworld_episode_reward(context: Context) -> Dict[str, Any] async ¶

scienceworld_reward(final_response: str, context: Context) -> dict async ¶

`qa_f1_reward(final_response: str, answer: str, trajectory: List[str]) -> float` ¶

`qa_f1_reward_tool(final_response: str, answer: str, trajectory: List[str]) -> float` ¶

`math_equal_reward(final_response: str, answer: str, **kwargs) -> float` ¶

`math_equal_reward_tool(final_response: str, answer: str, trajectory: List[Dict]) -> float` ¶

`webshop_reward(final_response: str, context: Context, task_id: int) -> dict` `async` ¶

`alfworld_episode_reward(context: Context) -> Dict[str, Any]` `async` ¶

`scienceworld_reward(final_response: str, context: Context) -> dict` `async` ¶