Predefined Rewards
The following are predefined reward functions.
qa_f1_reward(final_response: str, answer: str, trajectory: List[str]) -> float
Calculate the reward for the agent's response based on the F1 score.
Parameters:
- final_response (str) – The agent's predicted answer.
- answer (str) – The correct answer.
- trajectory (List[str]) – The agent's conversation trajectory.
Returns:
- dict – A dictionary containing the reward, F1 score, EM score, precision, and recall:
  - reward (float): The reward value
  - f1 (float): The F1 score
  - em (float): The EM (exact match) score
  - precision (float): The precision score
  - recall (float): The recall score
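To make the returned keys concrete, here is a minimal sketch of how token-level F1, EM, precision, and recall are commonly computed for QA answers. It is not the library's implementation: the names `normalize_text` and `f1_sketch`, the SQuAD-style normalization, and the choice to set the reward equal to the F1 score are all assumptions made for illustration.

```python
import re
import string
from collections import Counter
from typing import Dict


def normalize_text(text: str) -> str:
    """Hypothetical normalizer: lowercase, drop punctuation and English
    articles, and collapse whitespace (SQuAD-style, assumed here)."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def f1_sketch(prediction: str, answer: str) -> Dict[str, float]:
    """Illustrative token-overlap scores; reward == f1 is an assumption."""
    pred_tokens = normalize_text(prediction).split()
    gold_tokens = normalize_text(answer).split()

    # Exact match compares the normalized token sequences.
    em = float(pred_tokens == gold_tokens)

    # Multiset intersection counts tokens shared by both answers.
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return {"reward": 0.0, "f1": 0.0, "em": em,
                "precision": 0.0, "recall": 0.0}

    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    f1 = 2 * precision * recall / (precision + recall)
    return {"reward": f1, "f1": f1, "em": em,
            "precision": precision, "recall": recall}
```

For example, `f1_sketch("The Eiffel Tower in Paris", "Eiffel Tower")` has full recall but only half precision, so its F1 lands between the two.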
qa_f1_reward_tool(final_response: str, answer: str, trajectory: List[str]) -> float
Calculate the reward for the agent's response based on the F1 score, the EM score, and tool usage:
- 0.0 if no tool was used
- 0.1 if a tool was used but the answer is incorrect
- 1.0 if a tool was used and the answer is correct
Parameters:
- final_response (str) – The agent's predicted answer.
- answer (str) – The correct answer.
- trajectory (List[str]) – The agent's conversation trajectory.
Returns:
- dict – A dictionary containing the reward, F1 score, EM score, precision, and recall:
  - reward (float): The reward value
  - f1 (float): The F1 score
  - em (float): The EM (exact match) score
  - precision (float): The precision score
  - recall (float): The recall score
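The three reward tiers used by the tool-gated variants can be sketched as a small function. This is an illustration only: `tool_gated_reward` is a hypothetical name, and how the real functions detect tool usage in `trajectory` or decide correctness is not specified here, so both are passed in as plain booleans.

```python
def tool_gated_reward(used_tool: bool, is_correct: bool) -> float:
    """Hypothetical helper mirroring the documented 0.0 / 0.1 / 1.0 tiers."""
    if not used_tool:
        # No tool call in the trajectory: no reward, even if the answer is right.
        return 0.0
    # A tool was used: small credit for trying, full credit for a correct answer.
    return 1.0 if is_correct else 0.1
```

Note that under this scheme a correct answer produced without any tool call still scores 0.0.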
math_equal_reward(final_response: str, answer: str, **kwargs) -> float
Calculate the reward for the agent's response based on mathematical equality:
- 1.0 if the agent's predicted answer is mathematically equal to the correct answer
- 0.0 otherwise
Parameters:
- final_response (str) – The agent's predicted answer.
- answer (str) – The correct answer.
- **kwargs – Additional arguments (ignored).
Returns:
- float – The reward value.
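"Mathematically equal" means more than string equality: `0.5` and `1/2` should match. The sketch below shows one simple way to get that behavior for plain numeric answers using the standard library's `fractions.Fraction`. It is an assumption-laden stand-in, not the library's comparison logic, which may handle symbolic expressions, LaTeX, or numeric tolerances; `math_equal_sketch` is a hypothetical name.

```python
from fractions import Fraction


def math_equal_sketch(prediction: str, answer: str) -> float:
    """Hypothetical check: exact rational comparison, with a string fallback."""
    pred, gold = prediction.strip(), answer.strip()
    try:
        # Fraction parses integers, decimals, exponents, and "p/q" strings exactly.
        equal = Fraction(pred) == Fraction(gold)
    except (ValueError, ZeroDivisionError):
        # Not a plain number (e.g. "x+1"): fall back to literal comparison.
        equal = pred == gold
    return 1.0 if equal else 0.0
```

Exact rational arithmetic avoids floating-point rounding, at the cost of treating non-numeric answers purely as strings.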
math_equal_reward_tool(final_response: str, answer: str, trajectory: List[Dict]) -> float
Calculate the reward for the agent's response based on mathematical equality and tool usage:
- 0.0 if no tool was used
- 0.1 if a tool was used but the answer is incorrect
- 1.0 if a tool was used and the answer is correct
Parameters:
- final_response (str) – The agent's predicted answer.
- answer (str) – The correct answer.
- trajectory (List[Dict]) – The agent's conversation trajectory.
Returns:
- dict – A dictionary containing the reward and accuracy:
  - reward (float): The reward value
  - acc (float): The accuracy value
webshop_reward(final_response: str, context: Context, task_id: int) -> dict
async
Calculates the reward for the WebShop environment based on the environment state. Uses the same rollout resource as the webshop tools (context.acquire_resource).
Parameters:
- final_response (str) – The agent's final response. Not used in this reward function.
- context (Context) – Injected rollout context; used to acquire the WebShop resource.
- task_id (int) – The identifier for the current task. Used to match with the golden answer.
Returns:
- dict – A dictionary containing the reward (float) and output (str) from the environment step.
alfworld_episode_reward(context: Context) -> Dict[str, Any]
async
Simple ALFWorld episode reward that checks if the episode is done. Uses the same rollout resource as the alfworld tools (context.acquire_resource).
scienceworld_reward(final_response: str, context: Context) -> dict
async
Computes the reward for a given prediction in the ScienceWorld environment. Uses the same rollout resource as the scienceworld tools (context.acquire_resource).
Parameters:
- final_response (str) – The agent's final response. Not used in this reward function.
- context (Context) – Injected rollout context; used to acquire the ScienceWorld resource.
Returns:
- dict – A dictionary containing the reward and the observation output after taking the 'get_reward' step.