System Architecture
AgentFly organizes agentic reinforcement learning into a four-layer system that separates user-facing logic from rollout orchestration and low-level resources:
- Agent Layer: Exposes the main interfaces to users, decomposing agentic RL into defining agents, tools, and rewards. It abstracts:
  - Agents: classes that implement the interaction loop (e.g., ReactAgent, HFAgent).
  - Tools: callable interfaces to external systems (functions, APIs, environments).
  - Rewards: task-specific reward functions used during training.
- Rollout Layer: Builds and runs the agent loop (multi-turn, tool-using trajectories). It calls the model, invokes tools, collects observations, and packages trajectories and rewards for the RL trainer (e.g., Verl PPO / GRPO).
- Context Layer: Acts as the glue between rollout and resources. It:
  - Tracks rollout IDs, task metadata, and auxiliary fields.
  - Injects contextual information (e.g., gold answers or extra fields) into tools and rewards.
  - Manages which resources (containers, model engines, etc.) are bound to which trajectory IDs.
- Resource Layer: Implements scalable, asynchronous resource management. Resources are allocated, reused, and released asynchronously, enabling high-throughput, multi-turn rollouts. It manages pools of resource units such as:
  - Docker/enroot containers for environments (code interpreter, ALFWorld, WebShop, ScienceWorld, Chess, etc.).
  - Model engines (e.g., async vLLM) used for generation.
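To make the interplay between the Context and Resource layers concrete, here is a minimal sketch of an asynchronous resource pool that binds resource units to trajectory IDs and returns them for reuse when a rollout finishes. It is only an illustration built on Python's standard `asyncio`. `ResourcePool`, `rollout`, `run_rollouts`, and the `env-N` / `traj-N` names are hypothetical and are not AgentFly's actual API.

```python
import asyncio


class ResourcePool:
    """Hypothetical pool of reusable resource units (not AgentFly's API)."""

    def __init__(self, units):
        self._free = asyncio.Queue()
        for unit in units:
            self._free.put_nowait(unit)
        self._bound = {}  # trajectory ID -> resource unit

    async def acquire(self, trajectory_id):
        # A trajectory keeps the same unit across all of its turns.
        if trajectory_id in self._bound:
            return self._bound[trajectory_id]
        unit = await self._free.get()  # waits until some unit is released
        self._bound[trajectory_id] = unit
        return unit

    def release(self, trajectory_id):
        # Unbind the unit and make it available to waiting trajectories.
        unit = self._bound.pop(trajectory_id)
        self._free.put_nowait(unit)


async def rollout(pool, trajectory_id, turns=3):
    """Run a stand-in multi-turn rollout and record which unit served each turn."""
    trace = []
    for turn in range(turns):
        unit = await pool.acquire(trajectory_id)
        await asyncio.sleep(0)  # placeholder for a real tool or model call
        trace.append((turn, unit))
    pool.release(trajectory_id)
    return trajectory_id, trace


async def run_rollouts(num_trajectories=5, pool_size=2):
    """Run more trajectories than there are units; extra ones wait for a release."""
    pool = ResourcePool([f"env-{i}" for i in range(pool_size)])
    results = await asyncio.gather(
        *(rollout(pool, f"traj-{i}") for i in range(num_trajectories))
    )
    return dict(results)


if __name__ == "__main__":
    for trajectory_id, trace in asyncio.run(run_rollouts()).items():
        print(trajectory_id, trace)
```

In this sketch, raising `pool_size` is the only change needed to let more trajectories run at once, and a trajectory that finds the pool empty simply waits instead of failing.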
This architecture allows AgentFly to support:
- Multi-turn, tool-rich rollouts with masking-based multi-turn RL.
- Asynchronous generation → tool calling → reward calculation pipelines.
- Scaling to many concurrent environments simply by increasing pool sizes in tool/reward definitions.
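As a rough illustration of masking-based multi-turn RL, the sketch below flattens a multi-turn trajectory into one token sequence and builds a loss mask that keeps only model-generated tokens. It assumes that "masking" here means excluding user and tool-observation tokens from the training loss, which is a common approach but is not confirmed by this page. The role names, token IDs, and both helper functions are hypothetical.

```python
def build_loss_mask(turns):
    """Flatten (role, token_ids) turns into token IDs plus a 0/1 loss mask.

    Assumption: only tokens from "assistant" turns are trained on.
    """
    token_ids, mask = [], []
    for role, ids in turns:
        token_ids.extend(ids)
        mask.extend([1 if role == "assistant" else 0] * len(ids))
    return token_ids, mask


def masked_mean(values, mask):
    """Average per-token values over the positions where mask == 1."""
    count = sum(mask)
    if count == 0:
        return 0.0
    return sum(v * m for v, m in zip(values, mask)) / count


# A toy trajectory: prompt, model action, tool observation, final answer.
turns = [
    ("user", [1, 2, 3]),
    ("assistant", [4, 5]),
    ("tool", [6, 7, 8]),
    ("assistant", [9]),
]
token_ids, mask = build_loss_mask(turns)

# Made-up per-token losses; values at masked positions are ignored.
per_token_loss = [0.0, 0.0, 0.0, 2.0, 4.0, 9.0, 9.0, 9.0, 6.0]
loss = masked_mean(per_token_loss, mask)
```

Because the whole trajectory stays in one sequence, the model still conditions on tool observations during the forward pass. Those positions just do not contribute to the averaged loss.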