🪽 AgentFly

Training scalable LLM agents with RL (multi-turn, asynchronous tools/rewards, multimodal)

Note

Parts of this documentation are AI-generated. We aim to keep it accurate, but if you find a discrepancy with the code, the code is the source of truth — please open an issue or PR.

[Figure: Overall Structure]

Resources

AgentFly Paper 📜 | GitHub Repo 💻 | Weights & Biases 📈 | Models 🤗 | Tutorials 📚

AgentFly: Extensible and Scalable Reinforcement Learning for LLM Agents

Methods for building LLM agents have evolved from prompt engineering and supervised fine-tuning to agentic reinforcement learning (agentic RL). AgentFly is an agentic RL framework that tackles bottlenecks in environment interaction, reward calculation, and large-scale training through a four-layer design: an agent layer for defining agents, tools, and rewards; a rollout layer that drives agent loops and collects trajectories; a context layer that injects task metadata and coordinates resources; and a resource layer that manages low-level execution backends such as containers and model engines. With a suite of prebuilt tools and environments (including search, code, and interactive environments), AgentFly enables scalable training of multi-turn, tool-using LLM agents across diverse tasks.
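To make the four-layer split concrete, here is a minimal, self-contained Python toy. Every name in it (`ResourcePool`, `Context`, `Agent`, `rollout`, `calc`, `exact_match`, `scripted_policy`) is hypothetical and chosen for illustration only; it is not AgentFly's actual API. A scripted function stands in for the LLM, and the "resource" is just a string handle rather than a real container or model engine. See the tutorials and repository for the real interfaces.

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable, Dict, List, Tuple


class ResourcePool:
    """Resource layer (hypothetical): hands out execution backends such as sandboxes."""

    def __init__(self) -> None:
        self.acquired: List[str] = []

    async def acquire(self, kind: str) -> str:
        # A real backend would start or reuse a container or model engine here;
        # this toy only records a string handle.
        handle = f"{kind}-{len(self.acquired)}"
        self.acquired.append(handle)
        return handle


@dataclass
class Context:
    """Context layer (hypothetical): task metadata plus access to shared resources."""

    metadata: Dict[str, str]
    resources: ResourcePool


# Agent layer (hypothetical): tools and rewards are async callables that receive the context.
Tool = Callable[[str, Context], Awaitable[str]]
Reward = Callable[[List[str], Context], Awaitable[float]]


@dataclass
class Agent:
    policy: Callable[[List[str]], str]  # stands in for the LLM being trained
    tools: Dict[str, Tool]
    reward: Reward


async def rollout(agent: Agent, ctx: Context, max_turns: int = 3) -> Tuple[List[str], float]:
    """Rollout layer (hypothetical): run the multi-turn loop, then score the trajectory."""
    trajectory: List[str] = [ctx.metadata["question"]]
    for _ in range(max_turns):
        action = agent.policy(trajectory)
        trajectory.append(action)
        name, _, arg = action.partition(":")
        if name not in agent.tools:
            break  # not a tool call, so treat the action as the final answer
        trajectory.append(await agent.tools[name](arg, ctx))
    return trajectory, await agent.reward(trajectory, ctx)


# --- Toy task: add two numbers with a hypothetical "calc" tool ---
async def calc(arg: str, ctx: Context) -> str:
    await ctx.resources.acquire("sandbox")
    return str(sum(int(x) for x in arg.split("+")))


async def exact_match(trajectory: List[str], ctx: Context) -> float:
    return 1.0 if trajectory[-1] == f"answer:{ctx.metadata['gold']}" else 0.0


def scripted_policy(history: List[str]) -> str:
    # First turn: call the tool; afterwards: answer with the last observation.
    return "calc:2+3" if len(history) == 1 else f"answer:{history[-1]}"


agent = Agent(policy=scripted_policy, tools={"calc": calc}, reward=exact_match)
ctx = Context(metadata={"question": "What is 2+3?", "gold": "5"}, resources=ResourcePool())
trajectory, reward = asyncio.run(rollout(agent, ctx))
print(trajectory, reward)  # → ['What is 2+3?', 'calc:2+3', '5', 'answer:5'] 1.0
```

Making tools and rewards `async` in this sketch echoes the asynchronous tools/rewards in the tagline: many rollouts could be awaited concurrently (for example with `asyncio.gather`) while slow tool or reward calls are in flight.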

GitHub Repository

The AgentFly source code on GitHub.

WandB

Training curves, parameters, rewards, and trajectories logged during training.

Hugging Face

Check out our models on Hugging Face, including agents for code interpreter, retrieval, ScienceWorld, and WebShop tasks, among others.

Tutorials

Check out the tutorials on how to build agents, tools, and rewards, and how to start training.

Join our community!

WeChat Group

Scan the QR code to join our WeChat group.

Discord

Join our Discord community.