Skip to content

🪽 AgentFly

Training scalable LLM agents with RL (multi-turn, asynchronous tools/rewards, multimodal)

Note

Parts of this documentation are AI-generated. We aim to keep it accurate, but if you find a discrepancy with the code, the code is the source of truth — please open an issue or PR.

Overall Structure

Resources

  • AgentFly: Extensible and Scalable Reinforcement Learning for LLM Agents


    Methods to build LLM agents have evolved from prompt engineering and supervised finetuning to agentic reinforcement learning (agentic RL). AgentFly is an agentic RL framework that tackles bottlenecks in environment interaction, reward calculation, and large-scale training through a four-layer design: an agent layer for defining agents, tools, and rewards; a rollout layer that drives agent loops and collects trajectories; a context layer that injects task metadata and coordinates resources; and a resource layer that manages low-level execution backends such as containers and model engines. With a suite of prebuilt tools and environments (including search, code, and interactive environments), AgentFly enables scalable training of multi-turn, tool-using LLM agents across diverse tasks.

    Read Paper

  • WandB


    The training curves, parameters, rewards, and trajectories.

    Training

  • HuggingFace


    Check out the models on Hugging Face. Agent for code interpreter, retrieval, ScienceWorld, WebShop, etc.

    Explore Model

  • Tutorials


    Check out the tutorials on how to build agents, tools, rewards, and start training.

    Read More

Welcome to join our community!

  • WeChat Group

    Scan to join WeChat group.


    WeChat QR Code

  • Discord

    Join our Discord community.


    Discord QR Code