CLI Reference

The agentfly command-line interface is a small dispatcher (src/agentfly/cli.py) that hands argv off to one of four subcommands. Invoke it as either:

python -m agentfly.cli <command> [args...]
# or, if installed as a script:
agentfly <command> [args...]

agentfly --help lists the available commands. The four subcommands are:

Command	Purpose	Style
`train`	RL training (verl PPO)	Hydra
`deploy`	Print and run a `vllm serve` command for local inference	click flags
`swebench`	SWE-Bench-style evaluation over a JSON dataset	click flags
`search`	Standalone dense-retriever HTTP server	env vars

`agentfly train`

Runs agentfly.verl.trainer.main_ppo with Hydra-style config overrides. All arguments after train are forwarded to Hydra; there are no flags parsed by the CLI itself.

agentfly train \
    algorithm.adv_estimator=grpo \
    data.train_files=./data/rlhf/math/gsm8k_train.json \
    agent.init_config.agent_type=hf \
    agent.init_config.tools="[calculator]" \
    ...

The full set of override keys you'll typically use is documented in Hydra Config. For an end-to-end walkthrough, see First Training. The upstream Hydra schema lives in verl/verl/trainer/config/ppo_trainer.yaml.
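When launching training from a script, it can help to build the override list from a nested dictionary instead of concatenating strings by hand. The following is a minimal sketch, not part of agentfly: `hydra_overrides` and `train_command` are hypothetical helper names, and the value formatting (`null`, `true`/`false`, bracketed lists) follows Hydra's basic override syntax.

```python
def _format_value(value):
    """Render one Python value in Hydra's override syntax."""
    if value is None:
        return "null"
    if isinstance(value, bool):
        return "true" if value else "false"
    if isinstance(value, (list, tuple)):
        return "[" + ",".join(_format_value(v) for v in value) + "]"
    return str(value)


def hydra_overrides(config, prefix=""):
    """Flatten a nested dict into dotted key=value override strings."""
    overrides = []
    for key, value in config.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            overrides.extend(hydra_overrides(value, path))
        else:
            overrides.append(f"{path}={_format_value(value)}")
    return overrides


def train_command(config):
    """Build an argv list suitable for subprocess.run (hypothetical helper)."""
    return ["agentfly", "train", *hydra_overrides(config)]
```

Passing the result to `subprocess.run` as a list avoids shell quoting problems with values such as `[calculator]`.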

`agentfly deploy`

Prints and executes a vllm serve command for serving the model with an OpenAI-compatible API plus chat-template injection (so tool-calling models like Qwen2.5 work out of the box). Useful for setting up a local LLM endpoint that agents can hit via the client backend.

agentfly deploy \
    --model-name-or-path Qwen/Qwen2.5-VL-3B-Instruct \
    --template qwen2.5-vl-system-tool \
    --tp 2 --dp 2 \
    --port 8000

Flag	Type	Default	Purpose
`--model-name-or-path`	str	—	HuggingFace model id or local path.
`--template`	str	`None`	`chat-bricks` template name. If set, the template's Jinja is written under `$AGENT_DATA_DIR/cache/jinja_template.jinja` and passed to `vllm serve` via `--chat-template`.
`--tp`	int	`1`	Tensor-parallel size.
`--pp`	int	`1`	Pipeline-parallel size.
`--dp`	int	`1`	Data-parallel size.
`--gpu-memory-utilization`	float	`0.8`	Forwarded to `vllm serve --gpu-memory-utilization`.
`--tool-call-parser`	str	`hermes`	Forwarded to `vllm serve --tool-call-parser`.
`--port`	int	`8000`	Forwarded to `vllm serve --port`.
`--allowed-local-media-path`	str	`None`	Forwarded to `vllm serve --allowed-local-media-path` (for vision models that load local images).

The command emits and runs:

vllm serve <model> [--chat-template ...] --trust-remote-code \
    --tensor-parallel-size <tp> --pipeline-parallel-size <pp> \
    --data-parallel-size <dp> --port <port> \
    --gpu-memory-utilization <util> \
    --enable-auto-tool-choice --tool-call-parser <parser> \
    [--allowed-local-media-path ...]
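The mapping from `agentfly deploy` flags to the emitted command can be sketched in Python as follows. This is an illustration that mirrors the template above, not the actual implementation; `build_vllm_serve_argv` is a hypothetical name, and the defaults are copied from the flag table.

```python
def build_vllm_serve_argv(
    model,
    chat_template_path=None,
    tp=1,
    pp=1,
    dp=1,
    port=8000,
    gpu_memory_utilization=0.8,
    tool_call_parser="hermes",
    allowed_local_media_path=None,
):
    """Assemble the `vllm serve` argv in the order shown in the template."""
    argv = ["vllm", "serve", model]
    # The chat template is only passed when --template was given.
    if chat_template_path is not None:
        argv += ["--chat-template", chat_template_path]
    argv += [
        "--trust-remote-code",
        "--tensor-parallel-size", str(tp),
        "--pipeline-parallel-size", str(pp),
        "--data-parallel-size", str(dp),
        "--port", str(port),
        "--gpu-memory-utilization", str(gpu_memory_utilization),
        "--enable-auto-tool-choice",
        "--tool-call-parser", tool_call_parser,
    ]
    # Optional: only needed for vision models that load local images.
    if allowed_local_media_path is not None:
        argv += ["--allowed-local-media-path", allowed_local_media_path]
    return argv
```

Building the command as a list rather than a single string keeps model paths with spaces intact when the list is handed to `subprocess.run`.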

`agentfly swebench`

Runs an SWE-Bench-style evaluation: loads a JSON dataset of issues, runs an agent (Bash- or Qwen3-Coder-style) over each, evaluates with a registered reward, and writes one JSON per sample plus a run_summary.json.

agentfly swebench \
    --data-path ./data/rlhf/os/swe-bench-verified.json \
    --result-dir ./results/swe/run-2026-04 \
    --model-name-or-path Qwen/Qwen3-32B-Coder \
    --agent qwen3_coder \
    --tool-set bash \
    --reward-name r2e_gym_reward \
    --backend client \
    --vllm-base-url http://localhost:8000/v1

Required

Flag	Purpose
`--data-path`	JSON file: a list of instances, or `{"data": [...]}` / `{"instances": [...]}`.
`--result-dir`	Output directory; gets one JSON per sample plus `run_summary.json`.
`--model-name-or-path`	HF model id or local path.
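The three accepted shapes for `--data-path` can be normalized with a few lines of Python. This is a sketch of one way to do it, useful for validating a dataset before a long run; `normalize_instances` and `load_instances` are hypothetical names, and the real loader may behave differently on malformed input.

```python
import json
from pathlib import Path


def normalize_instances(payload):
    """Return the instance list from any of the three documented shapes."""
    if isinstance(payload, list):
        return payload
    if isinstance(payload, dict):
        for key in ("data", "instances"):
            if isinstance(payload.get(key), list):
                return payload[key]
    raise ValueError(
        'expected a list, or an object with a "data" or "instances" list'
    )


def load_instances(path):
    """Read a JSON dataset file and normalize it to a list of instances."""
    text = Path(path).read_text(encoding="utf-8")
    return normalize_instances(json.loads(text))
```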

Generation / rollout

Flag	Default	Purpose
`--template`	`None`	Optional `chat-bricks` template name.
`--temperature`	`0.0`	Sampling temperature.
`--max-turns`	`30`	Per-sample turn cap.
`--num-chains`	`1`	Parallel chains per sample.
`--max-concurrent-chains`	unlimited	Cap concurrent chains across the batch.
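To illustrate what a cap like `--max-concurrent-chains` means, here is a minimal sketch using an `asyncio.Semaphore`. This is a common pattern for bounding concurrency and is an assumption about the general idea, not a description of how agentfly schedules chains; `run_chains` is a hypothetical name.

```python
import asyncio


async def run_chains(chain_fns, max_concurrent=None):
    """Run every chain; at most max_concurrent are in flight at once.

    chain_fns is a list of zero-argument coroutine functions.
    max_concurrent=None means unlimited; otherwise it must be positive.
    """
    semaphore = (
        asyncio.Semaphore(max_concurrent) if max_concurrent is not None else None
    )

    async def guarded(fn):
        if semaphore is None:
            return await fn()
        # Wait for a free slot before starting this chain.
        async with semaphore:
            return await fn()

    # gather preserves input order in its result list.
    return await asyncio.gather(*(guarded(fn) for fn in chain_fns))
```

With `--num-chains` greater than 1, the total number of chains is the sample count times the chains per sample, so a cap of this kind bounds load on the serving endpoint regardless of batch size.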

Agent / tools / reward

Flag	Choices / default	Purpose
`--agent`	`bash` | `qwen3_coder` (default)	Which SWE agent class to use.
`--tool-set`	`bash` (default) | `file`	`bash`: bash only. `file`: file workspace tools (Qwen3-Coder).
`--reward-name`	`r2e_gym_reward` (default)	Registered reward name (e.g. `r2e_gym_reward`, `swe_reward`).

Backend

Flag	Choices / default	Purpose
`--backend`	`client` (default) | `async_vllm`	Use a remote OpenAI-compatible endpoint or a local vLLM engine.
`--vllm-base-url`	`http://localhost:8000/v1`	Endpoint URL when `--backend client`.
`--api-key`	`EMPTY`	API key for the client backend.
`--tensor-parallel-size` / `--tp`	`1`	TP size when `--backend async_vllm`.
`--data-parallel-size` / `--dp`	`1`	DP size when `--backend async_vllm`.
`--max-model-len`	`None`	Optional max sequence length for vLLM.

Resources / scope

Flag	Default	Purpose
`--resource-backend`	`local`	`Context` resource backend (e.g. `local`, `ray`).
`--enroot-images-path`	`None`	If set, sets `ENROOT_IMAGES_PATH` for container-backed rewards.
`--enroot-async / --no-enroot-async`	`--enroot-async`	Sets `ENROOT_ASYNC=1` (default) or `0`.

Sampling

Flag	Default	Purpose
`--limit`	all	Run at most this many samples (after shuffle).
`--seed`	`None`	Shuffle dataset with this seed.
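The table implies an order of operations: shuffle first (only when a seed is given), then truncate to the limit. A minimal sketch of that order, assuming a standard seeded `random.Random` shuffle (the actual shuffle used by agentfly may differ, so do not expect identical sample order); `select_samples` is a hypothetical name:

```python
import random


def select_samples(instances, seed=None, limit=None):
    """Shuffle with the seed (if any), then keep at most `limit` samples."""
    selected = list(instances)  # copy so the caller's list is untouched
    if seed is not None:
        random.Random(seed).shuffle(selected)
    if limit is not None:
        selected = selected[:limit]
    return selected
```

Because the limit is applied after the shuffle, combining `--seed` with `--limit` gives a reproducible random subset rather than the first N entries of the file.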

When the run finishes, run_summary.json is written to the result directory and the overall accuracy is printed to stdout.

`agentfly search`

Starts a FastAPI HTTP server wrapping the dense retriever (E5-base-v2 + FAISS over Wikipedia-18). Run it as a standalone retriever process: the GPU-backed retriever stays in one process and clients query it over HTTP. The matching tool is agentfly.tools.src.search.async_dense_retrieve_api.

There are no CLI flags. Configuration is entirely via environment variables:

Env var	Default	Purpose
`RETRIEVER_CORPUS_FILE`	`$AGENT_CACHE_DIR/data/search/wiki-18.jsonl`	Corpus file path.
`RETRIEVER_INDEX_FILE`	`$AGENT_CACHE_DIR/data/search/e5_Flat.index`	FAISS index file path.
`RETRIEVER_HOST`	`0.0.0.0`	Bind host.
`RETRIEVER_PORT`	`8765`	Bind port.
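The table above amounts to a small configuration resolver. The sketch below shows how the variables and their defaults combine; `retriever_config` is a hypothetical name, and it assumes `AGENT_CACHE_DIR` is present in the environment whenever a file path falls back to its default (how agentfly resolves an unset `AGENT_CACHE_DIR` is not covered here).

```python
import os


def retriever_config(env=None):
    """Resolve retriever settings from environment variables and defaults."""
    env = os.environ if env is None else env

    def default_path(filename):
        # Assumption: AGENT_CACHE_DIR is set when a default path is needed.
        return f"{env['AGENT_CACHE_DIR']}/data/search/{filename}"

    return {
        "corpus_file": env.get("RETRIEVER_CORPUS_FILE")
        or default_path("wiki-18.jsonl"),
        "index_file": env.get("RETRIEVER_INDEX_FILE")
        or default_path("e5_Flat.index"),
        "host": env.get("RETRIEVER_HOST", "0.0.0.0"),
        "port": int(env.get("RETRIEVER_PORT", "8765")),
    }
```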

Endpoints:

Method	Path	Purpose
`GET`	`/health`	Health check + retriever-loaded status.
`POST`	`/search`	Body: `{"query": str, "top_k": int}`. Returns `{"results": [{"contents": str}]}`.
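A client for the `/search` endpoint needs only the standard library. The following sketch assumes the request and response shapes listed in the table; `build_search_request` and `search` are hypothetical helper names, and the default `top_k` of 3 is an arbitrary choice for illustration.

```python
import json
import urllib.request


def build_search_request(base_url, query, top_k=3):
    """Build the POST /search request with a JSON body."""
    body = json.dumps({"query": query, "top_k": top_k}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/search",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def search(base_url, query, top_k=3, timeout=30.0):
    """Send the query and return the list of passage strings."""
    request = build_search_request(base_url, query, top_k)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        payload = json.load(response)
    return [hit["contents"] for hit in payload["results"]]
```

With the server running on its default port, a call would look like `search("http://localhost:8765", "who wrote Dune", top_k=5)`.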

Typical invocation:

export LD_LIBRARY_PATH="$CONDA_PREFIX/lib:${LD_LIBRARY_PATH}"
python -m agentfly.cli search

The corpus and FAISS index are auto-downloaded into $AGENT_CACHE_DIR/data/search/ by agentfly.tools.utils.data.download_tool_data("asyncdense_retrieve") if they're not already present.

Adding a New Subcommand

The dispatcher is a few-dozen-line file at src/agentfly/cli.py. Adding a new subcommand is two changes there:

1. Add a branch to the if command == ... chain that imports your target module and rewrites sys.argv.
2. Update the --help print block at the top.

The pattern is module-import based, so your subcommand can use any argument-parsing style (Hydra, click, argparse, plain sys.argv) — the dispatcher just hands argv off and calls module.main().
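The import-and-hand-off pattern described above can be sketched as follows. This is an illustration of the idea, not a copy of src/agentfly/cli.py: `dispatch`, the `mycmd` subcommand, and `my_package.my_command` are hypothetical, and the `import_module` parameter exists only so the sketch can be exercised without the real modules installed.

```python
import importlib
import sys

HELP = """usage: agentfly <command> [args...]

commands:
  train   RL training
  mycmd   (hypothetical) your new subcommand
"""


def dispatch(argv, import_module=importlib.import_module):
    """Route argv[1] to a module's main(), forwarding the remaining args."""
    if len(argv) < 2 or argv[1] in ("-h", "--help"):
        print(HELP)
        return 0
    command, rest = argv[1], argv[2:]

    # One branch per subcommand; modules are imported lazily so that
    # `agentfly --help` does not pay for heavy imports.
    if command == "train":
        module = import_module("agentfly.verl.trainer.main_ppo")
    elif command == "mycmd":  # the new branch
        module = import_module("my_package.my_command")
    else:
        print(f"unknown command: {command}\n{HELP}", file=sys.stderr)
        return 2

    # Rewrite sys.argv so the target's own parser sees only its arguments.
    sys.argv = [argv[0], *rest]
    result = module.main()
    return 0 if result is None else result
```

Because the target module only ever sees a rewritten `sys.argv`, it does not need to know it was launched through the dispatcher.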