HiPER is a hierarchical reinforcement learning framework for training large language model (LLM) agents in long-horizon environments. Instead of treating agent behavior as a flat sequence of actions, HiPER explicitly separates high-level planning from low-level execution and introduces Hierarchical Advantage Estimation (HAE) for more effective credit assignment across multiple time scales. This repository builds on verl-agent and extends both its agent interface and its training algorithm. [Our webpage is still under construction; check back for updates!]
conda create -n verl python==3.12 -y
conda activate verl
pip3 install torch==2.6.0 --index-url https://download.pytorch.org/whl/cu124
pip3 install flash-attn==2.7.4.post1 --no-build-isolation
pip3 install -e .
pip3 install vllm==0.8.5
pip3 install peft==0.17.1
pip3 install gymnasium==0.29.1
pip3 install stable-baselines3==2.6.0
pip3 install alfworld
alfworld-download -f
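After the ALFWorld installation, it can help to confirm that the pinned versions actually ended up in the active environment. The snippet below is a hedged sketch and is not part of the repository: installed_version, check_pin and check_alfworld_pins are hypothetical helper names. It assumes python3 is the interpreter of the activated conda environment (Python 3.10 or newer, so importlib.metadata accepts names such as flash-attn). It also strips local version suffixes such as +cu124, which the PyTorch CUDA wheel index adds to the torch version string.

```shell
# Hypothetical helpers (not shipped with HiPER) for checking pinned versions.
# Assumes `python3` resolves to the interpreter of the active conda env.

installed_version() {
  # Print the installed version of distribution "$1"; print nothing if absent.
  python3 - "$1" 2>/dev/null <<'EOF'
import sys
from importlib.metadata import PackageNotFoundError, version
try:
    print(version(sys.argv[1]))
except PackageNotFoundError:
    pass
EOF
}

check_pin() {
  # $1 = distribution name, $2 = expected version; returns 0 only on a match.
  local got
  got="$(installed_version "$1")"
  got="${got%%+*}"  # drop local version suffixes such as "+cu124"
  if [ -n "$got" ] && [ "$got" = "$2" ]; then
    echo "ok        $1==$2"
  else
    echo "MISMATCH  $1: expected $2, found ${got:-<not installed>}"
    return 1
  fi
}

check_alfworld_pins() {
  # Pins taken from the ALFWorld installation commands; reports every mismatch
  # and returns non-zero if at least one pin differs.
  local status=0
  check_pin torch 2.6.0 || status=1
  check_pin flash-attn 2.7.4.post1 || status=1
  check_pin vllm 0.8.5 || status=1
  check_pin peft 0.17.1 || status=1
  check_pin gymnasium 0.29.1 || status=1
  check_pin stable-baselines3 2.6.0 || status=1
  return "$status"
}
```

For example, run check_alfworld_pins right after conda activate verl; each line of output reports one package, and a non-zero exit status means at least one pin does not match.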
To avoid dependency conflicts, it is recommended to install ALFWorld and WebShop in two separate conda environments (e.g., verl-alfworld and verl-webshop). Note that WebShop requires Python <= 3.10.
conda create -n verl-webshop python==3.10 -y
conda activate verl-webshop
cd ./agent_system/environments/env_package/webshop/webshop
./setup.sh -d all
cd repo_root/ # replace with your repository root path
pip3 install torch==2.6.0 --index-url https://download.pytorch.org/whl/cu124
pip3 install flash-attn==2.7.4.post1 --no-build-isolation
pip3 install -e .
pip3 install vllm==0.8.2
pip3 install peft==0.17.1
# spacy 3.7.2 requires typer<0.10.0,>=0.3.0, but you have typer 0.15.2 which is incompatible.
# weasel 0.3.4 requires typer<0.10.0,>=0.3.0, but you have typer 0.15.2 which is incompatible.
# The above warnings can be ignored.
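Because WebShop requires Python <= 3.10 while the ALFWorld environment uses Python 3.12, it is easy to run the WebShop setup in the wrong environment. A minimal sanity check follows, assuming python3 points at the interpreter of the active conda environment; python_at_most is a hypothetical helper name, not something provided by the repository.

```shell
# Hypothetical helper (not part of the HiPER repo): succeeds only if the given
# interpreter runs Python 3.x with x <= the given maximum minor version.
python_at_most() {
  local interp="${1:-python3}" max_minor="${2:-10}"
  "$interp" -c "import sys; sys.exit(0 if sys.version_info[:2] <= (3, $max_minor) else 1)"
}

# Run inside the activated verl-webshop environment.
if python_at_most python3 10; then
  echo "Python version is compatible with WebShop"
else
  echo "WebShop requires Python <= 3.10; found $(python3 --version 2>&1)"
fi
```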
bash example_scripts/HiPER_trainer/run_alfworld.sh # ALFWorld
bash example_scripts/HiPER_trainer/run_webshop.sh # WebShop
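Training runs in long-horizon environments can take many hours, so it may be convenient to launch a script detached from the terminal and follow its log. The following is a generic shell sketch, not a repository utility: launch_bg and the log file names are hypothetical, and it assumes the training scripts need no extra arguments, as in the commands above.

```shell
# Hypothetical helper: run a training script in the background, ignoring
# hangups, and append its stdout and stderr to a log file. Prints the PID
# of the background bash process.
launch_bg() {
  local script="$1" log="$2"
  nohup bash "$script" >>"$log" 2>&1 &
  echo "$!"
}
```

For example, pid="$(launch_bg example_scripts/HiPER_trainer/run_alfworld.sh alfworld_train.log)" starts ALFWorld training in the background, and tail -f alfworld_train.log follows its progress.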
If you find HiPER helpful, please cite our paper below:
@article{peng2026hiper,
  title={HiPER: Hierarchical Reinforcement Learning with Explicit Credit Assignment for Large Language Model Agents},
  author={Peng, Jiangweizhi and Liu, Yuanxin and Zhou, Ruida and Fleming, Charles and Wang, Zhaoran and Garcia, Alfredo and Hong, Mingyi},
  journal={arXiv preprint arXiv:2602.16165},
  year={2026}
}
Our codebase is built upon verl-agent and veRL. The environments are adapted from ALFWorld and WebShop. We sincerely thank the authors and contributors of these projects for making their valuable work publicly available.