Article: Zijian Zhao, Sen Li*, "One Step is Enough: Multi-Agent Reinforcement Learning based on One-Step Policy Optimization for Order Dispatch on Ride-Sharing Platforms" (under review)
An expanded version will be provided at RS2002/Scale-OSPO.
The dataset used in this study is derived from the yellow taxi data in Manhattan.
For route planning, we use the Open Source Routing Machine (Project-OSRM/osrm-backend, the C++ backend). Specifically, our experiments use the US Northeast region, whose extract can be downloaded from the Geofabrik Download Server. (The downloaded extract must first be converted to the .osrm format with osrm-extract, osrm-partition, and osrm-customize, as described in the OSRM documentation.) To avoid conflicts with other programs on our machine, we use port 6000 instead of the default port 5000, so the server can be started with the following Docker command:
```
docker run -t -i -p 6000:6000 -v "${PWD}:/data" ghcr.io/project-osrm/osrm-backend osrm-routed --algorithm mld /data/us-northeast-latest.osrm -p 6000
```
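Once the container is running, travel times and distances can be queried through OSRM's HTTP route service. The sketch below is only an illustration of such a query and is not part of our codebase; it assumes the server is reachable at localhost:6000 and uses two arbitrary Manhattan coordinates, given in OSRM's lon,lat order.

```python
import requests

# Illustrative query against the local OSRM route service started above.
# OSRM expects coordinates in lon,lat order; the points below are arbitrary
# Manhattan locations chosen only for this example.
origin = (-73.9857, 40.7484)       # near the Empire State Building
destination = (-73.9772, 40.7527)  # near Grand Central Terminal

url = (
    "http://localhost:6000/route/v1/driving/"
    f"{origin[0]},{origin[1]};{destination[0]},{destination[1]}"
    "?overview=false"
)
route = requests.get(url, timeout=5).json()["routes"][0]
print(f"duration: {route['duration']:.0f} s, distance: {route['distance']:.0f} m")
```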
The processed data was originally placed in the ./data directory, but due to copyright considerations we have removed it from the repository. The data processing code is still available in ./data: please download the dataset from the link provided above and use our code to process it.
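As a rough illustration of the kind of preprocessing involved (not the exact pipeline in ./data), the raw NYC TLC yellow taxi trip records can be restricted to Manhattan using the public taxi zone lookup table. The file names, month, and output format below are placeholders.

```python
import pandas as pd

# Sketch only: keep yellow taxi trips whose pickup and drop-off zones lie in
# Manhattan. Column names follow the public TLC schema; file names are
# placeholders for the files you download.
trips = pd.read_parquet("yellow_tripdata_2019-06.parquet")
zones = pd.read_csv("taxi_zone_lookup.csv")

manhattan_ids = set(zones.loc[zones["Borough"] == "Manhattan", "LocationID"])
in_manhattan = (trips["PULocationID"].isin(manhattan_ids)
                & trips["DOLocationID"].isin(manhattan_ids))

columns = ["tpep_pickup_datetime", "tpep_dropoff_datetime",
           "PULocationID", "DOLocationID", "trip_distance"]
trips.loc[in_manhattan, columns].to_csv("manhattan_orders.csv", index=False)
```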
To train the model, run:

```
python train.py
```

You can also set different parameters in the process function in Worker.py of GRPO to replicate the ablation study presented in our paper.
The model parameters and training log files are located in the ./GRPO/parameters and ./OSPO/parameters directories.
If you find this work useful, please cite:

```
@article{zhao2025one,
  title={One Step is Enough: Multi-Agent Reinforcement Learning based on One-Step Policy Optimization for Order Dispatch on Ride-Sharing Platforms},
  author={Zhao, Zijian and Li, Sen},
  journal={arXiv preprint arXiv:2507.15351},
  year={2025}
}
```
