Article: Zijian Zhao, Sen Li*, "One Step is Enough: Multi-Agent Reinforcement Learning based on One-Step Policy Optimization for Order Dispatch on Ride-Sharing Platforms" (under review)
An expanded version will be provided at RS2002/Scale-OSPO.
The dataset used in this study is derived from the yellow taxi data in Manhattan.
For route planning, we use the Open Source Routing Machine (Project-OSRM/osrm-backend, the C++ backend). Specifically, our experiments use the US Northeast region, whose extract can be downloaded from the Geofabrik Download Server. (The downloaded extract must first be converted to the .osrm format with osrm-extract, osrm-partition, and osrm-customize, as described in the OSRM documentation.) To avoid conflicts with other programs on our machine, we use port 6000 instead of the default port 5000, so the server can be started with the following Docker command:
```
docker run -t -i -p 6000:6000 -v "${PWD}:/data" ghcr.io/project-osrm/osrm-backend osrm-routed --algorithm mld /data/us-northeast-latest.osrm -p 6000
```
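Once the container is running, travel times and distances can be queried through OSRM's HTTP route service. The sketch below is only an illustration of such a query and is not part of our codebase; it assumes the server is reachable at localhost:6000 and uses two arbitrary Manhattan coordinates, given in OSRM's lon,lat order.

```python
import requests

# Illustrative query against the local OSRM route service started above.
# OSRM expects coordinates in lon,lat order; the points below are arbitrary
# Manhattan locations chosen only for this example.
origin = (-73.9857, 40.7484)       # near the Empire State Building
destination = (-73.9772, 40.7527)  # near Grand Central Terminal

url = (
    "http://localhost:6000/route/v1/driving/"
    f"{origin[0]},{origin[1]};{destination[0]},{destination[1]}"
    "?overview=false"
)
route = requests.get(url, timeout=5).json()["routes"][0]
print(f"duration: {route['duration']:.0f} s, distance: {route['distance']:.0f} m")
```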
The processed data was originally placed in the ./data directory, but due to copyright considerations we have removed it from the repository. The data processing code is still available in ./data: please download the dataset from the link provided above and use our code to process it.
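As a rough illustration of the kind of preprocessing involved (not the exact pipeline in ./data), the raw NYC TLC yellow taxi trip records can be restricted to Manhattan using the public taxi zone lookup table. The file names, month, and output format below are placeholders.

```python
import pandas as pd

# Sketch only: keep yellow taxi trips whose pickup and drop-off zones lie in
# Manhattan. Column names follow the public TLC schema; file names are
# placeholders for the files you download.
trips = pd.read_parquet("yellow_tripdata_2019-06.parquet")
zones = pd.read_csv("taxi_zone_lookup.csv")

manhattan_ids = set(zones.loc[zones["Borough"] == "Manhattan", "LocationID"])
in_manhattan = (trips["PULocationID"].isin(manhattan_ids)
                & trips["DOLocationID"].isin(manhattan_ids))

columns = ["tpep_pickup_datetime", "tpep_dropoff_datetime",
           "PULocationID", "DOLocationID", "trip_distance"]
trips.loc[in_manhattan, columns].to_csv("manhattan_orders.csv", index=False)
```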
To train the model, run:

```
python train.py
```

You can also set different parameters in the process function in Worker.py of GRPO to replicate the ablation study presented in our paper.
The model parameters and training log files are located in the ./GRPO/parameters and ./OSPO/parameters directories.
If you find this work useful, please cite:

```
@article{zhao2025one,
  title={One Step is Enough: Multi-Agent Reinforcement Learning based on One-Step Policy Optimization for Order Dispatch on Ride-Sharing Platforms},
  author={Zhao, Zijian and Li, Sen},
  journal={arXiv preprint arXiv:2507.15351},
  year={2025}
}
```
