Merge branch 'main' of github.com:metadriverse/scenarionet

This commit is contained in:
QuanyiLi
2023-09-08 14:59:55 +01:00
39 changed files with 1586 additions and 277 deletions

105
README.md

@@ -2,26 +2,29 @@
**Open-Source Platform for Large-Scale Traffic Scenario Simulation and Modeling**
[
[**Webpage**](https://metadriverse.github.io/scenarionet/) |
[**Code**](https://github.com/metadriverse/scenarionet) |
[**Video**](https://youtu.be/3bOqswXP6OA) |
[**Paper**](http://arxiv.org/abs/2306.12241) |
[**Documentation**](https://scenarionet.readthedocs.io/en/latest/) |
[**Documentation**](https://scenarionet.readthedocs.io/en/latest/)
]
ScenarioNet allows users to load scenarios from real-world datasets like Waymo, nuPlan, nuScenes, and L5, as well as synthetic
datasets such as procedurally generated scenarios and safety-critical ones generated by adversarial attacks.
Tools are provided for building training and test sets from the resulting databases for ML applications.
![scenarios](docs/asset/scenarios.png)
Powered by [MetaDrive Simulator](https://github.com/metadriverse/metadrive), the scenarios can be reconstructed for
various applications such as AD stack testing, reinforcement learning, imitation learning, and scenario generation.
![sensors](docs/asset/sensor.png)
![system](docs/asset/system_01.png)
## Installation
Detailed installation guidance is available
in the [documentation](https://scenarionet.readthedocs.io/en/latest/install.html).
The simplest way is as follows.
```
# create environment
conda create -n scenarionet python=3.9
@@ -38,87 +41,21 @@ cd scenarionet
pip install -e .
```
## Usage
## API reference
We provide explanations and demos for all scripts here.
**You are encouraged to try them on your own; add the ```-h``` or ```--help``` argument to learn more about these
scripts.**
All operations and API references are available in
our [documentation](https://scenarionet.readthedocs.io/en/latest/operations.html).
If you already have ScenarioNet installed, you can check all operations by `python -m scenarionet.list`.
### Convert
## Citation
**Waymo**: the following script converts Waymo tfrecords (version: v1.2, data_bin: training_20s) to MetaDrive scenario
descriptions and stores them in the directory ./waymo
If you used this project in your research, please cite
```latex
@article{li2023scenarionet,
title={ScenarioNet: Open-Source Platform for Large-Scale Traffic Scenario Simulation and Modeling},
author={Li, Quanyi and Peng, Zhenghao and Feng, Lan and Duan, Chenda and Mo, Wenjie and Zhou, Bolei and others},
journal={arXiv preprint arXiv:2306.12241},
year={2023}
}
```
```
python -m scenarionet.convert_waymo -d waymo --raw_data_path /path/to/tfrecords --num_workers=16
```
**nuPlan**: the following script converts a nuPlan split containing .db files to MetaDrive scenario descriptions and
stores them in the directory ./nuplan
```
python -m scenarionet.convert_nuplan -d nuplan --raw_data_path /path/to/db_files --num_workers=16
```
**nuScenes**: as a nuScenes split can be read by specifying a version like v1.0-mini or v1.0-trainval, the following script
converts all scenarios in that split
```
python -m scenarionet.convert_nuscenes -d nuscenes --version v1.0-mini --num_workers=16
```
**PG**: the following script generates 10,000 scenarios and stores them in the directory ./pg
```
python -m scenarionet.scripts.convert_pg -d pg --num_workers=16 --num_scenarios=10000
```
### Merge & move
For merging two or more databases, use
```
python -m scenarionet.merge_database -d /destination/path --from /database1 /2 ...
```
Because a database contains a path mapping, one should move a database folder with the following script instead of the ```cp```
command.
Using ```--copy_raw_data``` will copy the raw scenario files into the target directory and remove the virtual mapping.
```
python -m scenarionet.copy_database --to /destination/path --from /source/path
```
### Verify
The following scripts check whether all scenarios exist and can be loaded into the simulator.
Missing or broken scenarios will be recorded in the error file; otherwise, no error file will be
generated.
With the error file, one can build a new database excluding or including the broken or missing scenarios.
**Existence check**
```
python -m scenarionet.check_existence -d /database/to/check --error_file_path /error/file/path
```
**Runnable check**
```
python -m scenarionet.check_simulation -d /database/to/check --error_file_path /error/file/path
```
**Generating new database**
```
python -m scenarionet.generate_from_error_file -d /new/database/path --file /error/file/path
```
### Visualization
Visualize the simulated scenarios
```
python -m scenarionet.run_simulation -d /path/to/database --render --scenario_index
```

BIN
docs/asset/system_01.png Normal file

Binary file not shown.


25
documentation/PG.rst Normal file

@@ -0,0 +1,25 @@
############
PG
############
| Website: https://metadriverse.github.io/metadrive/
| Download: *N/A (collected online)*
| Paper: https://arxiv.org/pdf/2109.12674.pdf
The PG scenarios are collected by running simulations and recording the episodes in the MetaDrive simulator.
The name PG refers to Procedural Generation, a technique used to generate maps.
Once a map is determined, vehicles and objects are spawned and actuated according to hand-crafted rules.
Build PG Database
===================
If MetaDrive is installed, no further steps are required to build the database. Just run the following
command to generate, e.g., 1000 scenarios::
python -m scenarionet.convert_pg -d /path/to/pg_database --num_scenarios 1000
Known Issues: PG
==================
N/A

View File

@@ -3,7 +3,7 @@ This folder contains files for the documentation: [https://scenarionet.readthedo
To build the documents locally, please run the following commands:
```
pip install sphinx sphinx_rtd_theme
pip install -e .[doc]
cd scenarionet/documentation
make html
```

View File

@@ -31,7 +31,8 @@ release = '0.1.1'
# ones.
extensions = [
"sphinx.ext.autosectionlabel",
"sphinx_rtd_theme"
"sphinx_rtd_theme",
"sphinxemoji.sphinxemoji"
]
# Add any paths that contain templates here, relative to this directory.
@@ -52,4 +53,6 @@ html_theme = 'sphinx_rtd_theme'
# Add any paths that contain custom static files (such as style sheets) here,
# relative to this directory. They are copied after the builtin static files,
# so a file named "default.css" will overwrite the builtin "default.css".
html_static_path = []
# html_static_path = ['custom.css']
# html_css_files = ['custom.css']

7
documentation/custom.css Normal file

@@ -0,0 +1,7 @@
div.highlight pre {
white-space: pre-wrap; /* Since CSS 2.1 */
white-space: -moz-pre-wrap; /* Mozilla, since 1999 */
white-space: -pre-wrap; /* Opera 4-6 */
white-space: -o-pre-wrap; /* Opera 7 */
word-wrap: break-word; /* Internet Explorer 5.5+ */
}

View File

@@ -0,0 +1,27 @@
#####################
Datasets
#####################
Generally, the detailed setup procedure for each dataset can be found in its official documentation.
In this section, we still provide simple guidance on how to set up each dataset step by step,
saving you the time of redirecting to new sites and reading the comprehensive guides.
The content of each subsection is a simplified version of the official setup procedures for each dataset.
Thus, if you encounter problems with our simplified setup instructions,
please read the related official documentation.
**Also, we kindly ask you to report any problems encountered when following our procedures.
We will fix them as best we can and record them in the troubleshooting section for each dataset.** |:blush:|
.. modify the toctree in index.rst together
- :ref:`nuplan`
- :ref:`nuscenes`
- :ref:`waymo`
- :ref:`PG`
- :ref:`lyft`
- :ref:`new_data`

View File

@@ -0,0 +1,7 @@
.. _desc:
########################
Scenario Description
########################
Coming soon! @Zhenghao please help with this.

142
documentation/example.rst Normal file

@@ -0,0 +1,142 @@
#######################
Waymo Example
#######################
In this example, we will show you how to convert a small batch of `Waymo <https://waymo.com/intl/en_us/open/>`_ scenarios into the internal Scenario Description.
After that, the scenarios will be loaded into the simulator for closed-loop simulation.
First of all, please install `MetaDrive <https://github.com/metadriverse/metadrive>`_ and `ScenarioNet <https://github.com/metadriverse/scenarionet>`_ following the steps in :ref:`installation`.
1. Setup Waymo toolkit
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
For any dataset, this step is necessary after installing ScenarioNet,
as we need to use the official toolkit of the data provider to parse the original scenario description and convert it to our internal scenario description.
For Waymo data, please install the toolkit via::
pip install waymo-open-dataset-tf-2-11-0==1.5.0
# Or install with scenarionet
pip install -e .[waymo]
.. note::
This package is only supported on Linux platform.
For other datasets like nuPlan and nuScenes, you need to set up `nuplan-devkit <https://github.com/motional/nuplan-devkit>`_ and `nuscenes-devkit <https://github.com/nutonomy/nuscenes-devkit>`_, respectively.
Guidance on how to set up these datasets and connect them with ScenarioNet can be found at :ref:`datasets`.
2. Prepare Data
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Access the Waymo motion data at `Google Cloud <https://console.cloud.google.com/storage/browser/waymo_open_dataset_motion_v_1_2_0>`_.
Download one tfrecord scenario file from ``waymo_open_dataset_motion_v_1_2_0/uncompressed/scenario/training_20s``.
In this tutorial, we only use the first file ``training_20s.tfrecord-00000-of-01000``.
Just click the download button |:arrow_down:| on the right side to download it.
Place the downloaded tfrecord file in a folder. Let's call it ``exp_waymo``; the structure is like this::
exp_waymo
├──training_20s.tfrecord-00000-of-01000
.. note::
For building a database from all scenarios, install ``gsutil`` and use this command:
``gsutil -m cp -r "gs://waymo_open_dataset_motion_v_1_2_0/uncompressed/scenario/training_20s" .``
Likewise, place all downloaded tfrecord files in the same folder.
3. Build Mini Database
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Run the following command to extract the scenarios from ``exp_waymo`` to ``exp_converted``::
python -m scenarionet.convert_waymo -d /path/to/exp_converted/ --raw_data_path /path/to/exp_waymo --num_files=1
.. note::
When running ``python -m``, make sure the directory you are at doesn't contain a folder called ``scenarionet``.
Otherwise, the running may fail. For more details about the command, use ``python -m scenarionet.convert_waymo -h``
Now all extracted scenarios will be placed in the ``exp_converted`` directory.
If we list the directory with the ``ll`` command, the structure will be like::
exp_converted
├──exp_converted_0
├──exp_converted_1
├──exp_converted_2
├──exp_converted_3
├──exp_converted_4
├──exp_converted_5
├──exp_converted_6
├──exp_converted_7
├──dataset_mapping.pkl
├──dataset_summary.pkl
This is because we use 8 workers to extract the scenarios, and thus the converted scenarios will be stored in 8 subfolders.
If we check ``exp_converted_0``, we will see a structure like::
├──sd_waymo_v1.2_2085c5cffcd4727b.pkl
├──sd_waymo_v1.2_27997d88023ff2a2.pkl
├──sd_waymo_v1.2_3ece8d267ce5847c.pkl
├──sd_waymo_v1.2_53e9adfdac0eb822.pkl
├──sd_waymo_v1.2_8e40ffb80dd2f541.pkl
├──sd_waymo_v1.2_df72c5dc77a73ed6.pkl
├──sd_waymo_v1.2_f1f6068fabe77dc8.pkl
├──dataset_mapping.pkl
├──dataset_summary.pkl
Therefore, the subfolders produced by the workers are where the converted scenarios are placed.
To aggregate the scenarios produced by all workers, ``exp_converted/dataset_mapping.pkl`` stores the mapping
from `scenario_id` to the path of the target scenario file relative to ``exp_converted``.
As a result, we can get all scenarios produced by the 8 workers by loading the database ``exp_converted``.
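To see what the database records, we can also inspect the summary and mapping programmatically.
Below is a minimal sketch using ``read_dataset_summary`` from ``scenarionet.common_utils``;
the exact contents of the returned objects may vary across versions.

.. code-block:: python

    import os
    from scenarionet.common_utils import read_dataset_summary

    # read_dataset_summary returns three objects; the first is a summary dict
    # keyed by scenario file name, and the last is the mapping from file name
    # to its folder relative to the database root.
    summary, scenario_list, mapping = read_dataset_summary("/path/to/exp_converted")

    print("Number of scenarios:", len(summary))
    for name in list(summary)[:3]:
        # Resolve where each scenario file actually lives via the mapping.
        print(name, "->", os.path.join("/path/to/exp_converted", mapping[name], name))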
4. Database Operations
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Several basic operations are available, allowing us to split, merge, move, and check databases.
First of all, let's check how many scenarios are included in this database built from ``training_20s.tfrecord-00000-of-01000``::
python -m scenarionet.num -d /path/to/exp_converted/
It will show that there are 61 scenarios in total.
For machine learning applications, we usually want to split training/test sets.
To this end, we can use the following command to build the training set::
python -m scenarionet.split --from /path/to/exp_converted/ --to /path/to/exp_train --num_scenarios 40
Again, use the following command to build the test set::
python -m scenarionet.split --from /path/to/exp_converted/ --to /path/to/exp_test --num_scenarios 21 --start_index 40
We add the ``start_index`` argument to select the last 21 scenarios as the test set.
To ensure that no overlap exists, we can run this command::
python -m scenarionet.check_overlap --d_1 /path/to/exp_train/ --d_2 /path/to/exp_test/
It will report `No overlapping in two database!`.
Now, let's suppose that ``/exp_train/`` and ``/exp_test/`` are two databases built
from different sources and we want to merge them into a larger one.
This can be achieved by::
python -m scenarionet.merge --from /path/to/exp_train/ /path/to/exp_test -d /path/to/exp_merged
Let's check if the merged database is the same as the original one::
python -m scenarionet.check_overlap --d_1 /path/to/exp_merged/ --d_2 /path/to/exp_converted
It will show there are 61 overlapped scenarios.
Congratulations! Now you are familiar with some common operations.
More operations and details are available at :ref:`operations`.
5. Simulation
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The database can be loaded into the MetaDrive simulator for scenario replay or closed-loop simulation.
First of all, let's replay scenarios in the ``exp_converted`` database::
python -m scenarionet.sim -d /path/to/exp_converted
By adding the ``--render 3D`` flag, we can use the 3D renderer::
python -m scenarionet.sim -d /path/to/exp_converted --render 3D
.. note::
``--render advanced`` enables the advanced deferred rendering pipeline,
but a GPU better than an RTX 2060 is required.
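Scenario replay can also be scripted directly against MetaDrive.
The sketch below is a rough outline: ``ScenarioEnv``, ``ReplayEgoCarPolicy``, and the
config keys follow MetaDrive's scenario replay utilities, but names and signatures may
differ slightly between MetaDrive versions.

.. code-block:: python

    from metadrive.envs.scenario_env import ScenarioEnv
    from metadrive.policy.replay_policy import ReplayEgoCarPolicy

    env = ScenarioEnv(dict(
        agent_policy=ReplayEgoCarPolicy,          # replay the recorded ego trajectory
        data_directory="/path/to/exp_converted",  # the database built in step 3
        num_scenarios=61,
    ))
    try:
        env.reset(seed=0)  # the seed selects which scenario to load
        for _ in range(100):
            env.step([0.0, 0.0])  # actions are ignored while replaying
    finally:
        env.close()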

View File

@@ -1,6 +1,6 @@
########################
##########################
ScenarioNet Documentation
########################
##########################
Welcome to the ScenarioNet documentation!
@@ -17,6 +17,35 @@ You can also visit the `GitHub repo <https://github.com/metadriverse/scenarionet
Please feel free to contact us if you have any suggestions or ideas!
.. toctree::
:maxdepth: 2
:caption: Quick Start
install.rst
example.rst
operations.rst
.. modify the toctree in datasets.rst together
.. toctree::
:maxdepth: 1
:caption: Supported Dataset
datasets.rst
nuplan.rst
nuscenes.rst
waymo.rst
PG.rst
lyft.rst
new_data.rst
.. toctree::
:maxdepth: 2
:caption: System Design
description.rst
simulation.rst
Citation
########

95
documentation/install.rst Normal file

@@ -0,0 +1,95 @@
.. _install:
########################
Installation
########################
1. Conda Environment
~~~~~~~~~~~~~~~~~~~~~~~~
The ScenarioNet repo contains tools for converting scenarios and building databases from various data sources.
We recommend creating a new conda environment and installing Python>=3.8,<=3.9::
conda create -n scenarionet python==3.9
conda activate scenarionet
2. Make a New Folder (Optional)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In addition, the operations in ScenarioNet are executed as Python modules via ``python -m``, and thus we have to make sure
the working directory contains NO folders named ``metadrive`` or ``scenarionet``.
Therefore, we strongly recommend creating a new folder under your routine working directory.
For example, supposing you prefer working at ``/home/lee``,
it would be great to have a new folder ``mdsn`` created under this path.
The ``git clone`` and package installation should happen in this new directory.
As a result, the directory tree should look like this::
/home/lee/
├──mdsn
├──metadrive
├──scenarionet
├──...
In this way, you can freely run the dataset operations at any place other than ``/home/lee/mdsn``.
Now, let's move to this new directory for further installation with ``cd mdsn``.
.. note::
This step is optional. One can still ``git clone`` and ``pip install`` the following two packages at any place.
If any ``python -m scenarionet.[command]`` fails to run, please check if there is a folder called `metadrive`
or `scenarionet` contained in the current directory. If so, please switch to a new directory to avoid this issue.
3. Install MetaDrive
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The simulation part is maintained in the `MetaDrive <https://github.com/metadriverse/metadrive>`_ repo, so let's install MetaDrive first.
The installation of MetaDrive on different platforms is straightforward and easy!
We recommend installing it in one of the following ways::
# Method 1 (Recommended, latest version, source code exposed)
git clone git@github.com:metadriverse/metadrive.git
cd metadrive
pip install -e .
# Method 2 (Stable version, source code hidden)
pip install "metadrive-simulator>=0.4.1.1"
To check whether MetaDrive is successfully installed, please run::
python -m metadrive.examples.profile_metadrive
.. note:: Please do not run the above command at a directory that has a sub-folder called :code:`./metadrive`.
4. Install ScenarioNet
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
For ScenarioNet, we only provide Github installation::
git clone git@github.com:metadriverse/scenarionet.git
cd scenarionet
Any one of the following commands will automatically install the basic requirements, with additional requirements
for specific datasets::
# Install basic requirement only
pip install -e .
# Install Waymo official toolkit
pip install -e .[waymo]
# Install nuScenes development toolkit
pip install -e .[nuscenes]
# Install nuPlan development toolkit
pip install -e .[nuplan]
# Install toolkits for all datasets
pip install -e .[all]
.. note::
If you don't want to access the source code, you can install these two packages with
``pip install git+https://github.com/metadriverse/scenarionet.git``
and ``pip install git+https://github.com/metadriverse/metadrive.git``.
Though this is more straightforward, one has to install additional requirements, like the development
toolkits, manually.

31
documentation/lyft.rst Normal file

@@ -0,0 +1,31 @@
.. _lyft:
###########################
Lyft (Upgrade in Progress)
###########################
| Website: https://woven.toyota/en/prediction-dataset
| Download: https://woven-planet.github.io/l5kit/dataset.html
| Paper: https://proceedings.mlr.press/v155/houston21a/houston21a.pdf
This dataset includes the logs of movement of cars, cyclists, pedestrians,
and other traffic agents encountered by the automated fleet.
These logs come from processing raw lidar, camera, and radar data through the team's perception systems and are ideal
for training motion prediction models.
The dataset includes:
- 1000+ hours of traffic agent movement
- 16k miles of data from 23 vehicles
- 15k semantic map annotations
.. note::
Currently, the old Lyft dataset can be read by the ``nuscenes-devkit`` and thus can share the nuScenes convertor.
The new Lyft data is now maintained by Woven Planet, and we are working on supporting ``L5Kit`` to allow
using the new Lyft data.
Known Issues: Lyft
===================
N/A

View File

@@ -0,0 +1,53 @@
.. _new_data:
#############################
New dataset support
#############################
We believe it is the community's effort that brings the field closer to an `ImageNet` of autonomous driving.
Thus, we encourage the community to contribute convertors for new datasets.
To build a convertor compatible with our APIs, one should follow the steps below.
We recommend taking a look at our existing convertors.
They are good examples and can be adjusted to parse new datasets.
Besides, **we are very happy to provide help if you are working on supporting new datasets!**
**1. Convertor function input/output**
Take ``convert_waymo(scenario: scenario_pb2.Scenario, version: str) -> metadrive.scenario.ScenarioDescription`` as an example.
It takes a scenario recorded in the original Waymo format as input and returns a ``ScenarioDescription``, which is actually a nested Python ``dict``.
We just extend the functions of the ``dict`` object to pre-define a structure with several required fields to fill out.
The required fields can be found at :ref:`desc`.
Apart from basic information like ``version`` and ``scenario_id``, there are mainly three fields that need to be filled:
``tracks``, ``map_features`` and ``dynamic_map_states``,
which store object information, map structure, and traffic light states, respectively.
This information can be extracted with the toolkit coupled with the original data.
**2. Fill in the object data**
By parsing the ``scenario`` with the official APIs, we can easily extract the history of all objects.
Generally, the object information is stored in a *frame-centric* way, which means the querying API takes a timestep as
input and returns all objects present in that frame.
However, ScenarioNet requires an *object-centric* object history.
In this way, we can easily know how many objects are present in the scenario and retrieve the trajectory of each object with
its object_id.
Thus, a conversion from the *frame-centric* description to the *object-centric* description is required, as sketched below.
A good example of this is the ``extract_traffic(scenario: NuPlanScenario, center)`` function in `nuplan/utils.py <https://github.com/metadriverse/scenarionet/blob/e6831ff972ed0cd57fdcb6a8a63650c12694479c/scenarionet/converter/nuplan/utils.py#L343>`_.
Similarly, the traffic light states can be extracted from the original data and represented in an *object-centric* way.
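To illustrate the idea, here is a schematic conversion (not a ScenarioNet API), assuming a
hypothetical frame-centric input with one ``{object_id: state}`` dict per timestep:

.. code-block:: python

    from collections import defaultdict

    def to_object_centric(frames):
        """Convert a frame-centric history (a list of {object_id: state} dicts,
        one per timestep) into an object-centric one mapping each object_id to
        its full trajectory."""
        tracks = defaultdict(list)
        for t, frame in enumerate(frames):
            for object_id, state in frame.items():
                tracks[object_id].append((t, state))
        return dict(tracks)

With such a structure, the number of objects is simply ``len(tracks)``, and the trajectory
of any object can be retrieved by its object_id.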
**3. Fill in the map data**
Map data consists of lane center lines and various kinds of boundaries, such as yellow solid lines and white broken lines.
All of them are lines represented by a list of points.
To fill in this field, the first step is to query all line objects in the region where the scenario is collected.
By traversing the line list and extracting each line's type and point list, we can build a map that is ready to be
loaded into MetaDrive.
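As a schematic illustration (the field names here are only illustrative; see :ref:`desc`
for the required schema), the queried lines can be organized into a map-feature dict:

.. code-block:: python

    def build_map_features(lines):
        """Organize queried line objects (hypothetical input: objects carrying
        an id, a type, and a point list) into a map-feature dict."""
        map_features = {}
        for line in lines:
            map_features[line.id] = {
                "type": line.type,              # e.g. a lane center line or a boundary type
                "polyline": list(line.points),  # the line as a list of points
            }
        return map_features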
**4. Extend the description**
As long as all mandatory fields are filled, one can add new key-value pairs at each level of the nested dict.
For example, some scenarios are labeled with the scenario type and the behavior of surrounding objects.
It is absolutely OK to include this information in the scenario descriptions and use it in later experiments.

144
documentation/nuplan.rst Normal file

@@ -0,0 +1,144 @@
#############################
nuPlan
#############################
| Website: https://www.nuplan.org/nuplan
| Download: https://www.nuplan.org/nuplan (Registration required)
| Paper: https://arxiv.org/pdf/2106.11810.pdf
| Documentation: https://nuplan-devkit.readthedocs.io/en/latest/
nuPlan is the world's first large-scale planning benchmark for autonomous driving.
It provides a large-scale dataset with 1200h of human driving data from 4 cities across the US and Asia with widely varying traffic patterns (Boston, Pittsburgh, Las Vegas and Singapore).
The dataset is auto-labeled using a state-of-the-art offline perception system.
Contrary to existing datasets of this size, it not only contains the 3D boxes of the objects detected in the dataset,
but also provides 10% of the raw sensor data (120h).
The hope is that this large-scale sensor data can be used to make further progress in the field of end-to-end planning.
1. Install nuPlan Toolkit
==========================
First of all, we have to install the ``nuplan-devkit``.
.. code-block:: bash
# 1. install from GitHub (recommended)
git clone git@github.com:nutonomy/nuplan-devkit.git
cd nuplan-devkit
pip install -r requirements.txt
pip install -e .
# 2. or install from PyPI
pip install nuplan-devkit
# 3. or install with scenarionet
pip install -e .[nuplan]
By installing from GitHub, you can access the examples and source code of the toolkit.
The examples are useful to verify whether the installation and dataset setup are correct.
2. Download nuPlan Data
===========================
The official data setup page is at https://nuplan-devkit.readthedocs.io/en/latest/dataset_setup.html.
Nevertheless, we provide simplified download instructions for convenience.
First of all, you need to register at https://www.nuplan.org/nuplan and go to the Download section.
There are three types of data: Sensor, Map, and Split.
We only use the last two kinds of data; the sensor data is not required by ScenarioNet.
Thus, please download the following files:
- nuPlan Maps
- nuPlan Mini(Train/Test/Val) Split
.. note::
Please download the latest version (V1.1).
We recommend downloading the mini split first to test and familiarize yourself with the setup process.
All downloaded files are ``.tgz`` files and can be uncompressed by ``tar -zxf xyz.tgz``.
All data should be placed in ``~/nuplan/dataset`` and the folder structure should comply with the `file hierarchy <https://nuplan-devkit.readthedocs.io/en/latest/dataset_setup.html#filesystem-hierarchy>`_.
.. code-block:: text
~/nuplan
├── exp
│ └── ${USER}
│ ├── cache
│ │ └── <cached_tokens>
│ └── exp
│ └── my_nuplan_experiment
└── dataset
├── maps
│ ├── nuplan-maps-v1.0.json
│ ├── sg-one-north
│ │ └── 9.17.1964
│ │ └── map.gpkg
│ ├── us-ma-boston
│ │ └── 9.12.1817
│ │ └── map.gpkg
│ ├── us-nv-las-vegas-strip
│ │ └── 9.15.1915
│ │ └── map.gpkg
│ └── us-pa-pittsburgh-hazelwood
│ └── 9.17.1937
│ └── map.gpkg
└── nuplan-v1.1
├── splits
│ ├── mini
│ │ ├── 2021.05.12.22.00.38_veh-35_01008_01518.db
│ │ ├── 2021.06.09.17.23.18_veh-38_00773_01140.db
│ │ ├── ...
│ │ └── 2021.10.11.08.31.07_veh-50_01750_01948.db
│ └── trainval
│ ├── 2021.05.12.22.00.38_veh-35_01008_01518.db
│ ├── 2021.06.09.17.23.18_veh-38_00773_01140.db
│ ├── ...
│ └── 2021.10.11.08.31.07_veh-50_01750_01948.db
└── sensor_blobs
├── 2021.05.12.22.00.38_veh-35_01008_01518
│ ├── CAM_F0
│ │ ├── c082c104b7ac5a71.jpg
│ │ ├── af380db4b4ca5d63.jpg
│ │ ├── ...
│ │ └── 2270fccfb44858b3.jpg
│ ├── CAM_B0
│ ├── CAM_L0
│ ├── CAM_L1
│ ├── CAM_L2
│ ├── CAM_R0
│ ├── CAM_R1
│ ├── CAM_R2
│ └──MergedPointCloud
│ ├── 03fafcf2c0865668.pcd
│ ├── 5aee37ce29665f1b.pcd
│ ├── ...
│ └── 5fe65ef6a97f5caf.pcd
├── 2021.06.09.17.23.18_veh-38_00773_01140
├── ...
└── 2021.10.11.08.31.07_veh-50_01750_01948
After downloading the data, you should add the following variables to ``~/.bashrc`` to make sure the ``nuplan-devkit`` can find the data::
export NUPLAN_DATA_ROOT="$HOME/nuplan/dataset"
export NUPLAN_MAPS_ROOT="$HOME/nuplan/dataset/maps"
export NUPLAN_EXP_ROOT="$HOME/nuplan/exp"
After this step, the examples in ``nuplan-devkit`` are supposed to work well.
Please try ``nuplan-devkit/tutorials/nuplan_scenario_visualization.ipynb`` and see if the demo code runs successfully.
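To quickly verify that these variables are visible to Python, a tiny check like the following can help:

.. code-block:: python

    import os

    # The three variables expected by nuplan-devkit, as exported above.
    for var in ("NUPLAN_DATA_ROOT", "NUPLAN_MAPS_ROOT", "NUPLAN_EXP_ROOT"):
        print(var, "=", os.environ.get(var, "<unset>"))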
3. Build nuPlan Database
============================
With all the aforementioned steps finished, the nuPlan data can be converted into our internal format to compose a database.
Here we take converting the raw data in ``nuplan-mini`` as an example::
python -m scenarionet.convert_nuplan -d /path/to/your/database --raw_data_path ~/nuplan/dataset/nuplan-v1.1/splits/mini
The ``raw_data_path`` is the place where the ``.db`` files are stored. Other arguments are shown by using the `-h` flag.
Now all converted scenarios will be placed at ``/path/to/your/database`` and are ready to be used in your work.
Known Issues: nuPlan
======================
N/A

111
documentation/nuscenes.rst Normal file

@@ -0,0 +1,111 @@
#############################
nuScenes
#############################
| Website: https://www.nuscenes.org/nuscenes
| Download: https://www.nuscenes.org/nuscenes (Registration required)
| Paper: https://arxiv.org/pdf/1903.11027.pdf
The nuScenes dataset (pronounced /nuːsiːnz/) is a public large-scale dataset for autonomous driving developed by the team at Motional (formerly nuTonomy).
Motional is making driverless vehicles a safe, reliable, and accessible reality.
By releasing a subset of our data to the public,
Motional aims to support public research into computer vision and autonomous driving.
For this purpose nuScenes contains 1000 driving scenes in Boston and Singapore,
two cities that are known for their dense traffic and highly challenging driving situations.
The scenes of 20 second length are manually selected to show a diverse and interesting set of driving maneuvers,
traffic situations and unexpected behaviors.
The rich complexity of nuScenes will encourage development of methods that enable safe driving in urban areas with dozens of objects per scene.
Gathering data on different continents further allows us to study the generalization of computer vision algorithms across different locations, weather conditions, vehicle types, vegetation, road markings and left versus right hand traffic.
1. Install nuScenes Toolkit
============================
First of all, we have to install the ``nuscenes-devkit``.
.. code-block:: bash
# install from GitHub (recommended)
git clone git@github.com:nutonomy/nuscenes-devkit.git
cd nuscenes-devkit/setup
pip install -e .
# or install from PyPI
pip install nuscenes-devkit
# or install with scenarionet
pip install -e .[nuscenes]
By installing from GitHub, you can access the examples and source code of the toolkit.
The examples are useful to verify whether the installation and dataset setup are correct.
2. Download nuScenes Data
==============================
The official instruction is available at https://github.com/nutonomy/nuscenes-devkit#nuscenes-setup.
Here we provide a simplified installation procedure.
First of all, please complete the registration on nuScenes website: https://www.nuscenes.org/nuscenes.
After this, go to the Download section and download the following files/expansions:
- mini/train/test splits
- Can bus expansion
- Map expansion
We recommend downloading the mini split first to verify and familiarize yourself with the process.
All downloaded files are ``.tgz`` files and can be uncompressed by ``tar -zxf xyz.tgz``.
Secondly, all files should be organized to the following structure::
/nuscenes/data/path/
├── maps/
| ├──basemap/
| ├──prediction/
| ├──expansion/
| └──...
├── samples/
| ├──CAM_BACK
| └──...
├── sweeps/
| ├──CAM_BACK
| └──...
├── v1.0-mini/
| ├──attribute.json
| ├──calibrated_sensor.json
| ├──map.json
| ├──log.json
| ├──ego_pose.json
| └──...
└── v1.0-trainval/
The ``/nuscenes/data/path`` should be ``/data/sets/nuscenes`` by default according to the official instructions,
allowing the ``nuscenes-devkit`` to find it.
But you can still place it anywhere else and either:
- build a soft link connecting your data folder to ``/data/sets/nuscenes``,
- or specify the ``dataroot`` when calling nuScenes APIs and our convertors.
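For example, assuming your data lives at ``/nuscenes/data/path``, the soft link can be created with::

ln -s /nuscenes/data/path /data/sets/nuscenes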
After this step, the examples in ``nuscenes-devkit`` are supposed to work well.
Please try ``nuscenes-devkit/python-sdk/tutorials/nuscenes_tutorial.ipynb`` and see if the demo runs successfully.
3. Build nuScenes Database
===========================
After setting up the raw data, the convertors in ScenarioNet can read the raw data, convert the scenario format, and build the database.
Here we take converting the raw data in ``nuscenes-mini`` as an example::
python -m scenarionet.convert_nuscenes -d /path/to/your/database --version v1.0-mini --dataroot /nuscenes/data/path
The ``version`` determines which split to convert. ``dataroot`` is set to ``/data/sets/nuscenes`` by default,
but you need to specify it if your data is stored in another directory.
Now all converted scenarios will be placed at ``/path/to/your/database`` and are ready to be used in your work.
Known Issues: nuScenes
=======================
N/A

View File

@@ -0,0 +1,500 @@
###########
Operations
###########
How to run
~~~~~~~~~~
We provide various basic operations allowing users to modify the built databases for ML applications.
These operations include building databases from different data providers; aggregating datasets from diverse sources;
splitting datasets into training/test sets; and sanity checking/filtering scenarios.
All commands can be run with ``python -m scenarionet.[command]``, e.g. ``python -m scenarionet.list`` for listing available operations.
The parameters for each script can be found by adding the ``-h`` flag.
.. note::
When running ``python -m``, make sure the directory you are in doesn't contain a folder called ``scenarionet``.
Otherwise, the run may fail.
This usually happens if you install ScenarioNet or MetaDrive via ``git clone`` and put it under a directory you usually work in, like the home directory.
List
~~~~~
This command can list all operations with detailed descriptions::
python -m scenarionet.list
Convert
~~~~~~~~
.. generated by python -m convert.command -h | fold -w 80
**ScenarioNet doesn't provide any data.**
Instead, it provides convertors to parse common open-sourced driving datasets into an internal scenario description, which composes the scenario databases.
Thus, converting scenarios to our internal scenario description is the first step of building the databases.
Currently, we provide convertors for the Waymo, nuPlan, and nuScenes (Lyft) datasets.
Convert Waymo
------------------------
.. code-block:: text
python -m scenarionet.convert_waymo [-h] [--database_path DATABASE_PATH]
[--dataset_name DATASET_NAME] [--version VERSION]
[--overwrite] [--num_workers NUM_WORKERS]
[--raw_data_path RAW_DATA_PATH]
[--start_file_index START_FILE_INDEX]
[--num_files NUM_FILES]
Build database from Waymo scenarios
optional arguments:
-h, --help show this help message and exit
--database_path DATABASE_PATH, -d DATABASE_PATH
A directory, the path to place the converted data
--dataset_name DATASET_NAME, -n DATASET_NAME
Dataset name, will be used to generate scenario files
--version VERSION, -v VERSION
version
--overwrite If the database_path exists, whether to overwrite it
--num_workers NUM_WORKERS
number of workers to use
--raw_data_path RAW_DATA_PATH
The directory stores all waymo tfrecord
--start_file_index START_FILE_INDEX
Control how many files to use. We will list all files
in the raw data folder and select
files[start_file_index: start_file_index+num_files]
--num_files NUM_FILES
Control how many files to use. We will list all files
in the raw data folder and select
files[start_file_index: start_file_index+num_files]
This script converts the recorded scenarios into our scenario descriptions.
A detailed guide is available at Section :ref:`waymo`.
Convert nuPlan
-------------------------
.. code-block:: text
python -m scenarionet.convert_nuplan [-h] [--database_path DATABASE_PATH]
[--dataset_name DATASET_NAME] [--version VERSION]
[--overwrite] [--num_workers NUM_WORKERS]
[--raw_data_path RAW_DATA_PATH] [--test]
Build database from nuPlan scenarios
optional arguments:
-h, --help show this help message and exit
--database_path DATABASE_PATH, -d DATABASE_PATH
A directory, the path to place the data
--dataset_name DATASET_NAME, -n DATASET_NAME
Dataset name, will be used to generate scenario files
--version VERSION, -v VERSION
version of the raw data
--overwrite If the database_path exists, whether to overwrite it
--num_workers NUM_WORKERS
number of workers to use
--raw_data_path RAW_DATA_PATH
the place store .db files
--test for test use only. convert one log
This script converts the recorded nuPlan scenarios into our scenario descriptions.
It requires installing ``nuplan-devkit`` and downloading the source data from https://www.nuplan.org/nuplan.
A detailed guide is available at Section :ref:`nuplan`.
Convert nuScenes (Lyft)
------------------------------------
.. code-block:: text
python -m scenarionet.convert_nuscenes [-h] [--database_path DATABASE_PATH]
[--dataset_name DATASET_NAME] [--version VERSION]
[--overwrite] [--num_workers NUM_WORKERS]
Build database from nuScenes/Lyft scenarios
optional arguments:
-h, --help show this help message and exit
--database_path DATABASE_PATH, -d DATABASE_PATH
directory, The path to place the data
--dataset_name DATASET_NAME, -n DATASET_NAME
Dataset name, will be used to generate scenario files
--version VERSION, -v VERSION
version of nuscenes data, scenario of this version
will be converted
--overwrite If the database_path exists, whether to overwrite it
--num_workers NUM_WORKERS
number of workers to use
This script converts the recorded nuScenes scenarios into our scenario descriptions.
It requires installing ``nuscenes-devkit`` and downloading the source data from https://www.nuscenes.org/nuscenes.
For Lyft datasets, this API can only convert the old Lyft data, as it can be parsed via `nuscenes-devkit`.
However, Lyft is now a part of Woven Planet and the new data has to be parsed via a new toolkit.
We are working on supporting this new toolkit to support the new Lyft dataset.
A detailed guide is available at Section :ref:`nuscenes`.
Convert PG
-------------------------
.. code-block:: text
python -m scenarionet.convert_pg [-h] [--database_path DATABASE_PATH]
[--dataset_name DATASET_NAME] [--version VERSION]
[--overwrite] [--num_workers NUM_WORKERS]
[--num_scenarios NUM_SCENARIOS]
[--start_index START_INDEX]
Build database from synthetic or procedurally generated scenarios
optional arguments:
-h, --help show this help message and exit
--database_path DATABASE_PATH, -d DATABASE_PATH
directory, The path to place the data
--dataset_name DATASET_NAME, -n DATASET_NAME
Dataset name, will be used to generate scenario files
--version VERSION, -v VERSION
version
--overwrite If the database_path exists, whether to overwrite it
--num_workers NUM_WORKERS
number of workers to use
--num_scenarios NUM_SCENARIOS
how many scenarios to generate (default: 30)
--start_index START_INDEX
which index to start
PG refers to Procedural Generation.
Scenario databases generated in this way are created by a set of rules with hand-crafted maps.
These scenarios are collected by driving the ego car with an IDM policy in different scenarios.
A detailed guide is available at Section :ref:`pg`.
Merge
~~~~~~~~~
This command is for merging existing databases to build a larger one.
This is why we can build a ScenarioNet!
After converting data recorded in different formats to this unified scenario description,
we can aggregate them freely and enlarge the database.
.. code-block:: text
python -m scenarionet.merge [-h] --database_path DATABASE_PATH --from FROM [FROM ...]
[--exist_ok] [--overwrite] [--filter_moving_dist]
[--sdc_moving_dist_min SDC_MOVING_DIST_MIN]
Merge a list of databases. e.g. scenario.merge --from db_1 db_2 db_3...db_n
--to db_dest
optional arguments:
-h, --help show this help message and exit
--database_path DATABASE_PATH, -d DATABASE_PATH
The name of the new combined database. It will create
a new directory to store dataset_summary.pkl and
dataset_mapping.pkl. If exists_ok=True, those two .pkl
files will be stored in an existing directory and turn
that directory into a database.
--from FROM [FROM ...]
Which datasets to combine. It takes any number of
directory path as input
--exist_ok Still allow to write, if the dir exists already. This
write will only create two .pkl files and this
directory will become a database.
--overwrite When exists ok is set but summary.pkl and map.pkl
exists in existing dir, whether to overwrite both
files
--filter_moving_dist add this flag to select cases with SDC moving dist >
sdc_moving_dist_min
--sdc_moving_dist_min SDC_MOVING_DIST_MIN
Selecting case with sdc_moving_dist > this value. We
will add more filter conditions in the future.
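For example, the following invocation (all paths are placeholders) combines two databases into a new one::

python -m scenarionet.merge -d /path/to/merged_database --from /path/to/database_1 /path/to/database_2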
Split
~~~~~~~~~~
The split action is for extracting a subset of scenarios from an existing database and building a new one.
This is usually used to build training/test/validation sets.
.. code-block:: text
python -m scenarionet.split [-h] --from FROM --to TO [--num_scenarios NUM_SCENARIOS]
[--start_index START_INDEX] [--random] [--exist_ok]
[--overwrite]
Build a new database containing a subset of scenarios from an existing
database.
optional arguments:
-h, --help show this help message and exit
--from FROM Which database to extract data from.
--to TO The name of the new database. It will create a new
directory to store dataset_summary.pkl and
dataset_mapping.pkl. If exists_ok=True, those two .pkl
files will be stored in an existing directory and turn
that directory into a database.
--num_scenarios NUM_SCENARIOS
how many scenarios to extract (default: 30)
--start_index START_INDEX
which index to start
--random If set to true, it will choose scenarios randomly from
all_scenarios[start_index:]. Otherwise, the scenarios
will be selected sequentially
--exist_ok Still allow to write, if the to_folder exists already.
This write will only create two .pkl files and this
directory will become a database.
--overwrite When exists ok is set but summary.pkl and map.pkl
exists in existing dir, whether to overwrite both
files
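For example, the following invocation (paths are placeholders) randomly extracts 100 scenarios into a new database::

python -m scenarionet.split --from /path/to/source_database --to /path/to/new_database --num_scenarios 100 --random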
Copy (Move)
~~~~~~~~~~~~~~~~
As the databases built by ScenarioNet store scenarios through a virtual mapping,
directly moving or copying an existing database to a new location with the ``cp`` or ``mv`` command will break the soft links.
To move or copy the scenarios to a new path, one should use this command.
When ``--remove_source`` is added, this ``copy`` command turns into ``move``.
.. code-block:: text
python -m scenarionet.copy [-h] --from FROM --to TO [--remove_source] [--copy_raw_data]
[--exist_ok] [--overwrite]
Move or Copy an existing database
optional arguments:
-h, --help show this help message and exit
--from FROM Which database to move.
--to TO The name of the new database. It will create a new
directory to store dataset_summary.pkl and
dataset_mapping.pkl. If exists_ok=True, those two .pkl
files will be stored in an existing directory and turn that
directory into a database.
--remove_source Remove the `from_database` if set this flag
--copy_raw_data Instead of creating virtual file mapping, copy raw
scenario.pkl file
--exist_ok Still allow to write, if the to_folder exists already. This
write will only create two .pkl files and this directory
will become a database.
--overwrite When exists ok is set but summary.pkl and map.pkl exists in
existing dir, whether to overwrite both files
Num
~~~~~~~~~~
Report the number of scenarios in a database.
.. code-block:: text
python -m scenarionet.num [-h] --database_path DATABASE_PATH
The number of scenarios in the specified database
optional arguments:
-h, --help show this help message and exit
--database_path DATABASE_PATH, -d DATABASE_PATH
Database to check number of scenarios
Filter
~~~~~~~~
Some scenarios contain overpasses, short ego-car trajectories, or traffic signals.
Such scenarios can be filtered out of the database using this command.
Currently, we only provide filters for ego car moving distance, number of objects, traffic lights, overpasses, and scenario ids.
If you would like to contribute new filters,
feel free to create an issue or pull request on our `Github repo <https://github.com/metadriverse/scenarionet>`_.
.. code-block:: text
python -m scenarionet.filter [-h] --database_path DATABASE_PATH --from FROM
[--exist_ok] [--overwrite] [--moving_dist]
[--sdc_moving_dist_min SDC_MOVING_DIST_MIN]
[--num_object] [--max_num_object MAX_NUM_OBJECT]
[--no_overpass] [--no_traffic_light] [--id_filter]
[--exclude_ids EXCLUDE_IDS [EXCLUDE_IDS ...]]
Filter unwanted scenarios out and build a new database
optional arguments:
-h, --help show this help message and exit
--database_path DATABASE_PATH, -d DATABASE_PATH
The name of the new database. It will create a new
directory to store dataset_summary.pkl and
dataset_mapping.pkl. If exists_ok=True, those two .pkl
files will be stored in an existing directory and turn
that directory into a database.
--from FROM Which dataset to filter. It takes one directory path
as input
--exist_ok Still allow to write, if the dir exists already. This
write will only create two .pkl files and this
directory will become a database.
--overwrite When exists ok is set but summary.pkl and map.pkl
exists in existing dir, whether to overwrite both
files
--moving_dist add this flag to select cases with SDC moving dist >
sdc_moving_dist_min
--sdc_moving_dist_min SDC_MOVING_DIST_MIN
Selecting case with sdc_moving_dist > this value.
--num_object add this flag to select cases with object_num <
max_num_object
--max_num_object MAX_NUM_OBJECT
case will be selected if num_obj < this argument
--no_overpass Scenarios with overpass WON'T be selected
--no_traffic_light Scenarios with traffic light WON'T be selected
--id_filter Scenarios with indicated name will NOT be selected
--exclude_ids EXCLUDE_IDS [EXCLUDE_IDS ...]
Scenarios with indicated name will NOT be selected
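For example, the following invocation (paths are placeholders) keeps only scenarios whose ego car moves more than 10 meters and which contain no traffic lights::

python -m scenarionet.filter -d /path/to/filtered_database --from /path/to/source_database --moving_dist --sdc_moving_dist_min 10 --no_traffic_light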
Build from Errors
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
This script is for generating a new database that excludes (or only includes) broken scenarios.
This is useful for debugging broken scenarios or building a completely clean dataset for training or testing.
.. code-block:: text
python -m scenarionet.generate_from_error_file [-h] --database_path DATABASE_PATH --file
FILE [--overwrite] [--broken]
Generate a new database excluding or only including the failed scenarios
detected by 'check_simulation' and 'check_existence'
optional arguments:
-h, --help show this help message and exit
--database_path DATABASE_PATH, -d DATABASE_PATH
The path of the newly generated database
--file FILE, -f FILE The path of the error file, should be xyz.json
--overwrite If the database_path exists, overwrite it
--broken By default, only successful scenarios will be picked
to build the new database. If this flag is turned on, it
will generate database containing only broken
scenarios.
Sim
~~~~~~~~~~~
Load a database into the simulator and replay the scenarios.
We provide different render modes allowing users to visualize them.
For more details on simulation,
please check Section :ref:`simulation` or the `MetaDrive document <https://metadrive-simulator.readthedocs.io/en/latest/>`_.
.. code-block:: text
python -m scenarionet.sim [-h] --database_path DATABASE_PATH
[--render {none,2D,3D,advanced}]
[--scenario_index SCENARIO_INDEX]
Load a database to simulator and replay scenarios
optional arguments:
-h, --help show this help message and exit
--database_path DATABASE_PATH, -d DATABASE_PATH
The path of the database
--render {none,2D,3D,advanced}
--scenario_index SCENARIO_INDEX
Specifying a scenario to run
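For example, to replay the third scenario of a database with the 2D renderer (the path is a placeholder)::

python -m scenarionet.sim -d /path/to/database --render 2D --scenario_index 2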
Check Existence
~~~~~~~~~~~~~~~~~~~~~
We provide a tool to check whether the scenarios in a database exist on your machine and can be loaded.
This is because we include scenarios in a database, a folder, through a virtual mapping.
Each database only records the path of each scenario relative to the database directory.
Thus, this script makes sure all original scenario files exist and can be loaded.
If it finds some broken scenarios, an error file will be generated at the specified path.
By using ``generate_from_error_file``, a new database can be created to exclude or only include these broken scenarios.
In this way, we can debug the broken scenarios to check what causes the error, or just ignore and remove the broken
scenarios to make the database intact.
.. code-block:: text
python -m scenarionet.check_existence [-h] --database_path DATABASE_PATH
[--error_file_path ERROR_FILE_PATH] [--overwrite]
[--num_workers NUM_WORKERS] [--random_drop]
Check if the database is intact and all scenarios can be found and recorded in
internal scenario description
optional arguments:
-h, --help show this help message and exit
--database_path DATABASE_PATH, -d DATABASE_PATH
Dataset path, a directory containing summary.pkl and
mapping.pkl
--error_file_path ERROR_FILE_PATH
Where to save the error file. One can generate a new
database excluding or only including the failed
scenarios.For more details, see operation
'generate_from_error_file'
--overwrite If an error file already exists in error_file_path,
whether to overwrite it
--num_workers NUM_WORKERS
number of workers to use
--random_drop Randomly make some scenarios fail. for test only!
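A typical workflow is to run the existence check and then rebuild a clean database from the resulting error file (paths and the error file name are placeholders)::

python -m scenarionet.check_existence -d /path/to/database --error_file_path /path/to/error_dir
python -m scenarionet.generate_from_error_file -d /path/to/clean_database --file /path/to/error_dir/error_file.json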
Check Simulation
~~~~~~~~~~~~~~~~~
This is an upgraded version of the existence check.
It not only detects the existence and completeness of the database, but also checks whether all scenarios can be loaded
and run in the simulator.
.. code-block:: text
python -m scenarionet.check_simulation [-h] --database_path DATABASE_PATH
[--error_file_path ERROR_FILE_PATH] [--overwrite]
[--num_workers NUM_WORKERS] [--random_drop]
Check if all scenarios can be simulated in simulator. We recommend doing this
before close-loop training/testing
optional arguments:
-h, --help show this help message and exit
--database_path DATABASE_PATH, -d DATABASE_PATH
Dataset path, a directory containing summary.pkl and
mapping.pkl
--error_file_path ERROR_FILE_PATH
Where to save the error file. One can generate a new
database excluding or only including the failed
scenarios.For more details, see operation
'generate_from_error_file'
--overwrite If an error file already exists in error_file_path,
whether to overwrite it
--num_workers NUM_WORKERS
number of workers to use
--random_drop Randomly make some scenarios fail. for test only!
Check Overlap
~~~~~~~~~~~~~~~~
This script checks whether there is any overlap between two databases.
The main goal of this command is to ensure that the training and test sets are isolated.
.. code-block:: text
python -m scenarionet.check_overlap [-h] --d_1 D_1 --d_2 D_2 [--show_id]
Check if there are overlapped scenarios between two databases. If so, return
the number of overlapped scenarios and id list
optional arguments:
-h, --help show this help message and exit
--d_1 D_1 The path of the first database
--d_2 D_2 The path of the second database
--show_id whether to show the id of overlapped scenarios

View File

@@ -57,3 +57,5 @@ sphinxcontrib-serializinghtml==1.1.5
urllib3==1.26.9
# via requests
sphinx_rtd_theme
sphinxemoji

View File

@@ -0,0 +1,5 @@
###########
Simulation
###########
Coming soon!

87
documentation/waymo.rst Normal file

@@ -0,0 +1,87 @@
#############################
Waymo
#############################
| Website: https://waymo.com/open/
| Download: https://waymo.com/open/download/
| Paper: https://arxiv.org/abs/2104.10133
The dataset includes:
- 103,354 20s 10Hz segments (over 20 million frames), mined for interesting interactions
- 574 hours of data
- Sensor data
- 4 short-range lidars
- 1 mid-range lidar
- Object data
- 10.8M objects with tracking IDs
- Labels for 3 object classes - Vehicles, Pedestrians, Cyclists
- 3D bounding boxes for each object
- Mined for interesting behaviors and scenarios for behavior prediction research, such as unprotected turns, merges, lane changes, and intersections
- 3D bounding boxes are generated by a model trained on the Perception Dataset and detailed in our paper: Offboard 3D Object Detection from Point Cloud Sequences
- Map data
- 3D map data for each segment
- Locations include: San Francisco, Phoenix, Mountain View, Los Angeles, Detroit, and Seattle
- Added entrances to driveways (the map already includes lane centers, lane boundaries, road boundaries, crosswalks, speed bumps and stop signs)
- Adjusted some road edge boundary height estimates
1. Install Waymo Toolkit
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
First of all, we have to install the Waymo toolkit and TensorFlow::
pip install waymo-open-dataset-tf-2-11-0
pip install tensorflow==2.11.0
# Or install with scenarionet
pip install -e .[waymo]
.. note::
This package is only supported on Linux platform.
2. Download TFRecord
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The Waymo motion dataset is hosted at `Google Cloud <https://console.cloud.google.com/storage/browser/waymo_open_dataset_motion_v_1_2_0>`_.
For downloading the full dataset, ``gsutil`` is required.
The installation tutorial is at https://cloud.google.com/storage/docs/gsutil_install.
After this, you can access all the data and download it to the current directory ``./`` by::
gsutil -m cp -r "gs://waymo_open_dataset_motion_v_1_2_0/uncompressed/scenario" .
Or one can just download a part of the dataset using a command like::
gsutil -m cp -r "gs://waymo_open_dataset_motion_v_1_2_0/uncompressed/scenario/training_20s" .
The downloaded data should be stored in a directory like this::
waymo
├── training_20s/
| ├── training_20s.tfrecord-00000-of-01000
| ├── training_20s.tfrecord-00001-of-01000
| └── ...
├── validation/
| ├── validation.tfrecord-00000-of-00150
| ├── validation.tfrecord-00001-of-00150
| └── ...
└── testing/
├── testing.tfrecord-00000-of-00150
├── testing.tfrecord-00001-of-00150
└── ...
3. Build Waymo Database
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Run the following command to extract scenarios from any directory containing ``tfrecord`` files.
Here we take converting the raw data in ``training_20s`` as an example::
python -m scenarionet.convert_waymo -d /path/to/your/database --raw_data_path ./waymo/training_20s --num_files=1000
Now all converted scenarios will be placed at ``/path/to/your/database`` and are ready to be used in your work.
Known Issues: Waymo
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
N/A

View File

@@ -1,13 +1,23 @@
import argparse
from scenarionet.verifier.utils import verify_database, set_random_drop
desc = "Check if the database is intact and all scenarios can be found and recorded in internal scenario description"
if __name__ == '__main__':
parser = argparse.ArgumentParser()
import argparse
from scenarionet.verifier.utils import verify_database, set_random_drop
parser = argparse.ArgumentParser(description=desc)
parser.add_argument(
"--database_path", "-d", required=True, help="Dataset path, a directory containing summary.pkl and mapping.pkl"
)
parser.add_argument("--error_file_path", default="./", help="Where to save the error file")
parser.add_argument(
"--error_file_path",
default="./",
help="Where to save the error file. "
"One can generate a new database excluding "
"or only including the failed scenarios."
"For more details, "
"see operation 'generate_from_error_file'"
)
parser.add_argument(
"--overwrite",
action="store_true",

View File

@@ -1,15 +1,19 @@
"""
Check if there is any overlap between two databases
"""
import argparse
from scenarionet.common_utils import read_dataset_summary
desc = "Check if there are overlapped scenarios between two databases. " \
"If so, return the number of overlapped scenarios and id list"
if __name__ == '__main__':
parser = argparse.ArgumentParser()
parser.add_argument('--database_1', type=str, required=True, help="The path of the first database")
parser.add_argument('--database_2', type=str, required=True, help="The path of the second database")
import argparse
from scenarionet.common_utils import read_dataset_summary
parser = argparse.ArgumentParser(description=desc)
parser.add_argument('--d_1', type=str, required=True, help="The path of the first database")
parser.add_argument('--d_2', type=str, required=True, help="The path of the second database")
parser.add_argument('--show_id', action="store_true", help="whether to show the id of overlapped scenarios")
args = parser.parse_args()
summary_1, _, _ = read_dataset_summary(args.d_1)
@@ -19,4 +23,8 @@ if __name__ == '__main__':
if len(intersection) == 0:
print("No overlapping in two database!")
else:
print("Find overlapped scenarios: {}".format(intersection))
print("Find {} overlapped scenarios".format(len(intersection)))
if args.show_id:
print("Overlapped scenario ids:")
for id in intersection:
print(" " * 5, id)

View File

@@ -1,13 +1,24 @@
import pkg_resources # for suppress warning
import argparse
from scenarionet.verifier.utils import verify_database, set_random_drop
desc = "Check if all scenarios can be simulated in simulator. " \
"We recommend doing this before close-loop training/testing"
if __name__ == '__main__':
parser = argparse.ArgumentParser()
import pkg_resources # for suppress warning
import argparse
from scenarionet.verifier.utils import verify_database, set_random_drop
parser = argparse.ArgumentParser(description=desc)
parser.add_argument(
"--database_path", "-d", required=True, help="Dataset path, a directory containing summary.pkl and mapping.pkl"
)
parser.add_argument("--error_file_path", default="./", help="Where to save the error file")
parser.add_argument(
"--error_file_path",
default="./",
help="Where to save the error file. "
"One can generate a new database excluding "
"or only including the failed scenarios."
"For more details, "
"see operation 'generate_from_error_file'"
)
parser.add_argument(
"--overwrite",
action="store_true",

View File

@@ -1,12 +1,14 @@
 import pkg_resources  # for suppress warning
 import argparse
 import os
 from scenarionet import SCENARIONET_DATASET_PATH
 from scenarionet.converter.nuplan.utils import get_nuplan_scenarios, convert_nuplan_scenario
 from scenarionet.converter.utils import write_to_directory
 
+desc = "Build database from nuPlan scenarios"
+
 if __name__ == '__main__':
-    parser = argparse.ArgumentParser()
+    parser = argparse.ArgumentParser(description=desc)
     parser.add_argument(
         "--database_path",
         "-d",

scenarionet/convert_nuscenes.py

@@ -1,12 +1,14 @@
 import pkg_resources  # for suppress warning
 import argparse
 import os.path
 from scenarionet import SCENARIONET_DATASET_PATH
 from scenarionet.converter.nuscenes.utils import convert_nuscenes_scenario, get_nuscenes_scenarios
 from scenarionet.converter.utils import write_to_directory
 
+desc = "Build database from nuScenes/Lyft scenarios"
+
 if __name__ == '__main__':
-    parser = argparse.ArgumentParser()
+    parser = argparse.ArgumentParser(description=desc)
     parser.add_argument(
         "--database_path",
         "-d",
@@ -22,6 +24,7 @@ if __name__ == '__main__':
         default='v1.0-mini',
         help="version of nuscenes data, scenario of this version will be converted "
     )
+    parser.add_argument("--dataroot", default="/data/sets/nuscenes", help="The path of nuscenes data")
     parser.add_argument("--overwrite", action="store_true", help="If the database_path exists, whether to overwrite it")
     parser.add_argument("--num_workers", type=int, default=8, help="number of workers to use")
     args = parser.parse_args()
@@ -31,8 +34,7 @@ if __name__ == '__main__':
     output_path = args.database_path
     version = args.version
-    dataroot = '/home/shady/data/nuscenes'
-    scenarios, nuscs = get_nuscenes_scenarios(dataroot, version, args.num_workers)
+    scenarios, nuscs = get_nuscenes_scenarios(args.dataroot, version, args.num_workers)
     write_to_directory(
         convert_func=convert_nuscenes_scenario,
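With the hard-coded dataroot gone, a conversion run now passes the data location explicitly via the new flag (paths illustrative):

```
python -m scenarionet.convert_nuscenes -d nuscenes --version v1.0-mini --dataroot /data/sets/nuscenes --num_workers=16
```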

scenarionet/convert_pg.py

@@ -1,16 +1,18 @@
 import pkg_resources  # for suppress warning
 import argparse
 import os.path
 import metadrive
 from scenarionet import SCENARIONET_DATASET_PATH
 from scenarionet.converter.pg.utils import get_pg_scenarios, convert_pg_scenario
 from scenarionet.converter.utils import write_to_directory
 
+desc = "Build database from synthetic or procedurally generated scenarios"
+
 if __name__ == '__main__':
-    parser = argparse.ArgumentParser()
+    # For the PG environment config, see: scenarionet/converter/pg/utils.py:6
+    parser = argparse.ArgumentParser(description=desc)
     parser.add_argument(
         "--database_path",
         "-d",

scenarionet/convert_waymo.py

@@ -1,17 +1,19 @@
 import pkg_resources  # for suppress warning
 import shutil
 import argparse
 import logging
 import os
 from scenarionet import SCENARIONET_DATASET_PATH, SCENARIONET_REPO_PATH
 from scenarionet.converter.utils import write_to_directory
 from scenarionet.converter.waymo.utils import convert_waymo_scenario, get_waymo_scenarios
 
 logger = logging.getLogger(__name__)
 
+desc = "Build database from Waymo scenarios"
+
 if __name__ == '__main__':
-    parser = argparse.ArgumentParser()
+    parser = argparse.ArgumentParser(description=desc)
     parser.add_argument(
         "--database_path",
         "-d",

scenarionet/copy_database.py

@@ -1,9 +1,11 @@
 import argparse
 from scenarionet.builder.utils import copy_database
 
+desc = "Move or Copy an existing database"
+
 if __name__ == '__main__':
-    parser = argparse.ArgumentParser()
+    parser = argparse.ArgumentParser(description=desc)
     parser.add_argument('--from', required=True, help="Which database to move.")
     parser.add_argument(
         "--to",

scenarionet/filter.py

@@ -1,10 +1,12 @@
 import argparse
 from scenarionet.builder.filters import ScenarioFilter
 from scenarionet.builder.utils import merge_database
 
+desc = "Filter unwanted scenarios out and build a new database"
+
 if __name__ == '__main__':
-    parser = argparse.ArgumentParser()
+    parser = argparse.ArgumentParser(description=desc)
     parser.add_argument(
         "--database_path",
         "-d",

scenarionet/generate_from_error_file.py

@@ -1,10 +1,13 @@
 import pkg_resources  # for suppress warning
 import argparse
 from scenarionet.verifier.error import ErrorFile
 
+desc = "Generate a new database excluding " \
+       "or only including the failed scenarios detected by 'check_simulation' and 'check_existence'"
+
 if __name__ == '__main__':
-    parser = argparse.ArgumentParser()
+    parser = argparse.ArgumentParser(description=desc)
     parser.add_argument("--database_path", "-d", required=True, help="The path of the newly generated database")
     parser.add_argument("--file", "-f", required=True, help="The path of the error file, should be xyz.json")
     parser.add_argument("--overwrite", action="store_true", help="If the database_path exists, overwrite it")

scenarionet/list.py (new file)

@@ -0,0 +1,33 @@
+import pkgutil
+import importlib
+import argparse
+
+desc = "List all available operations"
+
+
+def list_modules(package):
+    ret = [name for _, name, _ in pkgutil.iter_modules(package.__path__)]
+    ret.remove("builder")
+    ret.remove("converter")
+    ret.remove("verifier")
+    ret.remove("common_utils")
+    return ret
+
+
+if __name__ == '__main__':
+    import scenarionet
+    parser = argparse.ArgumentParser(description=desc)
+    parser.parse_args()
+    modules = list_modules(scenarionet)
+    print("\nAvailable operations (usage: python -m scenarionet.operation): \n")
+    for module in modules:
+        print(
+            "{}scenarionet.{}: {} \n".format(
+                " " * 5, module,
+                importlib.import_module("scenarionet.{}".format(module)).desc
+            )
+        )
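Given the `desc` strings added across this commit, each registry entry prints in the form shown below (entry illustrative, taken from `scenarionet/num.py` in this commit):

```
     scenarionet.num: The number of scenarios in the specified database
```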

scenarionet/merge.py

@@ -1,10 +1,12 @@
 import argparse
 from scenarionet.builder.filters import ScenarioFilter
 from scenarionet.builder.utils import merge_database
 
+desc = "Merge a list of databases, e.g. scenarionet.merge --from db_1 db_2 db_3...db_n --to db_dest"
+
 if __name__ == '__main__':
-    parser = argparse.ArgumentParser()
+    parser = argparse.ArgumentParser(description=desc)
     parser.add_argument(
         "--database_path",
         "-d",

scenarionet/num.py (new file)

@@ -0,0 +1,15 @@
desc = "The number of scenarios in the specified database"
if __name__ == '__main__':
import pkg_resources # for suppress warning
import argparse
import logging
from scenarionet.common_utils import read_dataset_summary
logger = logging.getLogger(__file__)
parser = argparse.ArgumentParser(description=desc)
parser.add_argument("--database_path", "-d", required=True, help="Database to check number of scenarios")
args = parser.parse_args()
summary, _, _, = read_dataset_summary(args.database_path)
logger.info("Number of scenarios: {}".format(len(summary)))
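Note the count goes through `logging` rather than `print`, so nothing appears unless the log level admits INFO; the invocation itself is (path illustrative):

```
python -m scenarionet.num -d /path/to/database
```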

(deleted file, superseded by scenarionet/num.py)

@@ -1,13 +0,0 @@
-import pkg_resources  # for suppress warning
-import argparse
-import logging
-from scenarionet.common_utils import read_dataset_summary
-
-logger = logging.getLogger(__file__)
-
-if __name__ == '__main__':
-    parser = argparse.ArgumentParser()
-    parser.add_argument("--database_path", "-d", required=True, help="Database to check number of scenarios")
-    args = parser.parse_args()
-    summary, _, _, = read_dataset_summary(args.database_path)
-    logger.info("Number of scenarios: {}".format(len(summary)))

(deleted file, superseded by scenarionet/sim.py)

@@ -1,52 +0,0 @@
-import pkg_resources  # for suppress warning
-import argparse
-import os
-from metadrive.envs.scenario_env import ScenarioEnv
-from metadrive.policy.replay_policy import ReplayEgoCarPolicy
-from metadrive.scenario.utils import get_number_of_scenarios
-
-if __name__ == '__main__':
-    parser = argparse.ArgumentParser()
-    parser.add_argument("--database_path", "-d", required=True, help="The path of the database")
-    parser.add_argument("--render", action="store_true", help="Enable 3D rendering")
-    parser.add_argument("--scenario_index", default=None, type=int, help="Specifying a scenario to run")
-    args = parser.parse_args()
-
-    database_path = os.path.abspath(args.database_path)
-    num_scenario = get_number_of_scenarios(database_path)
-    if args.scenario_index is not None:
-        assert args.scenario_index < num_scenario, \
-            "The specified scenario index exceeds the scenario range: {}!".format(num_scenario)
-
-    env = ScenarioEnv(
-        {
-            "use_render": args.render,
-            "agent_policy": ReplayEgoCarPolicy,
-            "manual_control": False,
-            "show_interface": True,
-            "show_logo": False,
-            "show_fps": False,
-            "num_scenarios": num_scenario,
-            "horizon": 1000,
-            "vehicle_config": dict(
-                show_navi_mark=False,
-                no_wheel_friction=True,
-                lidar=dict(num_lasers=120, distance=50, num_others=4),
-                lane_line_detector=dict(num_lasers=12, distance=50),
-                side_detector=dict(num_lasers=160, distance=50)
-            ),
-            "data_directory": database_path,
-        }
-    )
-
-    for index in range(num_scenario if args.scenario_index is not None else 1000000):
-        env.reset(seed=index if args.scenario_index is None else args.scenario_index)
-        for t in range(10000):
-            env.step([0, 0])
-            if env.config["use_render"]:
-                env.render(text={
-                    "scenario index": env.engine.global_seed + env.config["start_scenario_index"],
-                })
-
-            if env.episode_step >= env.engine.data_manager.current_scenario_length:
-                print("scenario:{}, success".format(env.engine.global_random_seed))
-                break

scenarionet/sim.py (new file)

@@ -0,0 +1,76 @@
desc = "Load a database to simulator and replay scenarios"
if __name__ == '__main__':
import logging
import pkg_resources # for suppress warning
import argparse
import os
from metadrive.envs.scenario_env import ScenarioEnv
from metadrive.policy.replay_policy import ReplayEgoCarPolicy
from metadrive.scenario.utils import get_number_of_scenarios
parser = argparse.ArgumentParser(description=desc)
parser.add_argument("--database_path", "-d", required=True, help="The path of the database")
parser.add_argument("--render", default="none", choices=["none", "2D", "3D", "advanced"])
parser.add_argument("--scenario_index", default=None, type=int, help="Specifying a scenario to run")
args = parser.parse_args()
database_path = os.path.abspath(args.database_path)
num_scenario = get_number_of_scenarios(database_path)
if args.scenario_index is not None:
assert args.scenario_index < num_scenario, \
"The specified scenario index exceeds the scenario range: {}!".format(num_scenario)
env = ScenarioEnv(
{
"use_render": args.render == "3D" or args.render == "advanced",
"agent_policy": ReplayEgoCarPolicy,
"manual_control": False,
"render_pipeline": args.render == "advanced",
"show_interface": True,
# "reactive_traffic": args.reactive,
"show_logo": False,
"show_fps": False,
"log_level": logging.CRITICAL,
"num_scenarios": num_scenario,
"interface_panel": [],
"horizon": 1000,
"vehicle_config": dict(
show_navi_mark=True,
show_line_to_dest=False,
show_dest_mark=False,
no_wheel_friction=True,
lidar=dict(num_lasers=120, distance=50, num_others=4),
lane_line_detector=dict(num_lasers=12, distance=50),
side_detector=dict(num_lasers=160, distance=50)
),
"data_directory": database_path,
}
)
for index in range(2, num_scenario if args.scenario_index is not None else 1000000):
env.reset(seed=index if args.scenario_index is None else args.scenario_index)
for t in range(10000):
env.step([0, 0])
if env.config["use_render"]:
env.render(
text={
"scenario index": env.engine.global_seed + env.config["start_scenario_index"],
"[": "Load last scenario",
"]": "Load next scenario",
"r": "Reset current scenario",
}
)
if args.render == "2D":
env.render(
film_size=(3000, 3000),
target_vehicle_heading_up=False,
mode="top_down",
text={
"scenario index": env.engine.global_seed + env.config["start_scenario_index"],
}
)
if env.episode_step >= env.engine.data_manager.current_scenario_length:
print("scenario:{}, success".format(env.engine.global_random_seed))
break
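Replaying a freshly built database is then a single command; top-down rendering avoids the heavier 3D pipeline (path illustrative):

```
python -m scenarionet.sim -d /path/to/database --render 2D
```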

scenarionet/split.py

@@ -1,13 +1,15 @@
"""
This script is for extracting a subset of data from an existing database
"""
import pkg_resources # for suppress warning
import argparse
from scenarionet.builder.utils import split_database
desc = "Build a new database containing a subset of scenarios from an existing database."
if __name__ == '__main__':
parser = argparse.ArgumentParser()
import pkg_resources # for suppress warning
import argparse
from scenarionet.builder.utils import split_database
parser = argparse.ArgumentParser(description=desc)
parser.add_argument('--from', required=True, help="Which database to extract data from.")
parser.add_argument(
"--to",

(shell test script)

@@ -1,6 +1,6 @@
 #!/usr/bin/env bash
-python ../../merge_database.py --overwrite --exist_ok --database_path ../tmp/test_combine_dataset --from ../../../dataset/waymo ../../../dataset/pg ../../../dataset/nuscenes ../../../dataset/nuplan --overwrite
+python ../../merge.py --overwrite --exist_ok --database_path ../tmp/test_combine_dataset --from ../../../dataset/waymo ../../../dataset/pg ../../../dataset/nuscenes ../../../dataset/nuplan --overwrite
 python ../../check_simulation.py --overwrite --database_path ../tmp/test_combine_dataset --error_file_path ../tmp/test_combine_dataset --random_drop --num_workers=16
 python ../../generate_from_error_file.py --file ../tmp/test_combine_dataset/error_scenarios_for_test_combine_dataset.json --overwrite --database_path ../tmp/verify_pass
 python ../../generate_from_error_file.py --file ../tmp/test_combine_dataset/error_scenarios_for_test_combine_dataset.json --overwrite --database_path ../tmp/verify_fail --broken

(shell test script)

@@ -32,8 +32,8 @@ done
 # combine the datasets
 if [ "$overwrite" = true ]; then
-  python -m scenarionet.scripts.merge_database --database_path $dataset_path --from $(for i in $(seq 0 $((num_sub_dataset-1))); do echo -n "${dataset_path}/pg_$i "; done) --overwrite --exist_ok
+  python -m scenarionet.scripts.merge --database_path $dataset_path --from $(for i in $(seq 0 $((num_sub_dataset-1))); do echo -n "${dataset_path}/pg_$i "; done) --overwrite --exist_ok
 else
-  python -m scenarionet.scripts.merge_database --database_path $dataset_path --from $(for i in $(seq 0 $((num_sub_dataset-1))); do echo -n "${dataset_path}/pg_$i "; done) --exist_ok
+  python -m scenarionet.scripts.merge --database_path $dataset_path --from $(for i in $(seq 0 $((num_sub_dataset-1))); do echo -n "${dataset_path}/pg_$i "; done) --exist_ok
 fi

setup.py

@@ -34,20 +34,40 @@ install_requires = [
"matplotlib",
"pandas",
"tqdm",
"metadrive-simulator",
"metadrive-simulator>=0.4.1.1",
"geopandas",
"yapf==0.30.0",
"yapf",
"shapely"
]
doc = [
"sphinxemoji",
"sphinx",
"sphinx_rtd_theme",
]
train_requirement = [
"ray[rllib]==1.0.0",
# "torch",
"wandb==0.12.1",
"aiohttp==3.6.0",
"gymnasium",
"tensorflow",
"tensorflow_probability"]
"ray[rllib]==1.0.0",
# "torch",
"wandb==0.12.1",
"aiohttp==3.6.0",
"gymnasium",
"tensorflow",
"tensorflow_probability"]
waymo = ["waymo-open-dataset-tf-2-11-0", "tensorflow==2.11.0"]
nuplan = ["nuplan-devkit>=1.2.0",
"bokeh==2.4",
"hydra-core",
"chardet",
"pyarrow",
"aiofiles",
"retry",
"boto3",
"aioboto3"]
nuscenes = ["nuscenes-devkit>=1.1.10"]
setup(
name="scenarionet",
@@ -61,37 +81,14 @@ setup(
     install_requires=install_requires,
     extras_require={
         "train": train_requirement,
+        "doc": doc,
+        "waymo": waymo,
+        "nuplan": nuplan,
+        "nuscenes": nuscenes,
+        "all": nuscenes + waymo + nuplan
     },
     include_package_data=True,
     license="Apache 2.0",
     long_description=long_description,
     long_description_content_type='text/markdown',
 )
-"""
-How to publish to pypi? Noted by Zhenghao in Dec 27, 2020.
-0. Rename version in setup.py
-1. Remove old files and ext_modules from setup() to get a clean wheel for all platforms in py3-none-any.wheel
-   rm -rf dist/ build/ documentation/build/ scenarionet.egg-info/ docs/build/
-2. Rename current version to X.Y.Z.rcA, where A is arbitrary value represent "release candidate A".
-   This is really important since pypi do not support renaming and re-uploading.
-   Rename version in setup.py
-3. Get wheel
-   python setup.py sdist bdist_wheel
-4. Upload to test channel
-   twine upload --repository testpypi dist/*
-5. Test as next line. If failed, change the version name and repeat 1, 2, 3, 4, 5.
-   pip install --index-url https://test.pypi.org/simple/ scenarionet
-6. Rename current version to X.Y.Z in setup.py, rerun 1, 3 steps.
-7. Upload to production channel
-   twine upload dist/*
-"""