# Read Established ScenarioNet Dataset

Welcome to MetaDrive & ScenarioNet!

Researchers who focus on motion prediction, scenario generation, and similar topics will probably
not need the interactive environment provided by MetaDrive.

This tutorial walks you through reading an established ScenarioNet dataset and gives you a sense of the data format.

## Installation

Note that even though we only need to load the data, you still need to install MetaDrive. Don't worry, it's quite easy!

In [1]:
#@title Install MetaDrive & ScenarioNet
# NOTE: If you are running this notebook locally and the installation is already finished, this step is not required.
RunningInCOLAB = 'google.colab' in str(get_ipython()) # Detect if it is running in Colab
if RunningInCOLAB:
    %pip install git+https://github.com/metadriverse/metadrive.git
    %pip install git+https://github.com/metadriverse/scenarionet.git
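
The detection line above relies on `get_ipython`, which only exists inside IPython or Jupyter. As a hedged alternative, the sketch below checks whether the `google.colab` package has already been loaded. The helper name `running_in_colab` is hypothetical, and the approach assumes Colab imports that package when its kernel starts.

```python
import sys


def running_in_colab() -> bool:
    """Return True if the google.colab package is already loaded.

    Assumption: Colab loads google.colab at kernel start-up, so it shows up
    in sys.modules there and is absent elsewhere.
    """
    return "google.colab" in sys.modules


print(running_in_colab())
```

Unlike calling `get_ipython()` directly, this check does not raise a `NameError` when the same code is run as a plain Python script.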

Next, let's create a 2D visualization tool for recording a scenario as a GIF.

In [2]:
# visualization
from IPython.display import Image as IImage
import pygame
import numpy as np
from PIL import Image

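def make_GIF(frames, name="demo.gif"):
    """Save a list of pygame surfaces as an animated GIF."""
    print("Generate gif...")
    # array3d returns arrays indexed as (width, height, 3), while PIL expects (height, width, 3).
    imgs = [np.swapaxes(pygame.surfarray.array3d(frame), 0, 1) for frame in frames]
    imgs = [Image.fromarray(img) for img in imgs]
    # duration is the display time per frame in milliseconds; loop=0 repeats forever.
    imgs[0].save(name, save_all=True, append_images=imgs[1:], duration=50, loop=0)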

pygame 2.5.2 (SDL 2.28.2, Python 3.9.18)
Hello from the pygame community. https://www.pygame.org/contribute.html


## Configuration

Let's import some modules and specify the dataset directory.

In [3]:
#@title Make some configurations and import some modules
from metadrive.engine.engine_utils import close_engine
close_engine()
from metadrive.pull_asset import pull_asset
pull_asset(False)
# NOTE: You usually don't need the lines above. They only work around a potential bug when running on Colab.

from metadrive.engine.asset_loader import AssetLoader
from metadrive.policy.replay_policy import ReplayEgoCarPolicy
from metadrive.envs.scenario_env import ScenarioEnv
import os

os.environ["SDL_VIDEODRIVER"] = "dummy" # Hide the pygame window

Fail to pull. Assets already exists, version: 0.4.1.2. Expected version: 0.4.1.2. To overwrite existing assets and update, add flag '--update' and rerun this script


We provide two demo datasets, split from the Waymo Open Dataset and the nuScenes dataset. Their file structures are listed here:

In [4]:
waymo_data = AssetLoader.file_path(AssetLoader.asset_path, "waymo", return_raw_style=False)  # Use the built-in dataset shipped with the simulator
os.listdir(waymo_data)  # There are 3 Waymo scenario files plus a 'dataset_summary.pkl'

['sd_training.tfrecord-00000-of-01000_c403d5992cab9e0.pkl',
 'dataset_summary.pkl',
 'sd_training.tfrecord-00000-of-01000_2a1e44d405a6833f.pkl',
 'sd_training.tfrecord-00000-of-01000_8a346109094cd5aa.pkl']

In [5]:
nuscenes_data = AssetLoader.file_path(AssetLoader.asset_path, "nuscenes", return_raw_style=False)  # Use the built-in dataset shipped with the simulator
os.listdir(nuscenes_data)  # There are 10 nuScenes scenario files stored in subfolders, plus a 'dataset_summary.pkl' and a 'dataset_mapping.pkl'

['dataset_summary.pkl',
 'nuscenes_3',
 'nuscenes_6',
 'dataset_mapping.pkl',
 'nuscenes_0',
 'nuscenes_5',
 'nuscenes_7',
 'nuscenes_1',
 'nuscenes_4',
 'nuscenes_2']



## Read Data Easily with Scenario Description

An established ScenarioNet dataset is a folder containing `dataset_mapping.pkl` and `dataset_summary.pkl`. `dataset_mapping.pkl` maps each scenario ID to the relative path of the folder holding its file. `dataset_summary.pkl` summarizes the meta information for each scenario.

For the Waymo dataset, all scenarios sit in the same folder, so we don't need a `dataset_mapping.pkl` to route each scenario ID to its `.pkl` file. The nuScenes dataset has both `dataset_mapping.pkl` and `dataset_summary.pkl` because its scenarios are stored in a hierarchical file structure.
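
To make the two index files concrete, here is a minimal sketch that builds a toy hierarchical dataset in a temporary directory and reads both files back with `pickle`. It assumes the two files are plain pickled dicts. The scenario names, folder names, and metadata values are hypothetical placeholders, not real ScenarioNet data.

```python
import os
import pickle
import tempfile

# Hypothetical toy dataset: names and metadata are placeholders.
root = tempfile.mkdtemp()
toy_summary = {
    "scenario_a.pkl": {"coordinate": "toy"},
    "scenario_b.pkl": {"coordinate": "toy"},
}
toy_mapping = {"scenario_a.pkl": "folder_0", "scenario_b.pkl": "folder_1"}

# Write one placeholder scenario file per ID into its subfolder.
for scenario_id, folder in toy_mapping.items():
    os.makedirs(os.path.join(root, folder), exist_ok=True)
    with open(os.path.join(root, folder, scenario_id), "wb") as f:
        pickle.dump({"id": scenario_id}, f)

# Write the two index files at the dataset root.
index_files = [("dataset_summary.pkl", toy_summary), ("dataset_mapping.pkl", toy_mapping)]
for file_name, content in index_files:
    with open(os.path.join(root, file_name), "wb") as f:
        pickle.dump(content, f)

# Read the index files back, as a loader would.
with open(os.path.join(root, "dataset_summary.pkl"), "rb") as f:
    loaded_summary = pickle.load(f)
with open(os.path.join(root, "dataset_mapping.pkl"), "rb") as f:
    loaded_mapping = pickle.load(f)

print(sorted(os.listdir(root)))
print(loaded_mapping)
```

In practice, prefer the reader utilities introduced in this section over loading the pickles by hand; the sketch is only meant to show what lives where.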

In this section, we demonstrate how to use the utilities from ScenarioNet to easily access scenarios.



In [6]:
from scenarionet import read_dataset_summary, read_scenario

### Read the dataset summary

In [7]:
read_dataset_summary?

Signature: read_dataset_summary(dataset_path)
Docstring:
Read the dataset and return the metadata of each scenario in this dataset.

Args:
    dataset_path: the path to the root folder of your dataset.

Returns:
    A tuple of three elements:
    1) the summary dict mapping from scenario ID to its metadata,
    2) the list of all scenarios IDs, and
    3) a dict mapping from scenario IDs to the folder that hosts their files.
File:      ~/scenarionet/scenarionet/common_utils.py
Type:      function

In [8]:
dataset_summary, scenario_ids, mapping = read_dataset_summary(dataset_path=waymo_data)

`dataset_summary` is the summary dict mapping from scenario ID to its metadata.

In [9]:
dataset_summary

{'sd_training.tfrecord-00000-of-01000_2a1e44d405a6833f.pkl': {'coordinate': 'waymo',
  'ts': [0.0,
   0.09997999668121338,
   0.19995999336242676,
   0.2999500036239624,
   0.39994001388549805,
   0.49990999698638916,
   0.5999400019645691,
   0.6999300122261047,
   0.7999500036239624,
   0.8999699950218201,
   0.9999899864196777,
   1.0999799966812134,
   1.2000000476837158,
   1.3000199794769287,
   1.4000099897384644,
   1.5,
   1.5999799966812134,
   1.699970006942749,
   1.7999399900436401,
   1.899940013885498,
   1.999959945678711,
   2.099950075149536,
   2.1999800205230713,
   2.299999952316284,
   2.4000298976898193,
   2.5000200271606445,
   2.600029945373535,
   2.7000200748443604,
   2.8000400066375732,
   2.9000399112701416,
   3.0000100135803223,
   3.099950075149536,
   3.199889898300171,
   3.299799919128418,
   3.3996999263763428,
   3.49960994720459,
   3.5995099544525146,
   3.6994199752807617,
   3.799370050430298,
   3.899280071258545,
   3.9991800785064697,
   4.
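
The `ts` entry in each scenario's metadata appears to hold the frame timestamps in seconds. As a small worked example, the sketch below copies the first eleven values from the truncated output above and estimates the sampling interval. The variable names are illustrative only.

```python
# First eleven timestamps copied from the (truncated) output above.
ts = [
    0.0,
    0.09997999668121338,
    0.19995999336242676,
    0.2999500036239624,
    0.39994001388549805,
    0.49990999698638916,
    0.5999400019645691,
    0.6999300122261047,
    0.7999500036239624,
    0.8999699950218201,
    0.9999899864196777,
]

# Differences between consecutive timestamps.
intervals = [later - earlier for earlier, later in zip(ts, ts[1:])]
mean_interval = sum(intervals) / len(intervals)

print(f"{len(ts)} timestamps, mean interval {mean_interval:.3f} s")
```

For this excerpt the mean interval comes out at roughly 0.1 s, i.e. about 10 frames per second.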

`scenario_ids` is the list of all scenario IDs.

In [10]:
scenario_ids

['sd_training.tfrecord-00000-of-01000_2a1e44d405a6833f.pkl',
 'sd_training.tfrecord-00000-of-01000_c403d5992cab9e0.pkl',
 'sd_training.tfrecord-00000-of-01000_8a346109094cd5aa.pkl']

`mapping` is a dict mapping from scenario IDs to the folder that hosts their files.

In [11]:
mapping

{'sd_training.tfrecord-00000-of-01000_2a1e44d405a6833f.pkl': '',
 'sd_training.tfrecord-00000-of-01000_c403d5992cab9e0.pkl': '',
 'sd_training.tfrecord-00000-of-01000_8a346109094cd5aa.pkl': ''}
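
The empty strings above mean that every scenario file sits directly in the dataset root. The sketch below illustrates how such a mapping can be turned into a file path for both a flat and a nested layout. The helper `resolve_scenario_path` and the folder and file names are hypothetical, and the join order is an assumption about how the mapping is meant to be used.

```python
import os


def resolve_scenario_path(dataset_path, mapping, scenario_id):
    """Join the dataset root, the relative folder from `mapping`, and the file name."""
    return os.path.join(dataset_path, mapping[scenario_id], scenario_id)


# Flat layout: the relative folder is an empty string.
flat_mapping = {"scenario_a.pkl": ""}
# Nested layout: the relative folder names a subdirectory (hypothetical name).
nested_mapping = {"scenario_b.pkl": "folder_1"}

flat_path = resolve_scenario_path("dataset_root", flat_mapping, "scenario_a.pkl")
nested_path = resolve_scenario_path("dataset_root", nested_mapping, "scenario_b.pkl")

print(flat_path)
print(nested_path)
```

Because `os.path.join` skips empty components in the middle of a path, the same helper covers both layouts without a special case.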

### Read specified scenario

`read_scenario` returns a `ScenarioDescription` instance. Please refer to the `ScenarioDescription` class in MetaDrive for more information.

In [12]:
read_scenario?

Signature: read_scenario(dataset_path, mapping, scenario_file_name)
Docstring:
Read a scenario pkl file and return the Scenario Description instance.

Args:
    dataset_path: the path to the root folder of your dataset.
    mapping: the dict mapping return from read_dataset_summary.
    scenario_file_name: the file name to a scenario file, should end with `.pkl`.

Returns:
    The Scenario Description instance of that scenario.
File:      ~/scenarionet/scenarionet/common_utils.py
Type:      function

In [13]:
scenario_file_name = scenario_ids[0]

scenario = read_scenario(dataset_path=waymo_data, mapping=mapping, scenario_file_name=scenario_file_name)

In [14]:
type(scenario)

metadrive.scenario.scenario_description.ScenarioDescription

In [15]:
scenario.keys()

dict_keys(['id', 'version', 'length', 'tracks', 'dynamic_map_states', 'map_features', 'metadata'])
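
The key list above can double as a lightweight structural check when you build or convert scenarios yourself. The following sketch only verifies that the top-level keys are present. It is not the library's own validation, the helper `missing_top_level_keys` is hypothetical, and the placeholder values say nothing about the real content of each field.

```python
# Top-level keys taken from the `scenario.keys()` output above.
REQUIRED_KEYS = {
    "id", "version", "length", "tracks",
    "dynamic_map_states", "map_features", "metadata",
}


def missing_top_level_keys(scenario_dict):
    """Return the sorted list of required top-level keys absent from the dict."""
    return sorted(REQUIRED_KEYS - set(scenario_dict))


# Hypothetical placeholder scenario; a real one holds much richer content.
toy_scenario = {
    "id": "toy",
    "version": "toy-version",
    "length": 0,
    "tracks": {},
    "dynamic_map_states": {},
    "map_features": {},
    "metadata": {},
}
incomplete_scenario = {k: v for k, v in toy_scenario.items() if k != "tracks"}

print(missing_top_level_keys(toy_scenario))
print(missing_top_level_keys(incomplete_scenario))
```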

In [16]:
scenario.sanity_check(scenario)  # The check passes if no error is raised.

In [None]:
scenario.to_dict()