Falsification of Driving Systems

We provide a falsification pipeline that allows users to test their driving systems against the Rulebook specifications using Scenic programs. To run the falsification pipeline, users need to conduct the following steps:

Interface the system under test with Scenic.
Define the Rulebook specifications that the system should satisfy.
Execute the falsification process to identify configurations where the system fails to meet the specifications.

Below we provide the detailed instructions for each step.

I. Interfacing with Scenic

To interface your driving system with Scenic, follow these steps:

A. Define a Policy Agent

Create a Python class that inherits the Agent class defined in src/evaluation/agent.py. The class must implement a run_step method, which generates the control actions (e.g., throttle, brake, steer) for the next timestep. Below is a template for the policy agent:

class ExampleAgent(Agent):
    def run_step(self, *args, **kwargs):
        ...
        throttle, brake, steer = ... # Generated by the driving policy under test
        return throttle, brake, steer

For a complete implementation example, please refer to MetaDrivePolicyAgent in src/evaluation/metadrive_expert_agent.py, which leverages a proximal policy optimization (PPO)-trained policy provided by the MetaDrive simulator.

B. Define a Scenic Car

Create a Scenic class that inherits from Scenic’s Car class. In the startDynamicSimulation method, assign an instance of your policy agent defined in step A (e.g., ExampleAgent) to self.controller. Below is a template for the car class:

class ExampleCar(Car):
    controller: None

    def startDynamicSimulation(self):
        self.controller = ExampleAgent()

For a complete implementation example, please refer to MetaDrivePPOPolicyCar in src/evaluation/metadrive_expert.scenic.

C. Define an Action

Create a Python class that inherits the AgentAction class in src/evaluation/agent.py. An action represents a single step of control that can be applied to the car. This class must implement an applyTo method, where the first argument agent is an instance of the car class defined in step B (e.g., ExampleCar). Inside applyTo, call agent.controller.run_step to retrieve control actions for the next timestep, then use Scenic’s built-in methods (setThrottle, setBraking, setSteering, etc.) to apply them to the car. Below is a template for the action class:

class ExampleAction(AgentAction):
    def applyTo(self, agent, *args, **kwargs):
        ...
        throttle, brake, steer = agent.controller.run_step(...)
        agent.setThrottle(throttle)
        agent.setBraking(brake)
        agent.setSteering(steer)

For a complete implementation example, please refer to MetaDrivePolicyAction in src/evaluation/metadrive_expert_agent.py.

D. Define a Scenic Behavior

Create a Scenic behavior that repeatedly takes the action defined in step C (e.g., ExampleAction). A behavior represents a sequence of actions that the car will execute during the simulation. In the behavior, use the take specifier to call the applyTo method defined in the action. Below is a template for the behavior:

behavior ExampleBehavior():
    action = ExampleAction()
    take action
    while True:
        take action

For a complete implementation example, please refer to MetaDrivePPOPolicyBehavior in src/evaluation/metadrive_expert.scenic.

E. Pass State Information

Most policies require access to the ego’s current state or environment state to determine the next action.
To facilitate this, follow these steps to supplement the classes defined above:

E-1. Create Global Variables and Monitor

Create global variables (e.g., SPEED) to store the state information needed by the policy. Then, define a monitor that continuously updates this variable with the current state at each timestep.

SPEED = []

monitor UpdateState():
    while True:
        SPEED.append(ego.speed)
        wait

Note

The global variables should be lists, and the most recent state can be accessed by querying the last element (e.g., SPEED[-1]). This is due to Scenic’s internal handling of variables.

For a complete implementation example, please refer to the METADRIVE_ACTOR variable and MetaDrivePPOUpdateState monitor in src/evaluation/metadrive_expert.scenic.

E-2. Store State in the Car Class

Add a member variable in the car class (e.g., ExampleCar) to store the current state and update it at each timestep.

class ExampleCar(Car):
    speed: None
    controller: None

    def update_actor(self):
        self.speed = SPEED[-1]

    def startDynamicSimulation(self):
        self.controller = ExampleAgent()

For a complete implementation example, please refer to MetaDrivePPOPolicyCar in src/evaluation/metadrive_expert.scenic.

E-3. Pass State to the Policy Agent

Modify the applyTo method of the action class (e.g., ExampleAction) to pass the current state to run_step in the policy agent (e.g., ExampleAgent). Optionally, you can pass the entire Scenic agent (e.g., ExampleCar) if needed.

class ExampleAction(AgentAction):
    def applyTo(self, agent, *args, **kwargs):
        ...
        agent.update_actor()
        throttle, brake, steer = agent.controller.run_step(agent.speed, agent, ...)
        agent.setThrottle(throttle)
        agent.setBraking(brake)
        agent.setSteering(steer)

For a complete implementation example, please refer to MetaDrivePolicyAction in src/evaluation/metadrive_expert_agent.py.

F. Link the Behavior to the Car in the Scenic Program

In the main Scenic program where the scenario is defined, create an instance of the car class defined in step B (e.g., ExampleCar) and assign the behavior defined in step D (e.g., ExampleBehavior) to it. If your policy requires state information, ensure that the monitors defined in step E-1 are active throughout the simulation. Below is a template:

ego = new ExampleCar at ...,
        with ...,
        with behavior ExampleBehavior()

require monitor UpdateState()

Using the MetaDrive PPO policy as an example:

ego = new MetaDrivePPOPolicyCar at egoSpawnPt,
        with blueprint MODEL,
        with behavior MetaDrivePPOPolicyBehavior(egoTrajectory)

require monitor MetaDrivePPOUpdateState()

II. Defining Rulebook Specifications

To define the Rulebook specifications that your system should satisfy, follow the instructions in the Rules and Rulebooks sections of the documentation.

For complete examples, please refer to with_vru.graph and no_vru.graph in src/evaluation/assets/, which specify the Rulebook specifications for scenarios with and without VRUs, respectively. You can define your own specifications by modifying these graph files or creating new ones based on the same format.

III. Running Falsification

Falsification is a simulation-based formal analysis method used to systematically identify failure cases in a system under test. Before running the falsification flow, you need to prepare the following input files:

Scenic Program (*.scenic): Describes the scenario and links to the driving system under test (see I. Interfacing with Scenic).
Rulebook Specification (*.graph): Defines the hierarchical rulebook specifications (see II. Defining Rulebook Specifications).

The falsification process follows an iterative loop:

Parameter Generation: The falsifier generates a vector of input parameters.
Simulation: The simulator executes a simulation run based on these input parameters.
Evaluation: The resulting simulation trace is evaluated against the provided Rulebook specification.
Feedback: For active falsifier, the evaluation result serves as feedback to guide the falsifier in selecting the next set of parameters.
Iteration: Repeat Steps 1 to 4.

Note

In our framework, the input parameter space is defined directly within the main Scenic program using the param specifier.

Below we first provide a quick start on how to run the falsification flow, and then we detail the implementation of the flow.

Quick Start

Users can run the falsification flow on a Scenic scenario using the following command in src/evaluation:

python run_evaluation.py \
        --config-name=eval.yaml \
        hydra.job.chdir=False \
        hydra.output_subdir=null \
        scenic.file_path=SCENARIO_PATH

where eval.yaml defines the configurations of the falsification flow, and SCENARIO_PATH is the path to the Scenic program.

The eval.yaml file should locate at src/evaluation/cfgs. Users can modify the provided eval.yaml based on their needs. Here is an example alongside the description of the configuration file:

hydra:
  run:
    dir: ./outputs/${now:%Y-%m-%d}/ # The output folder of the log file
  job_logging:
    handlers:
      file:
        class: logging.FileHandler
        formatter: simple
        filename: ${hydra:run.dir}/run_evaluation.log # The output log file name
agent:
  type: 'built_in' # The driving system under test. In our scenarios, we provide three driving policies: 'built_in', 'metadrive_ppo', and 'ppo_with_built_in'. If you define your own driving policy, remember to change the type here.
scenic:
  scenario_name: 'crash_apple_082321' # Name of the scenario
  file_path: '../../scenarios/crash/crash_apple_082321.scenic' # The path to the Scenic scenario program
  map_file_path: '../../maps/Town04.net.xml' # The path to the map file
  simulator: 'metadrive' # The simulator. Our framework is mainly developed and tested on 'metadrive', but other simulators that are interfaced with Scenic can also be adopted, including 'newtonian' and 'carla'.
  max_steps_per_simulation: 180 # The maximum simulation steps of a simulation run.
  max_rejection_sampling_iterations: 2000 # The maximum rejection sampling iterations for Scenic.
  timeout: 40 # Timeout (in seconds) for each simulation run. The simulation terminates after this duration has elapsed.
  max_retries: 5 # Max retries for each simulation. The entire falsification flow terminates after this number of retries is reached.
rulebook:
  file_path: 'assets/with_vru.graph' # The path to the Rulebook specification file.
  add_reaching_goal_rule: True # Whether to add a reaching-goal rule to the Rulebook. As the reaching-goal rule is a highly scenario-dependent rule, we treat is as a standalone option in our implementation.
falsification:
  sampler_type: 'mab' # The falsification sampler. Options include 'random', 'halton', 'ce' (cross-entropy), and 'mab' (multi-armed bandit).
  active: True # Whether the sampler is active. True for 'ce' and 'mab', False for 'random' and 'halton'.
  params: # Paratemers for active samplers. Only effective when active is True.
    alpha: 0.9
    thres: 0.0
    buckets: 5
    exploration_ratio: 2 # This option is only for 'mab'.
experiment:
  rng_seed: 1 # The random seed for the entire falsification flow. 
  num_seeds: 1 # The number of random seeds to run. 
  num_samples: 30 # The number of samples to generate for each random seed.
  # Note that the total number of simulation runs will be num_seeds * num_samples.
visualization:
  record_simulation: False # Whether to record the simulation. If True, the simulation will be recorded and saved as a video file in the record_dir.
  ids: ['egoPoly', 'egoLanePoly', 'advPoly', 'advLanePoly'] # The IDs of the objects to be visualized. Each object should be recorded with the "record" statement in the Scenic program using the same ID. For example, "record ego._boundingPolygon as egoPoly".
  record_dir: ./outputs/${now:%Y-%m-%d}/ # The output folder for the recorded videos. 

Detailed Implementation of the Flow

The falsification flow is implemented in src/evaluation/run_evaluation.py. The main falsification process is implemented in the run_evaluation function. The process includes the following steps:

Initialization: The function initializes the simulator, the input parameter space and the falsification sampler, and the Rulebook specification.
Falsification Loop: The function enters a loop that iterates for a specified number of samples. In each iteration, it generates a vector of input parameters using the falsification sampler, runs a simulation with these parameters, evaluates the resulting trajectory against the Rulebook specification, and collects the evaluation results. If the sampler is active, it also updates the sampler with the evaluation results to guide the next parameter generation.
Logging: The function logs the evaluation results and other relevant information.

Falsifier

We provide several sampling algorithms for the falsifier, including random sampling (random), Halton sequence sampling (halton), cross-entropy method (ce), and multi-armed bandit (mab). The samplers are imported from the VerifAI toolkit or implemented in src/rulebook_benchmark/samplers.py.

Agent

We provide three driving policies for evaluation: built_in, metadrive_ppo, and ppo_with_built_in. The built_in policy uses the Scenic built-in rule-based policy with PID control. The metadrive_ppo policy uses a PPO-trained policy provided by MetaDrive. The ppo_with_built_in policy uses the rule-based policy for high-level decision making and the PPO policy for low-level control. To use the metadrive_ppo or ppo_with_built_in policy, users need to download the PPO policy weights from the MetaDrive repository and place the weights file in the src/evaluation/assets folder. To use your own driving policy, you can create a new agent class following the instructions in I. Interfacing with Scenic and update the agent.type field in the configuration file accordingly.