Selection of Valuable Data in Autonomous Driving’s Data Closed Loop Engine

Yu Huang
Apr 28, 2023 · 18 min read

Introduction

As is well known, autonomous driving is a “long tail” problem that requires solutions for safety-critical scenarios, yet such scenarios are rare and difficult to collect. One solution is to build a “data closed-loop” system, in which data selection and annotation become prerequisites for efficient operation of the loop. The SOTIF standard divides safety issues into four domains: “known safe”, “known unsafe”, “unknown safe”, and “unknown unsafe”. The unknown part often consists of rare cases, i.e. corner or anomaly cases; the last domain in particular, also known as “Black Swan” events, requires continuous acquisition of valuable data to train autonomous driving models in order to ultimately resolve safety hazards.

Tesla was the first company to explicitly propose selecting valuable data on mass-produced vehicles, known as the “shadow mode”. Data selection can be divided into two styles. One is the online style, where trigger conditions for data collection are set on the human-driven vehicle, which collects the required data most economically; it is mostly used during the mass-production and commercial operation stages (note: commercial vehicles with safety operators usually trigger the collection manually). The other is the database style, which generally adopts a data mining model to clean and select incremental data on the cloud server; this style is commonly used in the research and development stage, and even data collected in the mass-production stage will undergo secondary screening in the server-side data center. In addition, when there is a significant lack of data for known scenarios or targets, a “content search” mode can also be set up on the vehicle or server side to search for similar objects, scenes, or scenario data, enhancing the diversity of the training data and the generalization of the model.

Safety-critical scenarios generally come from expert knowledge, mostly from accident analysis [2]. Methods for finding safety-critical scenarios can basically be divided into the following categories [2]: 1) parameterless trajectory methods; 2) parametric trajectory methods; 3) inductive reasoning; 4) deductive reasoning; 5) scene understanding based on sensor data. The first two mostly replay or simulate recorded cases, the third relies on accident analysis, the fourth analyzes mainly from a knowledge perspective, and the fifth depends on how the scenario elements (extracted by an algorithm) are defined and how their criticality is evaluated.

In autonomous driving, there are equivalent or similar concepts for corner cases, such as anomaly data, novelty, outliers, and out-of-distribution (OOD) data. Corner case detection can be divided into online and offline [4]. The online category is generally used as a safety monitoring and warning system, while the offline category is generally used in the laboratory when developing new algorithms, to choose suitable training and testing data. Corner cases can be defined at several different levels [4]: 1) pixel/voxel; 2) domain; 3) object; 4) scene; 5) scenario. Corner cases at the last, scenario level are often related not only to perception but also to prediction and decision/planning.

Detection methods for corner cases can be divided into the following categories [4]: 1) reconstruction methods; 2) prediction methods; 3) generative methods; 4) confidence-score methods; 5) feature extraction methods. The confidence-score methods are further divided into three routes [4]: learned confidence scores, Bayesian approaches, and post-processing methods. Reconstruction and generative methods are relatively similar and are applicable at essentially every level. Confidence-score methods generally rely on supervised learning, and Bayesian methods need to estimate uncertainty.

From an application perspective, corner cases can be divided as follows [5]: sensor layer, content layer, time-domain (scenario) layer, and application layer. The sensor layer is divided into physical and hardware levels, while the content layer is divided into domain, object, and scene levels (note: detecting corner cases in real-world driving data collection and generating corner cases in simulation environments are closely related [6], which involves the definition and description of simulation scenarios and the construction of scenario libraries). Recently, some open-source datasets for corner case detection have also been published [7], facilitating research and development in this field.

OOD detection and corner case detection are similar [8], corresponding directly to the effectiveness and safety of the algorithm model in application. Broadly speaking, OOD detection methods can be divided as follows [9]: density-based, reconstruction-based, classification-based (including one-class classification and self-supervised learning), distance-based (clustering and graph theory), and gradient-based (meta-learning) methods.

Similarly, anomaly detection is also a means of data selection. Recently, deep learning has been applied in this area [10], learning feature representations or anomaly scores to train detectors, but the required computing power is relatively large, making it more suitable for the database server side. Anomaly data can basically be divided as follows [10]: point anomalies, conditional (contextual) anomalies, and group (collective) anomalies. Deep-learning-based anomaly detection methods can be roughly divided as follows [10]: feature extraction, learning normal feature representations, and end-to-end anomaly score learning. Anomaly scoring methods can be divided into ranking, prior-driven, softmax-likelihood, and end-to-end one-class classification methods. On the other hand, anomaly detection can be divided into unsupervised and supervised learning [11]. Since anomaly samples are very rare and difficult to collect, and there is no effective supervisory information, unsupervised learning methods are more common in real systems.

The detected anomaly data can be divided by sensor category [12], such as camera, LiDAR, millimeter-wave radar, and multi-modal data; another type of anomaly can be abstractly defined in driving behavior. Recently released datasets of autonomous driving perception anomalies basically fall into three categories [13]: real-world anomaly data, simulated anomalies augmented into real-world data, and fully simulated anomaly data.

Based on the detection of corner cases, OOD, or anomaly data, the training platform of the autonomous driving machine learning model can adopt reasonable methods to absorb this incremental data. Among them, active learning is the most common method [14], which uses such valuable data efficiently. The value of the data is judged according to certain criteria, such as the commonly used uncertainty estimates. There are two main kinds of uncertainty [15]: epistemic and aleatoric. Epistemic uncertainty is commonly referred to as model uncertainty, and its estimation methods mainly include ensemble methods and Monte Carlo dropout; aleatoric uncertainty is referred to as data uncertainty, and the most commonly used estimation method is probabilistic machine learning (ML) based on Bayesian theory [15].
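As a minimal sketch of how epistemic uncertainty can drive active data selection, the snippet below ranks unlabeled frames by predictive entropy from Monte Carlo dropout. The classifier, data loader, and labeling budget are hypothetical placeholders rather than part of any specific ADS stack.

```python
# Sketch: rank unlabeled frames by epistemic uncertainty via MC dropout.
import torch
import torch.nn.functional as F

def mc_dropout_uncertainty(model, images, n_passes=20):
    """Predictive entropy over n stochastic forward passes."""
    model.train()                      # keep dropout layers active at inference
    with torch.no_grad():
        probs = torch.stack(
            [F.softmax(model(images), dim=-1) for _ in range(n_passes)]
        )                              # (T, B, C)
    mean_probs = probs.mean(dim=0)     # (B, C) averaged predictive distribution
    entropy = -(mean_probs * mean_probs.clamp_min(1e-8).log()).sum(dim=-1)
    return entropy                     # higher entropy -> more "valuable" frame

def select_for_labeling(model, unlabeled_loader, budget=1000):
    scored = []
    for batch_idx, images in enumerate(unlabeled_loader):
        ent = mc_dropout_uncertainty(model, images)
        scored += [(e.item(), batch_idx, i) for i, e in enumerate(ent)]
    scored.sort(reverse=True)          # most uncertain samples first
    return scored[:budget]             # indices sent for annotation
```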

In addition, the data closed-loop development process in autonomous driving can be seen as an “open world” learning process, i.e. a continual/lifelong learning mode. Adding newly chosen data to model training also needs to consider the issues of “catastrophic forgetting (catastrophic interference)” and the “stability-plasticity dilemma”. Here, “stability” refers to the ability to retain previous knowledge during encoding, while “plasticity” refers to the ability to integrate new knowledge [16]. In a sense, anomaly data often indicates possible model failure, so detecting anomalies is also a means of detecting failures/errors; this can be used for the early warning function of an L3-level autonomous driving system.

Based on data selection and cleaning, the construction of a scenario library (a database of important scenarios) is the basis for testing and evaluating autonomous driving methods. Scenario-library-based testing is a generalized safety validation and verification method for an autonomous driving system (ADS). The PEGASUS project in Germany [17] provides a process for building a scenario library with three levels: from functional scenarios, to logical scenarios, to concrete scenarios. Specifically, functional scenarios typically use natural language to describe the entities involved and their behaviors; a logical scenario specifies the state space of the functional scenario, together with the relevant parameters, parameter ranges, and distributions; concrete scenarios assign specific values to the parameters to instantiate the logical scenario. However, this type of method often overlooks the fact that most current autonomous driving algorithms (especially the perception and prediction modules) train and deploy neural network models in a data-driven fashion, and their safety-critical testing scenarios naturally come from corner cases or anomaly data in driving, which needs to be considered when building a scenario library.
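A toy illustration of the PEGASUS-style three-level abstraction is sketched below; the parameter names, ranges, and the cut-in-free highway scenario are made up purely for illustration.

```python
# Sketch: functional -> logical -> concrete scenario (illustrative only).
import random
from dataclasses import dataclass

# Functional scenario: natural-language description of entities and behavior.
FUNCTIONAL = "Ego follows a lead vehicle on a highway; the lead vehicle brakes hard."

# Logical scenario: the parameter space (here, uniform ranges).
LOGICAL = {
    "ego_speed_mps":   (20.0, 36.0),
    "gap_m":           (10.0, 60.0),
    "lead_decel_mps2": (3.0, 9.0),
}

@dataclass
class ConcreteScenario:
    ego_speed_mps: float
    gap_m: float
    lead_decel_mps2: float

def sample_concrete(logical, rng=random.Random(0)) -> ConcreteScenario:
    """Concrete scenario: assign one value per parameter of the logical scenario."""
    return ConcreteScenario(**{k: rng.uniform(*v) for k, v in logical.items()})

print(FUNCTIONAL)
print(sample_concrete(LOGICAL))
```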

The coverage and criticality of scenarios are the main indicators for safety testing with scenario libraries. In constructing scenario libraries, methods for generating safety-critical scenarios can be divided into three kinds [18]: knowledge-based, data-driven, and adversarial-learning-based. The data-driven method only uses the collected data, sampling either directly or through a generative model. The adversarial learning method uses feedback from the autonomous vehicle deployed in a simulation environment. The knowledge-based method mainly uses information from external knowledge as a constraint or as guidance for generation. For example, a scenario definition can be divided into three parts: scene, environment, and dynamic elements. The scene is the static driving road and its surroundings, the environment includes weather and road conditions, and the dynamic elements refer to traffic participants, traffic flow, and dynamic behaviors. At present, scenario generation guided by AI theory generally adopts adversarial or reinforcement learning methods, but these are not as simple and practical as data-driven methods that directly use AI models to screen data at the collection stage, and they also have poor interpretability.

The existing shortcomings of current data selection mechanisms are listed as follows:

1. Lack of an effective data collection mechanism for mass-production vehicles that can bear the cost of collecting massive amounts of user data;

2. Most collect valuable corner case data only for the “perception” module, and rarely address the data requirements of downstream modules such as “prediction”, “decision-making”, and “planning” in model development and upgrading;

3. The “functional scenario to logical scenario to concrete scenario” library-building style proposed by the PEGASUS project is commonly used, but it lacks support for data-driven scenario labeling and classification;

4. Lack of powerful sensor data understanding tools to effectively assist in the timely and accurate detection of anomaly data or corner cases.

In this article, we propose a unified data selection mechanism for autonomous driving:

1. Adopt a data selection mechanism that combines work on both the vehicle side and the cloud server side, triggering “valuable” data collection online in a lightweight manner on the vehicle, and providing offline data mining and deep-learning-based corner case detection on the cloud;

2. Provide a corner case detection mechanism that covers not only perception but also prediction, decision-making, and planning, choosing “valuable” data from different stages of the autonomous driving pipeline;

3. Establish different screening mechanisms for user data from mass-production vehicles and self-collected data from research and development vehicles, especially taking into account their different collection costs and data modes;

4. Fully utilize automatic/semi-automatic annotation models in the server database to provide value evaluation, such as BEV perception models for LiDAR and multi-camera data;

5. Based on the selection mechanism, use a combination of knowledge-based (predefined classification tree) and data-driven methods to efficiently establish a scenario library for simulation testing of the various self-driving modules, such as perception, prediction, decision-making, and planning.

Details of The Proposed Data Selection Method

An autonomous driving data acquisition system needs sensors such as GNSS, IMU, cameras, millimeter-wave radar, and LiDAR, as shown in Figure 1. Among them, LiDAR may not be a standard configuration on mass-production vehicles, so the data obtained from mass-production vehicles does not include 3D LiDAR point clouds. The camera configuration covers the 360-degree environment around the vehicle, with 6 cameras, 5 millimeter-wave radars, and one 360-degree scanning LiDAR (or a combination of multiple limited-angle scanning LiDARs can be used to cover 360 degrees).

Figure 1 nuScenes’ vehicle sensor configuration

If multi-modal sensor data needs to be collected, sensor calibration is required to determine the coordinate transformations between the sensors, such as camera calibration [19], camera-LiDAR calibration [20–21], LiDAR-IMU calibration [22], and camera-radar calibration [23]. In addition, a unified clock (e.g. GNSS time) needs to be shared among the sensors, and a certain signal is then used to trigger sensor operation; for example, the LiDAR’s sweep signal can trigger the camera’s exposure so that they are time-synchronized [24].
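As a minimal sketch of how such calibration results are consumed, the snippet below projects LiDAR points into a camera image given the camera-LiDAR extrinsics and the camera intrinsics. The transform and intrinsic matrix are placeholders; in practice they would come from an offline calibration step such as [19–21].

```python
# Sketch: project LiDAR points into an image with calibrated extrinsics/intrinsics.
import numpy as np

def project_lidar_to_image(points_lidar, T_cam_lidar, K):
    """points_lidar: (N, 3) in the LiDAR frame; returns (M, 2) pixel coordinates."""
    pts_h = np.hstack([points_lidar, np.ones((len(points_lidar), 1))])  # (N, 4) homogeneous
    pts_cam = (T_cam_lidar @ pts_h.T).T[:, :3]          # LiDAR frame -> camera frame
    in_front = pts_cam[:, 2] > 0.1                      # keep points in front of the camera
    pts_cam = pts_cam[in_front]
    uv = (K @ pts_cam.T).T                              # pinhole projection
    return uv[:, :2] / uv[:, 2:3]                       # normalize by depth

# Placeholder calibration: identity extrinsics, generic 1920x1080 intrinsics.
T_cam_lidar = np.eye(4)
K = np.array([[1000.0, 0, 960.0], [0, 1000.0, 540.0], [0, 0, 1.0]])
pixels = project_lidar_to_image(np.random.rand(100, 3) * 20, T_cam_lidar, K)
```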

The autonomous driving development platform must support the entire data closed-loop system spanning the vehicle side and the cloud server, including data collection and preliminary screening on the vehicle side, active-learning-based mining in the cloud-side database, automatic labeling, model training and simulation testing (simulation data can also be used for model training), and model deployment back to the vehicle side, as shown in Figure 2. Data selection and data annotation are the key modules that determine the efficiency of the data closed loop.

Figure 2 Data closed loop architecture

The data screening work is arranged separately on the vehicle side and the server side. The former triggers lightweight programs for data collection in an “online” mode on the vehicle, while the latter performs secondary screening of the collected data on the server, where autonomous driving algorithms can be run in an “offline” mode. Figure 3 shows a block diagram of the on-vehicle data selection mechanism:

1) Adopt multiple screening paths, such as content search, shadow mode, driving operations, and one-class classification;

2) Content search mode: based on a given query, the “Scene/Scenario Search” module extracts features (spatial or temporal information) from images or consecutive frames for pattern matching [25–26] to discover certain objects, contexts, or traffic behaviors, such as motorcycles on the streets at night, large trucks on highways in adverse weather, vehicles and pedestrians in roundabouts, lane changing on highways, and U-turn behavior at intersections;

3) The “Shadow Mode” module makes judgments based on the results of the onboard autonomous driving system (ADS): in perception, for example, object matching errors across cameras, jitter or sudden disappearance of detections over consecutive frames, and strong lighting changes at tunnel entrances and exits; in decision/planning, behaviors such as a vehicle cutting in while the ego accelerates or cutting out while the ego decelerates, anomaly cases such as detecting an obstacle ahead but not attempting to avoid it, and nearly colliding with vehicles detected by the rear-side cameras during lane changes;

4) The “Driving Operation” module detects anomalies from data such as yaw rate and speed obtained from the vehicle’s CAN bus, such as abnormal zig-zag driving, excessive acceleration or braking, large steering or turning angles, or even triggering of Automatic Emergency Braking (AEB);

5) The “One-Class Classification” module generally trains anomaly detectors for the data in perception, prediction, and planning, and is a kind of generalized, data-driven “shadow mode”; it trains one-class classifiers on normal driving data, i.e. perception features, predicted trajectories, and planned paths, respectively [27–28]; for lightweight operation on the vehicle, a One-Class SVM model is used (a sketch is given after Figure 3);

6) Finally, the “Data Capture” module labels each captured sample according to its collection path.

Figure 3 Block diagram of the on-vehicle data selection mechanism
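The sketch below illustrates the lightweight one-class path referenced in item 5: a One-Class SVM fitted offline on features of normal driving and used online as a capture trigger. The feature source and the stored feature file are hypothetical placeholders.

```python
# Sketch: One-Class SVM as an on-vehicle data-capture trigger.
import numpy as np
from sklearn.svm import OneClassSVM

# Offline (server side): fit on features of normal driving segments.
normal_features = np.load("normal_driving_features.npy")   # (N, D), placeholder file
detector = OneClassSVM(kernel="rbf", nu=0.01, gamma="scale").fit(normal_features)

# Online (vehicle side): score each incoming frame/segment and decide whether to capture.
def should_capture(frame_features: np.ndarray) -> bool:
    # decision_function < 0 means the sample falls outside the learned "normal" region
    return detector.decision_function(frame_features.reshape(1, -1))[0] < 0.0
```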

On the server database side, Figure 4 shows a block diagram of the offline data screening mechanism:

1) Whether the new data is collected from an R&D data collection vehicle or from a mass-production user vehicle, it is stored on a “temporary storage” hard disk for secondary selection;

2) To further screen the data, autonomous driving software can run on it step by step (in a log-sim style), and anomalies can be detected at a series of designed checking points. Here, autonomous driving adopts a modular pipeline, which includes the “perception/localization/fusion” module (see details in Figures 6–7), the “prediction/temporal fusion” module (see details in Figure 8), and the “planning & decision-making” module (see details in Figure 9). Each module’s output is a checking point for anomaly detection through a “one-class classification” module, whose model architecture differs from that on the vehicle side; this anomaly detector can be more complicated because there is no real-time restriction, so a deep neural network for one-class classification can be used on the server side, as shown in Figure 5 (details are introduced later);

3) Similarly to the vehicle side shown in Figure 3, another “Scene/Scenario Search” module directly retrieves data according to a query that defines a certain kind of scenario; the applied algorithm/model can be larger and more computation-intensive, since there is no real-time limitation;

4) In addition, data mining techniques can be used: the “Clustering” module applies unsupervised grouping methods [29] or density estimation methods [30] to generate scenario clusters, and data far from the cluster centroids is treated as anomalous (a sketch follows Figure 4).

Figure 4 Block diagram of an offline data screening mechanism
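A possible form of the clustering path in item 4 is sketched below: scenario embeddings are grouped with k-means and samples far from their nearest centroid are flagged as candidate anomalies. The embedding source, cluster count, and percentile threshold are illustrative choices.

```python
# Sketch: mine candidate corner cases as samples far from their cluster centroid.
import numpy as np
from sklearn.cluster import KMeans

def mine_anomalies(embeddings: np.ndarray, n_clusters: int = 50, pct: float = 99.0):
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit(embeddings)
    # distance of each sample to its assigned cluster centroid
    dists = np.linalg.norm(embeddings - km.cluster_centers_[km.labels_], axis=1)
    threshold = np.percentile(dists, pct)        # keep the farthest ~1% as candidates
    return np.where(dists > threshold)[0], km    # indices of candidate corner cases
```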

As a one-class classifier, Figure 5 shows a typical deep learning architecture: the data passes through an encoder-decoder consisting of a “feature encoding” and a “feature decoding” module, and then enters a CNN-based “classifier” module to detect anomaly data (a rough PyTorch sketch follows Figure 5).

Figure 5 A one-class classifier with a typical deep learning architecture
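The skeleton below is a rough PyTorch rendering of the architecture in Figure 5; layer sizes are arbitrary, and training (e.g. the adversarial scheme of [27]) is omitted.

```python
# Sketch: encoder-decoder plus CNN classifier head for one-class anomaly scoring.
import torch
import torch.nn as nn

class OneClassNet(nn.Module):
    def __init__(self, in_ch: int = 3):
        super().__init__()
        self.encoder = nn.Sequential(                     # "feature encoding"
            nn.Conv2d(in_ch, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),
        )
        self.decoder = nn.Sequential(                     # "feature decoding"
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, in_ch, 4, stride=2, padding=1),
        )
        self.classifier = nn.Sequential(                  # CNN "classifier" head
            nn.Conv2d(in_ch, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, 1),
        )

    def forward(self, x):
        recon = self.decoder(self.encoder(x))
        score = self.classifier(recon)        # higher score -> more likely anomalous
        return recon, score

model = OneClassNet()
recon, score = model(torch.randn(2, 3, 128, 128))
```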

Figure 6 shows the flowchart of the autonomous driving perception (camera + LiDAR) algorithm running on the server side. It is a bird’s-eye-view (BEV) network architecture that outputs map elements and road obstacles separately, similar to [31–32]:

1) Multi-camera images are encoded by the “backbone” module, such as EfficientNet or RegNet with FPN/BiFPN;

2) The LiDAR point cloud enters the “voxelization” module [33] and the “feature encoding” module [34] to obtain 3D point cloud features, which are projected onto BEV through the “view transform” module [33];

3) Image features go through another “view transform” module, based on a depth distribution [35] or a Transformer [36]; both sets of features are then merged in the “Feature Concat” module;

4) Next come two output heads: one passes through the “BEV Obj Detector” module, similar to the PointPillars head [34], and outputs BEV object boxes; the other outputs vectorized map elements through the “Map Element Detection” and “Polyline Generation” modules [31] (a skeletal sketch of this fusion-and-heads step follows Figure 6).

Figure 6 BEV perception with LiDAR and camera
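The skeleton below sketches only the fusion-and-heads step of Figure 6: camera BEV features and LiDAR BEV features are concatenated, then fed to an object head and a map head. The backbones and view transforms ([33–36]) are abstracted away as given tensors, and all channel counts are illustrative.

```python
# Sketch: concatenate camera and LiDAR BEV features, then apply two output heads.
import torch
import torch.nn as nn

class BEVFusionHeads(nn.Module):
    def __init__(self, cam_ch=64, lidar_ch=64, n_obj_out=10, n_map_out=3):
        super().__init__()
        self.fuse = nn.Conv2d(cam_ch + lidar_ch, 128, 3, padding=1)   # "Feature Concat"
        self.obj_head = nn.Conv2d(128, n_obj_out, 1)   # "BEV Obj Detector" (PointPillars-style head)
        self.map_head = nn.Conv2d(128, n_map_out, 1)   # "Map Element Detection" (before polyline decoding)

    def forward(self, cam_bev, lidar_bev):
        bev = torch.relu(self.fuse(torch.cat([cam_bev, lidar_bev], dim=1)))
        return self.obj_head(bev), self.map_head(bev)

# BEV grids of 200x200 cells as an illustrative resolution.
obj, maps = BEVFusionHeads()(torch.randn(1, 64, 200, 200), torch.randn(1, 64, 200, 200))
```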

Removing the LiDAR part from Figure 6 (the point cloud branch and the feature merging module) gives the block diagram of the purely visual perception framework shown in Figure 7:

Figure 7 BEV perception with multi cameras

Based on BEV feature learning, taking the LiDAR + camera perception of Figure 6 as an example (purely visual perception without LiDAR, shown in Figure 7, is optional), and similar to BEVerse [37], the prediction module serves as an additional output head, shown in the dashed box in Figure 8:

1) The features enter the “Temporal Encoding” module, whose architecture can be designed either like the RNN model in [38] or the interaction modeler in [39], fusing features over multiple frames;

2) The “Motion Decoding” module interprets the spatiotemporal features, with a network architecture similar to [37] or [39], and outputs the predicted trajectories (a simplified sketch follows Figure 8).

Figure 8 Prediction module based on BEV perception
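As a simplified sketch of this prediction head, the snippet below uses a GRU to fuse pooled per-frame BEV features over time and an MLP to decode future waypoints per agent slot. Dimensions, agent count, and horizon are illustrative; the real modules in [37–39] are considerably richer.

```python
# Sketch: temporal fusion of BEV features followed by trajectory decoding.
import torch
import torch.nn as nn

class MotionHead(nn.Module):
    def __init__(self, feat_dim=256, n_agents=32, horizon=12):
        super().__init__()
        self.temporal = nn.GRU(feat_dim, feat_dim, batch_first=True)   # "Temporal Encoding"
        self.decode = nn.Linear(feat_dim, n_agents * horizon * 2)      # (x, y) per future step
        self.n_agents, self.horizon = n_agents, horizon

    def forward(self, bev_seq):            # bev_seq: (B, T, feat_dim), pooled BEV features per frame
        _, h = self.temporal(bev_seq)      # final hidden state summarizes the sequence
        traj = self.decode(h[-1])
        return traj.view(-1, self.n_agents, self.horizon, 2)   # predicted trajectories

trajs = MotionHead()(torch.randn(2, 5, 256))   # 5 past frames of pooled BEV features
```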

On the basis of perception and prediction, again taking the LiDAR + camera perception of Figure 6 as an example (purely visual perception without LiDAR, shown in Figure 7, is optional), the block diagram of the planning and decision-making algorithm is shown in the dashed box in Figure 9, similar to ST-P3 [38]: a sampling-based planning approach is chosen; based on the spatiotemporal BEV features output by the “Temporal Encoding” module, a cost function [40] is trained in the “Plan Decoding” module to score the trajectories generated by the sampler, and the one with minimal cost is found in the “ArgMin” module (a toy sketch follows Figure 9). The cost function includes terms for safety (avoiding obstacles), traffic rules, and trajectory smoothness (acceleration and curvature). Finally, a global loss function is optimized over the whole perception-prediction-planning pipeline.

Figure 9 The planning and decision-making architecture based on BEV perception + prediction
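A toy sketch of the sampling-and-argmin step is given below: a learned BEV cost map is evaluated along sampled candidate trajectories and the cheapest one is selected. Learning the cost map and the rule/comfort terms are omitted, and the grid resolution is an assumed value.

```python
# Sketch: evaluate sampled trajectories on a BEV cost map and pick the cheapest one.
import torch

def pick_trajectory(cost_map, candidates, resolution=0.5):
    """cost_map: (H, W) BEV cost; candidates: (K, T, 2) trajectories in meters."""
    H, W = cost_map.shape
    idx = (candidates / resolution).long()          # metric (x, y) -> grid indices
    idx[..., 0] = idx[..., 0].clamp(0, W - 1)
    idx[..., 1] = idx[..., 1].clamp(0, H - 1)
    per_traj_cost = cost_map[idx[..., 1], idx[..., 0]].sum(dim=1)   # sum cost along each path
    best = torch.argmin(per_traj_cost)                              # "ArgMin" module
    return candidates[best], per_traj_cost[best]

best_traj, cost = pick_trajectory(torch.rand(200, 200), torch.rand(64, 12, 2) * 90)
```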

Note: the control module mostly uses mature traditional algorithms [41], so AI-based approaches are not directly used for its data mining and filtering.

The Scenario Library Construction Based on Data Selection Mechanism

Finally, based on the above data selection mechanisms, Figure 10 shows the flowchart for constructing a scenario library on the server side:

  1. Knowledge-based: specific scenario data is extracted by the “Scene/Scenario Search” modules on both the vehicle side and the server side, while the vehicle-side “Driving Operation” module collects data for specific scenarios based on CAN bus data that depicts the driving pattern;
  2. Data-driven: based on AI methods, special scenarios are extracted through the “One-Class Classification” modules on both the vehicle side and the server side, new scenarios are obtained through unsupervised learning of the collected anomaly data in the “Clustering” module, and the “Shadow Mode” module on the vehicle side is used to extract safety-critical scenarios (a sketch of a possible library entry follows Figure 10).
Figure 10 Flowchart of constructing a scenario library based on data selection mechanism
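One possible shape of a scenario-library entry combining the knowledge-based tags and the data-driven cluster assignments is sketched below; the schema, field names, and example values are illustrative, not a standardized scenario format.

```python
# Sketch: a scenario-library record that merges knowledge-based and data-driven labels.
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class ScenarioEntry:
    clip_id: str                      # pointer to the stored sensor clip
    source: str                       # "content_search" | "shadow_mode" | "driving_op" | "one_class"
    knowledge_tags: List[str] = field(default_factory=list)   # e.g. ["roundabout", "night"]
    cluster_id: Optional[int] = None  # from the server-side "Clustering" module
    criticality: float = 0.0          # e.g. anomaly score or minimum time-to-collision

library: List[ScenarioEntry] = []
library.append(ScenarioEntry("clip_000123", "shadow_mode",
                             ["cut_in", "highway"], cluster_id=17, criticality=0.82))
```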

References

  1. Tesla Tech day, April 2019

2. X Zhang, et al., “Finding Critical Scenarios for Automated Driving Systems: A Systematic Literature Review”, arXiv 2110.08664, 2021

3. J-A Bolte et al., “Towards Corner Case Detection for Autonomous Driving”, arXiv 1902.09184, 2019

4. J Breitenstein, et al., “Corner Cases for Visual Perception in Automated Driving: Some Guidance on Detection Approaches”, arXiv 2102.05897, 2021

5. F Heidecker, et al., “An Application-Driven Conceptualization of Corner Cases for Perception in Highly Automated Driving”, arXiv 2103.03678, 2021

6. D Bogdoll et al., “Description of Corner Cases in Automated Driving: Goals and Challenges”, arXiv 2109.09607, 2021

7. K Li et al., “CODA: A Real-World Road Corner Case Dataset for Object Detection in Autonomous Driving”, arXiv 2203.07724, 2022

8. J Nitsch et al., “Out-of-Distribution Detection for Automotive Perception”, arXiv 2011.01413, 2020

9. J Yang, et al., “Generalized Out-of-Distribution Detection: A Survey”, arXiv 2110.11334, 2021

10. G Pang et al., “Deep Learning for Anomaly Detection: A Review”, arXiv 2007.02500, 2020

11. J Yang et al., “Visual Anomaly Detection for Images: A Survey”, arXiv 2109.13157, 2021

12. D Bogdoll et al., “Anomaly Detection in Autonomous Driving: A Survey”, arXiv 2204.07974, 2022

13. D Bogdoll et al., “Perception Datasets for Anomaly Detection in Autonomous Driving: A Survey”, arXiv 2302.02790, 2023

14. R Peng, et al., “A Survey of Deep Active Learning”, arXiv 2009.00236, 2020

15. D Feng et al., “A Review and Comparative Study on Probabilistic Object Detection in Autonomous Driving”, arXiv 2011.10671, 2020

16. M D Lange et al., “A continual learning survey: Defying forgetting in classification tasks”, arXiv 1909.08383, 2019

17. Menzel, T., Bagschik, G., Maurer, M. “Scenarios for development, test and validation of automated vehicles”. IEEE Intelligent Vehicles Symposium (IV), 2018.

18. W Ding et al., “A Survey on Safety-Critical Scenario Generation for Autonomous Driving — A Methodological Perspective”, arXiv 2202.02215, 2022

19. Z Zhang, “A Flexible New Technique for Camera Calibration”, IEEE Trans. on Pattern Analysis and Machine Intelligence, 2000, 22(11): 1330–1334.

20. J. Levinson and S. Thrun, “Automatic online calibration of cameras and lasers.” Robotics: Science and Systems, vol. 2, 2013.

21. G. Pandey, J. R. McBride, S. Savarese, and R. M. Eustice, “Automatic targetless extrinsic calibration of a 3d lidar and camera by maximizing mutual information.” AAAI, 2012.

22. T Qin, S Shen, “Online Temporal Calibration for Monocular Visual-Inertial Systems”, IEEE IROS, 2018

23. X Wang, L Xu, H Sun, et al., “On-road Vehicle Detection and Tracking Using MMW Radar and Monovision Fusion”, IEEE Trans. on Intelligent Transportation Systems, 2016, 17(7): 1–10.

24. H Caesar, V Bankiti, A H. Lang, et al., “nuScenes: A multimodal dataset for autonomous driving”, arXiv 1903.11027, 2019

25. M Gaillard, E Egyed-Zsigmond, “Large scale reverse image search: A method comparison for almost identical image retrieval”, INFORSID, 2017

26. C. A. Ghuge, et al., “Systematic analysis and review of video object retrieval techniques”, Control and Cybernetics, 49(4), 2020

27. M Sabokrou et al., “Adversarially learned one-class classifier for novelty detection”. IEEE CVPR, 2018

28. P Wu, J Liu, and F Shen. “A Deep One-Class Neural Network for Anomalous Event Detection in Complex Scenes”. IEEE Transactions on Neural Networks and Learning Systems, 2019.

29. F Kruber, J Wurst, M Botsch. “An Unsupervised Random Forest Clustering Technique for Automatic Traffic Scenario Categorization”. ITSC. 2018.

30. B Nachman and D Shih, “Anomaly Detection with Density Estimation”, arXiv 2001.04990, 2020

31. Y Liu, Y Wang, Y Wang, H Zhao, “VectorMapNet: End-to-end Vectorized HD Map Learning”, arXiv 2206.08920, 2022

32. J Huang, G Huang, Z Zhu, D Du, “BEVDet: High-Performance Multi-Camera 3D Object Detection in Bird-Eye-View”, arXiv 2112.11790, 2021

33. Q Li, Y Wang, Y Wang, H Zhao, “HDMapNet: An Online HD Map Construction and Evaluation Framework”, arXiv 2107.06307, 2021

34. A. H. Lang, S. Vora, H. Caesar, et al. “Pointpillars: Fast encoders for object detection from point clouds”. IEEE CVPR, 2019.

35. Y Li, Z Ge, G Yu, et al., “BEVDepth: Acquisition of Reliable Depth for Multi-view 3D Object Detection”, arXiv 2206.10092, 2022

36. B Liao, S Chen, X Wang, et al., “MapTR: Structured Modeling And Learning For Online Vectorized HD Map Construction”, arXiv 2208.14437, 2022

37. Y Zhang et al., “BEVerse: Unified Perception and Prediction in Birds-Eye-View for Vision-Centric Autonomous Driving”, arXiv 2205.09743, 2022

38. S Hu et al., “ST-P3: End-to-end Vision-based Autonomous Driving via Spatial-Temporal Feature Learning”, arXiv 2207.07601, 2022

39. B Jiang et al., “Perceive, Interact, Predict: Learning Dynamic and Static Clues for End-to-End Motion Prediction”, arXiv 2212.02181, 2022

40. J Philion, S Fidler, “Lift, Splat, Shoot: Encoding Images from Arbitrary Camera Rigs by Implicitly Unprojecting to 3D”, arXiv 2008.05711, 2020

41. K Chitta, A Prakash, A Geiger, “NEAT: Neural Attention Fields for End-to-End Autonomous Driving”, arXiv 2109.04456, 2021

