New Book “Autonomous Driving System Development” in Chinese will be published soon

Yu Huang
Apr 26, 2024

A book written in Chinese by Professor James Yang and me, “Autonomous Driving System Development”, will be published by Tsinghua University Press (Beijing) after the May 1st Labor Day holiday.

Professor Keqiang Li of the School of Automotive Engineering at Tsinghua University, an academician of the Chinese Academy of Engineering, wrote the preface for us.

The book has been kindly recommended by Dr. Andrew Chi-Chih Yao, the only Chinese Turing Award winner and an academician of both the Chinese Academy of Sciences and the U.S. National Academy of Sciences; Professor Xiaohong Guan of Xi’an Jiaotong University, an academician of the Chinese Academy of Sciences; Professor Qiang Yang, a fellow of both the Royal Society of Canada and the Canadian Academy of Engineering; as well as Mr. Xiang Li, Chairman of Li Auto; Dr. Kai Yu, founder of Horizon Robotics; and Mr. Frank (Sanchu) Han, CEO of Cariad China (EVP of Volkswagen Group China).

Several leading figures in the autonomous driving field also strongly recommend this book: Dr. Tony Han, CEO of Weride.AI; Dr. Lei Xu, CEO of Nullmax Inc.; Dr. Yuanqiang (Evan) Dong, Senior Engineering Director of Autonomous Driving at Xpeng Motors; Dr. Qi Luo, Engineering Director of Autonomous Driving at Nvidia; Dr. Guang Chen, CTO of AI at China FAW Motors; and Dr. Xuewen Chen, Chief Scientist of Xlabs at China GAC Motors.

The book comprises 15 chapters, ranging from basic theory to software and hardware architecture, the core modules of autonomous driving such as perception, mapping, localization, prediction, planning and control, as well as simulation, safety, and Vehicle-to-X (vehicles, infrastructure and vulnerable road users), and finally the latest neural rendering theory and generative diffusion models. It is well suited to graduate students’ thesis work and to R&D engineers solving engineering problems, serving as both a reference book and experimental teaching material.

Some content samples extracted from several chapters are translated into English below:

Foreword of the Book

There are basically two technical routes for the development of autonomous driving. One is the leapfrog style, that is, direct development of L4-level autonomous driving systems, led by high-tech companies such as Google and Baidu. They did not have a deep understanding of the car industry and instead applied robotics research and development methods to autonomous vehicles. Note: it is said that Google also wanted to start from the L3 level, but experiments found that test drivers came to trust the computer system after a period of time and lost the patience needed to keep monitoring it.

This technical route largely disregards cost. It relies on high-definition maps and high-precision inertial navigation for positioning, yet it is difficult to operate in places without maps (it does not interface with navigation maps; high-definition maps are used directly for planning), and for large navigation areas high-definition maps are hard to operate (the computational complexity of HD-map downloading and global route planning was underestimated). It installs lidar, the best and most expensive sensor (visual deep learning technology was still relatively rudimentary at the time), and uses the most powerful computing platforms (since the vehicles were not mass produced, some L4 companies simply used industrial computers and did not need to consider the engineering difficulty of migrating to an embedded platform).

Although robotaxi deployment on this route has not yet landed in any scenario, in the tested highway and urban street scenarios it does provide the strongest planning and decision-making capabilities in complex traffic environments, built on relatively reliable perception. Most of the data-driven planning and decision-making algorithms developed so far come from L4 companies, and several of them have also held competitions in this area (based on the open-source datasets they provide). L4 companies are also far ahead in the construction of simulation platforms (Tesla basically has L4-level development and deployment capabilities as well), including data replay (logsim) and single-point test visualization. In terms of high-definition maps, L4 technology is also relatively mature, and the sensor data collected by L4 companies, including lidar and vehicle positioning trajectories, is of higher quality than that of ordinary L2 companies.

The other route is progressive development, that is, starting with L2-level advanced driver-assistance systems (ADAS) and gradually adapting to more complex traffic environments: moving from highways and elevated roads to scenes with gates and toll stations, and then entering urban streets and narrow local roads, with the system level gradually evolving to L2+, L3, L3+ and L4. Note: recently a new trend has appeared, in which L4 development companies cooperate with L2-level OEMs and Tier-1 suppliers to jointly develop L2+ mass-production vehicles.

The incremental route is generally taken by OEMs and Tier-1 suppliers, who first consider cost, vehicle-grade specifications and the operational design domain (ODD) defined for mass-production users. In the early days Mobileye was the main supplier, and later the development models of Tesla and Nvidia became mainstream. For cost reasons, most methods on this route use cameras as the main sensor, supplemented by millimeter-wave radar, which has long been accepted by automakers. The cutting edge of this route (such as Tesla) has outstanding visual perception capabilities; Tesla even removed radar from its perception module because of the large number of false alarms it produced during fusion. In the past, ultrasonic sensors were used for automatic parking; they are now gradually being combined with fisheye cameras to provide parking assistance, automatic parking and even valet parking, and Tesla also uses ultrasound for perception in crowded traffic scenes.

Perhaps under cost pressure, the progressive route generally follows a “heavy on perception, light on high-definition maps” style. Tesla has even achieved end-to-end integration of perception, online mapping and positioning within its BEV network model. As an industry leader, Tesla has steadily improved its data-driven development tool chain and implemented a data closed loop covering data screening, data annotation, simulation, model iteration, scenario testing and evaluation, and model deployment. Driven by the long-tail problem of autonomous driving and the uncertainty of AI models, Google has a similar framework, but Tesla has taken it to the extreme and achieved a virtuous cycle on mass-produced customer vehicles; it has also launched FSD, its version aimed at L4-level autonomous driving.

There are basically two development stages of autonomous driving: 1.0 and 2.0. In the autonomous driving 1.0 era, a variety of sensors form the perception inputs, such as lidar, vision cameras, radar, inertial navigation (IMU), wheel odometry, and GPS/differential GPS. Each sensor differs in its perception capabilities, so the multi-modal sensor fusion architecture of this era adopts a post-fusion (late-fusion) strategy, filtering the results that each sensor produces on its own tasks to achieve complementarity or redundancy. There are two routes in this respect. One relies on lidar plus high-definition maps; it is expensive and is mainly used by L4 companies such as robotaxi operators. The other is based on vision with light use of high-definition maps; it is low-cost and is mostly the mass-production approach of L2/L2+ autonomous driving companies. Both routes involve heavy traditional post-processing (especially for vision), and much of the debugging work and many of the problems stem from it. In addition, most planning and decision-making at this stage is rule-based, with essentially no data-driven models, as in open-source Autoware and Baidu Apollo. Since L4 companies operate in fixed areas covered by high-definition maps and their sensor inputs are relatively accurate, they have already explored training planning and decision-making models from data. Relatively speaking, L2/L2+ companies have not yet established data-driven planning and decision-making models; their module development mostly relies on optimization-theory solutions, generally starting from highway scenarios and upgrading to Tesla’s “on-ramp to off-ramp” mode, and rarely supports complex urban autonomous driving scenarios (such as roundabouts and unprotected left turns).
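To make the post-fusion (late-fusion) idea above concrete, here is a minimal sketch in Python: each sensor pipeline emits its own object list, and a simple association step merges matched detections by inverse-variance weighting while keeping unmatched ones for redundancy. The class names, noise values and the greedy nearest-neighbour matching are illustrative assumptions, not the design of any particular production stack.

```python
"""Minimal late-fusion sketch: fuse per-sensor object lists at the result level."""

from dataclasses import dataclass
import numpy as np

@dataclass
class Detection:
    position: np.ndarray   # (x, y) in the ego frame, meters
    variance: float        # isotropic position variance of this sensor, m^2
    sensor: str            # "camera", "lidar", "radar", ...

def fuse_pair(a: Detection, b: Detection) -> Detection:
    """Inverse-variance weighted average of two position estimates."""
    wa, wb = 1.0 / a.variance, 1.0 / b.variance
    pos = (wa * a.position + wb * b.position) / (wa + wb)
    return Detection(pos, 1.0 / (wa + wb), sensor=f"{a.sensor}+{b.sensor}")

def late_fusion(cam: list[Detection], lidar: list[Detection],
                gate: float = 2.0) -> list[Detection]:
    """Greedy nearest-neighbour association within a distance gate;
    unmatched detections are kept as-is (complementarity/redundancy)."""
    fused, used = [], set()
    for c in cam:
        dists = [np.linalg.norm(c.position - l.position) for l in lidar]
        if dists:
            j = int(np.argmin(dists))
            if dists[j] < gate and j not in used:
                fused.append(fuse_pair(c, lidar[j]))
                used.add(j)
                continue
        fused.append(c)  # camera-only object
    fused += [l for j, l in enumerate(lidar) if j not in used]  # lidar-only objects
    return fused

if __name__ == "__main__":
    cam = [Detection(np.array([10.2, 3.1]), 0.8, "camera")]
    lid = [Detection(np.array([10.0, 3.0]), 0.1, "lidar"),
           Detection(np.array([25.0, -1.0]), 0.1, "lidar")]
    for d in late_fusion(cam, lid):
        print(d.sensor, d.position.round(2), round(d.variance, 3))
```

The point of the sketch is that fusion happens only after each sensor has produced its own finished result, which is exactly why so much traditional post-processing and debugging accumulates at this stage.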

The autonomous driving 2.0 era is marked by the data-driven approach, and at the same time the perception framework of the 1.0 era has been greatly improved. The data-driven development model tends toward end-to-end model design and training. For planning and decision-making, a large amount of driving data is needed to learn the behavior of experienced human drivers, including imitation learning via behavioral cloning and model-based reinforcement learning (M-RL) that estimates the joint behavior-policy distribution, instead of relying on solving optimization problems under various constraints. Trajectory prediction is an important prerequisite, requiring good modeling of agent interaction behavior and analysis of the impact of the uncertainties involved. For perception, the 2.0 era needs machine learning models to replace the traditional vision or signal-processing (filtering) parts, truly realizing a development model of “collect data to solve problems”. For example, Tesla’s recent BEV and occupancy networks produce the required information directly through deep learning models, rather than post-processing model outputs with traditional vision and fusion theory. Sensor fusion has likewise been upgraded from post-fusion to feature-level fusion inside the model, and even to data-level fusion (when there is sufficient prior knowledge for synchronization and calibration). The Transformer network plays an important role in this perception framework and also places higher demands on the computing platform. Driven by the needs of this data-driven autonomous driving platform, the design idea of large models has also been introduced: with large amounts of data acquired, aided by efficient data screening, automatic annotation and simulation, it is necessary to maintain a large teacher model on the server to support the training and iterative upgrading of the small student models deployed on the vehicle.
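As a concrete illustration of the imitation-learning idea above, here is a minimal behavioral-cloning sketch in PyTorch: a small network regresses a future ego trajectory from a scene feature vector and is trained against logged human driving. The feature dimension, horizon, network size and the toy data are all illustrative assumptions, not any production design.

```python
"""Minimal behavioral-cloning sketch: regress expert trajectories from scene features."""

import torch
import torch.nn as nn

class BCPlanner(nn.Module):
    def __init__(self, feat_dim: int = 256, horizon: int = 12):
        super().__init__()
        self.horizon = horizon
        self.net = nn.Sequential(
            nn.Linear(feat_dim, 512), nn.ReLU(),
            nn.Linear(512, 512), nn.ReLU(),
            nn.Linear(512, horizon * 2),   # one (x, y) waypoint per future step
        )

    def forward(self, scene_feat: torch.Tensor) -> torch.Tensor:
        # scene_feat: (batch, feat_dim) -> trajectory: (batch, horizon, 2)
        return self.net(scene_feat).view(-1, self.horizon, 2)

def train_step(model, optimizer, scene_feat, expert_traj):
    """One behavioral-cloning step: regress the recorded expert trajectory."""
    pred = model(scene_feat)
    loss = nn.functional.smooth_l1_loss(pred, expert_traj)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

if __name__ == "__main__":
    model = BCPlanner()
    opt = torch.optim.Adam(model.parameters(), lr=1e-4)
    # toy batch standing in for features and trajectories mined from driving logs
    feats = torch.randn(32, 256)
    expert = torch.randn(32, 12, 2)
    for _ in range(3):
        print(train_step(model, opt, feats, expert))
```

In practice the scene feature would come from a BEV or occupancy backbone and the loss would be richer (multi-modal trajectories, safety costs), but the core loop of learning driving behavior directly from data is the same.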
