The first part of the blog explained what the standards say about machine learning in safety critical systems. In this part I will share my thoughts on some of the problems with using traditional approaches to ensuring software integrity when applying machine learning in safety critical systems. Traditional approaches include creating a detailed specification, exhaustive unit testing to achieve structural coverage of the source code, and comprehensive functional testing to cover the specification.
The first set of problems with machine learning methods has its origins in the fact that they are based on data and learning. It seems self-evident that a machine learning system can only perform as well as the data we have available. Ensuring that the data is complete, unbiased and has not been intentionally tampered with are a few of the challenges we face when applying machine learning. A related challenge is that we neither have a specification nor understand why we get a particular output for a given input. Machine learning learns from data, and in a way it is a benefit that a specification is not needed, as machine learning works really well in cases where we are unable to write a detailed specification. This, however, goes against decades of best practice in safety engineering. The learning approach also obscures the implementation of the system and our understanding of why we get a certain result. Again, this is a problem for traditional safety engineering, where fully understanding the system is an important criterion for determining its safety. Research into explainability for machine learning is still in its early stages.
A second set of problems with machine learning in safety critical systems is related to robustness and scalability. The feasibility of the traditional methods of exhaustive structural testing and comprehensive functional testing rests on safety critical systems having a limited input space. For machine learning systems, the input space is usually extremely large, making any sort of exhaustive testing essentially impossible. This scalability problem is already a challenge for complex deterministic control systems, but for machine learning systems the problem is an order of magnitude worse. Related to the scalability problem is the robustness problem. We have many examples where, given an input x, we get a completely different output from the machine learning system if we perturb x with some small perturbation d. This has implications for both the safety and the security of the machine learning system, as the sketch below illustrates.
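To make the robustness point concrete, here is a minimal sketch, not taken from any real system, using a toy linear classifier in numpy. The weights, the input x and the perturbation budget eps are made-up values for illustration; the point is that in a high-dimensional input space a perturbation that is tiny in every coordinate can still flip the output.

```python
# Toy illustration of the robustness problem (all values are hypothetical).
import numpy as np

rng = np.random.default_rng(0)

# A "trained" linear classifier: predict class 1 if w.x + b > 0, else class 0.
dim = 1000
w = rng.normal(size=dim)
b = 0.0

# An input that sits reasonably close to the decision boundary.
x = rng.normal(size=dim) * 0.01
score = w @ x + b

# FGSM-style perturbation: move every coordinate by at most eps in the
# direction that pushes the score the other way. For a linear model the
# gradient direction is simply sign(w).
eps = 0.01
d = -np.sign(score) * eps * np.sign(w)
x_adv = x + d

print("original score: ", w @ x + b)      # small in magnitude
print("perturbed score:", w @ x_adv + b)  # opposite sign: the class flips
print("max change per input coordinate:", np.max(np.abs(d)))  # only 0.01
```

The perturbation is bounded by 0.01 in every coordinate, yet because its effect accumulates across a thousand dimensions the classification flips. Real adversarial examples against deep networks work on the same principle, which is why testing a handful of nominal inputs says little about behaviour in the neighbourhood of those inputs.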
Based on the above, it is clear that without extension, traditional safety engineering methods will not provide a path forward for using machine learning in safety critical systems. There are a few uncontroversial ways to use machine learning in safety critical systems, of which probably the most well-known is the safety bag approach. In this technique an independent external monitor checks the results of the machine learning component. If the machine learning component takes an illegal action or exceeds some pre-defined safety limits, the monitor component intervenes. With this approach the monitor component is the ultimate guarantee of safety, and the requirements on the machine learning component can be relaxed. The downside is that you lose most of the flexibility that comes with a machine learning approach, and therefore this approach is feasible in only a few cases.
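As a rough sketch of what a safety bag can look like in code, consider the following. The component names, limits and the stand-in ML controller are hypothetical; the essential point is that the monitor is simple enough to be specified and verified with traditional methods, while the ML output is treated as untrusted advice.

```python
# Sketch of the safety bag idea (names and limits are hypothetical).
from dataclasses import dataclass

@dataclass
class SafetyLimits:
    max_speed: float = 2.0   # m/s, assumed absolute limit
    max_accel: float = 0.5   # m/s, maximum change per control step

class SafetyMonitor:
    """Independent monitor: small and simple enough to verify conventionally."""

    def __init__(self, limits: SafetyLimits):
        self.limits = limits
        self.last_speed = 0.0

    def check(self, commanded_speed: float) -> float:
        """Return a safe command, overriding the ML output if necessary."""
        # Clamp to the absolute speed limit.
        safe = max(-self.limits.max_speed, min(self.limits.max_speed, commanded_speed))
        # Clamp the change per step to the acceleration limit.
        delta = safe - self.last_speed
        if abs(delta) > self.limits.max_accel:
            safe = self.last_speed + self.limits.max_accel * (1 if delta > 0 else -1)
        self.last_speed = safe
        return safe

def ml_controller(sensor_input: float) -> float:
    """Stand-in for the learned component; its output is not trusted."""
    return 10.0 * sensor_input  # may propose an unsafe command

monitor = SafetyMonitor(SafetyLimits())
for reading in [0.1, 0.5, 1.0]:
    proposed = ml_controller(reading)
    applied = monitor.check(proposed)
    print(f"ML proposed {proposed:.2f} m/s, monitor applied {applied:.2f} m/s")
```

The safety argument rests entirely on the monitor and the correctness of its limits, which is exactly why the flexibility of the learned component is lost: anything the monitor cannot express as a simple limit cannot be permitted.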
Currently, applying machine learning in a safety critical system is risky, and it takes an in-depth understanding of both the problem being solved and machine learning technology. In limited contexts there have been successes, and there is a big push to go further. This is a very hot topic for safety researchers, and I am encouraged by initiatives such as the AI Safety Landscape.
Huld is and will remain committed to discovering the limits of machine learning, in a safe way.
Timo Latvala