ASIL B vs ASIL D Operating System – What is the difference?


What is the difference between an operating system that is ASIL B Compliant vs ASIL D Compliant? What does an ASIL D Operating System additionally need to provide in terms of “features” compared to an ASIL B Operating System?

Let us set aside the process aspects of ASIL B vs ASIL D development and focus only on the technical aspects. To keep the focus on Safety, we discuss this in the context of RTOSs rather than HPC OSs.

Irrespective of the ASIL level that needs to be achieved by an Operating System, there are some basic aspects that an RTOS needs to provide such as:
  • High availability and reliability - Guaranteed and correct execution of Safety tasks
  • Maximum Performance - minimal latencies for interrupts, events, tasks, etc.
  • Guaranteed Isolation of Safety related processes and their memory
  • Guaranteed freedom from Interference (FFI) for Safety related tasks/threads
  • Safe and reliable inter-process/inter-task/inter-thread communication
  • Error handling for both the Application’s incorrect use of the OS and internal errors in the OS, with notification of the error so that the System can take the most appropriate safe state.
In a mixed criticality System, higher ASIL SW Components co-exist with lower ASIL SW Components. An RTOS typically achieves FFI by:
  1. Providing Memory partitioning such that the Memory (Code, Variables, Stack) of higher ASIL SW cannot be accessed by lower ASIL SW. The OS may use the underlying hardware in the CPU core, such as the MMU or MPU, to implement Process Address partitioning.
  2. Providing a priority-based pre-emptive scheduling policy (for example, rate monotonic) which enables Safety tasks to be executed with a higher priority, thereby limiting interference from lower priority non-safety tasks.
  3. Implementing temporal partitioning by setting budgets for the execution times of tasks/interrupts and not letting a task/interrupt run beyond its budget. Examples are the timing protection that AUTOSAR OS implements (referred to here as TPU) and the Adaptive partitioning scheduler (APS) in QNX OS.
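The memory partitioning rule in item 1 can be illustrated with a small model. This is only a sketch under simplifying assumptions: the `Region` table, the `Asil` enum and the `may_write` function are hypothetical names invented for illustration, and a real MPU or MMU enforces per-task region permissions in hardware rather than comparing ASIL levels in software.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Sequence


class Asil(IntEnum):
    """Integrity levels, ordered from lowest to highest."""
    QM = 0
    A = 1
    B = 2
    C = 3
    D = 4


@dataclass(frozen=True)
class Region:
    """A contiguous memory region owned by one partition (hypothetical model)."""
    name: str
    start: int    # first address of the region
    size: int     # region length in bytes
    owner: Asil   # integrity level of the software that owns the region

    def contains(self, address: int) -> bool:
        return self.start <= address < self.start + self.size


def may_write(task_level: Asil, address: int, regions: Sequence[Region]) -> bool:
    """Allow a write only if the address lies in a mapped region whose
    owner is not of a higher integrity level than the writing task."""
    for region in regions:
        if region.contains(address):
            return task_level >= region.owner
    return False  # unmapped addresses are rejected by default


# Example layout (addresses are made up for illustration).
REGIONS = [
    Region("safety_stack", 0x20000000, 0x1000, Asil.D),
    Region("qm_data", 0x20001000, 0x1000, Asil.QM),
]
```

With this layout, `may_write(Asil.QM, 0x20000010, REGIONS)` is rejected because a QM task targets memory owned by ASIL D software, while the same write from an ASIL D task is allowed.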
If the mechanisms implemented in the RTOS are not sufficient for the System to achieve FFI, or if a mechanism cannot be enabled due to other trade-offs, the System needs to implement a solution outside the OS to achieve the same. For example, ASIL-B Systems using an AUTOSAR OS typically do not use the TPU and instead monitor the health of their tasks using the Watchdog Manager in AUTOSAR.
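A monitor outside the OS of this kind follows a detect-and-react pattern. The sketch below models a simple alive supervision: each supervised task reports checkpoints, and at the end of every supervision cycle the monitor compares the count with what it expects. The class and function names are hypothetical and do not reproduce the AUTOSAR Watchdog Manager API.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class AliveSupervision:
    """Counts checkpoints reported by one supervised task (hypothetical model)."""
    expected: int        # checkpoints expected per supervision cycle
    tolerance: int = 0   # allowed deviation from the expected count
    count: int = 0       # checkpoints seen in the current cycle

    def checkpoint(self) -> None:
        """Called by the supervised task each time it runs."""
        self.count += 1

    def end_of_cycle(self) -> bool:
        """Return True if the task was healthy, then reset for the next cycle."""
        healthy = abs(self.count - self.expected) <= self.tolerance
        self.count = 0
        return healthy


def supervise(entities: Dict[str, AliveSupervision]) -> str:
    """Evaluate all supervised tasks; request the safe state if any one failed."""
    failed = [name for name, entity in entities.items() if not entity.end_of_cycle()]
    return "SAFE_STATE" if failed else "OK"
```

Note that the monitor only observes the symptom after the fact: a starved task is noticed at the end of the cycle, and the reaction is to leave nominal operation.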

If the typical ‘safety’ expectations of an ASIL OS are the same, irrespective of whether it is ASIL-B or ASIL-D, does it mean the exact same OS can be developed as either ASIL B or ASIL D by choosing the corresponding methods given in the standard? Or are there subtler differences between ASIL-B and ASIL-D that we are overlooking?
  1. ASIL-D Systems demand higher risk reduction than ASIL-B. This means that the OS must apply significantly higher rigor in mitigating its internal faults as well as faults arising from incorrect use of the OS. To achieve this, the OS needs to implement sufficient design & verification measures to guarantee the highest possible availability, reliability and determinism, which in turn leads to a much more robust OS design and implementation.
  2. ASIL-D Systems are typically ‘fail-operational’. This means that redundancy is built into the System such that, even if the primary system fails, a secondary system takes over, or the System can still provide partial functionality to keep running until the car reaches a safe stop. Hence, in the event of a fault, the OS cannot become unavailable and must continue to provide the infrastructure that keeps the System running. In contrast, ASIL-B Systems are typically ‘fail-safe’. This means that if a fault occurs, the System moves to a defined Safe state, in which the nominal functionality of the OS may not be available. Note: The ISO 26262 standard does not state any correlation between fail-operational, fail-safe and ASIL level. We are stating what we have commonly observed in our experience with different Systems, and there may be exceptions.
Taken together, fail-operational behavior and the highest possible availability and reliability mean that an ASIL-D OS must ‘prevent’ faults from compromising its nominal functioning, rather than detecting them and taking the System to a safe state. In other words, the ratio of ‘prevent’ measures to ‘detect and react’ measures should be significantly higher for ASIL D than for ASIL B.

To give an example, it is usually sufficient for an ASIL B OS (depending on the needs of its System) to detect that a Safety task is being interfered with or starved and then move to a safe state. For an ASIL D OS, detection alone is typically not sufficient; the OS must prevent the problem from happening in the first place. To state this in an over-simplified way, the OS kicks out the culprit task that has exceeded its budget and is delaying the Safety task, and lets the Safety task run as per its expected periodicity requirements.
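The ‘kick out the culprit’ behavior can be sketched as a tick-based simulation. Everything here is an illustrative assumption: the `Task` fields, the `pick` and `run_window` helpers and the numbers in the scenario are hypothetical, and the model ignores details such as budget replenishment, interrupts and how a real OS reacts to a budget violation.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Task:
    name: str
    priority: int         # larger number = higher priority
    budget: int           # ticks the task may consume per activation
    demand: int           # ticks the task actually tries to consume
    used: int = 0         # ticks consumed so far
    killed: bool = False  # set when the task is terminated for overrunning

    @property
    def ready(self) -> bool:
        return not self.killed and self.used < self.demand


def pick(tasks: List[Task], enforce_budget: bool) -> Optional[Task]:
    """Return the highest priority ready task, terminating budget offenders."""
    while True:
        ready = [t for t in tasks if t.ready]
        if not ready:
            return None
        current = max(ready, key=lambda t: t.priority)
        if enforce_budget and current.used >= current.budget:
            current.killed = True  # budget exhausted but work remains
            continue
        return current


def run_window(tasks: List[Task], ticks: int, enforce_budget: bool) -> Dict[str, int]:
    """Simulate one window; map each completed task to its finish tick."""
    finished: Dict[str, int] = {}
    for tick in range(1, ticks + 1):
        current = pick(tasks, enforce_budget)
        if current is None:
            break
        current.used += 1
        if current.used == current.demand:
            finished[current.name] = tick
    return finished


def scenario() -> List[Task]:
    """A runaway higher priority task delays a Safety task."""
    return [
        Task("driver", priority=3, budget=2, demand=10),
        Task("safety", priority=2, budget=3, demand=3),
    ]
```

In this scenario, `run_window(scenario(), 8, enforce_budget=True)` terminates the runaway task after its 2-tick budget and the Safety task completes at tick 5. With `enforce_budget=False`, the runaway task occupies all 8 ticks and the Safety task never runs, which a detect-only approach could only report afterwards.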

This is one of the reasons why solutions like TPU or APS are used a lot in ASIL-D Systems, while we have not seen them used in ASIL-B Systems, though no rule prohibits these mechanisms for ASIL-B. The Integrator of the OS also needs to weigh the ‘Safety benefits’ of a ‘highly prevent’ approach for an ASIL B System against its trade-offs. For example, will a ‘highly prevent’ approach incur a much higher performance cost, and will that affect the reliability of the System? If so, there is no point in integrating such a solution, because a safe but unreliable System is ultimately of no use to the end customer.

Conclusion

To summarize, compared to an RTOS developed for up to ASIL-B, an RTOS developed for ASIL-D Systems needs to guarantee the highest possible availability and reliability together with spatial and temporal partitioning. ‘Highest possible’ in this context means that the ASIL-D OS should completely prevent Systematic Software faults [arising from within the OS and from the Application’s use of the OS] from affecting the availability and reliability of the System. The responsibility for this lies with both the RTOS developer and the Integrator.

However, nothing stops an RTOS developer from developing the same set of requirements and providing the same source code for ASIL-B and ASIL-D, differentiating only on process aspects. Ultimately, the question is whether the OS meets the needs of the System that is using it.