In this blog, we present a simple yet comprehensive approach to writing a high-quality Technical Safety Requirement (TSR) document, one that considers all the safety-relevant aspects of the System and sets the right direction for the Software and Hardware teams.
To begin with, it is important to understand the difference between an FSR and a TSR. Functional Safety Requirements (FSR) describe the WHAT, i.e., what must be done to achieve the Safety Goals. The TSR describes the HOW, i.e., how the Safety requirements should be achieved. It describes the technical realization of the Functional Safety Requirements of the project and is the starting point for SW and HW Safety.
For a specific item, there are eight topics that the TSR should cover:
1. Intended Functionality
2. Fault Handling
3. Graceful degradation and Safe State
4. Freedom from Interference and Independence
5. HW Metrics
6. Special Cases
7. Production and Service
8. Fault Injection Testing
1. Intended Functionality - What are the Functional Requirements that must work as expected?
These are the requirements, provided directly by or agreed with the OEM, that define the Intended functionality related to a Safety goal.
For example, if a Safety goal is that "Brake telltale should be activated as requested", then the Intended functionality requirements describe when the telltale should be turned ON or OFF depending on the Vehicle Network, Vehicle State and/or HW Input conditions.
Intended functionality requirements must specify the following aspects:
- What are the Input conditions relevant to the Safety goal? Consider the System boundary while specifying the Input conditions. Consider different System inputs that may contribute to different behavior, e.g., voltage level, startup or run time, etc.
- Under the specified input conditions, what is the Expected output for the Safety goal? What is the required activation/response time from Input until Output?
- What are the Confirmation requirements for the inputs? For example, does an input need de-bouncing?
- What is the required ASIL for the path of the Intended functionality?
The TSR document must contain the Intended functionality requirements for all the Safety goals.
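To make the input confirmation aspect concrete, here is a minimal sketch of a de-bounced input in Python. The class name, the `confirm_samples` parameter and the sample values are assumptions made for illustration only; the real values come from the requirements agreed with the OEM.

```python
from dataclasses import dataclass, field


@dataclass
class DebouncedInput:
    """Confirms a raw boolean input only after N consecutive differing samples."""
    confirm_samples: int                            # hypothetical value from the TSR
    confirmed: bool = False                         # last confirmed input state
    _count: int = field(default=0, init=False)      # consecutive samples that disagree

    def update(self, raw: bool) -> bool:
        if raw == self.confirmed:
            # Raw input agrees with the confirmed state: drop any pending change.
            self._count = 0
        else:
            self._count += 1
            if self._count >= self.confirm_samples:
                # The new state was stable long enough: confirm it.
                self.confirmed = raw
                self._count = 0
        return self.confirmed


brake_input = DebouncedInput(confirm_samples=3)
history = [brake_input.update(s) for s in [True, True, False, True, True, True]]
```

Note that the confirmation time (here, three samples multiplied by the sampling period) is part of the activation/response time budget, so both values should be specified together.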
2. Fault Handling - What are the Safety mechanisms that must be implemented to detect faults?
These are the requirements for the Safety mechanisms that have to be implemented to detect the failures in the forward functional path that achieves the Safety goal.
For the brake telltale Safety goal mentioned above, the Safety mechanism requirements describe the mechanism by which we detect whether the telltale is actually ON or OFF.
Fault Handling requirements must specify the following aspects:
- How is the fault in the expected output for the Safety goal detected?
- How is the specified fault mitigated/prevented? How is the fault confirmed? Here, we consider aspects such as de-bouncing of the fault, redundant cross-check of the fault against another input, etc.
- How are common cause faults detected?
- How are faults in the Input conditions of the System detected?
- If there are multiple faults simultaneously and related Safety mechanisms running, what is the arbitration strategy?
The TSR document must contain the fault handling requirements for all the Safety goals.
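As an illustration of fault detection and fault confirmation, the sketch below compares the commanded telltale state with a feedback (readback) signal and confirms a fault only after a number of consecutive mismatches. The class, the `confirm_cycles` parameter and the immediate healing policy are assumptions, not something prescribed by the requirements above.

```python
from dataclasses import dataclass, field


@dataclass
class ReadbackMonitor:
    """Compares the commanded output with its feedback and de-bounces mismatches."""
    confirm_cycles: int                                      # hypothetical fault de-bounce count
    _mismatch_cycles: int = field(default=0, init=False)

    def step(self, commanded_on: bool, feedback_on: bool) -> bool:
        """Returns True once a mismatch has been confirmed as a fault."""
        if commanded_on != feedback_on:
            # Saturate the counter so it cannot grow without bound.
            self._mismatch_cycles = min(self._mismatch_cycles + 1, self.confirm_cycles)
        else:
            # Assumed policy: a single matching sample heals the fault.
            self._mismatch_cycles = 0
        return self._mismatch_cycles >= self.confirm_cycles


monitor = ReadbackMonitor(confirm_cycles=2)
first = monitor.step(commanded_on=True, feedback_on=False)    # mismatch seen, not yet confirmed
second = monitor.step(commanded_on=True, feedback_on=False)   # mismatch confirmed as a fault
```

Whether a fault heals immediately, heals after a de-bounce, or stays latched until the next power cycle is exactly the kind of decision the Fault Handling requirements should state explicitly.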
3. Graceful degradation and Safe State - What is the Safe state if the Safety mechanisms detect a fault?
These are the requirements for the Safe state/Degraded state for the Safety goal and related timings.
In the case of our example, the Safe State could be to indicate an Alternate telltale, a display message, a chime, etc. It is a requirement that is given by/agreed with the OEM.
Safe State requirements must specify the following aspects:
- For faults in the expected output, e.g., if a telltale is not turned ON as expected, what is the Safe state? Does the System enter a degraded state first before triggering the Safe State?
- For faults in the expected output, what is the fault detection time interval? Or how many samples of the confirmed fault should be checked before triggering the Safe state?
- For faults in the System input conditions, what is the Safe state? Does the System enter a degraded state first before triggering the Safe State?
- For faults in the System input conditions, what is the associated fault detection time interval?
- What are the safe state(s) for common cause faults?
- What is the fault detection time interval for every common cause fault?
- What is the Safe state for latent faults?
- What is the multi-point fault detection time interval (MPF-DTI) for latent faults?
- If there are multiple faults simultaneously with different safe states, what must be the final state?
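Two of the questions above lend themselves to a small worked sketch: how many monitoring samples fit into a fault detection time interval, and which state wins when several faults request different Safe states. The state names, their priority order and the timing values below are hypothetical; the actual arbitration rule has to be agreed with the OEM.

```python
from enum import IntEnum


class SafeState(IntEnum):
    """Hypothetical states; a higher value is assumed to have higher priority."""
    NONE = 0
    DEGRADED = 1
    ALTERNATE_TELLTALE = 2
    DISPLAY_MESSAGE_AND_CHIME = 3


def max_confirm_samples(fdti_ms: int, cycle_ms: int) -> int:
    """Largest number of monitoring cycles that fits in the detection interval."""
    if cycle_ms <= 0 or fdti_ms < cycle_ms:
        raise ValueError("detection interval must cover at least one monitoring cycle")
    return fdti_ms // cycle_ms


def final_state(requested_states) -> SafeState:
    """Assumed arbitration rule: the highest-priority requested state wins."""
    return max(requested_states, default=SafeState.NONE)


samples = max_confirm_samples(fdti_ms=100, cycle_ms=10)
state = final_state([SafeState.DEGRADED, SafeState.ALTERNATE_TELLTALE])
```

A simple priority order is only one possible strategy. If two Safe states are not strictly ordered, the TSR should list the combinations and the resulting final state explicitly.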
4. Freedom from Interference and Independence
These are the requirements for achieving Freedom from Interference (FFI) from QM/Lower ASIL parts of the System and Independence for ASIL parts of the System.
Typical examples include the use of memory protection, timing protection, peripheral protection, SW watchdogs that perform logical and temporal monitoring, and E2E protection.
FFI and Independence requirements must specify the following aspects:
- How can the System achieve FFI in Timing and Execution?
- How can the System achieve FFI in Communication? Here, all aspects of communication must be considered, i.e., communication with external ECUs, other complex HW devices that communicate with the MCU, communication between different MCUs within the System, communication within an MCU between different layers of SW, etc.
- How can the System achieve FFI in Memory? Consider different types of memories in which Safety relevant data will be stored and accessed.
- How can the System achieve Independence?
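For FFI in Communication, E2E protection typically combines a checksum with a sequence (alive) counter. The sketch below illustrates the idea only: the frame layout, the 4-bit counter and the generic CRC-8 polynomial 0x07 are assumptions chosen for brevity and do not reproduce any specific AUTOSAR E2E profile.

```python
def crc8(data: bytes, poly: int = 0x07) -> int:
    """Bitwise CRC-8 with initial value 0 (illustrative parameters)."""
    crc = 0
    for byte in data:
        crc ^= byte
        for _ in range(8):
            if crc & 0x80:
                crc = ((crc << 1) ^ poly) & 0xFF
            else:
                crc = (crc << 1) & 0xFF
    return crc


def protect(payload: bytes, counter: int) -> bytes:
    """Builds a frame: [4-bit alive counter][payload][CRC-8 over counter and payload]."""
    body = bytes([counter & 0x0F]) + payload
    return body + bytes([crc8(body)])


def check(frame: bytes, last_counter: int) -> str:
    """Receiver-side check; returns a status string instead of raising."""
    body, received_crc = frame[:-1], frame[-1]
    if crc8(body) != received_crc:
        return "crc_error"          # corrupted data
    counter = body[0] & 0x0F
    if counter == last_counter:
        return "repeated"           # sender stuck or frame replayed
    if counter != (last_counter + 1) % 16:
        return "wrong_sequence"     # at least one frame was lost
    return "ok"


frame = protect(b"\x01", counter=5)
status = check(frame, last_counter=4)
```

A timeout on the receiver side (temporal monitoring) would usually complement these checks, since a counter alone cannot detect that frames have stopped arriving.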
5. HW Metrics
These are the requirements needed to achieve the HW Metrics that are required for the given ASIL or by the OEM. Note that these are in addition to the earlier defined requirements for fault handling.
The first question to be asked is "What are the HW Metrics for SPFM, LPFM and FIT that must be achieved?" With the current fault handling requirements, which HW Metrics are achieved? Which requirements must be implemented over and above the existing ones to reach the required HW Metrics?
Depending upon which faults contribute to a high FIT as per FMEDA, the Safety mechanisms have to be chosen. Consider the Safety mechanisms provided by the micro-controller vendor such as self-tests, startup/shutdown tests for CPU, Power, Memory, etc. to detect latent faults.
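The gap analysis described above can be sketched with a few lines of Python. The formulas follow the usual definitions of the single-point fault metric and the latent fault metric (LPFM in this blog); the FMEDA rows and their FIT values are invented for illustration, and the target table reflects commonly cited ISO 26262-5 values that should be confirmed against the standard and the OEM requirements.

```python
from dataclasses import dataclass


@dataclass
class FmedaRow:
    name: str
    fit_total: float    # failure rate of the safety-related element, in FIT
    fit_spf_rf: float   # single-point plus residual fault part, in FIT
    fit_latent: float   # latent multiple-point fault part, in FIT


# Assumed targets as (SPFM, latent fault metric); verify before use.
TARGETS = {"B": (0.90, 0.60), "C": (0.97, 0.80), "D": (0.99, 0.90)}


def spfm(rows) -> float:
    total = sum(r.fit_total for r in rows)
    return 1 - sum(r.fit_spf_rf for r in rows) / total


def latent_fault_metric(rows) -> float:
    total = sum(r.fit_total for r in rows)
    spf_rf = sum(r.fit_spf_rf for r in rows)
    return 1 - sum(r.fit_latent for r in rows) / (total - spf_rf)


def meets_targets(rows, asil: str) -> bool:
    spfm_target, latent_target = TARGETS[asil]
    return spfm(rows) >= spfm_target and latent_fault_metric(rows) >= latent_target


# Hypothetical FMEDA summary for two elements.
rows = [
    FmedaRow("MCU", fit_total=100.0, fit_spf_rf=0.5, fit_latent=5.0),
    FmedaRow("Telltale driver", fit_total=100.0, fit_spf_rf=0.5, fit_latent=5.0),
]
ok_for_asil_d = meets_targets(rows, "D")
```

If a target is missed, the rows with the largest residual or latent contribution point to where additional Safety mechanisms bring the most benefit.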
6. Special Cases
This is a topic that deserves a place in the TSR but is quite often left out. There are usually some system scenarios defined by the OEM in which a functional requirement takes priority over a Safety requirement, resulting in the Safety output state being overridden. Such use cases must be clearly identified and the required course of action discussed with the OEM. Depending on the criticality of the use case, if functionality still has to be prioritized, the consequence of compromising Safety must be agreed with the OEM and stated as requirements in the TSR.
Consider the following areas for identification of these use cases:
- Diagnostics/Special Test Modes in the manufacturing line
- Power failure (Overvoltage, Undervoltage)
- Higher priority output overwriting the Safety output, e.g., a higher priority QM/ASIL Warning output in HMI-based Systems
- HMI requirements, such as animations that run while safety icons/warnings are expected to be shown on the display
- Special system modes and/or wake up scenarios
7. Production and Service
Even if the Hardware and Software are developed in compliance with the ASIL and all the necessary mechanisms are put in place, things can still fail during the production process, for example, during HW assembly or programming of the SW. Hence, requirements must be in place to ensure that no failures are introduced during the production process and that the finally produced item is Safety compliant. The requirements must be defined taking into account every step in the production process, identifying what could potentially go wrong there and adding safety mechanisms where none exist already. There shall also be fault injection tests in place to purposely introduce faults and check whether the production process detects and rejects the faulty item.
If there are Safety failures during actual operation in the vehicle that cannot be corrected, and the item has to be taken to the service station for diagnosis, there shall be requirements in place that help the Service Engineer read out the details of the fault to ease diagnosis. There shall also be requirements to ensure that the personnel performing service or decommissioning are not harmed in the process.
8. Fault Injection Testing
While defining requirements, it is extremely important to also think about how to test them. Fault Injection tests are crucial for verifying the Safety mechanisms defined in the TSR and must be performed during every integration of Software. It is a myth that fault injection tests need not be repeated if no changes have been made to the Safety SW or HW. Even if the Safety-related elements have not changed, the rest of the system has changed and may introduce a fault.
Hence, the TSR has the responsibility to define requirements that help and enable teams to execute fault injection tests more frequently, and potentially to automate the whole testing process. A commonly used approach is to introduce diagnostic identifiers that trigger the different Safety mechanisms.
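The diagnostic identifier approach can be sketched as an automated test. Everything below is a toy model written for illustration: the `TelltaleEcuModel` class, the identifier value `0xF1A0` and the latching behavior are assumptions, not a real diagnostic stack or a real ECU interface.

```python
class TelltaleEcuModel:
    """Toy stand-in for the item under test."""
    INJECT_FEEDBACK_STUCK_OFF = 0xF1A0    # hypothetical diagnostic identifier

    def __init__(self, confirm_cycles: int = 3):
        self.confirm_cycles = confirm_cycles
        self.injected = set()
        self.mismatch_cycles = 0
        self.safe_state_active = False

    def write_did(self, did: int, enable: bool) -> None:
        """Enables or disables a fault injection hook."""
        if enable:
            self.injected.add(did)
        else:
            self.injected.discard(did)

    def cycle(self, telltale_requested: bool) -> None:
        """One monitoring cycle: readback check with fault de-bouncing."""
        feedback = telltale_requested
        if self.INJECT_FEEDBACK_STUCK_OFF in self.injected:
            feedback = False                      # injected fault
        if feedback != telltale_requested:
            self.mismatch_cycles += 1
        else:
            self.mismatch_cycles = 0
        if self.mismatch_cycles >= self.confirm_cycles:
            self.safe_state_active = True         # assumed to stay latched


def run_injection_test(ecu, did: int, max_cycles: int):
    """Injects a fault and returns the cycle in which the Safe state was reached."""
    ecu.write_did(did, True)
    for n in range(1, max_cycles + 1):
        ecu.cycle(telltale_requested=True)
        if ecu.safe_state_active:
            return n
    return None


ecu = TelltaleEcuModel(confirm_cycles=3)
reaction_cycle = run_injection_test(ecu, TelltaleEcuModel.INJECT_FEEDBACK_STUCK_OFF, max_cycles=5)
```

Because the test only needs a diagnostic write and an observable Safe state, it can run unattended on every Software integration. Injection hooks of this kind normally need their own requirements so that they cannot be activated unintentionally in the field.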
Over and above everything said so far, the TSR must also document:
- The ASIL of every requirement. If ASIL Decomposition was performed, the corresponding decomposition should be reflected in the ASIL of the decomposed requirement(s)
- The assignment of every requirement to HW or SW
- The level at which every requirement is verified, and how it should be verified
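These attributes are easy to check automatically if every requirement is kept as a structured record. The sketch below is one possible shape; the field names, the requirement ID and the example text are assumptions, while the "ASIL B(D)" style label follows the usual notation for decomposed requirements.

```python
from dataclasses import dataclass

VALID_ASIL = {"QM", "A", "B", "C", "D"}


@dataclass
class TechnicalSafetyRequirement:
    req_id: str
    text: str
    asil: str                       # ASIL after decomposition, e.g. "B"
    decomposed_from: str = ""       # original ASIL, e.g. "D", if decomposed
    allocation: str = ""            # "HW" or "SW"
    verification_level: str = ""    # e.g. "SW integration test"
    verification_method: str = ""   # e.g. "fault injection"

    def asil_label(self) -> str:
        if self.asil == "QM" and not self.decomposed_from:
            return "QM"
        if self.decomposed_from:
            return f"ASIL {self.asil}({self.decomposed_from})"
        return f"ASIL {self.asil}"


def missing_attributes(req: TechnicalSafetyRequirement) -> list:
    """Lists the mandatory attributes that are still empty or invalid."""
    required = ("allocation", "verification_level", "verification_method")
    missing = [name for name in required if not getattr(req, name)]
    if req.asil not in VALID_ASIL:
        missing.insert(0, "asil")
    return missing


req = TechnicalSafetyRequirement(
    req_id="TSR-012",               # hypothetical ID
    text="The telltale feedback shall be monitored every cycle.",
    asil="B",
    decomposed_from="D",
    allocation="SW",
)
label = req.asil_label()
gaps = missing_attributes(req)
```

Running such a check over the whole TSR before a review gives a quick list of requirements that still lack an allocation or a verification strategy.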
By following the guidelines in this blog, one will be able to develop a high-quality TSR, with correct and complete hardware and software safety requirements identified right at the beginning of the project, setting the right course for development!