Interview with Guillaume Rémond, co-founder and CTO of Life Plus
Guillaume Rémond is the co-founder and CTO of Life Plus. Life Plus offers prevention and assistance solutions for elderly people. The main product is a smart watch, which continuously measures a person’s activity and can detect unusual behaviors or falls. It can also trigger manual alerts and locate the person inside or outside a predefined area.
At Life Plus we have two main roadmaps. One is driven by technical features such as security, and improvements in terms of stability and product autonomy. The other includes additional functionalities driven by the customer which could be for example a new user interface or a new feature. We then have a process to prioritize both and dispatch the new development tasks to developers.
However, before the development stage, we run an impact analysis to assess the effect of the new feature on the product’s software base. Depending on the outcome, all development is done inside a pull request. This way another developer can review the proposed changes before they are committed. It is a good way for us to understand what the change does and to double-check it before committing.
Then, everything is committed to a dedicated branch of development. The code is pushed every night and all the products are upgraded every day with the new features. Our developers are constantly using the product so if there is a major issue we will see it. For small issues we have an analysis every week to make sure we get all the messages sent by the watch properly, have the correct product autonomy and expected behavior, and so on.
Yes, we are currently focusing on functional tests with one test bench, which is a test watch that runs continuously and is upgraded along with the other products. The tests are done automatically, so we can simulate a person walking or triggering the button. A dedicated script sets a defined sequence of actions in motion each time a new function is added, to validate that the program runs according to specifications. This is done not only on the software but also on the hardware, directly on the product. For example, we can trigger the button automatically.
In addition to automated tests, at the end of each screening, we always run manual tests. It takes one or two more days but we do it every time, before committing the new development. That’s because the more tests you have, the more likely you will be to find errors that could adversely affect your product.
I think C language is the most common language in embedded systems, so for me it’s natural to use the C language more than another. Also I am already very familiar with the C language and I like it a lot because it does what it is expected to do. It is not object-oriented or event-driven and for me it was a logical choice for this product.
It’s true that with the C language, there can be memory management issues that show up as undefined behaviors later on in the development process (or worse, in production). That’s why it’s important to take precautions to limit these kinds of errors and any undefined behavior that could lead to an adverse impact on the product and thus on the client.
Because we have been using this language for several years, we have a lot of libraries for operating systems, file systems, memory management and so on. C is the base but we added a lot of libraries on top of it helping us to develop the product very easily. For example we are using FreeRTOS for the operating system, LittleFS for the file systems and we have in-house libraries to handle memory management protection, and also for timers. We also test regularly before each commit to ensure the proper functioning of new code.
Unfortunately, yes. We are always facing issues. The most common issue in embedded systems is hard fault, which is usually based on a bad pointer issue. To handle this we implemented hard fault detection procedures. When we trigger a hard fault we are able to go back to the program and check what is triggering it. If it’s not possible we have an assert protection so we can check all the addresses and buffers to make sure the addresses are in the right range and the program is not accessing other locations that it is not supposed to write or read.
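As a rough illustration of the kind of address-range assert described above, here is a minimal sketch in C. The names `RAM_START`, `RAM_SIZE`, `range_in_ram` and `ASSERT_IN_RAM` are hypothetical, and the memory window is an assumed example; this is not the Life Plus implementation, and on a real target the bounds would come from the linker script.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical RAM window, for illustration only. */
#define RAM_START ((uintptr_t)0x20000000u)
#define RAM_SIZE  ((uintptr_t)0x00040000u)

/* Returns true when [addr, addr + len) lies entirely inside the RAM window.
   Written with subtractions so that addr + len can never overflow. */
static bool range_in_ram(uintptr_t addr, size_t len)
{
    if (addr < RAM_START) {
        return false;
    }
    uintptr_t offset = addr - RAM_START;
    if (offset > RAM_SIZE) {
        return false;
    }
    return len <= (RAM_SIZE - offset);
}

/* Stops the program early, before a bad pointer turns into a hard fault. */
#define ASSERT_IN_RAM(ptr, len) assert(range_in_ram((uintptr_t)(ptr), (len)))
```

A driver could call `ASSERT_IN_RAM(buffer, length)` before each read or write, so that an out-of-range access is reported at the call site instead of surfacing later as a hard fault.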
Another common issue is allocating the wrong buffer size in a malloc. It’s very difficult to handle because it does not trigger a hard fault. It just behaves strangely: if for example you allocate 10 bytes in a malloc and you write 12 bytes, it may appear to work, but you will overwrite memory that belongs to the next buffer you might allocate later for something else. So this is very difficult to manage, but as I mentioned, at Life Plus we built libraries on top of malloc and FreeRTOS to be sure we do not write before or after the allocated buffer.
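One common way to build this kind of protection is to surround each allocation with guard bytes and check them later. The sketch below is an assumption about how such a wrapper might look, not the in-house Life Plus library; `guarded_malloc`, `guarded_check`, `guarded_free`, `GUARD_BYTE` and `GUARD_LEN` are hypothetical names. It wraps the standard `malloc`, and it does not try to preserve strict alignment of the returned pointer on every platform.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define GUARD_BYTE 0xA5u
#define GUARD_LEN  8u

/* Stored in front of the buffer so the size is known at check time. */
typedef struct {
    size_t size;
} guard_header_t;

/* Memory layout: [header][front guard][user buffer][rear guard] */
void *guarded_malloc(size_t size)
{
    const size_t overhead = sizeof(guard_header_t) + 2u * GUARD_LEN;
    if (size > SIZE_MAX - overhead) {
        return NULL;
    }
    uint8_t *raw = malloc(overhead + size);
    if (raw == NULL) {
        return NULL;
    }
    ((guard_header_t *)raw)->size = size;
    uint8_t *front = raw + sizeof(guard_header_t);
    memset(front, GUARD_BYTE, GUARD_LEN);
    memset(front + GUARD_LEN + size, GUARD_BYTE, GUARD_LEN);
    return front + GUARD_LEN;
}

/* Returns true when both guard zones are intact, i.e. nothing was written
   just before or just after the user buffer. */
bool guarded_check(const void *ptr)
{
    assert(ptr != NULL);
    const uint8_t *user = ptr;
    const uint8_t *front = user - GUARD_LEN;
    const guard_header_t *hdr =
        (const guard_header_t *)(front - sizeof(guard_header_t));
    for (size_t i = 0; i < GUARD_LEN; i++) {
        if (front[i] != GUARD_BYTE || user[hdr->size + i] != GUARD_BYTE) {
            return false;
        }
    }
    return true;
}

void guarded_free(void *ptr)
{
    if (ptr != NULL) {
        free((uint8_t *)ptr - GUARD_LEN - sizeof(guard_header_t));
    }
}
```

With this layout, writing 12 bytes into a 10-byte buffer lands in the rear guard, so a later `guarded_check` call reports the overrun instead of letting it silently corrupt a neighbouring allocation. Overruns larger than `GUARD_LEN` would still escape the guard zone.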
We catch it generally during the development stage and during the tests later because if we just want to catch a particular kind of issue we log it and we analyze the log later to see where it happened and fix it.
I think that code security is more based on good practices. If you follow them you should avoid major security issues, of course not all of them but you should cover most of them. For example for the malloc problem, to avoid allocating the wrong buffer size, we built a library on top of it so this library helps us to avoid this kind of issue. So to make sure a problem will not happen later, each time we face a new problem we build something, whether it’s an automated test for the issue to be sure it will not be reproduced later, or a protection like a library.
We try to follow MISRA C 99. In addition to MISRA we try to follow good practices internally. There should be some form of code verification to ensure there are no undefined behaviors left in the code. Code format is very important also, for example how to use the curly brackets properly. Even if you write only one line you should put a curly bracket.
Compared to 10 years ago, we now have many tools to analyze memory and processor load, so it’s easier right now. What is painful, I think, is how to simulate code locally on your computer before running it on the device, which for me is the more challenging part. I will give you an example. If you have a machine learning algorithm in Python and you translate it into C, it’s very difficult to guarantee that the algorithm in C does the same thing as what you wrote in Python.
The problem is that it will be run on a PC where you have a lot of memory and a big processor, but when you put it on an embedded system it’s a problem because you have very little RAM. You do not have the same latency when you’re writing to the flash, you don’t have the same processor, you are depending on the behavior around it to signal an interruption, etc. This is for me the most challenging, when you have many components involved. For example when you have a UI like the one on the smart watch, and you push the button, you want it to respond immediately. When the user is triggering the button you don’t want the product to vibrate one second later or the screen to change one second later.
It is difficult to perfectly replicate on a computer the way a device behaves in real life. There are so many different parameters to consider: acceleration, vibrations, temperatures, humidity etc. Everything can impact the hardware in ways that are difficult to reproduce exhaustively in a simulated software environment.
That’s something that comes when you are mature with embedded systems. Running code once on a computer is easy, but running it 24/7 on several thousand products is another thing, because the products must never fail in the real world. Even if they fail, they need to recover immediately, especially for medical devices. Therefore it is essential to implement good watchdog functionalities to detect failures and immediately recover.
Yes, we are facing new bugs in the field all the time that do not happen here when we test. If we have very big issues that we are not able to reproduce, the process at Life Plus is such that we ask for the product back and then we analyze it, because sometimes it could be a hardware issue, which is why it could be happening only on one product.
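To make the watchdog idea mentioned above a little more concrete, here is a minimal sketch of one common pattern: each task checks in once per period, and a supervisor refreshes the hardware watchdog only if every task has done so. All names (`watchdog_checkin`, `watchdog_supervise`, `TASK_COUNT`) are hypothetical, the hardware refresh itself is left out, and the code is not thread-safe as written; on an RTOS the shared mask would need a critical section or an atomic.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TASK_COUNT      3u
#define ALL_TASKS_ALIVE ((1u << TASK_COUNT) - 1u)

/* One bit per monitored task, set when the task reports it is alive. */
static uint32_t alive_mask = 0u;

/* Each monitored task calls this once per supervision period. */
void watchdog_checkin(unsigned task_id)
{
    assert(task_id < TASK_COUNT);
    alive_mask |= (1u << task_id);
}

/* Called once per period by a supervisor. Returns true when every task has
   checked in, meaning the hardware watchdog may be refreshed. When it returns
   false, the caller skips the refresh so the hardware timer expires and
   resets the device. The mask is cleared for the next period either way. */
bool watchdog_supervise(void)
{
    bool all_alive = (alive_mask == ALL_TASKS_ALIVE);
    alive_mask = 0u;
    return all_alive;
}
```

The point of the design is that a single stuck task is enough to withhold the refresh, so the device resets instead of running in a half-working state.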
It is difficult to test for every possible input that could lead to an undefined behavior in the code that causes an adverse behavior in the product. It could take a lifetime to test everything without a tool that can generalize these tests. TrustInSoft for example offers a tool that allows for an exhaustive testing of test values, going further than testing tools alone and maximizing the utility of unit and integration tests. This approach can help eliminate strange behaviors due to errors in the C code.
The good advice I should give is that when you think it won’t happen, it will always end up happening! The most difficult part in embedded systems is triggering and handling future uncommon errors. So it could be like you said unexpected values from a sensor. Probably it will happen. So how can we handle it and what should we do to recover from this problem if the sensor is providing these kinds of unexpected values? Reset? Reboot? Trigger an error so the process stops? The most difficult thing is how to react properly when we are facing strange behavior.
Yes, but the easiest way should be to also have a kind of recovery state if you are facing unexpected behavior somewhere, to be able to go back to the recovery state, reset everything, and restart. You should not be stuck somewhere.
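The recovery-state idea can be sketched as a small state machine. The example below is only an illustration under stated assumptions: the valid range `ACCEL_MAX_MG`, the threshold `MAX_BAD_SAMPLES` and every function name are hypothetical, and a real product would decide per sensor whether to reset, reboot, or raise an error.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define ACCEL_MAX_MG    (16000) /* hypothetical full-scale range, in milli-g */
#define MAX_BAD_SAMPLES (3u)    /* hypothetical tolerance before recovery */

typedef enum { STATE_RUNNING, STATE_RECOVERY } device_state_t;

typedef struct {
    device_state_t state;
    uint8_t bad_samples; /* consecutive out-of-range samples seen so far */
} device_t;

void device_init(device_t *dev)
{
    assert(dev != NULL);
    dev->state = STATE_RUNNING;
    dev->bad_samples = 0u;
}

static bool sample_is_valid(int32_t accel_mg)
{
    return (accel_mg >= -ACCEL_MAX_MG) && (accel_mg <= ACCEL_MAX_MG);
}

/* Feeds one sensor sample. A valid sample clears the error counter; too many
   consecutive invalid samples move the device into the recovery state. */
void device_on_sample(device_t *dev, int32_t accel_mg)
{
    assert(dev != NULL);
    if (dev->state != STATE_RUNNING) {
        return; /* ignore samples until recovery has been performed */
    }
    if (sample_is_valid(accel_mg)) {
        dev->bad_samples = 0u;
        return;
    }
    dev->bad_samples++;
    if (dev->bad_samples >= MAX_BAD_SAMPLES) {
        dev->state = STATE_RECOVERY;
    }
}

/* Resets everything and restarts, so the device is never stuck. */
void device_recover(device_t *dev)
{
    device_init(dev);
}
```

Tolerating a few bad samples before entering recovery is a design choice made here for illustration; a stricter product could enter recovery on the first invalid value.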
With unit tests, we may not include all values, because there could be all sorts of strange behavior. We are limited to testing what seems logical to us, which is clearly not exhaustive.
Coding something as a demo is completely different from coding something connected that needs to work 24/7. That’s the point of TrustInSoft, to apply more critical security analysis with unlimited test values to be sure that people don’t code mistakes that could lead to security issues.
Thank you Guillaume and all the best to Life Plus.