Hey, real time !

Among the goals of the X15 operating system is real time. This expression is undoubtedly one of the vaguest buzzwords out there in the computer industry, which means I can’t really say anything about it without first attempting to provide a decent definition.

Usually, real time refers to a property of a system where the time between the occurrence of an event and the processing of that event, called the latency or lag, is guaranteed to be both short and bounded. That kind of real time is called “hard” real time, in contrast with “soft” real time which means that the design provides some ways to achieve short latencies but makes no guarantees. Simple enough, right ? So tell me, how do you prove whether a system is either “hard” or “soft” real time ?…

Far from saying that it’s not possible, I mean to insist on the fact that it’s most often extremely difficult. Unless the hardware is completely predictable (e.g. no caches, no DMA), and unless the system is extremely simple (e.g. a single task), there will necessarily be delays that do not systematically occur. In order to truly talk about hard real time, it is required that any event processing, including all possible delays, is complete before some deadline. Needless to say, most “real time” applications don’t bother proving that. Paul McKenney, one of the gurus behind Linux RCU, real time and parallelism, goes into some details to describe possible definitions of “hard” real time in the industry.

From now on, let’s just forget about this distinction, and try to come up with a reasonable definition. From what we can infer by looking at the industry, it seems that real time software needs to :

  • react as quickly as possible to events
  • keep the highest priority activity running as much as possible

Reacting quickly to events is fairly straightforward : just run a fully preemptible system. Preemption means tasks can be suspended at almost any time so that another is dispatched on the processor. It’s important to note that no practical system guarantees that a task can be preempted at any single time. System designers use preemption-based critical sections when atomicity is required across multiple instructions. It’s also important to note that, unfortunately, many systems still mix preemption with interrupts, i.e. they do not provide direct, explicit control over preemption, and disable it by also disabling interrupts.

Keeping the highest priority activity running as much as possible is harder. Systems normally provide tasks to isolate activities and make their implementation sequential, and a scheduler with at least a fixed priority policy. In addition, they provide tools such as mutexes to build preemptible critical sections, where tasks attempting to acquire a mutex already locked are suspended until the mutex is unlocked. The major problem with the combination of preemptible critical sections and tasks of different priorities is that it makes priority inversion possible, adding more potential delays to an already chaotic picture. The common reply is to extend mutexes so they support priority inheritance or ceiling, in order to prevent unbounded priority inversion. Priority inheritance boosts the priority of mutex owners to the highest priority among waiters, whereas priority ceiling unconditionally boosts the priority of tasks to a fixed value every time they acquire a mutex.

There are still quite a few things to note, however. The first I’d like to point out is the often incorrectly perceived cost of mutexes and sleep locks in general. These are considered so essential in any real time system that developers don’t question their cost, only their use against alternatives such as message queues, which is already a very good thing. It’s important to realize that most of these facilities require disabling preemption. Naturally, if critical sections in applications are comparable to those in the system, merely disabling preemption becomes an alternative to consider. If you agree that critical sections should generally be short, then you should also agree that most of them should be protected by disabling preemption. Victor Yodaiken, previously project leader of the RTLinux project (not to be confused with the PREEMPT-RT Linux patch), published a technical report in 2004 in which he argues against blindly relying on priority inheritance. I admit that I felt enlightened as I rarely did after reading this report, which I found because of doubts I had personally developed over time on the subject, leading me to actively look for answers and pointers. As a side note, the fact that an operating system can control preemption, and userspace applications usually cannot, is one of the big differences that explain why execution is considered faster in the kernel.

Another problem, quite troubling to me, is the plain lack of honesty from many system vendors who claim they provide real time systems which actually don’t conform to the two conditions mentioned earlier. This isn’t too hard to check for yourself, since the majority make their code publicly available to read. Most often, the priority inheritance implementation is incomplete in more than one way. Some don’t check the full priority inheritance tree, and merely boost the owner of a mutex, without following it to boost other potentially blocking owners of other mutexes. Others don’t track all waiters, and don’t deboost as early as possible, which means unbounded priority inversion is still possible. Others claim to do it right, yet the code looks suspiciously over-simplistic, and there is no test available to check the behaviour. The reason they seem to work is because many applications have simple locking needs, e.g. rarely need to hold multiple locks at once, which keeps complexity low and is actually good practice.

In the end, it seems many projects implement a tiny system without an explicit preemption control interface, an extremely simple scheduling policy, but with a huge number of priority levels (as if “the bigger the better” applied), a broken priority inheritance implementation, and say “Hey, real time !”.

Among those which really do it right is Linux (with or without the PREEMPT-RT patch), where the same code is also used for deadlock detection. Although the size and complexity of the well known open source kernel is often enough to keep many from using it for “embedded real time” applications, I’m personally not too surprised that it gets things right. The real problem is that there are still too many long critical sections that disable preemption, something the PREEMPT-RT patch tries to fix by essentially turning a big bunch of spin locks into mutexes, among other things.

After a few weeks thinking hard about all this, I finally chose to implement priority inheritance in the X15 microkernel, as an expensive tool against unbounded priority inversion for even more expensive critical sections. The result is an implementation of turnstiles, based on the original design found in Solaris 10. The work was indeed difficult and error-prone, making me hit two walls and rethink everything completely until I got something that looked reliable and trustworthy, something I’m now actually quite proud of, despite the associated recommendation not to use it if possible.