Multi threading basics

By Giancarlo Niccolai

This document is released under the "GNU Free Documentation License 1.2" that can be found at http://www.gnu.org/copyleft/fdl.html

To use this article or parts of it under different terms, contact the author at his web site: http://www.niccolai.ws

During my time on the Usenet newsgroup comp.programming.threads, I have witnessed how easy it is to misunderstand the basic principles of parallel programming. Even the most expert developers stumble over basic errors in the conceptual model, and when we welcome a new user to the newsgroup, the questions are often about the same topics. This shows that there are widespread misunderstandings about multi threading theory which, even when they do not prevent people from writing working applications with the system APIs, cause the resulting software to display behavior that it is correct to define as strange (and we'll see why).

This article is the first of four installments in which we'll cover multi threading theory in general, the correct usage of the MS-Windows API, the basic concepts of the pthread library (the POSIX standard for multi threading management), and finally the Wefts++ library, an open source project developed by the author that aims to provide an isomorphic C++ multi threading API.

By the term multi threading, we mean a programming technique based on the subdivision of a problem into discrete units, each having a lower complexity than the problem as a whole. The thread is an agent that manages one or more of these units, in coordination with, or independently of, the main process. If a problem is so complex that its solution requires different processes, we are in the domain of the multi process approach, which is quite dissimilar from multi threading.

Compared to sequential programming, the multi threading model has three main advantages:

  • By breaking up the problem into simpler parts, each of which can be treated separately and partially or completely in parallel, a multi threading program is generally simpler, more maintainable and more scalable than analogous sequential software.

  • On modern multiprocessor architectures, a multi threading application can exploit multiple processors to solve the same problem.

  • A well structured multi threading program will exploit the "dead times" encountered in the various sub-phases of the solution by letting other parts of the process advance, and it does so transparently, without the developers having to explicitly code this ability into the software.

These advantages may also be obtained by using multi process techniques, but multi threading is less expensive in terms of system resource usage, and it can coordinate the activity of every agent with a set of quite efficient means. On the other hand, in multi threading all the memory of a process is seen as a single blackboard, where every agent is free to read or write without the other agents being able to know it. A thread has no means to know whether another thread is reading or writing a memory location, unless particular care is taken, while multi process agents see virtually separated memory and resources, except for some well defined spots in which the Operating System is generally called in to act as a referee (or to provide support for concurrent operations). Because of this, it is necessary to treat multi threading programming as a separate model with respect to multi process techniques.

When to use multi threading

Before deciding to develop software using a sequential, multi threading or multi process approach, it is necessary to analyze in depth how parallel processing may improve the final code. There are basically four types of parallelism to be weighed against equivalent serial approaches:

  • Computational parallelism: when more than one physical processor is available and some of the calculations can be performed in parallel, the total execution time of the application is lower. For example, this is the case of rendering software, where it is possible to render different parts of the same image in a completely parallel fashion. Generalizing, computational parallelism can be applied when the process activity requires heavy mathematical calculation on data structures that, although related, can temporarily be seen as independent.

  • Resource usage parallelism: when a process is in charge of accessing different resources with different operational speeds, a parallel approach lets the process transparently exploit dead times to make progress in other parts. For example, a GUI application directly controlling a printer may transparently continue to serve the user interface while the print job progresses. As another example, a DBMS sees the network, the local disk storage and the computational power as three different resources with different working speeds. Using parallelism, the DBMS may transparently compute results while waiting for data coming from the network or read from disk, or it may use pauses in network I/O to flush or fetch data to/from mass storage.

  • Service parallelism: when the target application has to fulfill some service duty that may be invoked by several sources at the same time, using parallelism improves performance. We'll see in the next articles that the correct model for implementing service parallelism is not to create an agent for every request that must be served. This solution (one task, one agent) may seem natural, but in reality it is a sure way to destabilize the application and reduce overall system performance, unless it can be demonstrated that there is a (low) maximum number of possible concurrent requests.

  • Control parallelism: complex applications usually require periodic calls to some functions that must run to keep the process operational: for example garbage collection, log file flushing, periodic checks on the status of the operational environment (such as changes to configuration files that must then be reloaded), background user interface updates and so on. It is not always easy to fit these control routines into the main loop of the program, and if the timing requirements of a certain task are critical, it may not even be possible to do so. Using one or more threads to perform these control operations transparently helps to simplify the main program, and it often makes the program more responsive to external events (a small sketch follows this list).
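
As an illustration of control parallelism, here is a minimal sketch in modern C++ (std::thread and std::condition_variable, used here purely for illustration and not as the APIs discussed in the next articles): a background thread performs a periodic duty, standing in for a hypothetical log flush, while the main program keeps working, and it terminates cleanly when asked to.

    #include <chrono>
    #include <condition_variable>
    #include <iostream>
    #include <mutex>
    #include <thread>

    int main()
    {
        std::mutex m;
        std::condition_variable cv;
        bool stop = false;                          // guarded by m

        // Control agent: performs a periodic duty (here it just prints,
        // standing in for a hypothetical log flush) until asked to stop.
        std::thread maintenance([&] {
            std::unique_lock<std::mutex> lock(m);
            while (!stop) {
                std::cout << "flushing log...\n";
                // Sleep for a second, or wake immediately if stop is requested.
                cv.wait_for(lock, std::chrono::seconds(1), [&] { return stop; });
            }
        });

        // Main agent: goes on with its own work, unaware of the details above.
        std::this_thread::sleep_for(std::chrono::seconds(3));

        {
            std::lock_guard<std::mutex> lock(m);
            stop = true;                            // ask the control thread to finish
        }
        cv.notify_one();
        maintenance.join();
    }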

If the application can gain an evident benefit from one of these kinds of parallelism, then it is worth developing a multi threading application; otherwise, or when strong doubts arise, it is preferable to use sequential programming.

Once the choice to develop a multi threading application is made, it is necessary to plan the development on rules that are quite different from the traditional ones, because of the specificities of this methodology, which we'll see in the rest of this article. In the next ones, we'll see how this methodology can be implemented using the routines present in the Windows and UNIX APIs.

Multi threading principles

Multi threading programming is based on two fundamental principles:

  • Indetermination principle: in a multi threading application, the discrete state of the process (in the dynamics of its concurrent agents) is undetermined.

  • Uncertainty principle: the value (or the state) of a given resource shared across agents cannot be precisely determined at any given moment [1].

There is a substantial difference with respect to sequential programming: contrary to what happens in a Von Neumann machine, it is impossible to determine the state of the process as a linear function of the previous state and the inputs.
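
A tiny demonstration of both principles, sketched in modern C++ purely for illustration: two threads increment the same counter with no coordination at all. The program text alone does not determine the final value; formally this is even undefined behavior in C++, and in practice the printed result usually varies from run to run and falls short of the expected 200000, because the two agents overwrite each other's updates.

    #include <iostream>
    #include <thread>

    int counter = 0;   // shared and unprotected: its value is "uncertain"

    void work()
    {
        for (int i = 0; i < 100000; ++i)
            ++counter;             // read-modify-write, not atomic
    }

    int main()
    {
        std::thread a(work), b(work);
        a.join();
        b.join();
        // With no synchronization, the printed value is unpredictable.
        std::cout << counter << "\n";
    }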

In multi threading programming each single agent may still be seen as a Von Neumann machine, but the interaction between agents makes another level of complexity emerge, a complexity that is unknown to the previous computational theory. Complex systems are erratic, hysteretic and unpredictable by their very nature; some scholars of complexity (such as E. Morin, I. Prigogine and J. L. Casti) scientifically define their behavior as strange [2]. "Strangeness" is a characteristic bound to complexity, so you should not be worried by the fact that your multi threading program behaves in a way that appears strange: it is that way because of its very nature.

Obviously, the duty of a software developer is to write code that has certain functionalities and whose behavior can be forecast in every possible situation; so how is it possible to live with the intrinsic complexity that emerges from the interaction of concurrent agents, while making sure that the process works coherently?

Like every complex system existing in nature, such as a cell, an organism, a society of individuals and so on, a process made up of concurrent agents becomes manageable the moment the agents, and the process as a whole, are forced into certain behaviors by bounds [3]. In multi threading theory, those bounds are called "synchronization primitives", or more simply "primitives". Through the action of the primitives it is possible to synchronize, or coordinate [4], the agents, which can then form a process whose behavior as a whole becomes determined.

There are exactly two fundamental (or primary) primitives: the mutex and the event [5]. Besides the two fundamental primitives, a set of secondary primitives is available to provide greater implementation variety and higher abstractions in some contexts. Those primitives are called secondary because it is possible to express them as specializations or usages of the primary ones [6]; we'll see the most useful ones in the course of the next articles.

Mutexes (short for Mutual Exclusion) are objects that allow exactly one agent to gain exclusive access to a certain code portion. A thread tries to acquire [7] a mutex before executing a code sequence, and when the sequence is complete, the mutex is released. If a thread tries to acquire a mutex while another thread has already acquired it, the second one is not allowed to proceed until the mutex can be successfully acquired. This implies that the former owner must release the mutex before another thread can acquire it, but the theoretical model does not require the waiting threads to be immediately elected as the new mutex owners. There may be a random but finite delay, as the threads waiting to acquire the mutex are not necessarily woken up or otherwise notified about the mutex status change; however, the OS or implementation library usually provides an efficient way to let one waiting thread progress as soon as possible when the mutex it was waiting for is released.

A mutex can protect different code sections at the same time. By protecting all the code slices that access a certain shared resource with a given mutex, it is possible to effectively protect that resource from concurrent access. It is evident that correct mutex usage nullifies the effect of uncertainty: it guarantees that every access to a shared resource has a certain, safe (that is, not uncertain) effect, under the hypothesis that all accesses to a given shared resource are protected by the same mutex.
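
As a minimal sketch (again in modern C++, for illustration only), the unprotected counter from the earlier example becomes safe once every code slice that touches it acquires the same mutex first; each increment is then a certain, deterministic operation.

    #include <iostream>
    #include <mutex>
    #include <thread>

    int counter = 0;            // shared data
    std::mutex counter_mutex;   // the one mutex that guards 'counter'

    void work()
    {
        for (int i = 0; i < 100000; ++i) {
            std::lock_guard<std::mutex> lock(counter_mutex);   // acquire
            ++counter;                                          // protected access
        }                                                       // release
    }

    int main()
    {
        std::thread a(work), b(work);
        a.join();
        b.join();
        std::cout << counter << "\n";   // always 200000
    }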

Events are objects signaling that a certain fact has happened. Agents can put themselves in wait for a certain event to happen, and/or signal that a certain event has occurred, allowing the waiting threads to proceed. In this way agents can communicate that the data they are in charge of treating is ready to be evaluated by other agents, or in other words, that their data is in a coherent state. Thus, the waiting agents are allowed to continue only when the data they expect to receive is ready for evaluation. From this it can be deduced that the correct usage of events nullifies the effects of the indetermination principle, as events guarantee that the signaling agent is in a known state at a certain critical moment.
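
A rough sketch of the event idea, approximated here with a condition variable plus a boolean flag in modern C++ (as footnote 5 notes, Windows Events and POSIX condition variables are only particular implementations of the theoretical event): a producer signals that its data is in a coherent state, and a consumer waits for that signal before proceeding.

    #include <condition_variable>
    #include <iostream>
    #include <mutex>
    #include <thread>

    std::mutex m;
    std::condition_variable cv;
    bool data_ready = false;     // the "event" flag, guarded by m
    int shared_result = 0;

    void producer()
    {
        int result = 6 * 7;                  // prepare the data (coherent state)
        {
            std::lock_guard<std::mutex> lock(m);
            shared_result = result;
            data_ready = true;               // the fact has happened
        }
        cv.notify_one();                     // signal the waiting agent
    }

    void consumer()
    {
        std::unique_lock<std::mutex> lock(m);
        cv.wait(lock, [] { return data_ready; });   // wait for the event
        std::cout << "got " << shared_result << "\n";
    }

    int main()
    {
        std::thread c(consumer), p(producer);
        p.join();
        c.join();
    }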

Correct multi threading programming can be achieved through the correct usage of events and mutexes, so as to reduce the chaotic interaction of the different agents down to the deterministic Von Neumann model. In other words, mutexes and events are necessary and sufficient to write multi threading programs that do not behave strangely [8]. Programs that behave correctly thanks to the correct application of this principle are called threadsafe.

Threadsafe programs

There are some rules that, if correctly applied, can guarantee that code is threadsafe. Some of these rules are mandatory, that is, breaking them makes the program unsafe in a threading environment. Breaking other rules is a "venial sin" that is better avoided: sinful code may not be automatically unsafe, but it is put at risk, and it will cause errors if some particular condition arises. I like to say that "there's no shortcut to threading": breaking some of the rules may seem appealing because it may simplify some parts of a program, but unless the developer is extremely cautious and the rule breaking and its motivations are well documented, this will sooner or later cause problems.

The rules for writing threadsafe code may be summarized as follows:

  • All the routines that may be used concurrently by more than one thread must be reentrant. A routine is said to be reentrant when all of its operating data are a) parameters, b) local variables on the stack or c) privately allocated and deallocated heap data. So, for example, C functions using global or "static" variables are surely not reentrant, and therefore not threadsafe.
  • Every access to shared data must be guarded with a mutex acquisition (and subsequent release).
  • Every thread must terminate on its own initiative [9]. However, it is also possible for a thread to declare spots in which kind requests to be terminated are fulfilled, or even code slices in which other threads may terminate it at their own discretion. In every spot where the thread accepts kind cancellation requests coming from other threads, and in every code slice where unkind cancellation is allowed, the thread must present itself in a coherent state. With a heavy simplification, a coherent state can be defined as a state in which every pending operation is complete and every common resource is released or ready to be released.
  • Cross dependency conditions must be avoided. If an agent depends on data prepared by another agent, which in turn depends on data deriving from the first, it is possible that both agents will stall, each waiting for the other to prepare the needed data. The double stall is the most common situation leading to deadlock, which is a "block beyond any recovery hope". The stall may also occur when there are long chains of interdependent agents, where the first agent depends on the second agent's data, which depends on the third agent's work and so on, while finally the last agent's activity depends on the first agent's data. To avoid this problem it is necessary to design the application so that every agent is largely independent, or so that agent dependencies are concentrated in well defined spots and regulated by central agents acting as "referees".
  • Nested mutex acquisitions must be avoided. Sometimes, while working with one set of shared data, there is a need to act in some way on other shared resources while still holding the lock on the first data set. The temptation to acquire a second mutex when already holding one must be resisted and replaced with other techniques; for example, caching the changes that must be made to the second data set in local variables and applying them after the first mutex has been released, instructing another thread to make the required changes by sending it a message [10], or even enlarging the scope of the former mutex so as to merge the two distinct data sets (if the necessity arises very often, this may signal an early design flaw, as the two data sets were probably not really distinct in the first place). If absolutely unavoidable, syntax-oriented usage patterns (e.g. defines or inline functions) should be used to enforce that the roles of the inner and outer locks are never reversed. Reversing the acquisition order in two different threads would cause each of them to hold one mutex while waiting for the one held by the other, causing another deadlock condition.
  • Mutex ownership must be maintained for the smallest possible time. Namely, mutexes should be acquired only to read or change one or more variables in the guarded data set and then immediately released (see the sketch after this list). Particular care must be taken to avoid crossing function calls while holding mutexes, especially function calls that are known to be slow or blocking, such as file or network reads. This implies that resources representing files or network sockets must not be shared among agents or, if shared, threads must coordinate to use them safely with more complex techniques than just acquiring a mutex to guard the file representation. Also, as waiting for an event would block the caller, it is not correct to wait for an event while maintaining mutex ownership in the meanwhile. We'll see that this is a crucial aspect of the problems that the Windows process SDK has in the threading area.
  • Partially deriving from the last rule: it is necessary to raise threading control to the highest possible level in the routine call hierarchy. This rule is very difficult to apply correctly, and it is even quite tricky to understand why it is highly desirable. With a mentality often bent toward OOP, the modern developer may wish to bury the details of thread synchronization deep in a class accessor, so that it can handle mutex acquisition and release while accessing or altering shared data in a target object. Sadly, while there may be some cases where this is right, it generally leads to unsafe and inefficient code, when not to totally broken programs. The general philosophy of threading is that the deepest routines should run free from synchronization bounds. Coordination must be managed at the highest decisional level of each agent, or very near to it, say at a distance of one or two depth levels at worst.
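
As a sketch of the "smallest possible time" rule referenced above (modern C++, illustrative only; load_from_disk is a hypothetical slow operation standing in for a file or network read): the slow work is done on local data with no lock held, and the mutex guarding the shared map is held only for the brief moments in which the map is actually read or updated.

    #include <chrono>
    #include <iostream>
    #include <map>
    #include <mutex>
    #include <string>
    #include <thread>

    std::mutex cache_mutex;
    std::map<std::string, std::string> cache;   // shared data, guarded by cache_mutex

    // Hypothetical slow, blocking operation (stands for a file or network read).
    std::string load_from_disk(const std::string& key)
    {
        std::this_thread::sleep_for(std::chrono::milliseconds(100));
        return "value-for-" + key;
    }

    std::string lookup(const std::string& key)
    {
        {   // brief critical section: only the shared map is touched
            std::lock_guard<std::mutex> lock(cache_mutex);
            auto it = cache.find(key);
            if (it != cache.end())
                return it->second;
        }   // mutex released before the slow call

        std::string value = load_from_disk(key);    // slow work, no lock held

        std::lock_guard<std::mutex> lock(cache_mutex);
        cache[key] = value;                          // brief critical section again
        return value;
    }

    int main()
    {
        std::thread a([] { std::cout << lookup("a") << "\n"; });
        std::thread b([] { std::cout << lookup("b") << "\n"; });
        a.join();
        b.join();
    }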

In fulfilling this last rule lies the very difference between a well written multi threading program and a faulty one. While it is hard to express in a few words, just think about what happens in your brain when you're playing chess: you may and should think while the other player is making their move. The two of you (you and your opponent) are thinking together; the chessboard is the shared data, your reasoning is the parallel activity that every thread must do, and moving pieces is the effect of this reasoning. You do a lot of calculations to decide which piece to move and how, but there is just one thought that crosses the surface of your mind from time to time: "ok, my turn now". That is the only high level time spot in which coordination with your opponent plays a real role. It would be quite confusing for your reasoning if you had to check at every step whether it was really your turn or not.

In the same way, it would be quite a heavy burden for a thread to check at every step whether it has the right to manipulate the data. The deep activity of a thread should be free from synchronization, which should be reserved for agent coordination, and coordination should be sporadic by hypothesis. If the coordination of two or more agents is not sporadic, then they are probably better abstracted as a single agent doing more than one task in sequential fashion.
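
Following the chess analogy, here is a minimal sketch of an agent that coordinates only at its highest level (modern C++, illustrative only): it waits for its "turn" via an event, copies what it needs out of the shared state under a brief lock, performs all of its deep computation on local data with no synchronization at all, and finally publishes the result under another brief lock.

    #include <condition_variable>
    #include <iostream>
    #include <mutex>
    #include <thread>
    #include <vector>

    std::mutex m;
    std::condition_variable cv;
    bool input_ready = false;        // "your turn" signal, guarded by m
    bool done = false;
    std::vector<int> shared_input;   // shared data, guarded by m
    long long shared_result = 0;

    void agent()
    {
        std::vector<int> local;
        {   // coordination happens here, at the agent's highest level
            std::unique_lock<std::mutex> lock(m);
            cv.wait(lock, [] { return input_ready; });
            local = shared_input;    // take a private copy under the lock
        }

        // Deep activity: no locks, no checks, just computation on local data.
        long long sum = 0;
        for (int v : local)
            sum += static_cast<long long>(v) * v;

        {   // publish the result: another brief, high-level coordination point
            std::lock_guard<std::mutex> lock(m);
            shared_result = sum;
            done = true;
        }
        cv.notify_all();
    }

    int main()
    {
        std::thread t(agent);
        {
            std::lock_guard<std::mutex> lock(m);
            shared_input = {1, 2, 3, 4, 5};
            input_ready = true;      // "ok, your turn now"
        }
        cv.notify_all();

        {
            std::unique_lock<std::mutex> lock(m);
            cv.wait(lock, [] { return done; });
            std::cout << shared_result << "\n";   // prints 55
        }
        t.join();
    }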

A program failing to fulfill the last rule may even work, and may even not fail, but it would be extremely fragile and extremely hard to extend, fix or maintain. In any case, the expectation that such a program really works right in every operating situation is quite utopian, as it dismisses the important fact that coordination is a sporadic, high-level decision for an agent, as opposed to its normal reasoning, which may include deep-level computations.

Copyright Giancarlo Niccolai. The right to use the content of this article or parts of it is granted under the GNU Free Documentation License.


Footnotes

1 In other words, the value of shared variables is uncertain. There is no guarantee that facts regarding a variable (or a resource state change) are consistently known by all the agents at the moment they happen, and the values read at a certain moment cannot be considered valid by the moment in which it is possible to evaluate them (even if that moment is the very next CPU cycle).

2 John L. Casti describes these behaviors with precisely the word "surprising".

3 E. Morin defines bounds as those characteristics that a part of a system has when it is separated from the system, but that it is forced not to use when it becomes part of it. In our case, the freedom of a thread to be the sole owner of memory and other system resources must be limited when the thread becomes part of a coordinated process.

4 The term synchronization is widely used in textbooks because, generally, primitives may force the various agents to wait, and so they "synchronize" the activity of the agents. However, there are also some primitives that cannot possibly block the agents (or that have some uses that may not block them), so I prefer the term "coordination" to describe the activity of the primitives.

5 The "event" primitive must not be confused with the "Event" object found in the MS-Windows process SDK, or with the corresponding condition variables of the POSIX API. They are both special implementations of the theoretical primitive, and have special usage patterns.

6 In some APIs, or in some systems, some secondary primitives are implemented directly, without being based on the primary ones; the fact that the theoretical model may express the secondary primitives using the primary ones does not force implementors to follow this model.

7 The operations that can be applied to primitives are called "synchronization primitive usage semantics", or semantics for short. They could be considered akin to methods in the OOP paradigm, but the focus here is more on the operation than on the object involved, hence the different name. We'll see more about semantics in the next articles.

8 Actually, the two fundamental primitives are not exactly sufficient by themselves to do correct threading; some specific usage patterns (semantics), which we'll see in the next articles, are also needed. Anyhow, it is correct to say that mutexes and events are necessary and sufficient to do correct threading; just remember that specific ways of using them are also involved for this assertion to hold.

9 It is quite tricky to understand why a running thread can't just be killed by the owning process. The fact is that, by the indetermination principle, the state of a thread is unknown outside it. Terminating a thread would force the caller to freeze the thread's state and prevent the target thread from proceeding; but since that state is unknown, it is impossible for the calling thread to know whether the target thread is engaged in some crucial task, or is holding exclusive ownership of some resource that, once the thread is killed, it will never be able to release.

10 We'll see more about messages in the next articles; they look like data which is prepared by a thread and then published to a listener via event signaling.

Bibliography

  1. La Méthode 1: La Nature de la Nature, E. Morin, Seuil, Paris (1977)
  2. Complexification: Explaining a Paradoxical World Through the Science of Surprise, John L. Casti, Harper Collins Publishers, New York (1995)
  3. La sfida della complessità, G. Bocchi and M. Ceruti (eds.), Feltrinelli, Milano (1994; first ed. 1985)
  4. http://www.santafe.edu - The Santa Fe Institute, complexity science
  5. Multithreaded Computer Architecture, R. A. Iannucci, G. R. Gao and R. H. Halstead, Kluwer International, Boston (1994)
  6. http://www.capsl.udel.edu/ - Computer Architecture and Parallel Systems Laboratory