The Need for Thread Synchronization
Chapter 7 showed how to create and manage worker threads, where each worker thread accessed its own resources. In the Chapter 7 examples, each thread processes a separate file or a separate area of storage, yet simple synchronization during thread creation and termination is still required. For example, the grepMT worker threads all run independently of one another, but the boss thread must wait for the workers to complete before reporting the results generated by the worker threads. Notice that the boss shares memory with the workers, but the program design assures that the boss will not access the memory until the worker terminates.
sortMT is slightly more complicated because the workers need to synchronize by waiting for adjacent workers to complete, and the worker threads are not allowed to start until the boss thread has created all the workers. As with grepMT, synchronization is achieved by waiting for one or more threads to terminate.
In many cases, however, it is necessary for two or more threads to coordinate execution throughout each thread's lifetime. For instance, several threads may access the same variable or set of variables, and this raises the issue of mutual exclusion. In other cases, a thread cannot proceed until another thread reaches a designated point. How can the programmer ensure that two or more threads do not, for example, simultaneously modify the same global storage, such as the performance statistics? Furthermore, how can the programmer ensure that a thread does not attempt to remove an element from a queue before there are any elements in the queue?
Several examples illustrate situations that can prevent code from being thread-safe. (Code is thread-safe if several threads can execute the code simultaneously without any undesirable results.) Thread safety is discussed later in this chapter and in the following chapters.
Figure 8-1 shows what can happen when two unsynchronized threads share a resource such as a memory location. Both threads increment variable N, but, because of the particular sequence in which the threads might execute, the final value of N is 5, whereas the correct value is 6. Notice that the particular result shown here is neither repeatable nor predictable; a different thread execution sequence could yield the correct result. Execution on an SMP system can aggravate this problem.
Figure 8-1. Unsynchronized Threads Sharing Memory

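The scenario of Figure 8-1 is easy to reproduce. The following minimal sketch (not one of the book's numbered programs; the names IncThread and NINCS are arbitrary) starts two threads that each increment the shared counter N one million times with no synchronization. On most runs the final value falls well short of the expected 2,000,000 because increments from the two threads interleave; the volatile qualifier, discussed below, only forces N to be read from and written to memory and does not make the increment atomic.

#include <windows.h>
#include <stdio.h>

#define NINCS 1000000

static volatile LONG N = 0;     /* shared counter; volatile does NOT make N++ atomic */

static DWORD WINAPI IncThread (LPVOID pArg)
{
    DWORD i;
    UNREFERENCED_PARAMETER (pArg);
    for (i = 0; i < NINCS; i++)
        N++;                    /* load, increment, store; a context switch can occur in between */
    return 0;
}

int main (void)
{
    HANDLE hThread[2];
    hThread[0] = CreateThread (NULL, 0, IncThread, NULL, 0, NULL);
    hThread[1] = CreateThread (NULL, 0, IncThread, NULL, 0, NULL);
    WaitForMultipleObjects (2, hThread, TRUE, INFINITE);
    printf ("Final value of N: %ld (expected %d)\n", N, 2 * NINCS);
    CloseHandle (hThread[0]);
    CloseHandle (hThread[1]);
    return 0;
}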
Critical Code Sections
Incrementing N with a single statement such as N++ is no better because the compiler will generate a sequence of one or more machine-level instructions that are not necessarily executed atomically as a single unit.
The core problem is that there is a critical section of code (the code that increments N in this example) such that, once a thread starts to execute the critical section, no other thread can be allowed to enter until the first thread exits from the code section. This critical section problem can be considered a type of race condition because the first thread "races" to complete the critical section before any other thread starts to execute the critical code section. Thus, we need to synchronize thread execution in order to ensure that only one thread at a time executes the critical section.
Defective Solutions to the Critical Section Problem
Similarly unpredictable results will occur with a code sequence that attempts to protect the increment with a polled flag.
while (Flag) Sleep (1000);
Flag = TRUE;
N++;
Flag = FALSE;
Even in this case, the thread could be preempted between the time Flag is tested and the time Flag is set to TRUE; the first two statements form a critical code section that is not properly protected from concurrent access by two or more threads.
Another attempted solution to the critical section synchronization problem might be to give each thread its own copy of the variable N, as follows:
DWORD WINAPI ThFunc (TH_ARGS pArgs)
{
    volatile DWORD N;
    ... N++; ...
}
This approach is no better, however, because each thread has its own copy of the variable on its stack, whereas the requirement may be for N to represent, for example, the total number of threads in operation. Such a solution is necessary, however, in the case in which each thread needs its own distinct copy of the variable. This technique occurs frequently in the examples.
Notice that such problems are not limited to threads within a single process. They can also occur if two processes share mapped memory or modify the same file.
volatile Storage
Yet another latent defect exists even after we solve the synchronization problem. An optimizing compiler might leave the value of N in a register rather than storing it back in N. An attempt to solve this problem by resetting compiler optimization switches would impact performance throughout the code. The correct solution is to use the ANSI C volatile storage qualifier, which ensures that the variable will be stored in memory after modification and will always be fetched from memory before use. The volatile qualifier informs the compiler that the variable can change value at any time.
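As a small illustration (a sketch, not one of the book's examples; the name Done is arbitrary), consider a thread that polls a flag set by another thread. Without the qualifier, an optimizing compiler could read the flag once, keep it in a register, and never observe the other thread's store; declaring the flag volatile forces a fresh read on every test.

volatile BOOL Done = FALSE;     /* shared flag; always fetched from memory */

/* Polling thread: each test rereads Done because of the volatile qualifier. */
while (!Done)
    Sleep (100);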
Interlocked Functions
If all we need is to increment, decrement, or exchange variables, as in this simple initial example, then the interlocked functions will suffice. The interlocked functions are simpler and faster than any of the alternatives and will not block the thread. The two members of the interlocked function family that are important here are InterlockedIncrement and InterlockedDecrement. They apply to 32-bit signed integers. These functions are of limited utility, but they should be used wherever possible.
The task of incrementing N in Figure 8-1 could be implemented with a single line:
InterlockedIncrement (&N);
N is a signed long integer, and the function returns its new value, although another thread could modify N's value before the thread that called InterlockedIncrement can use the returned value.
Be careful, however, not to call this function twice in succession if, for example, you need to increment the variable by 2. The thread might be preempted between the two calls. Instead, use the InterlockedExchangeAdd function described near the end of the chapter.
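For example, to add 2 to N atomically in a single call, a minimal sketch would look like this; InterlockedExchangeAdd returns the variable's previous value.

LONG Prev;
Prev = InterlockedExchangeAdd (&N, 2);   /* atomic; Prev receives N's value before the addition */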
Local and Global Storage
Another requirement for correct thread code is that global storage not be used for local purposes. For example, the ThFunc function example presented earlier would be necessary and appropriate if each thread required its own separate copy of N. N might hold temporary results or retain the argument. If, however, N were placed in global storage, all threads would share a single copy of N, resulting in incorrect behavior no matter how well your program synchronized access. Here is an example of such incorrect usage; N should be a local variable, allocated on the thread function's stack.
DWORD N;
DWORD WINAPI ThFunc (TH_ARGS pArgs)
{
    ...
    N = 2 * pArgs->Count; ...
}
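A corrected sketch, reusing the book's placeholder TH_ARGS argument type and its Count member, simply makes N local to the thread function so that each thread has its own copy on its own stack:

DWORD WINAPI ThFunc (TH_ARGS pArgs)
{
    DWORD N;                    /* on this thread's stack; no sharing, no synchronization needed */
    N = 2 * pArgs->Count;
    /* ... use N for this thread's work ... */
    return 0;
}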
Summary: Thread-Safe Code
Before we proceed to the synchronization objects, here are five initial guidelines to help ensure that the code will run correctly in a threaded environment.
- Variables that are local to the thread should not be static and should be on the thread's stack or in a data structure or TLS that only the individual thread can access directly.
- If a function is called by several threads and a thread-specific state value, such as a counter, is to persist from one function call to the next, store the state value in TLS or in a data structure dedicated to that thread, such as the data structure passed to the thread when it is created. Do not store the persistent value on the stack. Programs 12-4 and 12-5 show the required techniques when building thread-safe DLLs.
- Avoid race conditions such as the one that would occur in Program 7-2 (sortMT) if the threads were not created in a suspended state. If some condition is assumed to hold at a specific point in the program, wait on a synchronization object to ensure that, for example, a handle references an existing thread.