Tuning SMP Performance with CS Spin Counts
CRITICAL_SECTION locking (enter) and unlocking (leave) are efficient because CS testing is performed in user space without making the kernel system call required by a mutex. Unlocking is also performed entirely in user space when no threads are waiting, whereas ReleaseMutex always requires a system call. CSs operate as follows.
- A thread executing EnterCriticalSection (ECS) tests the CS's lock bit. If the bit is off (unlocked), then ECS sets it atomically as part of the test and proceeds without ever waiting. Thus, locking an unlocked CS is extremely efficient, normally taking just one or two machine instructions. The owning thread identity is maintained in the CS data structure, as is a recursion count.
- If the CS is locked, ECS enters a tight loop on an SMP system, repetitively testing the lock bit without yielding the processor (of course, the thread could be preempted). The CS spin count determines the number of times ECS repeats the loop before giving up. A single-processor system gives up immediately; spin counts are useful only on an SMP system.
- Once ECS gives up testing the lock bit (immediately on a single-processor system), ECS enters the kernel and the thread goes into a wait state, using a semaphore wait. Hence, CS locking is efficient only when contention is low or when the spin count gives another processor time to unlock the CS.
- LeaveCriticalSection is implemented by turning off the lock bit, after checking that the thread actually owns the CS. The kernel must also be notified if there are any waiting threads, using ReleaseSemaphore.
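The spin phase described in the steps above can be sketched in portable C11. This is only an illustration of the idea, not the actual Windows implementation: the type sketch_cs and the functions sketch_enter_spinning and sketch_leave are hypothetical names, and the sketch omits the owner identity, the recursion count, and the kernel wait and wake-up.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a CS: one lock bit plus a spin count. */
typedef struct {
    atomic_flag   lock_bit;    /* set = locked, clear = unlocked */
    unsigned long spin_count;  /* extra tests allowed before giving up */
} sketch_cs;

/* Test and set the lock bit atomically, retrying up to spin_count more
 * times if it is already set. Returns true if the lock was acquired in
 * user space. A false return marks the point where a real implementation
 * would enter the kernel and wait. */
bool sketch_enter_spinning(sketch_cs *cs)
{
    unsigned long tries = 0;
    do {
        if (!atomic_flag_test_and_set_explicit(&cs->lock_bit,
                                               memory_order_acquire))
            return true;       /* bit was off; this thread now holds the lock */
    } while (tries++ < cs->spin_count);
    return false;              /* gave up spinning */
}

/* Turn the lock bit off. A real CS would first check ownership and
 * would notify the kernel if any threads were waiting. */
void sketch_leave(sketch_cs *cs)
{
    atomic_flag_clear_explicit(&cs->lock_bit, memory_order_release);
}
```

With spin_count set to 0, the lock bit is tested once and the function gives up immediately, which corresponds to the single-processor behavior described above.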
Consequently, CSs are efficient on single-processor systems if the CS is likely to be unlocked, as shown by the CS version of Program 9-1. The SMP advantage comes from the fact that the CS can be unlocked by a thread running on a different processor while the waiting thread spins.
The next steps are to show how to set spin counts and how to tune an application by determining the best spin count value. Again, spin counts are useful only on SMP systems; they are ignored on single-processor systems.
Setting the Spin Count
CS spin counts can be set at CS initialization or dynamically. In the first case, replace InitializeCriticalSection with InitializeCriticalSectionAndSpinCount, which adds a count parameter. There is no function dedicated to reading a CS's spin count, however.
BOOL InitializeCriticalSectionAndSpinCount (
   LPCRITICAL_SECTION lpCriticalSection,
   DWORD dwCount)
You can change a spin count at any time. The return value is the CS's previous spin count.
DWORD SetCriticalSectionSpinCount (
   LPCRITICAL_SECTION lpCriticalSection,
   DWORD dwCount)
The Microsoft documentation mentions that the heap manager uses a spin count of roughly 4,000, which suggests a reasonable starting value. The best value is, however, application specific, so spin counts should be adjusted with the application running in a realistic SMP environment. The best values will vary according to the number of processors, the nature of the application, and so on.
TimedMutualExclusionSC is on the book's Web site. It is a variation of the familiar TimedMutualExclusion program, and it includes a spin count argument on the command line. You can run it on your own SMP systems to find a good spin count value for this particular test program, as suggested in Exercise 9-2.
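One simple way to organize such an experiment is to time the program once per candidate spin count and keep the candidate with the shortest elapsed time. The helper below is a hedged sketch of that selection step only. The function name best_spin_count and its parameters are hypothetical, and the timings are assumed to come from your own runs (for example, of TimedMutualExclusionSC) on the target system.

```c
#include <stddef.h>

/* Given n candidate spin counts and the elapsed time measured for each
 * (seconds[i] belongs to counts[i]), return the candidate with the
 * shortest time. Ties keep the earlier candidate. Returns 0 if n is 0. */
unsigned long best_spin_count(const unsigned long counts[],
                              const double seconds[],
                              size_t n)
{
    unsigned long best = 0;
    double best_time = 0.0;

    for (size_t i = 0; i < n; i++) {
        if (i == 0 || seconds[i] < best_time) {
            best_time = seconds[i];
            best = counts[i];
        }
    }
    return best;
}
```

Because timings on a busy SMP system can vary from run to run, it is usually worth repeating each measurement several times before trusting the selected value.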
