[COLUG] Apache 1.3.29 vs Apache 2.0.48

Josh Glover colug at jmglov.net
Tue Mar 16 17:07:32 EST 2004


Quoth Rob Funk:

> tom hanlon wrote:
>
>> The negative is that all modules need to be
>> thread safe. As a non programmer I do not have a grasp on how big of a
>> headache this is.
>
> Huge headache.

In my experience, code that is hard to make thread-safe is not so well
designed. For those who are not familiar with the difference between a
multi-threaded program and a multi-process one, it boils down to this:

Threads share the same address space, and child processes do not. [1]
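
A minimal sketch of that difference, assuming a POSIX system with fork() and
pthreads. The names shared_state, state_after_child_bump and
state_after_thread_bump are made up for illustration:

```c
#include <assert.h>
#include <pthread.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static int shared_state = 3;   /* hypothetical global, for illustration only */

/* Thread body: increments the global in the caller's address space. */
static void *bump(void *unused) {
    (void)unused;
    shared_state++;
    return NULL;
}

/* A child process increments its own copy; returns what the parent sees. */
int state_after_child_bump(void) {
    shared_state = 3;
    pid_t pid = fork();
    assert(pid >= 0);
    if (pid == 0) {            /* child: the write lands in the child's copy */
        shared_state++;
        _exit(0);
    }
    pid_t waited = waitpid(pid, NULL, 0);
    assert(waited == pid); (void)waited;
    return shared_state;       /* parent's copy was never touched */
}

/* A thread increments the same variable; returns what the caller sees. */
int state_after_thread_bump(void) {
    pthread_t tid;
    shared_state = 3;
    int rc = pthread_create(&tid, NULL, bump, NULL);
    assert(rc == 0);
    rc = pthread_join(tid, NULL);
    assert(rc == 0); (void)rc;
    return shared_state;       /* the thread's write is visible here */
}
```

Calling state_after_child_bump() should leave the parent's value at 3, while
state_after_thread_bump() should return 4, because only the thread wrote to
the caller's memory.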

What this means is that multi-process apps must communicate via IPC (Inter
Process Communication) methods such as pipes, shared memory, the filesystem,
etc., whereas multi-threaded apps can communicate via data structures. However,
this ability to share data between threads means race conditions emerge. A
race condition is a point in your program where two (or more) separate threads
access the same data structure, at least one of them writes to it, and the
result depends on the order in which they happen to run. Consider this simple
example: [2]

state = machines[i]->state;
state++;
machines[i]->state = state;

Suppose thread A and thread B both attempt to execute these lines of code. Say
machines[i]->state starts out as 3. One possible outcome is that thread A
fetches the state (state == 3), adds one to it (state == 4), and stores it
(state == 4). Then thread B does the same thing, so machines[i]->state ends up
as 5. But in multi-threaded programming, you must consider *all possible
interleavings* of non-atomic instructions. So another possible outcome is
this:

A: state = machines[i]->state; // 3
B: state = machines[i]->state; // 3
A: state++;                    // 4
A: machines[i]->state = state; // 4
B: state++;                    // 4
B: machines[i]->state = state; // 4

This is a very simple example of a race, but one that illustrates the basic
concept.
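
The two interleavings above can be replayed deterministically without real
threads. This is only a simulation sketch: struct sim_thread, run_step and
replay are hypothetical names, and each simulated thread gets its own private
"state" variable, as the example assumes.

```c
#include <assert.h>
#include <stddef.h>

/* One simulated thread: a private "state" variable and a program counter. */
struct sim_thread {
    int local;   /* the thread's private copy ("state" in the example) */
    int step;    /* which of the three statements runs next: 0, 1 or 2 */
};

/* Run the next statement of thread t against the shared value. */
static void run_step(struct sim_thread *t, int *shared) {
    switch (t->step) {
    case 0: t->local = *shared; break;   /* state = machines[i]->state; */
    case 1: t->local++;         break;   /* state++;                    */
    case 2: *shared = t->local; break;   /* machines[i]->state = state; */
    default: return;                     /* all three already ran       */
    }
    t->step++;
}

/* Replay a schedule such as "ABAABB" and return the final shared value. */
int replay(const char *schedule, int initial) {
    struct sim_thread a = {0, 0}, b = {0, 0};
    int shared = initial;
    for (size_t k = 0; schedule[k] != '\0'; k++) {
        assert(schedule[k] == 'A' || schedule[k] == 'B');
        run_step(schedule[k] == 'A' ? &a : &b, &shared);
    }
    return shared;
}
```

Starting from 3, replay("AAABBB", 3) gives 5 (A finishes before B starts),
while replay("ABAABB", 3) gives 4, which is the lost update shown above.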

To prevent races, you must have some way of locking a variable (or data
structure) so that one thread can execute a series of instructions that must
be atomic. Luckily, modern operating systems give us synchronisation
constructs that allow us to write thread-safe code.
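
As one possible sketch, here is the increment from the example guarded by a
POSIX mutex. It assumes pthreads is available. The single-entry machines
array, state_lock, worker, run_workers and MAX_THREADS are illustrative names,
not anything from Apache:

```c
#include <assert.h>
#include <pthread.h>

#define MAX_THREADS 16          /* arbitrary cap for this sketch */

struct machine { int state; };

static struct machine the_machine = { 0 };
static struct machine *machines[1] = { &the_machine };
static pthread_mutex_t state_lock = PTHREAD_MUTEX_INITIALIZER;

struct job { int iterations; };

/* Each worker runs the three-line increment, holding the lock throughout. */
static void *worker(void *arg) {
    const struct job *job = arg;
    for (int n = 0; n < job->iterations; n++) {
        pthread_mutex_lock(&state_lock);
        int state = machines[0]->state;
        state++;
        machines[0]->state = state;
        pthread_mutex_unlock(&state_lock);
    }
    return NULL;
}

/* Start nthreads workers, wait for all of them, return the final state. */
int run_workers(int nthreads, int iterations) {
    pthread_t tids[MAX_THREADS];
    struct job job = { iterations };
    assert(nthreads >= 0 && nthreads <= MAX_THREADS);
    machines[0]->state = 0;
    for (int t = 0; t < nthreads; t++) {
        int rc = pthread_create(&tids[t], NULL, worker, &job);
        assert(rc == 0); (void)rc;
    }
    for (int t = 0; t < nthreads; t++) {
        int rc = pthread_join(tids[t], NULL);
        assert(rc == 0); (void)rc;
    }
    return machines[0]->state;
}
```

With the lock held across all three statements, no other worker can read the
state between the fetch and the store, so run_workers(4, 10000) should always
come back as 40000.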

Writing thread-safe code is a huge pain when you have a lot of global data
structures. If you write modular code, you will find thread safety not so
difficult to implement. In fact, Java makes it downright simple: there is a
keyword ("synchronized") that you add to a method, and then locking the
object's mutex around each call to that method is done for you.
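
C has no such keyword, but a rough analogue of the modular approach is to keep
the mutex inside the data structure and only touch the data through functions
that take the lock. A sketch, assuming pthreads; the machine_* names are
invented for illustration:

```c
#include <assert.h>
#include <pthread.h>

/* The lock travels with the data it protects. */
struct machine {
    pthread_mutex_t lock;
    int state;
};

/* Set up the lock and the initial state. */
void machine_init(struct machine *m, int initial) {
    int rc = pthread_mutex_init(&m->lock, NULL);
    assert(rc == 0); (void)rc;
    m->state = initial;
}

/* Atomically move to the next state and return it. */
int machine_advance(struct machine *m) {
    pthread_mutex_lock(&m->lock);
    int next = ++m->state;
    pthread_mutex_unlock(&m->lock);
    return next;
}

/* Read the current state under the lock. */
int machine_state(struct machine *m) {
    pthread_mutex_lock(&m->lock);
    int current = m->state;
    pthread_mutex_unlock(&m->lock);
    return current;
}

/* Release the lock's resources once no thread uses the machine any more. */
void machine_destroy(struct machine *m) {
    pthread_mutex_destroy(&m->lock);
}
```

Callers never see the mutex, so they cannot forget to take it. That is the
design point: the locking lives in one small module instead of being spread
across every piece of code that touches a global.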

Multi-threaded code *is* faster, if done right, because it tends to keep the
data structures "cache-hot", and IPC has more overhead than direct
communication.

-Josh


[1] Of course, Linux (and probably other Unices) has a performance hack called
    "copy on write", where child processes start out sharing the parent's
    memory pages until either the parent or the child writes to one of them,
    at which time that page is copied for the writer. However, this does
    not change the way multi-process programs work.
[2] Which could also be written as machines[i]->state++, but that does nothing
    to alleviate the race. Three lines of code just make the race easier to
    see.

-- 
Josh Glover

GPG keyID 0xDE8A3103 (C3E4 FA9E 1E07 BBDB 6D8B  07AB 2BF1 67A1 DE8A 3103)
gpg --keyserver pgp.mit.edu --recv-keys DE8A3103

