5.3 Synchronization
My
shelves are overflowing with books, including many duplicate books,
out-of-date books, and books I haven't looked at for
10 years and probably never will again. Over the years, these books
have cost me tens of thousands of dollars, maybe more, to acquire. By
contrast, two blocks down the street from my apartment,
you'll find the Central Brooklyn Public Library. Its
shelves are also overflowing with books; and over its 150 years,
it's spent millions on its collection. But the
difference is that its books are shared among all the residents of
Brooklyn, and consequently the books have very high turnover. Most
books in the collection are used several times a year. Although the
public library spends a lot more money buying and storing books than
I do, the cost per page read is much lower at the library than for my
personal shelves. That's the advantage of a shared
resource.

Of course, there are disadvantages to shared resources, too. If I
need a book from the library, I have to walk over there. I have to
find the book I'm looking for on the shelves. I have
to stand in line to check the book out, or else I have to use it
right there in the library rather than bringing it home with me.
Sometimes, somebody else has checked the book out, and I have to fill
out a reservation slip requesting that the book be saved for me when
it's returned. And I can't write
notes in the margins, highlight paragraphs, or tear pages out to
paste on my bulletin board. (Well, I can, but if I do, it
significantly reduces the usefulness of the book for future
borrowers; and if the library catches me, I may lose my borrowing
privileges.) There's a significant time and
convenience penalty associated with borrowing a book from the library
rather than purchasing my own copy, but it does save me money and
storage space.

A thread is like a borrower at a
library; the thread borrows from a central pool of resources. Threads
make programs more efficient by sharing memory, file handles,
sockets, and other resources. As long as two threads
don't want to use the same resource at the same
time, a multithreaded program is much more efficient than the
multiprocess alternative, in which each process has to keep its own
copy of every resource. The downside of a multithreaded program is
that if two threads want the same resource at the same time, one of
them will have to wait for the other to finish. If one of them
doesn't wait, the resource may get corrupted.
Let's look at a specific example. Consider the
run( ) method of Example 5-1 and Example 5-2. As
previously mentioned, the method builds the result as a
String, and then prints the
String on the console using one call to
System.out.println( ). The output looks like this:
DigestThread.java: 69 101 80 -94 -98 -113 29 -52 -124 -121 -38 -82 39 -4 8 -38 119 96 -37 -99
DigestRunnable.java: 61 116 -102 -120 97 90 53 37 -14 111 -60 -86 -112 124 -54 111 114 -42 -36 -111
DigestThread.class: -62 -99 -39 -19 109 10 -91 25 -54 -128 -101 17 13 -66 119 25 -114 62 -21 121
DigestRunnable.class: 73 15 7 -122 96 66 -107 -45 69 -36 86 -43 103 -104 25 -128 -97 60 14 -76

Four threads run in parallel to produce this output. Each writes one
line to the console. The order in which the lines are written is
unpredictable because thread scheduling is unpredictable, but each
line is written as a unified whole. Suppose, however, we used this
variation of the run( ) method, which, rather than
storing intermediate parts of the result in the
String variable result, simply
prints them on the console as they become available:
public void run() {
  try {
    FileInputStream in = new FileInputStream(input);
    MessageDigest sha = MessageDigest.getInstance("SHA");
    DigestInputStream din = new DigestInputStream(in, sha);
    int b;
    while ((b = din.read()) != -1) ;  // read the whole file to update the digest
    din.close();
    byte[] digest = sha.digest();
    // Each print() call below is a separate write to the shared System.out
    System.out.print(input + ": ");
    for (int i = 0; i < digest.length; i++) {
      System.out.print(digest[i] + " ");
    }
    System.out.println();
  }
  catch (IOException ex) {
    System.err.println(ex);
  }
  catch (NoSuchAlgorithmException ex) {
    System.err.println(ex);
  }
}

When you run the program on the same input, the output looks
something like this:
DigestRunnable.class: 73 15 7 -122 96 66 -107 -45 69 -36 86 -43 103 -104 25
-128 DigestRunnable.java: DigestThread.class: DigestThread.java:
61 -62 69 116 -99 101 -102 -39 80 -120 -19 -94 97 109 -98 90 -97 10 -113 53
60 -91 29 37 14 25 -52 -14 -76 -54 -124 111
-128 -121 -60 -101 -38 -86 17 -82 -112 13 39 124 -66 -4 -54 119 8 111 25 -38
114 -114 119 -42 62 96 -36 -21 -37 -111 121 -99

The digests of the different files are all mixed up!
There's no telling which number belongs to which
digest. Clearly, this is a problem.

The reason this mix-up occurs is that System.out
is shared between the four different threads. When one thread starts
writing to the console through several System.out.print() statements, it may not finish all its writes before
another thread breaks in and starts writing its output. The exact
order in which one thread preempts the other threads is
indeterminate. You'll probably see slightly
different output every time you run this program.

We need a way to assign exclusive access to a shared resource to one
thread for a specific series of statements. In this example, that
shared resource is System.out, and the statements
that need exclusive access are:
System.out.print(input + ": ");
for (int i = 0; i < digest.length; i++) {
  System.out.print(digest[i] + " ");
}
System.out.println();
5.3.1 Synchronized Blocks
Java's means of
assigning exclusive access to an object is the
synchronized keyword. To indicate that these five
lines of code should be executed together, wrap them in a
synchronized block that synchronizes on the
System.out object, like this:
synchronized (System.out) {
  System.out.print(input + ": ");
  for (int i = 0; i < digest.length; i++) {
    System.out.print(digest[i] + " ");
  }
  System.out.println();
}

Once one thread starts printing out the values, all other threads
will have to stop and wait for it to finish before they can print out
their values. Synchronization is only a partial lock on an object.
Other methods can use the synchronized object if they do so blindly,
without attempting to synchronize on the object. For instance, in
this case, there's nothing to prevent an unrelated
thread from printing on System.out if it
doesn't also try to synchronize on
System.out. Java provides no means to stop all
other threads from using a shared resource. It can only prevent other
threads that synchronize on the same object from using the shared
resource.
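As a self-contained sketch of the synchronized block above, the following program has several threads print multi-part lines to one shared PrintStream. To make the result checkable, it writes to an in-memory stream rather than System.out. The class name SyncBlockDemo, the method printAll, and the sample values are assumptions made for this illustration; they are not part of Example 5-1 or Example 5-2.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;

// Hypothetical demo class; not part of the chapter's examples.
class SyncBlockDemo {

  // Starts one thread per name. Each thread prints its name and then
  // `count` numbers using separate print() calls, all inside a block
  // synchronized on the shared stream. Returns everything printed.
  static String printAll(String[] names, int count) {
    ByteArrayOutputStream buffer = new ByteArrayOutputStream();
    PrintStream shared = new PrintStream(buffer);
    Thread[] threads = new Thread[names.length];
    for (int t = 0; t < names.length; t++) {
      final String name = names[t];
      threads[t] = new Thread(() -> {
        synchronized (shared) {  // only one thread at a time gets past here
          shared.print(name + ": ");
          for (int i = 0; i < count; i++) {
            shared.print(i + " ");
          }
          shared.print("\n");
        }
      });
      threads[t].start();
    }
    for (Thread thread : threads) {
      try {
        thread.join();  // wait until every thread has finished printing
      } catch (InterruptedException ex) {
        Thread.currentThread().interrupt();
        throw new IllegalStateException(ex);
      }
    }
    shared.flush();
    return buffer.toString();
  }

  public static void main(String[] args) {
    String[] names = {"a.txt", "b.txt", "c.txt", "d.txt"};
    System.out.print(printAll(names, 5));
  }
}
```

The order of the lines can still differ from run to run, but each line comes out whole. A thread that called shared.print() without synchronizing on shared would not be held back by this block.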
Synchronization must be considered any time multiple threads share
resources. These threads may be instances of the same
Thread subclass or use the same
Runnable class, or they may be instances of
completely different classes. The key is the resources they share,
not what classes they are. In Java, all resources are represented by
objects that are instances of particular classes. Synchronization
becomes an issue only when two threads both possess references to the
same object. In the previous example, the problem was that several
threads had access to the same PrintStream object,
System.out. In this case, it was a static class
variable that led to the conflict. However, instance variables can
also have problems.

For example, suppose your web server keeps a log file. The log file
may be represented by a class like the one shown in Example 5-12. This class itself doesn't
use multiple threads. However, if the web server uses multiple
threads to handle incoming connections, then each of those threads
will need access to the same log file and consequently to the same
LogFile object.
Example 5-12. LogFile
import java.io.*;
import java.util.*;

public class LogFile {

  private Writer out;

  public LogFile(File f) throws IOException {
    FileWriter fw = new FileWriter(f);
    this.out = new BufferedWriter(fw);
  }

  public void writeEntry(String message) throws IOException {
    // Four separate writes; another thread can run between any two of them
    Date d = new Date();
    out.write(d.toString());
    out.write('\t');
    out.write(message);
    out.write("\r\n");
  }

  public void close() throws IOException {
    out.flush();
    out.close();
  }

  // Note: finalize() is deprecated since Java 9; call close() explicitly
  protected void finalize() {
    try {
      this.close();
    }
    catch (IOException ex) {
    }
  }
}

In this class, the writeEntry() method finds the current date and time,
then writes into the underlying file using four separate invocations
of out.write( ). A problem occurs if two or more
threads each have a reference to the same LogFile
object and one of those threads interrupts another in the process of
writing the data. One thread may write the date and a tab, then the
next thread might write three complete entries; then, the first
thread could write the message, a carriage return, and a linefeed.
The solution, once again, is synchronization. However, here there are
two good choices for which object to synchronize on. The first choice
is to synchronize on the Writer object
out. For example:
public void writeEntry(String message) throws IOException {
  synchronized (out) {
    Date d = new Date();
    out.write(d.toString());
    out.write('\t');
    out.write(message);
    out.write("\r\n");
  }
}

This works because all the threads that use this
LogFile object also use the same
out object that's part of that
LogFile. It doesn't matter that
out is private. Although it is used by the other
threads and objects, it's referenced only within the
LogFile class. Furthermore, although
we're synchronizing here on the
out object, it's the
writeEntry( ) method that needs to be protected
from interruption. The Writer classes all have
their own internal synchronization, which protects one thread from
interfering with a write( ) method in another
thread. (This is not true of input and output streams, with the
exception of PrintStream. It is possible for a
write to an output stream to be interrupted by another thread.) Each
Writer class has a lock field
that specifies the object on which writes to that writer synchronize.

The second possibility is to synchronize on the
LogFile object itself. This is simple enough to
arrange with the this keyword. For example:
public void writeEntry(String message) throws IOException {
  synchronized (this) {
    Date d = new Date();
    out.write(d.toString());
    out.write('\t');
    out.write(message);
    out.write("\r\n");
  }
}
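To see this version in use, here is a minimal sketch in which several threads share one log object. It is a stand-in for Example 5-12 built on assumptions: the class name SharedLog and the helper method run are hypothetical, the log writes to an in-memory StringWriter instead of a file, and the date is replaced by a source label so the output is predictable.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.Writer;

// Hypothetical stand-in for LogFile that writes to memory.
class SharedLog {

  private final Writer out = new StringWriter();

  public void writeEntry(String source, String message) throws IOException {
    synchronized (this) {  // lock the log object itself
      out.write(source);
      out.write('\t');
      out.write(message);
      out.write("\r\n");
    }
  }

  public String contents() {
    synchronized (this) {
      return out.toString();
    }
  }

  // Starts `threads` threads that each write `entries` entries to one log.
  static String run(int threads, int entries) {
    SharedLog log = new SharedLog();
    Thread[] workers = new Thread[threads];
    for (int t = 0; t < threads; t++) {
      final String source = "thread-" + t;
      workers[t] = new Thread(() -> {
        try {
          for (int i = 0; i < entries; i++) {
            log.writeEntry(source, "request " + i);
          }
        } catch (IOException ex) {
          throw new IllegalStateException(ex);
        }
      });
      workers[t].start();
    }
    for (Thread worker : workers) {
      try {
        worker.join();
      } catch (InterruptedException ex) {
        Thread.currentThread().interrupt();
        throw new IllegalStateException(ex);
      }
    }
    return log.contents();
  }

  public static void main(String[] args) {
    System.out.print(run(3, 2));
  }
}
```

Entries from different threads may appear in any order, but because every write to the log goes through the same lock, no entry is split by another.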
5.3.2 Synchronized Methods
Since synchronizing the entire method
body on the object itself is such a common thing to do, Java provides
a shortcut. You can synchronize an entire method on the current
object (the this reference) by adding the
synchronized modifier to the method declaration.
For example:
public synchronized void writeEntry(String message)
    throws IOException {
  Date d = new Date();
  out.write(d.toString());
  out.write('\t');
  out.write(message);
  out.write("\r\n");
}

Simply adding the synchronized modifier to all
methods is not a catchall solution for synchronization problems. For
one thing, it exacts a severe performance penalty in many VMs (though
more recent VMs have improved greatly in this respect), potentially
slowing down your code by a factor of three or more. Second, it
dramatically increases the chances of deadlock. Third, and most
importantly, it's not always the object itself you
need to protect from simultaneous modification or access, and
synchronizing on the instance of the method's class
may not protect the object you really need to protect. For instance,
in this example, what we're really trying to prevent
is two threads simultaneously writing onto out. If
some other class had a reference to out completely
unrelated to the LogFile, this attempt would fail.
However, in this example, synchronizing on the
LogFile object is sufficient because
out is a private instance variable. Since we never
expose a reference to this object, there's no way
for any other object to invoke its methods except through the
LogFile class. Therefore, synchronizing on the
LogFile object has the same effect as
synchronizing on out.
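The caveat about a reference to out held elsewhere can be made concrete. In the following sketch, which uses assumed names rather than code from Example 5-12, two separate TaggedLog objects wrap the same Writer. Synchronizing each on this would give each object its own lock, so the sketch synchronizes on the shared Writer instead.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.Writer;

// Hypothetical class: several instances may wrap the same Writer.
class TaggedLog {

  private final Writer out;
  private final String tag;

  TaggedLog(Writer out, String tag) {
    this.out = out;
    this.tag = tag;
  }

  void writeEntry(String message) throws IOException {
    // Both logs share `out`, so `out` is the one lock they have in common.
    synchronized (out) {
      out.write(tag);
      out.write('\t');
      out.write(message);
      out.write("\r\n");
    }
  }

  // Two logs, one shared writer, one thread per log.
  static String run(int entries) {
    Writer shared = new StringWriter();
    TaggedLog[] logs = {
      new TaggedLog(shared, "access"),
      new TaggedLog(shared, "error")
    };
    Thread[] workers = new Thread[logs.length];
    for (int t = 0; t < logs.length; t++) {
      final TaggedLog log = logs[t];
      workers[t] = new Thread(() -> {
        try {
          for (int i = 0; i < entries; i++) {
            log.writeEntry("entry " + i);
          }
        } catch (IOException ex) {
          throw new IllegalStateException(ex);
        }
      });
      workers[t].start();
    }
    for (Thread worker : workers) {
      try {
        worker.join();
      } catch (InterruptedException ex) {
        Thread.currentThread().interrupt();
        throw new IllegalStateException(ex);
      }
    }
    return shared.toString();
  }

  public static void main(String[] args) {
    System.out.print(run(3));
  }
}
```

Had writeEntry been declared synchronized, the two threads would hold different locks (one per TaggedLog) and nothing would stop their writes from interleaving on the shared Writer.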
5.3.3 Alternatives to Synchronization
Synchronization is not always the best
solution to the problem of inconsistent behavior caused by thread
scheduling. There are a number of techniques that avoid the need for
synchronization entirely. The first is to use local variables instead
of fields wherever possible. Local variables do not have
synchronization problems. Every time a method is entered, the virtual
machine creates a completely new set of local variables for the
method. These variables are invisible from outside the method and are
destroyed when the method exits. As a result, it's
impossible for one local variable to be used in two different
threads. Every thread has its own separate set of local variables.

Method arguments of primitive types are also safe from modification
in separate threads because Java passes arguments by value rather
than by reference. A corollary of this is that methods such as
Math.sqrt( ) that simply take zero or more
primitive data type arguments, perform some calculation, and return a
value without ever interacting with the fields of any class are
inherently thread-safe. These methods often either are or should be
declared static.

Method arguments of object types are a little trickier because the
actual argument passed by value is a reference to the object.
Suppose, for example, you pass a reference to an array into a
sort( ) method. While the method is sorting the
array, there's nothing to stop some other thread
that also has a reference to the array from changing the values in
the array.

String arguments are safe because
they're immutable; that is,
once a String object has been created, it cannot
be changed by any thread. An immutable object never changes state.
The values of its fields are set once when the constructor runs and
never altered thereafter. StringBuffer arguments
are not safe because they're not immutable; they can
be changed after they're created.

A constructor normally does not have to worry about issues of thread
safety. Until the constructor returns, no thread has a reference to
the object, so it's impossible for two threads to
have a reference to the object. (The most likely issue is if a
constructor depends on another object in another thread that may
change while the constructor runs, but that's
uncommon. There's also a potential problem if a
constructor somehow passes a reference to the object
it's creating into a different thread, but this is
also uncommon.)

You can take advantage of immutability in your own classes.
It's often the easiest way to make a class
thread-safe, often much easier than determining exactly which methods
or code blocks to synchronize. To make an object immutable, simply
declare all its fields private and don't write any
methods that can change them. A lot of classes in the core Java
library are immutable, for instance,
java.lang.String,
java.lang.Integer,
java.lang.Double, and many more. This makes these
classes less useful for some purposes, but it does make them a lot
more thread-safe.

A third technique is to use a thread-unsafe class but only as a
private field of a class that is thread-safe. As long as the
containing class accesses the unsafe class only in a thread-safe
fashion and as long as it never lets a reference to the private field
leak out into another object, the class is safe. An example of this
technique might be a web server that uses an unsynchronized
LogFile class but gives each separate thread its
own separate log so no resources are shared between the individual
threads.
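This third technique can be sketched as follows. The class name HitCounter and its methods are invented for this illustration. It keeps a java.util.HashMap, which is not thread-safe, in a private field, touches it only inside synchronized methods, and hands out copies rather than the map itself so that no reference leaks out.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical example class.
class HitCounter {

  // HashMap is not thread-safe; it is only ever touched while
  // holding this HitCounter's lock.
  private final Map<String, Integer> hits = new HashMap<>();

  public synchronized void record(String page) {
    hits.merge(page, 1, Integer::sum);
  }

  public synchronized int count(String page) {
    return hits.getOrDefault(page, 0);
  }

  // Returns a copy, never the private map itself.
  public synchronized Map<String, Integer> snapshot() {
    return new HashMap<>(hits);
  }

  // Starts `threads` threads that each record `hitsPerThread` hits.
  static HitCounter run(int threads, int hitsPerThread) {
    HitCounter counter = new HitCounter();
    Thread[] workers = new Thread[threads];
    for (int t = 0; t < threads; t++) {
      workers[t] = new Thread(() -> {
        for (int i = 0; i < hitsPerThread; i++) {
          counter.record("/index.html");
        }
      });
      workers[t].start();
    }
    for (Thread worker : workers) {
      try {
        worker.join();
      } catch (InterruptedException ex) {
        Thread.currentThread().interrupt();
        throw new IllegalStateException(ex);
      }
    }
    return counter;
  }

  public static void main(String[] args) {
    System.out.println(run(4, 1000).count("/index.html"));
  }
}
```

Because callers only ever receive a copy from snapshot(), changing that copy has no effect on the counter's private map.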