11.3 Garbage Collection
As explained in Chapter 4, JavaScript uses
garbage collection
to reclaim the memory occupied by strings, objects, arrays, and
functions that are no longer in use. This frees you, the programmer,
from having to explicitly deallocate memory yourself and is an
important part of what makes JavaScript programming easier than, say,
C programming.
A key feature of garbage collection is that the garbage collector
must be able to determine when it is safe to reclaim memory.
Obviously, it must never reclaim values that are still in use and
should collect only values that are no longer reachable; that is,
values that cannot be referred to through any of the variables,
object properties, or array elements in the program. If you are the
curious type, you may be wondering just how a garbage collector
distinguishes between garbage to be collected and values that are
still being used or that could potentially be used. The following
sections explain some of the gory details.
11.3.1 Mark-and-Sweep Garbage Collection
The computer science literature on
garbage collection is large and
technical; the actual operation of the garbage collector is really an
implementation-specific detail that may vary in different
implementations of the language. Still, almost all serious garbage
collectors use some variation on a basic garbage-collection algorithm
known as "mark and sweep."
A mark-and-sweep garbage collector periodically traverses the list of
all variables in the JavaScript
environment and marks any values referred to by these variables. If
any referenced values are objects or arrays, it recursively marks the
object properties and array elements. By recursively traversing this
tree or graph of values, the garbage collector is able to find (and
mark) every single value that is still reachable. It follows, then,
that any unmarked values are unreachable and are therefore garbage.
Once a mark-and-sweep garbage collector has finished marking all
reachable values, it begins its sweep phase. During this phase, it
looks through the list of all values in the environment and
deallocates any that are not marked. Classic mark-and-sweep garbage
collectors do a complete mark and a complete sweep all at once, which
causes a noticeable slowdown in the system during garbage collection.
More sophisticated variations on the algorithm make the process
relatively efficient and perform collection in the background,
without disrupting system performance.
The details of garbage collection are implementation-specific, and
you should not need to know anything about the garbage collector to
write JavaScript programs. All modern JavaScript implementations use
some kind of mark-and-sweep garbage collection. However, JavaScript
1.1, as implemented in
Netscape 3, used a somewhat simpler
garbage-collection scheme that has some shortcomings. If you are
writing JavaScript code to be compatible with Netscape 3, the
following section explains the shortcomings of the garbage collector
in that browser. Netscape 2 used an even simpler garbage-collection
technique with serious flaws. Since that browser is now entirely
obsolete, the details are not described here.
11.3.2 Garbage Collection by Reference Counting
In JavaScript 1.1,
as implemented in Netscape 3,
garbage collection is
performed by reference counting. This means that
every object (whether a user object created
by JavaScript code or a built-in HTML object created by the browser)
keeps track of the number of references to it. Recall that objects
are assigned by reference in JavaScript, rather than having their
complete values copied.
When an object is created and a reference to it is stored in a
variable, the object's reference count is one. When the
reference to the object is copied and stored in another variable, the
reference count is incremented to two. When one of the two variables
that holds these references is overwritten with some new value, the
object's reference count is decremented back to one. If the
reference count reaches zero, there are no more references to the
object. Since there are no references to copy, there can never again
be a reference to the object in the program. Therefore, JavaScript
knows that it is safe to destroy the object and garbage collect the
memory associated with it.
Unfortunately, there are shortcomings to using reference counting as
a garbage-collection scheme. In fact, some people don't even
consider reference counting to be true garbage collection and reserve
that term for better algorithms, such as mark-and-sweep garbage
collection. Reference counting is a simple form of garbage collection
that is easy to implement and works fine in many situations. There is
an important situation, however, in which reference counting cannot
correctly detect and collect all garbage, and you need to be aware of
it.
The
basic flaw with reference counting has to do with cyclical
references. If object A contains a reference to object B
and object B contains a reference to object A, a cycle of references
exists. A cycle would also exist, for example, if A referred to B, B
referred to C, and C referred back to A. In cycles such as these,
there is always a reference from within the cycle to every element in
the cycle. Thus, even if none of the elements of the cycle has any
remaining outside references, their reference counts will never drop
below one and they can never be garbage collected. The entire cycle
may be garbage if there is no way to refer to any of these objects
from a program, but because they all refer to each other, a
reference-counting garbage collector cannot detect and free this
unused memory.
This problem with cycles is the price that must be paid for a simple
garbage-collection scheme. The only way to prevent this problem is by
manual intervention. If you create a cycle of objects, you must
recognize this fact and take steps to ensure that the objects are
garbage collected when they are no longer
needed. To allow a cycle of objects
to be garbage collected, you must break the cycle. You can do this by
picking one of the objects in the cycle and setting the property of
it that refers to the next object to null. For
example, suppose that A, B, and C are objects that each have a
next property, and the
value of this property is set so that these objects refer to each
other and form a cycle. When these objects are no longer in use, you
can break the cycle by setting A.next to
null. This means that object B no longer has a
reference from A, so its reference count can drop to zero and it can
be garbage collected. Once it has been garbage collected, it will no
longer refer to C, so C's reference count can drop to zero and
it can be garbage collected. Once C is garbage collected, A can
finally be garbage collected.
Note, of course, that none of this
can happen if A, B, and C are stored in global variables in a window that is still
open, because those variables A, B, and C still refer to the objects.
If these were local variables in a function and you
broke their cycle before the function returned, they could be garbage
collected. But if they are stored in global variables, they remain
referenced until the window that contains them closes. In this case,
if you want to force them to be garbage collected, you must break the
cycle and set all the variables to null:
A.next = null; // Break the cycle
A = B = C = null; // Remove the last remaining external references