6.2 Railroad Diagrams
In the film Planes, Trains, and Automobiles,
characters played by Steve Martin and John Candy are faced with one
improbable obstacle after another as they struggle to arrive home in
time for the Thanksgiving holiday. Having compared this chapter with
a sea voyage, it's reminiscent of that film to
consider yet another mode of transportation, railways, as a means of
understanding the SELinux policy language.However, unlike many of the decisions of the film characters, my
decision to introduce railroad diagrams is not capricious. Such
diagrams were used in the 1970s by famous computer scientist Niklaus
Wirth to
develop and explain Pascal, one of the most successful programming
languages. Since then, they've been used to explain
many other programming languages. Although they can be cumbersome to
create, they're quick to learn as well as easy to
read and understand, so they're just about ideal as
a means of explanation. Let's further mix our
metaphors by diving into an exposition of railroad diagrams.
6.2.1 What Railroad Diagrams Do
Railroad diagrams are also known as syntax diagrams or
syntax charts. They present the grammar of a
formal language, such as one used for programming. However, formal
languages also underlie the files used to configure systems and
applications, such as the files that specify the SELinux security
policy, so these diagrams are well suited to our immediate purpose.Railroad diagrams specify two kinds of symbols:Literal
A literal is a symbol that consists of
one or more specific characters. Literals are generally punctuation
marks, operators, or keywords of some sort.
Replaceable text
Replaceable text consists of text
that has variable content.
These definitions will become clearer in the context of several small
examples given in the following section.
6.2.2 How Railroad Diagrams Work
Figure 6-1 shows a railroad diagram that defines a
literal representing the letter
"a."
Figure 6-1. The letter "a"

The line guides you in reading the
railroad diagram. If you're disappointed that the
line doesn't more closely resemble a railroad track,
I apologize. But, it's customary to draw the line in
the simple fashion shown in the figure. You read the railroad diagram
by following the line from left to right.
Oval
The oval represents a literalthat
is, a specific character, namely the letter
"a."
Literal symbols (text that appears in the file exactly as shown in
the diagram) appear in lightly shaded boxes, while replaceable
symbols (which should be replaced with appropriate values by the
administrator) appear in darkly shaded boxes.One way to use a railroad diagram is as a means of parsing sentences
following the grammar represented by the diagram. To do so, follow
the line from left to right and attempt to match each symbol you
encounter with a corresponding token in the sentence. If you can do
so, the sentence is grammatical; otherwise it's not.The railroad diagram for the letter
"a" is trivially simple, and
therefore it is not much fun or valuable for practice. So,
let's consider Figure 6-2, which
presents a somewhat more sophisticated sentence, one that has the
same form as an SELinux security policy attribute declaration. This
sentence consists of three components that must appear in the
indicated order:A literal representing the keyword attribute
Replaceable text known as an id
A literal representing a semicolon
Figure 6-2. Attribute declaration

doesn't definethe replaceable text
id
, which would be defined by another
diagram. I'll present a diagram defining
id
shortly. For now,
let's simply understand
id
as representing an identifier of the
sort used in many programming languages, consisting of a letter
followed by zero or more letters or digits.Given our understanding of the replaceable text
id
, our railroad diagram tells us that
sentences such as the following are grammatical attribute
declarations:
attribute x;Similarly, our railroad diagram tells us that sentences such as the
attribute xyz;
attribute xyz123;
following are not grammatical attribute declarations:
attribute x # lacks final semicolonLet's now consider a somewhat more complex railroad
attrib x; # abbreviates required literal "attribute"
attribute 123; # contains integer rather than id
diagram, shown in Figure 6-3, which represents a
digit. Notice how multiple tracks branch off the main line, so that
you can completely traverse the railroad diagram by matching any of
the ten literals representing a digit.
Figure 6-3. Digit

realize that the syntax represented by Figure 6-3
could easily be represented by the regular expression:
[0123456789]or, more concisely:
[0-9]Let's now consider a set of three railroad diagrams
that, together with Figure 6-3, define the
composition of a signed integer. Figure 6-4 tells
us that a signed integer consists of a sign, consisting of the
literal +
or -
, followed by
replaceable text named Unsigned_Integer
, which
obviously represents an unsigned integer.
Figure 6-4. Signed_Integer

Unsigned_integer
in terms of
Digit
, the replaceable text defined in Figure 6-3. In Figure 6-5, notice the
track that leads from the right side of the second instance of
Digit
to the left side of the same instance. This
track makes it possible to include multiple occurrences of the second
instance of Digit
. The railroad diagram tells us
that an unsigned integer consists of a digit, followed by zero or
more digits. Put more plainly, it tells us that an unsigned integer
consists of one or more digits.
Figure 6-5. Unsigned_Integer

regular expression:
[+-]\d\d*or the equivalent:
[+-]\d+Figure 6-6 puts together the definitions of
Signed_Integer
and
Unsigned_Integer
, telling us that an
Integer
consists of either a
Signed_Integer
or an
Unsigned_Integer
.
Figure 6-6. Integer

understand the railroad diagrams we'll use to
describe the SELinux security policy language. But,
let's consider a couple additional railroad diagrams
for good measure. Figure 6-7 shows another way of
defining an Integer
, one that consists of only a
single railroad diagram. I suggest that you study the figure and
convince yourself that the definition it offers is indeed equivalent
to the previous definition that required three distinct railroad
diagrams. Of course, like the three-diagram definition, the newer one
references the diagram defining a Digit
. So,
it's not fully self-contained. But, including a
definition of Digit
rather than a reference to
Digit
in the newer definition would hopelessly
clutter the definition.
Figure 6-7. Another way of specifying an Integer

given in Figure 6-8. This railroad diagram defines
the composition of an Identifier
, which consists
of a letter, followed by zero or more digits, letters, or
underscores, which can be freely intermingled. I trust that
it's evident that the railroad is much faster and
easier to read and understand than the equivalent English sentence.
Figure 6-8. Identifier

Letter
, which is used in the railroad diagram, is
defined neither by the diagram nor earlier in this chapter. I could
give a railroad diagram defining this replaceable text, but it would
be rather badly cluttered, since a letter can be any one of 52
literals: the lowercase and uppercase letters of the Roman alphabet.