Linux Server Security (2nd Edition)

10.3. Web Content

After you've thoroughly secured Apache's configuration,
you can finally deal with
web content.

10.3.1. Static Content

Static content includes HTML, JavaScript,
Flash, images, and other files that are served directly by the web
server without interpretation. The files and their directories need
to be readable by the user ID running Apache
(apache, in our examples).

Static files don't pose much of a security threat on
the server side. The web server just reads them and sends them to the
requesting browser. Although there are many security issues with web
browsers, client security is outside the scope of this chapter. Watch
your browser vendor's web site for security news,
patches, and new versions.

10.3.2. Dynamic Content: Server-Side Includes (SSI)

A
step up from purely static pages, server-side
includes allow inclusion of other static content, special
dynamic content such as file-modification times, and even the output
from the execution of external programs. Unlike CGI scripts, there is
no way to pass input arguments to an SSI page.

10.3.2.1 SSI configuration

Apache needs to be told that an SSI file is not a lump of inert HTML,
but should be parsed for SSI directives. First, check that includes
are permitted for at least some files in this directory. Add this to
httpd.conf or access.conf:

<Location /ssi_dir>
Options IncludesNoExec
</Location>

One way to differentiate HTML from SSI files is to use a special
suffix such as .shtml and associate it with
Apache's built-in MIME type for parsable content:

AddType application/x-server-parsed .shtml

or just assign the Apache handler directly:

AddHandler server-parsed .shtml

Using this tells the world that your pages use server-side includes.
If you'd like to conceal this fact, use another
suffix. One trick I've seen is to use
.html for static text and .htm
for SSI text:

AddHandler server-parsed .htm

A little-known feature of Apache is its ability to use the execute
bit of a file to indicate that it should be parsed.
I've used this to mix static and parsed HTML files
in the same directory with the same suffix. The directive is as
follows:

<Location /ssi_dir>
Options +IncludesNoExec
XBitHack full
</Location>

The extra attribute full tells Apache to also test the
group-execute bit: if it is set, Apache sends a Last-Modified header
based on the file's own modification time, which lets clients and
proxies cache the result. To change an HTML file into an SSI file,
make it executable:

chmod +x changeling.html

A visitor to the web site can't tell if the file is
plain HTML or SSI.
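Since the suffix no longer tells you which files are parsed, it can help to audit a directory by permission bits instead. The following is a minimal sketch, assuming XBitHack keys on the user-execute bit of HTML files; the function names are my own and not part of Apache.

```python
import os
import stat


def would_be_parsed(path):
    """Return True if path is a regular file with the user-execute bit set.

    With XBitHack enabled, that bit is what marks an HTML file as SSI.
    """
    mode = os.stat(path).st_mode
    return stat.S_ISREG(mode) and bool(mode & stat.S_IXUSR)


def list_parsed_files(directory):
    """List the HTML files in directory that would be parsed for SSI."""
    parsed = []
    for name in sorted(os.listdir(directory)):
        full = os.path.join(directory, name)
        if name.endswith((".html", ".htm")) and would_be_parsed(full):
            parsed.append(name)
    return parsed
```

Running list_parsed_files on the SSI directory after a deployment is a quick way to notice a file that became executable by accident.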

10.3.2.2 Including files

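The most basic use of SSI is for inclusion of static files. For
example, a site can include a standard header and footer on each
page (the file names here are only illustrative):

<!--#include virtual="header.html" -->
. . . variable content goes here . . .
<!--#include virtual="footer.html" -->

You can also include the output of a local CGI script by giving its
relative URL (the script name is illustrative):

<!--#include virtual="/cgi-bin/counter.pl" -->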

10.3.2.3 Executing commands

If Options Includes is set, you can also execute
any external command on the web server, which is
quite dangerous. The following is a benign example (the command shown
is illustrative):

<!--#exec cmd="ls -l /" -->

SSI can't get arguments from the client, so any
command and arguments are fixed. Since you specify the commands, you
might feel safe. However, anyone with write access to
/ssi_dir could upload an HTML file containing an
SSI #exec string of their own choosing, for example:

<!--#exec cmd="cat /etc/passwd" -->

If you allow people to upload HTML (say, in a guestbook application),
you should forbid SSI execution in the target directory and untaint
the input (see Section 10.4.1).
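As a rough illustration of screening uploaded text for SSI directives, here is a small sketch. It assumes that directives appear as HTML comments beginning with <!--# and it is deliberately crude; the function names are hypothetical, and a real application would combine this with disabling SSI in the upload directory.

```python
import re

# SSI directives are HTML comments whose body starts with "#name".
SSI_DIRECTIVE = re.compile(r"<!--\s*#\s*(\w+)", re.IGNORECASE)


def find_ssi_directives(text):
    """Return the lowercased names of SSI-style directives found in text."""
    return [m.group(1).lower() for m in SSI_DIRECTIVE.finditer(text)]


def neutralize_ssi(text):
    """Escape the opening '<' of each directive so it is displayed, not run."""
    return SSI_DIRECTIVE.sub(
        lambda m: m.group(0).replace("<", "&lt;", 1), text
    )
```

A guestbook handler could reject any submission for which find_ssi_directives returns a non-empty list, or store the output of neutralize_ssi instead of the raw text.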

Similar vulnerabilities have been seen in utilities that create HTML,
such as email digesters and web-log analyzers. If you must have SSI
but don't need executable external commands, always
exclude them:

<Location /ssi_dir>
Options IncludesNoExec
</Location>

Options Includes permits all SSI, including
executable commands, so use Options IncludesNoExec.

10.3.3. Dynamic Content: Common Gateway Interface (CGI)

The CGI is a protocol for
sending queries and data via HTTP to a program on the web server. A
CGI program can be written in any language, interpreted or compiled.
Surprisingly, there is still no standards-track RFC that defines CGI;
CGI 1.1 is documented in the informational RFC 3875 and
described at http://hoohoo.ncsa.uiuc.edu/cgi/interface.html.
Also, see The CGI Programming MetaFAQ
(http://www.perl.org/CGI_MetaFAQ.html).

PHP, JSP, mod_perl, and other active web technologies all use the CGI
standard for web client-server communication.

10.3.3.1 Standalone and built-in CGI interpreters

The CGI protocol doesn't specify how the web server
should communicate with the CGI program. There have been two main
solutions:

Standalone CGI programs

Apache receives a CGI request, opens a two-way pipe to an external
program, sends it the CGI input data, and returns the
program's output to the client. As a separate
process, the program can crash without bringing down the web server.
The downside is that it's relatively slow to start a
new process.

Built-in CGI programs

The program is rewritten as an Apache module and incurs its startup
cost only when an Apache process starts. This is
much faster than an external program and has
access to Apache's internals and other modules. The
most popular modules for CGI in Apache are the interpreter engines
for Perl (mod_perl) and PHP
(mod_php).

Whether run in-process (built-in) or independently, CGI programs
represent a large security risk. We'll cover a
number of these risks, starting with the problem of securing CGI programs
for different users.

Normally, CGI programs will all be run with Apache's
user ID and group. If you have multiple users and virtual hosts, this
lets them run each other's scripts and access each
other's data. A web-hosting service might want to
let its customers run their own CGI scripts but no one
else's. Another site might restrict database access
to certain users, requiring scripts to be run as those users. The
most common solutions are suEXEC and
cgiwrap.

10.3.3.2 suEXEC

suEXEC is a setuid
root program that wraps scripts to run with a
specified user ID and group ID, rather than the Apache server user
and group. Scripts need to pass a number of security guidelines
before they will be accepted. To use suEXEC, define a
VirtualHost section of an Apache configuration
file. For Apache 1.3, specify the desired CGI User
and Group:

<VirtualHost www.hackenbush.com>
User hugo
Group whyaduck
</VirtualHost>

For Apache 2.0, use the SuexecUserGroup directive instead:

<VirtualHost www.hackenbush.com>
SuexecUserGroup hugo whyaduck
</VirtualHost>

CGI scripts should be placed in directories for this virtual host
that permit script execution (by default,
~/public_html/cgi-bin), and they should be owned
by user hugo, group
whyaduck. For details, see http://httpd.apache.org/docs/suexec.html.
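Before pointing suEXEC at a script, you can check a few of the conditions it is documented to enforce. This sketch covers only a subset (ownership, write permissions, setuid and setgid bits), and the function name is my own invention; treat it as a pre-flight check, not a reimplementation of suEXEC.

```python
import os
import stat


def suexec_style_problems(path, expected_uid, expected_gid):
    """Return reasons why a wrapper such as suEXEC might reject path."""
    st = os.stat(path)
    problems = []
    if st.st_uid != expected_uid or st.st_gid != expected_gid:
        problems.append("owner or group does not match the target user")
    if st.st_mode & (stat.S_IWGRP | stat.S_IWOTH):
        problems.append("writable by group or others")
    if st.st_mode & (stat.S_ISUID | stat.S_ISGID):
        problems.append("setuid or setgid bit is set")
    return problems
```

An empty list means the script passed these particular checks; suEXEC itself applies more, so consult its log when a script is still refused.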

10.3.3.3 Cgiwrap

Cgiwrap is also a
setuid root program that wraps CGI programs, but works quite
differently from suEXEC. Its installation and use are a bit complex,
described at http://cgiwrap.sourceforge.net/.

10.3.3.4 FastCGI

suEXEC
and Cgiwrap are used with external CGI programs. FastCGI is an
alternative for creating CGI programs without the startup time of a
standalone program, but also without the complexity of an Apache
module. The protocol is language-independent, and libraries are
available for the most common web languages. Details are available at
http://www.fastcgi.com.

FastCGI falls somewhere between standalone and module-based CGI. It
starts an external CGI program but maintains a persistent connection
through the Apache module mod_fastcgi.

Scripts need slight modification to work with FastCGI. You must have
set Options ExecCGI in
httpd.conf to enable a FastCGI application, just
as you would any other CGI program. If you want to allow use of
suEXEC with FastCGI, set FastCgiWrapper On.
FastCGI scripts are
vulnerable to the same problems as any CGI scripts.

10.3.3.5 Specifying CGI programs

There are a couple of ways to tell Apache to treat a file as a CGI
script rather than a static file.

Treat every file within a directory as a CGI script:

ScriptAlias /cgi-bin /usr/local/apache/cgi-bin

The directory for ScriptAlias must be outside the
DocumentRoot hierarchy. Otherwise, anyone can
access its contents as normal files and download or view their
contents. With write permission in the directory, they could also
upload CGI scripts.
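The rule about keeping the ScriptAlias directory outside DocumentRoot is easy to check mechanically. Here is a minimal sketch; the helper name and the htdocs path in the usage note are assumptions for illustration, and the function resolves symbolic links before comparing.

```python
import os


def is_inside(child, parent):
    """Return True if child is parent itself or lies somewhere beneath it."""
    child = os.path.realpath(child)
    parent = os.path.realpath(parent)
    return os.path.commonpath([child, parent]) == parent
```

For the layout above, is_inside("/usr/local/apache/cgi-bin", "/usr/local/apache/htdocs") should be false; if it is true for your own paths, the scripts can be fetched as ordinary files.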

Allow some files in a directory to be CGI scripts:

<Directory /usr/local/apache/mixed>
Options ExecCGI
</Directory>

Mixing static files and scripts is dangerous, since a configuration
typo could cause Apache to treat a script file as a normal file and
allow users to view its contents. This could reveal passwords or
other sensitive information. If you do mix files and scripts, you
need to tell Apache which files are CGI scripts and which are static
files. Use a file suffix or some other naming convention to mark the
script. We'll see how to protect files shortly.

Don't put a script interpreter program in a CGI
directory. For instance, don't put the binary for
Perl or a standalone PHP in
/usr/local/apache/cgi-bin. This lets anyone run
them without restrictions. CGI scripts should be as simple and
focused as possible.

Expect trouble if users can upload files to a directory and execute
them as CGI scripts. Consider using suEXEC (described earlier in this
chapter) or limiting CGI scripts to directories where you can see
them.

10.3.3.6 HTTP, URLs, and CGI

Just as a little SMTP knowledge aids understanding of email-security
issues, a little background on HTTP and URLs improves knowledge of
web security.

Every exchange between a web client and server is defined by the
Hypertext Transfer Protocol (HTTP). HTTP 1.0 was the first widely
used version, but it had some shortcomings. Most of these were
addressed with HTTP 1.1, the current version that is almost
universal. HTTP 1.1 is defined in RFC 2616 (http://www.w3.org/Protocols/rfc2616/rfc2616.html).
The web client makes HTTP requests, and the web server responds. Web
browsers hide much of the data exchange, such as MIME types, cache
settings, content negotiation, timestamps, and other details. Other
clients (such as a web spider, wget, or
curl) offer much more control over the exchange.

An HTTP request contains an initial request line:

Method URI HTTP-Version

Methods include OPTIONS, GET, HEAD, POST, PUT, TRACE, DELETE, and
CONNECT. Some methods have a corresponding URL format.

This line may be followed by request header
lines containing information about the client, the host,
authorization, and other things. These lines are followed by a blank
line, then the message body. The web server returns a header and an
optional body, depending on the request.
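To make the layout of a request concrete, the following sketch splits a raw request into its request line, headers, and body. It is a simplification that assumes CRLF line endings and ignores details such as repeated or folded headers; parse_request is a name of my own choosing.

```python
def parse_request(raw):
    """Split a raw HTTP request into (method, uri, version, headers, body)."""
    # The blank line (two CRLFs in a row) separates headers from the body.
    head, _, body = raw.partition("\r\n\r\n")
    lines = head.split("\r\n")
    method, uri, version = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return method, uri, version, headers, body
```

Note that nothing here verifies that the headers are truthful; a client can put anything it likes in them.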

The URL types you use have security implications. Since the protocol
is text, it's easy to forge headers and bodies
(although attackers have also successfully forged binary data for
years). You can't trust what you're
being told, whether you're a web server or a client.
See section 15 of RFC 2616 for other warnings.

The following are the most common methods and some security
implications.

HEAD method

Do you want to know what web server
someone is running? It's easy.
Let's look at the HEAD data for the home page at
http://www.apache.org:

$ telnet www.apache.org 80
Trying 63.251.56.142...
Connected to daedalus.apache.org (63.251.56.142).
Escape character is '^]'.
HEAD / HTTP/1.1
Host: www.apache.org

HTTP/1.1 200 OK
Date: Sat, 13 Apr 2002 03:48:58 GMT
Server: Apache/2.0.35 (Unix)
Cache-Control: max-age=86400
Expires: Sun, 14 Apr 2002 03:48:58 GMT
Accept-Ranges: bytes
Content-Length: 7790
Content-Type: text/html
Connection closed by foreign host.
$

(A handy alternative to this manual approach is the curl
client, available from http://www.haxx.se.) The actual responses
vary by web server and site. Some don't return a
Server: response header, or say
they're something else, to protect against attacks
aided by port 80 fingerprinting. The default
value returned by Apache includes the identity of many modules. To
return only a Server: Apache response, specify:
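If you'd rather script the check than type it into telnet, something like the following sketch can report what a server says about itself. It uses Python's standard http.client module; the two function names and the rule of thumb in reveals_version are my own assumptions, not an established test.

```python
import http.client


def fetch_server_header(host, port=80, timeout=10):
    """Send HEAD / to host and return its Server header, or None."""
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("HEAD", "/")
        response = conn.getresponse()
        return response.getheader("Server")
    finally:
        conn.close()


def reveals_version(server_header):
    """Return True if a Server value carries more than a bare product name."""
    if not server_header:
        return False
    # A slash usually introduces a version; parentheses usually add OS detail.
    return "/" in server_header or "(" in server_header
```

Point fetch_server_header at your own hosts only, and use reveals_version on the result to confirm that a ServerTokens change took effect.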

ServerTokens ProductOnly

OPTIONS method

If OPTIONS is supported, it tells us more
about the web server:

$ telnet www.apache.org 80
Trying 63.251.56.142...
Connected to daedalus.apache.org (63.251.56.142).
Escape character is '^]'.
OPTIONS * HTTP/1.1
Host: www.apache.org

HTTP/1.1 200 OK
Date: Sat, 13 Apr 2002 03:57:10 GMT
Server: Apache/2.0.35 (Unix)
Cache-Control: max-age=86400
Expires: Sun, 14 Apr 2002 03:57:10 GMT
Allow: GET,HEAD,POST,OPTIONS,TRACE
Content-Length: 0
Content-Type: text/plain
Connection closed by foreign host.
$

The OPTIONS method is not a security concern, but you might like to
try it on your own servers to see what it returns.

GET method

GET is the standard method for retrieving
data from a web server. A URL for the GET method may be simple, like
this call for a home page:

http://www.hackenbush.com/

A GET URL may be extended with a ? and
name=value arguments. Each instance of
name and value is URL encoded, and pairs are
separated by an &:

http://www.hackenbush.com/cgi-bin/groucho.pl?day=jan%2006&user=zeppo

An HTTP GET request contains a header but no body. Apache handles the
request directly, assigning everything after the ?
to the QUERY_STRING environment variable. Since
all the information is in the URL itself, a GET URL can be bookmarked
or repeated from the browser, without resubmitting a form. It can
also be generated easily by client-side or server-side scripting
languages.
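Here is a short sketch of what a CGI program does with QUERY_STRING, using Python's standard urllib.parse module. The function takes the environment as a dictionary argument (an assumption made so it can be tried without a web server); in a real CGI program you would pass os.environ.

```python
from urllib.parse import parse_qs


def query_params(environ):
    """Decode QUERY_STRING into a dict mapping each name to a list of values."""
    query = environ.get("QUERY_STRING", "")
    # keep_blank_values keeps "name=" pairs instead of silently dropping them.
    return parse_qs(query, keep_blank_values=True)
```

For the groucho.pl URL above, the day parameter decodes to "jan 06" and user to "zeppo". Each value is a list because a name may legally appear more than once.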

Although you may see some very long and complex GET URLs, web servers
may have size limits that silently snip your URL. Apache guards
against GET buffer overflow attacks, but some other web servers and
web cache servers may not.

Since all the parameters are in the URL, they also appear in the
web-server logs. If the form carries any sensitive data, use the POST
method instead.

The ? and /cgi-bin advertise
that this URL calls a CGI script called
groucho.pl. You may want the benefits of a GET
URL without letting everyone know that this is a CGI script. If an
attacker knows you're using Perl scripts on Apache,
for instance, he can target his attack more effectively. Another
reason to hide the invocation of a script involves making the URL
more search-engine friendly. Many web search engines skip URLs that
look like CGI scripts. One technique uses the
PATH_INFO environment variable and Apache
rewriting rules. You can define a CGI directory with a name that
looks like a regular directory:

ScriptAlias /fakedir/ "/usr/local/apache/real_cgi_bin/"

Within this directory, you could have a CGI script called
whyaduck. When this URL is received:

http://www.hackenbush.com/fakedir/whyaduck/day/jan%2006/user/zeppo

Apache will execute the CGI script
/usr/local/apache/real_cgi_bin/whyaduck and pass it the
environment variable PATH_INFO with the value
/day/jan 06/user/zeppo. Your script can parse the
components with any method you like (use split in
Perl or explode in PHP to split on the slashes).
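The same splitting can be sketched in Python. This version assumes the path alternates names and values, as in /day/jan 06/user/zeppo, and raises an error when a value is missing; the function name is hypothetical.

```python
def path_info_pairs(path_info):
    """Turn '/name/value/name/value' into a dict of name/value pairs."""
    parts = [p for p in path_info.split("/") if p]
    if len(parts) % 2:
        raise ValueError("PATH_INFO does not hold complete name/value pairs")
    # Even positions are names, odd positions are their values.
    return dict(zip(parts[0::2], parts[1::2]))
```

The values still come from the client, so they need the same scrutiny as any query-string argument.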

Since GET requests are part of the URL, they may be immortalized in
server logs, bookmarks, and referrals. This may expose confidential
information. If this is an issue, use POST rather than GET. If you
don't specify the method
attribute for a <form> tag in HTML, the browser submits the form with
GET.

POST method

POST is used to send data to a CGI program
on the web server. A URL for the POST method appears bare, with no
? or encoded arguments. Data are sent in the HTTP
body to Apache, then from Apache to the standard input of the CGI
program.
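On the receiving end, a CGI program reads exactly CONTENT_LENGTH bytes from its standard input. The sketch below shows that step for URL-encoded form data; the 64 KB cap and the function name are assumptions of mine, added because a program should not trust the client to send a sensible length.

```python
from urllib.parse import parse_qs

MAX_BODY = 64 * 1024  # illustrative upper bound, not an Apache setting


def read_post_fields(environ, stdin):
    """Read a URL-encoded POST body from stdin and decode its fields."""
    length = int(environ.get("CONTENT_LENGTH") or 0)
    if length < 0 or length > MAX_BODY:
        raise ValueError("unreasonable CONTENT_LENGTH")
    body = stdin.read(length).decode("utf-8")
    return parse_qs(body, keep_blank_values=True)
```

In a real CGI program you would call read_post_fields(os.environ, sys.stdin.buffer) so that the body is read as bytes.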

A user must resubmit her original form and data to refresh the output
page, because the recipient has no way of knowing if the data may
have changed. (With a GET URL, everything's in the
URL.) The POST data size is not as limited as with GET. Normally POST
data is not logged, although you can configure Apache to do so. A
POST URL cannot be bookmarked, and it cannot be automatically
submitted from a browser without using client-side JavaScript (other
clients such as wget and
curl can submit POST requests). You need to have
a button or other link with a JavaScript URL that submits a form that
is somewhere on your page.

PUT method

This was the original HTTP upload mechanism. Specify a CGI script to
handle a PUT request, as you would for a POST
request. PUT seems to have been superseded by WebDAV and other
methods, which are described in Section 10.4.4.

TRACE method

The TRACE method was intended as a debugging
tool, but almost no one has heard of it or used it. It was a matter
of time until someone found an exploit (http://www.kb.cert.org/vuls/id/867593) and
recommended disabling TRACE processing in Apache. The environment
required for the exploit to work is so specific that disabling TRACE
doesn't appear to be necessary.

10.3.3.7 CGI languages

Any language can be a CGI language just by following the CGI
specification. An HTTP response requires at least an initial MIME
type line, a blank line, and then content. Here's a
minimal CGI script written in the shell:

#!/bin/sh
echo "Content-type: text/html"
echo
echo "Hello, world"

Technically, we should terminate each of the first two echo lines with a
carriage-return/line-feed pair ('\r\n'), but
browsers know what to do with bare Unix-style line feeds.
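For comparison, here is the same minimal response as a Python sketch that does use the carriage-return/line-feed endings. The function name is my own; the header line, blank line, and content follow the layout just described.

```python
import sys


def cgi_response(body, content_type="text/html"):
    """Build a CGI response: header line, blank line, then content."""
    return f"Content-Type: {content_type}\r\n\r\n{body}"


if __name__ == "__main__":
    # Run as a CGI program, this prints the same page as the shell script.
    sys.stdout.write(cgi_response("Hello, world\n"))
```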

Although a C program might run faster than a shell or Perl
equivalent, CGI startup time tends to outweigh that advantage. I feel
that the best balance of flexibility, performance, and programmer
productivity lies with interpreted languages running as Apache
modules. The top languages in that niche are PHP and Perl.

In the following section on web applications, I'll
discuss the security trouble spots to watch, with examples from Perl
and PHP. But first, a few words about the PHP and Perl languages may
be helpful.

PHP

PHP
is a popular web-scripting language for Unix and Windows.
It's roughly similar to, and competes with, Visual
Basic and ASP on Windows. On Unix and Linux, it competes with Perl
and Java. Its syntax is simpler than Perl's, and its
interpreter is small and fast.

Versions of PHP before 4.1.2 had serious vulnerabilities in the
file-uploading code. These could allow an attacker to execute
arbitrary code on the web server if any PHP
script could be run, even if it did not perform file uploads. If your
version is older, get a patch from http://www.php.net.

PHP code is embedded in HTML and distinguished by any of these start
and end tags:

<?php ... ?>
<? ... ?>
<% ... %>

PHP files can contain any mixture of normal HTML and PHP, like this
(echo prints its arguments):

<? echo "string = $string\n"; ?>

or more compactly mixing HTML and PHP (=$string is
PHP shorthand for echo
$string):

string = <?=$string?>

PHP configuration options can be specified in three ways:

The php.ini file, normally in the
/usr/local/lib directory.
Here's an example that disables PHP error displays:

display_errors = off

The Apache configuration files, in the styles shown in Table 10-6.

Table 10-6. PHP Apache configuration
Directive	Type of value
php_value name value	Any
php_flag name on|off	Boolean
php_admin_value name value	Any
php_admin_flag name on|off	Boolean

The following is an example that disables PHP's HTML
error display:

php_admin_flag display_errors off

These can be placed within container directives to customize PHP
settings for different directories or virtual hosts.
php_value and php_flag may also
be used in .htaccess files.

Some directives (see http://www.php.net/manual/en/function.ini-set)
can be set in the PHP script at runtime:

ini_set("display_errors", "0");

Perl

Perl
is the mother of all web-scripting languages. The most popular module
for CGI processing, CGI.pm, is part of the
standard Perl release.

Here's a quick Perl script to get the value of a
form variable (or handcrafted GET URL) called
string:

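#!/usr/bin/perl -w
use strict;
use CGI qw(:standard);
# Fetch the "string" parameter from the query string or form body
my $string = param("string");
# Emit the Content-Type header, then the value
print header;
print "string = $string\n";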

A Perl CGI script normally contains a mixture of HTML print
statements and Perl processing statements.