10.4. Web Applications

The Web Application Security Consortium has classified web threats and tried to standardize their descriptions (http://www.webappsec.org/threatl). The Open Web Application Security Project (OWASP) describes the top 10 vulnerabilities (http://www.owasp.org/documentation/toptenl) and how to secure web applications (http://www.owasp.org/documentation/guide/guide_aboutl). All are well worth reading.

10.4.1. Processing Forms

The top risk in the OWASP list is currently unvalidated input. This is most evident in the workhorse of web applications, form processing.

In the previous section, I showed how to get and echo the value of the form element named string. I'll now show how to circumvent this simple code, and how to protect against the circumvention.

Client-side form checking with JavaScript is a convenience for the user, and it avoids a round-trip to the server to load a new page with error messages. However, it does not protect you from a handcrafted form submission with bad data. Here's a simple form that lets the web user enter a text string:

    <form name="user_form" method="post" action="/cgi-bin/echo">
    <input type="text" name="string">
    <input type="submit" value="submit">
    </form>

When submitted, we want to echo the string. Let's look again at a naive stab at echo in PHP:

    <?
    echo "string = ", $_REQUEST["string"], "\n";
    ?>

And the same in Perl:

    #!/usr/bin/perl -w
    use strict;
    use CGI qw(:standard);
    print header;
    print "string = ", param("string"), "\n";

This looks just ducky. In fact, if you type quack into the string field, you see the output:

    string = quack

But someone with an evil mind might enter this text into the string field:

    <script language=javascript>history.go(-1);</script>

Submit this, and watch the JavaScript code bounce you right back to your input form. If this form did something more serious than echo its input (such as entering the contents of a literal tag into a database), the results could be more serious.
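The script-injection trick above works because the browser receives the submitted text as markup. As a minimal sketch of the remedy (escaping the text before echoing it), here is the same echo in Python, used only as a neutral illustration alongside the chapter's PHP and Perl. The function name echo_line is hypothetical; html.escape is Python's standard-library escaping call.

```python
import html


def echo_line(value: str) -> str:
    """Return the echo output with HTML metacharacters escaped.

    html.escape replaces <, > and & (and, by default, both quote
    characters) with entities, so a browser displays the text
    instead of interpreting it as markup or script.
    """
    return "string = " + html.escape(value) + "\n"


if __name__ == "__main__":
    print(echo_line("quack"), end="")
    print(echo_line("<script language=javascript>history.go(-1);</script>"), end="")
```

With this change the hostile input is shown to the visitor as literal text rather than executed; harmless input such as quack passes through unchanged.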
your knowledge and then getting it to download and execute on any browser. This cross-site scripting bug was fixed within JavaScript itself some time ago, but that doesn't help in this case, because JavaScript is being injected into the data of a server-side script. HTML tags that invoke active content are shown in Table 10-7.

escape input data, removing any magic characters, quotes, callouts, or anything else that would treat the input as something other than plain text.

An even better approach is to specify what you want, rather than escaping what you don't want. You can match the data against a regular expression specifying the legal input patterns. The complexity of the regular expression depends on the type of data and the desired level of validity checking. For example, you might want to ensure that a U.S. phone number field has exactly 10 digits, or that an email address follows RFC 822.

10.4.1.1 PHP

To avoid interpreting a text-form variable as JavaScript or HTML, escape the special characters with the PHP functions htmlspecialchars or htmlentities. Some helper functions are available at http://www.owasp.org/software/labs/phpfiltersl. As mentioned previously, it's even better to extract the desired characters from the input first via a regular-expression match. In the following section, there's an example of how Perl can be used to untaint input data.

PHP has had another security issue with global data. When the PHP configuration variable register_globals is enabled, PHP creates an automatic global variable to match each variable in a submitted form. In the earlier example, a PHP variable named $string winks into existence to match the form variable string. This makes form processing incredibly easy. The problem is that anyone can craft a URL with such variables, forging a corresponding PHP variable. So any uninitialized variable in your PHP script could be assigned from the outside.

The danger is not worth the convenience.
Specify register_globals off in your php.ini file. Starting with PHP 4.2.0, this is the default setting. PHP Versions 4.1.1 and up also provide safer new autoglobal arrays. These are automatically global within PHP functions (in PHP, you need to say global var within a PHP function to access the normal global variable named var; this quirk always bites Perl developers). These arrays should be used instead of the older arrays $HTTP_GET_VARS and $HTTP_POST_VARS, and are listed in Table 10-8.
union of $_GET, $_POST, and $_COOKIE. This is handy when you don't care how the variable got to the server.

10.4.1.2 Perl

Perl runs in taint mode in the following situations:

- Automatically, when the real and effective user ID and group ID differ
- Explicitly, when invoked with the -T flag

This mode marks data originating outside the script as potentially unsafe and forces you to do something about it. To untaint a variable, run it through a regular expression, and grab it from one of the positional match variables ($1, $2, ...). Here's an example that gets a sequence of "word" characters (\w matches letters, digits, and _ ):

    #!/usr/bin/perl -wT
    use strict;
    use CGI qw(:standard);
    my $user = param("user");
    if ($user =~ /^(\w+)$/) { $user = $1; }

We'll see that taint mode applies to file I/O, program execution, and other areas where Perl is reaching out into the world.

10.4.2. Including Files

CGI scripts can include files inside or outside of the document hierarchy. Try to move sensitive information from your scripts to files located outside the document hierarchy. This is one layer of protection if your CGI script somehow loses its protective cloak and can be viewed as a simple file.

Use a special suffix for sensitive include files (a common choice is .inc), and tell Apache not to serve files with that suffix. This will protect you when you accidentally put an include file somewhere in the document root. Add this to an Apache configuration file:

    <FilesMatch "\.inc$">
    order allow,deny
    deny from all
    </FilesMatch>

Also, watch out for text editors that may leave copies of edited scripts with suffixes like ~ or .bak. The crafty snoop could just ask your web server for files like program~ or program.bak. Your access and error logs will show if anyone has tried.
To forbid serving them anywhere, add this to your Apache configuration file:

    <FilesMatch "(~|\.bak)$">
    order allow,deny
    deny from all
    </FilesMatch>

When users are allowed to view or download files based on a submitted form variable, guard against attempts to access sensitive data, such as a password file. One exploit is to use relative paths (..):

    ../../../etc/passwd

Cures for this depend on the language and are described in the following sections.

10.4.2.1 PHP

External files can be included with the PHP include or include_once commands. These may contain functions for database access or other sensitive information. A mistake in your Apache configuration could expose PHP files within normal document directories as normal text files, and everyone could see your code. For this reason, I recommend the following:

- Include sensitive PHP scripts from a location outside of your document root. Edit php.ini to specify:

      include_path = ".:/usr/local/lib/php:/usr/local/my_php_lib"

- Use the protected suffix for your included files:

      <? include_once "db_login.inc"; ?>

- Use the basename function to isolate the filename from the directory and open_basedir to restrict access to a certain directory. These will catch attempts to use ../ relative filenames.

If you process forms where people request a file and get its contents, you need to watch the PHP file-opening command fopen and the file-reading commands fpassthru and readfile. fopen and readfile accept URLs as well as filenames; disable this with allow_url_fopen=false in php.ini. You may also limit PHP file operations to a specific directory with the open_basedir directive.
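The two defenses just described (basename to discard directory parts, and confinement to one permitted directory in the spirit of open_basedir) can be sketched in a few lines. The version below is in Python purely for illustration and assumes POSIX-style paths; resolve_requested_file and its arguments are hypothetical names, not part of PHP or Apache.

```python
import os


def resolve_requested_file(base_dir: str, requested: str) -> str:
    """Map a user-supplied name to a path inside base_dir, or raise.

    Two independent checks:
    1. os.path.basename drops any directory part, so
       "../../../etc/passwd" is reduced to "passwd".
    2. The fully resolved path must still lie inside base_dir,
       which also catches symbolic links that point elsewhere.
    """
    name = os.path.basename(requested)
    if name in ("", ".", ".."):
        raise ValueError("no usable filename in request")
    base = os.path.realpath(base_dir)
    candidate = os.path.realpath(os.path.join(base, name))
    if os.path.commonpath([base, candidate]) != base:
        raise ValueError("request escapes the permitted directory")
    return candidate
```

A request for ../../../etc/passwd resolves to a file named passwd inside the permitted directory (which normally does not exist), rather than to the system password file.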
The open_basedir directive can be set within Apache container directives to limit virtual hosts to their backyards:

    <VirtualHost 192.168.102.103>
    ServerName a.test.com
    DocumentRoot /usr/local/apache/hosts/a.test.com
    php_admin_value open_basedir /usr/local/apache/hosts/a.test.com
    </VirtualHost>

If safe_mode is enabled in php.ini or an Apache configuration file, a file must be owned by the owner of the PHP script to be processed. This is also useful for virtual hosts.

Table 10-9 lists recommended safe settings for PHP.
might set up a directory for each virtual host under /usr/local/apache/host. You can specify multiple directories with a colon (:) separator.

10.4.2.2 Perl

In taint mode, Perl blocks use of the functions eval, require, open (except read-only mode), chdir, chroot, chmod, unlink, mkdir, rmdir, link, and symlink. You must untaint filenames before using any of these. As in the PHP example, watch for relative (../) names and other attempts to access files outside the intended area.

10.4.3. Executing Programs

Most scripting languages let you run external programs. This is a golden opportunity for nasty tricks. Check the pathname of the external program and remove any metacharacters that would allow multiple commands. Avoid passing commands through a shell interpreter.

10.4.3.1 PHP

Escape any possible attempts to slip in extra commands with one of these PHP functions:

    $safer_input = escapeshellarg($input);
    system("some_command $safer_input");

or:

    system(escapeshellcmd("some_command $input"));

These PHP functions invoke the shell and are vulnerable to misuse of shell metacharacters: system, passthru, exec, popen, preg_replace (with the /e option), and the backtick (`command`) operator.

If safe_mode is set, only programs within safe_mode_exec_dir can be executed, and only files owned by the owner of the PHP script can be accessed.

The PHP function eval($arg) executes its argument $arg as PHP code. There's no equivalent to safe_mode for this, although the disable_functions option lets you turn off selected functions. Don't execute any command with embedded user data.

10.4.3.2 Perl

Taint mode will not let you pass unaltered user input to the functions system, exec, eval, or the backtick (`command`) operator. Untaint them before executing, as described earlier.

10.4.4. Uploading Files from Forms

RFC 1867 documents form-based file uploads, a way of uploading files through HTML, HTTP, and a web server.
It uses an HTML form, a special form-encoding method, and an INPUT tag of type FILE:

    <form method="post" enctype="multipart/form-data" action="/cgi-bin/process_form.php">
    <input type="text" name="photo_name">
    <input type="file" name="photo_file">
    <input type="submit" value="submit">
    </form>

This is another golden opportunity for those with too much time and too little conscience to upload huge files and fill up the available space. A file upload is handled by a CGI file-upload script. There is no standard script, since so many things can be done with an uploaded file.

10.4.4.1 PHP

Uploaded files are saved as temporary files in the directory specified by the PHP directive upload_tmp_dir. The default value (/tmp) leaves them visible to anyone, so you may want to define upload_tmp_dir to some directory in a virtual host's file hierarchy. To access uploaded files, use the new autoglobal array $_FILES, each element of which is itself an array. For the photo-uploading example, let's say you want to move an uploaded image to the photos directory of virtual host host:

    <?
    // $name is the original file name from the client
    $name = $_FILES['photo_file']['name'];
    // $type is the MIME type reported by the browser (not verified by PHP)
    $type = $_FILES['photo_file']['type'];
    // $size is the size of the uploaded file (in bytes)
    $size = $_FILES['photo_file']['size'];
    // $tmpn is the name of the temporary uploaded file on the server
    $tmpn = $_FILES['photo_file']['tmp_name'];
    // If the size and type look okay, move the temporary file
    // to its desired place. basename() strips any directory part
    // that the client put in the name.
    if (is_uploaded_file($tmpn))
        move_uploaded_file($tmpn, "/usr/local/apache/host/photos/" . basename($name));
    ?>

You may check the file's type, name, and size before deciding what to do with it. The PHP option upload_max_filesize caps the size; if a larger file is uploaded, the value of $tmpn is none. When the PHP script finishes, any temporary uploaded files are deleted.

10.4.4.2 Perl

The CGI.pm module provides a file handle for each temporary file.
    #!/usr/bin/perl -wT
    use strict;
    use CGI qw(:standard);
    my $handle = param("photo_file");
    my $tmp_file_name = tmpFileName($handle);
    my $size = $ENV{CONTENT_LENGTH};
    # If the size looks okay, copy or rename the file
    # ...

The temporary file goes away when the CGI script completes.

10.4.5. Accessing Databases

Although relational databases have standardized on SQL as a query language, many of their APIs and interfaces, whether graphic or text based, have traditionally been proprietary. When the Web came along, it provided a standard GUI and API for static text and dynamic applications. The simplicity and broad applicability of the web model led to the quick spread of the Web as a database frontend. Although HTML does not offer the richness and performance of other graphical user interfaces, it's good enough for many applications.

Databases often contain sensitive information, such as people's names, addresses, and financial data. How can a porous medium like the Web be made safer for database access? Here are some guidelines for Web-MySQL access (some are also discussed in Chapter 8):

- Don't have your database on the same machine as the web server. It's best if your database is behind a firewall that only passes queries from your web server. For example, MySQL normally uses port 3306, so you might only permit access from ports on the web server to port 3306 on the database server.
- Check that all default database passwords have been changed. For MySQL, ensure that the default user (called root, but not related to the Unix root user) has a password. You have a problem if you can get into the database without a password by typing:

      mysql -u root

- Use the SQL GRANT and REVOKE statements to make sure access to tables and other resources is allowed only for the desired MySQL IDs on the desired servers.
  An example might follow this pattern:

      GRANT SELECT ON sample_table
      TO "sample_user"@"sample_machine"
      IDENTIFIED BY "sample password"

- Do not allow access to the MySQL user table (mysql.user) by anyone other than the MySQL root user, since it contains the permissions and encrypted passwords.
- Don't use form-variable values or names in SQL statements. If the form variable user maps directly to a user column or table, someone will deduce the pattern and experiment.
- Check user input before using it in SQL statements. This is similar to checking user input before executing a shell command. Such exploits have been called SQL injection. See Chapter 8 for more details.

Any time information is exchanged, someone will be tempted to change it, block it, or steal it. We'll quickly review these issues in PHP and Perl database CGI scripts:

- Which database APIs to use
- Protecting database account names and passwords
- Defending against SQL injection

10.4.5.1 PHP

PHP has many specific and generic database APIs. There is not yet a clear leader to match Perl's database-independent (DBI) module.

A PHP fragment to access a MySQL database might begin like this:

    <?
    $link = mysql_connect("db.test.com", "dbuser", "dbpassword");
    if (!$link)
        echo "Error: could not connect to database\n";
    ?>

If this fragment is within every script that accesses the database, every instance will need to be changed if the database server, user, or password changes. More importantly, a small error in Apache's configuration could allow anyone to see the raw PHP file, which includes seeing these connection parameters. It's easier to write a tiny PHP library function to make the connection, put it in a file outside the document root, and include it where needed.

Here's the include file:

    // my_connect.inc
    // PHP database connection function.
    // Put this file outside the document root!
    // Makes connection to database.
    // Returns link id if successful, false if not.
    function my_connect( ) {
        $database = "db.test.com";
        $user = "db_user";
        $password = "db_password";
        $link = mysql_connect($database, $user, $password);
        return $link;
    }

And this is a sample client:

    // client.php
    // PHP client example.
    // Include path is specified in include_path in php.ini.
    // You can also specify a full pathname.
    include_once "my_connect.inc";
    $link = my_connect( );
    // Do error checking in client or library function
    if (!$link)
        echo "Error: could not connect to database\n";
    // ...

Now that the account name and password are better protected, you need to guard against malicious SQL code. This is similar to protecting against user input passing directly to a system command, for much the same reasons. Even if the input string is harmless, you still need to escape special characters.

The PHP addslashes function puts a backslash (\) before these special SQL characters: single quote ('), double quote ("), backslash (\), and NUL (ASCII 0). This will be called automatically by PHP if the option magic_quotes_gpc is on. Depending on your database, this may not quote all the characters correctly.

SQL injection is an attempt to use your database server to get access to otherwise protected data (read, update, or delete) or to get to the operating system. For an example of the first case, say you have a login form with user and password fields. A PHP script would get these form values (from $_GET, $_POST, or $_REQUEST, if it's being good), and then build a SQL string and make its query like this:

    $sql = "SELECT * FROM users WHERE\n" .
           "user = '$user' AND\n" .
           "password = '$password'";
    $result = mysql_query($sql);
    // The assignment is parenthesized because && binds tighter than =
    if ($result && ($row = mysql_fetch_array($result)) && $row[0] == 1)
        return true;
    else
        return false;

An exploiter could enter these into the input fields (see Table 10-10).
    user = '' OR '' = '' AND password = '' OR '' = ''

The door is now open. To guard against this, use the techniques I've described for accessing other external resources, such as files or programs: escape metacharacters and perform regular-expression searches for valid matches. In this example, a valid user and password might be a sequence of letters and numbers. Extract user and password from the original strings and see if they're legal.

In this example, if the PHP option magic_quotes_gpc were enabled, this exploit would not work, because all quote characters would be preceded by a backslash. But other SQL tricks can be done without quotes.

A poorly written script may run very slowly or even loop forever, tying up an Apache instance and a database connection. PHP's set_time_limit function limits the number of seconds that a PHP script may execute. It does not count time outside the script, such as a database query, command execution, or file I/O. It also does not give you more time than Apache's Timeout directive.

10.4.5.2 Perl

Perl has the trusty database-independent module DBI and its faithful sidekicks, the database-dependent (DBD) family. There are DBD modules for many popular databases, both open source (MySQL, PostgreSQL) and commercial (Oracle, Informix, Sybase, and others).

A MySQL connection function might resemble this:

    # my_connect.pl
    use DBI;

    sub my_connect {
        my $server = "db.test.com";
        my $db = "db_name";
        my $user = "db_user";
        my $password = "db_password";
        my $dbh = DBI->connect(
            "DBI:mysql:$db:$server",
            $user,
            $password,
            { PrintError => 1, RaiseError => 1 })
            or die "Could not connect to database $db.\n";
        return $dbh;
    }
    1;

As in the PHP examples, you'd rather not have this function everywhere. Perl has, characteristically, more than one way to do it. Here is a simple way:

    require "/usr/local/myperllib/my_connect.pl";

Keep the my_connect.pl script outside Apache's DocumentRoot directory to prevent its contents from being viewed.
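Returning to the login query from the PHP discussion above: the sketch below follows the advice to accept only legal characters before building SQL, and then goes one step further than the text by binding values through placeholders (a swapped-in technique, offered as an extra layer rather than as the author's method). It is written in Python against an in-memory SQLite database only so that it is self-contained. The table layout, the check_login name, and the letters-and-digits rule are assumptions for the demo, and a real site would not store cleartext passwords.

```python
import re
import sqlite3

# Whitelist assumed for the demo: letters and digits only.
LEGAL = re.compile(r"[A-Za-z0-9]+")


def check_login(conn: sqlite3.Connection, user: str, password: str) -> bool:
    """Return True only if exactly one row matches user and password."""
    # Step 1: reject anything outside the whitelist (the approach in the text).
    if not LEGAL.fullmatch(user) or not LEGAL.fullmatch(password):
        return False
    # Step 2 (an addition, not from the text): bind values with placeholders
    # so they are never spliced into the SQL string.
    row = conn.execute(
        "SELECT COUNT(*) FROM users WHERE user = ? AND password = ?",
        (user, password),
    ).fetchone()
    return row[0] == 1


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")  # throwaway database for the demo
    conn.execute("CREATE TABLE users (user TEXT, password TEXT)")
    conn.execute("INSERT INTO users VALUES ('raoul', 'secret42')")
    print(check_login(conn, "raoul", "secret42"))
    print(check_login(conn, "' OR '' = '", "' OR '' = '"))
```

The quote-laden input from the injection example never reaches the database, because it fails the whitelist; even if the whitelist were loosened, the placeholders would treat it as data rather than as SQL.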
If your Perl connection logic is more complex than my_connect.pl, it could be written as a Perl package or a module.

Taint mode won't protect you from entering tainted data into database queries. You'll need to check the data yourself. Perl's outstanding regular-expression support lets you specify patterns that input data must match before going into a SQL statement.

10.4.6. Authentication

Your web site may have some restricted content, such as premium pages for registered customers or administrative functions for web site maintainers. Use authentication to establish the identity of the visitor. Broken authentication and session management is number three in the OWASP top 10.

10.4.6.1 Basic authentication

The simplest authentication method in Apache is basic authentication. This requires a password file on the web server and a require directive in a config file:

    <Location /auth_demo_dir>
    AuthName "My Authorization"
    AuthType Basic
    # Note: Keep the password files in their own directory
    AuthUserFile /usr/local/apache/auth_dir/auth_demo_password
    Order deny,allow
    Require valid-user
    </Location>

I suggest storing password files in their own directories, outside the document root. You may use subdirectories to segregate files by user or virtual host. This is more manageable than .htaccess files all over the site, and it keeps Apache running faster.

You can specify any matching user, a list of users, or a list of groups:

    require valid-user
    require user user1 user2 ...
    require group group1 group2 ...

Where are the names and passwords stored? The simplest solution, specified by AuthUserFile in the example, is a flat text file on the server. To create the password file with an initial user named raoul, type the following:

    htpasswd -c /usr/local/apache/auth_dir/auth_demo_password raoul

To add raoul to an existing password file:

    htpasswd /usr/local/apache/auth_dir/auth_demo_password raoul
    ... (prompt for password for raoul) ...
When a visitor attempts to access /auth_demo_dir on this site, a dialog box pops up and prompts him for his name and password. These will be sent with the HTTP stream to the web server. Apache will read the password file /usr/local/apache/auth_dir/auth_demo_password, get the encrypted password for the user raoul, and see if it matches the one submitted.
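To see what "sent with the HTTP stream" means in practice, here is a small Python sketch of the header that carries Basic credentials. Under RFC 2617 the browser joins name and password with a colon and Base64-encodes the result; the function names below are hypothetical.

```python
import base64


def basic_auth_header(user: str, password: str) -> str:
    """Build the Authorization header value a browser sends for Basic auth."""
    raw = f"{user}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(raw).decode("ascii")


def decode_basic_auth(header_value: str) -> tuple:
    """Recover (user, password) from a captured header value.

    No key is involved: Base64 is an encoding, not encryption.
    """
    scheme, _, token = header_value.partition(" ")
    if scheme != "Basic":
        raise ValueError("not a Basic Authorization header")
    user, _, password = base64.b64decode(token).decode("utf-8").partition(":")
    return user, password
```

Because decoding needs no secret, anyone who can observe the request can recover the password, which is why the chapter goes on to recommend SSL for Basic authentication.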
implementation (file, DBM, DB, MySQL, LDAP) by matching Apache modules and configuration directives. For example, mod_auth_mysql is configured with the table and column names in a customer table in a MySQL database. After the name and password are sent to Apache from the browser, mod_auth_mysql queries the database, and Apache allows access if the query succeeds and the username and password were found.

Browsers typically cache this authentication information and send it to the web server as part of each HTTP request header for the same realm (a string specified to identify this resource). What if the user changes her password during her session? Or what if the server wants to log the client off after some period of inactivity? In either case, the cached credentials could become invalid, but the browser still holds them tight. Further attempts by the user to reach a web page in the realm will fail. Unfortunately, HTTP has no way for a server to expire credentials in the client. It may be necessary to clear all browser caches (memory and disk) to clear the authentication data, forcing the server to request reauthentication and causing the client to open a new dialog box.

Basic authentication is not encrypted, and credentials are sent to the server with every request. A sniffer can and will pick up the name and password. Use SSL (URLs starting with https://) for privacy. Although the initial SSL handshake is slow, the content encryption that follows is not so bad.

Direct authentication with a scripting language gives more flexibility than the built-in browser dialog box. The script writes an HTML form to the client, and it processes the reply as though it came from the standard dialog box.

10.4.6.2 Digest authentication

The second HTTP client authentication method, digest authentication, is more secure, because it uses an MD5 hash of data rather than cleartext passwords. RFC 2617 documents basic and digest authentication.
Apache (in the module mod_digest) and Mozilla implement the standard correctly. Microsoft did not, so digest authentication in IE 5 and IIS 5 does not currently interoperate with other web servers and browsers. Another implementation has been written by a security group at Microsoft, so in the future, this may be resolved. For now, SSL is the only safe way to communicate authentication data.

10.4.6.3 Safer authentication

It's surprisingly tricky to create secure client authentication. User input can be forged, the HTTP Referer header is unreliable, and even the client's apparent IP address can change from one access to the next if the user is behind a proxy farm. It would be beneficial to have a method that's usable within and across sites. For cross-site authentication, the authenticating server must convey its approval or disapproval in a way that can't be easily forged and that will work even if the servers aren't homogeneous and local.

A simple adaptation of these ideas follows. It uses a public variable with unique values to prevent a replay attack. A timestamp is useful because it can also be used to expire old logins. This value is combined with a constant string that is known only by the cooperating web servers to produce another string. That string is run through a one-way hash function. The timestamp and hashed string are sent from the authenticating web server (A) to the target web server (B).

Let's walk through the process. First, the client form gets the username and password and submits them to Server A over a secure SSL connection:

    # Client form
    <form method="post" action="https://a.test.com/auth.php">
    User: <input type="text" name="user">
    Password: <input type="password" name="password">
    <input type="submit">
    </form>

On Server A, a PHP script gets the timestamp, combines it with the secret string, hashes the result, and redirects to Server B:

    <?
    // a.test.com/auth.php
    // (Verify the submitted user and password before going on.)
    $time_arg = time( );
    $secret_string = "babaloo";
    $hash_arg = md5($time_arg .
        $secret_string);
    $url = "http://b.test.com/login.php" . "?" .
        "t=" . urlencode($time_arg) .
        "&h=" . urlencode($hash_arg);
    header("Location: $url");
    ?>

On Server B, a script confirms the input from Server A:

    <?
    // b.test.com/login.php
    // Get the CGI variables:
    $time_arg = $_GET['t'];
    $hash_arg = $_GET['h'];
    // Servers A and B both know the secret string,
    // the variable(s) it is combined with, and their
    // order:
    $secret_string = "babaloo";
    $hash_calc = md5($time_arg . $secret_string);
    if ($hash_calc === $hash_arg) {
        // Check $time_arg against the current time.
        // If it's too old, this input may have come from a
        // bookmarked URL, or may be a replay attack; reject it.
        // If it's recent and the strings match, proceed with the login...
    } else {
        // Otherwise, reject with some error message.
    }
    ?>

This is a better-than-nothing method, simplified beyond recognition from the following sources, which should be consulted for greater detail and security:

- Example 16-2 in Web Security, Privacy, and Commerce (O'Reilly).
- Dos and Don'ts of Client Authentication on the Web (http://www.lcs.mit.edu/publications/pubs/pdf/MIT-LCS-TR-818.pdf) describes how a team at MIT cracked the authentication schemes of a number of commercial sites, including the Wall Street Journal. Visit http://cookies.lcs.mit.edu/ for links to the Perl source code of their Kooky Authentication Scheme.

10.4.7. Access Control and Authorization

Once authenticated, what is the visitor allowed to do? This is the authorization or access control step. You can control access by a hostname or address, by the value of an environment variable, or by a person's ID and password. Broken access control is the second highest vulnerability in the OWASP top 10 list.

10.4.7.1 Host-based access control

This grants or blocks access based on a hostname or IP address.
Here is a sample directive to prevent everyone at evil.com from viewing your site. The order is allow,deny so that the deny line is evaluated last and wins; with order deny,allow, the allow from all line would override the deny and let evil.com back in:

    <Location />
    order allow,deny
    allow from all
    deny from .evil.com
    </Location>

The period before evil.com is necessary. If I said:

    deny from evil.com

I would also be excluding anything that ends with evil.com, such as devil.com or www.bollweevil.com.

You may also specify addresses:
10.4.7.2 Environment-variable access control

This is a very flexible solution to some tricky problems. Apache's configuration file can set new environment variables based on patterns in the information it receives in HTTP headers. For example, here's how to serve images from /image_dir on http://www.hackenbush.com, but keep people from linking to the images from their own sites or stealing them. The Referer header carries a full URL, so the pattern starts with the scheme:

    SetEnvIf Referer "^http://www\.hackenbush\.com/" local
    <Location /image_dir>
    order deny,allow
    deny from all
    allow from env=local
    </Location>

SetEnvIf defines the environment variable local if the referring page was from the same site.

10.4.7.3 User-based access control

If you allow any .htaccess files in your Apache configuration, Apache must check for a possible .htaccess file in every directory leading to every file that it serves, on every access. This is slow: look at a running httpd process sometime with strace httpd to see the statistics from all these look-ups. Also, .htaccess files can be anywhere, modified by anyone, and very easy to overlook. You can get surprising interactions between your directives and those in these far-flung files. So let's consider them a hazard. We can still selectively and carefully allow them.

Try to put your access-control directives directly in your Apache configuration file (httpd.conf or access.conf). Disallow overrides for your whole site with the following (AllowOverride is honored only inside Directory sections, not Location sections):

    <Directory />
    AllowOverride None
    </Directory>

Any exceptions must be made in httpd.conf or access.conf, including granting the ability to use .htaccess files (only httpd.conf for Apache 2). You might do this if you serve many independent virtual hosts and want to let them specify their own access control and CGI scripts. But be aware that you're increasing your server's surface area.

10.4.7.4 Combined access control

Apache's configuration mechanism is surprisingly flexible, allowing you to handle some tricky requirements.
For instance, to allow anyone from good.com as well as a registered user:

    <Location />
    order deny,allow
    deny from all
    # Here's the required domain:
    allow from .good.com
    # Any user in the password file:
    require valid-user
    # This does an "or" instead of an "and":
    satisfy any
    </Location>

If you leave out satisfy any, the meaning changes from or to and, a much more restrictive setting.

10.4.8. SSL

SSL encrypts data between a web browser and web server. It's used throughout the Web to protect login names, passwords, personal information, and, of course, credit card numbers. The initial SSL handshake is slow in software, and much faster with a hardware SSL accelerator.

Until recently, people tended to buy a commercial server to offer SSL. RSA Data Security owned a patent on a public-key encryption method used by SSL, and they licensed it to companies. After the patent expired in September 2000, free implementations of Apache+SSL emerged. Two modules, Apache-SSL and mod_ssl, have competed for the lead position. mod_ssl is more popular and easier to install, and it can be integrated as an Apache DSO. It's included with Apache 2 as a standard module. For Apache 1.x, you need to get mod_ssl from http://www.modssl.org and OpenSSL from http://www.openssl.org.

Early in the SSL process, Apache requires a server certificate to authenticate its site's identity to the browser. Browsers have built-in lists of certificate authorities (CAs) and their credentials. If your server certificate was provided by one of these authorities, the browser will silently accept it and establish an SSL connection. The process of obtaining a server certificate involves proving your identity to a CA and paying a license fee. If the server certificate comes from an unrecognized CA or is self-signed, the browser will prompt the user to confirm or reject it. Large commercial sites pay annual fees to the CA to avoid this extra step, as well as to avoid the appearance of being less trustworthy.

10.4.9. Sessions and Cookies

Once a customer has been authenticated for your site, you want to keep track of him. You don't want to force a login on every page, so you need a way to maintain the state over time and multiple page visits.

Since HTTP is stateless, visits need to be threaded together. If a person adds items to a shopping cart, they should stay there even if the user takes side trips through the site. Scripting languages address the problems of remembering information from page to page through the concept of a session.

A session is a sequence of interactions. It has a session ID (a unique identifier), data, and a time span. A good session ID should be difficult to guess or reverse-engineer. A random ID is best, but an ID may be calculated from some input variables, such as the user's IP or the time. If the ID is not random, it should be encrypted. PHP, Perl, and other languages have code to create and manage web sessions.

If the web user allows cookies in her browser, the web script may write the session ID as a variable in a cookie for your web site. If cookies are not allowed, you need to propagate the session ID with every URL. Every GET URL needs an extra variable, and every POST form needs a hidden field to house this ID.

10.4.9.1 PHP

PHP can be configured to check every URL on a page and tack on the session ID, if needed. In php.ini, add the following:

    session.use_trans_sid=1

This is a little slower, since PHP needs to examine every URL in the page's HTML contents.

Without this, you need to track the sessions yourself. If cookies are enabled in the browser, PHP defines the constant SID to be an empty string. If cookies are disabled, SID is defined as PHPSESSID=id, where id is the 32-character session ID string.
To handle either case in your script, append SID to your links:

    <a href="?<?php echo SID; ?>">link</a>

If cookies are enabled, the HTML created by the previous example would be as follows:

    <a href="?">link</a>

If cookies are disabled, the session ID becomes part of the URL:

    <a href="?PHPSESSID=379d65e3921501cc79df7d02cfbc24c3">link</a>

By default, session variables are written to /tmp/sess_id. Anyone who can list the contents of /tmp can hijack a session ID, or possibly forge a new one. To avoid this, change the session directory to a more secure location (outside of DocumentRoot, of course).

In php.ini:

    session.save_path=/usr/local/apache/sessions

Or, in Apache's httpd.conf:

    php_admin_value session.save_path /usr/local/apache/sessions

The directory and files should be owned by the web-server user ID and hidden from others:

    chmod 700 /usr/local/apache/sessions

If there is more than one group of PHP developers, use virtual hosts and a host-specific session directory (such as /usr/local/apache/host/sessions) to prevent them from hijacking each other's sessions.

You can also tell PHP to store session data in shared memory, a database, LDAP, or some other storage method.

10.4.9.2 Perl

The Apache::Session module provides session functions for mod_perl. The session ID can be saved in a cookie or manually appended to URLs. Session storage may use the filesystem, a database, or RAM. See the documentation at http://www.perldoc.com/cpan/Apache/Sessionl.

Apache provides its own language-independent session management with mod_session. This works with or without cookies (by appending the session ID to the URL in the QUERY_STRING environment variable) and can exempt certain URLs, file types, and clients from session control.

10.4.10. Site Management: Uploading Files

As you update your web site, you will be editing and copying files. You may also allow customers to upload files for some purposes.
How can you do this securely?

Tim Berners-Lee originally envisioned the Web as a two-way medium, where browsers could easily be authors. Unfortunately, as the Web commercialized, the emphasis was placed on browsing. Even today, the return path is somewhat awkward, and the issue of secure site management is not often discussed.

10.4.10.1 Not-so-good ideas

I mentioned form-based file uploads earlier. Although you can use this for site maintenance, it handles only one file at a time and forces you to choose it from a list or type its name.

Although FTP is readily available and simple to use, it is not recommended for many reasons. It still seems too difficult to secure FTP servers: account names and passwords are passed in the clear.

Network filesystems such as NFS or Samba are appealing for web-site developers, because they can develop content on their client machines and then drag and drop files to network folders. These filesystems are still too difficult to secure across the public Internet and are not recommended. At one time, Sun was promoting WebNFS as the next-generation, Internet-ready filesystem, but there has been little public discussion about this in the past few years.

The HTTP PUT method is usually not available in web browsers. HTML authoring tools, such as Netscape Composer and AOLPress, use PUT to upload or modify files. PUT has security implications similar to form-based file uploads, and it now looks as if it's being superseded by DAV.

Microsoft's FrontPage server extensions define web-server extensions for file uploading and other tasks. The web server and FrontPage client communicate with a proprietary RPC over HTTP. The extensions are available for Apache and Linux (http://www.rtr.com/fpsupport/indexl), but only as binaries.

FrontPage has had serious security problems in the past. The author of the presentation Apache and FrontPage at ApacheCon 2001 recommended: "If at all possible, don't use FrontPage at all."
There seems to be a current mod_frontpage DSO for Apache (http://www.rtr.com/fpsupport/whatsnew). Microsoft appears to be moving toward DAV.

10.4.10.2 Better ideas: ssh, scp, sftp, rsync

scp and sftp are good methods for encrypted file transfer. To copy many files, rsync or Unison over ssh provides an incremental, compressed, encrypted data transfer. This is especially useful when mirroring or backing up a web site. I do most of my day-to-day Linux work on live systems with ssh, vi, scp, and rsync. When working from a Windows box, I use putty and WinSCP. A true VPN would be even more convenient.

10.4.10.3 DAV

Distributed Authoring and Versioning (DAV or WebDAV) is a recent standard for remote web-based file management. DAV lets you upload, rename, delete, and modify files on a web server. It's supported in Apache (as the mod_dav module) and by all the major web authoring tools, including:

    Microsoft web folders with IE 5 and Windows 95 and up. These look like local directories under Explorer, but are actually directories on a web server under DAV management. This is the simplest drag-and-drop solution I've seen for authors on Windows machines to publish to Apache on Linux. See http://www.mydocsonline.com/info_webfoldersl.
    Microsoft FrontPage 2003
    Macromedia Dreamweaver UltraDev
    Adobe GoLive, InDesign, and FrameMaker
    Apple Mac OS X iDisk
    OpenOffice

To add DAV support to Apache, ensure that mod_dav is included:

1. Download the source from http://www.moddav.org.

2. Build the module:

    ./configure --with-apxs=/usr/local/apache/bin/apxs

3. Add these lines to httpd.conf:

    LoadModule dav_module libexec/libdav.so
    AddModule mod_dav.c

4. Create a password file (-c creates the file, -s selects SHA hashing, and htpasswd prompts for the password):

    htpasswd -c -s /usr/local/apache/passwords/dav.htpasswd user

5. In httpd.conf, enable DAV for the directories you want to make available.
If you allow file upload, you should have some access control as well:

    # The directory part of this must be writeable
    # by the user ID running apache:
    DAVLockDB /usr/local/apache/davlock/
    DAVMinTimeout 600
    # Use a Location or Directory for each DAV area.
    # Here, let's try "/DAV":
    <Location /DAV>
        # Authentication:
        AuthName "DAV"
        AuthUserFile /usr/local/apache/passwords/dav.htpasswd
        AuthType Basic
        # AllowOverride None would be extra protection, but it has no
        # effect inside <Location>; set it in a <Directory> block instead.
        # Allow file listing
        Options Indexes
        # Don't forget this one!:
        DAV On
        # Let anyone read, but
        # require authentication to do anything dangerous:
        <LimitExcept GET HEAD OPTIONS>
            require valid-user
        </LimitExcept>
    </Location>

The security implications of DAV are the same as for basic authentication: the name and password are passed as plain text, and you need to protect the name/password files.

DAV is easy to use and quite flexible. A new extension called DELTA-V will handle versioning, so DAV could eventually provide a web-based source-control system.

10.4.11. XML, Web Services, and REST

XML started as a text-based markup language to preserve the structure of data. It grew beyond file formats to RPC protocols such as XML-RPC and SOAP. These protocols use HTTP because it usually passes through corporate firewalls, and it would be difficult to establish a new specialized protocol. With other proposed standards such as Web Services Description Language (WSDL) and Universal Description, Discovery, and Integration (UDDI), a new field called web services (http://www.w3.org/2002/ws/) is emerging.

There are some security concerns about this. You construct a firewall based on your knowledge that server A at port B can do C and D. But with SOAP and similar protocols, HTTP becomes a conduit for remote procedure calls. Even a stateful firewall cannot interpret the protocol to see which way the data flows or the implications of the data.
That would require a packet analyzer that knows the syntax and semantics of the XML stream, which is a difficult and higher-level function.

IBM, Microsoft, and others founded the Web Services Interoperability Group (http://www.ws-i.org) to create web-services standards outside of the IETF and W3C. Security was not addressed until the first draft of Web Services Security (http://www-106.ibm.com/developerworks/webservices/library/ws-secure/) appeared in April 2002. It describes an extensible XML format for secure SOAP message exchanges. This addresses the integrity of the message but still doesn't guarantee that the message's contents are safe when handled by the client or server. The Basic Security Profile (http://www.ws-i.org/Pro/image/library/english/10020_BasicSecurityProfile-1.0-2004-05-12l) was approved in 2004. A separate group, OASIS, recently approved three Web Services Security specifications (http://www.oasis-open.org/specs/index.php).

It's hard to be certain (the standards are heavy sledding), but it doesn't look like we have end-to-end security for web services yet.

An alternative to XML-based web services is Representational State Transfer (REST), which uses only traditional web components: HTTP and URIs. A description is found in Second Generation Web Services (http://www.xml.com/pub/a/2002/02/20/restl). Its proponents argue that REST can do anything that SOAP can do, but more simply and securely. All the techniques described in this chapter, as well as functions such as caching and bookmarking, could be applied because current web standards are well established. For instance, an HTTP GET method is defined to be safe: it should have no side effects and should never modify server state. A SOAP method may read or write, but this is due to a separate agreement between the server and client, and cannot be determined from the syntax of the SOAP message.
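To make the GET-versus-SOAP contrast concrete, here is a minimal Python sketch of the kind of check a filter or proxy could make by looking only at the HTTP request line. The function name and the sample paths are hypothetical, chosen for illustration. It assumes the server honors the convention that GET, HEAD, and OPTIONS are read-only, which HTTP recommends but cannot enforce.

```python
# Methods that HTTP defines as safe (read-only by convention).
SAFE_METHODS = frozenset({"GET", "HEAD", "OPTIONS"})


def may_modify_state(method: str) -> bool:
    """Return True unless the HTTP method is one of the safe ones."""
    return method.strip().upper() not in SAFE_METHODS


if __name__ == "__main__":
    # Hypothetical requests: a REST-style read, and a SOAP call carried in a POST.
    for method, path in [("GET", "/orders/42"), ("POST", "/soap/endpoint")]:
        verdict = "may modify state" if may_modify_state(method) else "read-only"
        print(f"{method} {path}: {verdict}")
```

A SOAP request is typically carried in a POST whatever the operation does, so a check like this has to treat every SOAP call as a potential write. Telling a SOAP read from a SOAP write would mean parsing the XML body and knowing the service's own conventions.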
See Some Thoughts About SOAP Versus REST on Security (http://www.prescod.net/rest/securityl).

As these new web services roll out, the Law of Unintended Consequences will get a good workout. Expect major surprises.

10.4.12. Detecting and Deflecting Attackers

The more attackers know about you, the more vulnerable you are. Some use port 80 fingerprinting to determine what kind of server you're running. They can also pass a HEAD request to your web server to get its version number, modules, etc.

Script kiddies are not known for their precision, so they will often fling IIS attacks such as Code Red and Nimda at your Apache server. Look at your error_log to see how often these turn up. You can exclude them from your logs with Apache configuration tricks. A more active approach is to send email to the administrator of the offending site, using a script like NimdaNotifyer (see http://www.digitalcon.ca/nimda/). You may even decide to exclude these visitors from your site. See Chapter 13 or visit http://www.snort.org to see how to integrate an IP blocker with their intrusion detector.

A tarpit turns your network's unused IP addresses into a TCP-connection black hole, holding on to attackers who try to connect to them. Although an effective tool, a tarpit may actually be illegal in some places. Read the La Brea story at http://www.hackbusters.net/.

10.4.13. Caches, Proxies, and Load Balancers

A proxy is a man in the middle. A caching proxy is a man in the middle with a memory. All the security issues of email apply to web pages as they stream about: they can be read, copied, forged, stolen, etc. The usual answer is to apply end-to-end cryptography.

If you use sessions that are linked to a specific server (stored in temporary files or shared memory rather than a database), you must somehow get every request with the same session ID directed to the same server. Some load balancers offer session affinity to do this.
Without it, you'll need to store the sessions in some shared medium, such as an NFS-mounted filesystem or a database.
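As a rough illustration of session affinity, the following Python sketch maps a session ID to one server in a pool by hashing the ID. It is only one possible approach, not a description of how any particular load balancer works. The backend host names and the function name are hypothetical.

```python
import hashlib

# Hypothetical backend pool; the host names are placeholders.
BACKENDS = ["web1.example.com", "web2.example.com", "web3.example.com"]


def pick_backend(session_id: str, backends=BACKENDS) -> str:
    """Map a session ID to a backend; the same ID always gets the same one."""
    if not backends:
        raise ValueError("backend pool is empty")
    # Hash the ID so that sessions spread evenly across the pool.
    digest = hashlib.sha256(session_id.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(backends)
    return backends[index]


if __name__ == "__main__":
    sid = "379d65e3921501cc79df7d02cfbc24c3"
    # Repeated requests carrying the same session ID land on the same server.
    print(pick_backend(sid) == pick_backend(sid))
```

The weakness of this simple scheme is that adding or removing a server changes the mapping for most sessions, which is one more reason to keep session data in a shared store when the pool changes often.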