
Chapter 19. CGI Programming
Contents:
Introduction
Writing a CGI Script
Redirecting Error Messages
Fixing a 500 Server Error
Writing a Safe CGI Program
Executing Commands Without Shell Escapes
Formatting Lists and Tables with HTML Shortcuts
Redirecting to a Different Location
Debugging the Raw HTTP Exchange
Managing Cookies
Creating Sticky Widgets
Writing a Multiscreen CGI Script
Saving a Form to a File or Mail Pipe
Program: chemiserie

A successful tool is one that was used to do something undreamt of by its author.
Stephen C. Johnson
19.0. Introduction
Changes in the environment or the
availability of food can make certain species more successful than
others at finding food or avoiding predators. Many scientists believe
a comet struck the Earth millions of years ago, throwing an enormous
cloud of dust into the atmosphere. Subsequent radical changes to the
environment proved too much for some organisms, say dinosaurs, and
hastened their extinction. Other creatures, such as mammals, found
new food supplies and freshly exposed habitats to compete in.

Much as the comet altered the environment for prehistoric species,
the Web has altered the environment for modern programming languages.
It's opened up new vistas, and although some languages have found
themselves eminently unsuited to this new world order, Perl has
positively thrived. Because of its strong background in text
processing and system glue, Perl has readily adapted itself to the
task of providing information using text-based protocols.
19.0.1. Architecture
The Web is
driven by plain text. Web servers and web browsers communicate using
a text protocol called HTTP, Hypertext Transfer Protocol. Many of the
documents exchanged are encoded in a text markup system called HTML,
Hypertext Markup Language. This grounding in text is the source of
much of the Web's flexibility, power, and success. The only notable
exception to the predominance of plain text is the Secure Socket
Layer (SSL) protocol that encrypts other protocols like HTTP into
binary data that snoopers can't decode. Web pages are identified
using the Uniform Resource Locator (URL) naming scheme. URLs look
like this:

http://www.perl.com/CPAN/
http://www.perl.com:8001/bad/mojol
ftp://gatekeeper.dec.com/pub/misc/netlib.tar.Z
ftp://anonymous@myplace:gatekeeper.dec.com/pub/misc/netlib.tar.Z
file:///etc/motd
The first part
(http, ftp,
file) is called the scheme,
which identifies how the file is retrieved. The next part
(://) means a hostname will follow, whose
interpretation depends on the scheme. After the hostname comes the
path identifying the document. This path
information is also called a partial URL.

The Web is a client-server system. Client browsers like Netscape and
Lynx request documents (identified by a partial URL) from web servers
like Apache. This browser-to-server dialog is governed by the HTTP
protocol. Most of the time, the server merely sends back the file
contents. Sometimes, however, the web server runs another program to
return a document that could be HTML text, binary image, or any other
document type.

The server-to-program dialog can be handled in two ways. Either the
code to handle the request is part of the web server process, or else
the web server runs an external program to generate a response. The
first scenario is the model of Java servlets and mod_perl (covered in
Chapter 21). The second is governed by the Common
Gateway Interface (CGI) protocol, so the server runs a CGI
program (sometimes known as a CGI
script). This chapter deals with CGI programs.

The server tells the CGI program what page was requested, what values
(if any) came in through HTML forms, where the request came from,
whom they authenticated as (if they authenticated at all), and much
more. The CGI program's reply has two parts: headers to say "I'm
sending back an HTML document," "I'm sending back a GIF image," or
"I'm not sending you anything; go to this page instead," and a
document body, perhaps containing image data, plain text, or HTML.

The CGI
protocol is easy to implement wrong and hard to implement right,
which is why we recommend using Lincoln Stein's excellent CGI.pm
module. It provides convenient functions for accessing the
information the server sends you, and for preparing the CGI response
the server expects. It's so useful, it's included in the standard
Perl distribution, along with helper modules such as CGI::Carp and
CGI::Fast. We show it off in Recipe 19.1.

Some web servers come with a Perl interpreter embedded in them. This
lets Perl generate documents without starting a new process. The
system overhead of serving an unchanging page isn't noticeable, even
when it's happening several times a second. CGI accesses, however,
start a new process for every request, and at that rate they bog down
the machine running the web server. Chapter 21 shows how to use
mod_perl, the Perl interpreter embedded in the
Apache web server, to get the benefits of CGI programs without the
overhead.
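To make the two halves of the exchange concrete, here is a minimal sketch written without CGI.pm so it stays self-contained. It assumes the server passes request details in the standard CGI environment variables (REQUEST_METHOD, SCRIPT_NAME, REMOTE_ADDR, REMOTE_USER) and that a reply is a header block, a blank line, and a body. The sub names cgi_reply, env_or, and describe_request are ours, made up for illustration. For real programs the advice above still stands: let CGI.pm do this for you.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Build a complete CGI reply: one header line, a blank line, then the body.
# The blank line is what separates the headers from the document.
sub cgi_reply {
    my ($content_type, $body) = @_;
    return "Content-Type: $content_type\r\n\r\n" . $body;
}

# Fetch one value from an environment hash, falling back to a default
# when the server did not set it.
sub env_or {
    my ($env, $name, $default) = @_;
    return defined $env->{$name} ? $env->{$name} : $default;
}

# Summarize what the server told us about the request.
sub describe_request {
    my ($env) = @_;
    return join '',
        "method=", env_or($env, 'REQUEST_METHOD', 'GET'),                 "\n",
        "script=", env_or($env, 'SCRIPT_NAME',    '(unknown)'),           "\n",
        "from=",   env_or($env, 'REMOTE_ADDR',    '(unknown)'),           "\n",
        "user=",   env_or($env, 'REMOTE_USER',    '(not authenticated)'), "\n";
}

# Under a web server %ENV holds the real values; from the command line
# the defaults show through.
print cgi_reply('text/plain', describe_request(\%ENV));
```

Run from a shell, this prints the header, the blank line, and the default values, which is a quick way to see exactly what a browser would be sent.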
19.0.2. Behind the Scenes
CGI programs are called each time the web server needs a dynamic
document generated. It is important to understand that your CGI
program doesn't run continuously, with the browser calling different
parts of the program. Each request for a partial URL corresponding to
your program starts a new copy. Your program generates a page for
that request, then quits.
A browser can request a document in
several distinct ways called methods. (Don't
confuse HTTP methods with the methods of object orientation. They
have nothing to do with each other.) The GET method is the most
common, indicating a simple request for a document. The HEAD method
supplies information about the document without actually fetching it.
The POST method submits form values.

Form values can be encoded in both GET and POST methods. With the GET
method, values are encoded directly in the URL, leading to ugly URLs
like this:

http://www.perl.com/cgi-bin/program?name=Johann&born=1685
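The part of that URL after the question mark reaches a CGI program in the QUERY_STRING environment variable. Here is a rough sketch of how such a string might be taken apart by hand. The sub names url_decode and parse_query are our own, and the sketch keeps only the last value when a name repeats, which a real form library would not do.

```perl
use strict;
use warnings;

# Decode one URL-encoded component: "+" means space, %XX is a hex byte.
sub url_decode {
    my ($text) = @_;
    $text =~ tr/+/ /;
    $text =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;
    return $text;
}

# Split a query string such as "name=Johann&born=1685" into a hash.
# Simplification: a repeated name keeps only its last value.
sub parse_query {
    my ($query) = @_;
    my %form;
    for my $pair (split /[&;]/, $query) {
        next unless length $pair;
        my ($name, $value) = split /=/, $pair, 2;
        $value = '' unless defined $value;
        $form{ url_decode($name) } = url_decode($value);
    }
    return %form;
}

my %form = parse_query('name=Johann&born=1685');
print "$_ = $form{$_}\n" for sort keys %form;
# born = 1685
# name = Johann
```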
With the POST method, values are encoded in a separate part of the
HTTP request that the client browser sends the server. If the form
values in the previous example URL were sent with a POST request, the
user, server, and CGI script would all see the URL:

http://www.perl.com/cgi-bin/program
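Under CGI, the server hands that separate part of a POST request to the program on standard input and gives its size in the CONTENT_LENGTH environment variable. The following sketch shows one way to read it. The sub name read_body is ours, and an in-memory filehandle stands in for STDIN so the code can be tried without a web server.

```perl
use strict;
use warnings;

# Read up to $length bytes of request body from a filehandle.
# Under CGI the handle would be STDIN and the length would come from
# $ENV{CONTENT_LENGTH}; both are passed in here so the sub can be
# exercised on its own.
sub read_body {
    my ($fh, $length) = @_;
    my $body = '';
    while (length($body) < $length) {
        my $got = read($fh, $body, $length - length($body), length($body));
        die "read failed: $!" unless defined $got;
        last if $got == 0;    # the client sent less than it promised
    }
    return $body;
}

# Stand-in for STDIN: an in-memory filehandle holding a form submission.
my $posted = 'name=Johann&born=1685';
open(my $fake_stdin, '<', \$posted) or die "can't open in-memory handle: $!";
print read_body($fake_stdin, length $posted), "\n";
# name=Johann&born=1685
```

The body uses the same name=value encoding as a GET query string, so the same decoding applies once it has been read.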
The GET and POST
methods differ in another respect: idempotency.
This simply means that making a GET request for a particular URL once
or multiple times should have the same effect. The HTTP specification
says that a GET request may be cached by the browser, the
server, or an intervening proxy. POST requests cannot be cached,
because each request is independent and matters. Typically, POST
requests change or depend on the state of the server (query or
update a database, send mail, or purchase a computer).
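The distinction is easier to see in a toy example. In this sketch, which is purely illustrative, a hypothetical handle sub treats GET as read-only and lets POST change a made-up in-memory store of orders. Repeating the GET changes nothing, while every POST counts.

```perl
use strict;
use warnings;

# Hypothetical handler: GET only reads from the store, POST changes it.
sub handle {
    my ($method, $store, %form) = @_;
    if ($method eq 'GET') {
        return "orders: " . scalar @{ $store->{orders} };
    }
    if ($method eq 'POST') {
        push @{ $store->{orders} }, $form{item};
        return "ordered: $form{item}";
    }
    return "unsupported method: $method";
}

my %store = (orders => []);
handle('GET',  \%store) for 1 .. 3;                      # repeating GET changes nothing
handle('POST', \%store, item => 'computer') for 1 .. 2;  # each POST places an order
print handle('GET', \%store), "\n";
# orders: 2
```

This is why a cache may safely answer the second GET from memory, but must let both POSTs through to the server.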
Most
servers log requests to a file (the access log)
for later analysis by the webmaster. Error messages produced by CGI
programs don't by default go to the browser. Instead they are logged
to a file on the server (the error log), and the
browser simply gets a "500 Server Error" message, which means that
the CGI program didn't uphold its end of the CGI bargain.
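One way to keep your end of the bargain even when something breaks is to trap the failure, send the details to STDERR (which the server normally appends to its error log), and still return a well-formed reply. What follows is only a sketch under those assumptions. The sub name safe_reply and the wording of the error page are ours. Recipe 19.2 and Recipe 19.3 treat the subject properly.

```perl
use strict;
use warnings;

# Run a page-building sub. On success, return a normal reply. On failure,
# log the details to STDERR and return a generic error page instead, so
# the browser always gets valid headers.
sub safe_reply {
    my ($builder) = @_;
    my $body = eval { $builder->() };
    if (!defined $body) {
        my $error = $@ || "page builder returned nothing\n";
        warn "[" . localtime() . "] $0: $error";
        return "Status: 500 Internal Server Error\r\n"
             . "Content-Type: text/plain\r\n\r\n"
             . "Something went wrong; the details are in the error log.\n";
    }
    return "Content-Type: text/html\r\n\r\n$body";
}

print safe_reply(sub { "<p>All is well.</p>\n" });
print safe_reply(sub { die "database is unreachable\n" });
```

The second call shows the split described above: the reason for the failure goes to STDERR, and the browser sees only the generic message.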
Error messages are useful in debugging
any program, but they are especially so with CGI scripts. Sometimes,
though, the authors of CGI programs either don't have access to the
error log or don't know where it is. Sending error messages to a more
convenient location is discussed in Recipe 19.2. Tracking down errors
is covered in Recipe 19.3.

Recipe 19.8 shows how to learn what your
browser and server are really saying to one another. Unfortunately,
some browsers do not implement the HTTP specification correctly, and
this recipe helps you determine whether your program or your browser
is the cause of a problem.
19.0.3. Security
CGI programs let anyone run a program on
your system. Sure, you get to pick the program, but the anonymous
user from Out There can send unexpected values, hoping to trick it
into doing the wrong thing. Thus security is a big concern on the
Web.

Some sites address this concern by banning CGI programs. Sites that
can't do without the power and utility of CGI programs must find ways
to secure their CGI programs. Recipe 19.4
gives a checklist of considerations for writing a secure CGI script,
briefly covering Perl's tainting mechanism for guarding against
accidental use of unsafe data. Recipe 19.5
shows how your CGI program can safely run other programs.
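The heart of guarding against unsafe data is to decide what a good value looks like and reject everything else. Here is a small sketch of that idea for a value meant to be a plain filename. The sub name clean_filename and the particular pattern are our assumptions, not a rule from the recipes. Returning the captured $1 rather than the original string is also how data gets untainted when Perl runs with the -T switch.

```perl
use strict;
use warnings;

# Accept a value only if it looks like a plain filename: it must start
# with a word character and contain only word characters, dots, and
# hyphens. Slashes, spaces, and shell metacharacters are all rejected.
sub clean_filename {
    my ($value) = @_;
    return undef unless defined $value;
    return $value =~ /\A(\w[\w.-]*)\z/ ? $1 : undef;
}

for my $input ('report.txt', '../etc/passwd', 'x; rm -rf /') {
    my $name = clean_filename($input);
    print defined $name ? "ok: $name\n" : "rejected: $input\n";
}
# ok: report.txt
# rejected: ../etc/passwd
# rejected: x; rm -rf /
```

Listing what is allowed, rather than trying to list everything that is dangerous, means an input you never thought of is rejected by default.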
19.0.4. HTML and Forms
Some HTML tags
let you create forms, where the user can fill in values to submit to
the server. The forms are composed of widgets, such as text entry
fields and check boxes. CGI programs commonly return HTML, so the CGI
module has helper functions to create HTML for everything from tables
to form widgets.

In addition to Recipe 19.6, which covers those helpers for lists and
tables, this chapter has Recipe 19.10, which shows how to create
forms that retain values over multiple calls. Recipe 19.11 shows how
to make a single CGI script that
produces and responds to a set of pages, such as a product catalog
and ordering system.
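To give a feel for what such helper functions do, here is a hand-rolled sketch of a text-entry widget that retains its value between calls. It does not use the CGI module, and the sub names escape_html and text_field are ours. The value typed last time is escaped and written back into the widget, so the form appears to remember it.

```perl
use strict;
use warnings;

# Escape the characters that have special meaning in HTML.
sub escape_html {
    my ($text) = @_;
    my %entity = ('&' => '&amp;', '<' => '&lt;', '>' => '&gt;', '"' => '&quot;');
    $text =~ s/([&<>"])/$entity{$1}/g;
    return $text;
}

# Build a text-entry widget whose value is filled in from the previous
# submission. An undefined value produces an empty field.
sub text_field {
    my ($name, $value) = @_;
    $value = '' unless defined $value;
    return sprintf '<input type="text" name="%s" value="%s">',
        escape_html($name), escape_html($value);
}

my %form = (name => 'Johann "J.S." Bach');   # pretend this came from the browser
print '<form method="post">', "\n";
print '  Name: ', text_field('name', $form{name}), "\n";
print '  Born: ', text_field('born', $form{born}), "\n";
print '  <input type="submit">', "\n";
print "</form>\n";
```

Escaping matters here for more than tidiness: without it, a quote mark in the submitted value would end the attribute early and let the rest of the value be read as markup.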
19.0.5. Web-Related Resources
Unsurprisingly, some of the best references
on the Web are found on the Web:

WWW Security FAQ
    http://www.w3.org/Security/Faq/
Web FAQ
    http://www.boutell.com/faq/
CGI FAQ
    http://www.webthing.com/tutorials/cgifaql
HTTP Specification
    http://www.w3.org/pub/WWW/Protocols/HTTP/
HTML Specification
    http://www.w3.org/TR/REC-html40/
    http://www.w3.org/pub/WWW/MarkUp/
CGI Specification
    http://www.w3.org/CGI/
CGI Security FAQ
    http://www.go2net.com/people/paulp/cgi-security/safe-cgi.txt

We recommend CGI Programming with Perl, by Scott
Guelich, Shishir Gundavaram, and Gunther Birznieks (O'Reilly);
HTML & XHTML: The Definitive Guide, by Chuck
Musciano and Bill Kennedy (O'Reilly); and HTTP: The
Definitive Guide, by David Gourley, Brian Totty, et al.
(O'Reilly).

Copyright © 2003 O'Reilly & Associates. All rights reserved.