As the owner of a successful—or soon-to-be so—Website, you probably see site traffic as something you'd like to encourage. Unfortunately, high site traffic is just the kind of thing that a Web server administrator dreads—especially when that site is primarily composed of dynamically generated, database-driven pages. Such pages take a great deal more horsepower from the computer that runs the Web server software than plain, old HTML files do, because every page request is like a miniature program that runs on that computer.
While some pages of a database-driven site must always display up-to-the-second data culled from the database, others don't necessarily. Consider the front page of a Website like sitepoint.com. Typically, it presents a sort of "digest" of what's new and fresh on the site. But how often does that information actually change? Once a day? Once a week? And how important is it that visitors to your site see those changes the instant they occur? Would your site really suffer if changes took effect after a bit of a delay?
By converting high-traffic dynamic pages into semi-dynamic equivalents, which are static pages that get dynamically regenerated at regular intervals to freshen their content, you can go a long way towards reducing the toll that the database-driven components of your site take on your Web server's performance.
Say you have index.php, your front page, which provides a summary of new content on your site. Through examination of server logs, you'll probably find that this is one of the most requested pages on your site. If you ask yourself some of the questions above, you'll realize that this page doesn't have to be dynamically generated for every request. As long as it's updated every time new content is added to your site, it'll be as dynamic as it needs to be. With a PHP script, you can generate a static snapshot of the dynamic page's output and put this snapshot online, in place of the dynamic version, as index.html.
This little trick will require some reading, writing, and juggling of files. PHP is perfectly capable of accomplishing this task, but we have not yet seen the functions we'll need:
fopen: Opens a file for reading and/or writing. This file can be stored on the server's hard disk, or PHP can load it from a URL just like a Web browser would.
fclose: Tells PHP you're finished reading/writing a particular file and releases it for other programs or scripts to use.
fread: Reads data from a file into a PHP variable. Allows you to specify how much information (i.e. how many characters or bytes) to read.
fwrite: Writes data from a PHP variable into a file.
copy: Performs a run-of-the-mill file copy operation.
unlink: Deletes a file from the hard disk.
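Before these functions are put to work on the front page, here is a minimal sketch that exercises all six on a throwaway file. The two file names are made up for this demonstration, and the files are placed in the system's temporary directory so that nothing on your site is touched.

```php
<?php
// Hypothetical scratch files in the system temp directory.
$original = sys_get_temp_dir() . '/filedemo-original.txt';
$duplicate = sys_get_temp_dir() . '/filedemo-copy.txt';

// fopen with 'w' creates the file (or empties an existing one)
// and opens it for writing.
$file = fopen($original, 'w');
if (!$file) {
  die("Unable to open $original for writing.");
}
fwrite($file, 'Hello, file system!');
fclose($file);

// copy duplicates the file and returns false if it fails.
if (!copy($original, $duplicate)) {
  die("Unable to copy $original to $duplicate.");
}

// fopen with 'r' opens the copy for reading, and fread fetches
// up to 100 bytes of it into a variable.
$file = fopen($duplicate, 'r');
if (!$file) {
  die("Unable to open $duplicate for reading.");
}
$contents = fread($file, 100);
fclose($file);

echo $contents, "\n";

// unlink removes both scratch files now that we're done.
unlink($original);
unlink($duplicate);
```

Run from the command line or through the Web server, this should echo back the text that was written, and leave no files behind.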
Do you see where we're headed? If not, don't worry—you will in a moment.
Create a file called generateindex.php. It will be the responsibility of this file to load index.php, the dynamic version of your front page, as a Web browser would, then write the static version of the file as an updated version of index.html. If anything goes wrong in this process, you want to avoid the potential destruction of the good copy of index.html, so we'll make this script write the new static version into a temporary file (tempindex.html) and then copy it over index.html if all is well.
Here's the code for generateindex.php, with ample comments so you can see what's going on:
<!-- generateindex.php -->
<?php
// Sets the files we'll be using
$srcurl = 'http://localhost/index.php';
$tempfilename = 'tempindex.html';
$targetfilename = 'index.html';
?>
<html>
<head>
<title>Generating <?=$targetfilename?></title>
</head>
<body>
<p>Generating <?=$targetfilename?>...</p>
<?php
// Begin by deleting the temporary file, in case
// it was left lying around. This might spit out an
// error message if it were to fail, so we use
// @ to suppress it.
@unlink($tempfilename);

// Load the dynamic page by requesting it with a
// URL. The PHP will be processed by the Web server
// before we receive it (since we're basically
// masquerading as a Web browser), so what we'll get
// is a static HTML page. The 'r' indicates that we
// only intend to read from this "file".
$dynpage = fopen($srcurl, 'r');

// Check for errors
if (!$dynpage) {
  die("<p>Unable to load $srcurl. Static page update aborted!</p>");
}

// Read the contents of the URL into a PHP variable.
// A single fread on a URL may return only part of
// the page, so we read in chunks until we reach the
// end, stopping at 1MB of data (just in case
// something goes wrong).
$htmldata = '';
while (!feof($dynpage) && strlen($htmldata) < 1024*1024) {
  $htmldata .= fread($dynpage, 8192);
}

// Close the connection to the source "file", now
// that we're done with it.
fclose($dynpage);

// Open the temporary file (creating it in the
// process) in preparation to write to it (note
// the 'w').
$tempfile = fopen($tempfilename, 'w');

// Check for errors
if (!$tempfile) {
  die("<p>Unable to open temporary file ($tempfilename) for writing. Static page update aborted!</p>");
}

// Write the data for the static page into the
// temporary file
fwrite($tempfile, $htmldata);

// Close the temporary file, now that we're done
// writing to it.
fclose($tempfile);

// If we got this far, then the temporary file
// was successfully written, and we can now copy
// it on top of the static page.
$ok = copy($tempfilename, $targetfilename);

// Finally, delete the temporary file.
unlink($tempfilename);

// Only report success if the copy actually worked.
if (!$ok) {
  die("<p>Unable to copy $tempfilename to $targetfilename. Static page update aborted!</p>");
}
?>
<p>Static page successfully updated!</p>
</body>
</html>
The above code only looks daunting because of the large comments I've included. Remove them, and you'll see it's actually a fairly simple script.
Now, whenever generateindex.php is executed (say, when a browser requests it), a fresh copy of index.html will be generated from index.php. If you move index.php and generateindex.php into a restricted-access directory, you can make sure that only site administrators have the ability to update the front page of your site in this way. Expand this script to generate all semi-dynamic pages on your site, and add an "update semi-dynamic pages" link to your content management system!
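As a rough sketch of what that expansion might look like, the steps of generateindex.php can be wrapped in a function and called once per page. The function names generate_static_page and generate_all_pages, and the ".tmp" suffix used for the temporary file, are assumptions made for illustration. They are not part of the script above.

```php
<?php
// Regenerates one static page from its dynamic source.
// Returns true on success and false on any failure.
// $srcurl can be anything fopen is able to read.
function generate_static_page($srcurl, $targetfilename)
{
  // Assumed naming scheme for the temporary file.
  $tempfilename = $targetfilename . '.tmp';

  $dynpage = @fopen($srcurl, 'r');
  if (!$dynpage) {
    return false;
  }

  // Read in chunks until the end of the page, or until the
  // 1MB safety limit is reached.
  $htmldata = '';
  while (!feof($dynpage) && strlen($htmldata) < 1024 * 1024) {
    $htmldata .= fread($dynpage, 8192);
  }
  fclose($dynpage);

  // Write to a temporary file first, so a failure can't
  // damage the good copy of the static page.
  $tempfile = @fopen($tempfilename, 'w');
  if (!$tempfile) {
    return false;
  }
  $written = fwrite($tempfile, $htmldata);
  fclose($tempfile);

  // Copy over the static page only if the write succeeded,
  // then clean up the temporary file either way.
  $ok = ($written !== false) && copy($tempfilename, $targetfilename);
  @unlink($tempfilename);
  return $ok;
}

// Regenerates every page in a list of the form
// dynamic URL => static file name. Returns an array of
// static file name => true/false.
function generate_all_pages($pages)
{
  $results = array();
  foreach ($pages as $srcurl => $targetfilename) {
    $results[$targetfilename] = generate_static_page($srcurl, $targetfilename);
  }
  return $results;
}
```

You would then call generate_all_pages with your own list, for example array('http://localhost/index.php' => 'index.html'), and report any entries in the result that came back false. Returning a result instead of calling die lets one failed page be reported without stopping the rest from being updated.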
If you'd rather have your front page updated automatically, you'll need to set up your server to run generateindex.php at regular intervals (say, every hour). Under recent versions of Windows, you can use the Task Scheduler (called System Agent in older versions of Windows equipped with MS Plus Pack) to run php.exe, a stand-alone version of PHP included with the Windows PHP distribution, automatically every hour. Just create a batch file called generateindex.bat that contains this line of text:
C:\PHP\php.exe C:\WWW\generateindex.php
Adjust the paths and file names as necessary, and then set up Task Scheduler to run generateindex.bat every hour. In some versions of Windows, you'll need to set up 24 tasks to be run daily at the appropriate times. Done!
Under Linux, or other UNIX based platforms, you can do a similar thing with cron—a program installed on just about every UNIX system out there that lets you define tasks to be run at regular intervals. Ask your friendly neighbourhood Linux know-it-all, check your favourite Linux Website, or post a message on the SitePoint Forums if you need any help getting started with cron.
The task you'll set up cron to run will be very similar to the Windows task discussed above. The stand-alone version of PHP you'll need, however, doesn't come with the PHP Apache loadable module we compiled way back in "Installation". You'll need to compile it separately from the same package we used to compile the Apache module. Instructions for this are provided with the package and on the PHP Website, but feel free to post in the SitePoint Forums if you need help!
For experienced cron users in a hurry, here's what the line in your crontab file should look like:
0 0-23 * * * php /path/to/generateindex.php > /dev/null
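One thing to watch when the script is run by cron or Task Scheduler rather than through the Web server: relative file names such as index.html are resolved against the working directory of the scheduled task, which is usually not your Web directory. A minimal sketch of one way around this is to build absolute paths from the script's own location. It assumes generateindex.php lives in the same directory as the static page it writes. The static_path helper is a hypothetical name introduced for illustration, and the __DIR__ constant requires PHP 5.3 or later.

```php
<?php
// Hypothetical helper: turns a bare file name into an absolute
// path inside the directory that contains this script.
function static_path($filename)
{
  return __DIR__ . DIRECTORY_SEPARATOR . $filename;
}

// Assumes the static files sit alongside generateindex.php.
$tempfilename = static_path('tempindex.html');
$targetfilename = static_path('index.html');

// PHP_SAPI is 'cli' when the script is run by the stand-alone
// version of PHP (from cron or a batch file) rather than by
// the Web server.
if (PHP_SAPI === 'cli') {
  echo "Writing static page to $targetfilename\n";
}
```

With the two file name variables set this way, the rest of generateindex.php can stay as it is, whichever directory the scheduled task happens to start in.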