Web Database Applications with PHP and MySQL (2nd Edition)

9.2 Server-Side Validation with PHP

In this section, we introduce
validation on the server using PHP. We show you how to validate
numbers including currencies and credit cards, strings including
email addresses and Zip Codes, and dates and times. We also show you
how to check for mandatory fields, field lengths, and data types.
Many of the PHP functions we use, including the regular
expression and string functions, are discussed in detail in
Chapter 3.

We illustrate many of our examples in this section with a case study
of validating customer details. The techniques described here are
typical of those that validate a form after the user has submitted
data to the server. We show how to extend and integrate this approach
further in Chapter 10 so that the batch errors
are reported as part of a customer form, and we show a completed
customer entry form and validation in Chapter 17.

9.2.1 Mandatory Data

Testing
whether mandatory fields have been entered is straightforward, and we
have implemented this in our examples in Chapter 8. For example, to test if the
user's surname has been entered, the following
approach is used:

// Validate the Surname
if (empty($surname))
  formerror($template, "The surname field cannot be blank.", $errors);

The formerror( ) function outputs the error
message as a batch error using a template and is discussed in detail
in Chapter 8. For simplicity and compactness in
the remainder of our examples in this chapter, we omit the
formerror( ) function from code fragments and
simply output the error messages using print.
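The same test is easy to apply to several fields at once. The following is a minimal sketch, not code from our case study: the function name missingFields( ) and the field names are hypothetical. It compares a trimmed string with the empty string rather than calling empty( ), because empty( ) also treats the string "0" as blank.

```php
<?php
// Hypothetical helper: return the names of the required fields that are
// missing or blank in $data (for example, $_POST).
function missingFields(array $data, array $required)
{
    $missing = array();
    foreach ($required as $field) {
        // A field is blank if it is absent or contains only whitespace
        if (!isset($data[$field]) || trim((string) $data[$field]) === "") {
            $missing[] = $field;
        }
    }
    return $missing;
}
```

Each name that is returned can then be reported with print or passed to formerror( ).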

9.2.2 Validating Strings

In this
section, we discuss nonnumeric validation. We begin with the basics
of validating strings, and then discuss the specifics of email
addresses, URLs, and Zip or post codes.

9.2.2.1 Basic techniques

It's likely that most of the data entered by users
will be strings and require validation. Indeed, checking that strings
contain legal characters, are of the correct length, or have the
correct format is the most common validation task. Strings are
popular for two reasons: first, all data from a form that is stored
in the superglobals $_GET and
$_POST is of the type string; and, second, some
nonstring data such as a date of birth or a phone number is likely to
be stored as a string in a database table because it may contain
brackets, dashes, and slashes. However, although dates and phone
numbers are sometimes stored as strings, we discuss phone numbers with
the other numeric data in Section 9.2.2.5 and dates in Section 9.2.3.

The simplest test of a string is to check if it meets a minimum or
maximum length requirement. For example:

if (strlen($password) < 4 || strlen($password) > 8)
  print "Password must contain between 4 and 8 characters";

Length validation can also be performed using a regular expression, as
we show in later examples in this section. Our mysqlclean( ) and
shellclean( ) functions also include an implicit maximum length
validation. As discussed in Chapter 6, these functions should be used
as a first step in validation that helps to secure an application.

Common tests for legal characters include
checking if strings are uppercase, lowercase, alphabetic, or are
drawn from a defined character set (such as, for example, alphabetic
strings that may include hyphens or apostrophes). In PHP, the
is_string( ) function can be used to check if
a variable is a string type. However, this is of limited use in
validation because a string can contain any character including (or
even exclusively) digits or special characters. It's
more useful to test what characters are in the string or detect
characters that shouldn't be there.

Regular expressions offer three shortcuts for use in basic tests that
are discussed in Chapter 3. To test if a string
is alphabetic, use:

if (!ereg("^[[:alpha:]]*$", $string))
  print "String must contain only alphabetic characters.";

To test if a string is uppercase or lowercase, use:

if (ereg("^[[:upper:]]*$", $string))
  print "String contains only uppercase characters.";
if (ereg("^[[:lower:]]*$", $string))
  print "String contains only lowercase characters";

The expressions work for the English character sets, and also work
for French if you set your locale at the beginning of the script
using, for example, setlocale(LC_ALL, 'fr'). In the future, they
should work for all locales and, therefore, these techniques are
useful for internationalizing your application.

If you're working with only the English language, a
simpler alphabetic test works:

if (!eregi("^[a-z]*$", $string))
  print "String must contain only alphabetic characters.";

For other character sets (or if you want detailed control over
English validation), a handcrafted expression works well. For
example, the following works as an alphabetic test for Spanish:

if (!eregi("^[a-zñ]*$", $string))
  print "La cadena debe contener solamente caracteres alfabeticos";
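As an alternative to a handcrafted character list per language, the following sketch accepts letters from any alphabet. It is an assumption-laden illustration rather than part of our case study: it uses preg_match( ) from the PCRE library instead of eregi( ) (the ereg family has been removed from current PHP releases), it assumes the input is UTF-8 encoded, and the function name isAlphabetic( ) is hypothetical.

```php
<?php
// Hypothetical helper: true if $string is empty or contains only letters
// from any alphabet. Assumes $string is UTF-8 encoded.
function isAlphabetic($string)
{
    // \p{L} matches any Unicode letter; the u modifier enables UTF-8 mode.
    // \z anchors at the very end, so a trailing newline is not accepted.
    return preg_match('/^\p{L}*\z/u', $string) === 1;
}
```

Like the expressions above, this one also accepts an empty string, so combine it with a mandatory-field test when the field is required.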

Sometimes it's easier to check what characters
shouldn't be there. For example, at our university,
student email accounts must begin with an S:

if (!ereg("^S", $text))
  print "Student accounts must begin with S.";

However, for this simple example, a regular expression will run
slower than using a string library function. Instead, a better
approach is to use substr( ):

if (substr($text, 0, 1) != "S")
  print "Student accounts must begin with S.";

In general, you should use string functions for low complexity tasks.

For our customer case study, we might allow the firstname and surname
of the customer to contain only alphabetic characters, hyphens, and
apostrophes; white space, numbers, and other special characters
aren't allowed. For the firstname we use:

if (!eregi("^[a-z'-]*$", $firstName))
  print "The first name can contain only alphabetic " .
        "characters or - or '";

Length validation and character checks are often combined. For
example, the customer's middle initial might be
limited to exactly one alphabetic character:

if (!empty($initial) && !eregi("^[a-z]$", $initial))
  print "The initial field must be empty or one character in length.";

The if statement contains two clauses: a check as
to whether the field contains data and, if that's
true, a check of the contents of the field using
eregi( ). As discussed in Chapter 2, when an AND (&&) expression is
evaluated, the second clause is checked only if the first clause is
true. If the variable is empty, the eregi( ) expression isn't
evaluated.

The expression ^[a-z]$ is the same as
^[a-z]{1}$. To check if a string is exactly four
alphabetic characters in length use ^[a-z]{4}$. To
check if it's between two and four characters use
^[a-z]{2,4}$.
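The quantifier forms above can be wrapped in a small helper. This is a sketch under stated assumptions: the function name isAlphaBetween( ) is hypothetical, and it uses preg_match( ) with the i modifier in place of eregi( ). It anchors with \z rather than $, because in PCRE a $ also matches just before a trailing newline.

```php
<?php
// Hypothetical helper: true if $string has between $min and $max
// characters, all of them unaccented letters a-z in either case.
function isAlphaBetween($string, $min, $max)
{
    // Build a pattern such as /^[a-z]{2,4}\z/i from the two limits
    $pattern = sprintf('/^[a-z]{%d,%d}\z/i', $min, $max);
    return preg_match($pattern, $string) === 1;
}
```

Calling isAlphaBetween($initial, 1, 1) corresponds to ^[a-z]{1}$, and isAlphaBetween($code, 2, 4) corresponds to ^[a-z]{2,4}$.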

9.2.2.2 Validating Zip and postcodes

Zip
or postcodes are numeric in most countries but are typically stored
as strings because spaces, letters, and special characters are
sometimes allowed. In our customer case study, we might validate Zip
Codes using a simple regular expression:

// Validate Zipcode
if (!ereg("^([0-9]{4,5})$", $zipcode))
  print "The zipcode must be 4 or 5 digits in length.";

This permits a Zip Code of either four or five digits in length; this
works for both U.S. Zip Codes, and Australia's and
several other countries' postcodes, but
it's unsuitable for many other countries. For
example, postcodes from the United Kingdom include letters and a
space and have a complex structure.

For complete validation, we could adapt our Zip or postcode
validation to match the country that the user has entered. Example 9-1 shows a validation function that adapts for
many Zip and postcodes. The final five case
statements check postcodes that must include spaces, dashes, and
letters.

Example 9-1. A code fragment to validate many popular Zip and postcodes

function checkcountry($country, $zipcode)
{
  switch ($country)
  {
    case "Austria":
    case "Australia":
    case "Belgium":
    case "Denmark":
    case "Norway":
    case "Portugal":
    case "Switzerland":
      if (!ereg("^[0-9]{4}$", $zipcode))
      {
        print "The postcode/zipcode must be 4 digits in length";
        return false;
      }
      break;
    case "Finland":
    case "France":
    case "Germany":
    case "Italy":
    case "Spain":
    case "USA":
      if (!ereg("^[0-9]{5}$", $zipcode))
      {
        print "The postcode/zipcode must be 5 digits in length";
        return false;
      }
      break;
    case "Greece":
      if (!ereg("^[0-9]{3}[ ][0-9]{2}$", $zipcode))
      {
        print "The postcode must have 3 digits, a space, " .
              "and then 2 digits";
        return false;
      }
      break;
    case "Netherlands":
      if (!ereg("^[0-9]{4}[ ][A-Z]{2}$", $zipcode))
      {
        print "The postcode must have 4 digits, a space, and then 2 " .
              "letters";
        return false;
      }
      break;
    case "Poland":
      if (!ereg("^[0-9]{2}-[0-9]{3}$", $zipcode))
      {
        print "The postcode must have 2 digits, a dash, " .
              "and then 3 digits";
        return false;
      }
      break;
    case "Sweden":
      if (!ereg("^[0-9]{3}[ ][0-9]{2}$", $zipcode))
      {
        print "The postcode must have 3 digits, a space, " .
              "and then 2 digits";
        return false;
      }
      break;
    case "United Kingdom":
      if (!ereg("^(([A-Z][0-9]{1,2})|([A-Z]{2}[0-9]{1,2})|" .
                "([A-Z]{2}[0-9][A-Z])|([A-Z][0-9][A-Z])|" .
                "([A-Z]{3}))[ ][0-9][A-Z]{2}$", $zipcode))
      {
        print "The postcode must begin with a string of the format " .
              "A9, A99, AA9, AA99, AA9A, A9A, or AAA, " .
              "and then be followed by a space and a string " .
              "of the form 9AA. " .
              "A is any letter and 9 is any number.";
        return false;
      }
      break;
    default:
      // No validation
  }
  return true;
}

Another common validation check with Zip Codes is to check that they
match the city or state using a database table, but we
don't consider this approach here.
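The per-country rules can also be kept in a lookup table rather than a switch statement, which makes it easier to add a country. The following is a sketch only: the function names postcodeRules( ) and postcodeError( ) are hypothetical, only four of the countries above are listed, and preg_match( ) is used in place of ereg( ).

```php
<?php
// Hypothetical lookup table: country name => array(pattern, description).
// Only a few countries are listed to keep the sketch short.
function postcodeRules()
{
    return array(
        "Australia"   => array('/^[0-9]{4}\z/', "4 digits"),
        "USA"         => array('/^[0-9]{5}\z/', "5 digits"),
        "Netherlands" => array('/^[0-9]{4} [A-Z]{2}\z/',
                               "4 digits, a space, and then 2 letters"),
        "Poland"      => array('/^[0-9]{2}-[0-9]{3}\z/',
                               "2 digits, a dash, and then 3 digits"),
    );
}

// Returns null if the postcode is acceptable, or an error message if not.
// Countries without a rule are not validated.
function postcodeError($country, $zipcode)
{
    $rules = postcodeRules();
    if (!isset($rules[$country])) {
        return null;
    }
    list($pattern, $description) = $rules[$country];
    if (preg_match($pattern, $zipcode) === 1) {
        return null;
    }
    return "The postcode/zipcode must have {$description}.";
}
```

Returning the message instead of printing it leaves the caller free to report it as a batch error.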

9.2.2.3 Validating email addresses

Email addresses are another common string whose format must be
checked. There is a standard maintained by the Internet Engineering
Task Force (IETF) called RFC-2822 that defines what a valid email
address can be, and it's much more complex than
might be expected. For example, an address such as the following is
valid:

" <test> "@webdatabasebook.com

In our customer case study, we might use a regular expression and
network functions to validate an email address. A function for this
purpose is shown in Example 9-2.

Example 9-2. A function to validate an email address

function checkemail($email)
{
  // Check syntax
  $validEmailExpr = "^[0-9a-z~!#$%&_-]([.]?[0-9a-z~!#$%&_-])*" .
                    "@[0-9a-z~!#$%&_-]([.]?[0-9a-z~!#$%&_-])*$";
  // Validate the email
  if (empty($email))
  {
    print "The email field cannot be blank";
    return false;
  }
  elseif (!eregi($validEmailExpr, $email))
  {
    print "The email must be in the name@domain format.";
    return false;
  }
  elseif (strlen($email) > 30)
  {
    print "The email address can be no longer than 30 characters.";
    return false;
  }
  elseif (function_exists("getmxrr") && function_exists("gethostbyname"))
  {
    // Extract the domain of the email address
    $maildomain = substr(strstr($email, '@'), 1);
    if (!(getmxrr($maildomain, $temp) ||
          gethostbyname($maildomain) != $maildomain))
    {
      print "The domain does not exist.";
      return false;
    }
  }
  return true;
}

If any email test fails, an error message is output, and no further
checks of the email value are made. A valid email passes all tests.

The first check tests to make sure that an email address has been
entered. If it's omitted, an error is generated. It
then uses a regular expression to check if the email address matches
a template. It isn't RFC-2822-compliant but works
reasonably for most email addresses:

It uses eregi( ), so both upper- and lowercase letters
are matched by a-z.

It expects the string to begin with a character from the set
0-9, a-z, and
~!#$%&_-. There has to be at least one
character from this set at the beginning of the email address for it
to be valid.

After the first character matches, there is an optional bracketed
expression:

([.]?[0-9a-z~!#$%&_-])*

This expression is optional because it's suffixed
with the * operator. However, if it does match, it
matches any number of the characters specified. Full stops cannot be
consecutive, because the expression [.]? allows at most one full stop
before each permitted character. The expression, for example,
matches the string fred.williams but not
fred..williams.

After the initial part of the email address, the character
@ is expected. The @ has to
occur after the first word for the string to be valid; our regular
expression rejects an email address such as fred
that has only the initial or local component.

Our validation expects there to be another word of at least one
character after the @ symbol, and this can be
followed by any combination of the permitted characters. Strings of
permitted characters can be separated by a single full-stop.

The function is imperfect. It allows several illegal email addresses
and doesn't allow many that are legal but unusual.
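To see which addresses the expression accepts, it helps to try it on a few samples. The sketch below is an illustration only: it ports the same expression to preg_match( ) with the i modifier, because eregi( ) is not available in current PHP releases, and the function name emailSyntaxOk( ) is hypothetical.

```php
<?php
// Hypothetical helper: apply the syntax expression from Example 9-2,
// ported to PCRE. It checks the format only, not length or DNS.
function emailSyntaxOk($email)
{
    $word = '[0-9a-z~!#$%&_-]([.]?[0-9a-z~!#$%&_-])*';
    // local part, an @, and then the domain part
    $pattern = '/^' . $word . '@' . $word . '\z/i';
    return preg_match($pattern, $email) === 1;
}
```

With this helper, fred.williams@example.com is accepted, while fred..williams@example.com, fred, and fred@example. are rejected. PHP's built-in filter_var( ) function with the FILTER_VALIDATE_EMAIL filter is another option worth evaluating, although it applies its own rules rather than ours.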

The third step is to check the length of the email address. If it
exceeds 30 characters, an error is generated.

The fourth and final step is to check whether the domain of the email
address actually exists. The fragment only works on platforms that
support the network library functions getmxrr( ) and
gethostbyname( ):

elseif (function_exists("getmxrr") && function_exists("gethostbyname"))
{
  // Extract the domain of the email address
  $maildomain = substr(strstr($email, '@'), 1);
  if (!(getmxrr($maildomain, $temp) ||
        gethostbyname($maildomain) != $maildomain))
  {
    print "The domain does not exist.";
    return false;
  }
}

The function getmxrr( ) queries an Internet
domain name server (DNS) to check if there is a record of the email
domain as a mail exchanger (MX). If the domain has no
MX record, the domain is checked with
gethostbyname( ) to see if it has an
A record; the relevant standard,
RFC-974, states that when a domain does not have an
MX record, it should be interpreted as
having one equal to the host name. If both tests fail, the domain of
the email address isn't valid and we reject the
email address.
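The two lookups can also be expressed with checkdnsrr( ), which returns true or false for a given record type. This is a sketch under assumptions: the function names emailDomain( ) and domainAcceptsMail( ) are hypothetical, the domain is taken from the last @ in the address rather than the first, and the choice to accept the domain when no DNS function is available is ours.

```php
<?php
// Hypothetical helper: return the text after the last '@', or null if
// the address has no '@' or nothing follows it.
function emailDomain($email)
{
    $at = strrchr($email, '@');   // text from the last '@' onward, or false
    if ($at === false || strlen($at) < 2) {
        return null;
    }
    return substr($at, 1);
}

// Hypothetical helper: true if the domain has an MX record or, failing
// that, an A record. This performs a live DNS query.
function domainAcceptsMail($domain)
{
    if (!function_exists('checkdnsrr')) {
        return true;   // Assumption: skip the check if it cannot be made
    }
    return checkdnsrr($domain, 'MX') || checkdnsrr($domain, 'A');
}
```

Because the lookup depends on the network, a failed query does not always mean the domain is invalid; you may prefer to warn rather than reject.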

For platforms (such as Microsoft Windows) that don't
have the getmxrr( ) and gethostbyname( ) functions, the
PEAR Net_DNS package can be used
instead. It must be installed using the PEAR installer. The DNS
lookup package must then be included into the source code using:

require_once "Net/DNS.php";

Installation of packages is discussed in Chapter 7.

The following fragment is a function checkMailDomain( ) that uses
PEAR Net_DNS to check if the domain parameter
$domain has a record of the type matching the
parameter $type:

// Call with $type of MX, then A to check if an email address
// domain is valid
function checkMailDomain($domain, $type)
{
  // Create a DNS resolver, and look up a $type record for $domain
  $resolver = new Net_DNS_Resolver( );
  $answer = $resolver->search($domain, $type);
  // Is there an answer record?
  if (isset($answer->answer))
    // Iterate through the answers
    foreach($answer->answer as $ans)
      // If it's a $type answer, return true
      if ($ans->type == $type)
        return true;
  return false;
}

The function returns true if the DNS server
responds with an answer that includes a record of the type
that's been requested; it returns
false otherwise.

The following code fragment can then be used to validate an email
address:

// Extract the domain of the email address
$maildomain = substr(strstr($email, '@'), 1);
if (!(checkMailDomain($maildomain, "MX") ||
      checkMailDomain($maildomain, "A")))
{
  print "The domain does not exist.";
  return false;
}

As in the previous example that uses getmxrr( )
and gethostbyname( ), we check if there is a
record of the email domain as a mail exchanger (MX). If the domain
has no MX record, the
domain is checked to see if it has an
A record. If both tests fail, the
domain of the email address isn't valid and we
reject the email address.

9.2.2.4 Validating URLs

Home pages, links,
and other URLs are sometimes entered by users. In PHP, validating
these is straightforward because the library function
parse_url( ) can do most of the work for you.

The parse_url( ) function takes one parameter, a
URL string, and returns an associative array that contains the
components of the URL. For example:

$bits =
  parse_url("http://www.webdatabasebook.com/test.php?status=F#message");
foreach($bits as $var => $val)
  echo "{$var} is {$val}\n";

produces the output:

scheme is http
host is www.webdatabasebook.com
path is /test.php
query is status=F
fragment is message

The parse_url( ) function can be used in
validation as follows:

$bits = parse_url($url);
if ($bits["scheme"] != "http")
  print "URL must begin with http://.";
elseif (empty($bits["host"]))
  print "URL must include a host name.";
elseif (function_exists('checkdnsrr') && !checkdnsrr($bits["host"], 'A'))
  print "Host does not exist.";

You might also add elseif clauses to check for
specific path, query, or fragment components. In addition, you could
modify the test of the scheme to check for other valid URL types,
including ftp://, https://, or file://.
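One way to allow several schemes is to pass the permitted list as a parameter. The following is a sketch, not part of our case study: the function name urlError( ) is hypothetical, it returns a message rather than printing it, and it leaves out the DNS lookup so that it can run without a network connection.

```php
<?php
// Hypothetical helper: return null if $url has a permitted scheme and a
// host name, or an error message otherwise.
function urlError($url, array $schemes = array("http", "https"))
{
    $bits = parse_url($url);
    // parse_url( ) returns false for seriously malformed URLs
    if ($bits === false || !isset($bits["scheme"])) {
        return "URL must include a scheme such as http://.";
    }
    if (!in_array(strtolower($bits["scheme"]), $schemes, true)) {
        return "URL scheme is not allowed.";
    }
    if (empty($bits["host"])) {
        return "URL must include a host name.";
    }
    return null;
}
```

For example, urlError($url, array("http", "https", "ftp")) also accepts FTP URLs.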

Unfortunately, at the time of writing, parse_url( ) is slightly
broken in PHP 4.3; it works fine in earlier
and later versions of PHP. The bug is that if no path is present in
the URL, all following components (such as a query or fragment) are
incorrectly appended to the host element. To fix this, you can
include the following fragment after the call to parse_url( ):

// Fix the hostname (if needed) in PHP 4.3
if (strpos($bits["host"], '?') !== false)
  $bits["host"] = substr($bits["host"], 0, strpos($bits["host"], '?'));
if (strpos($bits["host"], '#') !== false)
  $bits["host"] = substr($bits["host"], 0, strpos($bits["host"], '#'));

For non-Unix environments, you can check the host domain exists by
using the PEAR-based approach described in the previous
section.

9.2.2.5 Validating numbers

Checking
that values are numeric, are within a range, or have the correct
format is a common validation task. For our case study customer
example, there might be several semi-numeric fields such as fax and
telephone numbers, the customer's salary, or a
credit card number. Zip and post codes aren't always
numeric, and are discussed in Section 9.2.2.2.

The two most common checks for numbers are whether they are in fact
numeric and whether they're within a required range.
In PHP, the is_numeric( ) function can be used to check if a
variable contains only digits or if it matches one of the legal
number formats. For example, to check if a salary is numeric, you can
use:

if (!is_numeric($salary))
  print "Salary must be numeric";

The is_numeric( ) function
doesn't always behave in the way you expect. Leading
and trailing spaces, carriage returns, commas, and spaces after minus
signs can result in a false return value. Leading
and trailing spaces can be removed with the trim( ) function, while
allowing specialized formats may instead
require the use of a regular expression.

The legal number formats to is_numeric( )
include integers such as 87000, scientific notation such as
12e4, floating point numbers such as 3.14159 (or
3,14159 if your locale is set to France), hexadecimal notation such
as 0xff, and negative numbers such as -1.

Before you check variables initialized from form data, convert them
to a numeric type using intval( ) or floatval( ), which convert a
string to a number. A test such as if ($_GET["year"] < 1902) may not work as expected, because
$_GET["year"] is a string and
1902 is an integer. The test if (intval($_GET["year"]) < 1902) works reliably. Both
functions are discussed in Chapter 3.

Consider an example. Suppose that a whole-dollar salary is provided
from a form through the POST method and is stored
as $_POST["salary"]. To check if
it's a valid number, use the following steps:

// remove leading and trailing spaces
$salary = trim($_POST["salary"]);
if (!is_numeric($salary))
  print "Salary must be numeric";
else
  // convert to an integer
  $salary = intval($salary);

After type conversion to numbers, form data can be validated to check
whether it meets range requirements using the basic comparison
operators. For example, to check that an age is in a sensible range,
you could use:

if ($age < 5 || $age > 105)
  print "Age must be in the range 5 to 105";
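The numeric check, the conversion, and the range check can be combined in one step with PHP's filter_var( ) function. This is a sketch of an alternative, not the approach used in our case study; the function name parseIntInRange( ) is hypothetical.

```php
<?php
// Hypothetical helper: return $value as an integer if it is a whole
// number between $min and $max inclusive, or null otherwise.
function parseIntInRange($value, $min, $max)
{
    $options = array(
        "options" => array("min_range" => $min, "max_range" => $max)
    );
    // FILTER_VALIDATE_INT returns the integer, or false if the string is
    // not an integer or is outside the range
    $result = filter_var(trim($value), FILTER_VALIDATE_INT, $options);
    return ($result === false) ? null : $result;
}
```

For example, parseIntInRange($_POST["age"], 5, 105) returns the age as an integer, or null if the value should be rejected.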

Another common type of numeric validation is checking currencies.
Generally, these have one of two common formats: only a currency
amount (for example, 10 dollars, 10 cents, or 25 Yen), or a currency
amount and a unit amount (for example, $10.15). Currencies should be
checked to see if they match the required format, and then (if
needed) to see if they're within a range. For
example, to check if a currency amount is in whole dollars and
between four and six digits in length, you could use:

if (!ereg("^[0-9]{4,6}$", $salary))
  print "Salary must be in whole dollars";

To check if a value is in the currency and unit format, you could use:

if (!ereg("^[0-9]{1,3}[.][0-9]{2}$", $price))
  print "Item price must be between US$0.00 and US$999.99, " .
        "and must include the cent amount.";

It's important for an internationalized web database
application to inform the user what currencies are allowed.
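Once a price has matched the currency-and-unit format, it is often convenient to keep it as a whole number of cents, which avoids floating point rounding in later arithmetic. The following sketch shows one way to do this; the function name priceToCents( ) is hypothetical, and preg_match( ) is used in place of ereg( ).

```php
<?php
// Hypothetical helper: convert a price such as "10.15" to cents (1015).
// Returns null if the price isn't in the 0.00 to 999.99 format.
function priceToCents($price)
{
    $pattern = '/^([0-9]{1,3})[.]([0-9]{2})\z/';
    if (preg_match($pattern, $price, $parts) !== 1) {
        return null;
    }
    // $parts[1] holds the dollars and $parts[2] holds the cents
    return intval($parts[1]) * 100 + intval($parts[2]);
}
```

A null return value can be reported with the same error message as the format test above.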

Simple variations of the currency validation techniques can be used
to check the format of floating point numbers. For example, if a
maximum of five decimal places are allowed for a length value, use:

if (!ereg("^[0-9]*([.][0-9]{1,5})?$", strval($length)))
  print "Length can have a maximum of five decimal places";

The expression ^[0-9]* allows any number of digits
at the beginning of the number and before the optional decimal place.
The ? in the expression
([.][0-9]{1,5})?$ implements an optional mantissa
by allowing either zero or one copies of a string that matches the
bracketed expression that precedes the ?. The
bracketed expression itself requires a decimal point (represented by
[.]), and then between one and five digits
(represented by [0-9]{1,5}). The end of the number
is expected after the optional mantissa. To allow positive or
negative values to be specified, you could add
[+-]? immediately after the ^
at the beginning of the expression.
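The signed variation described above can be sketched as follows. The function name isDecimal( ) and its $maxPlaces parameter are hypothetical, and the expression is ported to preg_match( ).

```php
<?php
// Hypothetical helper: true if $value is an optionally signed number
// with at most $maxPlaces digits after the decimal point.
function isDecimal($value, $maxPlaces)
{
    // [+-]?  optional sign
    // [0-9]* any number of digits before the decimal point
    // ([.][0-9]{1,n})? optional decimal point and 1 to n digits
    $pattern = sprintf('/^[+-]?[0-9]*([.][0-9]{1,%d})?\z/', $maxPlaces);
    return preg_match($pattern, strval($value)) === 1;
}
```

Because every part of the expression is optional, an empty string also matches; test for a mandatory value first if the field is required.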

It doesn't always make sense to range check numeric
data. For example, phone and fax numbers aren't
usually added, subtracted, or tested against ranges. In our customer
example, we might validate a phone number using a regular expression
that checks it has a reasonable structure:

// Phone is optional, but if it is entered it must have
// correct format
$validPhoneExpr = "^([0-9]{2,3}[ ]*)?[0-9]{4}[ ]*[0-9]{4}$";
if (!empty($phone) && !ereg($validPhoneExpr, $phone))
  print "The phone number must be 8 digits in length, " .
        "with an optional 2 or 3 digit area code";

This is an AND (&&)
expression, so the ereg( ) function is only
evaluated if the $phone variable is not empty.

The first part of the expression, ^([0-9]{2,3}[ ]*)?, matches either
zero or one occurrence of the
bracketed expression at the beginning of the value. Inside the
brackets, the expression that is matched is two or three digits and
any number of optional space characters (represented as
[ ]*). For example, a string
03 matches, as does 835. The
second part of the expression, [0-9]{4}[ ]*[0-9]{4}$, matches exactly
four digits, followed by any
number of optional spaces, followed by another four digits, and then
the end of the string is expected. For example, the strings
1234 1234 and
12341234 both match the expression.

9.2.2.6 Validating credit cards

The
last numeric type we consider in this section is credit card numbers.
There are two steps to validating a credit card
that's entered for payment of goods or services:
first, we need to check the credit card number and its expiration
date are valid; and, second, we need to verify that the payment will
be honored by the bank or other credit card provider. If the
user's entering their credit card as part of the
account creation process, the second step isn't
usually needed until they make a payment.

In this section, we show you how to validate a credit card number.
Expiration dates can be validated using the date checking functions
discussed later in this section.

Checking that payment will be honored by the credit card provider is
outside the scope of this book. However, many credit card payment
validation network libraries are available for this purpose: PEAR
contains a few, several are available as PHP libraries as listed in
Appendix G, and open source solutions have been
developed and are readily available on the Web. All credit checking
facilities require a paid subscription to a validation service.

Example 9-3 shows a function checkcard( ) that validates credit card
numbers. The function works
as follows. First, it checks the card number contains only digits and
spaces, and after the check it removes the spaces using
ereg_replace( ) leaving only the card number.
Second, it extracts the first four digits and checks which of the
different credit cards it matches and uses this to determine the
correct length of the number; we discuss this further next. Third, it
rejects cards that aren't supported or where the
length doesn't match the correct length for the
card. Last, the credit card is validated using the
Luhn algorithm, which we return to in a moment.

Example 9-3. A function to validate credit card numbers

function checkcard($cc, $ccType)
{
  if (!ereg("^[0-9 ]*$", $cc))
  {
    print "Card number must contain only digits and spaces.";
    return (false);
  }
  // Remove spaces
  $cc = ereg_replace('[ ]', '', $cc);
  // Check first four digits
  $firstFour = intval(substr($cc, 0, 4));
  $type = "";
  $length = 0;
  if ($firstFour >= 8000 && $firstFour <= 8999)
  {
    // Try: 8000 0000 0000 1001
    $type = "SurchargeCard";
    $length = 16;
  }
  elseif ($firstFour >= 9100 && $firstFour <= 9599)
  {
    // Try: 9100 0000 0001 7
    $type = "AustralianExpress";
    $length = 13;
  }
  if (empty($type) || strcmp($type, $ccType) != 0)
  {
    print "Please check your card details.";
    return (false);
  }
  if (strlen($cc) != $length)
  {
    print "Card number must contain {$length} digits.";
    return (false);
  }
  $check = 0;
  // Add up every 2nd digit, beginning at the right end
  for($x=$length-1;$x>=0;$x-=2)
    $check += intval(substr($cc, $x, 1));
  // Add up every 2nd digit doubled, beginning at the right end - 1.
  // Subtract 9 where doubled value is 10 or greater
  for($x=$length-2;$x>=0;$x-=2)
  {
    $double = intval(substr($cc, $x, 1)) * 2;
    if ($double >= 10)
      $check += $double - 9;
    else
      $check += $double;
  }
  // Is $check not a multiple of 10?
  if ($check % 10 != 0)
  {
    print "Credit card invalid. Please check number.";
    return (false);
  }
  return (true);
}

Table 9-1 shows the prefixes of the four most
popular credit cards and the card number length for those cards. For
example, MasterCard cards always begin with four
digits in the range 5100 to 5599, and are sixteen digits in length.
The function in Example 9-3 supports two fictional
cards: SurchargeCard that begins with numbers in
the range 8000 to 8999 and has 16 digits, and
AustralianExpress with prefixes from 9100 to
9599 and 13 digits in length. Example valid card numbers for these
fictional cards are included as comments in the code. You can find
sample numbers for all popular cards at http://www.verisign.com/support/payflow/link/pfltestprocessl.

Table 9-1. Popular credit card prefixes and lengths
Card name	Four-digit prefix	Length
American Express	3400-3499, 3700-3799	15
Diners Club	3000-3059, 3600-3699, 3800-3889	14
MasterCard	5100-5599	16
Visa	4000-4999	13 or 16
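The rows of Table 9-1 can be encoded as data and searched, instead of writing one elseif clause per card. The following is a sketch under assumptions: the function names cardTable( ) and cardNameFor( ) are hypothetical, and the ranges and lengths are transcribed from the table as printed rather than from the card issuers.

```php
<?php
// Card data transcribed from Table 9-1:
// name => array(list of four-digit prefix ranges, list of valid lengths)
function cardTable()
{
    return array(
        "American Express" => array(
            array(array(3400, 3499), array(3700, 3799)), array(15)),
        "Diners Club" => array(
            array(array(3000, 3059), array(3600, 3699), array(3800, 3889)),
            array(14)),
        "MasterCard" => array(array(array(5100, 5599)), array(16)),
        "Visa" => array(array(array(4000, 4999)), array(13, 16)),
    );
}

// Hypothetical helper: return the card name whose prefix range and length
// match $cc, or null if no row of the table matches.
function cardNameFor($cc)
{
    $digits = str_replace(" ", "", $cc);
    if (preg_match('/^[0-9]{4,}\z/', $digits) !== 1) {
        return null;
    }
    $prefix = intval(substr($digits, 0, 4));
    $length = strlen($digits);
    foreach (cardTable() as $name => $row) {
        list($ranges, $lengths) = $row;
        if (!in_array($length, $lengths, true)) {
            continue;
        }
        foreach ($ranges as $range) {
            if ($prefix >= $range[0] && $prefix <= $range[1]) {
                return $name;
            }
        }
    }
    return null;
}
```

A match on prefix and length says nothing about the check digit, so the Luhn test is still required afterward.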

Credit card validation is performed with the Luhn algorithm. This
works as follows:

Sum up every second digit in the credit card number, beginning with
the last digit and proceeding right-to-left.

Sum up the double of every second digit in the credit card number,
beginning with the second to the last digit and proceeding
right-to-left. If the double of the digit is 10 or greater,
subtract 9 from the value before adding it to the sum.

Determine if the sum of the two steps is a multiple of 10. If it is,
the credit card number is valid. If not, the number is rejected.

Consider an example credit card of ten digits in length: 1234000014.
In the first step, we add every second digit from the right,
beginning with the last. So, 4+0+0+4+2=10. Then, in the second step,
we add the double of each digit beginning with the second last
(subtracting 9 if any doubled value is 10 or more) and then add the sum to
the total from the first step. So, 2+0+0+6+2=10, and adding to 10
from the first step gives 20. Since 20 is exactly divisible by 10,
the card has a valid number.
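The steps above can be sketched as a standalone function that works for a number of any length. The function name luhnValid( ) is hypothetical; it makes a single right-to-left pass and doubles every second digit, which gives the same total as the two separate sums.

```php
<?php
// Hypothetical helper: true if the digit string $number passes the
// Luhn check described above.
function luhnValid($number)
{
    if (preg_match('/^[0-9]+\z/', $number) !== 1) {
        return false;
    }
    $sum = 0;
    $double = false;   // The last digit is not doubled
    for ($i = strlen($number) - 1; $i >= 0; $i--) {
        $digit = intval($number[$i]);
        if ($double) {
            $digit *= 2;
            // Subtract 9 where the doubled value is 10 or greater
            if ($digit >= 10) {
                $digit -= 9;
            }
        }
        $sum += $digit;
        $double = !$double;
    }
    return $sum % 10 === 0;
}
```

For the ten-digit example, luhnValid("1234000014") returns true because the total is 20; changing the last digit to 5 gives a total of 21, and the number is rejected.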

9.2.3 Validating Dates and Times

Dates of birth, expiry dates, order
dates, and other dates are commonly entered by users. Most dates
require specialized checks to see if the date is valid and if
it's in a required date range. Times are less
complicated, but specialized checks are still useful.

9.2.3.1 Dates

Dates can be given in several
different formats and using many different calendars. We only discuss
the Gregorian calendar here.

In the U.S.,
months are listed before days, but the majority of the rest of the
world uses the opposite approach. Years can be provided as two or
four digits, although we recommend avoiding two digit years for the
obvious confusion caused when 99 comes before 00. This leads to four
formats: DDMMYY, DDMMYYYY,
MMDDYY, and MMDDYYYY, where
Y is a year digit, M is month
digit, and D is a day digit.

In all date formats, a forward slash, a hyphen, or (rarely) a colon
can be used to separate the groups, leading to twelve formats in
total. For sorting, a thirteenth (convenient) format is
YYYYMMDD without the separators. Dates can also be
specified using month names, leading to strings such as
11-Aug-1969 and 11 August 1969.

Date values have complex validation requirements, and are difficult
to manipulate. Months have different numbers of days, some years are
leap years, and some annual holidays fall on different days in
different years. Adding and subtracting dates, working out the date
of tomorrow or next week, and finding the first Sunday of the month
aren't straightforward. A particularly awkward task is finding when
the Christian religion's Easter holiday falls in a year, as
explained at the Astronomical Society of South Australia web site,
http://www.assa.org.au/edml.

Consider an example from our
customer case study. Let's suppose the user is
required to provide a date of birth in the format common to most of
the world, DD/MM/YYYY. We then need to validate
this date of birth to check that it has been entered and to check its
format, its validity, and whether it's within a
range. The range of valid dates in the example begins with the user
being alive (for simplicity, we assume living users were born no
earlier than 1902) and ends with the user being at least 18 years of
age.

Date-of-birth checking is implemented with the code in Example 9-4.

Example 9-4. Date-of-birth validation

function checkdob($birth_date)
{
  if (empty($birth_date))
  {
    print "The date of birth field cannot be blank.";
    return false;
  }
  // Check the format and explode into $parts
  elseif (!ereg("^([0-9]{2})/([0-9]{2})/([0-9]{4})$",
                $birth_date, $parts))
  {
    print "The date of birth is not a valid date in the " .
          "format DD/MM/YYYY";
    return false;
  }
  elseif (!checkdate($parts[2],$parts[1],$parts[3]))
  {
    print "The date of birth is invalid. Please check that the month is " .
          "between 1 and 12, and the day is valid for that month.";
    return false;
  }
  elseif (intval($parts[3]) < 1902 ||
          intval($parts[3]) > intval(date("Y")))
  {
    print "You must be alive to use this service.";
    return false;
  }
  else
  {
    $dob = mktime(0, 0, 0, $parts[2], $parts[1], $parts[3]);
    // Check whether the user is 18 years old.
    if ((float)$dob > (float)strtotime("-18 years"))
    {
      print "You must be 18+ years of age to use this service";
      return false;
    }
  }
  return true;
}

If any date test fails, an error is reported, and no further checks
of the date are made. A valid date passes all the tests.

The first check tests if a date has been entered. The second check
uses a regular expression to check whether the date consists of
numbers and if it matches the template 99/99/9999
(where 9 means a number):

elseif (!ereg("^([0-9]{2})/([0-9]{2})/([0-9]{4})$", $birth_date, $parts))
{
  print "The date of birth is not a valid date in the format DD/MM/YYYY";
  return false;
}

You can adapt this check to match any of the other basic
formats we outlined at the beginning of this section.

When the format check succeeds, the expression also explodes the date
into the array $parts so that the component that matches the first
bracketed expression ([0-9]{2}) is found in $parts[1], the second
bracketed expression in $parts[2], and the third bracketed expression
in $parts[3]. Using this approach, the day of the month is accessible
as $parts[1], the month as $parts[2], and the year as $parts[3]. The
ereg( ) function also stores the string matching the complete
expression in $parts[0].
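
The ereg( ) family of functions was deprecated in PHP 5.3 and removed
in PHP 7.0, so on current PHP versions the same format check is
usually written with preg_match( ). The following is a minimal sketch
of that substitution rather than the code from Example 9-4; the
function name splitdob( ) is our own illustrative choice.

```php
<?php
// Hypothetical helper (not part of Example 9-4): checks the DD/MM/YYYY
// format with preg_match() and returns the matched parts, or false.
function splitdob($birth_date)
{
    // \A and \z anchor the whole string. Unlike $, \z does not
    // match just before a trailing newline.
    if (!preg_match('#\A([0-9]{2})/([0-9]{2})/([0-9]{4})\z#', $birth_date, $parts))
        return false;
    // $parts[0] is the whole match, $parts[1] the day,
    // $parts[2] the month, and $parts[3] the year.
    return $parts;
}

$parts = splitdob("31/12/1970");
print "Day {$parts[1]}, month {$parts[2]}, year {$parts[3]}\n";
```

As with ereg( ), the array is filled only when the match succeeds, so
the return value should be tested before the parts are used.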

The third check uses the exploded data stored in the array $parts and
the function checkdate( ) to test if the date is a valid calendar
date. For example, the date 31/02/1970 would fail this test. The
fourth check tests if the year is in the range 1902 to the current
year. The function call date("Y") returns the current year as a
string.

The fifth and final check tests if the user is 18 years of age or
older, and uses the approach described in Chapter 3. It converts both
the date of birth and the date 18 years ago to timestamps using
library functions, and checks that the date of birth is no later than
the date 18 years ago. We use the mktime( ) function to convert the
date of birth to a large numeric Unix timestamp value, and the
strtotime( ) function to discover the timestamp of exactly 18 years
ago. Both are cast to a large floating number to ensure reliable
comparison, and if the user was born in the past 18 years, an error
is produced.

The mktime( ) function works for years between
1901 and 2038 on Unix systems, and only from 1970 to 2038 for
variants of Microsoft Windows. The PEAR Date package
doesn't suffer from year limitations, and we discuss
how to use it later in this section.
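
On PHP 5.3 and later, the built-in DateTime class is another way to
avoid the timestamp range limits. The sketch below is not part of
Example 9-4: the function name ageinyears( ) is hypothetical, and we
assume the day, month, and year have already passed checkdate( ) and
that the date of birth is not in the future. Both dates are created in
UTC so that the comparison does not depend on the server's time zone.

```php
<?php
// Hypothetical helper: returns the age in whole years on the date $today.
// Assumes $day, $month, and $year form a valid past date.
function ageinyears($day, $month, $year, DateTime $today)
{
    $dob = new DateTime(sprintf("%04d-%02d-%02d", $year, $month, $day),
                        new DateTimeZone("UTC"));
    // diff() returns a DateInterval; its y property holds the
    // number of whole years between the two dates
    return $dob->diff($today)->y;
}

$today = new DateTime("2004-06-15", new DateTimeZone("UTC"));
if (ageinyears(16, 6, 1986, $today) < 18)
    print "You must be 18+ years of age to use this service.";
```

Passing the current date as a parameter, rather than reading it inside
the function, makes the check easy to try against known dates.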

9.2.3.2 Times

Times are easier to work with than dates, but they also come in
several valid formats. These include the 24-hour clock format
9999, the 12-hour clock formats 99:99am or 99:99pm (or with a
period instead of a colon), and formats that include seconds and
hundredths of seconds. In each format, different ranges of values are
allowed.

Consider an example where a user is required to enter a time in the
12-hour format using a colon as the separator. With this format,
12:42pm and 01:01am are valid times. You can validate this format
using the following regular expression:

if (!eregi("^(1[0-2]|0[1-9]):([0-5][0-9])(am|pm)$", $time))
    print "Time must be a valid 12-hour clock time in the format " .
          "HH:MMam or HH:MMpm.";

The first part of the expression ^(1[0-2]|0[1-9])
requires that the time begins with a number in range 10 to 12, or 01
to 09. After the colon, the second part of the expression requires
the minute value to be in the range 00 to 59 as specified by the
expression ([0-5][0-9]). Either AM or PM (in
either upper- or lowercase) must then follow to conclude the time
string.
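
Because eregi( ) is not available in PHP 7.0 and later, the same check
can be sketched with preg_match( ) and the i (case-insensitive)
modifier. The function name isvalid12hour( ) is our own and is used
only for illustration.

```php
<?php
// Hypothetical helper: true if $time looks like HH:MMam or HH:MMpm.
function isvalid12hour($time)
{
    // The trailing i modifier makes the match case-insensitive,
    // which is the behavior eregi() provided
    return preg_match('/\A(1[0-2]|0[1-9]):([0-5][0-9])(am|pm)\z/i', $time) === 1;
}

foreach (array("12:42pm", "01:01AM", "13:00pm", "1:01am") as $time)
    print $time . (isvalid12hour($time) ? " is valid\n" : " is not valid\n");
```

The last value in the loop is rejected because this expression
requires a leading zero for the hours 1 to 9.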

For 24-hour times, a simple variant works:

if (!eregi("^([0-1][0-9]|2[0-3])([0-5][0-9])$", $time))
    print "Time must be a valid 24-hour clock time in the format HHMM.";

Working out differences between times is reasonably straightforward,
after the time has been parsed into its components! For example, to
check if a 12-hour clock arrival time is before a 12-hour clock
departure time, use the following fragment:

// Explode departure time into the array $depBits
if (!eregi("^(1[0-2]|0?[1-9]):([0-5][0-9])(am|pm)$", $depTime, $depBits))
    print "Departure time must be a valid 12-hour clock time " .
          "in the format HH:MMam or HH:MMpm.";
// Explode arrival time into the array $arrBits
elseif (!eregi("^(1[0-2]|0?[1-9]):([0-5][0-9])(am|pm)$", $arrTime, $arrBits))
    print "Arrival time must be a valid 12-hour clock time " .
          "in the format HH:MMam or HH:MMpm.";
else
{
    // Lowercase the suffixes, and treat hour 12 as hour 0 so that
    // 12:30pm sorts before 1:00pm
    $depBits[3] = strtolower($depBits[3]);
    $arrBits[3] = strtolower($arrBits[3]);
    $depBits[1] = $depBits[1] % 12;
    $arrBits[1] = $arrBits[1] % 12;
    if (($depBits[3] == "pm" && $arrBits[3] == "am") ||
        ($depBits[1] > $arrBits[1] && $depBits[3] == $arrBits[3]) ||
        ($depBits[2] >= $arrBits[2] && $depBits[1] == $arrBits[1]
         && $depBits[3] == $arrBits[3]))
        print "Arrival time must be after departure time.";
}

The two eregi( ) expressions validate the format
of a time using the approach we described previously. Similarly to
our date validation, both expressions also explode the times into the
arrays $arrBits and $depBits. The arrays contain the hour as elements
$arrBits[1] and $depBits[1], the minutes as $arrBits[2] and
$depBits[2], and the AM or PM suffix as $arrBits[3] and $depBits[3].

To determine if the arrival time is earlier than the departure time,
assuming both times fall on the same day, there are three tests:
first, if the arrival time is AM, the departure time can't be PM;
second, if both times are AM or both times are PM, the arrival hour
can't be earlier than the departure hour; and, last, if both times
are AM or both times are PM, and the departure hour is the arrival
hour, the arrival minutes can't be less than or equal to the
departure minutes. With 24-hour times, only one test is needed; this
is perhaps a good reason to use them in preference to 12-hour times
in your applications.
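
The single test for 24-hour times can be sketched as follows. This is
an illustration under our own assumptions: the function name
arrivesafter24( ) is hypothetical, both times are taken to be on the
same day, and preg_match( ) is used so that the code runs on current
PHP versions.

```php
<?php
// Hypothetical helper: true if both times are valid HHMM strings and
// the arrival time is later than the departure time on the same day.
function arrivesafter24($depTime, $arrTime)
{
    $pattern = '/\A([01][0-9]|2[0-3])([0-5][0-9])\z/';
    if (!preg_match($pattern, $depTime) || !preg_match($pattern, $arrTime))
        return false;
    // Both strings are exactly four digits, so comparing them as
    // integers orders them by hour and then by minute
    return intval($arrTime) > intval($depTime);
}

if (!arrivesafter24("1715", "0930"))
    print "Arrival time must be after departure time.";
```

A caller that needs to distinguish a badly formatted time from an
early arrival would validate the two strings separately first.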

For this type of validation, you could also convert a time to an
integer value and then compare values. For example, you could convert
two times to Unix timestamps and then compare these to determine if
the arrival time is earlier than the departure time. However, as
discussed in the previous section, the PHP date and time functions
don't behave the same on all platforms, and so this
approach isn't always portable between operating
systems. For this reason, using logic as in our previous example or
using a reliable package, such as the PEAR Date package discussed in
the next section, is preferable.
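
A portable way to convert a time to an integer without using
timestamps is to count minutes since midnight. The sketch below uses
only arithmetic and string functions; the function name
minutessincemidnight( ) is our own, and the expression accepts the
hours 1 to 9 with or without a leading zero.

```php
<?php
// Hypothetical helper: converts HH:MMam or HH:MMpm to minutes since
// midnight, or returns false if the format is wrong.
function minutessincemidnight($time)
{
    if (!preg_match('/\A(1[0-2]|0?[1-9]):([0-5][0-9])(am|pm)\z/i', $time, $bits))
        return false;
    // 12am is hour 0 and 12pm is hour 12, so reduce the hour modulo 12
    // before adding 12 hours for PM times
    $hour = intval($bits[1]) % 12;
    if (strtolower($bits[3]) == "pm")
        $hour += 12;
    return $hour * 60 + intval($bits[2]);
}

$dep = minutessincemidnight("11:45am");
$arr = minutessincemidnight("12:30pm");
// Use === because 12:00am converts to 0, which is not an error
if ($dep === false || $arr === false)
    print "Both times must be in the format HH:MMam or HH:MMpm.";
elseif ($arr <= $dep)
    print "Arrival time must be after departure time.";
```

Once both times are integers, the comparison is a single test, as it
is for 24-hour times.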

9.2.3.3 Using the PEAR Date package

The PEAR Date package introduced in
Chapter 7 is not limited in year ranges and
provides a wide range of date validation and manipulation tools. It
must be installed using the PEAR installer (as discussed in Chapter 7) and then the date calculation package must
be included into the source code using:

require_once "Date/Calc.php";

An object can then be created using:

$date = new Date_Calc( );

Using the PEAR Date package, we can rewrite our date of birth
checking in Example 9-4. Our third date of birth check can be
rewritten to use the method isValidDate( ) as follows:

elseif (!$date->isValidDate($parts[1], $parts[2], $parts[3]))
{
    print "The date of birth is invalid. Please check that the month " .
          "is between 1 and 12, and the day is valid for that month.";
    return false;
}

The fourth check can be modified slightly to use the isFutureDate( )
method to check if the user has been born:

elseif (intval($parts[3]) < 1902 ||
        $date->isFutureDate($parts[1], $parts[2], $parts[3]))
{
    print "You must be alive to use this service.";
    return false;
}

The fifth check can make use of the compareDates( ) method to avoid
the use of strtotime( ) and mktime( ) and solve the year limitation
problem. The method compares two dates, each specified as a day,
month, and year. In our check, we compare the date of birth with the
date eighteen years earlier than today:

else
{
    // Check whether the user is 18 years old.
    if ($date->compareDates($parts[1], $parts[2], $parts[3],
        intval(date("d")), intval(date("m")), intval(date("Y"))-18) > 0)
    {
        print "You must be 18+ years of age to use this service.";
        return false;
    }
}

The compareDates( ) method returns 0 if the two
dates are equal, -1 if the first date is less than the second, and 1
if the first date is greater than the second.
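
To make the three return values concrete, here is a small stand-in
written in plain PHP. It is for illustration only and is not the PEAR
code: the function name comparedmy( ) is hypothetical, and we assume
both dates are valid calendar dates.

```php
<?php
// Hypothetical stand-in for illustration; this is not the PEAR code.
// Returns -1, 0, or 1 with the meanings described above.
function comparedmy($day1, $month1, $year1, $day2, $month2, $year2)
{
    // Pack each date into one integer of the form YYYYMMDD, so that a
    // single numeric comparison orders by year, then month, then day
    $first  = $year1 * 10000 + $month1 * 100 + $day1;
    $second = $year2 * 10000 + $month2 * 100 + $day2;
    if ($first < $second)
        return -1;
    if ($first > $second)
        return 1;
    return 0;
}

// Born 16/06/1986, checked on 15/06/2004: the cutoff date is 15/06/1986
if (comparedmy(16, 6, 1986, 15, 6, 2004 - 18) > 0)
    print "You must be 18+ years of age to use this service.";
```

A result greater than zero means the date of birth falls after the
cutoff date, so the user is not yet 18.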

We've used three of the methods from the PEAR Date
package. The package also has useful methods for determining if a
year is a leap year, discovering the date of the beginning or end of
the previous or next month, finding the date of the beginning or end
of the previous or next week, finding the previous or next day or
weekday, returning the number of days or weeks in a month, finding
out the day of the week, converting dates to days, and returning
formatted date strings.

Like many other PEAR packages, this one contains almost no
documentation or examples. However, the methods are readable code and
easy to use, and most are simple and reliable applications of the
date functions that are discussed in Chapter 3.
If you followed our PHP installation instructions in Appendix A through Appendix C and our
PEAR installation instructions in Chapter 7,
you'll find Date.php in
/usr/local/lib/php/. The Date package also
includes code in the file TimeZone.php for
working with and finding the date and time in different time zones.
If you're working with dates, PEAR Date is worth
investigation and avoids most of the limitations of the PHP library
functions.

9.2.3.4 Logic, the date function, and MySQL

There are other approaches to working with dates that don't use PEAR
Date or Unix timestamps. Logic and the date( ) function can be
combined to check and compare days, months, and years, similarly to
our approach to testing times. For example, to check if a user is
over 18, you can use this fragment after exploding the date into the
array $parts:

// Were they born in a year more than 18 years before this one?
if (!((intval($parts[3]) < (intval(date("Y")) - 18)) ||
    // No, so were they born in the year 18 years before this one,
    // and has the month they were born in passed?
    (intval($parts[3]) == (intval(date("Y")) - 18) &&
    (intval($parts[2]) < intval(date("m")))) ||
    // No, so were they born in that year and in this
    // month, and was the day today or earlier in the month?
    (intval($parts[3]) == (intval(date("Y")) - 18) &&
    (intval($parts[2]) == intval(date("m"))) &&
    (intval($parts[1]) <= intval(date("d"))))))
    print "You must be 18+ years of age to use this service.";
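
The same three-step logic can be sketched as a function that takes
today's date as parameters, which makes it possible to try the tests
against known dates. The function name isatleast18( ) and its
parameter names are our own. Here the first test treats anyone born
in a year before the cutoff year (the current year minus 18) as old
enough.

```php
<?php
// Hypothetical helper: the three tests, with today's date passed in
// so that the logic can be checked against known dates.
function isatleast18($day, $month, $year, $todayDay, $todayMonth, $todayYear)
{
    $cutoffYear = $todayYear - 18;
    // Born in a year before the cutoff year?
    if ($year < $cutoffYear)
        return true;
    // Born in the cutoff year, in a month that has already passed?
    if ($year == $cutoffYear && $month < $todayMonth)
        return true;
    // Born in the cutoff year and in this month, on or before today's day?
    return ($year == $cutoffYear && $month == $todayMonth && $day <= $todayDay);
}

if (!isatleast18(16, 6, 1986, 15, 6, 2004))
    print "You must be 18+ years of age to use this service.";
```

In an application, the last three arguments would come from
intval(date("d")), intval(date("m")), and intval(date("Y")).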

You can also use the MySQL functions described in Chapter 15 through
an SQL query as a simple calculator. This approach involves
communication with the database, which adds considerable overhead, so
it is often less desirable than using PHP. However, if one or more
dates are extracted from a database, MySQL date and time functions
are a useful alternative for preprocessing prior to working with
dates in PHP.