Build Your Own Database Driven Website Using PHP & MySQL

Kevin Yank


Regular Expressions

To implement our own markup language, we'll require a script to spot our custom tags in the text of jokes and replace them with their HTML equivalents, before it outputs the joke text to the user's browser. Anyone with experience in regular expressions will know that they're ideal for this sort of work.

A regular expression is a string of text that contains special codes, which allow it to be used with a few PHP functions to search and manipulate other strings of text. This, for example, is a regular expression that searches for the text "PHP" (without the quotes)[1]:

PHP

Not much to it, is there? To use a regular expression, you must be familiar with the regular expression functions available in PHP. ereg is the most basic, and can be used to determine whether a regular expression is satisfied by a particular text string. Consider the code:

$text = 'PHP rules!';
if (ereg('PHP', $text)) {
  echo( '$text contains the string "PHP".' );
} else {
  echo( '$text does not contain the string "PHP".' );
}

In this example, the regular expression is satisfied because the string stored in variable $text contains "PHP". The above code will thus output the following (note that the single quotes prevent PHP from filling in the value of the variable $text):

$text contains the string "PHP".

eregi is a function that behaves almost identically to ereg, except that it ignores the case of text when it looks for matches:

$text = "What is Php?";
if (eregi("PHP", $text)) {
  echo( '$text contains the string "PHP".' );
} else {
  echo( '$text does not contain the string "PHP".' );
}

Again, this outputs the same message, despite the fact that the string actually contains Php:

$text contains the string "PHP".
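One caveat if you're following along on a current PHP installation: ereg and eregi were deprecated in PHP 5.3 and removed in PHP 7.0. The sketch below shows one way to get the same behavior from preg_match, part of the PCRE extension mentioned in the footnote. The helper name contains_match is hypothetical, and the sketch assumes the expression contains no forward slash, since that character is used as the pattern delimiter.

```php
<?php
// Hypothetical helper: reports whether $text contains a match for $regex.
// $regex is written without delimiters, like the expressions in this section.
// Assumption: $regex contains no '/' character, which serves as the delimiter.
function contains_match(string $regex, string $text, bool $ignoreCase = false): bool
{
    // The trailing 'i' modifier makes the match case-insensitive, as eregi did.
    $pattern = '/' . $regex . '/' . ($ignoreCase ? 'i' : '');

    // preg_match returns 1 on a match, 0 on no match, and false on error.
    return preg_match($pattern, $text) === 1;
}

var_dump(contains_match('PHP', 'PHP rules!'));         // bool(true)
var_dump(contains_match('PHP', 'What is Php?'));       // bool(false)
var_dump(contains_match('PHP', 'What is Php?', true)); // bool(true)
```

For the simple expressions used in this section, the POSIX and PCRE syntaxes agree, so the examples that follow can be tried the same way.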

As was mentioned above, there are special codes that may be used in regular expressions. Some of these can be downright confusing and difficult to remember, so if you intend to make extensive use of them, a good reference might come in handy. A tutorial-style reference to standard regular expression syntax may be found at http://www.delorie.com/gnu/docs/rx/rx_1. Let's work our way through a few examples to learn the basic regular expression syntax.

First of all, a caret (^) may be used to indicate the start of the string, while a dollar sign ($) is used to indicate its end:

PHP       Matches "PHP rules!" and "What is PHP?"
^PHP      Matches "PHP rules!" but not "What is PHP?"
PHP$      Matches "I love PHP" but not "What is PHP?"
^PHP$     Matches "PHP" but nothing else
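To see the anchors at work, here is a minimal sketch that filters the sample strings from the table above through each expression. It assumes a PHP version where the ereg functions are unavailable, so it uses preg_grep from the PCRE extension instead; the variable names are only for illustration.

```php
<?php
// Sample strings taken from the table above.
$samples = ['PHP rules!', 'What is PHP?', 'I love PHP', 'PHP'];

// preg_grep returns the array entries that match the pattern;
// array_values renumbers the result from zero.
$anywhere = array_values(preg_grep('/PHP/', $samples));
$atStart  = array_values(preg_grep('/^PHP/', $samples));
$atEnd    = array_values(preg_grep('/PHP$/', $samples));
$exact    = array_values(preg_grep('/^PHP$/', $samples));

print_r($anywhere); // all four strings
print_r($atStart);  // "PHP rules!" and "PHP"
print_r($atEnd);    // "I love PHP" and "PHP"
print_r($exact);    // "PHP" only
```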

Obviously, you may sometimes want to use ^, $, or other special characters to represent the corresponding character in the search string, rather than the special meaning implied by regular expression syntax. To remove the special meaning of a character, prefix it with a backslash:

\$\$\$      Matches "Show me the $$$!" but not "$10"
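The escaping rule can be checked with a short sketch. It uses preg_match rather than ereg, on the assumption that you're running a PHP version without the ereg functions, and it also shows preg_quote, a PCRE helper that adds the backslashes for you when the text to find comes from a variable.

```php
<?php
// Inside single quotes, PHP leaves \$ alone, so the regular expression
// engine receives the three escaped dollar signs exactly as written.
$escaped = '/\$\$\$/';

var_dump(preg_match($escaped, 'Show me the $$$!') === 1); // bool(true)
var_dump(preg_match($escaped, '$10') === 1);              // bool(false)

// preg_quote escapes every special character in a literal string.
// The second argument also escapes the '/' delimiter if it appears.
$literal = '$$$';
$built = '/' . preg_quote($literal, '/') . '/';

var_dump($built === $escaped); // bool(true)
```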

Square brackets can be used to define a set of characters that may match. For example, the following regular expression will match any string that contains a digit from 1 to 5 inclusive:

[12345]     Matches "1" and "39", but not "a" or "76"

Ranges of numbers and letters may also be specified:

[1-5]       Same as previous
^[a-z]$     Matches any single lowercase letter
[0-9a-zA-Z] Matches any string that contains a letter or digit
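The character sets above can be tried out with a sketch like the following. It relies on preg_grep from the PCRE extension (an assumption made because ereg is unavailable in current PHP versions), and the sample strings beyond those in the tables are made up for illustration.

```php
<?php
// [12345] and [1-5] describe the same set of digits.
$digits = ['1', '39', 'a', '76'];
$listed = array_values(preg_grep('/[12345]/', $digits));
$ranged = array_values(preg_grep('/[1-5]/', $digits));

// ^[a-z]$ accepts exactly one lowercase letter and nothing else.
$single = array_values(preg_grep('/^[a-z]$/', ['q', 'Q', 'qq']));

// [0-9a-zA-Z] needs at least one letter or digit somewhere in the string.
$alnum = array_values(preg_grep('/[0-9a-zA-Z]/', ['route 66', '?!']));

print_r($listed); // "1" and "39"
print_r($ranged); // "1" and "39"
print_r($single); // "q"
print_r($alnum);  // "route 66"
```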

The characters ?, +, and * also have special meanings. Specifically, ? means "the preceding character is optional", + means "one or more of the preceding character", and * means "zero or more of the preceding character".

bana?na     Matches "banana" and "banna",
           but not "banaana".
bana+na     Matches "banana" and "banaana",
           but not "banna".
bana*na     Matches "banna", "banana", and "banaaana",
           but not "bnana".
^[a-zA-Z]+$ Matches any string of one or more
           letters and nothing else.
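As a quick check of the table above, this sketch runs each expression against its sample strings and records any result that disagrees with the table. It assumes preg_match as a stand-in for ereg, and the strings "Hello" and "Hello2" are invented test values.

```php
<?php
// Each expression maps sample strings to the result the table predicts.
$cases = [
    'bana?na'     => ['banana' => true, 'banna' => true, 'banaana' => false],
    'bana+na'     => ['banana' => true, 'banaana' => true, 'banna' => false],
    'bana*na'     => ['banna' => true, 'banana' => true, 'banaaana' => true, 'bnana' => false],
    '^[a-zA-Z]+$' => ['Hello' => true, 'Hello2' => false],
];

$mismatches = [];
foreach ($cases as $regex => $expectations) {
    foreach ($expectations as $text => $expected) {
        // Wrap the bare expression in '/' delimiters for preg_match.
        $matched = preg_match('/' . $regex . '/', $text) === 1;
        printf("%-12s %-9s %s\n", $regex, $text, $matched ? 'match' : 'no match');
        if ($matched !== $expected) {
            $mismatches[] = "$regex vs $text";
        }
    }
}

// An empty list means every row of the table held up.
print_r($mismatches);
```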

Parentheses may be used to group strings together to apply ?, +, or * to them as a whole.

ba(na)+na   Matches "banana" and "banananana",
           but not "bana" or "banaana".
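The effect of the parentheses is easiest to see side by side with the ungrouped version. The following sketch, which again assumes preg_grep in place of the ereg functions, applies both expressions to the same sample strings.

```php
<?php
$samples = ['banana', 'banananana', 'bana', 'banaana'];

// With parentheses, + repeats the whole group "na".
$grouped = array_values(preg_grep('/ba(na)+na/', $samples));

// Without them, + repeats only the single preceding character "a".
$ungrouped = array_values(preg_grep('/bana+na/', $samples));

print_r($grouped);   // "banana" and "banananana"
print_r($ungrouped); // "banana", "banananana", and "banaana"
```

Only the ungrouped expression accepts "banaana", because there the + repeats the letter "a" rather than the syllable "na".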

And finally, a period (.) matches any character except a newline:

^.+$        Matches any string of one or more characters with no line breaks.
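Here is a small sketch of that last expression in action, using preg_match as an assumed substitute for ereg. The sample strings are invented for illustration.

```php
<?php
$pattern = '/^.+$/';

$oneLine  = 'Just one line';
$twoLines = "First line\nSecond line"; // double quotes turn \n into a line break
$empty    = '';

var_dump(preg_match($pattern, $oneLine) === 1);  // bool(true)
var_dump(preg_match($pattern, $twoLines) === 1); // bool(false): the period stops at the line break
var_dump(preg_match($pattern, $empty) === 1);    // bool(false): + needs at least one character
```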

There are more special codes and syntax tricks for regular expressions, all of which should be covered in any reference, such as the one mentioned above. For now, we have more than enough for our purposes.

[1] This book covers PHP's support for POSIX Regular Expressions. A more complex, more powerful, but less standardized form of regular expressions called Perl Compatible Regular Expressions (PCRE) is also supported by PHP; however, I will not cover it in this book. For more information on PCRE, see http://www.php.net/pcre.