Mastering Regular Expressions (2nd Edition) [Electronic resources] - Text version

Jeffrey E. F. Friedl

1.3 The Regular-Expression Frame of Mind


As we'll soon see, complete regular expressions are built up from small building-block
units. Each individual building block is quite simple, but since they can be
combined in an infinite number of ways, knowing how to combine them to
achieve a particular goal takes some experience. So, this chapter provides a quick
overview of some regular-expression concepts. It doesn't go into much depth, but
provides a basis for the rest of this book to build on, and sets the stage for important
side issues that are best discussed before we delve too deeply into the regular
expressions themselves.

While some examples may seem silly (because some are silly), they represent the
kind of tasks that you will want to do; you just might not realize it yet. If each
point doesn't seem to make sense, don't worry too much. Just let the gist of the
lessons sink in. That's the goal of this chapter.


1.3.1 If You Have Some Regular-Expression Experience


If you're already familiar with regular expressions, much of this overview will not
be new, but please be sure to at least glance over it anyway. Although you may be
aware of the basic meaning of certain metacharacters, perhaps some of the ways
of thinking about and looking at regular expressions will be new.

Just as there is a difference between playing a musical piece well and making
music, there is a difference between knowing about regular expressions and really
understanding them. Some of the lessons present the same information that you
are already familiar with, but in ways that may be new and which are the first
steps to really understanding.


1.3.2 Searching Text Files: Egrep


Finding text is one of the simplest uses of regular expressions; many text editors
and word processors allow you to search a document using a regular-expression
pattern. Even simpler is the utility egrep. Give egrep a regular expression and some
files to search, and it attempts to match the regular expression to each line of each
file, displaying only those lines in which a match is found. egrep is freely available for many systems, including DOS, MacOS, Windows, Unix, and so on. See this
book's web site, regex.info, for links on how to obtain a copy of egrep
for your system.
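
The line-by-line behavior described above can be sketched in a few lines of Python. This is only an illustration, not how egrep is implemented: the function name `egrep_like` and the sample lines are made up for this sketch, and Python's `re` module stands in for egrep's own regex flavor, which differs in some details.

```python
import re

def egrep_like(pattern, lines):
    """Return the lines in which `pattern` matches somewhere (hypothetical helper)."""
    regex = re.compile(pattern)
    # re.search scans the whole line, so a match anywhere in the line counts.
    return [line for line in lines if regex.search(line)]

# Invented sample input standing in for the lines of a file.
sample = [
    "first line of a file",
    "a line that mentions egrep",
    "last line",
]

for line in egrep_like("egrep", sample):
    print(line)
```

As with egrep, the whole line is reported even when only a small part of it matched.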

Returning to the email example from Section 1.1, the command I actually used to generate
a makeshift table of contents from the email file is shown in Figure 1-1. egrep
interprets the first command-line argument as a regular expression, and any
remaining arguments as the file(s) to search. Note, however, that the single quotes
shown in Figure 1-1 are not part of the regular expression, but are needed by my
command shell.[3] When using egrep, I usually wrap the regular expression with single
quotes. Exactly which characters are special, in what contexts, to whom (to the
regular-expression, or to the tool), and in what order they are interpreted are all
issues that grow in importance when you move to regular-expression use in full-fledged
programming languages, something we'll see starting in the next chapter.

[3] The command shell is the part of the system that accepts your typed commands and actually executes
the programs you request. With the shell I use, the single quotes serve to group the command
argument, telling the shell not to pay too much attention to what's inside. If I didn't use them, the
shell might think, for example, a '*' that I intended to be part of the regular expression was really
part of a filename pattern that it should interpret. I don't want that to happen, so I use the quotes to
"hide" the metacharacters from the shell. Windows users of COMMAND.COM or CMD.EXE should probably
use double quotes instead.
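
The grouping effect of the single quotes can be seen with Python's `shlex` module, which applies POSIX-shell-style quoting rules to a command string. This is a rough sketch: the command line, the pattern, and the file name `mailbox-file` are assumptions chosen for illustration, and `shlex` models only quoting, not filename patterns or other shell features.

```python
import shlex

# Hypothetical command line; the pattern and the file name are assumptions.
command = "egrep '^(From|Subject): ' mailbox-file"

# shlex.split follows POSIX-shell quoting rules: the single quotes group the
# pattern (including its trailing space) into one argument and are then dropped.
argv = shlex.split(command)

print(argv[1])    # the regular expression, as the program would receive it
print(argv[2:])   # the remaining arguments: the file(s) to search
```

The program never sees the quotes themselves; it receives the text between them as a single argument.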



Figure 1-1. Invoking egrep from the command line




We'll start to analyze just what the various parts of the regex mean in a moment,
but you can probably already guess just by looking that some of the characters
have special meanings. In this case, the parentheses, the ^, and the | characters
are regular-expression metacharacters, and combine with the other characters to
generate the result I want.
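
To make the roles of these three metacharacters concrete, here is a small sketch. The exact regex from the figure is not repeated in this text, so the pattern below is an assumed stand-in that simply uses the same three metacharacters. The mailbox lines are invented, and Python's `re` module is used in place of egrep.

```python
import re

# Assumed pattern: ^ anchors the match at the start of the line,
# the parentheses group the alternatives, and | separates them.
pattern = re.compile(r"^(From|Subject): ")

# Invented sample lines standing in for an email file.
mailbox = [
    "From: alice@example.com",
    "Subject: lunch on Friday",
    "Date: Thu, 31 Oct 2002",
    "    quoted text, Subject: not at the start of the line",
]

matching = [line for line in mailbox if pattern.search(line)]
for line in matching:
    print(line)
```

The last sample line contains "Subject: " but is not selected, because the ^ requires the match to begin at the start of the line.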

On the other hand, if your regular expression doesn't use any of the dozen or so
metacharacters that egrep understands, it effectively becomes a simple "plain text"
search. For example, searching for cat in a file finds and displays all lines with
the three letters c·a·t in a row. This includes, for example, any line containing
vacation.

Even though the line might not have the word cat, the c·a·t sequence in
vacation is still enough to be matched. Since it's there, egrep goes ahead and displays
the whole line. The key point is that regular-expression searching is not
done on a "word" basis; egrep can understand the concept of bytes and lines in a
file, but it generally has no idea of English's (or any other language's) words, sentences,
paragraphs, or other high-level concepts.
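
The cat and vacation case is easy to reproduce. In this sketch the sample lines are invented, and Python's `re.search` stands in for egrep's matching.

```python
import re

# Invented sample lines.
lines = [
    "the cat sat on the mat",
    "we are going on vacation",
    "a dog barked",
]

# "cat" contains no metacharacters, so the regex acts as a plain-text search.
hits = [line for line in lines if re.search("cat", line)]
print(hits)

# For comparison: Python's ordinary substring test selects the same lines.
plain = [line for line in lines if "cat" in line]
print(hits == plain)
```

The vacation line is selected even though it does not contain the word cat, since the search looks only for the three characters in a row.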

