Mastering Regular Expressions (2nd Edition) [Electronic resources] نسخه متنی

اینجــــا یک کتابخانه دیجیتالی است

با بیش از 100000 منبع الکترونیکی رایگان به زبان فارسی ، عربی و انگلیسی

Mastering Regular Expressions (2nd Edition) [Electronic resources] - نسخه متنی

Jeffrey E. F. Friedl

| نمايش فراداده ، افزودن یک نقد و بررسی
افزودن به کتابخانه شخصی
ارسال به دوستان
جستجو در متن کتاب
بیشتر
تنظیمات قلم

فونت

اندازه قلم

+ - پیش فرض

حالت نمایش

روز نیمروز شب
جستجو در لغت نامه
بیشتر
لیست موضوعات
توضیحات
افزودن یادداشت جدید










Chapter 6. Crafting an Efficient Expression


With the regex-directed nature of an NFA engine, as is found in Perl, Java packages,
the .NET languages, Python, and PHP (just to name a few; see the table on
Section 4.1.3 for more), subtle changes in an expression can have major effects on
what or how it matches. Issues that don't matter with a DFA engine become
paramount. The fine control an NFA engine affords allows you to really craft an
expression, although it can sometimes be a source of confusion to the unaware.
This chapter helps you learn this art.

At stake are both correctness and efficiency: matching just what you want and no
more, and doing it quickly. Chapter 4 and 5 examined correctness; here we'll
look at the efficiency-r elated issues of NFA engines, and how to make them work
to our advantage. (DFA-related issues are mentioned when appropriate, but this
chapter is primarily concerned with NFA-based engines.) In a nutshell, the key is to
understand the full implications of backtracking, and to learn techniques to avoid
it where possible. Armed with the detailed understanding of the processing
mechanics, not only will you maximize the speed of matches, you will also be
able to write more complex expressions with confidence.

In This Chapter To arm you well, this chapter first illustrates just how important
these issues can be, then prepar es you for some of the more advanced techniques
pr esented later by reviewing the basic backtracking described in the previous
chapters with a strong emphasis on efficiency and backtracking's global ramifications.
Then we'll look at some of the common internal optimizations that can have
a fairly substantial impact on efficiency, and on how expressions are best written
for implementations that employ them. Finally, I bring it all together with some
killer techniques to construct lightning-fast NFA regexes.

Tests and Backtracks

The examples we'll see here illustrate common situations you might meet when
using regular expressions. When examining a particular example's efficiency, I'll
sometimes report the number of individual tests that the regex engine does during
the course of a match. For example, in matching
marty
against smarty, there are
six individual tests the initial attempt of
m
against s (which fails), then the
matching of
m
against m,
a
against a, and so on. I also often report the number of
backtracks (zero in this example, although the implicit backtrack by the regex
engine's transmission to retry the regex at the second character position could be
counted as one).

I use these exact numbers not because the precision is important, but rather to be
mor e concr ete than words such as "lots," "few," "many," "better," "not too much,"
and so forth. I don't want to imply that using regular expressions with an NFA is an
exercise in counting tests or backtracks; I just want to acquaint you with the relative
qualities of the examples.

Another important thing to realize is that these "precise" numbers probably differ
fr om tool to tool. It's the basic relative perfor mance of the examples that I hope
will stay with you. One important variation among tools is the optimizations they
might employ. A smart enough implementation completely bypasses the application
of a particular regex if it can decide beforehand that the target string cannot
possibly match (in cases, for instance, when the string lacks a particular character
that the engine knows beforehand must be there for any match to be successful). I
discuss these important optimizations in this chapter, but the overall lessons are
generally more important than the specific special cases.

Traditional NFA versus POSIX NFA

It's important to keep in mind the target tool's engine type, Traditional NFA or
POSIX NFA, when analyzing efficiency. As we'll see in the next section, some concer
ns matter to one but not the other. Sometimes a change that has no effect on
one has a great effect on the other. Again, understanding the basics allows you to
judge each situation as it arises.


/ 83