Mastering Regular Expressions (2nd Edition) [Electronic resources] (text version)

8.2 Object Models

When looking at different regex packages in Java (or in any object-oriented language,
for that matter), it's amazing to see how many different object models are
used to achieve essentially the same result. An object model is the set of class
structures through which regex functionality is provided, and can be as simple as
one object of one class that's used for everything, or as complex as having separate
classes and objects for each sub-step along the way. There is not an object
model that stands out as the clear, obvious choice for every situation, so a lot of
variety has evolved.

8.2.1 A Few Abstract Object Models

Stepping back a bit now to think about object models helps prepare you to more
readily grasp an unfamiliar package's model. This section presents several representative
object models to give you a feel for the possibilities without getting
mired in the details of an actual implementation.

Starting with the most abstract view, here are some tasks that need to be done in
using a regular expression:

Setup . . .

[1] Accept a string as a regex; compile to an internal form.

[2] Associate the regex with the target text.

Actually apply the regex . . .

[3] Initiate a match attempt.

See the results . . .

[4] Learn whether the match is successful.

[5] Gain access to further details of a successful attempt.

[6] Query those details (what matched, where it matched, etc.).

These are the steps for just one match attempt; you might repeat them from [3] to
find the next match in the target string.
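
As a rough illustration of how the six steps can map onto real calls, here is a minimal sketch that uses Java's standard java.util.regex package for a single match attempt. The class name SixSteps and the helper firstNumber are made up for this example, and other packages divide the steps differently.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SixSteps {
    // Returns the digits captured by group one, or null if nothing matches.
    static String firstNumber(String target) {
        Pattern pattern = Pattern.compile("\\s+(\\d+)"); // [1] compile the regex string
        Matcher matcher = pattern.matcher(target);       // [2] associate it with the target
        boolean found = matcher.find();                  // [3] initiate a match attempt
        if (!found) {                                    // [4] learn whether it succeeded
            return null;
        }
        // [5] in this package the Matcher itself holds the match details
        return matcher.group(1);                         // [6] query those details
    }

    public static void main(String[] args) {
        System.out.println(firstNumber("May 16, 1998")); // prints 16
    }
}
```

Calling find() again on the same Matcher would repeat steps [3] through [6] for the next match.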

Now, let's look at a few potential object models from among the infinite variety
that one might conjure up. In doing so, we'll look at how they deal with matching \s+(\d+)
to the string 'May 16, 1998' to find out that ' 16' (including the leading space) is matched overall, and '16' matched within the first set of parentheses (within "group one"). Remember, the goal here is merely to get a general feel for some of the issues at hand;
we'll see specifics soon.

8.2.1.1 An "all-in-one" model

In this conceptual model, each regular expression becomes an object that you
then use for everything. It's shown visually in Figure 8-1 below, and in pseudocode here, as it processes all matches in a string:


DoEverythingObj myRegex = new DoEverythingObj("\\s+(\\d+)"); // [1]
.
.
.
while (myRegex.findMatch("May 16, 1998")) { // [2], [3], [4]
    String matched = myRegex.getMatchedText(); // [6]
    String num     = myRegex.group(1); // [6]
    .
    .
    .
}

As with most models in practice, the compilation of the regex is a separate step,
so it can be done ahead of time (perhaps at program startup), and used later, at
which point most of the steps are combined together, or are implicit. A twist on
this might be to clone the object after a match, in case the results need to be saved
for a while.

Figure 8-1. An "all-in-one" model
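
To make the pseudo-code concrete, the following is one possible sketch of such a class, built on top of java.util.regex. DoEverythingObj and its methods are hypothetical and simply mirror the pseudo-code. The rule used here to decide whether findMatch should continue a search or start a new one (comparing the target with equals) is an assumption of this sketch, not something the model prescribes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical "all-in-one" class: one object holds both the compiled
// regex and the state of the most recent match attempt.
class DoEverythingObj {
    private final Pattern pattern;
    private Matcher matcher;       // state for the current target, if any
    private String currentTarget;

    DoEverythingObj(String regex) {
        this.pattern = Pattern.compile(regex);            // [1]
    }

    // [2], [3], [4]: a different target string restarts the search;
    // the same target continues after the previous match.
    boolean findMatch(String target) {
        if (matcher == null || !target.equals(currentTarget)) {
            currentTarget = target;
            matcher = pattern.matcher(target);
        }
        return matcher.find();
    }

    // Both queries are valid only after findMatch has returned true.
    String getMatchedText() { return matcher.group(); }   // [6]
    String group(int n)     { return matcher.group(n); }  // [6]
}

class AllInOneDemo {
    // Collects group one of every match, using the loop from the pseudo-code.
    static List<String> numbersIn(String target) {
        DoEverythingObj myRegex = new DoEverythingObj("\\s+(\\d+)");
        List<String> numbers = new ArrayList<>();
        while (myRegex.findMatch(target)) {
            numbers.add(myRegex.group(1));
        }
        return numbers;
    }

    public static void main(String[] args) {
        System.out.println(numbersIn("May 16, 1998")); // prints [16, 1998]
    }
}
```

Because the regex and the match state share one object, the object has to guess when a search is "new," which is the kind of ambiguity a separate state object avoids.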

8.2.1.2 A "match state" model

This conceptual model uses two objects, a "Pattern" and a "Matcher." The Pattern
object represents a compiled regular expression, while the Matcher object has all
of the state associated with applying a Pattern object to a particular string. It's
shown visually in Figure 8-2 below, and its use might be described as: "Convert a regex string to a Pattern object. Give a target string to the Pattern object to get a
Matcher object that combines the two. Then, instruct the Matcher to find a match,
and query the Matcher about the result." Here it is in pseudo-code:


PatternObj myPattern = new PatternObj("\\s+(\\d+)"); // [1]
.
.
.
MatcherObj myMatcher = myPattern.makeMatcherObj("May 16, 1998"); // [2]
while (myMatcher.findMatch()) { // [3], [4]
    String matched = myMatcher.getMatchedText(); // [6]
    String num     = myMatcher.group(1); // [6]
    .
    .
    .
}

This might be considered conceptually cleaner, since the compiled regex is in an
immutable (unchangeable) object, and all state is in a separate object. However,
it's not necessarily clear that the conceptual cleanliness translates to any practical
benefit. One twist on this is to allow the Matcher to be reset with a new target
string, to avoid having to make a new Matcher with each string checked.

Figure 8-2. A "match state" model
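
Java's standard java.util.regex package follows this two-object shape, so the model can be sketched with real calls. The example below also shows the "reset" twist, reusing one Matcher for several target strings. The class name MatchStateDemo, the helper numbersIn, and the second sample string are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MatchStateDemo {
    // Collects group one of every match in each target, reusing one Matcher.
    static List<String> numbersIn(String... targets) {
        Pattern pattern = Pattern.compile("\\s+(\\d+)"); // immutable compiled regex
        Matcher matcher = pattern.matcher("");           // all match state lives here
        List<String> numbers = new ArrayList<>();
        for (String target : targets) {
            matcher.reset(target);                       // new target, same Matcher
            while (matcher.find()) {
                numbers.add(matcher.group(1));
            }
        }
        return numbers;
    }

    public static void main(String[] args) {
        // prints [16, 1998, 3, 2001]
        System.out.println(numbersIn("May 16, 1998", "June 3, 2001"));
    }
}
```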

8.2.1.3 A "match result" model

This conceptual model is similar to the "all-in-one" model, except that the result of
a match attempt is not a Boolean, but rather a Result object, which you can then
query for the specifics on the match. It's shown visually in Figure 8-3 below, and might be described as: "Convert a regex string to a Pattern object. Give it a target
string and receive a Result object upon success. You can then query the Result
object for specifics." Here's one way it might be expressed in pseudo-code:


PatternObj myPattern = new PatternObj("\\s+(\\d+)"); // [1]
.
.
.
ResultObj myResult = myPattern.findFirst("May 16, 1998"); // [2], [3], [5]
while (myResult.wasSuccessful()) { // [4]
    String matched = myResult.getMatchedText(); // [6]
    String num     = myResult.group(1); // [6]
    .
    .
    .
    myResult = myPattern.findNext(); // [3], [5]
}

This compartmentalizes the results of a match, which might be convenient at
times, but results in extra overhead when only a simple true/false result is desired.
One twist on this is to have the Pattern object return null upon failure, to save
the overhead of creating a Result object that just says "no match."

Figure 8-3. A "match result" model
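
Here is one way the "return null upon failure" twist might look in practice. PatternObj is a hypothetical wrapper named after the pseudo-code; it borrows java.util.regex.MatchResult as its result type and uses Matcher.toMatchResult() to take a snapshot of each match. Treat it as a sketch of the idea rather than a recommended design.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.MatchResult;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical "match result" wrapper: each attempt returns a standalone
// result object, or null when there is no (further) match.
class PatternObj {
    private final Pattern pattern;
    private Matcher matcher; // remembers where findNext() should continue

    PatternObj(String regex) {
        this.pattern = Pattern.compile(regex);             // [1]
    }

    MatchResult findFirst(String target) {                 // [2], [3], [5]
        matcher = pattern.matcher(target);
        return findNext();
    }

    // Must be called only after findFirst() has set up a target.
    MatchResult findNext() {                               // [3], [5]
        // toMatchResult() returns a snapshot that later searches do not alter.
        return matcher.find() ? matcher.toMatchResult() : null;
    }
}

class MatchResultDemo {
    static List<MatchResult> allResults(String target) {
        PatternObj myPattern = new PatternObj("\\s+(\\d+)");
        List<MatchResult> results = new ArrayList<>();
        // Step [4] becomes a simple null check.
        for (MatchResult r = myPattern.findFirst(target); r != null; r = myPattern.findNext()) {
            results.add(r);
        }
        return results;
    }

    public static void main(String[] args) {
        for (MatchResult r : allResults("May 16, 1998")) {
            System.out.println("'" + r.group() + "' group one: " + r.group(1)
                    + ", offset " + r.start());
        }
    }
}
```

Since each result is a separate object, earlier results stay usable after the search moves on, at the cost of one allocation per successful match.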

8.2.2 Growing Complexity

These conceptual models are just the tip of the iceberg, but give you a feel for
some of the differences you'll run into. They cover only simple matches; when
you bring in search-and-replace, or perhaps string splitting (splitting a string into
substrings separated by matches of a regex), things can become much more complex.

Thinking about search-and-replace, for example, the first thought may well be that
it's a fairly simple task, and indeed, a simple "replace this with that" interface is
easy to design. But what if the "that" needs to depend on what's matched by the
"this," as we did many times in examples in Chapter 2 (see Section 2.3.6)? Or what if you need
to execute code upon every match, using the resulting text as the replacement?
These, and other practical needs, quickly complicate things, which further
increases the variety among the packages.
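
To give a feel for these variations, the sketch below shows how java.util.regex happens to expose them: a replacement string that refers back to group one, a loop that runs code on every match to compute the replacement, and a split. The class and method names are invented for this example, and other packages offer quite different interfaces for the same tasks.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ReplaceAndSplitDemo {
    private static final Pattern NUMBER = Pattern.compile("\\s+(\\d+)");

    // "Replace this with that," where "that" refers back to group one via $1.
    static String bracketNumbers(String target) {
        return NUMBER.matcher(target).replaceAll(" [$1]");
    }

    // Code runs on every match and computes the replacement text.
    static String incrementNumbers(String target) {
        Matcher m = NUMBER.matcher(target);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            int n = Integer.parseInt(m.group(1));
            // quoteReplacement keeps any '$' or '\' in the text literal.
            m.appendReplacement(out, Matcher.quoteReplacement(" " + (n + 1)));
        }
        m.appendTail(out); // copy whatever follows the last match
        return out.toString();
    }

    // Splitting: the regex describes the separators, not the pieces.
    static String[] fields(String target) {
        return Pattern.compile(",\\s*").split(target);
    }

    public static void main(String[] args) {
        System.out.println(bracketNumbers("May 16, 1998"));         // May [16], [1998]
        System.out.println(incrementNumbers("May 16, 1998"));       // May 17, 1999
        System.out.println(Arrays.toString(fields("May 16, 1998"))); // [May 16, 1998]
    }
}
```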
