Java in a Nutshell, 5th Edition [Electronic resources] (text version)

Pattern

java.util.regex

Java 1.4

serializable

This
class represents a regular expression. It
has no public constructor: obtain a Pattern by
calling one of the static compile( ) methods,
passing the string representation of the regular expression, and an
optional bitmask of flags that modify the behavior of the regex.
pattern( ) and flags( ) return
the string form of the regular expression and the bitmask that were
passed to compile( ).
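
As a minimal sketch of the calls described above, the following compiles a pattern with two flags combined by bitwise OR and then reads the regex string and bitmask back. The class name CompileExample, the helper compileWord( ), and the sample regex are assumptions made for illustration only.

```java
import java.util.regex.Pattern;

public class CompileExample {
    // Hypothetical helper: compile a sample regex with two flags OR-ed together.
    static Pattern compileWord() {
        return Pattern.compile("^java$", Pattern.CASE_INSENSITIVE | Pattern.MULTILINE);
    }

    public static void main(String[] args) {
        Pattern p = compileWord();
        // pattern() returns the regex string that was passed to compile().
        System.out.println(p.pattern());
        // flags() returns the bitmask that was passed to compile().
        System.out.println(p.flags() == (Pattern.CASE_INSENSITIVE | Pattern.MULTILINE));
    }
}
```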

If you want to perform only a single match operation with a regular
expression, and don't need to use any of the flags,
you don't have to create a
Pattern object: simply pass the string
representation of the pattern and the CharSequence
to be matched to the static matches( ) method: the
method returns true if the specified pattern
matches the complete specified text, or returns
false otherwise.
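
A short sketch of the one-shot form follows. The helper name isZipCode( ) and the five-digit regex are illustrative assumptions; the point is only that matches( ) must match the complete text, not a part of it.

```java
import java.util.regex.Pattern;

public class MatchesExample {
    // Hypothetical helper: true only if the ENTIRE text is exactly five digits.
    static boolean isZipCode(CharSequence text) {
        return Pattern.matches("\\d{5}", text);
    }

    public static void main(String[] args) {
        System.out.println(isZipCode("90210"));      // whole text is five digits
        System.out.println(isZipCode("zip 90210"));  // extra text, so no complete match
    }
}
```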

Pattern represents a regular expression, but does
not actually define any primitive methods for matching regular
expressions to text. To do that, you must create a
Matcher object that encapsulates a pattern and the
text it is to be compared with. Do this by calling the
matcher( ) method and specifying the
CharSequence you want to match against. See
Matcher for a description of what you can do with
it.
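
One way this might look in practice is sketched below: a Pattern is compiled once, matcher( ) binds it to some text, and the resulting Matcher is asked to find each match in turn. The class and method names (MatcherExample, findNumbers) are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MatcherExample {
    // Hypothetical helper: collect every run of digits found in the text.
    static List<String> findNumbers(CharSequence text) {
        Pattern p = Pattern.compile("\\d+");
        Matcher m = p.matcher(text);   // the Matcher pairs the pattern with the text
        List<String> found = new ArrayList<String>();
        while (m.find()) {             // advance to the next match, if any
            found.add(m.group());      // the text of the current match
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(findNumbers("3 apples, 12 pears"));
    }
}
```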

The split( ) methods are the exception to the rule
that you must obtain a Matcher in order to be able
to do anything with a Pattern (although they
create and use a Matcher internally). They take a
CharSequence as input, and split it into
substrings, using text that matches the regular expression as the
delimiter, returning the substrings as a String[
]. The two-argument version of split( )
takes an integer argument that specifies the maximum number of
substrings to break the input into.
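
The two forms of split( ) can be sketched as follows. The delimiter regex (a comma with optional surrounding whitespace) and the helper names are assumptions chosen for illustration.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

public class SplitExample {
    // Assumed delimiter: a comma with optional whitespace on either side.
    private static final Pattern COMMA = Pattern.compile("\\s*,\\s*");

    // Split into as many substrings as there are delimiters allow.
    static String[] splitAll(CharSequence input) {
        return COMMA.split(input);
    }

    // Split into at most 'limit' substrings; the last one holds the remainder.
    static String[] splitAtMost(CharSequence input, int limit) {
        return COMMA.split(input, limit);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitAll("a, b ,c,d")));
        System.out.println(Arrays.toString(splitAtMost("a, b ,c,d", 2)));
    }
}
```

With a limit of 2, only the first delimiter is used, and everything after it is returned unsplit as the second substring.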

Pattern defines the following flags, which control various aspects of
how regular expression matching is performed:

CANON_EQ

The Unicode standard sometimes allows more than one way to specify
the same character. If this flag is set, characters are compared by
comparing their full canonical decompositions, so that characters
will match even if expressed in different ways. Enabling this flag
typically slows down performance. Unlike all the other flags, there
is no way to temporarily enable this flag within a pattern.

CASE_INSENSITIVE

Match letters without regard to case. By default this flag only
affects the comparisons of ASCII letters. Also set the
UNICODE_CASE flag if you want to ignore the case
of all Unicode characters. You can enable this flag within a pattern
with (?i).

COMMENTS

If this flag is set, then whitespace and comments within a pattern
are ignored. Comments are all characters between a
# and the end of the line. You can enable this flag within
a pattern with (?x).

DOTALL

If this flag is set, then the . expression matches
any character. If it is not set, then it does not match line
terminator characters. This is also known as
"single-line mode" and you can
enable it within a pattern with (?s).

MULTILINE

If this flag is set, then the ^ and
$ anchors match not only at the beginning and end
of the input string, but also at the beginning and end of any lines
within that string. Within a pattern you can enable this flag with
(?m).

UNICODE_CASE

If this flag is set along with the
CASE_INSENSITIVE flag, then case-insensitive
comparison is done for all Unicode letters, rather than just for
ASCII letters. You can enable both flags within a pattern with
(?iu).

UNIX_LINES

If this flag is set, then only the newline character is considered a
line terminator for the purposes of ., ^, and $. If the flag is not
set, then newlines (\n), carriage returns (\r), and carriage return
newline sequences (\r\n) are all considered line terminators, as are
the Unicode characters \u0085 ("next line"), \u2028 ("line separator"),
and \u2029 ("paragraph separator"). You can turn this flag on within a
pattern with (?d).
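
The effect of several of these flags can be sketched with a pair of small helpers. The names FlagsExample, matches( ), and count( ) are assumptions for illustration, and the sample inputs are arbitrary. CANON_EQ is not exercised here.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FlagsExample {
    // Hypothetical helper: does the whole input match the regex under these flags?
    static boolean matches(String regex, int flags, CharSequence input) {
        return Pattern.compile(regex, flags).matcher(input).matches();
    }

    // Hypothetical helper: how many non-overlapping matches does find() locate?
    static int count(String regex, int flags, CharSequence input) {
        Matcher m = Pattern.compile(regex, flags).matcher(input);
        int n = 0;
        while (m.find()) n++;
        return n;
    }

    public static void main(String[] args) {
        int ci = Pattern.CASE_INSENSITIVE;

        // CASE_INSENSITIVE alone folds only ASCII letters.
        System.out.println(matches("java", ci, "JAVA"));
        System.out.println(matches("\u00e9", ci, "\u00c9"));
        // Adding UNICODE_CASE extends case folding to non-ASCII letters.
        System.out.println(matches("\u00e9", ci | Pattern.UNICODE_CASE, "\u00c9"));
        // The same flag enabled inside the pattern text.
        System.out.println(Pattern.matches("(?i)java", "JaVa"));

        // DOTALL lets . match a line terminator.
        System.out.println(matches("a.b", 0, "a\nb"));
        System.out.println(matches("a.b", Pattern.DOTALL, "a\nb"));

        // MULTILINE lets ^ and $ match at the start and end of each line.
        System.out.println(count("^\\w+$", 0, "one\ntwo\nthree"));
        System.out.println(count("^\\w+$", Pattern.MULTILINE, "one\ntwo\nthree"));

        // COMMENTS ignores whitespace and #-comments in the pattern text.
        String phone = "\\d{3} # prefix\n - \\d{4} # line number";
        System.out.println(matches(phone, Pattern.COMMENTS, "555-1234"));

        // UNIX_LINES stops treating \r as a line terminator.
        System.out.println(matches(".+", 0, "a\rb"));
        System.out.println(matches(".+", Pattern.UNIX_LINES, "a\rb"));
    }
}
```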

Although the API for the Pattern class is quite
simple, the syntax for the text representation of regular expressions
is fairly complex. A complete tutorial on regular expressions is
beyond the scope of this book. The table below is a quick reference
for regular expression syntax. It is very similar to the syntax used
in Perl. Note that many of the syntax elements of a regular
expression include a backslash character, such as
\d to match one of the digits 0-9. Because Java
strings also use the backslash character as an escape, you must
double the backslashes when expressing a regular expression as a
string literal: "\\d". In Java 5.0,
the static quote( ) method quotes all special
characters in a string so that you can match arbitrary text literally
without worrying that punctuation in that text will be interpreted
specially. For complete details on regular expressions, see a book
like Programming Perl by Larry Wall et al., or Mastering Regular
Expressions by Jeffrey E. F. Friedl.
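
The backslash doubling and the quote( ) method described above can be sketched as follows. The helper names isPrice( ) and containsLiteral( ) and the sample strings are assumptions for illustration.

```java
import java.util.regex.Pattern;

public class QuoteExample {
    // Each regex backslash is doubled in the Java string literal:
    // the regex here is \$\d+\.\d\d
    static boolean isPrice(CharSequence text) {
        return Pattern.matches("\\$\\d+\\.\\d\\d", text);
    }

    // Pattern.quote() (Java 5.0) makes every character of 'literal' match itself.
    static boolean containsLiteral(String literal, CharSequence text) {
        return Pattern.compile(Pattern.quote(literal)).matcher(text).find();
    }

    public static void main(String[] args) {
        System.out.println(isPrice("$19.99"));
        System.out.println(isPrice("$19x99"));
        // Unquoted, "1+1" means "one or more 1s followed by a 1" and does not occur.
        System.out.println(Pattern.compile("1+1").matcher("1+1=2").find());
        // Quoted, the + is matched literally.
        System.out.println(containsLiteral("1+1", "1+1=2"));
    }
}
```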

Table 16-3. Java regular expression quick reference

Syntax

Matches

Single characters

x

The character x, as long as
x is not a punctuation character with
special meaning in the regular expression syntax.

\p

The punctuation character p.

\\

The backslash character.

\n

Newline character \u000A.

\t

Tab character \u0009.

\r

Carriage return character \u000D.

\f

Form feed character \u000C.

\e

Escape character \u001B.

\a

Bell (alert) character \u0007.

\uxxxx

Unicode character with hexadecimal code
xxxx.

\xxx

Character with hexadecimal code xx.

\0n

Character with octal code n.

\0nn

Character with octal code nn.

\0nnn

Character with octal code nnn, where
nnn <= 377.

\cx

The control character
^x.

Character classes

[...]

One of the characters between the brackets. Characters may be
specified literally, and the syntax also allows the specification of
character ranges, with intersection, union, and subtraction
operators. See specific examples below.

[^...]

Any one character not between the brackets.

[a-z0-9]

Character range: a character between (inclusive) a
and z or 0 and
9.

[0-9[a-fA-F]]

Union of classes: same as [0-9a-fA-F]

[a-z&&[aeiou]]

Intersection of classes: same as [aeiou].

[a-z&&[^aeiou]]

Subtraction: the characters a through
z except for the vowels.

.

Any character except a line terminator. If the
DOTALL flag is set, then it matches any character
including line terminators.

\d

ASCII digit: [0-9].

\D

Anything but an ASCII digit: [^\d].

\s

ASCII whitespace: [ \t\n\f\r\x0B].

\S

Anything but ASCII whitespace: [^\s].

\w

ASCII word character: [a-zA-Z0-9_].

\W

Anything but ASCII word characters: [^\w].

\p{group}

Any character in the named group. See group names below. Many of the
group names are from POSIX, which is why p is used for this character
class.

\P{group}

Any character not in the named group.

\p{Lower}

ASCII lowercase letter: [a-z].

\p{Upper}

ASCII uppercase: [A-Z].

\p{ASCII}

Any ASCII character: [\x00-\x7f].

\p{Alpha}

ASCII letter: [a-zA-Z].

\p{Digit}

ASCII digit: [0-9].

\p{XDigit}

Hexadecimal digit: [0-9a-fA-F].

\p{Alnum}

ASCII letter or digit: [\p{Alpha}\p{Digit}].

\p{Punct}

ASCII punctuation: one of !"#$%&'()*+,-./:;<=>?@[\]^_`{|}~.

\p{Graph}

visible ASCII character: [\p{Alnum}\p{Punct}].

\p{Print}

visible ASCII character: same as \p{Graph}.

\p{Blank}

ASCII space or tab: [ \t].

\p{Space}

ASCII whitespace: [ \t\n\f\r\x0b].

\p{Cntrl}

ASCII control character: [\x00-\x1f\x7f].

\p{category}

Any character in the named Unicode category. Category names are one
or two letter codes defined by the Unicode standard. One letter codes
include L for letter, N for
number, S for symbol, Z for
separator, and P for punctuation. Two letter codes
represent subcategories, such as Lu for uppercase
letter, Nd for decimal digit,
Sc for currency symbol, Sm for
math symbol, and Zs for space separator. See
java.lang.Character for a set of constants that
correspond to these subcategories; however, note that the full set of
one- and two-letter codes is not documented in this book.

\p{block}

Any character in the named Unicode block. In Java regular
expressions, block names begin with
"In", followed by mixed-case
capitalization of the Unicode block name, without spaces or
underscores. For example: \p{InOgham} or
\p{InMathematicalOperators}. See
java.lang.Character.UnicodeBlock for a list of
Unicode block names.

Sequences, alternatives, groups, and references

xy

Match x followed by
y.

x|y

Match x or y.

(...)

Grouping. Group subexpression within parentheses into a single unit
that can be used with *, +,
?, |, and so on. Also
"capture" the characters that match
this group for use later.

(?:...)

Grouping only. Group subexpression as with ( ),
but do not capture the text that matched.

\n

Match the same characters that were matched when capturing group
number n was first matched. Be careful
when n is followed by another digit: the
largest number that is a valid group number will be used.

Repetition[1]

x?

zero or one occurrence of x; i.e.,
x is optional.

x*

zero or more occurrences of x.

x+

one or more occurrences of x.

x{n}

exactly n occurrences of
x.

x{n,}

n or more occurrences of
x.

x{n,m}

at least n, and at most
m occurrences of
x.

Anchors[2]

^

The beginning of the input string, or if the
MULTILINE flag is specified, the beginning of the
string or of any new line.

$

The end of the input string, or if the MULTILINE
flag is specified, the end of the string or of any line within the
string.

\b

A word boundary: a position in the string between a word and a
nonword character.

\B

A position in the string that is not a word boundary.

\A

The beginning of the input string. Like ^, but
never matches the beginning of a new line, regardless of what flags
are set.

\Z

The end of the input string, ignoring any trailing line terminator.

\z

The end of the input string, including any line terminator.

\G

The end of the previous match.

(?=x)

A positive look-ahead assertion. Require that the following
characters match x, but do not include
those characters in the match.

(?!x)

A negative look-ahead assertion. Require that the following
characters do not match the pattern x.

(?<=x)

A positive look-behind assertion. Require that the characters
immediately before the position match x,
but do not include those characters in the match.
x must be a pattern with a fixed number of
characters.

(?<!x)

A negative look-behind assertion. Require that the characters
immediately before the position do not match
x. x must be a
pattern with a fixed number of characters.

Miscellaneous

(?>x)

Match x independently of the rest of the
expression, without considering whether the match causes the rest of
the expression to fail to match. Useful to optimize certain complex
regular expressions. A group of this form does not capture the
matched text.

(?onflags-offflags)

Don't match anything, but turn on the flags specified by
onflags, and turn off the flags specified
by offflags. These two strings are
combinations in any order of the following letters and correspond to
the following Pattern constants:
i (CASE_INSENSITIVE),
d (UNIX_LINES),
m (MULTILINE),
s (DOTALL),
u (UNICODE_CASE), and
x (COMMENTS). Flag settings
specified in this way take effect at the point that they appear in
the expression and persist until the end of the expression, or until
the end of the parenthesized group of which they are a part, or until
overridden by another flag setting expression.

(?onflags-offflags:x)

Match x, applying the specified flags to
this subexpression only. This is a noncapturing group, like
(?:...), with the addition of flags.

\Q

Don't match anything, but quote all subsequent
pattern text until \E. All characters within such
a quoted section are interpreted as literal characters to match, and
none (except \E) have special meanings.

\E

Don't match anything; terminate a quote started with
\Q.

#comment

If the COMMENTS flag is set, pattern text between a
# and the end of the line is considered a comment
and is ignored.

[1] These repetition characters are
known as "greedy quantifiers,"
because they match as many occurrences of x as possible while still
allowing the rest of the regular expression to match. If you want a
"reluctant quantifier" which
matches as few occurrences as possible while still allowing the rest
of the regular expression to match, follow the quantifiers above with
a question mark. For example, use *? instead of *, and use {2,}?
instead of {2,}. Or, if you follow a quantifier with a plus sign
instead of a question mark, then you specify a
"possessive quantifier" which
matches as many occurrences as possible, even if it means that the
rest of the regular expression will not match. Possessive quantifiers
can be useful when you are sure that they will not adversely affect
the rest of the match, because they can be implemented more
efficiently than regular "greedy
quantifiers."
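
The difference between the three kinds of quantifier can be sketched by applying them to the same input. The helper firstMatch( ) and the sample markup string are assumptions for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QuantifierExample {
    // Hypothetical helper: text of the first match found, or null if none.
    static String firstMatch(String regex, CharSequence input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String input = "<b>bold</b>";
        // Greedy: .+ takes as much as it can, then backs off to the last >.
        System.out.println(firstMatch("<.+>", input));
        // Reluctant: .+? takes as little as it can, stopping at the first >.
        System.out.println(firstMatch("<.+?>", input));
        // Possessive: .++ consumes the rest of the input and never gives any
        // back, so the final > can never match and the result is null.
        System.out.println(firstMatch("<.++>", input));
    }
}
```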

[2] Anchors do not match characters but
instead match the zero-width positions between characters,
"anchoring" the match to a position
at which a specific condition holds.
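
A few of the table entries are sketched below: class subtraction, a backreference bounded by \b anchors, a look-ahead assertion, and \Q...\E quoting. All class and method names and the sample inputs are assumptions for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SyntaxExample {
    // Class subtraction: lowercase ASCII letters except the vowels.
    static boolean isConsonantsOnly(CharSequence text) {
        return Pattern.matches("[a-z&&[^aeiou]]+", text);
    }

    // Backreference: \1 must repeat whatever group 1 captured.
    // Returns the doubled word, or null if there is none.
    static String repeatedWord(CharSequence text) {
        Matcher m = Pattern.compile("\\b(\\w+)\\s+\\1\\b").matcher(text);
        return m.find() ? m.group(1) : null;
    }

    // Look-ahead: the % must follow the digits but is not part of the match.
    static String numberBeforePercent(CharSequence text) {
        Matcher m = Pattern.compile("\\d+(?=%)").matcher(text);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(isConsonantsOnly("rhythm"));
        System.out.println(isConsonantsOnly("rhyme"));
        System.out.println(repeatedWord("it is is fine"));
        System.out.println(numberBeforePercent("up 15% from 12"));
        // \Q...\E quotes the + so that it matches literally.
        System.out.println(Pattern.matches("\\Q1+1\\E=2", "1+1=2"));
    }
}
```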

Figure 16-132. java.util.regex.Pattern

public final class Pattern implements Serializable {
// No Constructor
// Public Constants
     public static final int CANON_EQ;     =128
     public static final int CASE_INSENSITIVE;     =2
     public static final int COMMENTS;     =4
     public static final int DOTALL;     =32
5.0  public static final int LITERAL;     =16
     public static final int MULTILINE;     =8
     public static final int UNICODE_CASE;     =64
     public static final int UNIX_LINES;     =1
// Public Class Methods
     public static Pattern compile(String regex);
     public static Pattern compile(String regex, int flags);
     public static boolean matches(String regex, CharSequence input);
5.0  public static String quote(String s);
// Public Instance Methods
     public int flags( );
     public Matcher matcher(CharSequence input);
     public String pattern( );
     public String[ ] split(CharSequence input);
     public String[ ] split(CharSequence input, int limit);
// Public Methods Overriding Object
5.0  public String toString( );
}

Passed To

java.util.Scanner.{findInLine( ), findWithinHorizon( ), hasNext( ), next( ), skip( ), useDelimiter( )}, Matcher.usePattern( )

Returned By

java.util.Scanner.delimiter( ), Matcher.pattern( )
