Hack 19. Search for Patterns in Your Files

Use regular expressions to make everyday searches and Find and Replace operations more powerful.
Regular expressions are an extremely versatile text-matching language that gives you incredible power when searching your documents and, when used with replace operations, can greatly assist with repetitive changes to blocks of code.
3.5.1. Searching
The basic regular expression search is easily done. Open the Find dialog through the Edit menu or with Ctrl-F (Edit.Find). Enable regular expression searching by ensuring the Use checkbox is selected and the drop-down list has Regular Expressions selected, as shown in Figure 3-12.
Figure 3-12. Visual Studio .NET 2003 Find dialog with Regular Expressions enabled

In Visual Studio 2005, to enable regular expression searching, you will need to expand the Find Options section of the dialog. Figure 3-13 shows the new Find dialog with the Options section expanded.
Figure 3-13. Visual Studio 2005 Find and Replace dialog with Regular Expressions enabled

Enter your expression and click on Find Next. The next match of your expression will be found in the document. As with normal matches, clicking Find Next again will find the next match. The next step is to learn the regular expression (also known as regex or regexp) syntax. Searching will never be the same.
3.5.2. Basic Expressions
Regular expressions can be very complex, but basic expressions can be easy to master. Unlike normal searches, regular expressions designate a pattern of characters to match instead of a constant string. For example, square brackets in a regular expression define a set of characters (a character class). When you execute the search, it will match any one character out of the set of characters inside the brackets, so the expression [abcd] would match a, b, c, and d, but not z. You can also specify character ranges inside the brackets, so [a-d] is equivalent to the expression [abcd]. If you need to specify more than one range, simply add it to the first, so [a-z0-9] will match any letter or digit. Regular expression characters in Visual Studio are not case sensitive unless you select the Match Case option in the Find dialog. This is a departure from most other regular expression syntaxes.
Normal alphabetic characters outside of special expressions match characters literally, similar to a normal Find, but can be combined with regular expressions to make them more flexible. This means that combining the set match with a literal match gives us a pattern such as var[12], which will match var1 and var2 but not var3.
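The behavior described above can be checked outside the IDE. The following is a minimal sketch that uses Python's re module as a stand-in for the Find dialog (an assumption made for illustration; the two engines share the bracket syntax but differ elsewhere). The re.IGNORECASE flag imitates the dialog's behavior when Match Case is not selected, and the matches helper is a hypothetical name.

```python
import re

# Python's re module stands in for the Find dialog here (an assumption for
# illustration); the bracket syntax is the same in both.
char_set = re.compile(r"[abcd]", re.IGNORECASE)    # Match Case not selected
char_range = re.compile(r"[a-d]", re.IGNORECASE)   # same set, written as a range
var_pattern = re.compile(r"var[12]", re.IGNORECASE)

def matches(pattern, text):
    """Return True when the whole of text matches pattern."""
    return pattern.fullmatch(text) is not None

for candidate in ["a", "d", "z", "B"]:
    print(candidate, matches(char_set, candidate), matches(char_range, candidate))

for candidate in ["var1", "var2", "var3", "VAR1"]:
    print(candidate, matches(var_pattern, candidate))
```

Dropping re.IGNORECASE gives the case-sensitive behavior that most other regular expression engines use by default.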
3.5.2.1 Matching quantities
Matching a single character isn't particularly useful, so you want to be able to specify a quantity. You can either specify an exact number of times with the pattern ^n, where n is the number of matches you are seeking, or use *, which will match 0 or more repetitions of the pattern. Thus, [abcd]* would match ababcd, a, or an empty string, but not abzd. If you want to be sure that there is at least one match, you can use the + character, which matches one or more repetitions of the pattern.
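As a rough illustration of these quantifiers, here is a small sketch in Python's re module, used as a stand-in for the Find dialog (an assumption for illustration). The * and + characters behave the same way, but Python writes an exact count as {n} rather than Visual Studio's ^n. The whole_match helper is a hypothetical name.

```python
import re

# Stand-in sketch in Python (an assumption for illustration).
# Visual Studio writes an exact count as ^n; Python writes it as {n}.
zero_or_more = re.compile(r"[abcd]*")
one_or_more = re.compile(r"[abcd]+")
exactly_three = re.compile(r"[abcd]{3}")

def whole_match(pattern, text):
    """Return True when pattern matches the entire string."""
    return pattern.fullmatch(text) is not None

for candidate in ["ababcd", "a", "", "abzd"]:
    print(repr(candidate), whole_match(zero_or_more, candidate),
          whole_match(one_or_more, candidate))

print(whole_match(exactly_three, "abc"), whole_match(exactly_three, "ab"))
```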
3.5.2.2 Preventing matches
Sometimes preventing a match is the desired
behavior. The pattern bool~(ean) uses the
~ operator to match only bool
where it is not followed by ean (the parentheses
group the ean so that the ~
acts on it as a group). It is also possible to specify a set of
characters you do not want matched. In this case, you simply specify
the character set with a ^ before it. Thus, the
expression [^abc] will match any single character
except for those specified.
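Most other regular expression engines have no ~ operator. The closest counterpart is a negative lookahead. The sketch below uses Python's re module as a stand-in (an assumption for illustration) to show both kinds of prevention described above.

```python
import re

# Stand-in sketch in Python (an assumption for illustration). Visual Studio's
# bool~(ean) roughly corresponds to a negative lookahead, written (?!...).
bool_not_boolean = re.compile(r"bool(?!ean)")

# A ^ at the start of a set negates it, exactly as in the Find dialog.
not_abc = re.compile(r"[^abc]")

print(bool_not_boolean.search("bool flag;") is not None)     # True
print(bool_not_boolean.search("boolean flag;") is not None)  # False
print(not_abc.findall("abcxyz"))                             # ['x', 'y', 'z']
```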
3.5.3. Basic Replacements
After
you have a search pattern, replacing
it with a constant string is the same as doing a normal Find/Replace,
but the real power of regular expressions comes in the ability to use
the string that was matched in the replacement. In order to do this,
you need to select the portion of the match expression you wish to
reuse by tagging it inside of curly braces.
For example, if you create a Find expression such as var{[123]}, this will put the matched digit (1, 2, or 3) into the first tagged buffer.
In the Replace expression, you can access these buffers with a backslash. Thus, if you create a Replace expression of indicator\1, you will change any text reading "var1" to "indicator1". If you had a list of variables like this:
var1
var2
var3
they would be converted into:
indicator1
indicator2
indicator3
You may, of course, create as many tagged expressions as you wish. You can also access the entire string that the regular expression matched with \0.
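The same kind of replacement can be sketched with Python's re.sub, used here as a stand-in for the Replace dialog (an assumption for illustration). Python tags a group with parentheses instead of curly braces and writes the whole match as \g<0> instead of \0. The pattern uses the set [123] so that all three variables in the list are matched.

```python
import re

# Stand-in sketch in Python (an assumption for illustration). Visual Studio
# tags a group with { }; Python uses ( ). Both refer to the first group as \1.
source = "var1\nvar2\nvar3"

# Reuse the tagged digit in the replacement text.
renamed = re.sub(r"var([123])", r"indicator\1", source)

# Reuse the entire match: Python spells Visual Studio's \0 as \g<0>.
wrapped = re.sub(r"var[123]", r"[\g<0>]", source)

print(renamed)
print(wrapped)
```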
3.5.4. Advanced Patterns
Basic expressions can take you a long way, but there are a number of
more advanced operators that you can use with regular expressions to
perform even more finely targeted searches.
3.5.4.1 Shortcuts
Several shortcuts for various character sets are built into the regular expression framework so that you do not have to constantly specify sets for commonly used groups. For example, :c matches one alphabetic character; it is the same as specifying [a-zA-Z]. Table 3-1 shows a few of the more useful character sets.
Shortcut | Definition |
---|---|
:c | Alphabetic character. Equivalent to [a-zA-Z]. |
:d | Decimal character. Equivalent to [0-9]. |
:i | Identifier. This shortcut will match an identifier (such as a variable, class, or method name). This is one of the most useful shortcuts. It is equivalent to ([a-zA-Z_$][a-zA-Z0-9_$]*). |
:q | Quoted string. Matches a string bounded by either double quotes or single quotes. |
:Wh | Whitespace. Matches any whitespace character. |
\n | Newline. This can be useful to build multiline strings. |
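These shortcuts exist only in the Visual Studio syntax. To get a feel for what each one covers, the sketch below maps them to approximate Python patterns. The mapping, the SHORTCUT_EQUIVALENTS name, and the find_all helper are assumptions made for illustration; the quoted-string pattern in particular is a simplification that ignores escaped quotes.

```python
import re

# Hypothetical mapping from the shortcuts in Table 3-1 to approximate
# Python patterns (an assumption for illustration, not an official table).
SHORTCUT_EQUIVALENTS = {
    ":c": r"[a-zA-Z]",
    ":d": r"[0-9]",
    ":i": r"[a-zA-Z_$][a-zA-Z0-9_$]*",
    ":q": "\"[^\"]*\"|'[^']*'",   # simplified: no escaped quotes
    ":Wh": r"\s",
}

def find_all(shortcut, text):
    """Return every substring of text matched by the shortcut's equivalent."""
    return re.findall(SHORTCUT_EQUIVALENTS[shortcut], text)

line = 'int _count = 42; string name = "Ben";'
print(find_all(":i", line))
print(find_all(":d", line))
print(find_all(":q", line))
```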
3.5.4.2 Positional characters
Several characters in regular expressions indicate that a portion of the regular expression needs to occur at a specific place in the text being matched. For example, ^ indicates the beginning of a line. So in the expression ^a, the a must be the very first character on the line for the expression to match. Leading whitespace counts, so ^a won't match a line that starts with spaces or tabs. The positional character $ matches the end of line, so the pattern ^a$ will match only if a is the only character on a line.
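The positional characters can be tried out with a short sketch in Python's re module, used as a stand-in for the Find dialog (an assumption for illustration). Python needs the re.MULTILINE flag before ^ and $ apply to every line rather than to the whole string. The line_numbers helper is a hypothetical name.

```python
import re

# Stand-in sketch in Python (an assumption for illustration). re.MULTILINE
# makes ^ and $ apply to each line, as they do in the Find dialog.
text = "a\n  a\nab\n"

starts_with_a = re.compile(r"^a", re.MULTILINE)
only_a = re.compile(r"^a$", re.MULTILINE)

def line_numbers(pattern, block):
    """Return the 1-based line numbers on which pattern matches."""
    return [block.count("\n", 0, m.start()) + 1 for m in pattern.finditer(block)]

print(line_numbers(starts_with_a, text))  # lines 1 and 3; line 2 is indented
print(line_numbers(only_a, text))         # line 1 only
```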
3.5.4.3 Escaping characters
We have now
discussed several different types of
special characters that are used to designate parts of a regular
expression. If you want to match those characters literally, you need
to tell the regular expression engine to not interpret them as
special portions of an expression. For example, if you wish to match
the character [, you need to escape it with a
backslash and use \[ in the expression.
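Backslash escaping works the same way in most engines. The sketch below uses Python's re module as a stand-in (an assumption for illustration) and also shows re.escape, a Python convenience with no counterpart in the Find dialog.

```python
import re

# Stand-in sketch in Python (an assumption for illustration). The backslash
# turns [ into a literal character instead of the start of a character set.
literal_bracket = re.compile(r"\[")
code = "values[0] = items[i];"
positions = [m.start() for m in literal_bracket.finditer(code)]

# re.escape adds the backslashes for you when the text to find is literal.
escaped = re.escape("items[i]")
found = re.search(escaped, code) is not None

print(positions)
print(found)
```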
3.5.4.4 Example expressions
Table 3-2 and Table 3-3 display
a few example expressions to show some of the flexibility of the
syntax.
Find what | Effect |
---|---|
bool:Wh+:w+\(:Wh+int | Matches method definitions that take an integer for the first parameter and return a Boolean |
///.*$ | Matches a C# documentation comment |
/\*([^*]|\n)*\*\/ | Matches a C# multiline comment |
'.*$ | Matches a VB.NET comment |
Find what | Replace with | Effect |
---|---|---|
private:Wh+{:i}:Wh+_{:i}:Wh*; | \0\npublic \1 \2\n{\nget{return _\2;}\nset{ _\2 = value ; }\n} | Takes a private variable definition in the form private int _prop and creates a public property accessor |
(System\.)*String | string | Changes variables and parameters defined as System.String to use the C# native string |
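To see what the property accessor replacement in Table 3-3 produces, here is a rough translation into Python's re.sub. It is a sketch under stated assumptions: \s stands in for :Wh, the IDENT pattern stands in for :i, and the names IDENT, FIELD, ACCESSOR, and add_property are hypothetical.

```python
import re

# Rough Python translation of the first row of Table 3-3 (an assumption for
# illustration; \s and IDENT approximate Visual Studio's :Wh and :i).
IDENT = r"[a-zA-Z_$][a-zA-Z0-9_$]*"
FIELD = re.compile(r"private\s+(" + IDENT + r")\s+_(" + IDENT + r")\s*;")

# \g<0> is the whole match (Visual Studio's \0); \1 is the type, \2 the name.
ACCESSOR = r"\g<0>\npublic \1 \2\n{\nget{return _\2;}\nset{ _\2 = value ; }\n}"

def add_property(source):
    """Append a public property after each private field declaration."""
    return FIELD.sub(ACCESSOR, source)

print(add_property("private int _prop;"))
```

Text that does not contain a matching private field declaration is returned unchanged.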
3.5.5. Regular Expressions in Your Code
The .NET Framework includes
classes in the
System.Text.RegularExpressions namespace that allow you to include
functionality using regular expressions in your code. Unfortunately,
the regular expression syntax Microsoft uses inside of the Visual
Studio IDE differs from the syntax used by the .NET regular
expression interpreter. Much of the syntax is identical; however,
many of the character sets and characters indicating the type of
expression or tagged expressions for replacement are different. It is
also important to note that by default the .NET regular expression classes are case sensitive in their matching.
For in-depth coverage of regular expressions, see Mastering Regular Expressions (O'Reilly).
3.5.6. See Also
[Hack #100]
Ben Von Handorf