Visual Studio Hacks [Electronic resources]

Andrew Lockhart

Hack 19. Search for Patterns in Your Files

Unleash the power of regular expressions to make everyday searches and Find and Replace operations more powerful.

Regular expressions are an extremely versatile text-matching language that gives you considerable power when searching your documents. Used with replace operations, they can also greatly assist with repetitive changes to blocks of code.

3.5.1. Searching

A basic regular expression search is easy to set up. Open the Find dialog through the Edit → Find and Replace → Find menu or with Ctrl-F (Edit.Find). Enable regular expression searching by ensuring the Use checkbox is selected and the drop-down list has Regular Expressions selected, as shown in Figure 3-12.

Figure 3-12. Visual Studio .NET 2003 Find dialog with Regular Expressions enabled

In Visual Studio 2005, the Find dialog has changed. To enable regular expression searching, you will need to expand the Find Options section of the dialog. Figure 3-13 shows the new Find dialog with the Options section expanded.

Figure 3-13. Visual Studio 2005 Find and Replace dialog with Regular Expressions enabled

Enter your regular expression into the Find What text box and click on Find Next. The next match of your expression will be found in the document. As with normal matches, clicking Find Next again will find the next match. The next step is to learn the regular expression (also known as regex or regexp) syntax. Searching will never be the same.

3.5.2. Basic Expressions

Regular expressions can be very complex, but basic expressions are easy to master. Unlike normal searches, regular expressions designate a pattern of characters to match instead of a constant string. For example, square brackets in a regular expression define a set of characters (a character class). When you execute the search, it will match any one character out of the set of characters inside the brackets, so the expression [abcd] would match a, b, c, and d, but not z. You can also specify character ranges inside the brackets, so [a-d] is equivalent to the expression [abcd]. If you need to specify more than one range, simply add it to the first, so [a-z0-9] will match any letter or digit.

Regular expression characters in Visual Studio are not case sensitive unless you select the Match Case option in the Find dialog. This is a departure from most other regular expression syntaxes.

Normal alphabetic characters outside of special expressions match characters literally, similar to a normal Find, but can be combined with regular expressions to make them more flexible. This means that combining the set match with a literal match gives us a pattern such as var[12], which will match var1 and var2 but not var3.

If you want to match the string var[12], you'd need to escape the special characters, as in var\[12\].
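
The set, literal, and escape behavior described above can also be tried outside the IDE. What follows is a minimal sketch in Python, whose re module happens to share the bracket and backslash-escape syntax for these particular patterns. The sample strings are invented for illustration, and re.IGNORECASE is passed only to mimic Visual Studio's default case-insensitive matching.

```python
import re

# A character class matches exactly one character from the set.
char_class = re.compile(r"[abcd]", re.IGNORECASE)
class_hits = [ch for ch in "abcdz" if char_class.fullmatch(ch)]

# Literal text combined with a set: matches var1 and var2, but not var3.
var_pattern = re.compile(r"var[12]", re.IGNORECASE)
var_hits = var_pattern.findall("var1 var2 var3 VAR2")

# Escaping the brackets turns them into literal characters.
literal_pattern = re.compile(r"var\[12\]")
literal_hits = literal_pattern.findall("var1 var[12]")

print(class_hits)    # ['a', 'b', 'c', 'd']
print(var_hits)      # ['var1', 'var2', 'VAR2']
print(literal_hits)  # ['var[12]']
```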

3.5.2.1 Matching quantities

Matching a single character isn't particularly useful, so you want to be able to specify a quantity. You can either specify an exact number of times with the pattern ^n, where n is the number of matches you are seeking, or use *, which will match zero or more repetitions of the pattern. Thus, [abcd]* would match ababcd, a, or an empty string, but not abzd as a whole. If you want to be sure that there is at least one match, you can use the + character, which matches one or more repetitions of the pattern.
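
As a rough illustration of these quantifiers, here is a sketch in Python rather than in the Find dialog. The * and + operators are spelled the same way, but Python has no ^n operator; it writes an exact count as {n}, so a Visual Studio pattern such as [0-9]^4 becomes [0-9]{4}. The sample strings are assumptions chosen for the demonstration, and fullmatch is used so that each string must match in its entirety.

```python
import re

# [abcd]* : zero or more characters from the set.
star = re.compile(r"[abcd]*")
star_results = {s: bool(star.fullmatch(s)) for s in ["ababcd", "a", "", "abzd"]}

# [abcd]+ : at least one character from the set, so an empty string fails.
plus = re.compile(r"[abcd]+")
plus_matches_empty = bool(plus.fullmatch(""))

# Visual Studio's [0-9]^4 (exactly four digits) is [0-9]{4} in Python.
exactly_four = re.compile(r"[0-9]{4}")
four_results = [bool(exactly_four.fullmatch(s)) for s in ["2005", "205", "20055"]]

print(star_results)        # {'ababcd': True, 'a': True, '': True, 'abzd': False}
print(plus_matches_empty)  # False
print(four_results)        # [True, False, False]
```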

3.5.2.2 Preventing matches

Sometimes preventing a match is the desired behavior. The pattern bool~(ean) uses the ~ operator to match only bool where it is not followed by ean (the parentheses group the ean so that the ~ acts on it as a group). It is also possible to specify a set of characters you do not want matched. In this case, you simply specify the character set with a ^ before it. Thus, the expression [^abc] will match any single character except for those specified.
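
The ~ operator is specific to the Visual Studio Find syntax. The closest counterpart in most other engines is a negative lookahead, written (?!...). The sketch below uses Python to show both ideas from this section; the sample text is invented, and matching is left case sensitive to keep the output easy to follow.

```python
import re

text = "bool flag; boolean other;"

# Python spelling of bool~(ean): "bool" that is not followed by "ean".
not_boolean = re.compile(r"bool(?!ean)")
starts = [m.start() for m in not_boolean.finditer(text)]

# A negated set: any single character except a, b, or c.
not_abc = re.compile(r"[^abc]")
kept = not_abc.findall("abcxyz")

print(starts)  # [0]  (only the standalone "bool" matches)
print(kept)    # ['x', 'y', 'z']
```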

3.5.3. Basic Replacements

After you have a search pattern, replacing it with a constant string is the same as doing a normal Find/Replace, but the real power of regular expressions comes in the ability to use the string that was matched in the replacement. In order to do this, you need to select the portion of the match expression you wish to reuse by tagging it inside of curly braces. For example, if you create a Find expression such as var{[12]}, this will put the 1 or 2 into the first tagged buffer.

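In the Replace expression, you can access these buffers with a backslash followed by the buffer number. Thus, if you create a Replace expression of indicator\1, you will change any text reading "var1" to "indicator1". If you had a list of variables like this:

var1
var2
var3

they would be converted into:

indicator1
indicator2
var3

The last line is unchanged because 3 is not in the set [12]; widen the set to [1-3] in the Find expression to convert it as well.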

You may, of course, create as many tagged expressions as you wish. You can also access the entire string that the regular expression matched with \0.
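
For readers who want to experiment with tagged expressions outside the IDE, the following is a small sketch using Python's re.sub. The notation differs: Python tags a group with parentheses instead of curly braces, and refers to the whole match as \g<0> rather than \0. The input string is a made-up example.

```python
import re

source = "int var1 = 0; int var2 = var1 + 1;"

# Visual Studio: Find var{[12]}, Replace with indicator\1
renamed = re.sub(r"var([12])", r"indicator\1", source)

# Visual Studio's \0 (the entire match) corresponds to \g<0> in Python.
bracketed = re.sub(r"var[12]", r"[\g<0>]", source)

print(renamed)    # int indicator1 = 0; int indicator2 = indicator1 + 1;
print(bracketed)  # int [var1] = 0; int [var2] = [var1] + 1;
```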

3.5.4. Advanced Patterns

Basic expressions can take you a long way, but there are a number of more advanced operators that you can use with regular expressions to perform even more finely targeted searches.

3.5.4.1 Shortcuts

Several shortcuts for various character sets are built into the regular expression framework so that you do not have to constantly specify sets for commonly used groups. For example, :c matches one alphabetic character; it is the same as specifying [a-zA-Z]. Table 3-1 shows a few of the more useful character sets.

Table 3-1. Commonly used expression shortcuts

Shortcut    Definition
:c          Alphabetic character. Equivalent to [a-zA-Z].
:d          Decimal character. Equivalent to [0-9].
:i          Identifier. This shortcut will match an identifier (such as a variable, class, or method name). This is one of the most useful shortcuts. It is equivalent to ([a-zA-Z_$][a-zA-Z0-9_$]*)
:q          Quoted string. Matches a string bounded by either double quotes or single quotes.
:Wh         Whitespace. Matches any whitespace character.
\n          Newline. This can be useful to build multiline strings.
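
The colon shortcuts in Table 3-1 exist only in the Visual Studio Find syntax, so other engines need the expanded sets. The sketch below spells out a few of them in Python. The SHORTCUTS dictionary is a hypothetical helper written for this illustration, not part of any library, and the :q expansion is an assumption based on the table's description of a string bounded by double or single quotes.

```python
import re

# Hand-written expansions of some Table 3-1 shortcuts (illustrative only).
SHORTCUTS = {
    ":c": r"[a-zA-Z]",
    ":d": r"[0-9]",
    ":i": r"[a-zA-Z_$][a-zA-Z0-9_$]*",
    ":q": r"\"[^\"]*\"|'[^']*'",
}

line = 'string _name = "Ada"; int count2 = 7;'

identifiers = re.findall(SHORTCUTS[":i"], line)
quoted = re.findall(SHORTCUTS[":q"], line)
digits = re.findall(SHORTCUTS[":d"], line)

print(identifiers)  # ['string', '_name', 'Ada', 'int', 'count2']
print(quoted)       # ['"Ada"']
print(digits)       # ['2', '7']
```

Note that the identifier pattern also picks up Ada from inside the quoted string, since it has no notion of language syntax.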

3.5.4.2 Positional characters

Several characters in regular expressions indicate that a portion of the regular expression needs to occur at a specific place in the text being matched. For example, ^ indicates the beginning of a line. So in the expression ^a, the a must be the very first character on the line for the expression to match. Leading whitespace counts, so ^a won't match a line in which spaces or tabs come before the a. The positional character $ matches the end of a line, so the pattern ^a$ will match only if a is the only character on the line.
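
The anchors can be checked with a short Python sketch. Python treats ^ and $ as start and end of the whole string unless re.MULTILINE is passed, so that flag is used here to get the per-line behavior described above. The four-line sample text is invented for the demonstration.

```python
import re

# Four lines: "a", "  a" (indented), "ab", and "a".
text = "a\n  a\nab\na"

# ^a : "a" must be the very first character on a line.
line_start = [m.start() for m in re.finditer(r"^a", text, re.MULTILINE)]

# ^a$ : "a" must be the only character on the line.
only_a = [m.start() for m in re.finditer(r"^a$", text, re.MULTILINE)]

print(line_start)  # [0, 6, 9]  (the indented "a" at offset 4 is skipped)
print(only_a)      # [0, 9]     ("ab" no longer qualifies)
```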

3.5.4.3 Escaping characters

We have now discussed several different types of special characters that are used to designate parts of a regular expression. If you want to match those characters literally, you need to tell the regular expression engine to not interpret them as special portions of an expression. For example, if you wish to match the character [, you need to escape it with a backslash and use \[ in the expression.

3.5.4.4 Example expressions

Table 3-2 and Table 3-3 display a few example expressions to show some of the flexibility of the syntax.

Table 3-2. Example Find expressions

Find what               Effect
bool:Wh+:w+\(:Wh+int    Matches method definitions that take an integer for the first parameter and return a Boolean
///.*$                  Matches a C# documentation comment
/\*([^*]|\n)*\*\/       Matches a C# multiline comment
'.*$                    Matches a VB.NET comment

Table 3-3. Example Find/Replace syntax

Find what:     private:Wh+{:i}:Wh+_{:i}:Wh*;
Replace with:  \0\npublic \1 \2\n{\nget{return _\2;}\nset{ _\2 = value ; }\n}
Effect:        Takes a private variable definition in the form private int _prop and creates a public property accessor

Find what:     (System\.)*String
Replace with:  string
Effect:        Changes variables and parameters defined as System.String to use the C# native string
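
To see what the property-generating replacement in Table 3-3 produces, here is a sketch that ports it to Python. It is a translation under stated assumptions, not the IDE's own behavior: :i is expanded by hand into IDENT, :Wh is approximated with \s, curly-brace tags become parentheses, and \0 becomes \g<0>. The names IDENT, field, and template are hypothetical.

```python
import re

IDENT = r"[a-zA-Z_$][a-zA-Z0-9_$]*"  # stands in for :i

# Port of: private:Wh+{:i}:Wh+_{:i}:Wh*;
field = re.compile(rf"private\s+({IDENT})\s+_({IDENT})\s*;")

# Port of: \0\npublic \1 \2\n{\nget{return _\2;}\nset{ _\2 = value ; }\n}
template = "\n".join([
    r"\g<0>",                 # the original field declaration
    r"public \1 \2",          # type, then the name without its underscore
    "{",
    r"get{return _\2;}",
    r"set{ _\2 = value ; }",
    "}",
])

generated = field.sub(template, "private int _prop;")
print(generated)
```

With the input private int _prop; the output keeps the original declaration on the first line and appends a public int prop accessor whose getter and setter refer to _prop.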

3.5.5. Regular Expressions in Your Code

The .NET Framework includes classes in the System.Text.RegularExpressions namespace that allow you to use regular expressions in your own code. Unfortunately, the regular expression syntax Microsoft uses inside the Visual Studio IDE differs from the syntax used by the .NET regular expression interpreter. Much of the syntax is identical; however, many of the character-set shortcuts, and the characters that mark special expressions or tagged expressions for replacement, are different. For example, the IDE tags an expression with curly braces, whereas .NET uses parentheses. It is also important to note that by default the .NET regular expression classes are case sensitive in their matching.

For in-depth coverage of regular expressions, see Mastering Regular Expressions (O'Reilly).

3.5.6. See Also

[Hack #100]

Ben Von Handorf