Hack 25 Find Files Faster by Mastering the Indexing Service's Query Language


Got a hard disk filled with many files, and no
easy way to find what you want quickly? Use the Indexing Service and
its query language to get what you want fast.
Packrats like me (and my editor) have
a hard time finding exactly what they want on their hard disk. I have
thousands of files there, some dating back close to ten years, that I
dutifully copy to a new system every time I upgrade my hardware.
After all, who knows when I might need to find the list of books I
planned to take out of the library in 1986?
XP's Search Companion is too slow and the kinds of
searches it can perform are fairly limited. It can't
find files based on properties such as when the file was last printed
or the word count of a file, or using a sophisticated search
language.
The Indexing Service, first used with Microsoft's Internet
Information Services (IIS), is a far more powerful tool. It can
perform searches hundreds of times faster and includes an exceedingly
sophisticated query language you can use for performing searches. It
works by indexing the files on your disk, and then, when you do a
search, it queries that index rather than searching through your
entire hard disk. The indexes that the service creates are called
catalogs.
By default, the Indexing Service is turned off. To activate it, first
run the Search Companion from the Start menu, and then choose
Change Preferences and select With Indexing Service. If the With
Indexing Service option isn't available, and instead
you see Without Indexing Service, it means that the Indexing Service
is already turned on.
When you activate the Indexing Service, it won't
immediately be available. It first has to build an index, which can
take a substantial amount of time, depending on the number of files
on your hard disk and your processor speed. It's
best to start the Indexing Service and leave your computer on
overnight so that it can complete indexing.
To turn off the Indexing Service from the Search Companion, choose
Change Preferences and select Without Indexing Service. After
that, you'll use the normal Search Companion. The
index will remain intact; when you do a search, you just
won't search through it. You can always turn the
index back on when you want.
3.6.1 Using the Indexing Service's Query Language
The Indexing Service's query language is a sophisticated
language that lets you search on file properties, such as the
author of documents or the number of bytes in a document, and
also use Boolean operators and other search criteria.
The language uses tags to define search criteria. For example, to
search for the phrase "That dog
won't hunt," the query would be:
{phrase} That dog won't hunt {/phrase}
There are two basic ways to search for text in the query language,
using either phrase or
freetext. A
phrase search searches for the exact words in
the exact order, like this:
{phrase} old dog barks backwards {/phrase}
The search results will include only files whose text includes that
exact phrase.
A freetext expression search
looks for any words in the phrase and returns files that have any one
of the words in the phrase. It works like the Boolean
OR operator. So, the query:
{freetext} old dog barks backwards {/freetext}
returns many more results than the phrase query,
since it returns results that contain any of the words in the phrase.
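To make the difference concrete, here is a toy model in Python of the two matching rules. It is only a sketch of the behavior described above, not the Indexing Service's own matching code; the function names and the sample documents are made up for illustration, and the real service also skips noise words and ranks its results.

```python
import re

def words(text):
    """Lowercase a string and split it into words."""
    return re.findall(r"[a-z0-9']+", text.lower())

def phrase_match(query, document):
    """True if the query's words appear in the document consecutively, in order."""
    q, d = words(query), words(document)
    if not q:
        return False
    return any(d[i:i + len(q)] == q for i in range(len(d) - len(q) + 1))

def freetext_match(query, document):
    """True if the document contains at least one of the query's words."""
    return bool(set(words(query)) & set(words(document)))

# Hypothetical file contents, used only to compare the two rules.
docs = {
    "a.txt": "The old dog barks backwards without getting up.",
    "b.txt": "My dog is old, but he never barks.",
    "c.txt": "Cats sleep all day.",
}
query = "old dog barks backwards"

phrase_hits = [name for name, text in docs.items() if phrase_match(query, text)]
freetext_hits = [name for name, text in docs.items() if freetext_match(query, text)]

print(phrase_hits)    # ['a.txt']
print(freetext_hits)  # ['a.txt', 'b.txt']
```

The phrase rule finds only the file with the exact wording, while the freetext rule also picks up the file that merely shares some of the words.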
3.6.2 Searching Using Properties
The real power of the Indexing
Service's query language lies in
the way it can search not just for text, but also for
document properties. The syntax for searching using properties in a
query is:
{prop name=property name} query {/prop}
where property name is the name of the
property, such as those listed in Table 3-2, and
query is the text you're
searching for. For example, to search for all documents last edited
by Preston Gralla, you would enter:
{prop name=DocLastAuthor} Preston Gralla {/prop}
Queries can use * and
? wildcard characters, as well as Unix-style
regular expression queries (for more on regular expressions, see
Mastering Regular Expressions from
O'Reilly). In order to use these wildcards, you must
use the {regex} tag, like this:
{prop name=filename} {regex} *.xl? {/regex} {/prop}
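If you build queries often, a few lines of Python can assemble the tags for you. This is a minimal sketch that only produces the query text shown above; the `prop_query` helper is my own invention, not part of the Indexing Service, and it does not check that the property name is valid.

```python
def prop_query(name, value, regex=False):
    """Build a {prop} query string for the given property name and value.

    When regex is True, the value is wrapped in {regex} tags so that
    wildcards such as * and ? are interpreted.
    """
    if regex:
        value = "{regex} " + value + " {/regex}"
    return "{prop name=" + name + "} " + value + " {/prop}"

print(prop_query("DocLastAuthor", "Preston Gralla"))
# {prop name=DocLastAuthor} Preston Gralla {/prop}

print(prop_query("filename", "*.xl?", regex=True))
# {prop name=filename} {regex} *.xl? {/regex} {/prop}
```

You can paste the resulting string into the search box just as you would a hand-typed query.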
The Indexing Service indexes not just the text of each document, but
also all the summary information associated with each document. (To
see summary information for any document, right-click on it and
choose Properties, then the Summary tab.) In addition to searching for
properties in the summary, you can also search for the properties
found in Table 3-2, which lists the most important
properties you can use to search.
Property | Description |
---|---|
Access | The last time the document was accessed. |
All | All available properties. Works with text queries, but not numeric queries. |
AllocSize | The total disk space allocated to the document. |
Contents | The contents of the document. |
Created | The time the document was created. |
Directory | The full directory path in which the document is contained. |
DocAppName | The name of the application in which the document was created. |
DocAuthor | The author of the document. |
DocByteCount | The number of bytes in the document. |
DocCategory | The type of document. |
DocCharCount | The number of characters in the document. |
DocComments | Comments made about the document. |
DocCompany | The name of the company for which the document was written. |
DocCreatedTm | The time the document was created. |
DocHiddenCount | The number of hidden slides in a PowerPoint document. |
DocKeyWords | The key words in the document. |
DocLastAuthor | The name of the person who last edited the document. |
DocLastPrinted | The time the document was most recently printed. |
DocLineCount | The number of lines contained in the document. |
DocLastSavedTm | The time that the document was last saved. |
DocManager | The name of the manager of the document's author. |
DocNoteCount | The number of pages with notes in a PowerPoint document. |
DocPageCount | The number of pages in the document. |
DocParaCount | The number of paragraphs in the document. |
DocPartTitles | The names of document parts, such as spreadsheet names in an Excel document or slide titles in a PowerPoint slide show. |
DocRevNumber | The current version number of the document. |
DocSlideCount | The number of slides in a PowerPoint document. |
DocTemplate | The name of the document's template. |
DocTitle | The title of the document. |
DocWordCount | The number of words in the document. |
FileName | The filename of the document. |
Path | The path to the document, including the document filename. |
ShortFileName | The 8.3-format name of the document. |
Size | The size of the document, in bytes. |
Write | The date and time the document was last modified. |
3.6.3 Searching Using Operators and Expressions
The query language also lets you
use a variety of operators and expressions for both text
and numbers:
EQUALS and CONTAINS operators
When you're creating a query using text, you can use
the EQUALS
and CONTAINS
operators to narrow your search. Use the EQUALS
operator when you want the exact words matched in the exact order,
like this:
{prop name=DocTitle} EQUALS First Draft of Final Novel {/prop}
This query finds all documents with the title "First
Draft of Final Novel." The query
wouldn't find a document with the title
"Final Draft of First Novel" or
"First Draft of Novel." The
EQUALS operator works like the
phrase expression.
Use the CONTAINS operator when you want to find
any of the words in the document, in the same way you would use the
freetext expression.
Relational operators
Use relational operators when
you're searching using
numbers:
Operator | Meaning |
---|---|
= | Equal to |
!= | Not equal to |
< | Less than |
<= | Less than or equal to |
> | Greater than |
>= | Greater than or equal to |
Date and time expressions
You can use the following formats when
searching using dates and times:
yyyy/mm/dd hh:mm:ss
yyyy-mm-dd hh:mm:ss
You can also use date and time expressions in combination
with relational operators. For example, to look for files that
were created within the last two days:
{prop name=Created} >-2d {/prop}
Table 3-3 lists the date and time abbreviations
you can use.
Abbreviation | Meaning | Abbreviation | Meaning |
---|---|---|---|
Y | Year | D | Day |
Q | Quarter | H | Hour |
M | Month | N | Minute |
W | Week | S | Second |
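To see what a relative expression such as -2d stands for, the following Python sketch resolves an offset into an absolute timestamp in the yyyy/mm/dd hh:mm:ss format and builds the matching query. The helper names are hypothetical, and the sketch handles only the fixed-length units (W, D, H, N, and S); it leaves out Y, Q, and M because months, quarters, and years vary in length.

```python
from datetime import datetime, timedelta

# Fixed-length units from Table 3-3, expressed in seconds.
# Y, Q and M are deliberately not handled in this sketch.
UNIT_SECONDS = {"W": 7 * 86400, "D": 86400, "H": 3600, "N": 60, "S": 1}

def resolve_offset(offset, now):
    """Turn an offset such as '-2d' into an absolute datetime relative to now."""
    sign = -1 if offset.startswith("-") else 1
    body = offset.lstrip("+-")
    amount, unit = int(body[:-1]), body[-1].upper()
    if unit not in UNIT_SECONDS:
        raise ValueError("unit not handled in this sketch: " + unit)
    return now + sign * timedelta(seconds=amount * UNIT_SECONDS[unit])

def created_since(offset, now):
    """Build a Created query that uses the absolute form of a relative offset."""
    cutoff = resolve_offset(offset, now)
    return "{prop name=Created} > " + cutoff.strftime("%Y/%m/%d %H:%M:%S") + " {/prop}"

print(created_since("-2d", datetime(2004, 1, 10, 12, 0, 0)))
# {prop name=Created} > 2004/01/08 12:00:00 {/prop}
```

In everyday use the relative form is shorter to type; the absolute form is handy when you want the same cutoff every time you run the search.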
Boolean operators
The query language also uses the
Boolean operators detailed in Table 3-4.
Boolean Operator | Long Form | Short Form |
---|---|---|
AND | AND | & |
OR | OR | \| |
Unary NOT | NOT | ! |
Binary NOT | AND NOT | &! |
Use the unary NOT when
you're searching using numbers rather than text. For
example, to search for all documents that do not have seven
PowerPoint slides, use the query:
{prop name=DocSlideCount} NOT = 7 {/prop}
Use the binary NOT to narrow
a search, by combining two properties in a query. For example, to
search for all documents with an author of "Preston
Gralla" that are not titled
"Chapter 10", use this query (on
one line):
{prop name=DocAuthor} Preston Gralla {/prop} AND NOT {prop name=DocTitle} Chapter 10 {/prop}
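Long queries that join several clauses are easy to mistype, so here is a small Python sketch that joins two clauses with a binary operator from Table 3-4, using either the word or the symbol for that operator. The `combine` function is a hypothetical helper of my own; it only assembles text and does not validate the clauses themselves.

```python
# Word and symbol spellings of the binary operators listed in Table 3-4.
SYMBOLS = {"AND": "&", "OR": "|", "AND NOT": "&!"}

def combine(left, operator, right, symbols=False):
    """Join two query clauses with AND, OR, or AND NOT.

    Set symbols=True to emit &, | or &! instead of the words.
    """
    op = operator.upper()
    if op not in SYMBOLS:
        raise ValueError("unknown binary operator: " + operator)
    token = SYMBOLS[op] if symbols else op
    return left + " " + token + " " + right

author = "{prop name=DocAuthor} Preston Gralla {/prop}"
title = "{prop name=DocTitle} Chapter 10 {/prop}"

print(combine(author, "AND NOT", title))
# {prop name=DocAuthor} Preston Gralla {/prop} AND NOT {prop name=DocTitle} Chapter 10 {/prop}

print(combine(author, "and not", title, symbols=True))
# {prop name=DocAuthor} Preston Gralla {/prop} &! {prop name=DocTitle} Chapter 10 {/prop}
```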
Alternative verb forms
You can use the double-asterisk wildcard to search for alternative
forms of verbs in a document. For
example, the query:
{prop name=Contents} run** {/prop}
returns all documents with the word
"ran" or the word
"run."
3.6.4 Ranking the Order of Search Results
If you're doing a
search likely to return many results, you'll want
the most relevant results to appear at the top of the list, and
the least relevant to appear at the bottom. You can determine the
relative importance of each term in your search and have the results
weighted by that importance, by using the weight
tag. Note that it does not get a closing tag:
{weight value = n} query
The value parameter ranges between 0.000 and
1.000.
If you are searching for the three terms,
"fire,"
"ice," and
"slush," and you want to weight
"fire" most heavily,
"ice" second most heavily, and
"slush" least heavily, you can use
this syntax (on a single line) in your query:
{weight value=1.000}fire AND {weight value=.500}ice AND {weight value=.250}slush
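If you weight terms often, a short Python sketch like the one below can build the query and catch values outside the 0.000 to 1.000 range before you run the search. The `weighted_query` helper is hypothetical; it simply reproduces the syntax shown above, writing each weight with three decimal places.

```python
def weighted_query(terms, operator="AND"):
    """Build a weighted query from (term, weight) pairs.

    Each weight must lie between 0.0 and 1.0, the range the weight tag accepts.
    """
    parts = []
    for term, weight in terms:
        if not 0.0 <= weight <= 1.0:
            raise ValueError("weight out of range for " + term + ": " + str(weight))
        # The weight tag has no closing tag; the term follows it directly.
        parts.append("{weight value=%.3f}%s" % (weight, term))
    return (" " + operator + " ").join(parts)

print(weighted_query([("fire", 1.0), ("ice", 0.5), ("slush", 0.25)]))
# {weight value=1.000}fire AND {weight value=0.500}ice AND {weight value=0.250}slush
```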
3.6.5 Editing the Indexing Service's "Noise" Filter
You can force the
Indexing Service to
ignore more words when you search, or you can have it ignore fewer
words, simply by editing a text file. In a text file called
noise.eng, usually found in
C:\Windows\System32\, you can find the list of
words that the Indexing Service ignores. (The extension
.eng is for English. Noise filters for other
languages can be found as well: for example,
noise.deu for German,
noise.fra for French, and so on.)
The noise.eng file contains common articles,
prepositions, pronouns, conjunctions, various forms of common verbs,
and similar words. Open it in Notepad or another text editor, add
words that you want it to ignore, and delete words that you
don't want it to ignore. Then save the file, and the
Indexing Service will follow your new rules.