9.4 The Design
Our application is beginning to
take shape. Figure 9-2 shows the entire design of the Simple Spider.
It has layers that present a model, the service API, and two public
interfaces. There is not yet a controller layer to separate the
interfaces and logic. We'll integrate a controller
in the next chapter.
Figure 9-2. The Simple Spider design
We prefer to encapsulate the configuration into its own service to
decouple the rest of the application from its details. This way, the
application can switch configuration systems easily later without
much editing of the code. For this version of the application, the
Configuration service will consist of two classes,
ConfigBean and IndexPathBean.
ConfigBean returns configuration settings for the application as a
whole, and IndexPathBean returns the current path to
the index files. The two are
separate classes, as finding the path to the index is a more complex
task than simply reading a configuration file (see the implementation
details below). The configuration settings are stored in property
files, accessed through java.util.Properties.

The crawler/indexer service is based on two classes:
IndexLinks, which controls the configuration of
the service in addition to managing the individual pages in the
document domain, and IndexLink, a class modeling a
single page in the search domain and allowing us to parse it looking
for more links to other pages. We will use Lucene (http://jakarta.apache.org/lucene) as our
indexer (and searcher) because it is fast, open source, and widely
adopted in the industry today.

The search service is provided through
two more classes, QueryBean and
HitBean. The former models the search input/output
mechanisms, while the latter represents a single result from a larger
result set.

Sitting on top of the collection of services are the
two specified user interfaces, the console version
(ConsoleSearch) and a web service
(SearchImpl and its WSDL file).
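
To make the idea behind the Configuration service more concrete, here is a minimal sketch of a class that hides java.util.Properties behind a small API. It is not the ConfigBean implementation: the class name ConfigSketch, its methods, and the property keys are all hypothetical and chosen only for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

/**
 * Hypothetical sketch of a configuration wrapper (not the book's ConfigBean).
 * Callers ask for settings by key and never touch java.util.Properties,
 * so the backing store could be replaced later without editing them.
 */
final class ConfigSketch {
    private final Properties props;

    /** Wraps a defensive copy of an already-loaded set of properties. */
    ConfigSketch(Properties source) {
        this.props = new Properties();
        this.props.putAll(source);
    }

    /** Builds a ConfigSketch from text in the standard properties format. */
    static ConfigSketch fromText(String text) {
        Properties loaded = new Properties();
        try {
            loaded.load(new StringReader(text));
        } catch (IOException e) {
            // Reading from an in-memory string should not fail; rethrow unchecked.
            throw new UncheckedIOException(e);
        }
        return new ConfigSketch(loaded);
    }

    /** Returns the value for key, or fallback when the key is absent. */
    String get(String key, String fallback) {
        return props.getProperty(key, fallback);
    }

    public static void main(String[] args) {
        // "max.links" and "index.dir" are made-up keys for this example.
        ConfigSketch config = ConfigSketch.fromText("max.links=50\n");
        System.out.println(config.get("max.links", "10"));
        System.out.println(config.get("index.dir", "(not configured)"));
    }
}
```

Because the rest of the application would depend only on get, switching from property files to another configuration system would mean changing this one class rather than every caller.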