Archive Meta Data
One of the most important answers the analyst must provide when talking about data storage is what values must be indexed for further searches. Well, we identified a set of indexed values useful for a complete search engine:
meta/@id
creator/@userID
status
category
publicationTime
subject/@type
subject/@name
type/@type
With this information required for each story (meta), we can design the database diagram as shown in Figure 16-1.Figure 16-1 shows a normalized form of database model. The main table is a meta table and contains the following fields definition:
CREATE TABLE [dbo].[Meta] (
[MetaID] [varchar] (10) NOT NULL,
[ContributorID] [int] NOT NULL ,
[StatusID] [int] NOT NULL ,
[CategoryID] [int] NOT NULL ,
[SubjectTypeID] [int] NOT NULL ,
[SubjectNameID] [int] NOT NULL ,
[PublicationTime] [datetime] NOT NULL ,
[ResourceTypeID] [int] NOT NULL ,
[WholeStory] [text] NOT NULL
)
The MetaID field is taken from the XML document and includes the desk ID. The contributor ID refers to a lookup table that contains all information about the contributors:
CREATE TABLE [dbo].[Contributors] (
[ContributorID] [int] NOT NULL ,
[FirstName] [varchar] (50) NOT NULL ,
[LastName] [varchar] (50) NOT NULL ,
[email] [varchar] (100) NOT NULL,
// all other contributor fields here
)
Figure 16-1: Database model.
When the user searches for news about a given contributor, they will search it from the Contributors table. The second lookup table is the Status one:
CREATE TABLE [dbo].[Status] (
[StatusID] [int] NOT NULL ,
[Status] [varchar] (10) NOT NULL
)
This second table contains all available statuses of the story: Draft, In Review, Filed, Embargoed, and Spiked. The third table is the resource categories:
CREATE TABLE [dbo].[Categories] (
[CategoryID] [int] NOT NULL ,
[Category] [varchar] (20) NULL
)
This table contains all possible categories of the resource, such as Analysis, Bulletin, Cartoon, Column, Biography, and so on. SubjectTypes and SubjectNames are two related lookup tables because for each subject type there is a list of subject names available, as stated in the file topic.xml shown in Chapter 13.Then the table structures are defined as follows:
CREATE TABLE [dbo].[SubjectTypes] (
[SubjectTypeID] [int] NOT NULL ,
[SubjectType] [varchar] (50) NOT NULL
)
CREATE TABLE [dbo].[SubjectNames] (
[SubjectNameID] [int] NOT NULL ,
[SubjectTypeID] [int] NOT NULL ,
[SubjectName] [varchar] (50) NULL
)
The last lookup table is the resource type that contains the story types such as Article, Illustration, Photo, Sound, and Video. Its structure is defined as follows:
CREATE TABLE [dbo].[ResourceTypes] (
[ResourceTypeID] [int] NOT NULL ,
[ResourceType] [varchar] (50) NOT NULL
)