Learning Visually with Examples [Electronic resources] نسخه متنی

اینجــــا یک کتابخانه دیجیتالی است

با بیش از 100000 منبع الکترونیکی رایگان به زبان فارسی ، عربی و انگلیسی

Learning Visually with Examples [Electronic resources] - نسخه متنی

Raul F. Chong, Clara Liu, Sylvia F. Qi, Dwaine R. Snow

| نمايش فراداده ، افزودن یک نقد و بررسی
افزودن به کتابخانه شخصی
ارسال به دوستان
جستجو در متن کتاب
بیشتر
تنظیمات قلم

فونت

اندازه قلم

+ - پیش فرض

حالت نمایش

روز نیمروز شب
جستجو در لغت نامه
بیشتر
لیست موضوعات
توضیحات
افزودن یادداشت جدید










7.10. Multidimensional Clustering Tables and Block Indexes


Multidimensional clustering (MDC), as its name implies, allows for clustering of the physical data pages in multiple dimensions. For example, using the

sales table illustrated previously in Figure 7.15, you can cluster the data based on two dimensions:

sales_person and

year columns. This method of clustering has several benefits over clustering indexes.

  • With MDC, the data pages are physically clustered by several dimensions simultaneously. With clustering indexes, only one cluster index is allowed per table; the other indexes in the table are unclustered.

  • MDC guarantees clustering over time even though frequent

    INSERT operations are performed. Thus, less maintenance and overhead is required. With clustering indexes, this is not the case. As data pages are filled up, a clustered

    INSERT operation may encounter that the row to insert does not fit the right page to maintain the clustering of the data. In such cases, the row may end up on a page that is not close to the other related rows. Clustering indexes require an administrator to perform periodic table reorganizations to recluster the table and set up pages with additional free space to accommodate future clustered

    INSERT requests.

  • MDC uses

    block indexes , indexes that point to an entire block of pages. These are smaller indexes than regular and clustering indexes, which point to a single record.


NOTE

MDC is primarily intended for data warehousing environments; however, it can also work in online transaction processing (OLTP) environments.

7.10.1. MDC Tables


Let's redefine our

sales table as an MDC table, using dimensions

sales_person and

year .


CREATE TABLE sales (

sales_person VARCHAR(30) NOT NULL,

region CHAR(5) NOT NULL,

number_of_sales INT NOT NULL,

year INT

)

ORGANIZE BY DIMENSIONS (sales_person, year)

DB2 places records that have the same s

ales_person and

year values in physical locations that are close together as they are inserted into the table. These locations are called blocks. A block can be treated as an extent. The size of an extent can be defined in the

CREATE TABLESPACE statement. The minimum size for a block is two pages, like extents.

Figure 7.18 illustrates the contents of the

sales table using the new MDC definition. For simplicity, in this example a block can hold only two records.

Figure 7.18. The

sales table defined as an MDC table


cells. Each cell represents a unique combination of the dimension values. If there are X different values for

sales_person , and Y different values for y

ear , there are X*Y number of cells. In slice consists of all the cells that belong to a specific value of a dimension. Figure 7.18 highlights two out of six slices, one for dimension

year with a value of 2000 and the other for dimension

sales_person with a value of Mary.

7.10.2. Block Indexes


Block indexes are pointers to a block, not a single record. A block index points to the beginning of each block, which has a unique block ID (BID). MDC tables use only block indexes. Figure 7.19 shows a comparison between a regular index and a block index.

Figure 7.19. A regular index versus a block index

A block index has the following advantages over a regular index.

  • Block indexes are significantly smaller than regular indexes because they point to a block rather than a record. The reduced size makes index scans much faster.

  • Less maintenance overhead is associated with block indexes. They only need to be updated when adding the first record to a block and removing the last record from a block.

  • Prefetching is done in blocks, thus the amount of I/O is reduced.


An MDC table defined with even a single dimension can benefit from block indexes and can be a viable alternative to a regular table using a clustering index.

When an MDC table is created, a dimension block index is created for each specified dimension. For our

sales table, two dimension block indexes are created, one for the

sales_person dimension and one for the

year dimension, as illustrated in Figure 7.20.

Figure 7.20. Block indexes for

sales_person and

year


A query requesting records that have

sales_person = John can use the

sales_person block index to quickly access all three blocks (block 0, 1, and 2) that satisfy this criteria. Another query can use the

year block index to independently access all blocks that have

year = 2002 (blocks 2, 4, 5, and 8).

In addition to the dimension block indexes, a composite block index is also created during MDC table creation. A composite block index contains all columns across all dimensions and is used to maintain the clustering of data over

INSERT and

UPDATE activity. If a single dimension block index already contains all the dimension key columns, a composite block index is not created.

7.10.3. The Block Map


A block map is an array containing an entry for each block of an MDC table. The entry indicates whether or not a block is in use. Each block has a unique identifier (BID) and also an

IN_USE status bit. When a

DELETE operation removes the last record in a block, DB2 frees the block by changing its

IN_USE status bit and removing its BID from all block indexes. When new records are inserted and they can no longer fit into existing blocks, DB2 first scans for free blocks, looking for ones without the

IN_USE bit set. If a free block is found, DB2 reuses it, updates its

IN_USE bit, and adds its BID to block indexes.

Reusing free blocks greatly reduces fragmentation and in turn minimizes the need to reorganize the MDC table even though pages within the blocks may be fragmented.

7.10.4. Choosing Dimensions for MDC Tables


Choosing the right dimensions for an MDC table is crucial for obtaining the maximum advantages MDC can provide. You should consider the following:

  • Choose columns with the lowest cardinality.

    One of the advantages of using block indexes is that they point to a block rather than a record; therefore, there are fewer pointers to traverse. If each block contains only one record, the block index essentially becomes a regular index. You should try to minimize the number of blocks by increasing the number of records they can contain. You can achieve this by choosing columns with the lowest cardinality, that is, the lowest number of distinct values. For example, a column like

    region , with possible values of North, South, East, and West, is a good choice. A column like

    employee_id , which uniquely identifies each employee of a company that has 100,000 employees, is definitely a bad choice.

  • Choose the correct block size (extent size).

    MDC tables allocate space in blocks. The entire block is allocated even if only one record is inserted. For example, if your block can hold 100 pages, and on average only 10 records are inserted per block (assuming only one record can fit in a page), then 90% of the space is wasted. Thus, make sure you choose the correct block size.

  • Choose the right number of dimensions.

    The higher the number of dimensions, the more possible combinations you can have, and therefore the higher the number of possible cells. If there are many cells, each cell will likely contain only a few records, and if that is the case, the block size needs to be set to a small number.


NOTE

The Design Advisor tool can make recommendations on what dimensions to choose for a given table.


/ 312