Intelligent Agents for Data Mining and Information Retrieval [Electronic resources]

Masoud Mohammadian

ASSOCIATION RULES DESCRIPTION

Given a transaction database DB, I = {I₁, I₂, …, I_m} is the set of the m different items in DB. Each transaction T in DB is a set of items (i.e., an itemset), so T ⊆ I.

Definition 1

Itemset P is defined as A₁ ∩ A₂ ∩ … ∩ A_k, A_i ∈ I (i = 1, 2, …, k), and P containing k items is called a k-itemset.

Definition 2

The support of itemset P is defined as

σ(P/DB) = (the support count of P in DB) / (the total number of transactions in DB) = |P/DB| / |DB|,

where |P/DB| is the number of transactions in DB that contain P.
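Definition 2 can be illustrated with a minimal Python sketch. It assumes that each transaction is stored as a frozenset of item names; the function name support and the toy database DB are hypothetical and are not part of the original text.

```python
def support(itemset, db):
    """Return sigma(P/DB): the fraction of transactions in db containing itemset."""
    itemset = frozenset(itemset)
    if not db:
        return 0.0
    # A transaction contains P when P is a subset of the transaction.
    matching = sum(1 for transaction in db if itemset <= transaction)
    return matching / len(db)


# Hypothetical toy database with four transactions (illustration only).
DB = [
    frozenset({"bread", "milk"}),
    frozenset({"bread", "butter"}),
    frozenset({"bread", "milk", "butter"}),
    frozenset({"milk"}),
]

print(support({"bread", "milk"}, DB))  # 2 of the 4 transactions contain both items
```

With this toy data, two of the four transactions contain both bread and milk, so the support of that 2-itemset is 0.5.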

Definition 3

If A and B are two itemsets, and A ∩ B = Φ, then the confidence of association rule A ↠ B in DB is defined as

ψ(A ↠ B/DB) = σ(A ∩ B/DB) / σ(A/DB).
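As a hedged illustration of Definition 3, the sketch below computes the confidence of a rule from two support values. It assumes transactions are frozensets of item names, and it reads σ(A ∩ B/DB) as the support of the itemset that contains all items of A and all items of B, which is the set union a | b in Python. The names support, confidence, and DB are hypothetical.

```python
def support(itemset, db):
    """Fraction of transactions in db that contain every item of itemset."""
    itemset = frozenset(itemset)
    if not db:
        return 0.0
    return sum(1 for transaction in db if itemset <= transaction) / len(db)


def confidence(a, b, db):
    """Confidence of the rule A -> B: support of A and B together over support of A."""
    a, b = frozenset(a), frozenset(b)
    if a & b:
        raise ValueError("A and B must not share items")
    support_a = support(a, db)
    if support_a == 0:
        return 0.0  # assumption: a rule whose antecedent never occurs gets confidence 0
    return support(a | b, db) / support_a


# Hypothetical toy database (illustration only).
DB = [
    frozenset({"bread", "milk"}),
    frozenset({"bread", "butter"}),
    frozenset({"bread", "milk", "butter"}),
    frozenset({"milk"}),
]

print(confidence({"butter"}, {"bread"}, DB))  # every transaction with butter also has bread
```

In this toy data, butter appears in two transactions and both also contain bread, so the confidence of butter ↠ bread is 1.0.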

Definition 4

Let the minimum support be σ_min. Then the set of frequent k-itemsets consists of the k-itemsets P with σ(P/DB) ≥ σ_min, and the set of non-frequent k-itemsets consists of the k-itemsets P with σ(P/DB) < σ_min.

To mine effective association rules in DB, the minimum support σ_min and the minimum confidence ψ_min must first be defined. Mining association rules means finding all association rules in DB that satisfy σ(A ∩ B/DB) ≥ σ_min and ψ(A ↠ B/DB) ≥ ψ_min. Because ψ(A ↠ B/DB) can be computed from σ(A ∩ B/DB) and σ(A/DB), the key to mining association rule A ↠ B is to generate the sets of frequent k-itemsets. Current research therefore focuses on generating the sets of frequent k-itemsets (see Agrawal & Srikant, 1994; Feng et al., 1998; Zhang et al., 2000), which is the key to improving mining efficiency. We also focus on pattern matching, which is the key to generating frequent k-itemsets. The corresponding Apriori algorithm is as follows:

C₁ = {candidate 1-itemsets}
L₁ = {c ∈ C₁ | c.count ≥ σ_min}
For (k = 2; L_{k−1} ≠ Φ; k++)
    C_k = apriori-gen(L_{k−1})
    Count_support(C_k)
    L_k = {c ∈ C_k | c.count ≥ σ_min}
Resultset = ∪ L_k

Here, C_k is the set of candidate k-itemsets, L_k is the set of frequent k-itemsets, and Count_support(C_k) counts the support of each candidate k-itemset in C_k. apriori-gen(L_{k−1}) generates C_k in two steps. First, join L_{k−1} with itself to form k-itemsets. This is called the join step:

insert into C_k
select P.A₁, P.A₂, …, P.A_{k−1}, Q.A_{k−1}
from L_{k−1} P inner join L_{k−1} Q
where P.A₁ = Q.A₁, P.A₂ = Q.A₂, …, P.A_{k−2} = Q.A_{k−2}, P.A_{k−1} < Q.A_{k−1}

Then, delete every itemset in C_k that has a (k−1)-subitemset not included in L_{k−1}. This is called the prune step:

For all itemsets c ∈ C_k
    For all (k−1)-subitemsets s of c
        If (s ∉ L_{k−1}) then Delete c from C_k

The itemsets that remain form the set of candidate k-itemsets C_k.
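The join and prune steps can be sketched in Python as follows. This is a minimal illustration rather than the authors' implementation: it assumes each itemset is stored as a tuple of items in sorted order, so that the condition "the first k−2 items are equal and the last item of P is smaller than the last item of Q" can be checked directly. The function name apriori_gen and the sample set L2 are hypothetical.

```python
from itertools import combinations


def apriori_gen(prev_frequent):
    """Build candidate k-itemsets from the frequent (k-1)-itemsets.

    prev_frequent is a set of sorted tuples, all of the same length k-1.
    """
    prev = set(prev_frequent)
    candidates = set()

    # Join step: pair itemsets that agree on their first k-2 items.
    for p in prev:
        for q in prev:
            if p[:-1] == q[:-1] and p[-1] < q[-1]:
                candidates.add(p + (q[-1],))

    # Prune step: keep a candidate only if all its (k-1)-subsets are frequent.
    return {
        c
        for c in candidates
        if all(s in prev for s in combinations(c, len(c) - 1))
    }


# Hypothetical frequent 2-itemsets (illustration only).
L2 = {("a", "b"), ("a", "c"), ("a", "d"), ("b", "c")}
C3 = apriori_gen(L2)
print(C3)
```

For this sample, the join step produces (a, b, c), (a, b, d), and (a, c, d). The prune step removes the last two because (b, d) and (c, d) are not in L2, leaving only (a, b, c).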

During the mining of association rules, pattern matching mainly occurs in Count_support(C_k), which counts the support of the candidate k-itemsets. The count is obtained by matching the k-itemsets that can be formed from the items of each transaction in the transaction data set against the set of candidate k-itemsets C_k (k = 1, 2, …). Thus, the pattern matching in association rule mining is the match between any k-itemset drawn from a transaction that contains at least k items and any itemset in the set of candidate k-itemsets.
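The pattern match described above can be sketched as follows. This is one straightforward way to realize Count_support under stated assumptions, not necessarily the matching scheme the authors go on to develop: transactions are sets of items, candidates are sorted tuples, and every k-subset of each sufficiently large transaction is looked up in the candidate set. The names count_support, DB, and C2 are hypothetical.

```python
from itertools import combinations


def count_support(candidates, db, k):
    """Count, for each candidate k-itemset, the transactions that contain it."""
    counts = {c: 0 for c in candidates}
    for transaction in db:
        if len(transaction) < k:
            continue  # a transaction with fewer than k items holds no k-itemset
        # Enumerate the k-itemsets of this transaction and match each one
        # against the candidate set.
        for subset in combinations(sorted(transaction), k):
            if subset in counts:
                counts[subset] += 1
    return counts


# Hypothetical toy database and candidate 2-itemsets (illustration only).
DB = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
C2 = {("bread", "butter"), ("bread", "milk"), ("butter", "milk")}

counts = count_support(C2, DB, 2)
print(counts)
```

Enumerating every k-subset of a transaction grows quickly with transaction length, which is one reason this matching step tends to dominate the cost of the mining process.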