l xmlns="http://www.w3.org/1999/l">
Given a transaction database DB, I={I1,I2,
… ,Im}is a set of itemsets with m different itemsets in DB. Each transaction T in DB is a set of items (i.e., itemsets), so T
⊆ I.
Itemset P is defined as A1
∩ A2
∩…∩ Ak, Ai
∈ I(i=1,2,
… ,k), and P containing k items is called k-itemset.
The support of itemset P is defined as
σ (P/DB)=the support account containing P in DB/the total transaction amount in DB=|A/DB|/|DB|.
If A and B are two itemsets, and A
∩ B=
Φ , then the confidence of association rule A
↠ B in DB is defined as
ψ (A
↠ B /DB)=
σ (A
∩ B /DB)/
σ ( A /DB).
Let the minimum support be
σ min. Then the set of k frequent itemsets and the set of k non-frequent itemsets are defined separately as:
To mine efficacious association rules in DB, minimum support
σ min and minimum confidence
ψ min must first be defined. Mining association rules find all of the association rules satisfying
σ (A
∩ B /DB)
≥
σ min and
ψ (A
↠ B /DB)
≥
ψ min in DB. Owing to the fact that the result of
ψ ( A
↠ B /DB) can be gotten from the value of
σ (A
∩ B /DB) and
σ (A /DB), the key to mining association rule A
↠ B is to generate the set of k frequent itemsets. Therefore, the substantive study at present focuses on generating the set of k frequent itemsets (see Agrawal & Srikant, 1994; Feng et al., 1998; Zhang et al., 2000), which is the key to heightening the mining efficiency. We also focus on pattern match, which is the key to generating k frequent itemsets. The corresponding Apriori algorithm is as follows:
L1={c
∈ C1|c.count
≥σ min }
For (k=2; Lk − 1
≠
Φ ; k++)
Ck=apriori-gen(Lk − 1
Count_support(Ck)
Lk ={c
∈ C1|c.counte
≥σ min}
Resultset=
∪ Lk
Next
Here, Ck is candidate k-itemsets, Lk is k-itemsets, Count_support(Ck) is to count the support count of candidate k-itemsets, Ck, apriori-gen(Lk − 1 − 1
insert into Ck
select P.A1, P.A2,
… , P.Ak − 1 − 1
from Lk − 1 − 1
where P.A1= Q.A1, P.A2= Q.A2,
… , P.Ak − 2 − 2 − 1 − 1
Then, delete any (k
− 1)-subitemsets of Ck which not be included in Lk − 1
For all itemsets c∈ Ck For all k-1_subitemsets s of c If (s
∉ Lk-1), then Delete c from Ck and get the candidate k-itemsets Ck.
During the mining of association rules, pattern match mainly occurs in Count_support(Ck), which is the account of the support count of candidate k-itemsets. The resulting account is a match between the k-itemsets constructed by all the k items, compounded by each transaction in transaction data set and the set of candidate k-itemsets Ck(k=1,2,
… ). From the above, we know the pattern match of mining association rules is the match between any k-itemsets from each transaction of transaction data set whose item number is not less than k and any one itemset in the set of candidate k-itemsets.