Intelligent Agents for Data Mining and Information Retrieval [Electronic resources]

Masoud Mohammadian

نسخه متنی -صفحه : 171/ 131
نمايش فراداده

l xmlns="http://www.w3.org/1999/l">

  • ASSOCIATION RULES DESCRIPTION

    Given a transaction database DB, I={I1,I2,

    … ,Im}is a set of itemsets with m different itemsets in DB. Each transaction T in DB is a set of items (i.e., itemsets), so T

    ⊆ I.

    Definition 1

    Itemset P is defined as A1

    ∩ A2

    ∩…∩ Ak, Ai

    ∈ I(i=1,2,

    … ,k), and P containing k items is called k-itemset.

    Definition 2

    The support of itemset P is defined as

    σ (P/DB)=the support account containing P in DB/the total transaction amount in DB=|A/DB|/|DB|.

    Definition 3

    If A and B are two itemsets, and A

    ∩ B=

    Φ , then the confidence of association rule A

    ↠ B in DB is defined as

    ψ (A

    ↠ B /DB)=

    σ (A

    ∩ B /DB)/

    σ ( A /DB).

    Definition 4

    Let the minimum support be

    σ min. Then the set of k frequent itemsets and the set of k non-frequent itemsets are defined separately as:

    To mine efficacious association rules in DB, minimum support

    σ min and minimum confidence

    ψ min must first be defined. Mining association rules find all of the association rules satisfying

    σ (A

    ∩ B /DB)

    σ min and

    ψ (A

    ↠ B /DB)

    ψ min in DB. Owing to the fact that the result of

    ψ ( A

    ↠ B /DB) can be gotten from the value of

    σ (A

    ∩ B /DB) and

    σ (A /DB), the key to mining association rule A

    ↠ B is to generate the set of k frequent itemsets. Therefore, the substantive study at present focuses on generating the set of k frequent itemsets (see Agrawal & Srikant, 1994; Feng et al., 1998; Zhang et al., 2000), which is the key to heightening the mining efficiency. We also focus on pattern match, which is the key to generating k frequent itemsets. The corresponding Apriori algorithm is as follows:

    C1={candidate 1-itemsets}

  • L1={c

    ∈ C1|c.count

    ≥σ min }

  • For (k=2; Lk

    − 1

    Φ ; k++)

  • Ck=apriori-gen(Lk

    − 1)

  • Count_support(Ck)

  • Lk ={c

    ∈ C1|c.counte

    ≥σ min}

  • Resultset=

    ∪ Lk

  • Next

  • Here, Ck is candidate k-itemsets, Lk is k-itemsets, Count_support(Ck) is to count the support count of candidate k-itemsets, Ck, apriori-gen(Lk

    − 1) is to generate Ck, which includes two steps. First, join Lk

    − 1 into k-itemsets. This is called the join step:

    • insert into Ck

      select P.A1, P.A2,

      … , P.Ak

      − 1,Q. Ak

      − 1

      from Lk

      − 1 P inner join Lk

      − 1 Q

      where P.A1= Q.A1, P.A2= Q.A2,

      … , P.Ak

      − 2= Q.Ak

      − 2, P.Ak

      − 1< Q.A k

      − 1

    Then, delete any (k

    − 1)-subitemsets of Ck which not be included in Lk

    − 1. This is called the prune step:

    For all itemsets c

    ∈ Ck For all k-1_subitemsets s of c If (s

    ∉ Lk-1), then Delete c from Ck and get the candidate k-itemsets Ck.

    During the mining of association rules, pattern match mainly occurs in Count_support(Ck), which is the account of the support count of candidate k-itemsets. The resulting account is a match between the k-itemsets constructed by all the k items, compounded by each transaction in transaction data set and the set of candidate k-itemsets Ck(k=1,2,

    … ). From the above, we know the pattern match of mining association rules is the match between any k-itemsets from each transaction of transaction data set whose item number is not less than k and any one itemset in the set of candidate k-itemsets.