apriori¶
Computes the frequent itemsets in a transaction set.
Synopsis¶
apriori (x, maxNumProducts)
Required Arguments¶
float x[[]] (Input)
    Array of size n × 2, each row of which represents a transaction id and item id pair.
int maxNumProducts (Input)
    Maximum number of unique items or products that may be present in the transactions. maxNumProducts must be greater than or equal to the number of items in x.
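As an illustration of the expected layout of x, the following minimal sketch flattens a set of baskets into transaction id and item id pairs. The names baskets, pairs and num_items are hypothetical and are not part of the library; the three baskets are the first three transactions of the example data further down.

```python
import numpy as np

# Hypothetical baskets: transaction id -> item ids bought in that transaction.
baskets = {1: [3, 2, 1], 2: [1, 2, 4, 5], 3: [3]}

# One row per (transaction id, item id) pair, giving an n x 2 array.
pairs = [(tid, item) for tid, items in baskets.items() for item in items]
x = np.array(pairs, dtype=float)

# Number of distinct item ids; maxNumProducts must be at least this large.
num_items = len(np.unique(x[:, 1]))
```

Here x has eight rows, and any maxNumProducts of at least num_items satisfies the requirement stated above.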
Return Value¶
A data structure containing the frequent itemsets in the transaction set x. If no value can be computed, then None is returned. To release this space, use freeAprioriItemsets.
Optional Arguments¶
maxSetSize, int (Input)
    Maximum size of an itemset. Only frequent itemsets with maxSetSize or fewer items are considered in the analysis.
    Default: maxSetSize = 5.
minSupport, float (Input)
    Minimum percentage of transactions in which an item or itemset must be present to be considered frequent. minSupport must be in the interval [0,1].
    Default: minSupport = 0.1.
associationRules, float confidence, float lift, structure assocRules (Input/Output)
    Computes the strong association rules among itemsets.
    float confidence (Input)
        The minimum confidence used to determine the strong association rules. confidence must be in the interval [0,1]. lift is the other criterion that determines whether an association is “strong.” If either criterion, confidence or lift, is exceeded, the association rule is considered “strong.”
    float lift (Input)
        The minimum lift used to determine the strong association rules. lift must be non-negative. confidence is the other criterion that determines whether an association is “strong.” If either criterion, confidence or lift, is exceeded, the association rule is considered “strong.”
    structure assocRules (Output)
        A data structure containing the strong association rules among the itemsets. If no value can be computed, then None is returned. To release this space, use freeAssociationRules.
Description¶
The function apriori
performs the Apriori algorithm for association rule
discovery. Association rules are statements of the form, “if X, then Y”,
given with some measure of confidence. The main application for association
rule discovery is market basket analysis, where X and Y are products or
groups of products, and the occurrences are individual transactions, or
“market baskets.” The results help sellers learn relationships between the
different products they sell, supporting better marketing decisions. There
are other applications for association rule discovery, such as the problem
areas of text mining and bioinformatics. The Apriori algorithm
(Agrawal and Srikant, 1994) is one of the
most popular algorithms for association rule discovery in transactional
datasets.
For distributed data or data larger than physical memory, see aggrApriori.
In the first and most critical stage, the Apriori algorithm mines the transactions for frequent itemsets. An itemset is frequent if it appears in at least a minimum number of transactions. The number of transactions containing an itemset is known as its “support”, and the minimum support (as a percentage of transactions) is a control parameter in the algorithm. The algorithm begins by finding the frequent single items. Then the algorithm generates all two-item sets from the frequent single items and determines which among them are frequent. From the collection of frequent pairs, Apriori forms candidate three-item sets and determines which are frequent, and so on. The algorithm stops when either a maximum itemset size is reached, or when none of the candidate itemsets are frequent. In this way, the Apriori algorithm exploits the apriori-property: for an itemset to be frequent, all of its proper subsets must also be frequent. At each step the problem is reduced to only the frequent subsets.
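The level-wise search described above can be sketched in plain Python. This is a simplified illustration under stated assumptions, not the implementation used by apriori: the function name frequent_itemsets, its arguments, and the five toy transactions are all hypothetical, and an itemset is counted as frequent when its support is at least min_support_count.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support_count, max_set_size):
    """Level-wise search; returns {sorted item tuple: support count}."""
    baskets = [frozenset(t) for t in transactions]

    def support(candidate):
        # Number of transactions that contain every item of the candidate.
        return sum(1 for b in baskets if candidate <= b)

    result = {}
    # Level 1 candidates: every single item seen in the data.
    level = [frozenset([i]) for i in sorted({i for b in baskets for i in b})]
    size = 1
    while level and size <= max_set_size:
        # Keep only the candidates that meet the minimum support.
        frequent = {c: support(c) for c in level}
        frequent = {c: s for c, s in frequent.items() if s >= min_support_count}
        result.update(frequent)
        # Next level: unions of frequent sets that are one item larger,
        # pruned so that every subset of the current size is frequent.
        size += 1
        candidates = {a | b for a in frequent for b in frequent
                      if len(a | b) == size}
        level = [c for c in candidates
                 if all(frozenset(s) in frequent
                        for s in combinations(c, size - 1))]
    return {tuple(sorted(c)): s for c, s in result.items()}


# Hypothetical toy data: five transactions over items 1, 2 and 3.
toy = [{1, 2, 3}, {1, 2}, {1, 3}, {2, 3}, {1, 2, 3}]
result = frequent_itemsets(toy, min_support_count=3, max_set_size=3)
```

In this toy data every single item and every pair is frequent, while the triple {1, 2, 3} appears in only two transactions and is dropped, so the search stops at size three.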
In the second stage, the algorithm generates association rules. These are of the form \(X\Rightarrow Y\) (read, “if X, then Y”), where X and Y are disjoint frequent itemsets. The confidence measure associated with the rule is defined as the proportion of transactions containing X that also contain Y. Denote the support of X (the number of transactions containing X) as \(S_X\), and the support of \(Z=X\cup Y\) as \(S_Z\). The confidence of the rule \(X\Rightarrow Y\) is the ratio \(S_Z/S_X\). Note that the confidence ratio is the conditional probability
\[P\left[Y|X\right]=\frac{P\left[XY\right]}{P\left[X\right]}\]
where \(P\left[XY\right]\) denotes the probability of both X and Y. The probability of an itemset X is estimated by \(S_X/N\), where N is the total number of transactions.
Another measure of the strength of the association is known as the lift, which is the ratio \((S_ZN)/(S_X S_Y)\), where \(S_Y\) is the support of Y. Lift values close to 1.0 suggest the sets are independent, and that they occur together by chance. Large lift values indicate a strong association. A minimum confidence threshold and a minimum lift threshold can be specified.
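The two measures can be checked by hand. The sketch below evaluates them for the rule X = {1}, Y = {3}, using the supports reported in the Output section (27, 33 and 22 out of 50 transactions). The helper names confidence, lift and is_strong are hypothetical, and treating a value equal to the threshold as sufficient is an assumption.

```python
def confidence(s_z, s_x):
    # Proportion of transactions containing X that also contain Y.
    return s_z / s_x


def lift(s_z, s_x, s_y, n):
    # Ratio (S_Z * N) / (S_X * S_Y); values near 1.0 suggest independence.
    return (s_z * n) / (s_x * s_y)


def is_strong(conf, lft, min_confidence, min_lift):
    # A rule is kept if either criterion is met (assumed inclusive).
    return conf >= min_confidence or lft >= min_lift


n, s_x, s_y, s_z = 50, 27, 33, 22
conf = confidence(s_z, s_x)      # 22 / 27
lft = lift(s_z, s_x, s_y, n)     # 1100 / 891
strong = is_strong(conf, lft, min_confidence=0.8, min_lift=2.0)
```

Rounded to two decimals these give conf = 0.81 and lift = 1.23, matching the printed rule. The rule qualifies on confidence alone, since its lift is well below 2.0.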
Example¶
This example applies Apriori to find the frequent itemsets and strong association rules. The data are 50 transactions involving five different product IDs. The minimum support percentage is set to 0.30, giving a minimum required support of 15 transactions.
from numpy import array
from pyimsl.stat.apriori import apriori
from pyimsl.stat.writeAprioriItemsets import writeAprioriItemsets
from pyimsl.stat.writeAssociationRules import writeAssociationRules
maxNumProducts = 50
maxSetSize = 10
minPctSupport = 0.30
x = array([[1, 3], [1, 2], [1, 1], [2, 1], [2, 2], [2, 4], [2, 5],
[3, 3], [4, 4], [4, 3], [4, 5], [4, 1], [5, 5], [6, 1],
[6, 2], [6, 3], [7, 5], [7, 3], [7, 2], [8, 3], [8, 4],
[8, 1], [8, 5], [8, 2], [9, 4], [10, 5], [10, 3], [11, 2],
[11, 3], [12, 4], [13, 4], [14, 2], [14, 3], [14, 1], [15, 3],
[15, 5], [15, 1], [16, 2], [17, 3], [17, 5], [17, 1], [18, 5],
[18, 1], [18, 2], [18, 3], [19, 2], [20, 4], [21, 1], [21, 4],
[21, 2], [21, 5], [22, 5], [22, 4], [23, 2], [23, 5], [23, 3],
[23, 1], [23, 4], [24, 3], [24, 1], [24, 5], [25, 3], [25, 5],
[26, 1], [26, 4], [26, 2], [26, 3], [27, 2], [27, 3], [27, 1],
[27, 5], [28, 5], [28, 3], [28, 4], [28, 1], [28, 2], [29, 4],
[29, 5], [29, 2], [30, 2], [30, 4], [30, 3], [31, 2], [32, 5],
[32, 1], [32, 4], [33, 4], [33, 1], [33, 5], [33, 3], [33, 2],
[34, 3], [35, 5], [35, 3], [36, 3], [36, 5], [36, 4], [36, 1],
[36, 2], [37, 1], [37, 3], [37, 2], [38, 4], [38, 2], [38, 3],
[39, 3], [39, 2], [39, 1], [40, 2], [40, 1], [41, 3], [41, 5],
[41, 1], [41, 4], [41, 2], [42, 5], [42, 1], [42, 4], [43, 3],
[43, 2], [43, 4], [44, 4], [44, 5], [44, 2], [44, 3], [44, 1],
[45, 4], [45, 5], [45, 3], [45, 2], [45, 1], [46, 2], [46, 4],
[46, 5], [46, 3], [46, 1], [47, 4], [47, 5], [48, 2], [49, 1],
[49, 4], [49, 3], [50, 3], [50, 4]])
# Thresholds for strong rules; the computed rules are read back from "assocRules".
associationRules = {"confidence": 0.8, "lift": 2.0}

# Compute the frequent itemsets and the strong association rules, then print both.
itemsets = apriori(x, maxNumProducts,
                   associationRules=associationRules,
                   maxSetSize=maxSetSize,
                   minSupport=minPctSupport)
writeAprioriItemsets(itemsets)
writeAssociationRules(associationRules["assocRules"])
Output¶
Frequent Itemsets (Out of 50 Transactions):
Size Support Itemset
1 27 { 1 }
1 30 { 2 }
1 33 { 3 }
1 27 { 4 }
1 27 { 5 }
2 20 { 1 2 }
2 22 { 1 3 }
2 16 { 1 4 }
2 19 { 1 5 }
2 22 { 2 3 }
2 16 { 2 4 }
2 15 { 2 5 }
2 16 { 3 4 }
2 19 { 3 5 }
2 17 { 4 5 }
3 17 { 1 2 3 }
3 15 { 1 3 5 }
Association Rules (itemset X implies itemset Y):
X = {1} ==> Y = {3}
supp(X)=27, supp(Y)=33, supp(X U Y)=22
conf= 0.81, lift=1.23
X = {1 2} ==> Y = {3}
supp(X)=20, supp(Y)=33, supp(X U Y)=17
conf= 0.85, lift=1.29
Warning Errors¶
IMSLS_MIN_SUPPORT_NOT_MET | No items met minimum support of #.
Fatal Errors¶
IMSLS_NEED_IARG_GE | “name” = #. “name” must be greater than or equal to #.