Association Rule Mining


What is an Association Rule?

Association rule mining is a data mining technique used to discover interesting relationships, patterns, or associations between items in large datasets, often used in market basket analysis. The goal is to find rules that predict the occurrence of an item or itemset based on the occurrence of other items.

Example:

In a grocery store dataset:

  • Rule: If a customer buys bread, they are likely to also buy butter.
  • This can be written as: {bread} → {butter}.

Association rules are composed of two parts:

  1. Antecedent (LHS): The item(s) on the left-hand side of the rule (e.g., {bread}).
  2. Consequent (RHS): The item(s) on the right-hand side of the rule (e.g., {butter}).
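
To make the two parts concrete, here is a minimal sketch of how a rule could be represented in Python. The `Rule` class and its method names are hypothetical, chosen only for illustration; they do not come from any particular library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """Hypothetical container for one association rule (illustration only)."""
    antecedent: frozenset  # LHS items, e.g. {"bread"}
    consequent: frozenset  # RHS items, e.g. {"butter"}

    def applies_to(self, transaction):
        """True when every antecedent item is in the transaction."""
        return self.antecedent <= set(transaction)

    def holds_in(self, transaction):
        """True when antecedent and consequent items are all in the transaction."""
        return (self.antecedent | self.consequent) <= set(transaction)


rule = Rule(frozenset({"bread"}), frozenset({"butter"}))

basket = {"bread", "milk"}
print(rule.applies_to(basket))  # the basket contains bread
print(rule.holds_in(basket))    # but it does not contain butter
```

Keeping both sides as sets means a rule with several items on either side, such as {milk, bread} → {butter}, needs no special handling.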

Applications of Association Rule Mining:

  • Market Basket Analysis: Discover product combinations that are frequently bought together in retail stores.
  • Recommender Systems: Suggest additional items to customers based on their purchase history.
  • Healthcare: Find associations between medical symptoms and diagnoses.
  • Web Usage Mining: Analyze website activity to recommend content or products based on user behavior.

Support in Association Rule Mining:

Support is a measure of how frequently an item or itemset appears in the dataset. It is a key metric used in association rule mining to identify frequent itemsets.

Definition of Support:

The support of an itemset is defined as the proportion (or percentage) of transactions in which the itemset appears, relative to the total number of transactions.

Mathematically:

$$\text{Support}(A) = \frac{\text{Number of transactions containing } A}{\text{Total number of transactions}}$$

Where:

  • A is an itemset (e.g., {milk, bread}).
  • The numerator is the number of transactions where the itemset A occurs.
  • The denominator is the total number of transactions.

Example of Support:

Consider a dataset with 10 transactions:

Transaction   Items
1             {milk, bread}
2             {bread, butter}
3             {milk}
4             {milk, bread}
5             {butter, bread}
6             {milk, butter}
7             {milk, bread}
8             {bread}
9             {butter, milk}
10            {milk, bread, butter}

If you want to calculate the support of the itemset {milk, bread}:

  1. Count the number of transactions that contain both milk and bread.

    • Transactions 1, 4, 7, and 10 contain both items. So, there are 4 such transactions.
  2. The total number of transactions is 10.

  3. The support for {milk, bread} is:

    $$\text{Support}(\{\text{milk, bread}\}) = \frac{4}{10} = 0.4$$

    This means the itemset {milk, bread} appears in 40% of all transactions.
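
The same calculation can be sketched in a few lines of Python. The `support` helper below is a hypothetical name written for this example, and the list reproduces the ten transactions from the table above.

```python
# The ten example transactions, one set of items per transaction.
transactions = [
    {"milk", "bread"},
    {"bread", "butter"},
    {"milk"},
    {"milk", "bread"},
    {"butter", "bread"},
    {"milk", "butter"},
    {"milk", "bread"},
    {"bread"},
    {"butter", "milk"},
    {"milk", "bread", "butter"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


print(support({"milk", "bread"}, transactions))  # 0.4
```

The subset test `itemset <= t` is what makes a transaction such as {milk, bread, butter} count toward {milk, bread}: extra items in the basket do not matter.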

How min_support=0.005 Works:

In your case, you set min_support=0.005, which means you want to filter out itemsets that appear in less than 0.5% of transactions.

  • If your dataset has 9,835 transactions, then:

    $$\text{Minimum support threshold} = 0.005 \times 9835 = 49.175 \text{ transactions}$$

    An itemset must therefore appear in at least 50 transactions to be kept. One that appears in 49 or fewer has a support of at most 49/9835 ≈ 0.00498, which is below 0.005 (0.5%), so it is ignored.
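
As a rough check of this arithmetic, the sketch below converts a fractional threshold into a whole number of transactions. The function name `min_transaction_count` is hypothetical, and the code assumes the cutoff is inclusive (an itemset is kept when its support is greater than or equal to `min_support`).

```python
import math
from fractions import Fraction


def min_transaction_count(min_support, n_transactions):
    """Smallest whole number of transactions with support >= min_support.

    Fraction(str(...)) keeps the product exact, so a threshold that lands
    exactly on a whole number is not pushed up by floating-point rounding.
    """
    exact = Fraction(str(min_support)) * n_transactions
    return math.ceil(exact)


n = 9835
needed = min_transaction_count(0.005, n)

print(needed)            # 50
print(49 / n >= 0.005)   # False: 49 transactions fall just short
print(50 / n >= 0.005)   # True: 50 transactions clear the threshold
```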

Why is Support Important?

  • Support helps ensure that only commonly occurring itemsets are considered, reducing the number of itemsets being analyzed.
  • Higher support thresholds mean you are looking for more frequently occurring itemsets.
  • Lower support thresholds mean you are allowing itemsets that may occur less frequently but might still be interesting.
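
To see how the threshold changes the amount of work, the sketch below counts the itemsets that survive at several thresholds on the ten example transactions. It enumerates every possible itemset by brute force, which is only practical for a toy dataset; it is not the Apriori algorithm, and `frequent_itemsets` is a hypothetical helper name.

```python
from itertools import combinations

transactions = [
    {"milk", "bread"},
    {"bread", "butter"},
    {"milk"},
    {"milk", "bread"},
    {"butter", "bread"},
    {"milk", "butter"},
    {"milk", "bread"},
    {"bread"},
    {"butter", "milk"},
    {"milk", "bread", "butter"},
]


def frequent_itemsets(transactions, min_support):
    """Brute-force search: map each frequent itemset to its support."""
    items = sorted(set().union(*transactions))
    n = len(transactions)
    result = {}
    for size in range(1, len(items) + 1):
        for candidate in combinations(items, size):
            hits = sum(1 for t in transactions if set(candidate) <= t)
            if hits / n >= min_support:
                result[candidate] = hits / n
    return result


for threshold in (0.1, 0.4, 0.7):
    kept = frequent_itemsets(transactions, threshold)
    print(threshold, len(kept))
```

On this data, 7 itemsets pass at 0.1, 4 pass at 0.4, and only the single items milk and bread pass at 0.7.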


Confidence:

  • Definition: Confidence is the proportion of transactions that contain the antecedent in which the rule’s consequent is also found.
  • Formula: $$\text{Confidence}(A \rightarrow B) = \frac{\text{Support}(A \cup B)}{\text{Support}(A)}$$ Here Support(A ∪ B) is the support of the combined itemset, that is, the share of transactions containing every item of both A and B.
  • Meaning: It measures how often the rule is correct. In other words, how often does the consequent (right-hand side item) appear in transactions where the antecedent (left-hand side item) appears?
  • Example: If the rule is {milk} → {bread} and the confidence is 0.8, this means that 80% of the time when milk is bought, bread is also bought.
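
As an illustration, the sketch below computes the confidence of {milk} → {bread} on the ten example transactions from the support section. The helper names `support` and `confidence` are hypothetical.

```python
transactions = [
    {"milk", "bread"},
    {"bread", "butter"},
    {"milk"},
    {"milk", "bread"},
    {"butter", "bread"},
    {"milk", "butter"},
    {"milk", "bread"},
    {"bread"},
    {"butter", "milk"},
    {"milk", "bread", "butter"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Support of the combined itemset divided by support of the antecedent."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)


conf = confidence({"milk"}, {"bread"}, transactions)
print(round(conf, 3))  # 0.571
```

Milk appears in 7 of the 10 transactions and milk with bread in 4, so the confidence is 4/7, or about 57%. This sketch does not guard against an antecedent that never occurs, which would cause a division by zero.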

Lift:

  • Definition: Lift is the ratio of the observed support of the rule to the expected support if the antecedent and consequent were independent. It tells you how much more likely the consequent is to occur when the antecedent occurs compared to it occurring randomly.
  • Formula: $$\text{Lift}(A \rightarrow B) = \frac{\text{Confidence}(A \rightarrow B)}{\text{Support}(B)}$$
  • Meaning: Lift quantifies the strength of the relationship between the antecedent and consequent:
    • Lift = 1: The antecedent and consequent are independent.
    • Lift > 1: The occurrence of the antecedent increases the likelihood of the consequent.
    • Lift < 1: The occurrence of the antecedent decreases the likelihood of the consequent.
  • Example: If the rule {milk} → {bread} has a lift of 3.0, a customer who buys milk is three times as likely to buy bread as a customer picked at random from the dataset.
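
The lift formula can be tried on the ten example transactions from the support section. The sketch below uses hypothetical helper names (`support`, `lift`) and computes lift as confidence divided by the support of the consequent.

```python
transactions = [
    {"milk", "bread"},
    {"bread", "butter"},
    {"milk"},
    {"milk", "bread"},
    {"butter", "bread"},
    {"milk", "butter"},
    {"milk", "bread"},
    {"bread"},
    {"butter", "milk"},
    {"milk", "bread", "butter"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def lift(antecedent, consequent, transactions):
    """Confidence(A -> B) divided by Support(B)."""
    both = set(antecedent) | set(consequent)
    conf = support(both, transactions) / support(antecedent, transactions)
    return conf / support(consequent, transactions)


value = lift({"milk"}, {"bread"}, transactions)
print(round(value, 3))  # 0.816
```

In this toy table the lift of {milk} → {bread} comes out below 1: bread is in 70% of all transactions but in only about 57% of the transactions that contain milk, so buying milk slightly lowers the chance of buying bread here.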

Summary:

  • Antecedents: The items that trigger the rule.
  • Consequents: The items that are predicted to appear given the antecedents.
  • Support: The frequency of the itemset/rule in the dataset.
  • Confidence: How often the rule holds true when the antecedent is present.
  • Lift: The strength of the rule compared to random chance.

Download (groceries) Dataset


Go to Jupyter Notebook
