A Beginner’s Guide to the FP-Growth Algorithm in Python

Spread the love

FP-Growth Algorithm in Python

In the world of data mining and frequent pattern mining, the FP-Growth algorithm is a powerful tool. It efficiently discovers frequent itemsets in large transaction datasets without the need to generate candidate itemsets explicitly. This tutorial will provide a comprehensive introduction to the FP-Growth algorithm, its inner workings, and practical examples to help beginners grasp its concepts effectively.

fp growth algorithm in python

Table of Contents

  1. Introduction to Frequent Pattern Mining
  2. What is the FP-Growth Algorithm?
  3. The FP-Tree Data Structure
  4. Building the FP-Tree
  5. Mining Frequent Patterns
  6. Implementing FP-Growth in Python
  7. Example: Market Basket Analysis
  8. Conclusion

1. Introduction to Frequent Pattern Mining

Frequent Pattern Mining (FPM) is a data mining technique used to discover patterns that frequently appear in datasets. In the context of market basket analysis, FPM helps uncover associations between products that customers often purchase together. One of the widely used algorithms for FPM is the FP-Growth algorithm.

2. What is the FP-Growth Algorithm?

The FP-Growth (Frequent Pattern Growth) algorithm is a highly efficient method for finding frequent itemsets in transactional databases. Unlike the Apriori algorithm, which generates candidate itemsets and scans the database multiple times, FP-Growth builds a compact data structure called the FP-Tree, reducing computational overhead.

3. The FP-Tree Data Structure

At the core of the FP-Growth algorithm is the FP-Tree data structure. The FP-Tree represents the transaction database in a way that simplifies the process of finding frequent patterns. It consists of nodes and links that connect the nodes, where each node represents an item in the dataset.

4. Building the FP-Tree

To build the FP-Tree, follow these steps:

Step 1: Scan the Database

Scan the transaction database to count the frequency of each item. This information is used to identify frequent items.

Step 2: Sort Items by Frequency

Sort the frequent items in descending order of their frequency. This step is crucial for the efficiency of the algorithm.

Step 3: Create the FP-Tree

Start with an empty root node. For each transaction in the database, add the items to the tree following these rules:

  • If the item exists in the current branch, increment its count.
  • If the item doesn’t exist, create a new node and link it to the current branch.
  • Maintain the conditional FP-Tree for each item.

5. Mining Frequent Patterns

Once the FP-Tree is constructed, you can efficiently mine frequent patterns using a recursive technique. Starting from the least frequent item, you can generate conditional patterns and build conditional FP-Trees for each item.

6. Implementing FP-Growth in Python

Let’s illustrate FP

-Growth with a Python implementation using a sample dataset. You can use the mlxtend library for this purpose. First, install the library using pip:

pip install mlxtend
Here’s a basic example of implementing FP-Growth in Python:
from mlxtend.frequent_patterns import fpgrowth
import pandas as pd

# Sample transaction dataset
data = pd.DataFrame({
    'Transaction': ['T1', 'T2', 'T3', 'T4', 'T5'],
    'Items': [['A', 'B', 'D'], ['B', 'C', 'E'], ['A', 'B', 'D', 'E'], ['A', 'E'], ['A', 'B', 'C', 'E']]
})

# Convert the dataset to a one-hot encoded format
one_hot = data['Items'].str.join('|').str.get_dummies()

# Use FP-Growth to find frequent itemsets
frequent_itemsets = fpgrowth(one_hot, min_support=0.4, use_colnames=True)

print(frequent_itemsets)

In this example, we first prepare the transaction dataset and convert it into a one-hot encoded format. Then, we use the fpgrowthfunction to find frequent itemsets with a minimum support of 0.4.

 

7. Example: Market Basket Analysis

Let’s apply the FP-Growth algorithm step by step to perform Market Basket Analysis on a sample transaction dataset. In this scenario, we aim to discover associations between products that customers often purchase together.

Step 1: Prepare the Transaction Dataset

We start with a transaction dataset that records what items were purchased together in different transactions:

 

Transaction Items
T1 {Milk, Bread, Butter}
T2 {Milk, Bread, Diapers}
T3 {Milk, Eggs, Diapers}
T4 {Milk, Bread, Butter}
T5 {Bread, Butter}

Step 2: Convert to One-Hot Encoding

We convert the transaction dataset into a one-hot encoded format, where each column represents a unique item, and each row corresponds to a transaction. A “1” in a cell indicates the presence of the item in the transaction; otherwise, it’s marked as “0.”

Transaction Milk Bread Butter Diapers Eggs
T1 1 1 1 0 0
T2 1 1 0 1 0
T3 1 0 0 1 1
T4 1 1 1 0 0
T5 0 1 1 0 0

Step 3: Apply FP-Growth

Now, we’ll use the FP-Growth algorithm to find frequent itemsets in this one-hot encoded dataset. We’ll set a minimum support threshold (min_support) to identify which itemsets are frequent.

from mlxtend.frequent_patterns import fpgrowth
import pandas as pd

# Load the one-hot encoded dataset
data = pd.DataFrame({
    'Transaction': ['T1', 'T2', 'T3', 'T4', 'T5'],
    'Milk': [1, 1, 1, 1, 0],
    'Bread': [1, 1, 0, 1, 1],
    'Butter': [1, 0, 0, 1, 1],
    'Diapers': [0, 1, 1, 0, 0],
    'Beer': [0, 0, 1, 0, 0]
})

# Use FP-Growth to find frequent itemsets with min_support=0.4
frequent_itemsets = fpgrowth(data.drop('Transaction', axis=1), min_support=0.4, use_colnames=True)

print(frequent_itemsets)

Step 4: Interpret the Results

The FP-Growth algorithm will output the frequent itemsets based on the minimum support threshold. In this example, with a minimum support of 0.4, you will get a list of itemsets and their support values. The support value indicates the proportion of transactions in which the itemset appears.

Step 5: Extract Association Rules

After identifying frequent itemsets, you can further extract association rules to understand the relationships between items. Association rules typically consist of antecedents (items on the left-hand side) and consequents (items on the right-hand side).

For instance, you might discover a rule like:

{Milk} => {Bread}

This rule suggests that customers who buy milk are likely to also purchase bread. The strength of the association can be measured using metrics like confidence and lift.

By following these steps, you can perform Market Basket Analysis using the FP-Growth algorithm to uncover valuable insights into customer purchasing behavior and product associations.

Complete Code

from mlxtend.frequent_patterns import fpgrowth
import pandas as pd

# Load the one-hot encoded dataset
data = pd.DataFrame({
    'Transaction': ['T1', 'T2', 'T3', 'T4', 'T5'],
    'Milk': [1, 1, 1, 1, 0],
    'Bread': [1, 1, 0, 1, 1],
    'Butter': [1, 0, 0, 1, 1],
    'Diapers': [0, 1, 1, 0, 0],
    'Beer': [0, 0, 1, 0, 0]
})

# Use FP-Growth to find frequent itemsets with min_support=0.4
frequent_itemsets = fpgrowth(data.drop('Transaction', axis=1), min_support=0.4, use_colnames=True)

print(frequent_itemsets)

Running the code on your system

To run the code you provided on your local system, you’ll need to have Python installed along with the necessary libraries, including mlxtend and pandas. You can follow these steps:

  1. Install Python: If you don’t already have Python installed, download and install it from the official Python website (https://www.python.org/downloads/) based on your operating system.
  2. Install Required Libraries:
    • Open your command prompt (Windows) or terminal (macOS/Linux).
    • Use pip, the Python package manager, to install the required libraries
  3. pip install mlxtend pandas
    
  4. Create a Python File: Open a text editor or code editor of your choice (e.g., Visual Studio Code, PyCharm, or a simple text editor like Notepad) and paste your code into a new file. Save the file with a .py extension (e.g., fpgrowth_example.py) in a directory of your choice.
  5. Run the Python Script: Open your command prompt or terminal, navigate to the directory where you saved the Python script, and execute it using the python command:
  6. cd path/to/your/script/directory
    python fpgrowth_example.py
    
    1. Replace path/to/your/script/directory with the actual path to the directory where you saved the script.
    2. Output: After running the script, you should see the output in your terminal, which will display the frequent itemsets found using FP-Growth.

    Make sure you have Python and the required libraries installed correctly, and ensure that the script file is saved with a .py extension. This should allow you to run the code successfully on your local system.

Source code of the tutorial

You can find source code of the tutorial at our Github repository.

fp growth algorithm source code github

8. Conclusion

The FP-Growth algorithm is a powerful tool for frequent pattern mining in large datasets. Its ability to construct the FP-Tree data structure and efficiently find frequent patterns makes it a valuable addition to a data miner’s toolkit. By understanding its inner workings and practicing with practical examples, beginners can harness the potential of FP-Growth in various data mining applications.

Learning Resources:

  1. Data Science A-Z™: Real-Life Data Science Exercises Included
    Description
    : Learn to create Machine Learning Algorithms in Python and R from two Data Science experts. Code templates included.
  2. Deep Learning A-Z™ 2023: Neural Networks, AI & ChatGPT Bonus
    Description:
    Learn to create Deep Learning Algorithms in Python from two Machine Learning & Data Science experts. Templates included.

These courses cover a wide range of topics in data science and machine learning and are highly rated by learners. You can explore them to enhance your skills in these domains.

Related Articles:

Previous Article

Check Also

apriori algorithm in python

A Beginner’s Guide to Apriori Algorithm in Python

Spread the loveLast updated:8th October, 2023A Beginner’s Guide to Apriori Algorithm in Python Association Rule …