Clustering in Power BI: A Step-by-Step Guide
Clustering is a data analysis technique used to group similar data points together based on their characteristics. In Power BI, clustering helps you identify patterns, segment data, and make better decisions. This guide explains two approaches: using Power BI’s built-in clustering and using the Python visual.
1. Clustering in Power BI (Without Python)
Step 1: Prepare Your Data
-
Ensure your dataset contains numeric fields for clustering (e.g., Sales, Profit, Quantity).
-
You can also place a categorical field (e.g., Region, Product Category) in the Values field well; each distinct value then becomes one point on the chart.
Step 2: Create a Scatter Plot
-
Go to Visualizations → Select Scatter Chart.
-
Assign X-axis and Y-axis numeric fields.
-
Optionally, add a Legend field to color the points.
Step 3: Enable Clustering
-
Select the scatter plot.
-
Click the three dots (...) for More options → select Automatically find clusters.
-
Adjust the settings if needed, including the number of clusters. The default is Auto, which lets Power BI pick the number itself (for example, 6 clusters). Then click OK.
-
Power BI will automatically group data into clusters based on similarity.
Step 4: Customize Clusters
-
Choose the number of clusters manually or let Power BI decide automatically.
-
Clusters will be color-coded for easy visualization.
-
You can hover over points to see cluster membership.
Use case: Segment customers based on purchase behavior, identify high-profit regions, or group products based on sales and quantity.
2. Clustering in Power BI Using Python Visual
Sometimes you need more advanced clustering or want to experiment with different algorithms. Power BI allows you to use Python scripts for clustering.
Step 1: Prepare Your Data
-
Load your dataset in Power BI.
-
Ensure numeric fields are included (e.g., Sales, Profit, Quantity).
Step 2: Add a Python Visual
-
Click on the Python visual icon in the Visualizations pane.
-
Drag the numeric fields you want to cluster into the Values section.
-
Power BI automatically makes these fields available in Python as a dataframe called dataset.
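As a rough sketch of what the script receives, the snippet below builds a stand-in dataset so it can be run outside Power BI. The column names (Sales, Profit, Quantity) follow the examples in this guide, but the values are made up for illustration. Inside a Python visual you would delete the stand-in block, since Power BI supplies dataset itself.

```python
import pandas as pd

# Stand-in for the DataFrame that Power BI passes to a Python visual.
# Hypothetical values; inside Power BI, remove this block because
# `dataset` already exists there.
dataset = pd.DataFrame({
    "Sales": [261.96, 731.94, 14.62, 957.58, 22.37, 48.86],
    "Profit": [41.91, 219.58, 6.87, -383.03, 2.52, 14.17],
    "Quantity": [2, 3, 2, 5, 2, 7],
})

# Keep only numeric columns and drop rows with missing values,
# because clustering algorithms expect complete numeric input.
features = dataset.select_dtypes(include="number").dropna()

# Quick sanity check of what will be clustered.
print(features.shape)
print(features.describe().loc[["mean", "min", "max"]])
```

Inside Power BI the visual displays only a matplotlib figure, so the print calls are mainly useful when testing the script in a local Python environment first.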
Step 3: Run Python for Clustering
-
Use Python to apply clustering algorithms like K-Means or Hierarchical Clustering.
-
The output can be scatter plots with clusters, color-coded points, or even advanced 3D visualizations.
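One possible script for this step is sketched below, assuming scikit-learn and matplotlib are installed in the Python environment that Power BI uses. The stand-in data, the choice of three clusters, and the scaling step are assumptions made for illustration, not settings prescribed by Power BI.

```python
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Stand-in data so the sketch runs outside Power BI (hypothetical values).
# Inside a Python visual, delete this block: Power BI provides `dataset`.
dataset = pd.DataFrame({
    "Sales": [120, 150, 130, 900, 950, 880, 400, 420, 390],
    "Profit": [10, 15, 12, 300, 320, 290, -50, -40, -60],
    "Quantity": [2, 3, 2, 9, 10, 8, 5, 6, 5],
})

features = dataset[["Sales", "Profit", "Quantity"]].dropna()

# Put all columns on a comparable scale so Sales does not dominate distances.
scaled = StandardScaler().fit_transform(features)

# n_clusters=3 is an assumption; a fixed random_state makes runs repeatable.
model = KMeans(n_clusters=3, n_init=10, random_state=42)
features = features.assign(Cluster=model.fit_predict(scaled))

# Draw a color-coded scatter plot; the visual shows the resulting figure.
fig, ax = plt.subplots(figsize=(8, 5))
scatter = ax.scatter(features["Sales"], features["Profit"],
                     c=features["Cluster"], cmap="viridis", s=60)
ax.set_xlabel("Sales")
ax.set_ylabel("Profit")
ax.set_title("K-Means clusters (k=3)")
ax.legend(*scatter.legend_elements(), title="Cluster")
plt.show()
```

The clustering here uses all three fields, while the plot shows only Sales against Profit; the color of each point carries the cluster found in the full three-dimensional space.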
Step 4: Customize Clusters
-
Change the number of clusters or clustering method depending on your analysis needs.
-
Python allows flexible clustering on multiple dimensions, which may not be possible with the built-in clustering tool.
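Switching the number of clusters or the method can be as small as changing one argument. The sketch below wraps both options in a helper called cluster_labels, which is a hypothetical name introduced here and not part of Power BI or scikit-learn. It assumes scikit-learn is installed and again uses made-up stand-in data.

```python
import pandas as pd
from sklearn.cluster import AgglomerativeClustering, KMeans
from sklearn.preprocessing import StandardScaler


def cluster_labels(features, method="kmeans", n_clusters=3):
    """Return one cluster label per row (hypothetical helper for illustration)."""
    scaled = StandardScaler().fit_transform(features)
    if method == "kmeans":
        model = KMeans(n_clusters=n_clusters, n_init=10, random_state=42)
    elif method == "hierarchical":
        # Ward linkage merges the pair of clusters that least increases variance.
        model = AgglomerativeClustering(n_clusters=n_clusters, linkage="ward")
    else:
        raise ValueError(f"unknown method: {method}")
    return model.fit_predict(scaled)


# Stand-in data (hypothetical values); Power BI provides `dataset` in a visual.
dataset = pd.DataFrame({
    "Sales": [120, 150, 130, 900, 950, 880, 400, 420, 390],
    "Profit": [10, 15, 12, 300, 320, 290, -50, -40, -60],
    "Quantity": [2, 3, 2, 9, 10, 8, 5, 6, 5],
})

# Compare cluster sizes for each method and cluster count.
for method in ("kmeans", "hierarchical"):
    for k in (2, 3):
        labels = cluster_labels(dataset, method=method, n_clusters=k)
        sizes = sorted(pd.Series(labels).value_counts().tolist())
        print(method, k, sizes)
```

Comparing the cluster sizes across settings like this is a quick way to see whether a different method or cluster count changes the grouping in a meaningful way before updating the plot.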
Use case: Segment products by multiple metrics, perform customer profiling, or analyze patterns in large datasets like Superstore or sales records.
3. Tips for Effective Clustering in Power BI
-
Always use numeric fields for X and Y axes.
-
Add Legend fields for better visualization.
-
Use the Python visual when you need advanced customization or more than two dimensions.
-
Keep the number of clusters meaningful for interpretation.
-
Hover over points to understand cluster composition.
Clustering in Power BI Using Python Visual
Clustering is the process of grouping similar data points based on patterns in the data. In Power BI, the Python visual allows you to apply advanced clustering algorithms, like K-Means, for more flexibility compared to the built-in clustering option.
Step 1: Prepare Your Data
-
Load your dataset into Power BI (e.g., Sample Superstore dataset).
-
Make sure your dataset contains numeric fields for clustering, such as:
-
Sales
-
Profit
-
Quantity
Step 2: Add a Python Visual
-
Click the Python visual (Py icon) in the Visualizations pane.
-
Drag the numeric fields you want to cluster into the Values section.
-
Power BI automatically passes these fields to Python as a dataframe called dataset.