Power BI project on Healthcare Dataset
Problem Statement
Dataset Description(Download Dataset)
Columns:
-
Name, Age, Gender, Blood Type, Medical Condition, Date of Admission, Doctor, Hospital, Insurance Provider, Billing Amount, Room Number, Admission Type, Discharge Date, Medication, Test Results
Assumption: table is called Admissions
after import. (If different, replace with your actual table name.)
Power BI Step-by-Step Guide for Healthcare Dataset
Step 1: Connect to Your Dataset
(A) If using CSV:
-
Click Home → Get Data → Text/CSV.
-
A file browser will open.
-
Navigate to your saved folder and select
healthcare_patients_cleaned.csv
. -
Click Open.
(B) If using Excel:
-
Click Home → Get Data → Excel Workbook.
-
Browse and select
healthcare_patients_cleaned.xlsx
. -
Click Open.
Step 2: Preview Your Dataset
-
Power BI opens a preview window.
-
You’ll see your dataset with columns like:
Name, Age, Gender, Blood Type, Medical Condition, Date of Admission, Doctor, Hospital, Insurance Provider, Billing Amount, Room Number, Admission Type, Discharge Date
.
👉 At this stage you have two options:
-
Load → Loads the dataset directly.
-
Transform Data → Opens Power Query Editor to clean and prepare data.
✅ Choose "Transform Data".
Step 3: Open Power Query Editor
-
You’ll see your dataset in a table-like view.
-
On the right side, you’ll see Applied Steps (Power BI records every change).
-
On the top ribbon, you’ll see Transform and Add Column options.
👉 This is the main place to clean, shape, and enrich the dataset.
Step 4: Correct Column Data Types
⚡ This is very important for analysis.
-
Text Columns → Name, Gender, Blood Type, Medical Condition, Doctor, Hospital, Insurance Provider, Admission Type.
-
Whole Number → Age, Room Number.
-
Decimal Number → Billing Amount.
-
Date → Date of Admission, Discharge Date.
-
(Derived Columns – will add later) → LengthOfStay, BillingPerDay.
Step 5: Create New Columns (if required)
Sometimes raw data isn’t enough. In Power Query Editor, you can add calculated columns that make analysis easier.
(a) Create LengthOfStay
(in days)
-
Select Add Column → Custom Column.
-
Enter formula:
Rename column → LengthOfStay
.
Data Type → Whole Number.
(b) Create BillingPerDay
-
Again, go to Add Column → Custom Column.
-
Enter formula:
[Billing Amount] / [LengthOfStay]
Rename column → BillingPerDay
.
Data Type → Decimal Number.
(c) Clean Name Column (Optional)
Patient names have random capitalizations.
-
Select Name → Right-click → Transform → Capitalize Each Word.
👉 Example: “DaNnY sMitH” → “Danny Smith”.
(d) Standardize Gender Column
Sometimes Gender entries may vary (“Male”, “male”, “M”).
-
Select Gender → Right-click → Transform → Format → Capitalize Each Word.
-
Replace abbreviations if any (e.g.,
M → Male
).
(e) Handle Missing/Incorrect Data
-
If any ####### still appears in date columns → recheck Data Type → Date.
-
Remove duplicates: Home → Remove Rows → Remove Duplicates.
-
Handle nulls: Transform → Replace Values (e.g., Missing Insurance → "Self-Pay").
Open Modeling → New table
-
In Power BI Desktop click Modeling (top) → New table.
Paste this robust DAX to create the Date table
This version guards against missing dates by using sensible fallbacks.
Verify the new table
-
Go to Data view and open the
Date
table. -
You should see a continuous list of dates and added columns (Year, MonthName, MonthNumber, Quarter, etc.).
Mark as a Date Table
-
Select the
Date
table → Modeling → Mark as date table → Mark as date table. -
Choose the
Date
column when prompted and confirm. -
This step lets Power BI use time-intelligence functions properly.
Create relationships
-
Go to Model view.
-
Drag
Date[Date]
ontoHealthcare[Date of Admission]
to create a one-to-many relationship (Date → Healthcare). -
Optionally: if you need to analyze by discharge date too, you can create a second relationship from
Date[Date]
→Healthcare[Discharge Date]
. (If both exist, keep the Discharge relationship inactive and useUSERELATIONSHIP()
in measures when needed.) Sort MonthName by MonthNumber (for correct chronological labels)
-
Select the
Date
table → click theMonthName
column → Column tools → Sort by Column → MonthNumber. -
This ensures visuals show January → February → ... instead of alphabetical order.
-
-
Create a simple test measure (to validate the Date table)
-
Modeling → New measure, paste: following code
Total Admissions = COUNTROWS('Healthcare')
Build a simple line chart:
Date[Date]
on X axis,Total Admissions
on Y axis.-
Verify you see a continuous timeline (months/dates with zero admissions will appear because Date table is continuous).
-
Create time-intelligence measures (examples)
-
After marking Date table, these will work properly:
Following DAXBilling YTD = TOTALYTD( SUM('Healthcare'[Billing Amount]), 'Date'[Date])
Admissions LY = CALCULATE([Total Admissions], SAMEPERIODLASTYEAR('Date'[Date]))- Admissions YTD = TOTALYTD([Total Admissions], 'Date'[Date])
- Research questions
1) How do admissions vary over time (monthly / yearly trends)?
Why: staffing, seasonal planning.
Required columns: Date of Admission
, PatientID
(or Name), Hospital
, Medical Condition
.
Visuals
-
Line chart: X =
Date[MonthYear]
, Y =Total Admissions
. -
Slicer:
Hospital
,Medical Condition
.
2) Which medical conditions drive the longest Length of Stay (LOS)?
Why: pathway redesign, targeted interventions.
Required columns: Medical Condition
, LengthOfStay
(create in Power Query if not present).
DAX measures : Avg LOS = AVERAGE('Patients'[LengthOfStay])
Visuals
-
Bar chart: Axis =
Medical Condition
, Values =Avg LOS
(sort descending).
3) Are emergency admissions more costly than elective/urgent ones?
Why: ER resourcing and cost monitoring.
Required columns: Admission Type
, Billing Amount
.
DAX measures :
Total Billing = SUM('Patients'[Billing Amount])
Avg Billing = AVERAGE('Patients'[Billing Amount])
Avg Billing - Emergency =
CALCULATE( [Avg Billing], FILTER('Patients', 'Patients'[AdmissionType] = "Emergency") )
Avg Billing - Elective =
CALCULATE( [Avg Billing], FILTER('Patients', 'Patients'[AdmissionType] = "Elective") )
Pct Emergency Admissions =
DIVIDE( CALCULATE(COUNTROWS('Patients'), 'Patients'[AdmissionType] = "Emergency"), [Total Admissions] )
Visuals
-
Clustered bar:
AdmissionType
on axis,Avg Billing
on values. -
KPI card:
% Emergency Admissions
.
4) Which hospitals / doctors generate the highest billing?
Why: vendor negotiation, cost control.
Required columns: Hospital
, Doctor
, Billing Amount
.
DAX measures :
Total Billing = SUM('Patients'[Billing Amount])
Visuals
-
Bar chart: Hospital vs
Total Billing
(Top N filter = top 10). -
Table: Doctor, Count of Admissions, Total Billing, Avg LOS.
5) Are older patients more expensive / longer stays?
Why: geriatric care planning.
Required columns: Age
, Billing Amount
, LengthOfStay
.
Create AgeGroup
(Add Column → Custom Column):
if [Age] < 18 then "Child"
else if [Age] <= 35 then "Young Adult"
else if [Age] <= 60 then "Adult"
else "Senior"
DAX measures :
Avg LOS by AgeGroup = AVERAGE('Patients'[LengthOfStay])
Avg Billing by AgeGroup = AVERAGE('Patients'[Billing Amount])
Visuals
-
Bar charts:
AgeGroup
vsAvg Billing
,AgeGroup
vsAvg LOS
. -
Scatter:
Age
(X) vsBilling Amount
(Y), add trendline (Analytics pane).
6) Insurance provider analysis — who has the highest average claim?
Why: negotiate rates; check claim patterns.
Required columns: Insurance Provider
, Billing Amount
.
Map insurers to categories (Public/Private) if needed:
-
Add Column → Conditional Column:
-
If
Insurance Provider
contains "Medicare" → "Public" else "Private". - Total Billing = SUM('Patients'[Billing Amount])
- Avg Billing = AVERAGE('Patients'[Billing Amount])
- Claims Count = COUNTROWS('Patients')
Visuals
-
Bar chart:
Insurance Provider
vsAvg Billing
.
7) Bed occupancy & capacity planning (occupied beds per day)
Why: operational planning and surge readiness.
Required columns: Date of Admission
, Discharge Date
and Date
table.
DAX measure (occupied beds on a date context)
Occupied Beds =
VAR CurrentDate = MAX('Date'[Date])
RETURN
CALCULATE(
COUNTROWS('Patients'),
FILTER(ALL('Patients'),
'Patients'[Date of Admission] <= CurrentDate &&
'Patients'[Discharge Date] >= CurrentDate
)
)
Visuals
-
Line chart: X =
Date[Date]
, Y =Occupied Beds
. -
Compare to capacity (card or gauge).
8) Patient segmentation (clustering): find cohorts (low cost vs high cost)
Why: tailor management to patient types.
Required columns: Age
, LengthOfStay
, Billing Amount
, Medical Condition
, AdmissionType
.
Approach
-
Use Python visual in Power BI or export to a notebook.
-
Standardize numeric features (scaling).
-
Run KMeans (k = 2..4).
-
Append cluster label back to dataset.
Visuals
-
Scatter:
LengthOfStay
vsBilling Amount
colored by cluster. -
Cluster profile table (avg age, avg billing per cluster).
0 टिप्पण्या
कृपया तुमच्या प्रियजनांना लेख शेअर करा आणि तुमचा अभिप्राय जरूर नोंदवा. 🙏 🙏