Power BI project on Healthcare Dataset

 

Power BI project on Healthcare Dataset

Problem Statement

In today’s data-driven environment, organizations across industries are challenged not only to collect vast amounts of data but also to transform it into actionable insights that support strategic decision-making. Traditional reporting techniques often fall short in providing interactive, real-time, and predictive capabilities. As a result, decision-makers struggle with fragmented dashboards, static reports, and limited forecasting tools that hinder proactive business planning.

Dataset Description(Download Dataset)

Columns:

  • Name, Age, Gender, Blood Type, Medical Condition, Date of Admission, Doctor, Hospital, Insurance Provider, Billing Amount, Room Number, Admission Type, Discharge Date, Medication, Test Results

Assumption: table is called Admissions after import. (If different, replace with your actual table name.)

Power BI Step-by-Step Guide for Healthcare Dataset


Step 1: Connect to Your Dataset

(A) If using CSV:

  • Click Home → Get Data → Text/CSV.

  • A file browser will open.

  • Navigate to your saved folder and select healthcare_patients_cleaned.csv.

  • Click Open.

(B) If using Excel:

  • Click Home → Get Data → Excel Workbook.

  • Browse and select healthcare_patients_cleaned.xlsx.

  • Click Open.


Step 2: Preview Your Dataset

  • Power BI opens a preview window.

  • You’ll see your dataset with columns like:
    Name, Age, Gender, Blood Type, Medical Condition, Date of Admission, Doctor, Hospital, Insurance Provider, Billing Amount, Room Number, Admission Type, Discharge Date.

👉 At this stage you have two options:

  • Load → Loads the dataset directly.

  • Transform Data → Opens Power Query Editor to clean and prepare data.

Choose "Transform Data".


Step 3: Open Power Query Editor

  • You’ll see your dataset in a table-like view.

  • On the right side, you’ll see Applied Steps (Power BI records every change).

  • On the top ribbon, you’ll see Transform and Add Column options.

👉 This is the main place to clean, shape, and enrich the dataset.


Step 4: Correct Column Data Types

⚡ This is very important for analysis.

  • Text Columns → Name, Gender, Blood Type, Medical Condition, Doctor, Hospital, Insurance Provider, Admission Type.

  • Whole Number → Age, Room Number.

  • Decimal Number → Billing Amount.

  • Date → Date of Admission, Discharge Date.

  • (Derived Columns – will add later) → LengthOfStay, BillingPerDay.


Step 5: Create New Columns (if required)

Sometimes raw data isn’t enough. In Power Query Editor, you can add calculated columns that make analysis easier.


(a) Create LengthOfStay (in days)

  • Select Add Column → Custom Column.

  • Enter formula:


Duration.Days([Discharge Date] - [Date of Admission])

  • Rename column → LengthOfStay.

  • Data Type → Whole Number.

  • (b) Create BillingPerDay

    • Again, go to Add Column → Custom Column.

    • Enter formula: 

  • [Billing Amount] / [LengthOfStay]  

  • Rename column → BillingPerDay.

  • Data Type → Decimal Number.

  • (c) Clean Name Column (Optional)

    Patient names have random capitalizations.

    • Select Name → Right-click → Transform → Capitalize Each Word.
      👉 Example: “DaNnY sMitH” → “Danny Smith”.

  • (d) Standardize Gender Column

    Sometimes Gender entries may vary (“Male”, “male”, “M”).

    • Select Gender → Right-click → Transform → Format → Capitalize Each Word.

    • Replace abbreviations if any (e.g., M → Male).

  • (e) Handle Missing/Incorrect Data

    • If any ####### still appears in date columns → recheck Data Type → Date.

    • Remove duplicates: Home → Remove Rows → Remove Duplicates.

    • Handle nulls: Transform → Replace Values (e.g., Missing Insurance → "Self-Pay").

  • Creating the Date Table 
  • Open Modeling → New table

    • In Power BI Desktop click Modeling (top) → New table.

    • Paste this robust DAX to create the Date table

    • This version guards against missing dates by using sensible fallbacks. 

    DAX: 
    Date = 
    VAR MinDateRaw = MINX(ALL('Healthcare'), 'Healthcare'[Date of Admission])
    VAR MaxDateRaw = MAXX(ALL('Healthcare'), 'Healthcare'[Discharge Date])
    VAR StartDate = COALESCE(MinDateRaw, DATE(2020,1,1))    -- fallback if no admission dates
    VAR EndDate   = COALESCE(MaxDateRaw, TODAY())           -- fallback to today if no discharge
    RETURN
    ADDCOLUMNS(
      CALENDAR(StartDate, EndDate),
      "Year", YEAR([Date]),
      "MonthNumber", MONTH([Date]),
      "MonthName", FORMAT([Date], "MMMM"),
      "MonthYear", FORMAT([Date], "MMM yyyy"),
      "Quarter", "Q" & FORMAT([Date], "Q"),
      "WeekOfYear", WEEKNUM([Date], 2),
      "Day", DAY([Date]),
      "DayOfWeekNumber", WEEKDAY([Date], 2),
      "DayOfWeek", FORMAT([Date], "dddd"),
      "IsWeekend", IF(WEEKDAY([Date],2) >= 6, TRUE(), FALSE())
    )

  • Verify the new table

    • Go to Data view and open the Date table.

    • You should see a continuous list of dates and added columns (Year, MonthName, MonthNumber, Quarter, etc.).

  • Mark as a Date Table

    • Select the Date table → Modeling → Mark as date table → Mark as date table.

    • Choose the Date column when prompted and confirm.

    • This step lets Power BI use time-intelligence functions properly.

  • Create relationships

    • Go to Model view.

    • Drag Date[Date] onto Healthcare[Date of Admission] to create a one-to-many relationship (Date → Healthcare).

    • Optionally: if you need to analyze by discharge date too, you can create a second relationship from Date[Date]Healthcare[Discharge Date]. (If both exist, keep the Discharge relationship inactive and use USERELATIONSHIP() in measures when needed.)

    • Sort MonthName by MonthNumber (for correct chronological labels)

      • Select the Date table → click the MonthName column → Column tools → Sort by Column → MonthNumber.

      • This ensures visuals show January → February → ... instead of alphabetical order.

    • Create a simple test measure (to validate the Date table)

      • Modeling → New measure, paste: following code   

      •  Total Admissions = COUNTROWS('Healthcare')

        • Build a simple line chart: Date[Date] on X axis, Total Admissions on Y axis.

        • Verify you see a continuous timeline (months/dates with zero admissions will appear because Date table is continuous).

        1. Create time-intelligence measures (examples)

          • After marking Date table, these will work properly:

        Following DAX 

        Billing YTD = TOTALYTD( SUM('Healthcare'[Billing Amount]), 'Date'[Date])

        Admissions LY = CALCULATE([Total Admissions], SAMEPERIODLASTYEAR('Date'[Date]))
      • Admissions YTD = TOTALYTD([Total Admissions], 'Date'[Date])

    • Research questions 

    1) How do admissions vary over time (monthly / yearly trends)?

    Why: staffing, seasonal planning.

    Required columns: Date of Admission, PatientID (or Name), Hospital, Medical Condition

  • Visuals

    • Line chart: X = Date[MonthYear], Y = Total Admissions.

    • Slicer: Hospital, Medical Condition.

  • 2) Which medical conditions drive the longest Length of Stay (LOS)?

    Why: pathway redesign, targeted interventions.

    Required columns: Medical Condition, LengthOfStay (create in Power Query if not present). 

  • DAX measures  : Avg LOS = AVERAGE('Patients'[LengthOfStay])  

    Visuals

    • Bar chart: Axis = Medical Condition, Values = Avg LOS (sort descending).  

  • 3) Are emergency admissions more costly than elective/urgent ones?

    Why: ER resourcing and cost monitoring.

    Required columns: Admission Type, Billing Amount.  

  • DAX measures  :  

  • Total Billing = SUM('Patients'[Billing Amount])

    Avg Billing = AVERAGE('Patients'[Billing Amount])


    Avg Billing - Emergency = 

    CALCULATE( [Avg Billing], FILTER('Patients', 'Patients'[AdmissionType] = "Emergency") )


    Avg Billing - Elective = 

    CALCULATE( [Avg Billing], FILTER('Patients', 'Patients'[AdmissionType] = "Elective") )


    Pct Emergency Admissions = 

    DIVIDE( CALCULATE(COUNTROWS('Patients'), 'Patients'[AdmissionType] = "Emergency"), [Total Admissions] )   

  • Visuals

    • Clustered bar: AdmissionType on axis, Avg Billing on values.

    • KPI card: % Emergency Admissions.   

  • 4) Which hospitals / doctors generate the highest billing?

    Why: vendor negotiation, cost control.

    Required columns: Hospital, Doctor, Billing Amount.  

  • DAX measures  : 

  • Total Billing = SUM('Patients'[Billing Amount])  

    Visuals

    • Bar chart: Hospital vs Total Billing (Top N filter = top 10).

    • Table: Doctor, Count of Admissions, Total Billing, Avg LOS.    

  • 5) Are older patients more expensive / longer stays?

    Why: geriatric care planning.

    Required columns: Age, Billing Amount, LengthOfStay.  

  • Create AgeGroup (Add Column → Custom Column):  

  • if [Age] < 18 then "Child" 

    else if [Age] <= 35 then "Young Adult" 

    else if [Age] <= 60 then "Adult" 

    else "Senior"  

  • DAX measures :  

    Avg LOS by AgeGroup = AVERAGE('Patients'[LengthOfStay])

  • Avg Billing by AgeGroup = AVERAGE('Patients'[Billing Amount])  

  • Visuals

    • Bar charts: AgeGroup vs Avg Billing, AgeGroup vs Avg LOS.

    • Scatter: Age (X) vs Billing Amount (Y), add trendline (Analytics pane).   

  • 6) Insurance provider analysis — who has the highest average claim?

    Why: negotiate rates; check claim patterns.

    Required columns: Insurance Provider, Billing Amount.   

  • Map insurers to categories (Public/Private) if needed:

    • Add Column → Conditional Column:

      • If Insurance Provider contains "Medicare" → "Public" else "Private".    

    • Total Billing = SUM('Patients'[Billing Amount])
    • Avg Billing = AVERAGE('Patients'[Billing Amount])
    • Claims Count = COUNTROWS('Patients')  
  • Visuals

    • Bar chart: Insurance Provider vs Avg Billing.  

  • 7) Bed occupancy & capacity planning (occupied beds per day)

    Why: operational planning and surge readiness.

    Required columns: Date of Admission, Discharge Date and Date table.   

  •  DAX measure (occupied beds on a date context) 

    Occupied Beds = 

    VAR CurrentDate = MAX('Date'[Date])

    RETURN

    CALCULATE(

        COUNTROWS('Patients'),

        FILTER(ALL('Patients'),

            'Patients'[Date of Admission] <= CurrentDate &&

            'Patients'[Discharge Date] >= CurrentDate

        )

    )   

  • Visuals

    • Line chart: X = Date[Date], Y = Occupied Beds.

    • Compare to capacity (card or gauge).  

  • 8) Patient segmentation (clustering): find cohorts (low cost vs high cost)

    Why: tailor management to patient types.

    Required columns: Age, LengthOfStay, Billing Amount, Medical Condition, AdmissionType.   

  • Approach

    1. Use Python visual in Power BI or export to a notebook.

    2. Standardize numeric features (scaling).

    3. Run KMeans (k = 2..4).

    4. Append cluster label back to dataset.  

  • from sklearn.preprocessing import StandardScaler
  • from sklearn.cluster import KMeans

  • X = df[['Age','LengthOfStay','Billing Amount']].dropna()
  • scaler = StandardScaler().fit_transform(X)
  • kmeans = KMeans(n_clusters=3, random_state=42).fit(scaler)
  • df['cluster'] = kmeans.labels_  

    Visuals

    • Scatter: LengthOfStay vs Billing Amount colored by cluster.

    • Cluster profile table (avg age, avg billing per cluster).  


  • टिप्पणी पोस्ट करा

    0 टिप्पण्या