Practical Program: Data Aggregation in Python

Practical Program: Data Aggregation in Python

This program demonstrates data aggregation using Pandas to analyze a dataset. Aggregation operations such as `sum`, `mean`, `count`, and `max` is applied to group and summarize the data.

Scenario: Sales Analysis  
We have a dataset of sales transactions for multiple regions, products, and salespersons. The goal is to aggregate and analyze the data to extract insights.
Code Implementation
import pandas as pd
# Sample dataset
data = {
    'Region': ['North', 'North', 'South', 'South', 'East', 'East', 'West', 'West'],
    'Salesperson': ['Alice', 'Bob', 'Alice', 'Charlie', 'Eva', 'David', 'Frank', 'Grace'],
    'Product': ['A', 'A', 'B', 'B', 'C', 'C', 'D', 'D'],
    'Sales': [200, 300, 400, 500, 250, 450, 600, 550],
    'Units_Sold': [10, 15, 20, 25, 12, 18, 30, 28],
    'Month': ['Jan', 'Jan', 'Feb', 'Feb', 'Mar', 'Mar', 'Apr', 'Apr']
}
# Create a DataFrame
df = pd.DataFrame(data)
print("Original Data:")
print(df)
# --- 1. Group by Region and Aggregate Sales and Units Sold ---
region_agg = df.groupby('Region').agg({
    'Sales': 'sum',
    'Units_Sold': ['sum', 'mean']
})
print("\nAggregated Sales and Units Sold by Region:")
print(region_agg)
# --- 2. Group by Product and Find Total Sales and Maximum Units Sold ---
product_agg = df.groupby('Product').agg({
    'Sales': 'sum',
    'Units_Sold': 'max'
})
print("\nTotal Sales and Max Units Sold by Product:")
print(product_agg)
# --- 3. Group by Region and Product for Combined Analysis ---
region_product_agg = df.groupby(['Region', 'Product']).agg({
    'Sales': 'sum',
    'Units_Sold': 'mean'
})
print("\nSales and Average Units Sold by Region and Product:")
print(region_product_agg)
# --- 4. Aggregate Data by Salesperson ---
salesperson_agg = df.groupby('Salesperson').agg({
    'Sales': ['sum', 'max'],
    'Units_Sold': ['mean', 'count']
})
print("\nSales and Units Sold Aggregation by Salesperson:")
print(salesperson_agg)
# --- 5. Aggregate Data by Month ---
month_agg = df.groupby('Month').agg({
    'Sales': 'sum',
    'Units_Sold': 'sum'
})
print("\nMonthly Sales and Total Units Sold:")
print(month_agg)
# --- Optional: Resetting Index for Flattened Output ---
region_agg_reset = region_agg.reset_index()
print("\nFlattened Aggregated Data by Region:")
print(region_agg_reset)

Explanation of Code
1. Group by Single Column:
   - Aggregates data by `Region` or `Product` using multiple functions such as `sum` and `mean`.
2. Group by Multiple Columns:
   - Groups by both `Region` and `Product` for a detailed breakdown of sales and units sold.
3. Custom Aggregation:
   - Uses dictionaries to apply different aggregation functions to specific columns.
4. Advanced Aggregation:
   - Aggregates data by `Salesperson` and `Month` to analyze performance and monthly trends.
5. Reset Index:
   - Converts the hierarchical index into a flat data frame for easier readability.

Sample Output
Original Data:
   Region Salesperson Product  Sales      Units_Sold Month
0   North     Alice               A        200              10           Jan
1   North     Bob                 A        300              15           Jan
2   South      Alice              B        400              20           Feb
3   South      Charlie          B        500              25           Feb
4    East        Eva               C        250              12           Mar
5    East        David            C        450              18           Mar
6    West       Frank           D        600              30           Apr
7    West       Grace           D        550              28           Apr

Aggregated Sales and Units Sold by Region:
              Sales       Units_Sold           
              sum         sum       mean
Region                               
East      700            30          15.0
North     500          25          12.5
South     900          45           22.5
West     1150          58           29.0

Total Sales and Max Units Sold by Product:
                 Sales  Units_Sold
Product                    
A              500          15
B              900          25
C              700          18
D             1150          30

Sales and Average Units Sold by Region and Product:
                                       Sales      Units_Sold
Region Product                     
East           C                    700       15.0
North        A                    500       12.5
South         B                    900       22.5
West          D                   1150       29.0

Sales and Units Sold Aggregation by Salesperson:
                                Sales            Units_Sold       
                          sum  max       mean count
Salesperson                            
Alice                 600  400           15.0     2
Bob                   300  300           15.0     1
Charlie             500  500           25.0     1
David               450  450           18.0     1
Eva                   250  250           12.0     1
Frank               600  600           30.0     1
Grace               550  550           28.0     1

Monthly Sales and Total Units Sold:
                  Sales      Units_Sold
Month                  
Apr             1150             58
Feb              900              45
Jan              500              25
Mar             700              30

Flattened Aggregated Data by Region:
  Region  Sales  Units_Sold_sum  Units_Sold_mean
0   East    700               30                       15.0
1  North    500              25                       12.5
2  South    900              45                       22.5
3   West   1150              58                       29.0

Applications of Data Aggregation
  • Sales Analysis: Summarize sales by regions, products, or time periods.
  • Performance Tracking: Monitor salesperson performance.
  • Trend Analysis: Understand trends over months or quarters.
  • Inventory Management: Aggregate units sold for forecasting.
This program demonstrates essential aggregation techniques for effective data summarization and insights extraction in Python.

Comments