GroupBy and Aggregation Functions
One of the most powerful features in Pandas is the ability to group data and perform calculations on each group. This is useful when analyzing patterns across categories like sales per region, average scores per class, or revenue by product.
The groupby() method splits your data into groups based on the values in one or more columns. Once grouped, you can apply aggregation functions such as:
- sum(): total value per group
- mean(): average value per group
- count(): number of non-missing values per group (equal to the row count when the column has no missing data)
- max() / min(): highest / lowest value in each group
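As a minimal sketch of these functions in action, the snippet below groups a small table by one column and applies each aggregation in turn. The DataFrame, its column names ("Region", "Sales"), and its values are made up for illustration.

```python
import pandas as pd

# Hypothetical data, invented for this example
df = pd.DataFrame({
    "Region": ["North", "South", "North", "South", "North"],
    "Sales": [100, 200, 150, 50, 250],
})

# Split the rows by Region, then select the column to aggregate
grouped = df.groupby("Region")["Sales"]

totals = grouped.sum()      # total Sales per region
averages = grouped.mean()   # average Sales per region
counts = grouped.count()    # non-missing Sales values per region
highest = grouped.max()     # largest Sales value per region
lowest = grouped.min()      # smallest Sales value per region

print(totals)
```

Each result is a Series indexed by the group labels, so totals["North"] looks up the total for a single region.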
A Real-Life Example
Imagine you have a dataset of sales transactions from different cities. You might want to:
- Calculate total sales for each city
- Find the average transaction amount per store
- Count how many transactions happened in each region
Pandas makes this easy with just a few lines of code.
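The three questions above could be answered roughly as follows. This is only a sketch: the transactions, the column names ("City", "Store", "Region", "Amount"), and the values are assumptions, since the lesson does not specify a dataset.

```python
import pandas as pd

# Hypothetical transactions; all names and numbers are placeholders
sales = pd.DataFrame({
    "City": ["Austin", "Austin", "Boston", "Boston", "Denver"],
    "Store": ["A1", "A2", "B1", "B1", "D1"],
    "Region": ["South", "South", "East", "East", "West"],
    "Amount": [120.0, 80.0, 200.0, 100.0, 50.0],
})

# Total sales for each city
total_by_city = sales.groupby("City")["Amount"].sum()

# Average transaction amount per store
avg_by_store = sales.groupby("Store")["Amount"].mean()

# Number of transactions in each region
count_by_region = sales.groupby("Region")["Amount"].count()

print(total_by_city)
print(avg_by_store)
print(count_by_region)
```

All three lines follow the same shape: choose the grouping column, choose the column to summarize, then pick the aggregation.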
Syntax Overview
Here’s a simple pattern:
df.groupby("ColumnName")["TargetColumn"].agg("aggregation_function")
You can also use .agg() to apply multiple functions at once:
df.groupby("Category")["Amount"].agg(["sum", "mean", "count"])
This returns a table with one row per group and one column per function: a compact summary of key statistics, ideal for data summaries and dashboards.
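To see what that summary looks like, here is a small sketch using invented data (the "Category" and "Amount" values are placeholders). It also shows pandas' named-aggregation form of .agg(), which lets you choose the output column names; the names total and average are arbitrary choices for this example.

```python
import pandas as pd

# Hypothetical data, invented for this example
df = pd.DataFrame({
    "Category": ["Books", "Books", "Toys", "Toys", "Toys"],
    "Amount": [10.0, 30.0, 5.0, 15.0, 40.0],
})

# One row per category, one column per aggregation function
summary = df.groupby("Category")["Amount"].agg(["sum", "mean", "count"])
print(summary)

# Named aggregation: output_name=(source_column, function)
named = df.groupby("Category").agg(
    total=("Amount", "sum"),
    average=("Amount", "mean"),
)
print(named)
```

The list form is quick for exploration, while the named form tends to be easier to read when the result feeds into a report or dashboard.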
What’s Next?
You’ll now try grouping and summarizing data using real examples in the Jupyter Notebook.