Calculate Summary Statistics with Pandas

How can you calculate the mean, standard deviation, and other statistics of a large dataset all at once?

Writing a separate calculation for each statistic quickly becomes tedious.

The describe() method of a DataFrame computes the most common summary statistics in a single call: the number of entries, mean, standard deviation, minimum, quartiles, and maximum.

Calculate Summary Statistics
import pandas as pd

data_frame = pd.DataFrame({
    'Item': ['Apple', 'Banana', 'Strawberry', 'Grapes'],
    'Sales': [1000, 2000, 1500, 3000]
})

# Calculate summary statistics
summary_stats = data_frame.describe()
print(summary_stats)

data_frame.describe() returns a new DataFrame holding the summary statistics (count, mean, standard deviation, minimum, quartiles, and maximum) for each numeric column. The text column Item is left out by default.
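
To summarize the Item column as well, describe() accepts an include parameter. The following is a minimal sketch, assuming the same data as above; the variable names numeric_summary and full_summary are only illustrative.

```python
import pandas as pd

data_frame = pd.DataFrame({
    'Item': ['Apple', 'Banana', 'Strawberry', 'Grapes'],
    'Sales': [1000, 2000, 1500, 3000]
})

# Default behavior: only the numeric 'Sales' column is summarized
numeric_summary = data_frame.describe()
print(list(numeric_summary.columns))

# include='all' also summarizes the text column 'Item'
full_summary = data_frame.describe(include='all')
print(list(full_summary.columns))

# Text columns get rows such as 'unique', 'top', and 'freq'
print(full_summary.loc['unique', 'Item'])
```

With include='all', statistics that do not apply to a column (for example, the mean of Item) appear as NaN.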

describe() Method Output
             Sales
count     4.000000
mean   1875.000000
std     853.912564
min    1000.000000
25%    1375.000000
50%    1750.000000
75%    2250.000000
max    3000.000000

The meanings of each term are as follows:

count: Number of non-missing entries
mean: Arithmetic mean
std: Sample standard deviation
min: Minimum value
25%, 50%, 75%: Percentiles (the 50% value is the median)
max: Maximum value
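
Each row of the describe() output corresponds to a method that can also be called on its own. The sketch below, which assumes the same Sales values as above, recomputes the rows one by one. Note that std() uses the sample standard deviation (ddof=1) and quantile() interpolates linearly between data points by default.

```python
import pandas as pd

sales = pd.Series([1000, 2000, 1500, 3000], name='Sales')

# Each call corresponds to one row of sales.describe()
print(sales.count())                      # number of non-missing entries
print(sales.mean())                       # arithmetic mean
print(sales.std())                        # sample standard deviation (ddof=1)
print(sales.min())                        # minimum value
print(sales.quantile([0.25, 0.5, 0.75]))  # quartiles; 0.5 is the median
print(sales.max())                        # maximum value

# describe() bundles the same results into a single Series
summary = sales.describe()
print(summary)
```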

Handling Missing Values

Missing values are entries in a dataset where data is absent. Pandas displays them as NaN or None.

Pandas provides methods to detect, replace, and remove missing values, such as isnull(), fillna(), and dropna().

Handling Missing Values Example
import pandas as pd

data_frame = pd.DataFrame({
    'Item': ['Apple', 'Banana', 'Strawberry', None],
    'Sales': [1000, 2000, 1500, None]
})

# Check for missing values
missing_values = data_frame.isnull()

# Replace missing values with 0
data_frame_filled = data_frame.fillna(0)

print(data_frame_filled)

Missing Values Replacement Result
         Item   Sales
0       Apple  1000.0
1      Banana  2000.0
2  Strawberry  1500.0
3           0     0.0

Code Explanation

data_frame.isnull() returns a DataFrame of the same shape in which True marks each missing value.
data_frame.fillna(0) returns a new DataFrame where missing values are replaced with 0; the original data_frame is left unchanged.
Instead of data_frame.fillna(0), you can use data_frame.dropna() to remove rows containing missing values.
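
Filling every column with the same value is not always appropriate: in the result above, the missing Item name also became 0. One possible alternative is to pass fillna() a dictionary of per-column values. The sketch below assumes the same data as the example; the label 'Unknown' and the choice of the column mean are illustrative assumptions, not recommendations from this lesson.

```python
import pandas as pd

data_frame = pd.DataFrame({
    'Item': ['Apple', 'Banana', 'Strawberry', None],
    'Sales': [1000, 2000, 1500, None]
})

# Count the missing values in each column
missing_counts = data_frame.isnull().sum()
print(missing_counts)

# Fill each column with its own replacement value
filled = data_frame.fillna({
    'Item': 'Unknown',                   # placeholder label (assumption)
    'Sales': data_frame['Sales'].mean()  # mean of the non-missing sales
})
print(filled)

# Alternatively, drop every row that contains a missing value
dropped = data_frame.dropna()
print(dropped)
```

Whether to fill or drop depends on the analysis: filling keeps every row but introduces estimated values, while dropping keeps only the rows that were fully observed.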
