Introduction to scipy.stats
The scipy.stats module is one of the most powerful and versatile parts of SciPy.
It provides a full suite of tools for statistical analysis, including probability distributions, hypothesis tests, and summary (descriptive) statistics.
Setting Up
First, import the required modules:
import numpy as np
from scipy import stats
Example 1: Summary Statistics
You can calculate key descriptive statistics such as the mean, median, and mode in just a few lines of code. NumPy provides the mean and median, while scipy.stats provides the mode.
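data = [5, 7, 8, 7, 2, 17, 2, 9, 4, 11]
mean = np.mean(data)      # arithmetic mean (NumPy)
median = np.median(data)  # middle value of the sorted data (NumPy)
# keepdims=True (SciPy 1.9+) returns 1-element arrays, so the [0] indexing below works
mode = stats.mode(data, keepdims=True)
print("Mean:", mean)
print("Median:", median)
print("Mode:", mode.mode[0], "Frequency:", mode.count[0])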
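In this example, NumPy computes the mean (7.2) and the median (7.0), and stats.mode returns the most frequent value together with how often it occurs. Here both 2 and 7 appear twice; when several values tie, stats.mode reports only one of them (the smallest, 2).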
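Beyond the mean, median, and mode, scipy.stats can return several summary statistics at once. The sketch below reuses the same data list with stats.describe, stats.sem, and stats.iqr; note that describe reports the sample variance (ddof=1) by default, and sem likewise uses the sample standard deviation.

```python
import numpy as np
from scipy import stats

data = [5, 7, 8, 7, 2, 17, 2, 9, 4, 11]

# describe() returns count, (min, max), mean, variance, skewness, and kurtosis
desc = stats.describe(data)
print("Count:", desc.nobs)
print("Min/Max:", desc.minmax)
print("Mean:", desc.mean)
print("Variance (ddof=1):", desc.variance)

# Standard error of the mean: sample standard deviation / sqrt(n)
sem = stats.sem(data)
print("Standard error:", sem)

# Interquartile range: spread of the middle 50% of the data
iqr = stats.iqr(data)
print("IQR:", iqr)
```

The variance here is 20.4, so the standard error is sqrt(20.4 / 10), roughly 1.43.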
Example 2: Hypothesis Testing
You can also use scipy.stats to run hypothesis tests. For example, a one-sample t-test checks whether the mean of a sample differs significantly from a hypothesized population mean.
# Test if the mean of data is significantly different from 5
t_stat, p_value = stats.ttest_1samp(data, 5)
print("t-statistic:", t_stat)
print("p-value:", p_value)
If the p-value is below the chosen significance level (commonly 0.05), we reject the null hypothesis that the population mean equals 5. Otherwise, the data do not provide enough evidence that the mean differs from 5.
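To see what ttest_1samp computes, the sketch below recomputes the statistic by hand as t = (sample mean - 5) / (s / sqrt(n)), where s is the sample standard deviation, takes the two-sided p-value from a t distribution with n - 1 degrees of freedom, and applies the 0.05 decision rule. The names mu0 and alpha are illustrative choices, not part of the original example.

```python
import numpy as np
from scipy import stats

data = [5, 7, 8, 7, 2, 17, 2, 9, 4, 11]
mu0 = 5        # hypothesized population mean (illustrative name)
alpha = 0.05   # significance level: a common convention, not a fixed rule

x = np.asarray(data, dtype=float)
n = x.size
s = x.std(ddof=1)  # sample standard deviation
t_manual = (x.mean() - mu0) / (s / np.sqrt(n))

# Two-sided p-value: probability under H0 of a |t| at least this large
p_manual = 2 * stats.t.sf(abs(t_manual), df=n - 1)

result = stats.ttest_1samp(data, mu0)
print("manual t:", t_manual, "scipy t:", result.statistic)
print("manual p:", p_manual, "scipy p:", result.pvalue)

if p_manual < alpha:
    print("Reject H0: the mean appears to differ from", mu0)
else:
    print("Fail to reject H0 at alpha =", alpha)
```

For this data t is about 1.54, below the two-sided critical value of roughly 2.26 for 9 degrees of freedom, so the test does not reject the null hypothesis at the 0.05 level.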
Example 3: Probability Distributions
scipy.stats also provides a large collection of probability distributions. For example, you can evaluate the probability density function (PDF) of a normal distribution over a range of values.
x = np.linspace(-3, 3, 100)
pdf = stats.norm.pdf(x, loc=0, scale=1)
print("First 5 PDF values:", pdf[:5])
Here, x holds 100 evenly spaced points from -3 to 3, and pdf holds the density of the standard normal distribution (mean 0, standard deviation 1) at each of those points.
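A distribution can also be "frozen" with fixed parameters and then queried for its cumulative distribution function (cdf), quantiles (ppf), and random samples (rvs). The sketch below checks the PDF against the closed-form normal density, then draws samples and refits the parameters by maximum likelihood; the seed and sample size are arbitrary choices for illustration.

```python
import numpy as np
from scipy import stats

# Freeze a standard normal: loc is the mean, scale is the standard deviation
dist = stats.norm(loc=0, scale=1)

x = np.linspace(-3, 3, 100)
pdf = dist.pdf(x)

# Closed-form density for comparison: exp(-x^2 / 2) / sqrt(2 * pi)
pdf_formula = np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)

print("P(X <= 0):", dist.cdf(0))              # half the mass lies below the mean
print("97.5th percentile:", dist.ppf(0.975))  # ppf is the inverse of the CDF

# Simulation: draw reproducible samples, then estimate loc and scale by maximum likelihood
rng = np.random.default_rng(0)
samples = dist.rvs(size=1000, random_state=rng)
loc_hat, scale_hat = stats.norm.fit(samples)
print("Estimated mean and std:", loc_hat, scale_hat)
```

Freezing avoids repeating loc and scale on every call, which keeps longer analyses less error-prone.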
Key Takeaways
scipy.stats is your go-to toolkit for statistical analysis in Python. It provides ready-made functions for:
- Summary statistics – quick descriptive insights
- Hypothesis testing – comparing groups or means
- Probability distributions – modeling and simulation