
Introduction to scipy.stats

The scipy.stats module is SciPy's subpackage for statistics.

It provides tools for statistical analysis, including probability distributions, statistical tests, and summary statistics.


Setting Up

First, import the required modules:

Import NumPy and SciPy Stats
import numpy as np
from scipy import stats

Example 1: Summary Statistics

You can compute summary statistics such as the mean, median, and mode. NumPy already provides the mean and median, and scipy.stats adds functions such as stats.mode.

Summary Statistics
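data = [5, 7, 8, 7, 2, 17, 2, 9, 4, 11]

# Mean and median come from NumPy
mean = np.mean(data)
median = np.median(data)

# stats.mode returns a result with .mode and .count attributes;
# keepdims=True keeps them as 1-element arrays, hence the [0] indexing
mode = stats.mode(data, keepdims=True)

print("Mean:", mean)
print("Median:", median)
print("Mode:", mode.mode[0], "Frequency:", mode.count[0])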

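In this example, the mean is 7.2 and the median is 7. The values 2 and 7 each appear twice. When several values tie for most frequent, stats.mode returns the smallest one, so the reported mode is 2 with a frequency of 2.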
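If you want several summary statistics at once, scipy.stats also offers stats.describe. The sketch below applies it to the same dataset. It assumes the library defaults: the variance is the sample variance (ddof=1) and the kurtosis is Fisher's excess kurtosis.

```python
import numpy as np
from scipy import stats

data = [5, 7, 8, 7, 2, 17, 2, 9, 4, 11]

# describe() returns nobs, (min, max), mean, variance, skewness, kurtosis
summary = stats.describe(data)

print("Count:", summary.nobs)
print("Min, max:", summary.minmax)
print("Mean:", summary.mean)
print("Variance:", summary.variance)  # sample variance (ddof=1) by default
print("Skewness:", summary.skewness)
print("Kurtosis:", summary.kurtosis)  # excess kurtosis (normal -> 0) by default

# The variance agrees with NumPy's sample variance
print("Same as np.var(ddof=1):", np.isclose(summary.variance, np.var(data, ddof=1)))
```

The mean reported here matches np.mean from the example above, so describe is mainly a convenience when you need the other moments too.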


Example 2: Hypothesis Testing

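A one-sample t-test checks whether the mean of the population a sample came from differs from a hypothesized value. In scipy.stats, this test is stats.ttest_1samp.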

One-Sample t-Test
# Test if the mean of data is significantly different from 5
t_stat, p_value = stats.ttest_1samp(data, 5)

print("t-statistic:", t_stat)
print("p-value:", p_value)

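The null hypothesis is that the population mean equals 5. If the p-value is below the chosen significance level (conventionally 0.05), we reject the null hypothesis and conclude that the mean differs significantly from 5. Otherwise, the data do not provide enough evidence to reject it.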
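To see what ttest_1samp computes, the sketch below rebuilds the t-statistic from its formula, t = (sample mean - mu0) / (s / sqrt(n)), where s is the sample standard deviation. It then derives the two-sided p-value from the t distribution with n - 1 degrees of freedom. The names mu0 and alpha are illustrative, and the 0.05 level is a convention, not a rule.

```python
import numpy as np
from scipy import stats

data = np.array([5, 7, 8, 7, 2, 17, 2, 9, 4, 11])
mu0 = 5        # hypothesized population mean
alpha = 0.05   # conventional significance level (illustrative choice)

# t = (sample mean - mu0) / (sample std / sqrt(n)), using ddof=1
n = data.size
se = data.std(ddof=1) / np.sqrt(n)
t_manual = (data.mean() - mu0) / se

# Two-sided p-value: twice the upper-tail area beyond |t| with n - 1 df
p_manual = 2 * stats.t.sf(abs(t_manual), df=n - 1)

result = stats.ttest_1samp(data, mu0)
print("manual t, p:", t_manual, p_manual)
print("scipy  t, p:", result.statistic, result.pvalue)

# Decision rule described above
if result.pvalue < alpha:
    print("Reject H0: the mean differs from", mu0)
else:
    print("Fail to reject H0 at alpha =", alpha)
```

For this dataset the t-statistic is about 1.54 and the p-value is above 0.05, so the test does not reject the null hypothesis even though the sample mean (7.2) is larger than 5.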


Example 3: Probability Distributions

You can use scipy.stats to evaluate the probability density function (PDF) of a normal distribution.

Normal Distribution PDF
x = np.linspace(-3, 3, 100)
pdf = stats.norm.pdf(x, loc=0, scale=1)

print("First 5 PDF values:", pdf[:5])

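In this example, we evaluate the PDF of the standard normal distribution (mean 0, standard deviation 1) at 100 evenly spaced points between -3 and 3. In stats.norm, loc sets the mean and scale sets the standard deviation.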
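Distributions in scipy.stats offer more than pdf. The sketch below "freezes" a standard normal with fixed loc and scale so they need not be repeated. It checks pdf against the closed-form density exp(-x^2 / 2) / sqrt(2*pi) and then calls cdf, ppf, and rvs on the same object. The variable name std_normal and the seed are illustrative choices.

```python
import numpy as np
from scipy import stats

# Freeze the distribution once so loc and scale need not be repeated
std_normal = stats.norm(loc=0, scale=1)

x = np.linspace(-3, 3, 100)

# Closed-form standard normal density: exp(-x^2 / 2) / sqrt(2*pi)
pdf_formula = np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)
print("Matches formula:", np.allclose(std_normal.pdf(x), pdf_formula))

# Cumulative probability, its inverse, and random sampling
print("P(X <= 1.96):", std_normal.cdf(1.96))          # about 0.975
print("97.5th percentile:", std_normal.ppf(0.975))    # about 1.96
print("Random draws:", std_normal.rvs(size=3, random_state=0))
```

ppf is the inverse of cdf, which is why the two printed values mirror each other.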


Key Takeaways

scipy.stats is the go-to module for statistical analysis in Python. It provides tools for:

  • Summary statistics
  • Hypothesis testing
  • Probability distributions
