Distribution Plots (histplot, kdeplot)
Visualizing data distributions helps you understand how your data is spread, detect patterns, and identify potential outliers.
Seaborn provides two main tools for this:
- histplot(): shows the frequency distribution of a dataset.
- kdeplot(): shows the probability density function (smoothed distribution curve).
Using histplot()
The histplot() function creates a histogram that shows how many data points fall into each range (bin).
Basic Histogram
import seaborn as sns
import matplotlib.pyplot as plt
tips = sns.load_dataset("tips")
sns.histplot(data=tips, x="total_bill")
plt.title("Distribution of Total Bills")
plt.show()
Key points:
- x specifies the variable to plot.
- The plot is divided into bins (intervals) along the X-axis.
- The height of each bar shows how many observations fall into that bin.
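To make the idea of bins concrete, here is a minimal sketch of the counting behind a histogram, written in plain Python. The helper name bin_counts and the sample values are made up for illustration. This is not how seaborn is implemented, and seaborn chooses the number of bins automatically unless you tell it otherwise.

```python
# Illustrative sketch only: bin_counts is a hypothetical helper,
# not part of seaborn, and the data below is made up.
def bin_counts(values, n_bins):
    """Count how many values fall into each of n_bins equal-width bins.

    Assumes the values are not all identical (the bin width would be zero).
    """
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in values:
        # A value equal to the maximum is placed in the last bin.
        idx = min(int((v - lo) / width), n_bins - 1)
        counts[idx] += 1
    edges = [lo + i * width for i in range(n_bins + 1)]
    return edges, counts


bills = [10.0, 12.5, 14.0, 18.0, 21.0, 24.5, 30.0]
edges, counts = bin_counts(bills, n_bins=4)

# Each (left, right) pair is one bin; the count is the bar height.
for left, right, count in zip(edges, edges[1:], counts):
    print(f"{left:5.1f} to {right:5.1f}: {count} observation(s)")
```

In seaborn itself, the bins and binwidth parameters of histplot() control how the X-axis is divided.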
Using kdeplot()
The kdeplot() function displays a smooth curve representing the estimated probability density of the data.
Basic KDE Plot
sns.kdeplot(data=tips, x="total_bill")
plt.title("KDE of Total Bills")
plt.show()
Key points:
- KDE stands for Kernel Density Estimate (a smoothed version of the histogram).
- Good for showing the overall shape of continuous data.
- Can be combined with histplot() for more context.
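For readers curious about what "kernel density estimate" means in practice, the sketch below places a small Gaussian bump on every data point and averages the bumps. The function name gaussian_kde_at, the sample values, and the hand-picked bandwidth of 3.0 are assumptions for illustration. Seaborn selects a bandwidth automatically and its internals differ from this sketch.

```python
import math


# Illustrative sketch only: gaussian_kde_at is a hypothetical helper,
# and the bandwidth is chosen by hand rather than estimated from the data.
def gaussian_kde_at(x, samples, bandwidth):
    """Estimate the density at x as the average of Gaussian bumps,
    one centred on each sample."""
    total = 0.0
    for s in samples:
        z = (x - s) / bandwidth
        total += math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)
    return total / (len(samples) * bandwidth)


bills = [10.0, 12.5, 14.0, 18.0, 21.0, 24.5, 30.0]  # made-up values

# Evaluate the curve on a grid from -10 to 60 in steps of 0.5.
step = 0.5
grid = [i * step for i in range(-20, 121)]
density = [gaussian_kde_at(x, bills, bandwidth=3.0) for x in grid]

# A density curve should enclose an area of roughly 1.
area = sum(density) * step
print(f"area under the curve: {area:.3f}")
```

A larger bandwidth gives a smoother, flatter curve and a smaller one follows the data more closely. In kdeplot(), the bw_adjust parameter scales the automatically chosen bandwidth up or down.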
Combining Histogram and KDE
You can combine both in a single histplot() call by setting kde=True:
Histogram with KDE Overlay
sns.histplot(data=tips, x="total_bill", kde=True)
plt.title("Total Bill Distribution with KDE")
plt.show()
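One detail of the overlay is worth spelling out. A density curve encloses an area of 1, while a count histogram encloses an area of (number of observations) × (bin width), so the curve has to be rescaled before both fit on one Y-axis. The sketch below shows only that arithmetic. The helper, the data, the bin width, and the bandwidth are all made up, and it should be read as an explanation of the idea and not as seaborn's code.

```python
import math


# Illustrative sketch only: hypothetical helper and made-up numbers.
def gaussian_kde_at(x, samples, bandwidth):
    """Average of Gaussian bumps centred on each sample (area 1 overall)."""
    bumps = (
        math.exp(-0.5 * ((x - s) / bandwidth) ** 2) / math.sqrt(2 * math.pi)
        for s in samples
    )
    return sum(bumps) / (len(samples) * bandwidth)


bills = [10.0, 12.5, 14.0, 18.0, 21.0, 24.5, 30.0]
bin_width = 5.0  # assumed histogram bin width

# Area of a count histogram: every observation adds one bar unit of
# height across one bin width.
scale = len(bills) * bin_width

step = 0.5
grid = [i * step for i in range(-20, 121)]
scaled_curve = [scale * gaussian_kde_at(x, bills, 3.0) for x in grid]

# After rescaling, the curve encloses about the same area as the bars.
curve_area = sum(scaled_curve) * step
print(f"histogram area: {scale:.1f}, scaled curve area: {curve_area:.1f}")
```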
In the next Jupyter Notebook, you will experiment with:
- Changing bin sizes in histograms.
- Adding hue categories to compare groups.
- Styling KDE plots for clarity.