
Manipulating Data with DataFrames

A DataFrame in Pandas is a table-like data structure for organizing and working with data, much like a spreadsheet in Excel.

A DataFrame is a two-dimensional structure with rows and columns, in which each column is a Series and all columns share the same row index.
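
To make the relationship between a DataFrame and its Series concrete, here is a minimal sketch. The variable names (items, sales, table) and the sample values are illustrative assumptions, not part of the lesson's data.

```python
import pandas as pd

# Two Series sharing the same default index (0, 1, 2); values are illustrative
items = pd.Series(['Apple', 'Banana', 'Grape'], name='Item')
sales = pd.Series([1000, 2000, 3000], name='Sales')

# Combine the Series side by side (axis=1) into a single DataFrame
table = pd.concat([items, sales], axis=1)

print(table.shape)           # (number of rows, number of columns)
print(type(table['Sales']))  # selecting one column gives back a Series
```

Each Series name becomes a column label, which is why the columns can later be selected as table['Item'] and table['Sales'].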

Below is a simple code example that creates a DataFrame containing items and sales data and demonstrates data manipulation.

Data Manipulation Example
import pandas as pd

# Create DataFrame
data_frame = pd.DataFrame({
    'Item': ['Apple', 'Banana', 'Strawberry', 'Grape'],
    'Sales': [1000, 2000, 1500, 3000]
})

# Select a specific column (returns a Series)
sales = data_frame['Sales']
print(sales)

# Filter rows based on a condition (returns a new DataFrame)
filtered_data = data_frame[data_frame['Sales'] > 1500]
print(filtered_data)

# Sort rows by 'Sales', largest first (returns a new DataFrame)
sorted_data = data_frame.sort_values(by='Sales', ascending=False)
print(sorted_data)

  1. sales = data_frame['Sales'] selects only the 'Sales' column from the DataFrame and returns it as a Series.
Output of print(sales)
0    1000
1    2000
2    1500
3    3000
Name: Sales, dtype: int64

  2. filtered_data = data_frame[data_frame['Sales'] > 1500] creates a new DataFrame containing only the rows where the 'Sales' value is greater than 1500.
Output of print(filtered_data)
     Item  Sales
1  Banana   2000
3   Grape   3000

  3. sorted_data = data_frame.sort_values(by='Sales', ascending=False) sorts the DataFrame in descending order based on the 'Sales' column.
Output of print(sorted_data)
         Item  Sales
3       Grape   3000
1      Banana   2000
2  Strawberry   1500
0       Apple   1000
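
The filtering and sorting steps above can be unpacked a little further. The sketch below reuses the lesson's data; the intermediate variable mask is a name introduced here for illustration. It shows that the condition inside the brackets is itself a Series of True/False values, and that sort_values leaves the original DataFrame untouched.

```python
import pandas as pd

data_frame = pd.DataFrame({
    'Item': ['Apple', 'Banana', 'Strawberry', 'Grape'],
    'Sales': [1000, 2000, 1500, 3000]
})

# The comparison yields a boolean Series: one True/False per row
mask = data_frame['Sales'] > 1500
print(mask.tolist())  # [False, True, False, True]

# Indexing with the mask keeps only the rows marked True
filtered_data = data_frame[mask]
print(filtered_data['Item'].tolist())  # ['Banana', 'Grape']

# sort_values returns a new DataFrame; the original row order is unchanged
sorted_data = data_frame.sort_values(by='Sales', ascending=False)
print(data_frame['Item'].tolist())   # ['Apple', 'Banana', 'Strawberry', 'Grape']
print(sorted_data['Item'].tolist())  # ['Grape', 'Banana', 'Strawberry', 'Apple']
```

Note that the filtered and sorted results keep the original row labels (1 and 3, or 3, 1, 2, 0), which is why the index column in the printed outputs is not renumbered.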

Calculating Maximum, Minimum, and Average Values

The following methods calculate the maximum, minimum, and average values of a specific DataFrame column:

  • max(): Maximum value
  • min(): Minimum value
  • mean(): Average value

Below is an example that calculates the maximum, minimum, and average values of the 'Sales' column.


Calculating Maximum, Minimum, and Average Values
import pandas as pd

data_frame = pd.DataFrame({
    'Item': ['Apple', 'Banana', 'Strawberry', 'Grape'],
    'Sales': [1000, 2000, 1500, 3000]
})

# Maximum value
max_sales = data_frame['Sales'].max()
# Output: Maximum value: 3000
print(f'Maximum value: {max_sales}')

# Minimum value
min_sales = data_frame['Sales'].min()
# Output: Minimum value: 1000
print(f'Minimum value: {min_sales}')

# Average value
mean_sales = data_frame['Sales'].mean()
# Output: Average value: 1875.0
print(f'Average value: {mean_sales}')
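
As a possible shortcut, the three statistics can also be requested in a single call. This is a minimal sketch using the same data; agg and idxmax are standard pandas Series methods, while the variable names summary and top_row are assumptions made for illustration.

```python
import pandas as pd

data_frame = pd.DataFrame({
    'Item': ['Apple', 'Banana', 'Strawberry', 'Grape'],
    'Sales': [1000, 2000, 1500, 3000]
})

# Compute several statistics at once; the result is a Series
# whose index holds the statistic names
summary = data_frame['Sales'].agg(['max', 'min', 'mean'])
print(summary)

# idxmax returns the row label of the largest value,
# which can be used to look up the matching item
top_row = data_frame.loc[data_frame['Sales'].idxmax()]
print(top_row['Item'])  # Grape
```

Calling max(), min(), and mean() separately, as in the lesson, is equally valid; agg is mainly convenient when several statistics are needed together.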

With Pandas, tasks such as selecting columns, filtering rows, sorting, and calculating summary values each take only a line or two of code.

