
Pandas, a Python Library for Data Processing

When data is organized along dimensions, such as sales by item or customer visits over time, it is typically represented as a table of rows and columns.

Pandas is one of the most widely used libraries in Python for handling tabular data.

With Pandas, you can load and save data, filter and sort it, and run statistical analysis on it.
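As a rough illustration of those tasks, here is a minimal sketch. The sales figures are hypothetical sample data, and the CSV text is written to an in-memory buffer instead of a file so the example runs anywhere. It uses a DataFrame, the table structure introduced later on this page.

```python
import io
import pandas as pd

# Hypothetical sales records, used only for illustration
sales = pd.DataFrame({
    'Item': ['Apple', 'Banana', 'Strawberry', 'Grapes'],
    'Sales': [1000, 2000, 1500, 3000],
})

# Saving and loading: write CSV text to an in-memory buffer, then read it back
buffer = io.StringIO()
sales.to_csv(buffer, index=False)
buffer.seek(0)
loaded = pd.read_csv(buffer)

# Filtering: keep rows whose Sales value is at least 1500
high_sales = loaded[loaded['Sales'] >= 1500]

# Sorting: order those rows from highest to lowest Sales
ranked = high_sales.sort_values('Sales', ascending=False)

# Statistics: average of the whole Sales column
average_sales = loaded['Sales'].mean()

print(ranked)
print(average_sales)  # 1875.0
```

With a real file, the same calls would take a file path in place of the buffer.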


The Two Key Data Structures in Pandas

The core data structures in Pandas are Series and DataFrame.


1. Series

A Series is a one-dimensional data structure similar to a column in an Excel spreadsheet.

Values are stored in order, much like the elements of a Python list.

Each value is paired with an index label that identifies it. By default, the labels are the integer positions 0, 1, 2, and so on, and you can use a label to access its value.

Example of Creating a Series
import pandas as pd

# Creating a series
data_series = pd.Series([10, 20, 30, 40])

print(data_series)
# Output
# 0    10
# 1    20
# 2    30
# 3    40
# dtype: int64
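
Accessing values through the index can be sketched as follows. The labels 'a' through 'd' in the second Series are hypothetical, chosen only to show that an index does not have to be made of integers.

```python
import pandas as pd

data_series = pd.Series([10, 20, 30, 40])

# Default index: labels are the integer positions 0, 1, 2, 3
second_value = data_series[1]
print(second_value)  # 20

# Custom index: the labels 'a'..'d' are hypothetical, chosen for illustration
labeled_series = pd.Series([10, 20, 30, 40], index=['a', 'b', 'c', 'd'])
print(labeled_series['c'])      # 30  (access by label)
print(labeled_series.loc['c'])  # 30  (explicit label-based access)
print(labeled_series.iloc[2])   # 30  (explicit position-based access)
```

Using .loc for labels and .iloc for positions makes the intent explicit, which helps once the labels are no longer the same as the positions.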

2. DataFrame

A DataFrame is a two-dimensional data structure made up of one or more Series that share the same index.

It has both rows and columns, and each column can have different data types.

Its structure is similar to that of an Excel sheet (spreadsheet).

Example of Creating a DataFrame
import pandas as pd

# Creating a DataFrame of sales by item
data_frame = pd.DataFrame({
    'Item': ['Apple', 'Banana', 'Strawberry', 'Grapes'],
    'Sales': [1000, 2000, 1500, 3000]
})

print(data_frame)
# Output
#          Item  Sales
# 0       Apple   1000
# 1      Banana   2000
# 2  Strawberry   1500
# 3      Grapes   3000

In the code example above, a DataFrame is created with the columns Item and Sales.

For instance, the entry 'Item': ['Apple', 'Banana', 'Strawberry', 'Grapes'] becomes a column named Item. Each column is a Series, much like a column in an Excel spreadsheet, and the columns together form the DataFrame.
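
The relationship between columns and Series can be checked directly. The sketch below reuses the same sample data. The variable names sales_column, items, sales, and combined are introduced here only for illustration.

```python
import pandas as pd

data_frame = pd.DataFrame({
    'Item': ['Apple', 'Banana', 'Strawberry', 'Grapes'],
    'Sales': [1000, 2000, 1500, 3000]
})

# Selecting a single column returns a Series
sales_column = data_frame['Sales']
print(type(sales_column).__name__)  # Series
print(sales_column.dtype)           # int64

# Each column keeps its own data type (text for Item, integers for Sales)
print(data_frame.dtypes)

# Building the same table from two separately created Series
items = pd.Series(['Apple', 'Banana', 'Strawberry', 'Grapes'])
sales = pd.Series([1000, 2000, 1500, 3000])
combined = pd.DataFrame({'Item': items, 'Sales': sales})
print(combined.equals(data_frame))  # True
```

Because both Series use the default index 0 to 3, their values line up row by row when they are combined.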
