Python Library for Data Processing, Pandas
When dealing with data structured around axes, such as sales by item or customer inflow by time, the data is typically represented in a tabular format consisting of rows and columns.
Pandas is one of the most widely used libraries in Python for handling tabular data.
With Pandas, you can handle tasks ranging from basic loading and saving of data to filtering, sorting, and statistical analysis.
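The tasks listed above can be sketched in a few lines. This is a minimal illustration, not a complete workflow: the CSV text, the threshold of 1200, and the variable names are assumptions made up for the example, and an in-memory string stands in for a real file.

```python
import io
import pandas as pd

# Hypothetical CSV content standing in for a file on disk
csv_text = "Item,Sales\nApple,1000\nBanana,2000\nStrawberry,1500\nGrapes,3000\n"

# Load: read_csv accepts a file path or a file-like object
sales = pd.read_csv(io.StringIO(csv_text))

# Filter: keep rows whose Sales value exceeds 1200 (arbitrary threshold)
high_sales = sales[sales["Sales"] > 1200]

# Sort: order the filtered rows from highest to lowest sales
sorted_sales = high_sales.sort_values("Sales", ascending=False)

# Statistics: average of the Sales column over all rows
mean_sales = sales["Sales"].mean()

# Save: to_csv returns the CSV text as a string when no path is given
saved_text = sorted_sales.to_csv(index=False)

print(sorted_sales)
print(mean_sales)
```

Passing a file name such as a hypothetical 'sales.csv' to read_csv and to_csv would read from and write to disk instead.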
The Two Key Data Structures in Pandas
The core data structures in Pandas are Series and DataFrame.
1. Series
A Series is a one-dimensional data structure similar to a column in an Excel spreadsheet.
Data is sequentially ordered, similar to a Python list (array).
Each data point has an index label (by default, its integer position starting at 0), and you can access the data using this index.
import pandas as pd
# Creating a series
data_series = pd.Series([10, 20, 30, 40])
print(data_series)
# Output
# 0    10
# 1    20
# 2    30
# 3    40
# dtype: int64
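Accessing values through the index might look like the sketch below. The string labels 'a' through 'd' are hypothetical and chosen only to show that an index does not have to be numeric.

```python
import pandas as pd

# Default index: integer labels 0, 1, 2, 3
data_series = pd.Series([10, 20, 30, 40])
second_value = data_series[1]           # label 1 -> 20

# Custom index: hypothetical string labels chosen for illustration
labeled_series = pd.Series([10, 20, 30, 40], index=["a", "b", "c", "d"])
by_label = labeled_series.loc["c"]      # access by label -> 30
by_position = labeled_series.iloc[0]    # access by position -> 10

print(second_value, by_label, by_position)
```

Here .loc looks a value up by its label, while .iloc looks it up by its position, which keeps the two kinds of access unambiguous.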
2. DataFrame
A DataFrame is a two-dimensional data structure consisting of multiple Series.
It has both rows and columns, and each column can hold a different data type.
Its structure is similar to that of an Excel sheet (spreadsheet).
import pandas as pd
# Creating a DataFrame of sales by item
data_frame = pd.DataFrame({
'Item': ['Apple', 'Banana', 'Strawberry', 'Grapes'],
'Sales': [1000, 2000, 1500, 3000]
})
print(data_frame)
# Output
#          Item  Sales
# 0       Apple   1000
# 1      Banana   2000
# 2  Strawberry   1500
# 3      Grapes   3000
In the code example above, a DataFrame is created with the columns Item and Sales.
For instance, the entry 'Item': ['Apple', 'Banana', 'Strawberry', 'Grapes'] becomes a Series, similar to a column in an Excel spreadsheet, and these Series are combined to form the DataFrame.
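One way to check the relationship between a DataFrame and its columns is to pull a column back out and inspect it. The sketch below reuses the same item and sales values; the names item_column and rebuilt are introduced only for this illustration.

```python
import pandas as pd

data_frame = pd.DataFrame({
    'Item': ['Apple', 'Banana', 'Strawberry', 'Grapes'],
    'Sales': [1000, 2000, 1500, 3000]
})

# Selecting a single column returns a Series
item_column = data_frame['Item']
print(type(item_column))

# Each column keeps its own data type
print(data_frame.dtypes)

# Building the same table from explicit Series objects
rebuilt = pd.DataFrame({
    'Item': pd.Series(['Apple', 'Banana', 'Strawberry', 'Grapes']),
    'Sales': pd.Series([1000, 2000, 1500, 3000])
})
print(rebuilt.equals(data_frame))
```

The final comparison should confirm that assembling the table from individual Series gives the same DataFrame as passing plain lists.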