网站使用培训方案海尔集团企业网站建设分析
Pandas 的透视表函数主要为 pivot() 和 pivot_table(),主要的功能为对 DataFrame 的行和列进行重新组合来重塑数据。
一、pivot 函数
pivot 函数只能对数据进行重塑,不能进行聚合
1. 数据准备
import pandas as pddf1 = pd.DataFrame({'department_id': [1, 2, 3, 3, 1, 1],'month': ['Jan', 'Jan', 'Jan', 'Feb', 'Feb', 'Mar'],'revenue': [8000, 9000, 10000, 7000, 8000, 6000,]
})df2 = pd.DataFrame({'department_id': [1, 1, 2, 3, 3, 1, 2],'month': ['Jan', 'Jan', 'Jan', 'Jan', 'Feb', 'Mar', 'Feb'],'revenue': [8000, 9000, 10000, 7000, 8000, 6000, 10000]
})print(df1)
print(df2)
 
df1department_id month  revenue
0              1   Jan     8000
1              2   Jan     9000
2              3   Jan    10000
3              3   Feb     7000
4              1   Feb     8000
5              1   Mar     6000df2department_id month  revenue
0              1   Jan     8000
1              1   Jan     9000
2              2   Jan    10000
3              3   Jan     7000
4              3   Feb     8000
5              1   Mar     6000
6              2   Feb    10000
 
2. 基本语法
"""调用方式一:pd.pivot(df, index=, columns=, values=)调用方式二:df.pivot(index=, columns=, values=)参数:1. index:指定作为行索引的列2. columns:指定作为列索引的列3. values:指定作为数据值的列
"""
dp = df1.pivot(index="department_id", columns="month", values="revenue")
# dp = pd.pivot(df1, index="department_id", columns="month", values="revenue")
print(dp)# dp2 = df2.pivot(index="department_id", columns="month", values="revenue") 
# error,Index contains duplicate entries, cannot reshape,存在重复数据,pivot 不能聚合,所以报错
 
month             Feb      Jan     Mar
department_id                         
1              8000.0   8000.0  6000.0
2                 NaN   9000.0     NaN
3              7000.0  10000.0     NaN
 
二、pivot_table 函数
pivot_table 函数既能对数据进行重塑,也可以进行聚合
1. 数据准备
import pandas as pddf1 = pd.DataFrame({'department_id': [1, 2, 3, 3, 1, 1],'month': ['Jan', 'Jan', 'Jan', 'Feb', 'Feb', 'Mar'],'revenue': [8000, 9000, 10000, 7000, 8000, 6000,]
})df2 = pd.DataFrame({'department_id': [1, 1, 2, 3, 3, 1, 2],'month': ['Jan', 'Jan', 'Jan', 'Jan', 'Feb', 'Mar', 'Feb'],'revenue': [8000, 9000, 10000, 7000, 8000, 6000, 10000]
})print(df1)
print(df2)
 
df1department_id month  revenue
0              1   Jan     8000
1              2   Jan     9000
2              3   Jan    10000
3              3   Feb     7000
4              1   Feb     8000
5              1   Mar     6000df2department_id month  revenue
0              1   Jan     8000
1              1   Jan     9000
2              2   Jan    10000
3              3   Jan     7000
4              3   Feb     8000
5              1   Mar     6000
6              2   Feb    10000
 
2. 基本语法
"""调用方式一:pd.pivot_table(df, index=, columns=, values=, aggfunc=)调用方式二:df.pivot_table(index=, columns=, values=, aggfunc=)参数:1. index:指定作为行索引的列2. columns:指定作为列索引的列3. values:指定作为数据值的列4. aggfunc:指定聚合的函数
"""
dp = pd.pivot_table(df1, index="department_id", columns="month", values="revenue")
# dp = df1.pivot_table(index="department_id", columns="month", values="revenue")
print(dp)
 
month             Feb      Jan     Mar
department_id                         
1              8000.0   8000.0  6000.0
2                 NaN   9000.0     NaN
3              7000.0  10000.0     NaN
 
# pivot_table 函数会对重复数据进行聚合,默认是 mean 函数
dp2 = pd.pivot_table(df2, index="department_id", columns="month", values="revenue")
print(dp2)
 
month              Feb      Jan     Mar
department_id                          
1                  NaN   8500.0  6000.0
2              10000.0  10000.0     NaN
3               8000.0   7000.0     NaN
 
# 使用 aggfunc 参数指定聚合函数
dp3 = pd.pivot_table(df2, index="department_id", columns="month", values="revenue", aggfunc="sum")
print(dp3)
 
month              Feb      Jan     Mar
department_id                          
1                  NaN  17000.0  6000.0
2              10000.0  10000.0     NaN
3               8000.0   7000.0     NaN
