python - How can pivot table output be used for further analysis in Python?

Reposted. Author: 太空宇宙. Updated: 2023-11-04 02:52:38

Sample data

District    Taluka  Circle  Crop    Yield_2006  Yield_2007  Yield_2008  Yield_2009
AHMEDNAGAR AKOLE AKOLE PADDY 875.3 1338.9 894.9 339.2
AHMEDNAGAR AKOLE KOTUL PADDY 637.2 1007.4 919.7 323.9
AHMEDNAGAR AKOLE RAJUR PADDY 857.8 1227.1 1114.5 506.5
AHMEDNAGAR AKOLE SAMSHE PADDY 875.3 1338.9 894.9 339.2
AHMEDNAGAR AKOLE BRAMHA PADDY 637.2 1007.4 919.7 323.9
AHMEDNAGAR AKOLE VIRGAO PADDY 875.3 1338.9 894.9 339.2
AHMEDNAGAR AKOLE SHENDI PADDY 857.8 1227.1 1114.5 506.5
AHMEDNAGAR AKOLE SAKWADI PADDY 857.8 1227.1 1114.5 506.5
AMRAVATI DHARNI DHARNI PADDY 590 888.6 437.8 201.9
AMRAVATI DHARNI DHULAT PADDY 489.7 863.3 277 227.8
AMRAVATI DHARNI HARSUL PADDY 590 888.6 437.8 201.9
AMRAVATI DHARNI SIKHEDA PADDY 489.7 863.3 277 227.8
AMRAVATI CHIKARA CHHDARA PADDY 539.8 698.5 388.9 373.8
AMRAVATI CHIKARA SEDOH PADDY 539.8 698.5 388.9 338.2
AMRAVATI CHIKARA CHURNI PADDY 539.8 698.5 388.9 338.2

Code:

>>> import pandas as pd
>>> import numpy as np
>>> import matplotlib.pyplot as plt
>>> Data=pd.read_csv("/home/desktop/Desktop/noonion.csv")
>>> Data1 =Data[['District','Taluka','Circle','Crop', 'Yield_2006', 'Yield_2007','Yield_2008','Yield_2009']]
>>> pivot=pd.pivot_table(Data1,values=["Yield_2006", "Yield_2007", "Yield_2008", "Yield_2009"],index=["District","Crop"],aggfunc=[np.mean],fill_value=False)
>>> pivot.head()
                           mean
                     Yield_2006   Yield_2007  Yield_2008  Yield_2009
District   Crop
AHMEDNAGAR BAJRA     781.804124   884.185567  770.402062  767.814433
           BLACKGRAM 298.888889   517.722222   80.166667  608.166667
           COTTON    722.241667  1000.156250  863.227083  870.489583
           GREENGRAM 514.166667   660.938596  212.971930  512.380702
           GROUNDNUT 843.243590   919.384615  815.717949  842.012821

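Now I want to use this pivot output for further analysis.

For example: I want to create a new column "Average_Yield" that holds the mean of Yield_2006 through Yield_2009 for each crop.

How can I create this new column as the mean of Yield_2006 through Yield_2009, with the "Average_Yield" values rounded to 4 decimal places?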

Best answer

You can first remove the [] from aggfunc so that the columns are not returned as a MultiIndex, then use mean across each row (axis=1) together with round:

pivot = pd.pivot_table(Data1,
                       values=["Yield_2006", "Yield_2007", "Yield_2008", "Yield_2009"],
                       index=["District", "Crop"],
                       aggfunc=np.mean, fill_value=False)

pivot['Average_Yield'] = pivot.mean(axis=1).round(4)
print (pivot)
                  Yield_2006  Yield_2007  Yield_2008  Yield_2009  \
District   Crop
AHMEDNAGAR PADDY  809.212500      1214.1      983.45    398.1125
AMRAVATI   PADDY  539.828571       799.9      370.90    272.8000

                  Average_Yield
District   Crop
AHMEDNAGAR PADDY       851.2188
AMRAVATI   PADDY       495.8571
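
To make the effect of dropping the brackets concrete, here is a minimal sketch. It rests on a few assumptions that are not in the original answer: the DataFrame `data` is a hand-typed four-row subset of the sample data (not the full CSV), the names `nested`, `flat`, and `yield_cols` are illustrative, and the string "mean" is passed to aggfunc in place of np.mean, which recent pandas versions accept.

```python
import pandas as pd

# Assumption: a four-row subset of the question's sample data, typed by hand.
data = pd.DataFrame({
    "District": ["AHMEDNAGAR", "AHMEDNAGAR", "AMRAVATI", "AMRAVATI"],
    "Crop": ["PADDY"] * 4,
    "Yield_2006": [875.3, 637.2, 590.0, 489.7],
    "Yield_2007": [1338.9, 1007.4, 888.6, 863.3],
    "Yield_2008": [894.9, 919.7, 437.8, 277.0],
    "Yield_2009": [339.2, 323.9, 201.9, 227.8],
})
yield_cols = ["Yield_2006", "Yield_2007", "Yield_2008", "Yield_2009"]

# aggfunc as a list: columns become a two-level MultiIndex such as ("mean", "Yield_2006").
nested = pd.pivot_table(data, values=yield_cols,
                        index=["District", "Crop"], aggfunc=["mean"])

# aggfunc as a single name: columns stay a flat Index of the yield names.
flat = pd.pivot_table(data, values=yield_cols,
                      index=["District", "Crop"], aggfunc="mean")

print(nested.columns.nlevels)  # column levels with the list form
print(flat.columns.nlevels)    # column levels with the scalar form

# Row-wise mean over the four yield columns, rounded to 4 decimal places.
flat["Average_Yield"] = flat[yield_cols].mean(axis=1).round(4)
print(flat)
```

With flat columns, the new column can be assigned under a plain string key; with the nested form the key would have to be a tuple covering both column levels.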

To average only selected columns, you can take a subset with loc:

pivot['Average_Yield'] = pivot.loc[:,'Yield_2006':'Yield_2007'].mean(axis=1).round(4)
print (pivot)
                  Yield_2006  Yield_2007  Yield_2008  Yield_2009  \
District   Crop
AHMEDNAGAR PADDY  809.212500      1214.1      983.45    398.1125
AMRAVATI   PADDY  539.828571       799.9      370.90    272.8000

                  Average_Yield
District   Crop
AHMEDNAGAR PADDY      1011.6563
AMRAVATI   PADDY       669.8643

Or pass a list of column names:

pivot['Average_Yield'] = pivot[['Yield_2006','Yield_2007']].mean(axis=1).round(4)
print (pivot)
                  Yield_2006  Yield_2007  Yield_2008  Yield_2009  \
District   Crop
AHMEDNAGAR PADDY  809.212500      1214.1      983.45    398.1125
AMRAVATI   PADDY  539.828571       799.9      370.90    272.8000

                  Average_Yield
District   Crop
AHMEDNAGAR PADDY      1011.6563
AMRAVATI   PADDY       669.8643
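
One thing to watch: once Average_Yield exists, a bare pivot.mean(axis=1) would include that column in the next average. A possible way around this, sketched below, is to select the yearly columns by name pattern with DataFrame.filter. The `pivot` frame here is a hypothetical stand-in built from the two rows printed above, and the regex assumes the columns are named Yield_ followed by a four-digit year.

```python
import pandas as pd

# Hypothetical stand-in for the pivot output, built from the two printed rows.
index = pd.MultiIndex.from_tuples(
    [("AHMEDNAGAR", "PADDY"), ("AMRAVATI", "PADDY")],
    names=["District", "Crop"],
)
pivot = pd.DataFrame(
    {
        "Yield_2006": [809.2125, 539.828571],
        "Yield_2007": [1214.1, 799.9],
        "Yield_2008": [983.45, 370.9],
        "Yield_2009": [398.1125, 272.8],
    },
    index=index,
)

# Keep only columns named Yield_<four digits>; Average_Yield never matches.
yearly = pivot.filter(regex=r"^Yield_\d{4}$")
pivot["Average_Yield"] = yearly.mean(axis=1).round(4)

# Repeating the selection after the new column exists gives the same values,
# because the regex keeps the derived column out of the average.
again = pivot.filter(regex=r"^Yield_\d{4}$").mean(axis=1).round(4)
print(pivot["Average_Yield"].equals(again))
```

The same pattern also picks up extra years automatically if columns such as Yield_2010 are added later, which the explicit list and the loc slice do not.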

Regarding "python - How can pivot table output be used for further analysis in Python?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/43367712/