
python - Pandas: Dividing a MultiIndex DataFrame by a row

Reposted. Author: 行者123. Updated: 2023-11-30 22:57:22

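I have a DataFrame with a MultiIndex (panel data), and for each group (county) I want to divide the values in every row by that group's values for a specific year.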

>>> fields
Out[39]: ['emplvl', 'population', 'estab', 'estab_pop', 'emp_pop']
>>> df[fields]
Out[40]:
                   emplvl  population    estab  estab_pop   emp_pop
county year
1001   2003  11134.500000       46800   801.75   0.017131  0.237917
       2004  11209.166667       48366   824.00   0.017037  0.231757
       2005  11452.166667       49676   870.75   0.017529  0.230537
       2006  11259.250000       51328   862.50   0.016804  0.219359
       2007  11403.333333       52405   879.25   0.016778  0.217600
       2008  11272.833333       53277   890.25   0.016710  0.211589
       2009  11003.833333       54135   877.00   0.016200  0.203267
       2010  10693.916667       54632   877.00   0.016053  0.195745
       2011  10627.000000         NaN   862.00        NaN       NaN
       2012  10136.916667         NaN   841.75        NaN       NaN
1003   2003  51372.250000      151509  4272.00   0.028196  0.339071
       2004  53450.583333      156266  4536.25   0.029029  0.342049
       2005  56110.250000      162183  4880.50   0.030093  0.345969
       2006  59291.000000      168121  5067.50   0.030142  0.352669
       2007  62600.083333      172404  5337.25   0.030958  0.363101
       2008  62611.500000      175827  5529.25   0.031447  0.356097
       2009  58947.666667      179406  5273.75   0.029396  0.328571
       2010  58139.583333      183195  5171.25   0.028228  0.317364
       2011  59581.000000         NaN  5157.75        NaN       NaN
       2012  60440.250000         NaN  5171.75        NaN       NaN
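
To experiment with the question, a small frame with the same (county, year) MultiIndex can be built by hand. This is only a sketch: it keeps three years and two columns from the table above, and the names `sample` and `base_rows` are made up for illustration.

```python
import pandas as pd

# Hypothetical reduced copy of the data above: two counties, three years, two fields.
index = pd.MultiIndex.from_product(
    [[1001, 1003], [2006, 2007, 2008]], names=["county", "year"]
)
sample = pd.DataFrame(
    {
        "emplvl": [11259.25, 11403.333333, 11272.833333,
                   59291.0, 62600.083333, 62611.5],
        "estab": [862.50, 879.25, 890.25, 5067.50, 5337.25, 5529.25],
    },
    index=index,
)

# Select the base-year rows with the same boolean mask the question uses.
base_rows = sample.loc[sample.index.get_level_values("year") == 2007]
print(base_rows)
```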

The rows to divide by:

>>> df[fields].loc[df.index.get_level_values('year') == 2007, fields]
Out[32]:
                   emplvl  population    estab  estab_pop   emp_pop
county year
1001   2007  11403.333333       52405   879.25   0.016778  0.217600
1003   2007  62600.083333      172404  5337.25   0.030958  0.363101

However, both

df[fields].div(df.loc[df.index.get_level_values('year') == 2007, fields], axis=0)
df[fields].div(df.loc[df.index.get_level_values('year') == 2007, fields], axis=1)

give me a DataFrame full of NaN, probably because pandas also tries to match on the year index and finds nothing to divide by.
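
The NaN values are consistent with label alignment: `div` matches rows on the full (county, year) label, and the divisor only contains rows labelled 2007. A minimal sketch on made-up data (the names `toy` and `base` are illustrative, not from the question) shows that only the 2007 rows get a value and every other row is divided by a missing entry:

```python
import pandas as pd

# Made-up toy data: two counties, two years, one field.
index = pd.MultiIndex.from_product(
    [[1001, 1003], [2006, 2007]], names=["county", "year"]
)
toy = pd.DataFrame({"emplvl": [10.0, 20.0, 30.0, 60.0]}, index=index)

# Same selection as in the question: keeps the full (county, year) index.
base = toy.loc[toy.index.get_level_values("year") == 2007]

# Both operands are aligned on the complete (county, year) label, so rows
# with no matching label in `base` end up as NaN.
ratio = toy.div(base)
print(ratio)
```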

To work around this, I also tried

df[fields].div(df.loc[df.index.get_level_values('year') == 2007, fields].values)

which gives me ValueError: Shape of passed values is (5, 2), indices imply (5, 20)
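
Dropping the labels with `.values` leaves an array with one row per county, which cannot be matched against a frame with one row per county and year. If a purely positional division is wanted, one possible sketch is to repeat each county's base row until the shapes agree. The data and the names `toy`, `base`, `counties`, and `divisor` below are assumptions made for illustration:

```python
import pandas as pd

# Made-up toy data: two counties, two years, two fields.
index = pd.MultiIndex.from_product(
    [[1001, 1003], [2006, 2007]], names=["county", "year"]
)
toy = pd.DataFrame(
    {"emplvl": [10.0, 20.0, 30.0, 60.0], "estab": [1.0, 4.0, 5.0, 10.0]},
    index=index,
)

# Base-year rows indexed by county only (xs drops the year level).
base = toy.xs(2007, level="year")

# Look up the base row for every original row, then drop the labels.
counties = toy.index.get_level_values("county")
divisor = base.loc[counties].to_numpy()  # same shape as toy

# With matching shapes, the element-wise division is purely positional.
ratio = toy / divisor
print(ratio)
```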

Best answer

I think you can call reset_index on the 2007 rows (df1) to move year out of the index, then use div:

fields = ['emplvl', 'population', 'estab', 'estab_pop', 'emp_pop'] 

df1 = df.loc[df.index.get_level_values('year') == 2007, fields].reset_index(level=1)
print(df1)
        year        emplvl  population    estab  estab_pop   emp_pop
county
1001    2007  11403.333333     52405.0   879.25   0.016778  0.217600
1003    2007  62600.083333    172404.0  5337.25   0.030958  0.363101

print(df[fields].div(df1[fields], axis=0))
               emplvl  population     estab  estab_pop   emp_pop
county year
1001   2003  0.976425    0.893045  0.911857   1.021039  1.093369
       2004  0.982973    0.922927  0.937162   1.015437  1.065060
       2005  1.004282    0.947925  0.990333   1.044761  1.059453
       2006  0.987365    0.979449  0.980950   1.001550  1.008084
       2007  1.000000    1.000000  1.000000   1.000000  1.000000
       2008  0.988556    1.016640  1.012511   0.995947  0.972376
       2009  0.964966    1.033012  0.997441   0.965550  0.934131
       2010  0.937789    1.042496  0.997441   0.956789  0.899563
       2011  0.931920         NaN  0.980381        NaN       NaN
       2012  0.888943         NaN  0.957350        NaN       NaN
1003   2003  0.820642    0.878802  0.800412   0.910782  0.933820
       2004  0.853842    0.906394  0.849923   0.937690  0.942022
       2005  0.896329    0.940715  0.914422   0.972059  0.952818
       2006  0.947139    0.975157  0.949459   0.973642  0.971270
       2007  1.000000    1.000000  1.000000   1.000000  1.000000
       2008  1.000182    1.019855  1.035974   1.015796  0.980711
       2009  0.941655    1.040614  0.988102   0.949545  0.904902
       2010  0.928746    1.062591  0.968898   0.911816  0.874038
       2011  0.951772         NaN  0.966368        NaN       NaN
       2012  0.965498         NaN  0.968992        NaN       NaN
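
As a possible variation that is not part of the answer above, the base-year rows can be taken with `xs` and the division broadcast across the county level through the `level` argument of `div`. The sketch below uses made-up data, and the names `toy` and `base` are illustrative:

```python
import pandas as pd

# Made-up toy data: two counties, two years, two fields.
index = pd.MultiIndex.from_product(
    [[1001, 1003], [2006, 2007]], names=["county", "year"]
)
toy = pd.DataFrame(
    {"emplvl": [10.0, 20.0, 30.0, 60.0], "estab": [1.0, 4.0, 5.0, 10.0]},
    index=index,
)

# Cross-section for 2007: the year level is dropped, leaving a county index.
base = toy.xs(2007, level="year")

# level="county" matches the divisor's index against that level of the
# MultiIndex, so every year of a county is divided by the county's 2007 row.
ratio = toy.div(base, level="county")
print(ratio)
```

Because the match is made on labels, this form does not depend on the rows being sorted or on every county having the same number of years.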

Regarding python - Pandas: Dividing a MultiIndex DataFrame by a row, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/36678611/
