python - Row operations in a Pandas DataFrame

Reposted | Author: 行者123 | Updated: 2023-12-01 02:24:21

I have a world-indicators dataset in this format:

    country  year  indicatorName      value
    USA      1970  Agricultural Land  ...
    USA      1970  Crop production    ...
    ...
    USA      2000  Agricultural Land  ...
    USA      2000  Crop production    ...
    ...
    Mexico   1970  Agricultural Land  ...
    Mexico   1970  Crop production    ...
    ...
    Mexico   2000  Agricultural Land  ...
    Mexico   2000  Crop production    ...

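There are some other indicators I haven't included here, but these two are the ones I'm interested in. For each country and year, I want to divide the value of Crop production by the value of Agricultural Land. Let's call the result crop_prod_density.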

I'm not sure how to proceed after

    df.groupby(['country', 'year'])

How do I continue from there to produce the following outputs?

  1. A new indicator row:

       country  year  indicatorName      value
       USA      1970  Agricultural Land  ...
       USA      1970  Crop production    ...
       USA      1970  crop_prod_density  ...

  2. A new column holding the same value for all rows of a (country, year) group:

       country  year  indicatorName      value  crop_prod_density
       USA      1970  Agricultural Land  ...    us_value_1970
       USA      1970  Crop production    ...    us_value_1970
       ...
       Mexico   2000  Agricultural Land  ...    mx_value_2000
       Mexico   2000  Crop production    ...    mx_value_2000

  3. A new DataFrame containing only this column's values:

       country  year  crop_prod_density
       USA      1970  us_value_1970
       ...
       USA      2000  us_value_2000
       ...
       Mexico   1970  mx_value_1970
       ...
       Mexico   2000  mx_value_2000

Best answer

You can first reshape with set_index and unstack, then divide with div:

    print(df)
       country  year      indicatorName  value
    0      USA  1970  Agricultural Land     10
    1      USA  1970    Crop production      2
    2      USA  2000  Agricultural Land     10
    3      USA  2000    Crop production      3
    4   Mexico  1970  Agricultural Land     10
    5   Mexico  1970    Crop production      5
    6   Mexico  2000  Agricultural Land     10
    7   Mexico  2000    Crop production      4
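The printout above shows the sample input but not how it was built. To make the steps reproducible, one way to construct the same frame (values copied from that printout) is:

```python
import pandas as pd

# Long-format sample: one row per (country, year, indicatorName)
df = pd.DataFrame({
    'country': ['USA'] * 4 + ['Mexico'] * 4,
    'year': [1970, 1970, 2000, 2000] * 2,
    'indicatorName': ['Agricultural Land', 'Crop production'] * 4,
    'value': [10, 2, 10, 3, 10, 5, 10, 4],
})
print(df)
```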

    # Wide frame: one row per (country, year), one column per indicator
    df = (df.set_index(['country', 'year', 'indicatorName'])['value']
            .unstack()
            .assign(crop_prod_density=lambda x: x['Crop production'].div(x['Agricultural Land'])))
    print(df)
    indicatorName  Agricultural Land  Crop production  crop_prod_density
    country year
    Mexico  1970                  10                5                0.5
            2000                  10                4                0.4
    USA     1970                  10                2                0.2
            2000                  10                3                0.3
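For reference, the same wide frame can also be built with DataFrame.pivot, which, like set_index plus unstack, raises an error if a (country, year, indicatorName) combination is duplicated. A sketch, assuming pandas 1.1 or later (needed to pass a list as index):

```python
import pandas as pd

df = pd.DataFrame({
    'country': ['USA'] * 4 + ['Mexico'] * 4,
    'year': [1970, 1970, 2000, 2000] * 2,
    'indicatorName': ['Agricultural Land', 'Crop production'] * 4,
    'value': [10, 2, 10, 3, 10, 5, 10, 4],
})

# One row per (country, year), one column per indicator
wide = df.pivot(index=['country', 'year'], columns='indicatorName', values='value')
# Derived column: crop production per unit of agricultural land
wide = wide.assign(
    crop_prod_density=lambda x: x['Crop production'].div(x['Agricultural Land'])
)
print(wide)
```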

Then reshape back with stack to get the new indicator rows:

    df1 = df.stack().reset_index(name='value')
    print(df1)
        country  year      indicatorName  value
     0   Mexico  1970  Agricultural Land   10.0
     1   Mexico  1970    Crop production    5.0
     2   Mexico  1970  crop_prod_density    0.5
     3   Mexico  2000  Agricultural Land   10.0
     4   Mexico  2000    Crop production    4.0
     5   Mexico  2000  crop_prod_density    0.4
     6      USA  1970  Agricultural Land   10.0
     7      USA  1970    Crop production    2.0
     8      USA  1970  crop_prod_density    0.2
     9      USA  2000  Agricultural Land   10.0
    10      USA  2000    Crop production    3.0
    11      USA  2000  crop_prod_density    0.3
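When the real dataset holds many other indicators, unstacking all of them just to add one derived row may be more reshaping than needed. An alternative sketch pivots only the two indicators involved, builds the new rows in the long layout, and appends them with pd.concat. The names needed and new_rows are illustrative, and it assumes one row per (country, year, indicatorName).

```python
import pandas as pd

df = pd.DataFrame({
    'country': ['USA'] * 4 + ['Mexico'] * 4,
    'year': [1970, 1970, 2000, 2000] * 2,
    'indicatorName': ['Agricultural Land', 'Crop production'] * 4,
    'value': [10, 2, 10, 3, 10, 5, 10, 4],
})

keys = ['country', 'year']
# Only the two indicators used in the ratio need reshaping
needed = df[df['indicatorName'].isin(['Crop production', 'Agricultural Land'])]
wide = needed.pivot(index=keys, columns='indicatorName', values='value')

# New rows in the same long layout as df: country, year, indicatorName, value
new_rows = (
    (wide['Crop production'] / wide['Agricultural Land'])
    .rename('value')
    .reset_index()
    .assign(indicatorName='crop_prod_density')
    .reindex(columns=df.columns)
)

# Append, then sort; here 'crop_prod_density' sorts after the capitalised names
out = pd.concat([df, new_rows], ignore_index=True).sort_values(
    keys + ['indicatorName'], ignore_index=True
)
print(out)
```

As with the stack output, the value column becomes float because the appended rows hold ratios.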

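To add the new column to the original layout, append crop_prod_density to the index before stacking; at the end, the column order has to be fixed with reindex: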

    df2 = (df.set_index(['crop_prod_density'], append=True)
             .stack()
             .reset_index(name='value')
             .reindex(columns=['country', 'year', 'indicatorName', 'value', 'crop_prod_density']))
    print(df2)
       country  year      indicatorName  value  crop_prod_density
    0   Mexico  1970  Agricultural Land     10                0.5
    1   Mexico  1970    Crop production      5                0.5
    2   Mexico  2000  Agricultural Land     10                0.4
    3   Mexico  2000    Crop production      4                0.4
    4      USA  1970  Agricultural Land     10                0.2
    5      USA  1970    Crop production      2                0.2
    6      USA  2000  Agricultural Land     10                0.3
    7      USA  2000    Crop production      3                0.3
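Another route to this layout is to compute the density once per (country, year) and merge it back onto the original rows. A left merge keeps the original rows and their order unchanged. A sketch, assuming each pair of indicators is unique per group:

```python
import pandas as pd

df = pd.DataFrame({
    'country': ['USA'] * 4 + ['Mexico'] * 4,
    'year': [1970, 1970, 2000, 2000] * 2,
    'indicatorName': ['Agricultural Land', 'Crop production'] * 4,
    'value': [10, 2, 10, 3, 10, 5, 10, 4],
})

keys = ['country', 'year']
# One density value per (country, year)
wide = df.pivot(index=keys, columns='indicatorName', values='value')
density = (wide['Crop production'] / wide['Agricultural Land']).rename('crop_prod_density')

# Broadcast each group's value to every row of that group
out = df.merge(density.reset_index(), on=keys, how='left')
print(out)
```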

Finally, drop the unneeded columns and turn the MultiIndex into columns:

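    # Keep only the ratio, then move country/year out of the MultiIndex into columns;
    # rename_axis(None, axis=1) removes the leftover 'indicatorName' label on the columns
    # (recent pandas versions only accept axis as a keyword here)
    df3 = (df.drop(['Crop production', 'Agricultural Land'], axis=1)
             .reset_index()
             .rename_axis(None, axis=1))
    print(df3)
       country  year  crop_prod_density
    0   Mexico  1970                0.5
    1   Mexico  2000                0.4
    2      USA  1970                0.2
    3      USA  2000                0.3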
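Since the question started from df.groupby(['country', 'year']), that route can produce this last layout as well. This is an alternative to the reshape above, not part of the answer: a sketch that assumes every group holds exactly one 'Crop production' row and one 'Agricultural Land' row (other indicators in a group are simply not looked up).

```python
import pandas as pd

df = pd.DataFrame({
    'country': ['USA'] * 4 + ['Mexico'] * 4,
    'year': [1970, 1970, 2000, 2000] * 2,
    'indicatorName': ['Agricultural Land', 'Crop production'] * 4,
    'value': [10, 2, 10, 3, 10, 5, 10, 4],
})

# With indicatorName as the index, each group's values can be looked up by name
density = (
    df.set_index('indicatorName')
      .groupby(['country', 'year'])['value']
      .apply(lambda v: v['Crop production'] / v['Agricultural Land'])
      .rename('crop_prod_density')
      .reset_index()
)
print(density)
```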

A similar question, python - Row operations in a Pandas DataFrame, can be found on Stack Overflow: https://stackoverflow.com/questions/47546355/
