
python - pandas: *efficiently* copy valid values to other rows by group

Reposted. Author: 行者123. Updated: 2023-12-01 08:22:09

"Premature optimisation is the root of all evil (but it's nice to have once you have an ugly solution)" D. Knuth

Given this dataset:

from io import StringIO
import pandas as pd

csv = StringIO("""country,year,surface,ground,tot_water,enviro,depend
Yemen,2012,2,1.5,2.1,0.55,0
Yemen,2013,,,,,
Yemen,2014,2,1.5,2.1,,0
Yemen,2015,,,,,
Yemen,2016,,,,,
Yemen,2017,,,,0.55,
Zambia,1995,,,,,
Zambia,1996,,,,,
Zambia,1997,104.8,47,104.8,31.48,23.47""")

df = pd.read_csv(csv)
df

Out[0]:
country year surface ground tot_water enviro depend
0 Yemen 2012 2.0 1.5 2.1 0.55 0.00
1 Yemen 2013 NaN NaN NaN NaN NaN
2 Yemen 2014 2.0 1.5 2.1 NaN 0.00
3 Yemen 2015 NaN NaN NaN NaN NaN
4 Yemen 2016 NaN NaN NaN NaN NaN
5 Yemen 2017 NaN NaN NaN 0.55 NaN
6 Zambia 1995 NaN NaN NaN NaN NaN
7 Zambia 1996 NaN NaN NaN NaN NaN
8 Zambia 1997 104.8 47.0 104.8 31.48 23.47

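I want to take the valid values in the ['surface', 'ground', 'tot_water', 'enviro'] columns and copy them to every row of the same country. I have a solution, but it needs some optimisation.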

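Pseudocode:

  1. Loop over each country and filter the DataFrame to that subset
  2. Find the index of the first valid value in the column
  3. Set all of that column's values (within the country) to that valid value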
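The three steps can be traced for a single country and a single column. This is a minimal sketch on a cut-down copy of the data above (only `surface` and `enviro` are kept, and the names `mask` and `valid_ix` are illustrative). One detail worth knowing: `Series.first_valid_index()` returns `None` when every value is NaN, so step 3 needs a guard for columns that are empty within a country.

```python
from io import StringIO
import pandas as pd

# Cut-down copy of the question's data (subset of rows and columns).
csv = StringIO("""country,year,surface,enviro
Yemen,2012,2,0.55
Yemen,2013,,
Yemen,2017,,0.55
Zambia,1995,,
Zambia,1997,104.8,31.48""")
df = pd.read_csv(csv)

# Step 1: boolean mask selecting one country's rows.
mask = df.country == "Zambia"

# Step 2: index label of the first non-NaN value in the column
# (None if the column is entirely NaN for this country).
valid_ix = df.loc[mask, "surface"].first_valid_index()

# Step 3: copy that value to every row of the country, if one exists.
if valid_ix is not None:
    df.loc[mask, "surface"] = df.loc[valid_ix, "surface"]
```

Here `valid_ix` is 4 (the Zambia 1997 row), and both Zambia rows end up with the same `surface` value while the Yemen rows are untouched.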

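My solution:

vars_ = ['surface', 'ground', 'tot_water', 'enviro']
# for each country
for country in df.country.unique():
    # boolean mask selecting this country's rows
    filter_ = df.country == country
    # and for each column to fill
    for var in vars_:
        # index label of the first non-NaN value in this column for this country
        valid_ix = df[filter_][var].first_valid_index()
        # copy that value to all of the country's rows
        df.loc[filter_, var] = df[var][valid_ix]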

df

Out[]:
country year surface ground tot_water enviro depend
0 Yemen 2012 2.0 1.5 2.1 0.55 0.00
1 Yemen 2013 2.0 1.5 2.1 0.55 NaN
2 Yemen 2014 2.0 1.5 2.1 0.55 0.00
3 Yemen 2015 2.0 1.5 2.1 0.55 NaN
4 Yemen 2016 2.0 1.5 2.1 0.55 NaN
5 Yemen 2017 2.0 1.5 2.1 0.55 NaN
6 Zambia 1995 104.8 47.0 104.8 31.48 NaN
7 Zambia 1996 104.8 47.0 104.8 31.48 NaN
8 Zambia 1997 104.8 47.0 104.8 31.48 23.47

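There must be a more efficient way. On a moderately sized dataset this takes quite a long time, and the for loops are ugly. Any suggestions or help would be greatly appreciated!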
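One vectorized option that keeps the loop's exact semantics (every row of a country gets that country's first valid value) is `GroupBy.transform('first')`. This is a sketch, not taken from the original thread. It relies on `GroupBy.first` skipping NaN by default, and `transform` broadcasting each group's result back onto the group's rows in the original index order. Unlike the loop, a column that is entirely NaN for a country simply stays NaN instead of needing a guard.

```python
from io import StringIO
import pandas as pd

csv = StringIO("""country,year,surface,ground,tot_water,enviro,depend
Yemen,2012,2,1.5,2.1,0.55,0
Yemen,2013,,,,,
Yemen,2014,2,1.5,2.1,,0
Yemen,2015,,,,,
Yemen,2016,,,,,
Yemen,2017,,,,0.55,
Zambia,1995,,,,,
Zambia,1996,,,,,
Zambia,1997,104.8,47,104.8,31.48,23.47""")
df = pd.read_csv(csv)

vars_ = ['surface', 'ground', 'tot_water', 'enviro']

# 'first' returns the first non-NaN value of each column per country;
# transform broadcasts it to every row of that country. Only vars_ is
# overwritten, so 'depend' keeps its NaNs as in the question's output.
df[vars_] = df.groupby('country')[vars_].transform('first')
```

If a country had several different valid values in one column, this overwrites all of them with the first one, which is what the loop does; a back-fill/forward-fill approach would instead keep each observed value.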

Best answer

You can group the data by country and use bfill and ffill:

df.groupby('country').bfill().ffill()

country year surface ground tot_water enviro depend
0 Yemen 2012 2.0 1.5 2.1 0.55 0.00
1 Yemen 2013 2.0 1.5 2.1 0.55 0.00
2 Yemen 2014 2.0 1.5 2.1 0.55 0.00
3 Yemen 2015 2.0 1.5 2.1 0.55 0.00
4 Yemen 2016 2.0 1.5 2.1 0.55 0.00
5 Yemen 2017 2.0 1.5 2.1 0.55 0.00
6 Zambia 1995 104.8 47.0 104.8 31.48 23.47
7 Zambia 1996 104.8 47.0 104.8 31.48 23.47
8 Zambia 1997 104.8 47.0 104.8 31.48 23.47
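A caveat on the one-liner above: the trailing `.ffill()` is called on the frame returned by the grouped `bfill()`, not on a groupby object, so it fills across country boundaries. On the question's data this happens to give the right numbers, but a country whose column is entirely NaN would inherit the previous country's value. It also fills `depend`, which the question left untouched, and in recent pandas versions the result of `df.groupby('country').bfill()` does not include the `country` column. The sketch below uses a small made-up frame (countries `A` and `B` are hypothetical) to show the leak and a variant that runs both fills per group and only on the requested columns.

```python
from io import StringIO
import pandas as pd

# Hypothetical data: country B has no valid 'surface' value at all.
csv = StringIO("""country,year,surface,depend
A,2000,1.0,5
A,2001,,
B,2000,,
B,2001,,""")
df = pd.read_csv(csv)
vars_ = ["surface"]

# Grouped bfill followed by an ungrouped ffill: A's value leaks into B.
leaky = df.groupby("country")[vars_].bfill().ffill()

# Both fills grouped by country, and only vars_ is overwritten,
# so 'country' and 'depend' are kept as they were.
safe = df.copy()
safe[vars_] = df.groupby("country")[vars_].bfill()
safe[vars_] = safe.groupby("country")[vars_].ffill()
```

Here `leaky["surface"]` becomes 1.0 on all four rows, while `safe` leaves country B's `surface` as NaN.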

Regarding "python - pandas *efficiently* copy valid values to other rows by group", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/54561274/
