
python - Get the most recent observation and its date from multiple columns

Reposted. Author: 太空狗. Updated: 2023-10-30 00:38:12

Take the following toy DataFrame:

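import numpy as np
import pandas as pd

# Build a 7x5 block of floats and prepend a string key column.
data = np.arange(35, dtype=np.float32).reshape(7, 5)
data = pd.concat((
    pd.DataFrame(list('abcdefg'), columns=['field1']),
    pd.DataFrame(data, columns=['field2', '2014', '2015', '2016', '2017'])),
    axis=1)

# Blank out the later years for some rows to simulate missing observations.
data.iloc[1:4, 4:] = np.nan
data.iloc[4, 3:] = np.nan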

print(data)
  field1  field2  2014  2015  2016  2017
0      a     0.0   1.0   2.0   3.0   4.0
1      b     5.0   6.0   7.0   NaN   NaN
2      c    10.0  11.0  12.0   NaN   NaN
3      d    15.0  16.0  17.0   NaN   NaN
4      e    20.0  21.0   NaN   NaN   NaN
5      f    25.0  26.0  27.0  28.0  29.0
6      g    30.0  31.0  32.0  33.0  34.0

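I want to replace the "year" columns (2014-2017) with two fields: the most recent non-null observation and the year of that observation. Assume field1 is a unique key. (I'm not looking to do any groupby operations; there is just one row per record.) That is: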

  field1  field2   obs  date
0      a     0.0   4.0  2017
1      b     5.0   7.0  2015
2      c    10.0  12.0  2015
3      d    15.0  17.0  2015
4      e    20.0  21.0  2014
5      f    25.0  29.0  2017
6      g    30.0  34.0  2017

This is as far as I've gotten:

pd.melt(data, id_vars=['field1', 'field2'],
        value_vars=['2014', '2015', '2016', '2017'])\
    .dropna(subset=['value'])

  field1  field2 variable  value
0      a     0.0     2014    1.0
1      b     5.0     2014    6.0
2      c    10.0     2014   11.0
3      d    15.0     2014   16.0
4      e    20.0     2014   21.0
5      f    25.0     2014   26.0
6      g    30.0     2014   31.0
# ...

But I'm struggling with how to pivot back to the desired format.

Best answer

Maybe:

d2 = data.melt(id_vars=["field1", "field2"], var_name="date", value_name="obs").dropna(subset=["obs"])
d2["date"] = d2["date"].astype(int)
df = d2.loc[d2.groupby(["field1", "field2"])["date"].idxmax()]

This gives me

   field1  field2  date   obs
21      a     0.0  2017   4.0
8       b     5.0  2015   7.0
9       c    10.0  2015  12.0
10      d    15.0  2015  17.0
4       e    20.0  2014  21.0
26      f    25.0  2017  29.0
27      g    30.0  2017  34.0
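
Since the question asks to avoid groupby, here is a minimal sketch of a wide-format alternative that never melts the frame. It is not part of the original answer. The names year_cols, years, obs, date and result are my own, and the sketch assumes the year columns are listed oldest to newest and that every row has at least one non-null year.

```python
import numpy as np
import pandas as pd

# Rebuild the toy frame from the question.
values = np.arange(35, dtype=np.float32).reshape(7, 5)
data = pd.concat((
    pd.DataFrame(list('abcdefg'), columns=['field1']),
    pd.DataFrame(values, columns=['field2', '2014', '2015', '2016', '2017'])),
    axis=1)
data.iloc[1:4, 4:] = np.nan
data.iloc[4, 3:] = np.nan

# Assumption: these labels are ordered oldest to newest.
year_cols = ['2014', '2015', '2016', '2017']
years = data[year_cols]

# Forward-fill along each row, so the last column carries the
# most recent non-null value for that row.
obs = years.ffill(axis=1).iloc[:, -1]

# Reverse the column order so that idxmax, which returns the first
# True it meets, lands on the newest non-null year.
# Caveat: a row with no observations at all would wrongly report
# the newest year, so such rows need separate handling.
date = years.notna().iloc[:, ::-1].idxmax(axis=1).astype(int)

result = data[['field1', 'field2']].assign(obs=obs, date=date)
print(result)
```

This keeps the original row index and the column order requested in the question (field1, field2, obs, date). The melt-based answer reaches the same layout if you follow it with a column reorder and reset_index(drop=True).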

For more on "python - Get the most recent observation and its date from multiple columns", see the similar question on Stack Overflow: https://stackoverflow.com/questions/47782361/
