
python - pandas: fill multiple columns by group

Reposted. Author: 行者123. Updated: 2023-12-04 13:29:12

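I have a dataset like this one (CSV format) in which several columns contain values. How can I use fillna together with df.groupby("DateSent") to fill the missing entries of every desired column with that group's min()/3?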

In [5]: df.head()
Out[5]:
   ID  DateAcquired  DateSent         data   value  measurement    values
0   1      20210518  20220110  6358.434713   556.0   317.869897  3.565781
1   1      20210719  20220210  6508.458382  1468.0   774.337509  5.565384
2   1      20210719  20220310  6508.466246     1.0    40.837533  1.278085
3   1      20200420  20220410  6507.664194    48.0    64.335047  1.604183
4   1      20210328  20220510  6508.451227     0.0    40.337486  1.270236
According to this other thread on SO, one way is to do it one column at a time:
df["data"] = df.groupby("DateSent")["data"].transform(lambda x: x.fillna(x.min()/3))
df["value"] = df.groupby("DateSent")["value"].transform(lambda x: x.fillna(x.min()/3))
df["measurement"] = df.groupby("DateSent")["measurement"].transform(lambda x: x.fillna(x.min()/3))
df["values"] = df.groupby("DateSent")["values"].transform(lambda x: x.fillna(x.min()/3))
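In my original dataset, which has 100000 such columns, I could technically loop over all of the desired column names. But is there a better/faster way to do this? Perhaps something is already implemented in pandas?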
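For reference, the loop mentioned in the question could be sketched as follows. The toy frame and the cols_to_fill list are made up for illustration; only the groupby/transform/fillna pattern comes from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: two DateSent groups, NaN marks a missing value.
df = pd.DataFrame({
    "DateSent": [20220110, 20220110, 20220210, 20220210],
    "data": [6.0, np.nan, 9.0, 3.0],
    "value": [np.nan, 12.0, np.nan, 30.0],
})

cols_to_fill = ["data", "value"]  # assumed list of columns to impute

for col in cols_to_fill:
    # Fill each group's NaNs with that group's minimum divided by 3.
    df[col] = df.groupby("DateSent")[col].transform(
        lambda x: x.fillna(x.min() / 3)
    )

print(df)
```

This works, but it runs one groupby per column, which is the cost the question is trying to avoid.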

Best answer

One way to do this is to collect all the columns you want to impute into a list. I am assuming you want all of the numerical columns (except ID, DateAcquired, and DateSent):

fti = [i for i in df.iloc[:,3:].columns if df[i].dtypes != 'object'] # features to impute
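If the columns to impute are not guaranteed to start at position 3, the same list could be built by name instead. This is only a sketch: the key_cols list, the "label" column and the toy frame are assumptions for illustration, and select_dtypes(include="number") is swapped in for the dtype comparison above.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with the same kind of layout as in the question.
df = pd.DataFrame({
    "ID": [1, 1],
    "DateAcquired": [20210518, 20210719],
    "DateSent": [20220110, 20220210],
    "data": [6358.4, np.nan],
    "value": [556.0, 1468.0],
    "label": ["a", "b"],  # made-up non-numeric column, should be skipped
})

key_cols = ["ID", "DateAcquired", "DateSent"]  # assumed identifier columns

# Keep numeric columns only, then drop the identifier columns by name.
numeric_cols = df.select_dtypes(include="number").columns
fti = [c for c in numeric_cols if c not in key_cols]

print(fti)
```

Selecting by name keeps the list correct even if the column order of the CSV changes.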
Then you can create a new df that contains only the imputed values:
imputed = df.groupby("DateSent")[fti].transform(lambda x: x.fillna(x.min()/3))

imputed.head(5)
          data   value  measurement    values
0  6358.434713   556.0   317.869897  3.565781
1  6508.458382  1468.0   774.337509  5.565384
2  6508.466246     1.0    40.837533  1.278085
3  6507.664194    48.0    64.335047  1.604183
4  6508.451227     0.0    40.337486  1.270236
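With very many columns, the Python-level lambda may become the bottleneck. A possible variant, sketched here on made-up toy data, computes the per-group minimum with the built-in "min" aggregation and then does a single fillna. It is an alternative to the transform call above rather than part of the original answer.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: two DateSent groups with some missing values.
df = pd.DataFrame({
    "DateSent": [20220110, 20220110, 20220210, 20220210],
    "data": [6.0, np.nan, 9.0, 3.0],
    "value": [np.nan, 12.0, np.nan, 30.0],
})
fti = ["data", "value"]  # features to impute

# Per-group minimum of every column, broadcast back to the original rows.
group_min = df.groupby("DateSent")[fti].transform("min")

# fillna aligns on index and column labels, so each NaN receives the
# minimum of its own group and column, divided by 3.
imputed = df[fti].fillna(group_min / 3)

print(imputed)
```

A group in which a column is entirely NaN has no minimum, so those entries stay NaN under either variant.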
Finally, you can concat the untouched columns with the imputed ones:
res = pd.concat([df[df.columns.symmetric_difference(imputed.columns)],imputed],axis=1)

res.head(15)
    DateAcquired  DateSent  ID         data   value  measurement    values
0       20210518  20220110   1  6358.434713   556.0   317.869897  3.565781
1       20210719  20220210   1  6508.458382  1468.0   774.337509  5.565384
2       20210719  20220310   1  6508.466246     1.0    40.837533  1.278085
3       20200420  20220410   1  6507.664194    48.0    64.335047  1.604183
4       20210328  20220510   1  6508.451227     0.0    40.337486  1.270236
5       20210518  20220610   1  6508.474031     3.0    15.000000  0.774597
6       20210108  20220110   2  6508.402472   897.0   488.837335  4.421933
7       20210110  20220210   2  6508.410493    52.0   111.000000  2.107131
8       20210119  20220310   2  6508.419065   800.0   440.337387  4.196844
9       20210108  20220410   2  6508.426063    89.0    84.837408  1.842144
10      20200109  20220510   2  6507.647600   978.0   529.334996  4.601456
11      20210919  20220610   2  6508.505563  1566.0   823.337655  5.738772
12      20211214  20220612   2  6508.528918   152.0   500.000000  4.472136
13      20210812  20220620   2  6508.497936   668.0   374.337631  3.869561
14      20210909  20220630   2  6508.506350   489.0   284.837657  3.375427
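Note that symmetric_difference returns the remaining column names sorted, which is why ID ends up after DateSent in the output above. If the original column order matters, one option is to write the imputed columns back into a copy of df instead of concatenating. The frame below is a made-up stand-in for the real data.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with an identifier column in front.
df = pd.DataFrame({
    "ID": [1, 1, 2, 2],
    "DateSent": [20220110, 20220110, 20220210, 20220210],
    "data": [6.0, np.nan, 9.0, 3.0],
    "value": [np.nan, 12.0, np.nan, 30.0],
})
fti = ["data", "value"]  # features to impute

imputed = df.groupby("DateSent")[fti].transform(
    lambda x: x.fillna(x.min() / 3)
)

# Copy first so the original frame keeps its NaNs, then overwrite only
# the imputed columns; all other columns and the column order are kept.
res = df.copy()
res[fti] = imputed

print(res)
```

The assignment aligns on the row index, so it relies on imputed having the same index as df, which transform guarantees.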

Regarding "python - pandas: fill multiple columns by group", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/66122511/
