
Updating a dataframe with some missing values using a subset dataframe

Reposted by bug小助手, updated 2023-10-25 15:11:44



I am trying to update the missing values of a dataframe in pandas with a smaller subset, but cannot seem to get pd.merge, df.loc or df.join to work.



The scenario is as follows. I have a DataFrame df:



import numpy as np
import pandas as pd
df = pd.DataFrame({"EmpId": [1, 2, 3, ..., 99, 100],
                   "Name": ['Fred', 'Barney', 'Wilma', ..., 'Bam-Bam', 'Pebbles'],
                   "Age": [40, 35, np.nan, ..., 5, np.nan]})

I then get a new DataFrame, df1, such as:



df1 = pd.DataFrame({"EmpId": [3, ..., 100],
                    "Age": [30, ..., 6]})

The EmpId values in df1 are a non-sequential set of IDs that all exist in df and whose "Age" values are NaN in df. I am trying to fill these missing entries in df without duplicating rows or otherwise affecting the existing values.
I have tried pd.merge, but it adds df1's columns to df as new columns (even when using suffixes=(False,False)); df.join has a similar effect.
I have tried df.loc[df.EmpId == df1.EmpId, 'Age'] = df1.loc[df1.EmpId == df.EmpId, 'Age'], but although I can extract the information I need, it does not update df, which still contains the NaN values.
I have tried df.update(df1), but I get a ValueError.
I have even tried a for ... if ... loop with df.loc, but none of these work as intended.
df and df1 have different shapes.
If anyone has any idea where I'm going wrong, I would appreciate your input.
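In pandas, `==` between two Series requires identically-labeled operands, so `df.EmpId == df1.EmpId` compares the columns row label by row label and cannot express "this EmpId appears in df1". A minimal sketch of a df.loc-based fill that matches rows by EmpId instead, assuming EmpId is unique in df1 and using a five-row version of the sample data (the `...` rows left out). The names `age_lookup` and `mask` are illustrative, not from the original post.

```python
import numpy as np
import pandas as pd

# Five-row version of the sample data (the "..." rows are omitted)
df = pd.DataFrame({"EmpId": [1, 2, 3, 99, 100],
                   "Name": ["Fred", "Barney", "Wilma", "Bam-Bam", "Pebbles"],
                   "Age": [40, 35, np.nan, 5, np.nan]})
df1 = pd.DataFrame({"EmpId": [3, 100],
                    "Age": [30, 6]})

# EmpId -> Age lookup built from df1 (assumes EmpId is unique in df1)
age_lookup = df1.set_index("EmpId")["Age"]

# Rows whose Age is missing AND whose EmpId appears in df1
mask = df["Age"].isna() & df["EmpId"].isin(age_lookup.index)

# Look up replacement ages by EmpId; only the masked rows are written
df.loc[mask, "Age"] = df.loc[mask, "EmpId"].map(age_lookup)
print(df)
```

Because the right-hand side keeps the row labels of the masked rows, the assignment aligns on those labels and leaves every existing Age untouched.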



Comments:

Your minimal example should be complete: please do not use ..., and provide the exact expected output.


I think if you use set_index to make "EmpId" the index, then df.update(df1) should work, but I'm not sure why it's giving a ValueError now.
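A minimal sketch of this suggestion, using the same five-row sample data as the answer (the `...` rows omitted). Both frames are indexed by EmpId so that `DataFrame.update` aligns rows by ID; `overwrite=False` restricts the update to values that are NA in df, matching the requirement not to touch existing values. Why the original call raised a ValueError is not known from the post.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"EmpId": [1, 2, 3, 99, 100],
                   "Name": ["Fred", "Barney", "Wilma", "Bam-Bam", "Pebbles"],
                   "Age": [40, 35, np.nan, 5, np.nan]})
df1 = pd.DataFrame({"EmpId": [3, 100],
                    "Age": [30, 6]})

# Index both frames by EmpId so update() matches rows by ID, not by position
df = df.set_index("EmpId")

# In-place update; overwrite=False only fills cells that are NA in df
df.update(df1.set_index("EmpId"), overwrite=False)

# Turn EmpId back into an ordinary column
df = df.reset_index()
print(df)
```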


Recommended answer:

If I understand the problem correctly, you can use merge followed by a few operations on the resulting columns, using an ifnull helper function to make sure the original (non-missing) values are not changed:



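import numpy as np
import pandas as pd

df = pd.DataFrame({"EmpId": [1, 2, 3, 99, 100],
                   "Name": ['Fred', 'Barney', 'Wilma', 'Bam-Bam', 'Pebbles'],
                   "Age": [40, 35, np.nan, 5, np.nan]})

df1 = pd.DataFrame({"EmpId": [3, 100],
                    "Age": [30, 6]})

# Left merge keeps every row of df; df1's Age arrives as the column "Age_filled"
df = df.merge(df1, on="EmpId", how="left", suffixes=("", "_filled"))

def ifnull(val, replace):
    """Return `replace` if `val` is missing, otherwise return `val` unchanged."""
    if val is None or pd.isna(val):
        return replace
    return val

# Row by row: keep the existing Age, fall back to Age_filled only when Age is missing
df["Age"] = df[["Age", "Age_filled"]] \
    .apply(lambda row: ifnull(row["Age"], row["Age_filled"]), axis=1)

df.drop("Age_filled", axis=1, inplace=True)
print(df)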

Output:



   EmpId     Name   Age
0      1     Fred  40.0
1      2   Barney  35.0
2      3    Wilma  30.0
3     99  Bam-Bam   5.0
4    100  Pebbles   6.0

This only works if the EmpId values are unique, as in your example: if an EmpId appeared more than once in df1, the left merge would duplicate the matching rows of df.
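The row-wise `apply` with `ifnull` can also be replaced by a vectorized `Series.fillna`, which does the same "keep Age, else take Age_filled" step in one call. A minimal sketch on the same sample data; `validate="many_to_one"` is an optional addition (not part of the answer) that makes `merge` raise a `MergeError` if EmpId is not unique in df1, guarding against the duplication described above.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"EmpId": [1, 2, 3, 99, 100],
                   "Name": ["Fred", "Barney", "Wilma", "Bam-Bam", "Pebbles"],
                   "Age": [40, 35, np.nan, 5, np.nan]})
df1 = pd.DataFrame({"EmpId": [3, 100],
                    "Age": [30, 6]})

# Left merge keeps every row of df; Age_filled is NaN where df1 has no entry.
# validate="many_to_one" raises if EmpId repeats in df1 (would duplicate rows).
df = df.merge(df1, on="EmpId", how="left", suffixes=("", "_filled"),
              validate="many_to_one")

# Vectorized equivalent of ifnull(): fill only the missing Age values
df["Age"] = df["Age"].fillna(df["Age_filled"])
df = df.drop(columns="Age_filled")
print(df)
```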


