
Python Pandas: replace NaN rows with rows from another DataFrame that have the same date index

Repost. Author: 太空宇宙. Updated: 2023-11-03 15:53:58

I have two DataFrames that look like this:

2001-01-03 00:00:00      NaN      NaN      NaN      NaN  NaN
2001-01-03 00:01:00 0.95110 0.95110 0.95110 0.95110 4.0
2001-01-03 00:02:00 0.95100 0.95110 0.95100 0.95110 4.0
2001-01-03 00:03:00 0.95100 0.95100 0.95100 0.95100 4.0
2001-01-03 00:04:00 0.95090 0.95090 0.95090 0.95090 4.0
2001-01-03 00:05:00 0.95100 0.95100 0.95100 0.95100 4.0

What I want to do is replace any NaN row in one df with the row from the other df that has the same date index.

I tried something like this:

df = df.apply(lambda x: df2.ix[x['row']] if x.isnull().any() else x)

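But it just throws a bunch of errors, and even if I could get it to work, it is probably not the best approach. As I understand it, .update() might be able to do this, but I have not been able to figure it out, so I would really appreciate any help.

Best answer

You can use DataFrame.combine_first: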

df = df1.combine_first(df2)

Or DataFrame.fillna:

df = df1.fillna(df2)

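Or DataFrame.update, which modifies df1 in place (note that by default it also overwrites existing non-NaN values in df1 with the non-NaN values from df2):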

df1.update(df2)
print (df1)

However, the column names must be the same in both DataFrames.
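To illustrate the column-name requirement, here is a minimal sketch. The frames `left` and `right` and their column labels ("open", "close", "o", "c") are hypothetical and chosen only for illustration. When the labels differ, nothing is filled and the result simply gains extra columns; renaming first makes the fill work.

```python
import numpy as np
import pandas as pd

idx = pd.to_datetime(["2001-01-03 00:00:00", "2001-01-03 00:01:00"])

# Hypothetical frames: same kind of data, but different column labels.
left = pd.DataFrame({"open": [np.nan, 0.9511], "close": [np.nan, 0.9511]}, index=idx)
right = pd.DataFrame({"o": [0.9509, 0.9510], "c": [0.9509, 0.9510]}, index=idx)

# Labels do not match: the NaN values in "open"/"close" stay NaN,
# and "o"/"c" are appended as separate columns.
unaligned = left.combine_first(right)

# Rename right's columns to match left's before filling.
aligned = left.combine_first(right.rename(columns={"o": "open", "c": "close"}))

print(unaligned)
print(aligned)
```

The same renaming step applies before fillna or update, since all three methods align on column labels as well as on the index.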

Example:

import numpy as np
import pandas as pd
df1 = pd.DataFrame({1: {pd.Timestamp('2001-01-03 00:01:00'): 0.95109999999999995, pd.Timestamp('2001-01-03 00:03:00'): 0.95099999999999996, pd.Timestamp('2001-01-03 00:02:00'): 0.95099999999999996, pd.Timestamp('2001-01-03 00:00:00'): np.nan, pd.Timestamp('2001-01-03 00:05:00'): 0.95099999999999996, pd.Timestamp('2001-01-03 00:04:00'): 0.95089999999999997}, 2: {pd.Timestamp('2001-01-03 00:01:00'): 0.95109999999999995, pd.Timestamp('2001-01-03 00:03:00'): 0.95099999999999996, pd.Timestamp('2001-01-03 00:02:00'): 0.95109999999999995, pd.Timestamp('2001-01-03 00:00:00'): np.nan, pd.Timestamp('2001-01-03 00:05:00'): 0.95099999999999996, pd.Timestamp('2001-01-03 00:04:00'): 0.95089999999999997}, 3: {pd.Timestamp('2001-01-03 00:01:00'): 0.95109999999999995, pd.Timestamp('2001-01-03 00:03:00'): 0.95099999999999996, pd.Timestamp('2001-01-03 00:02:00'): 0.95099999999999996, pd.Timestamp('2001-01-03 00:00:00'): np.nan, pd.Timestamp('2001-01-03 00:05:00'): 0.95099999999999996, pd.Timestamp('2001-01-03 00:04:00'): 0.95089999999999997}, 4: {pd.Timestamp('2001-01-03 00:01:00'): 0.95109999999999995, pd.Timestamp('2001-01-03 00:03:00'): 0.95099999999999996, pd.Timestamp('2001-01-03 00:02:00'): 0.95109999999999995, pd.Timestamp('2001-01-03 00:00:00'): np.nan, pd.Timestamp('2001-01-03 00:05:00'): 0.95099999999999996, pd.Timestamp('2001-01-03 00:04:00'): 0.95089999999999997}, 5: {pd.Timestamp('2001-01-03 00:01:00'): 4.0, pd.Timestamp('2001-01-03 00:03:00'): 4.0, pd.Timestamp('2001-01-03 00:02:00'): 4.0, pd.Timestamp('2001-01-03 00:00:00'): np.nan, pd.Timestamp('2001-01-03 00:05:00'): 4.0, pd.Timestamp('2001-01-03 00:04:00'): 4.0}})
df2 = pd.DataFrame({1: {pd.Timestamp('2001-01-03 00:01:00'): 0.95099999999999996, pd.Timestamp('2001-01-03 00:00:00'): 0.95089999999999997}, 2: {pd.Timestamp('2001-01-03 00:01:00'): 0.95099999999999996, pd.Timestamp('2001-01-03 00:00:00'): 0.95089999999999997}, 3: {pd.Timestamp('2001-01-03 00:01:00'): 0.95099999999999996, pd.Timestamp('2001-01-03 00:00:00'): 0.95089999999999997}, 4: {pd.Timestamp('2001-01-03 00:01:00'): 0.95099999999999996, pd.Timestamp('2001-01-03 00:00:00'): 0.95089999999999997}, 5: {pd.Timestamp('2001-01-03 00:01:00'): 4.0, pd.Timestamp('2001-01-03 00:00:00'): 4.0}})

print(df1)
                          1       2       3       4    5
2001-01-03 00:00:00     NaN     NaN     NaN     NaN  NaN
2001-01-03 00:01:00  0.9511  0.9511  0.9511  0.9511  4.0
2001-01-03 00:02:00  0.9510  0.9511  0.9510  0.9511  4.0
2001-01-03 00:03:00  0.9510  0.9510  0.9510  0.9510  4.0
2001-01-03 00:04:00  0.9509  0.9509  0.9509  0.9509  4.0
2001-01-03 00:05:00  0.9510  0.9510  0.9510  0.9510  4.0

print(df2)
                          1       2       3       4    5
2001-01-03 00:00:00  0.9509  0.9509  0.9509  0.9509  4.0
2001-01-03 00:01:00  0.9510  0.9510  0.9510  0.9510  4.0

df = df1.combine_first(df2)
print(df)
                          1       2       3       4    5
2001-01-03 00:00:00  0.9509  0.9509  0.9509  0.9509  4.0
2001-01-03 00:01:00  0.9511  0.9511  0.9511  0.9511  4.0
2001-01-03 00:02:00  0.9510  0.9511  0.9510  0.9511  4.0
2001-01-03 00:03:00  0.9510  0.9510  0.9510  0.9510  4.0
2001-01-03 00:04:00  0.9509  0.9509  0.9509  0.9509  4.0
2001-01-03 00:05:00  0.9510  0.9510  0.9510  0.9510  4.0

df = df1.fillna(df2)
print(df)
                          1       2       3       4    5
2001-01-03 00:00:00  0.9509  0.9509  0.9509  0.9509  4.0
2001-01-03 00:01:00  0.9511  0.9511  0.9511  0.9511  4.0
2001-01-03 00:02:00  0.9510  0.9511  0.9510  0.9511  4.0
2001-01-03 00:03:00  0.9510  0.9510  0.9510  0.9510  4.0
2001-01-03 00:04:00  0.9509  0.9509  0.9509  0.9509  4.0
2001-01-03 00:05:00  0.9510  0.9510  0.9510  0.9510  4.0
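combine_first and fillna give the same result here because every timestamp in df2 also appears in df1. They can differ when the second frame has index labels the first one lacks. The small sketch below uses hypothetical frames `a` and `b` with a single made-up column "x" to show this: fillna keeps only the caller's index, while combine_first returns the union of both indexes.

```python
import numpy as np
import pandas as pd

# Hypothetical frames: b has a timestamp (00:02) that a does not have.
a = pd.DataFrame({"x": [np.nan, 1.0]},
                 index=pd.to_datetime(["2001-01-03 00:00", "2001-01-03 00:01"]))
b = pd.DataFrame({"x": [9.0, 8.0]},
                 index=pd.to_datetime(["2001-01-03 00:00", "2001-01-03 00:02"]))

filled = a.fillna(b)           # result keeps a's index only (2 rows)
combined = a.combine_first(b)  # result index is the union of both (3 rows)

print(filled)
print(combined)
```

If the goal is strictly to patch NaN values in the existing rows without gaining new rows, fillna is probably the closer fit.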

df1.update(df2)
print(df1)
                          1       2       3       4    5
2001-01-03 00:00:00  0.9509  0.9509  0.9509  0.9509  4.0
2001-01-03 00:01:00  0.9510  0.9510  0.9510  0.9510  4.0
2001-01-03 00:02:00  0.9510  0.9511  0.9510  0.9511  4.0
2001-01-03 00:03:00  0.9510  0.9510  0.9510  0.9510  4.0
2001-01-03 00:04:00  0.9509  0.9509  0.9509  0.9509  4.0
2001-01-03 00:05:00  0.9510  0.9510  0.9510  0.9510  4.0
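Note that in the update output the 00:01:00 row changed from 0.9511 to 0.9510, because update overwrites existing values by default. If only the NaN values should be replaced, update accepts overwrite=False. A minimal sketch, using hypothetical frames `target` and `source` with a single made-up column "price":

```python
import numpy as np
import pandas as pd

idx = pd.to_datetime(["2001-01-03 00:00:00", "2001-01-03 00:01:00"])

# Hypothetical data: target has one NaN and one existing value.
target = pd.DataFrame({"price": [np.nan, 0.9511]}, index=idx)
source = pd.DataFrame({"price": [0.9509, 0.9510]}, index=idx)

# Default (overwrite=True): existing values are replaced as well.
overwritten = target.copy()
overwritten.update(source)

# overwrite=False: only NaN values in the caller are filled.
nan_only = target.copy()
nan_only.update(source, overwrite=False)

print(overwritten)
print(nan_only)
```

With overwrite=False the result matches what combine_first and fillna produce for the rows that already exist in the caller.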

Regarding "Python Pandas: replace NaN rows with rows from another DataFrame that have the same date index", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/40976525/
