
python - Pandas: How to reshape multiple DataFrames into one common form?

Reposted. Author: 行者123. Updated: 2023-12-04 10:06:24

(Reproducible example and my feeble attempts are at the end.)

I have two DataFrames, df1 and df2:

df1:

   Col_A  Col_B  Col_D
1    NaN     21    NaN
2     10    NaN     33
4     12     23     38

df2:

   Col_C  Col_E
2     22     44
3    NaN     45
5      4     48

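I would like to find a common form like this:

df_common:

   Col_A  Col_B  Col_C  Col_D  Col_E
1    NaN    NaN    NaN    NaN    NaN
2    NaN    NaN    NaN    NaN    NaN
3    NaN    NaN    NaN    NaN    NaN
4    NaN    NaN    NaN    NaN    NaN
5    NaN    NaN    NaN    NaN    NaN

...that is, the union of all column names and all row indexes, filled with NaN only.

Then I want to fill in the values from df1 and df2 (still as two separate tables), so that I end up with: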

df1_desired:

   Col_A  Col_B  Col_C  Col_D  Col_E
1    NaN     21    NaN    NaN    NaN
2     10    NaN    NaN     33    NaN
3    NaN    NaN    NaN    NaN    NaN
4     12     23    NaN     38    NaN
5    NaN    NaN    NaN    NaN    NaN

df2_desired:

   Col_A  Col_B  Col_C  Col_D  Col_E
1    NaN    NaN    NaN    NaN    NaN
2    NaN    NaN     22    NaN     44
3    NaN    NaN    NaN    NaN     45
4    NaN    NaN    NaN    NaN    NaN
5    NaN    NaN      4    NaN     48

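I have made various attempts with pd.merge() and df.update(), without success.

By now I have accepted that I don't even know what this particular problem is properly called. Thanks for any suggestions!

Reproducible example: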
import pandas as pd
import numpy as np

df1 = pd.DataFrame({'Col_A': {1: np.nan, 2: '10', 4: '12'},
                    'Col_B': {1: '21', 2: np.nan, 4: '23'},
                    'Col_D': {1: np.nan, 2: '33', 4: '38'}})

df2 = pd.DataFrame({'Col_C': {2: '22', 3: np.nan, 5: '4'},
                    'Col_E': {2: 44, 3: 45, 5: 48}})

# expected results: same values as df1 / df2, on the union of rows and columns
df1_desired = pd.DataFrame({'Col_A': {1: np.nan, 2: '10', 3: np.nan, 4: '12', 5: np.nan},
                            'Col_B': {1: '21', 2: np.nan, 3: np.nan, 4: '23', 5: np.nan},
                            'Col_C': {1: np.nan, 2: np.nan, 3: np.nan, 4: np.nan, 5: np.nan},
                            'Col_D': {1: np.nan, 2: '33', 3: np.nan, 4: '38', 5: np.nan},
                            'Col_E': {1: np.nan, 2: np.nan, 3: np.nan, 4: np.nan, 5: np.nan}})

df2_desired = pd.DataFrame({'Col_A': {1: np.nan, 2: np.nan, 3: np.nan, 4: np.nan, 5: np.nan},
                            'Col_B': {1: np.nan, 2: np.nan, 3: np.nan, 4: np.nan, 5: np.nan},
                            'Col_C': {1: np.nan, 2: '22', 3: np.nan, 4: np.nan, 5: '4'},
                            'Col_D': {1: np.nan, 2: np.nan, 3: np.nan, 4: np.nan, 5: np.nan},
                            'Col_E': {1: np.nan, 2: 44, 3: 45, 4: np.nan, 5: 48}})

# find the commons
common_cols = sorted(list(set().union(list(df1),list(df2))))
common_rows = sorted(list(set().union(list(df1.index),list(df2.index))))
df_common = pd.DataFrame(np.nan, index=common_rows, columns=common_cols)

# attempt at reshaping df1 with pd.merge
# https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.merge.html
df1_reshaped = pd.merge(df_common, df1, how='left', left_index=True, right_index=True)

# attempt at dropping duplicates for df1
#df1_reshaped = df1_reshaped[df1_reshaped.columns.drop(list(df1_reshaped.filter(regex='_x')))]
#df1_reshaped.columns = df_common.columns

# attempt with df.update()
# https://stackoverflow.com/questions/9787853/join-or-merge-with-overwrite-in-pandas
df1_updated=df_common.update(df1)

Best answer

You can use:

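s = df1.combine_first(df2)   # s carries the union of both indexes and both column sets
df1 = df1.reindex_like(s)    # conform df1 to that shape; missing cells become NaN
print(df1)

   Col_A  Col_B  Col_C  Col_D  Col_E
1    NaN   21.0    NaN    NaN    NaN
2   10.0    NaN    NaN   33.0    NaN
3    NaN    NaN    NaN    NaN    NaN
4   12.0   23.0    NaN   38.0    NaN
5    NaN    NaN    NaN    NaN    NaN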
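
The same idea can be applied to both frames without building the combined frame first. This is only a sketch, not part of the original answer: it takes the union of the row labels and of the column labels with `Index.union` and passes both to `DataFrame.reindex`. The names `common_rows`, `common_cols`, `df1_common` and `df2_common` are made up for illustration, and the sample values are entered as numbers rather than the strings used in the question.

```python
import numpy as np
import pandas as pd

# Sample data from the question, with numeric values (an assumption for this sketch).
df1 = pd.DataFrame({'Col_A': {1: np.nan, 2: 10, 4: 12},
                    'Col_B': {1: 21, 2: np.nan, 4: 23},
                    'Col_D': {1: np.nan, 2: 33, 4: 38}})
df2 = pd.DataFrame({'Col_C': {2: 22, 3: np.nan, 5: 4},
                    'Col_E': {2: 44, 3: 45, 5: 48}})

# Union of the row labels and of the column labels of both frames.
common_rows = df1.index.union(df2.index)
common_cols = df1.columns.union(df2.columns)

# reindex keeps existing values and fills every newly added row/column with NaN.
df1_common = df1.reindex(index=common_rows, columns=common_cols)
df2_common = df2.reindex(index=common_rows, columns=common_cols)

print(df1_common)
print(df2_common)
```

Because `reindex` returns a new frame, df1 and df2 themselves are left unchanged. The approach should also extend to more than two frames by chaining further `union` calls, though that has not been tried here.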

Regarding "python - Pandas: How to reshape multiple DataFrames into one common form?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/61568461/
