
python - How to merge overlapping columns

Reposted. Author: 行者123. Updated: 2023-12-02 05:13:17

I have two datasets like this:

import pandas as pd
import numpy as np
df1 = pd.DataFrame({'id': [1, 2,3,4,5], 'first': [np.nan,np.nan,1,0,np.nan], 'second': [1,np.nan,np.nan,np.nan,0]})
df2 = pd.DataFrame({'id': [1, 2,3,4,5, 6], 'first': [np.nan,1,np.nan,np.nan,0, 1], 'third': [1,0,np.nan,1,1, 0]})

I want to get:

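# Merge on 'id' only; 'on' cannot be combined with left_index/right_index.
result = pd.merge(df1, df2, on='id', how='outer')
# Sum the two overlapping columns (NaN is treated as 0 by default).
result['first'] = result[['first_x', 'first_y']].sum(axis=1)
# Restore NaN where both inputs were missing.
result.loc[result['first_x'].isnull() & result['first_y'].isnull(), 'first'] = np.nan
# drop() returns a new frame, so assign it back; use columns= instead of a positional axis.
result = result.drop(columns=['first_x', 'first_y'])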

   id  second  third  first
0   1     1.0    1.0    NaN
1   2     NaN    0.0    1.0
2   3     NaN    NaN    1.0
3   4     NaN    1.0    0.0
4   5     0.0    1.0    0.0
5   6     NaN    0.0    1.0

The problem is that the real dataset contains about 200 variables, and my approach is very long-winded. How can I make this easier? Thanks.

Best answer

You should be able to use combine_first:

>>> df1.set_index('id').combine_first(df2.set_index('id'))
    first  second  third
id
1     NaN       1      1
2       1     NaN      0
3       1     NaN    NaN
4       0     NaN      1
5       0       0      1
6       1     NaN      0
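
To get `id` back as an ordinary column, as in the output the question asks for, the answer's call can be followed by `reset_index()`. The sketch below is self-contained and reuses the question's data. The `concat`/`groupby` variant is not part of the original answer; it is shown only as a possible alternative, on the assumption that the question's "sum" behaviour is what is wanted when both frames hold a value for the same cell.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'id': [1, 2, 3, 4, 5],
                    'first': [np.nan, np.nan, 1, 0, np.nan],
                    'second': [1, np.nan, np.nan, np.nan, 0]})
df2 = pd.DataFrame({'id': [1, 2, 3, 4, 5, 6],
                    'first': [np.nan, 1, np.nan, np.nan, 0, 1],
                    'third': [1, 0, np.nan, 1, 1, 0]})

# combine_first aligns on the index, so 'id' is made the index first.
# Values present in df1 are kept; missing ones are filled from df2.
combined = (df1.set_index('id')
               .combine_first(df2.set_index('id'))
               .reset_index())

# Alternative (an assumption, not from the answer): stack both frames and
# sum per id. min_count=1 leaves NaN where every input value was NaN.
summed = (pd.concat([df1, df2])
            .groupby('id')
            .sum(min_count=1)
            .reset_index())

print(combined)
```

In this data no cell has a non-missing value in both frames, so the two results agree. If both frames did hold a value for the same `id` and column, `combine_first` would keep the one from `df1`, whereas the summing variant would add them, so which one fits depends on how overlaps should be resolved. Neither needs the 200 columns to be listed by name.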

Regarding python - how to merge overlapping columns, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/45575165/
