gpt4 book ai didi

python - Pandas - 沿非索引轴连接两个 df,合并非索引轴上具有相同值的行

转载 作者:行者123 更新时间:2023-12-02 09:22:36 26 4
gpt4 key购买 nike

我有两个 pandas 数据框,我想将它们合并为一个。我希望生成的数据帧沿非索引轴排序,在我的例子中是 'seconds_since_start' 。我想合并 'seconds_since_start' 具有相同值的行。我还想保留两个数据框之间的唯一列。

显示给定的输入和所需的输出可能更容易。

<小时/>
df_a = """
valid_a,value_a,seconds_since_start
2000-02-15 14:47:00,12.3,0.0
2000-02-15 15:59:00,20.6,30.0
2000-02-15 16:51:00,20.3,120.0
2000-02-15 17:52:00,22.6,200.0
"""

df_b = """
valid_b,value_b,seconds_since_start
2019-12-24 14:54:00,12.4,20.0
2019-12-24 15:54:00,18.7,30.0
2019-12-24 16:54:00,19.2,90.0
2019-12-24 17:54:00,20.8,250.0
"""

df_desired_output = """
valid_a,valid_b,value_a,value_b,seconds_since_start
2000-02-15 14:47:00,,12.3,,0.0
,2019-12-24 14:54:00,,12.4,20.0
2000-02-15 15:59:00,2019-12-24 15:54:00,20.6,18.7,30.0
,2019-12-24 16:54:00,,19.2,90.0
2000-02-15 16:51:00,,20.3,,120.0
2000-02-15 17:52:00,,22.6,,200.0
,2019-12-24 17:54:00,,20.8,250.0
"""

from io import StringIO
import pandas as pd
import numpy as np

df_a = StringIO(df_a)
df_a = pd.read_csv(df_a)
df_a['valid_a'] = pd.to_datetime(df_a['valid_a']) # convert 'valid' column to pd.datetime objects
df_a = df_a.set_index('valid_a') # set the 'valid' as index

df_b = StringIO(df_b)
df_b = pd.read_csv(df_b)
df_b['valid_b'] = pd.to_datetime(df_b['valid_b']) # convert 'valid' column to pd.datetime objects
df_b = df_b.set_index('valid_b') # set the 'valid' as index

df_desired_output = StringIO(df_desired_output)
df_desired_output = pd.read_csv(df_desired_output)


print('input dataframe A\n', df_a)
print('input dataframe B\n', df_b)
print('desired output dataframe\n', df_desired_output)

df_new = pd.concat([df_a, df_b], sort=False) # can't sort by 'seconds_since_start' from here so I do it on the next line
df_new = df_new.sort_values(by='seconds_since_start') # sort
print('actual output\n', df_new) # fails to merge rows that have the same value for 'seconds_since_start'
<小时/>

输出

input dataframe A
value_a seconds_since_start
valid_a
2000-02-15 14:47:00 12.3 0.0
2000-02-15 15:59:00 20.6 30.0
2000-02-15 16:51:00 20.3 120.0
2000-02-15 17:52:00 22.6 200.0
input dataframe B
value_b seconds_since_start
valid_b
2019-12-24 14:54:00 12.4 20.0
2019-12-24 15:54:00 18.7 30.0
2019-12-24 16:54:00 19.2 90.0
2019-12-24 17:54:00 20.8 250.0
desired output dataframe
valid_a valid_b ... value_b seconds_since_start
0 2000-02-15 14:47:00 NaN ... NaN 0.0
1 NaN 2019-12-24 14:54:00 ... 12.4 20.0
2 2000-02-15 15:59:00 2019-12-24 15:54:00 ... 18.7 30.0
3 NaN 2019-12-24 16:54:00 ... 19.2 90.0
4 2000-02-15 16:51:00 NaN ... NaN 120.0
5 2000-02-15 17:52:00 NaN ... NaN 200.0
6 NaN 2019-12-24 17:54:00 ... 20.8 250.0

[7 rows x 5 columns]
actual output
value_a seconds_since_start value_b
2000-02-15 14:47:00 12.3 0.0 NaN
2019-12-24 14:54:00 NaN 20.0 12.4
2000-02-15 15:59:00 20.6 30.0 NaN
2019-12-24 15:54:00 NaN 30.0 18.7
2019-12-24 16:54:00 NaN 90.0 19.2
2000-02-15 16:51:00 20.3 120.0 NaN
2000-02-15 17:52:00 22.6 200.0 NaN
2019-12-24 17:54:00 NaN 250.0 20.8

最佳答案

只需将索引添加到列

df_new = pd.concat([df_a.assign(valid_a=df_a.index), df_b.assign(valid_b=df_b.index)], sort=False)  
df_new = df_new.sort_values(by='seconds_since_start')

关于python - Pandas - 沿非索引轴连接两个 df,合并非索引轴上具有相同值的行,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/59535767/

26 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com