
python pandas: concatenating DataFrames along both axes (merging in all directions)

Reposted. Author: 行者123. Updated: 2023-12-01 01:11:04

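Apologies if this is a duplicate; I don't know the right keywords for the problem I'm facing. I'm reading in a series of DataFrames and trying to combine them into one large DataFrame. The dfs are time series data; some need to be appended vertically and some horizontally. There are a few thousand of them, but this example captures the problem.

The problem is that I can't find a pandas function that handles a merge in both directions without prior knowledge of what is being read in.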

import pandas as pd


df1 = pd.DataFrame({'GMT_date': ["01-01-2000", "02-01-2000", "03-01-2000", "04-01-2000", "05-01-2000"],
                    'LOWI_Precipitation_24_Hour_(mm)': [2.2, 0, 0.2, 0.4, 0]})

df2 = pd.DataFrame({'GMT_date': ["06-01-2000", "07-01-2000", "08-01-2000", "09-01-2000", "10-01-2000"],
                    'LOWI_Precipitation_24_Hour_(mm)': [0.1, 0.1, 3.1, 4.0, 0]})

df3 = pd.DataFrame({'GMT_date': ["06-01-2000", "07-01-2000", "08-01-2000", "09-01-2000", "10-01-2000"],
                    'TGPY_Precipitation_24_Hour_(mm)': [0.6, 0.7, 5.1, 1.0, 2.0]})

df_list = [df1, df2, df3]

merged_df = pd.DataFrame()

for df in df_list:
    merged_df = merged_df.merge(df, how='outer', left_index=True, right_index=True)
print(merged_df)

   GMT_date_x  LOWI_Precipitation_24_Hour_(mm)_x  GMT_date_y  LOWI_Precipitation_24_Hour_(mm)_y    GMT_date  TGPY_Precipitation_24_Hour_(mm)
0  01-01-2000                                2.2  06-01-2000                                0.1  06-01-2000                              0.6
1  02-01-2000                                0.0  07-01-2000                                0.1  07-01-2000                              0.7
2  03-01-2000                                0.2  08-01-2000                                3.1  08-01-2000                              5.1
3  04-01-2000                                0.4  09-01-2000                                4.0  09-01-2000                              1.0
4  05-01-2000                                0.0  10-01-2000                                0.0  10-01-2000                              2.0

As you can see, each df has been given its own new columns, whereas I want them in single columns. I also tried the following:

merged_df = pd.concat([merged_df, df], sort=True)

which produces:

     GMT_date  LOWI_Precipitation_24_Hour_(mm)  TGPY_Precipitation_24_Hour_(mm)
0  01-01-2000                              2.2                              NaN
1  02-01-2000                              0.0                              NaN
2  03-01-2000                              0.2                              NaN
3  04-01-2000                              0.4                              NaN
4  05-01-2000                              0.0                              NaN
0  06-01-2000                              0.1                              NaN
1  07-01-2000                              0.1                              NaN
2  08-01-2000                              3.1                              NaN
3  09-01-2000                              4.0                              NaN
4  10-01-2000                              0.0                              NaN
0  06-01-2000                              NaN                              0.6
1  07-01-2000                              NaN                              0.7
2  08-01-2000                              NaN                              5.1
3  09-01-2000                              NaN                              1.0
4  10-01-2000                              NaN                              2.0

Now the dates are in a single column as I want, but the date column contains duplicates. I tried:

merged_df = merged_df.join(df)

which raises an error:

ValueError: columns overlap but no suffix specified: Index(['GMT_date', 'LOWI_Precipitation_24_Hour_(mm)'], dtype='object')

I also tried setting the index to GMT_date:

for df in df_list:
    df = df.set_index('GMT_date')
    merged_df = pd.concat([merged_df, df], axis=1)
print(merged_df)

            LOWI_Precipitation_24_Hour_(mm)  LOWI_Precipitation_24_Hour_(mm)  TGPY_Precipitation_24_Hour_(mm)
01-01-2000                              2.2                              NaN                              NaN
02-01-2000                              0.0                              NaN                              NaN
03-01-2000                              0.2                              NaN                              NaN
04-01-2000                              0.4                              NaN                              NaN
05-01-2000                              0.0                              NaN                              NaN
06-01-2000                              NaN                              0.1                              0.6
07-01-2000                              NaN                              0.1                              0.7
08-01-2000                              NaN                              3.1                              5.1
09-01-2000                              NaN                              4.0                              1.0
10-01-2000                              NaN                              0.0                              2.0

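This resolves the dates into a single column with no duplicate dates, but now my data is spread across 2 columns with the same name?

I must be missing a simple function or argument that handles the merge in both directions in a single line?

I tried looking at pd.merge_asof, but it seems to return only the dates shared between the 2 dfs.

I'm hoping there is a simple fix? It feels as though, instead of 'how' : {'left', 'right', 'outer', 'inner'}, I need an 'all' option in merge.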
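A possible reading of the 'all' option wished for above: when `merge` is called without `on=` and without the index flags, it joins on every column name the two frames share, so an outer merge can stack frames that have the same columns and widen frames that only share the date. The sketch below is an illustration under that assumption, using a shortened sample of the frames from the question; it is not taken from the original answer, and the `reduce` wrapper is just one way to chain the calls.

```python
from functools import reduce

import pandas as pd

LOWI = 'LOWI_Precipitation_24_Hour_(mm)'
TGPY = 'TGPY_Precipitation_24_Hour_(mm)'

# Shortened sample of the question's data (two rows per frame).
df1 = pd.DataFrame({'GMT_date': ["01-01-2000", "02-01-2000"], LOWI: [2.2, 0.0]})
df2 = pd.DataFrame({'GMT_date': ["06-01-2000", "07-01-2000"], LOWI: [0.1, 0.1]})
df3 = pd.DataFrame({'GMT_date': ["06-01-2000", "07-01-2000"], TGPY: [0.6, 0.7]})

# No `on=` and no left_index/right_index: merge uses all shared column names as keys.
# df1 + df2 share both columns, so their rows are stacked;
# the result and df3 share only GMT_date, so TGPY is added as a new column.
merged = reduce(lambda left, right: left.merge(right, how='outer'), [df1, df2, df3])
merged = merged.sort_values('GMT_date').reset_index(drop=True)

print(merged)
```

One caveat with this approach: if two frames report different values for the same date and the same column, the outer merge keeps both as separate rows rather than reconciling them, so the dates would no longer be unique.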

Best answer

I have another solution: after concatenating all the DataFrames, merge the duplicate columns with groupby.

import pandas as pd
import numpy as np

df1 = pd.DataFrame({'GMT_date': ["01-01-2000", "02-01-2000", "03-01-2000", "04-01-2000", "05-01-2000"],
                    'LOWI_Precipitation_24_Hour_(mm)': [2.2, 0, 0.2, 0.4, 0]})

df2 = pd.DataFrame({'GMT_date': ["06-01-2000", "07-01-2000", "08-01-2000", "09-01-2000", "10-01-2000"],
                    'LOWI_Precipitation_24_Hour_(mm)': [0.1, 0.1, 3.1, 4.0, 0]})

df3 = pd.DataFrame({'GMT_date': ["06-01-2000", "07-01-2000", "08-01-2000", "09-01-2000", "10-01-2000"],
                    'TGPY_Precipitation_24_Hour_(mm)': [0.6, 0.7, 5.1, 1.0, 2.0]})

df_list = [df1, df2, df3]
merged_df = pd.DataFrame()

for df in df_list:
    # Index each frame by date so that concat aligns rows on GMT_date
    df = df.set_index('GMT_date')

    merged_df = pd.concat([merged_df, df], axis=1)

# Collapse columns that share a name, keeping the per-row maximum (NaN is skipped).
# Note: groupby(..., axis=1) is deprecated as of pandas 2.1.
merged_df = merged_df.groupby(level=0, axis=1).agg(np.max)

print(merged_df)

Output:

            LOWI_Precipitation_24_Hour_(mm)  TGPY_Precipitation_24_Hour_(mm)
01-01-2000                              2.2                              NaN
02-01-2000                              0.0                              NaN
03-01-2000                              0.2                              NaN
04-01-2000                              0.4                              NaN
05-01-2000                              0.0                              NaN
06-01-2000                              0.1                              0.6
07-01-2000                              0.1                              0.7
08-01-2000                              3.1                              5.1
09-01-2000                              4.0                              1.0
10-01-2000                              0.0                              2.0
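As an alternative to concatenating and then collapsing duplicate columns, `DataFrame.combine_first` takes the union of both index and columns and fills gaps in one frame with values from the other. The following is a minimal sketch of that idea, not part of the original answer; it uses a shortened sample of the data, and folding the list with `reduce` is an illustrative choice.

```python
from functools import reduce

import pandas as pd

LOWI = 'LOWI_Precipitation_24_Hour_(mm)'
TGPY = 'TGPY_Precipitation_24_Hour_(mm)'

# Shortened sample of the data above (two rows per frame).
df1 = pd.DataFrame({'GMT_date': ["01-01-2000", "02-01-2000"], LOWI: [2.2, 0.0]})
df2 = pd.DataFrame({'GMT_date': ["06-01-2000", "07-01-2000"], LOWI: [0.1, 0.1]})
df3 = pd.DataFrame({'GMT_date': ["06-01-2000", "07-01-2000"], TGPY: [0.6, 0.7]})

# Index every frame by date so alignment happens on GMT_date.
frames = [df.set_index('GMT_date') for df in (df1, df2, df3)]

# combine_first keeps the accumulated value where it exists and
# only takes the next frame's value where the accumulated one is missing.
combined = reduce(lambda acc, nxt: acc.combine_first(nxt), frames)

print(combined)
```

The difference from the groupby approach is in how conflicts are resolved: where two frames both have a value for the same date and column, `combine_first` keeps the one from the earlier frame, whereas the groupby above keeps the maximum.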

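You can also pass a different aggregation function to agg. Note that a conditional such as the one below is evaluated once, before agg is called, and does not choose between max and sum per value:

merged_df = merged_df.groupby(level=0, axis=1).agg(np.max if np.nan else sum)  # np.nan is truthy, so this always selects np.max; pass sum directly to add overlapping values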
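To make the choice of aggregation concrete, here is a small sketch that compares taking the maximum with summing overlapping values. It is an illustration rather than the answer's own code: the third row is hypothetical data added so that two same-named columns overlap, and the transpose-based `groupby` is used as an assumption-free substitute for `axis=1`, which is deprecated in recent pandas versions.

```python
import numpy as np
import pandas as pd

LOWI = 'LOWI_Precipitation_24_Hour_(mm)'
TGPY = 'TGPY_Precipitation_24_Hour_(mm)'

# bool(np.nan) is True, so this expression always evaluates to np.max.
chosen = np.max if np.nan else sum

# A frame with duplicate column names, as produced by concat(axis=1).
# The last row is hypothetical: both LOWI columns hold a value for the same date.
wide = pd.DataFrame(
    [[2.2, np.nan, np.nan],
     [np.nan, 0.1, 0.6],
     [1.0, 2.0, np.nan]],
    index=["01-01-2000", "06-01-2000", "07-01-2000"],
    columns=[LOWI, LOWI, TGPY],
)

# Group the transposed frame by column label, aggregate, and transpose back.
maxed = wide.T.groupby(level=0).max().T

# min_count=1 keeps NaN where every contributing value is missing;
# without it, an all-NaN group would sum to 0.0.
summed = wide.T.groupby(level=0).sum(min_count=1).T

print(maxed)
print(summed)
```

For dates where only one of the same-named columns has a value, both aggregations return that value; they differ only where the columns overlap.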

Regarding python pandas: concatenating DataFrames along both axes (merging in all directions), a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/54865816/
