
python - Concatenating multiple DataFrames of different lengths

Reposted. Author: 行者123. Updated: 2023-12-01 03:37:48

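I have 88 different DataFrames of different lengths that I need to concatenate. They are all stored as files in one directory, and I use the following Python script to build a single DataFrame from them.

This is what I tried: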

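import os
import pandas as pd

path = 'GTFS/'
files = os.listdir(path)

files_txt = [os.path.join(path, i) for i in files if i.endswith('.tsv')]

## Change it into dataframe
## (DataFrame.from_csv was removed in pandas 1.0; pd.read_csv(x, sep='\t', index_col=0) is the replacement)
dfs = [pd.DataFrame.from_csv(x, sep='\t')[[6]] for x in files_txt]
## Concatenate it
merged = pd.concat(dfs, axis=1)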

Because each DataFrame has a different length (shape), this throws the following error message:

ValueError: Shape of passed values is (88, 57914), indices imply (88, 57905)

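My goal is to concatenate column-wise into a single DataFrame with 88 columns: my input is 88 separate DataFrames, and I need the seventh column of each one in my script. Any solution or suggestion for concatenating the DataFrames in this situation would be great. Thanks.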

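Best answer

The key is to build a list of the individual DataFrames and then concatenate that list in one call, rather than concatenating them one at a time.

To simulate your data, I created 10 DataFrames, each with one column of random length, and saved them to csv files.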

import pandas as pd
import numpy as np
from random import randint


# generate 10 df of random length and save each to a separate file
# (the files are tab-separated despite the .csv extension)
for i in range(1, 11):
    dfi = pd.DataFrame({'a': np.arange(randint(2, 11))})
    csv_file = "file{0}.csv".format(i)
    dfi.to_csv(csv_file, sep='\t')
    print("saving file", csv_file)

Then we read the saved csv files into separate DataFrames and collect them in a list:

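# read the previously saved csv files into separate df
# and add each one to a list
# note: range(1, 10) covers file1..file9 only; use range(1, 11) to include file10
frames = []
for x in range(1, 10):
    csv_file = "file{0}.csv".format(x)
    # pd.read_csv with index_col=0 replaces DataFrame.from_csv (removed in pandas 1.0)
    newdf = pd.read_csv(csv_file, sep='\t', index_col=0)
    frames.append(newdf)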

Finally, we concatenate the list:

# concatenate the list of frames column-wise
result = pd.concat(frames, axis=1)
print(result)

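The result is the variable-length frames concatenated column-wise into a single df, with NaN filling the rows that a shorter frame does not have. Only nine columns appear in the output because the read loop's range(1, 10) stops at file9.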

saving file file1.csv
saving file file2.csv
saving file file3.csv
saving file file4.csv
saving file file5.csv
saving file file6.csv
saving file file7.csv
saving file file8.csv
saving file file9.csv
saving file file10.csv
      a    a    a    a    a    a    a   a    a
0   0.0  0.0  0.0  0.0  0.0  0.0  0.0   0  0.0
1   1.0  1.0  1.0  1.0  1.0  1.0  1.0   1  1.0
2   2.0  2.0  2.0  2.0  2.0  2.0  2.0   2  2.0
3   3.0  3.0  3.0  3.0  3.0  NaN  3.0   3  NaN
4   4.0  4.0  4.0  4.0  4.0  NaN  NaN   4  NaN
5   5.0  5.0  5.0  5.0  5.0  NaN  NaN   5  NaN
6   6.0  6.0  6.0  6.0  6.0  NaN  NaN   6  NaN
7   NaN  7.0  7.0  7.0  7.0  NaN  NaN   7  NaN
8   NaN  8.0  NaN  NaN  8.0  NaN  NaN   8  NaN
9   NaN  NaN  NaN  NaN  9.0  NaN  NaN   9  NaN
10  NaN  NaN  NaN  NaN  NaN  NaN  NaN  10  NaN
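
To see why some columns above print as floats and one as integers, here is a minimal in-memory sketch of the same column-wise concatenation. The column names `file1` to `file3` and the lengths are made up for illustration; nothing here comes from the original data. With `axis=1`, `pd.concat` aligns rows on the index, and any frame shorter than the longest one gets NaN in the missing rows, which forces that column to a float dtype.

```python
import numpy as np
import pandas as pd

# Hypothetical column names and lengths, chosen only for this illustration.
lengths = {"file1": 3, "file2": 5, "file3": 2}

# One single-column DataFrame per entry, each with a default 0..n-1 index.
frames = [pd.DataFrame({name: np.arange(n)}) for name, n in lengths.items()]

# axis=1 places the frames side by side and aligns rows on the index;
# rows missing from a shorter frame are filled with NaN.
result = pd.concat(frames, axis=1)

print(result)
print(result.dtypes)
```

Giving each frame a distinct column name, as done here, also avoids ending up with many columns that are all called `a`.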

Hope this is what you are looking for. A good set of examples on merge, join, and concatenate can be found here.
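
Applied to the situation in the question, the same idea might look like the sketch below. It is only a sketch under assumptions: the helper `concat_column`, the demo file names, and the demo columns are hypothetical, and the demo uses column position 1 where the question would use position 6 (the seventh column). The source does not say what caused the `ValueError`; one plausible cause is that the first column, which `DataFrame.from_csv` uses as the index, contains repeated values, so the sketch reads each file with the default 0..n-1 index instead.

```python
import glob
import os
import tempfile

import pandas as pd


def concat_column(paths, col_pos, sep="\t"):
    """Read one column (by zero-based position) from each file and
    place the columns side by side, one per file."""
    columns = []
    for p in paths:
        # usecols accepts zero-based positions. No index_col is given,
        # so every frame gets a unique default index 0..n-1.
        df = pd.read_csv(p, sep=sep, usecols=[col_pos])
        # Name the column after its file so the result has unique labels.
        columns.append(df.iloc[:, 0].rename(os.path.basename(p)))
    return pd.concat(columns, axis=1)


# Demo with two small made-up .tsv files of different lengths.
with tempfile.TemporaryDirectory() as tmp:
    for name, n in [("a.tsv", 3), ("b.tsv", 5)]:
        demo = pd.DataFrame({"id": ["x"] * n, "value": range(n)})
        demo.to_csv(os.path.join(tmp, name), sep="\t", index=False)
    paths = sorted(glob.glob(os.path.join(tmp, "*.tsv")))
    merged = concat_column(paths, col_pos=1)

print(merged)
```

Note that the default index aligns rows purely by position. If rows must instead be matched on a key column, that column has to be unique within each file before it can serve as the index for a column-wise concat.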

Regarding "python - Concatenating multiple DataFrames of different lengths", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/40137372/
