
python - Open multiple pickle files from Jupyter notebook folder doesn't work

Reposted. Author: 行者123. Updated: 2023-12-03 10:08:25

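I'm using Jupyter Notebook on a server (the folder is not on my computer). I have a folder with 30 dataframes that all have exactly the same columns. They are all saved under the following path: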

Reut/folder_no_one/here_the_files_located

I want to open all of them and concatenate them. I know I can do something like this:

df1=pd.read_pickle('table1')
df2=pd.read_pickle('table2')
df3=pd.read_pickle('table3')
...
#and then concat

But I'm sure there is a better, smarter way to do this. I tried to open all the files and store each one separately, as follows:

num=list(range(1, 33)) #number of tables I have in the folder
path_to_files=r'Reut/here_the_files_located'
Path=r'Reut/folder_no_one/here_the_files_located'

{f"df{num}" : pd.read_pickle(file) for num, file in enumerate(Path(path_to_files).glob('*.pickle'))}

But I got this error:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
----> 1 {f"df{num}" : pd.read_pickle(file) for num, file in enumerate(Path(path_to_files).glob('*.pickle'))}

TypeError: 'str' object is not callable

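I tried different versions of the path, and also giving no path at all (since my notebook is in the same place as those files), but I always get the same error.

*It's worth mentioning that when the notebook is also in that folder, I can open these files without specifying a path.

My end goal is to automatically open all these tables and concatenate them into one big table.

Edit: I also tried this: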

path = r'file_name/file_location_with_all_pickles'
all_files = glob.glob(path + "/*.pkl")

li = []

for filename in all_files:
    df = pd.read_pickle(filename)
    li.append(df)

frame = pd.concat(li, axis=0, ignore_index=True)

And also:

path_to_files = r'file_name/file_location_with_all_pickles'
tables = []
for table in pathlib.Path(path_to_files).glob("*.pkl"):
    print(table)
    tables.append(pd.read_pickle(table))

But in both cases I get this error when I try to concat:

ValueError: No objects to concatenate

Also, when I tell it to print the filename/table, it prints nothing. Even if I try to print an ordinary string inside the loop (like print('hello')), nothing happens. It seems like there is a problem with the path, but when I open one specific pickle like this:

pd.read_pickle(r'file_name/file_location_with_all_pickles/specific_table.pkl')

it opens.

Update:

This is what eventually worked for me:

import pandas as pd
import glob

path = r'folder' # use your path
all_files = glob.glob(path + "/*.pkl")

li = []

for filename in all_files:
    df = pd.read_pickle(filename)
    li.append(df)

frame = pd.concat(li, axis=0, ignore_index=True)

From here (Open multiple pickle files from Jupyter notebook folder doesn't work)

Best answer

How about:

path_to_files = r'Reut/here_the_files_located'
df = pd.concat([pd.read_pickle(f'{path_to_files}/table{num}.pickle') for num in range(1, 33)])

This is equivalent to:

path_to_files = r'Reut/here_the_files_located'
tables = []
for num in range(1, 33):
    filename = f'{path_to_files}/table{num}.pickle'
    print(filename)
    tables.append(pd.read_pickle(filename))

df = pd.concat(tables)

Output:

Reut/here_the_files_located/table1.pickle
Reut/here_the_files_located/table2.pickle
Reut/here_the_files_located/table3.pickle
Reut/here_the_files_located/table4.pickle
Reut/here_the_files_located/table5.pickle
Reut/here_the_files_located/table6.pickle
Reut/here_the_files_located/table7.pickle
Reut/here_the_files_located/table8.pickle
Reut/here_the_files_located/table9.pickle
Reut/here_the_files_located/table10.pickle
Reut/here_the_files_located/table11.pickle
Reut/here_the_files_located/table12.pickle
Reut/here_the_files_located/table13.pickle
Reut/here_the_files_located/table14.pickle
Reut/here_the_files_located/table15.pickle
Reut/here_the_files_located/table16.pickle
Reut/here_the_files_located/table17.pickle
Reut/here_the_files_located/table18.pickle
Reut/here_the_files_located/table19.pickle
Reut/here_the_files_located/table20.pickle
Reut/here_the_files_located/table21.pickle
Reut/here_the_files_located/table22.pickle
Reut/here_the_files_located/table23.pickle
Reut/here_the_files_located/table24.pickle
Reut/here_the_files_located/table25.pickle
Reut/here_the_files_located/table26.pickle
Reut/here_the_files_located/table27.pickle
Reut/here_the_files_located/table28.pickle
Reut/here_the_files_located/table29.pickle
Reut/here_the_files_located/table30.pickle
Reut/here_the_files_located/table31.pickle
Reut/here_the_files_located/table32.pickle

Some comments on your code:

num=list(range(1, 33)) #number of tables I have in the folder
path_to_files=r'Reut/here_the_files_located'
Path=r'Reut/folder_no_one/here_the_files_located'

{f"df{num}" : pd.read_pickle(file) for num, file in enumerate(Path(path_to_files).glob('*.pickle'))}

num=list(range(1, 33)) #number of tables I have in the folder

There is no need to create a list from range. Using range directly in a for loop or in a list/dict comprehension works fine.
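
As a small illustration of that point, here is a sketch that builds the 32 file names straight from range, with no intermediate list. The variable names filenames and labelled are made up for this example. It also shows that enumerate counts from 0 unless start=1 is passed, which matters if the keys are meant to be df1, df2, and so on.

```python
path_to_files = r"Reut/here_the_files_located"

# range can be iterated directly; wrapping it in list() is not needed.
filenames = [f"{path_to_files}/table{num}.pickle" for num in range(1, 33)]

# enumerate starts at 0 by default; start=1 makes the keys df1..df32.
labelled = {f"df{i}": name for i, name in enumerate(filenames, start=1)}
```

Each value in labelled is still just a path string here; reading the files is a separate step.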

Path=r'Reut/folder_no_one/here_the_files_located'

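I guess you had previously imported the Path class from pathlib. If you want to call Path as usual, you need to choose another name for that variable. That is why you get the error TypeError: 'str' object is not callable.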
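
The shadowing can be reproduced in a few lines. This is only a sketch: the names real_path_class, error_message and data_dir are invented for the demonstration, and it assumes Path was imported from pathlib as guessed above.

```python
from pathlib import Path

folder = r"Reut/folder_no_one/here_the_files_located"

real_path_class = Path   # keep a handle on the class, only for this demo
Path = folder            # the mistake: the name Path now refers to a str

try:
    Path(folder)         # this calls a str, not pathlib.Path
except TypeError as exc:
    error_message = str(exc)

Path = real_path_class   # undo the shadowing
data_dir = Path(folder)  # a different variable name avoids the clash
```

In a notebook the rebinding persists across cells, so after renaming the variable it is also necessary to run the pathlib import again (or restart the kernel).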


Is there any way to use it if the tables' names are not the same? E.g. if one is table1 and another is dataframe3, just to read them regardless of their names?

Sure. Assuming the filenames of all your saved tables end with .pickle, you can use glob like in your first attempt. Don't forget to import pathlib:

import pathlib
path_to_files = r'Reut/here_the_files_located'
tables = []
for table in pathlib.Path(path_to_files).glob("*.pickle"):
    tables.append(pd.read_pickle(table))

df = pd.concat(tables)
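
The ValueError: No objects to concatenate from the question is what pd.concat raises when it is given an empty list, which happens when the glob pattern matches no files (for example searching for *.pkl when the files end in .pickle, or using a relative path that does not resolve from the notebook's working directory). A minimal sketch of a helper that fails early with a clearer message is shown below. The function name find_pickles and its patterns parameter are hypothetical, not part of pandas or pathlib.

```python
from pathlib import Path


def find_pickles(folder, patterns=("*.pkl", "*.pickle")):
    """Return the sorted pickle paths in `folder`, or raise if none match."""
    folder = Path(folder)
    if not folder.is_dir():
        # Relative paths are resolved against the current working directory.
        raise FileNotFoundError(
            f"{folder} is not a directory (working directory: {Path.cwd()})"
        )
    files = sorted(p for pattern in patterns for p in folder.glob(pattern))
    if not files:
        raise FileNotFoundError(
            f"no files matching {patterns} in {folder.resolve()}"
        )
    return files
```

The returned list can then be passed on in the same way as above, e.g. pd.concat([pd.read_pickle(f) for f in find_pickles(path_to_files)], ignore_index=True).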

Regarding python - Open multiple pickle files from Jupyter notebook folder doesn't work, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/64192388/
