
python - Using a for loop to "read_pickle" and "to_pickle" many data files

Reposted. Author: 太空宇宙. Updated: 2023-11-03 16:51:11

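I am working on Linux with IPython Notebook. I have a directory of pickled data files ('/home/jayaramdas/anaconda3/pdf/senate_bills') that contain date, bill_id, and sponsor_id (each sponsor has multiple bills), and I have one pickled data file (located at '/home/jayaramdas/anaconda3/pdf/sbcommittee_id_pdf') that contains a column of all sponsor IDs, sbsponsor_id_pdf. I need to go into the directory '/home/.../senate_bills', open each pickle file, create a separate file collecting all the bill_ids for each sponsor_id in the sbsponsor_id_pdf file, and then pickle that file, naming it after the sponsor_id and a two-digit number.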

The code I have so far is:

import pandas as pd
import os
import os.path
path = '/home/jayaramdas/anaconda3/pdf/senate_bills'
path1 = '/home/jayaramdas/anaconda3/pdf'
dirs = os.listdir(path)
for dir in dirs:
with open(path + "/" + dir) as f:

    df = pd.read_pickle(f)
    with open(path + "/" + "/sbcommittee_id_pdf", "r") as f:
        data = json.load(f)

    for sponsor in data['sponsor_id']:

        pdf = df[df['sponsor_id'] == sponsor]

        pdf.to_pickle('sponsor' + '_08bills.pdf')

        print (pdf)

I get the following error:

TypeError Traceback (most recent call last)
/home/jayaramdas/anaconda3/lib/python3.5/site-packages/pandas/io/pickle.py in try_read(path, encoding)
44 try:
---> 45 with open(path, 'rb') as fh:
46 return pkl.load(fh)

TypeError: invalid file: <_io.TextIOWrapper name='/home/jayaramdas/anaconda3/pdf/senate_bills/s113_sb_pdf' mode='r' encoding='UTF-8'>

During handling of the above exception, another exception occurred:

TypeError Traceback (most recent call last)
/home/jayaramdas/anaconda3/lib/python3.5/site-packages/pandas/io/pickle.py in try_read(path, encoding)
50 try:
---> 51 with open(path, 'rb') as fh:
52 return pc.load(fh, encoding=encoding, compat=False)

TypeError: invalid file: <_io.TextIOWrapper name='/home/jayaramdas/anaconda3/pdf/senate_bills/s113_sb_pdf' mode='r' encoding='UTF-8'>

During handling of the above exception, another exception occurred:

TypeError Traceback (most recent call last)
/home/jayaramdas/anaconda3/lib/python3.5/site-packages/pandas/io/pickle.py in read_pickle(path)
59 try:
---> 60 return try_read(path)
61 except:

/home/jayaramdas/anaconda3/lib/python3.5/site-packages/pandas/io/pickle.py in try_read(path, encoding)
55 except:
---> 56 with open(path, 'rb') as fh:
57 return pc.load(fh, encoding=encoding, compat=True)

TypeError: invalid file: <_io.TextIOWrapper name='/home/jayaramdas/anaconda3/pdf/senate_bills/s113_sb_pdf' mode='r' encoding='UTF-8'>

During handling of the above exception, another exception occurred:

TypeError Traceback (most recent call last)
/home/jayaramdas/anaconda3/lib/python3.5/site-packages/pandas/io/pickle.py in try_read(path, encoding)
44 try:
---> 45 with open(path, 'rb') as fh:
46 return pkl.load(fh)

TypeError: invalid file: <_io.TextIOWrapper name='/home/jayaramdas/anaconda3/pdf/senate_bills/s113_sb_pdf' mode='r' encoding='UTF-8'>

During handling of the above exception, another exception occurred:

TypeError Traceback (most recent call last)
/home/jayaramdas/anaconda3/lib/python3.5/site-packages/pandas/io/pickle.py in try_read(path, encoding)
50 try:
---> 51 with open(path, 'rb') as fh:
52 return pc.load(fh, encoding=encoding, compat=False)

TypeError: invalid file: <_io.TextIOWrapper name='/home/jayaramdas/anaconda3/pdf/senate_bills/s113_sb_pdf' mode='r' encoding='UTF-8'>

During handling of the above exception, another exception occurred:

TypeError Traceback (most recent call last)
<ipython-input-61-40e7738e1c05> in <module>()
8 with open(path + "/" + dir) as f:
9
---> 10 df = pd.read_pickle(f)
11 with open(path + "/" + "/sbcommittee_id_pdf", "r") as f:
12 data = json.load(f)

/home/jayaramdas/anaconda3/lib/python3.5/site-packages/pandas/io/pickle.py in read_pickle(path)
61 except:
62 if PY3:
---> 63 return try_read(path, encoding='latin1')
64 raise

/home/jayaramdas/anaconda3/lib/python3.5/site-packages/pandas/io/pickle.py in try_read(path, encoding)
54 # compat pickle
55 except:
---> 56 with open(path, 'rb') as fh:
57 return pc.load(fh, encoding=encoding, compat=True)
58

TypeError: invalid file: <_io.TextIOWrapper name='/home/jayaramdas/anaconda3/pdf/senate_bills/s113_sb_pdf' mode='r' encoding='UTF-8'>

Best answer

Hope this helps. It is not clear to me where the JSON file is located or how it relates to path.

In general, you want to use os.path.join(a, b) so that your code runs on multiple platforms (e.g. Mac and PC).
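
As a rough illustration of that advice, here is a small sketch comparing os.path.join with the string concatenation used in the question. The variable names base, same_dir and concatenated are made up for this example; only the directory names come from the question.

```python
import os.path

# Base directory taken from the question; 'senate_bills' is its subdirectory.
base = '/home/jayaramdas/anaconda3/pdf'
senate_bill_dir = os.path.join(base, 'senate_bills')

# join inserts exactly one separator, even if the first part already ends in one.
same_dir = os.path.join(base + os.sep, 'senate_bills')

# Concatenating by hand, in the style of the question, doubles the slash.
concatenated = base + "/" + "/sbcommittee_id_pdf"

print(senate_bill_dir)
print(senate_bill_dir == same_dir)
print(concatenated)
```

The doubled slash is usually tolerated on Linux, but letting os.path.join pick the separator avoids having to think about it.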

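Note that your example code is missing one level of indentation after for dir in dirs:. (In any case, dir is the name of a built-in function rather than a true reserved word, and it is best not to reuse it as a variable name.)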
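
A quick way to check the status of a name like dir is sketched below, using only the standard keyword and builtins modules. The function shadow_demo is a hypothetical helper written for this illustration.

```python
import builtins
import keyword

# 'dir' is not a reserved word: the parser accepts it as a variable name.
print(keyword.iskeyword('dir'))   # False
print(keyword.iskeyword('for'))   # True

# It is, however, the name of a built-in function.
print(hasattr(builtins, 'dir'))   # True


def shadow_demo():
    # The local name hides the built-in for the rest of this function.
    dir = 's113_sb_pdf'
    return callable(dir)


is_callable_after_shadowing = shadow_demo()
print(is_callable_after_shadowing)  # False
```

Using dir as a loop variable therefore runs without error, but the built-in dir() is no longer reachable by that name in the same scope.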

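You are also using your f variable twice. Try f1 and f2, or something more descriptive. As for the TypeError itself, the traceback shows read_pickle calling open(path, 'rb') on whatever it is given, so it needs the file path rather than the already opened file object f.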
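
The failure in the traceback can be reproduced in isolation. The sketch below uses the standard-library pickle module as a stand-in for pandas, on the assumption (taken from the traceback lines) that the reader simply calls open(path, 'rb') on its argument. The records list and the temporary directory are made up for the example.

```python
import os
import pickle
import tempfile

# Hypothetical stand-in for one of the pickled files in senate_bills.
records = [{'bill_id': 's1', 'sponsor_id': 101}]
tmp_dir = tempfile.mkdtemp()
pickle_path = os.path.join(tmp_dir, 's113_sb_pdf')
with open(pickle_path, 'wb') as fh:
    pickle.dump(records, fh)

# What the question effectively did: open the file in text mode, then hand
# the handle to code that calls open(..., 'rb') on its argument.
error_type = None
with open(pickle_path) as f:
    try:
        open(f, 'rb')
    except TypeError as exc:
        error_type = type(exc).__name__

# Passing the path itself and opening it in binary mode works.
with open(pickle_path, 'rb') as fh:
    loaded = pickle.load(fh)

print(error_type)
print(loaded == records)
```

The exact error message differs between Python versions, but the exception type is TypeError in each case, which matches the question's output.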

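A version that reads every pickle by path:

import os
import pandas as pd

path = '/home/jayaramdas/anaconda3/pdf'
senate_bill_dir = os.path.join(path, 'senate_bills')

# Load the sponsor IDs by path; read_pickle opens the file itself.
data = pd.read_pickle(os.path.join(path, 'sbcommittee_id_pdf.p'))
data.columns = ['sponsor_id']  # assumes the frame has exactly one column
for my_file in os.listdir(senate_bill_dir):
    df = pd.read_pickle(os.path.join(senate_bill_dir, my_file))
    for sponsor in data['sponsor_id'].unique():
        pdf = df[df['sponsor_id'] == sponsor]
        if len(pdf):  # Only save if there are records.
            # The same output name is reused for every input file.
            pdf.to_pickle(str(sponsor) + '_08bills.p')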
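
Because the loop above writes to the same file name for every input file, a sponsor whose bills are spread over several pickles would end up with only the bills from the last file processed. One possible alternative, sketched here under assumptions, is to concatenate all input frames first and then write one pickle per sponsor with groupby. The function name split_bills_by_sponsor, its suffix parameter, and the sample data are all hypothetical; only the column names and the '_08bills.p' naming pattern come from the post.

```python
import os
import tempfile

import pandas as pd


def split_bills_by_sponsor(bill_dir, sponsor_ids, out_dir, suffix='08'):
    """Write one pickle per sponsor holding that sponsor's bills from all files."""
    # Read every pickle by path and stack the frames into one.
    frames = [pd.read_pickle(os.path.join(bill_dir, name))
              for name in sorted(os.listdir(bill_dir))]
    bills = pd.concat(frames, ignore_index=True)
    # Keep only sponsors that appear in the sponsor list.
    bills = bills[bills['sponsor_id'].isin(sponsor_ids)]
    written = []
    for sponsor, group in bills.groupby('sponsor_id'):
        out_path = os.path.join(out_dir, '{}_{}bills.p'.format(sponsor, suffix))
        group.to_pickle(out_path)
        written.append(out_path)
    return written


# Tiny made-up data set to exercise the function.
bill_dir = tempfile.mkdtemp()
out_dir = tempfile.mkdtemp()
pd.DataFrame({'date': ['2013-01-03', '2013-01-04'],
              'bill_id': ['s1', 's2'],
              'sponsor_id': [101, 102]}).to_pickle(os.path.join(bill_dir, 'a'))
pd.DataFrame({'date': ['2013-02-01'],
              'bill_id': ['s3'],
              'sponsor_id': [101]}).to_pickle(os.path.join(bill_dir, 'b'))

paths = split_bills_by_sponsor(bill_dir, [101, 102, 103], out_dir)
print([os.path.basename(p) for p in paths])
print(pd.read_pickle(paths[0])['bill_id'].tolist())
```

Sponsors with no bills (103 in the sample) simply produce no file, which mirrors the if len(pdf) check in the answer.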

Regarding python - Using a for loop to "read_pickle" and "to_pickle" many data files, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/35830199/
