
python - How do I open a CSV inside a ZIP inside a ZIP using Python?

Reposted. Author: 行者123. Updated: 2023-12-01 01:48:24

I have been using a user-defined function to open CSV files contained in ZIP files, and it has worked very well for me.

How to scrape .csv files from a url, when they are saved in a .zip file in Python?

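Now I am trying to open a CSV file that is inside a ZIP which is itself inside another ZIP, and I am running into trouble.

Instead of the expected output, a DataFrame containing the CSV data, I get the following error: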

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xfd in position 0: invalid start byte

This makes sense, because I am effectively calling read_csv() on a ZIP file.

import zipfile
from io import BytesIO

try:
    from urllib.request import urlopen
except ImportError:
    from urllib2 import urlopen

import pandas as pd


def fetch_multi_csv_zip_from_url(url, filenames=(), *args, **kwargs):
    assert kwargs.get('compression') is None
    req = urlopen(url)
    zip_file = zipfile.ZipFile(BytesIO(req.read()))

    if filenames:
        names = zip_file.namelist()
        for filename in filenames:
            if filename not in names:
                raise ValueError(
                    'filename {} not in {}'.format(filename, names))
    else:
        filenames = zip_file.namelist()

    return {name: pd.read_csv(zip_file.open(name), *args, **kwargs)
            for name in filenames}


final_links_list = [
    'http://www.nemweb.com.au/REPORTS/ARCHIVE/Dispatch_SCADA/PUBLIC_DISPATCHSCADA_20170523.zip',
    'http://www.nemweb.com.au/REPORTS/ARCHIVE/Dispatch_SCADA/PUBLIC_DISPATCHSCADA_20170524.zip']

for j in range(len(final_links_list)):
    print(j)
    dfs = fetch_multi_csv_zip_from_url(final_links_list[j])

This is the code I have been using. I think I need to change the line that begins with:

return {name: pd.read_csv(zip_file.open(name)

because zip_file.open(name) no longer returns a CSV file; it returns a ZIP file.
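The diagnosis above can be checked without downloading anything. The following is a minimal sketch, not part of the original question: it builds a small nested archive in memory and uses the standard library's zipfile.is_zipfile() to confirm that a member of the outer ZIP is itself a ZIP. The helper names build_nested_zip and member_is_zip, and the file names inner.zip and data.csv, are made up for illustration.

```python
import io
import zipfile


def build_nested_zip():
    """Return the bytes of a ZIP that contains another ZIP holding one CSV."""
    inner = io.BytesIO()
    with zipfile.ZipFile(inner, "w") as zf:
        zf.writestr("data.csv", "a,b\n1,2\n")
    outer = io.BytesIO()
    with zipfile.ZipFile(outer, "w") as zf:
        zf.writestr("inner.zip", inner.getvalue())
    return outer.getvalue()


def member_is_zip(zip_file, name):
    """Check whether a member of an open ZipFile is itself a ZIP archive."""
    # is_zipfile() accepts a file-like object, so wrap the member's bytes.
    return zipfile.is_zipfile(io.BytesIO(zip_file.read(name)))


outer_zip = zipfile.ZipFile(io.BytesIO(build_nested_zip()))
for name in outer_zip.namelist():
    print(name, member_is_zip(outer_zip, name))
```

A member for which this check is true holds compressed archive data rather than text, which is consistent with read_csv() failing with a UnicodeDecodeError when it is handed such a member directly.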

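Best answer

This can be done with a little recursion. If a file inside the ZIP turns out to be a ZIP file itself, make a recursive call to extract the CSV files: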

try:
    from urllib.request import urlopen
except ImportError:
    from urllib2 import urlopen

from io import BytesIO
import zipfile

import pandas as pd

# Dictionary holding all the dataframes from all zip/zip/csvs
dfs = {}


def zip_to_dfs(data):
    zip_file = zipfile.ZipFile(BytesIO(data))

    for name in zip_file.namelist():
        if name.lower().endswith('.csv'):
            dfs[name] = pd.read_csv(zip_file.open(name))
        elif name.lower().endswith('.zip'):
            # Nested archive: recurse with the inner ZIP's raw bytes
            zip_to_dfs(zip_file.open(name).read())


def get_zip_data_from_url(url):
    req = urlopen(url)
    zip_to_dfs(req.read())


final_links_list = [
    'http://www.nemweb.com.au/REPORTS/ARCHIVE/Dispatch_SCADA/PUBLIC_DISPATCHSCADA_20170523.zip',
    'http://www.nemweb.com.au/REPORTS/ARCHIVE/Dispatch_SCADA/PUBLIC_DISPATCHSCADA_20170524.zip']

for link in final_links_list:
    print(link)
    get_zip_data_from_url(link)

# Display the first couple of dataframes
for name, df in sorted(dfs.items())[:2]:
    print('\n', name, '\n')
    print(df)

This will display the following:

http://www.nemweb.com.au/REPORTS/ARCHIVE/Dispatch_SCADA/PUBLIC_DISPATCHSCADA_20170524.zip

PUBLIC_DISPATCHSCADA_201705240010_0000000283857084.CSV

      C     NEMP.WORLD  DISPATCHSCADA  AEMO               PUBLIC  2017/05/24  \
0     I       DISPATCH     UNIT_SCADA   1.0       SETTLEMENTDATE        DUID
1     D       DISPATCH     UNIT_SCADA   1.0  2017/05/24 00:10:00     BARCSF1
2     D       DISPATCH     UNIT_SCADA   1.0  2017/05/24 00:10:00    BUTLERSG
..   ..            ...            ...   ...                  ...         ...
263   D       DISPATCH     UNIT_SCADA   1.0  2017/05/24 00:10:00       YWPS3
264   D       DISPATCH     UNIT_SCADA   1.0  2017/05/24 00:10:00       YWPS4
265   C  END OF REPORT            267   NaN                  NaN         NaN

       00:05:08  0000000283857084  DISPATCHSCADA.1  0000000283857078
0    SCADAVALUE               NaN              NaN               NaN
1             0               NaN              NaN               NaN
2      8.299998               NaN              NaN               NaN
..          ...               ...              ...               ...
263  388.745570               NaN              NaN               NaN
264  391.568360               NaN              NaN               NaN
265         NaN               NaN              NaN               NaN

[266 rows x 10 columns]

PUBLIC_DISPATCHSCADA_201705240015_0000000283857169.CSV

      C     NEMP.WORLD  DISPATCHSCADA  AEMO               PUBLIC  2017/05/24  \
0     I       DISPATCH     UNIT_SCADA   1.0       SETTLEMENTDATE        DUID
1     D       DISPATCH     UNIT_SCADA   1.0  2017/05/24 00:15:00     BARCSF1
2     D       DISPATCH     UNIT_SCADA   1.0  2017/05/24 00:15:00    BUTLERSG
..   ..            ...            ...   ...                  ...         ...
263   D       DISPATCH     UNIT_SCADA   1.0  2017/05/24 00:15:00       YWPS3
264   D       DISPATCH     UNIT_SCADA   1.0  2017/05/24 00:15:00       YWPS4
265   C  END OF REPORT            267   NaN                  NaN         NaN

       00:10:08  0000000283857169  DISPATCHSCADA.1  0000000283857163
0    SCADAVALUE               NaN              NaN               NaN
1             0               NaN              NaN               NaN
2      8.299998               NaN              NaN               NaN
..          ...               ...              ...               ...
263  386.205080               NaN              NaN               NaN
264  389.592410               NaN              NaN               NaN
265         NaN               NaN              NaN               NaN

[266 rows x 10 columns]
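The recursive idea in the answer can also be written without a module-level dictionary. The following is only a sketch under stated assumptions, not the answer's own code: it is Python 3 only (it uses yield from), it separates walking the archive from parsing, and it keys each CSV by its full nested path so that files with the same name in different inner ZIPs do not overwrite each other. The names iter_nested_csvs and make_zip and the sample file names are hypothetical.

```python
import io
import zipfile


def iter_nested_csvs(data, prefix=""):
    """Yield (path, raw_bytes) for every CSV in a ZIP, descending into inner ZIPs."""
    with zipfile.ZipFile(io.BytesIO(data)) as zip_file:
        for name in zip_file.namelist():
            path = prefix + name
            lower = name.lower()
            if lower.endswith(".csv"):
                yield path, zip_file.read(name)
            elif lower.endswith(".zip"):
                # Recurse on the inner archive's bytes; the prefix records nesting.
                yield from iter_nested_csvs(zip_file.read(name), path + "/")


def make_zip(files):
    """Build an in-memory ZIP from a {name: content} mapping (demo helper)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as zf:
        for name, content in files.items():
            zf.writestr(name, content)
    return buffer.getvalue()


# Small offline demonstration with an in-memory nested archive.
inner = make_zip({"readings.CSV": "unit,value\nA,1.5\n"})
outer = make_zip({"inner.zip": inner, "top.csv": "x\n1\n"})

for path, raw in iter_nested_csvs(outer):
    print(path, len(raw))
```

To get DataFrames as in the answer, each payload could be passed to pandas, for example {path: pd.read_csv(io.BytesIO(raw)) for path, raw in iter_nested_csvs(data)}, where data is the body of the downloaded ZIP.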

Regarding "python - How do I open a CSV inside a ZIP inside a ZIP using Python?", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/50991084/
