
python - Downloading and unzipping a .zip file without writing to disk


I've managed to get my first Python script working; it downloads a list of .ZIP files from a URL and then proceeds to extract the ZIP files and write them to disk.

I'm now at a loss as to how to implement the next step.

My main goal is to download and extract the zip files and pass their contents (CSV data) over a TCP stream. I would rather not actually write any of the zip or extracted files to disk if I can avoid it.

Here is my currently working script, which unfortunately has to write the files to disk.

import urllib, urllister
import zipfile
import urllib2
import os
import time
import pickle

# check for extraction directories existence
if not os.path.isdir('downloaded'):
    os.makedirs('downloaded')

if not os.path.isdir('extracted'):
    os.makedirs('extracted')

# open logfile for downloaded data and save to local variable
if os.path.isfile('downloaded.pickle'):
    downloadedLog = pickle.load(open('downloaded.pickle'))
else:
    downloadedLog = {'key':'value'}

# remove entries older than 5 days (to maintain speed)

# path of zip files
zipFileURL = "http://www.thewebserver.com/that/contains/a/directory/of/zip/files"

# retrieve list of URLs from the webservers
usock = urllib.urlopen(zipFileURL)
parser = urllister.URLLister()
parser.feed(usock.read())
usock.close()
parser.close()

# only parse urls
for url in parser.urls:
    if "PUBLIC_P5MIN" in url:

        # download the file
        downloadURL = zipFileURL + url
        outputFilename = "downloaded/" + url

        # check if file already exists on disk
        if url in downloadedLog or os.path.isfile(outputFilename):
            print "Skipping " + downloadURL
            continue

        print "Downloading ",downloadURL
        response = urllib2.urlopen(downloadURL)
        zippedData = response.read()

        # save data to disk
        print "Saving to ",outputFilename
        output = open(outputFilename,'wb')
        output.write(zippedData)
        output.close()

        # extract the data
        zfobj = zipfile.ZipFile(outputFilename)
        for name in zfobj.namelist():
            uncompressed = zfobj.read(name)

            # save uncompressed data to disk
            outputFilename = "extracted/" + name
            print "Saving extracted file to ",outputFilename
            output = open(outputFilename,'wb')
            output.write(uncompressed)
            output.close()

            # send data via tcp stream

            # file successfully downloaded and extracted store into local log and filesystem log
            downloadedLog[url] = time.time();
            pickle.dump(downloadedLog, open('downloaded.pickle', "wb" ))

Best Answer

Here is the snippet of code I used to fetch a zipped CSV file; take a look:

Python 2:

from StringIO import StringIO
from zipfile import ZipFile
from urllib import urlopen

resp = urlopen("http://www.test.com/file.zip")
myzip = ZipFile(StringIO(resp.read()))
for line in myzip.open(file).readlines():
    print line

Python 3:

from io import BytesIO
from zipfile import ZipFile
from urllib.request import urlopen
# or: requests.get(url).content

resp = urlopen("http://www.test.com/file.zip")
myzip = ZipFile(BytesIO(resp.read()))
for line in myzip.open(file).readlines():
    print(line.decode('utf-8'))
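
For completeness, here is a minimal sketch of the requests variant hinted at in the comment above (assuming the third-party requests package is installed; the URL is the same placeholder as before):

from io import BytesIO
from zipfile import ZipFile
import requests  # third-party package; assumed to be installed

# requests.get(...).content is the raw bytes of the response body,
# so it can be wrapped in BytesIO exactly like resp.read() above
myzip = ZipFile(BytesIO(requests.get("http://www.test.com/file.zip").content))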

In the snippets above, file is a string. To get the actual string you want to pass, you can use zipfile.namelist(). For example,

resp = urlopen('http://mlg.ucd.ie/files/datasets/bbc.zip')
myzip = ZipFile(BytesIO(resp.read()))
myzip.namelist()
# ['bbc.classes', 'bbc.docs', 'bbc.mtx', 'bbc.terms']
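
Coming back to the original goal of pushing the CSV contents over a TCP stream without touching the disk, a minimal sketch might look like the following (Python 3; the host, port, and URL are hypothetical placeholders, and the standard socket module is used for the TCP side):

from io import BytesIO
from zipfile import ZipFile
from urllib.request import urlopen
import socket

HOST, PORT = "127.0.0.1", 9000  # hypothetical TCP endpoint

resp = urlopen("http://www.test.com/file.zip")  # placeholder URL
myzip = ZipFile(BytesIO(resp.read()))           # zip stays in memory, never on disk

# open one TCP connection and stream every member of the archive through it
with socket.create_connection((HOST, PORT)) as sock:
    for name in myzip.namelist():
        with myzip.open(name) as member:
            # read the uncompressed member in chunks so a large CSV is not
            # held in memory all at once on top of the zip itself
            while True:
                chunk = member.read(64 * 1024)
                if not chunk:
                    break
                sock.sendall(chunk)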

Regarding python - downloading and unzipping a .zip file without writing to disk, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/5710867/
