
python - How to read only the first line of a csv from Google Cloud Storage?

Reposted. Author: 太空宇宙. Updated: 2023-11-04 00:07:19

I have seen this question: How to read first 2 rows of csv from Google Cloud Storage

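But in my case, I don't want to load the whole csv blob into memory, since it could be huge. Is there any way to open it as some iterable (or file-like object) and read only the bytes of the first few lines?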

Best answer

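I want to extend simzes' answer with an example of how to create an iterable when we don't know the size of the CSV header. It can also be used to read a CSV from storage line by line: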

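import codecs
import csv
from typing import Generator, List

from google.api_core.exceptions import RequestedRangeNotSatisfiable
from google.cloud import storage


def get_csv_header(blob: storage.blob.Blob) -> List[str]:
    # csv.reader consumes blob_lines lazily, so only the chunks needed
    # for the first row are downloaded.
    for line in csv.reader(blob_lines(blob)):
        return line
    return []  # the blob is empty


# How many bytes of the blob to download per request.
# Selected experimentally. If there is a better value, please update.
BLOB_CHUNK_SIZE = 2000


def blob_lines(blob: storage.blob.Blob) -> Generator[str, None, None]:
    position = 0
    buff = []
    # Decode incrementally so a multi-byte character split across two chunks survives.
    decoder = codecs.getincrementaldecoder('utf-8')()
    while True:
        try:
            # start/end is a closed interval, so this requests BLOB_CHUNK_SIZE + 1 bytes.
            # Recent google-cloud-storage releases deprecate download_as_string
            # in favour of download_as_bytes.
            data = blob.download_as_string(start=position, end=position + BLOB_CHUNK_SIZE)
        except RequestedRangeNotSatisfiable:
            data = b''  # position is already at the end of the blob
        last = len(data) < BLOB_CHUNK_SIZE + 1
        chunk = decoder.decode(data, final=last)
        if '\n' in chunk:
            part1, part2 = chunk.split('\n', 1)
            buff.append(part1)
            yield ''.join(buff)
            parts = part2.split('\n')
            for part in parts[:-1]:
                yield part
            buff = [parts[-1]]
        else:
            buff.append(chunk)

        position += BLOB_CHUNK_SIZE + 1
        if last:
            # A short chunk means the end of the blob; emit an unterminated last line, if any.
            tail = ''.join(buff)
            if tail:
                yield tail
            return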
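
The chunked line splitting above can be tried out without any network access. What follows is only a minimal sketch, not the answer's code: `make_range_reader`, `iter_lines` and `first_row` are hypothetical names, and an in-memory `bytes` object stands in for the blob, with `fetch(start, end)` imitating a closed-interval range download. It assumes UTF-8 text and `\n` line endings.

```python
import codecs
import csv
from typing import Callable, Iterator, List, Optional, Tuple


def make_range_reader(
    data: bytes, calls: Optional[List[Tuple[int, int]]] = None
) -> Callable[[int, int], bytes]:
    """Hypothetical stand-in for a ranged download; `end` is inclusive."""
    def fetch(start: int, end: int) -> bytes:
        if calls is not None:
            calls.append((start, end))  # record each simulated request
        return data[start:end + 1]
    return fetch


def iter_lines(fetch: Callable[[int, int], bytes], chunk_size: int = 8) -> Iterator[str]:
    """Yield text lines, requesting at most chunk_size bytes per call."""
    decoder = codecs.getincrementaldecoder("utf-8")()
    position = 0
    pending = ""  # text after the last newline seen so far
    while True:
        data = fetch(position, position + chunk_size - 1)
        last = len(data) < chunk_size  # a short read means end of data
        pending += decoder.decode(data, final=last)
        # Everything before the final newline is complete; keep the rest.
        *complete, pending = pending.split("\n")
        yield from complete
        position += chunk_size
        if last:
            if pending:
                yield pending  # unterminated final line
            return


def first_row(fetch: Callable[[int, int], bytes], chunk_size: int = 8) -> List[str]:
    """Return the first CSV row, fetching only as many chunks as needed."""
    return next(csv.reader(iter_lines(fetch, chunk_size)), [])
```

With a payload of `"id,name,city\n"` followed by many data rows and `chunk_size=8`, `first_row` returns `['id', 'name', 'city']` after two simulated requests, because the generator is suspended as soon as `csv.reader` has one complete row.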

Regarding "python - How to read only the first line of a csv from Google Cloud Storage?", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/53657130/
