
python - How do I read a CSV stored in S3 with csv.DictReader?

Reposted. Author: 太空狗. Updated: 2023-10-29 20:39:20

I have code that fetches an AWS S3 object. How can I read this StreamingBody with Python's csv.DictReader?

import boto3, csv

session = boto3.session.Session(aws_access_key_id=<>, aws_secret_access_key=<>, region_name=<>)
s3_resource = session.resource('s3')
s3_object = s3_resource.Object(<bucket>, <key>)
streaming_body = s3_object.get()['Body']

#csv.DictReader(???)

Best answer

The code would look something like this:

import boto3
import csv

# get a handle on s3
s3 = boto3.resource(u's3')

# get a handle on the bucket that holds your file
bucket = s3.Bucket(u'bucket-name')

# get a handle on the object you want (i.e. your file)
obj = bucket.Object(key=u'test.csv')

# get the object
response = obj.get()

# read the contents of the file and split it into a list of lines
# (use splitlines(), not split(): split() with no arguments also breaks on spaces inside values)
# pick ONE of the two lines below; the body stream can only be read once

# for python 2:
# lines = response[u'Body'].read().splitlines()

# for python 3 you need to decode the incoming bytes:
lines = response['Body'].read().decode('utf-8').splitlines()

# now iterate over those lines
for row in csv.DictReader(lines):

    # here you get a sequence of dicts
    # do whatever you want with each line here
    print(row)

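You could condense this somewhat in real code, but I've kept it step by step to show boto3's object hierarchy (resource, bucket, object, response body).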
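A condensed version could look like the sketch below. It is a minimal sketch, not the answerer's code: the helper names `rows_from_body` and `rows_from_s3` are hypothetical, and the fetch is separated from the parsing so the parsing half can be tried locally with `io.BytesIO` standing in for the S3 `StreamingBody`. Instead of splitting the text by hand, it wraps the decoded text in `io.StringIO(..., newline="")`; keep in mind that `str.split()` with no arguments splits on any whitespace, so a value such as `New York` would be broken into two fragments.

```python
import csv
import io


def rows_from_body(body, encoding="utf-8"):
    """Read a StreamingBody-like object (anything with .read()) and parse it as CSV dicts."""
    text = body.read().decode(encoding)
    # newline="" follows the csv docs' advice for file-like input, so quoted
    # fields that contain line breaks reach the parser intact
    return list(csv.DictReader(io.StringIO(text, newline="")))


def rows_from_s3(bucket_name, key):
    """Condensed form of the step-by-step code: resource -> Object -> get() -> Body."""
    import boto3  # imported here so rows_from_body can be tried without boto3 installed

    obj = boto3.resource("s3").Object(bucket_name, key)
    return rows_from_body(obj.get()["Body"])


# Local check with io.BytesIO standing in for the S3 body (no AWS call is made)
sample = io.BytesIO(b'name,city\nAnn,New York\nBo,"Line one\nline two"\n')
for row in rows_from_body(sample):
    print(row)
```

Like the original code, `rows_from_body` reads the whole object into memory before parsing.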

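Edit, based on your comment about avoiding reading the whole file into memory: I haven't had that requirement myself, so I can't speak authoritatively, but I would try wrapping the stream so that you get a text-file-like iterator. For example, you could use the codecs library and replace the CSV-parsing part above with something like: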

import codecs

for row in csv.DictReader(codecs.getreader('utf-8')(response[u'Body'])):
    print(row)
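The codecs wrapper can be exercised without S3. In this sketch, `FakeStreamingBody` is a hypothetical stand-in that, like botocore's `StreamingBody`, only offers `read(amt=None)`; it also records the `amt` of every call so you can see whether the body is ever read in one go. `iter_rows` is likewise a hypothetical name.

```python
import codecs
import csv


class FakeStreamingBody:
    """Hypothetical stand-in for botocore's StreamingBody: only read(amt) is provided."""

    def __init__(self, data: bytes):
        self._data = data
        self._pos = 0
        self.calls = []  # the amt passed to each read() call, in order

    def read(self, amt=None):
        self.calls.append(amt)
        if amt is None:
            chunk = self._data[self._pos:]  # "read everything that is left"
        else:
            chunk = self._data[self._pos:self._pos + amt]
        self._pos += len(chunk)
        return chunk


def iter_rows(body, encoding="utf-8"):
    """Yield one dict per CSV row, decoding the byte stream line by line."""
    reader = codecs.getreader(encoding)(body)  # codecs.StreamReader wrapping the body
    for row in csv.DictReader(reader):
        yield row


body = FakeStreamingBody("id,name\n1,Zoë\n2,Ann Lee\n".encode("utf-8"))
for row in iter_rows(body):
    print(row)
print(body.calls)  # the chunk sizes StreamReader asked the body for
```

Here the recorded calls are all bounded sizes rather than `None`, i.e. the reader pulls chunks instead of the whole object; the exact chunk sizes are an implementation detail of `codecs.StreamReader`, not a documented guarantee.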

Regarding "python - How do I read a CSV stored in S3 with csv.DictReader?", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/42312196/
