Python 2.7 CSV file read/write \xef\xbb\xbf code

Reposted. Author: 行者123. Last updated: 2023-11-28 20:14:24

I have a question about reading and writing a CSV file encoded as 'utf-8-sig' in Python 2.7. My CSV header is

['\xef\xbb\xbfID;timestamp;CustomerID;Email']

The header starts with some extra bytes ("\xef\xbb\xbfID"). I read them from file A.csv, and I want to write the same bytes and header to file B.csv.

My print log shows:

['\xef\xbb\xbfID;timestamp;CustomerID;Email']

But the header in the actual output file looks like this:

ÔªøID;timestamp

The code is as follows:

def remove_gdpr_info_from_csv(file_path, file_name, temp_folder, original_header):
    new_temp_folder = tempfile.mkdtemp()
    new_temp_file = new_temp_folder + "/" + file_name
    # Blanked new file
    with open(new_temp_file, 'wb') as outfile:
        writer = csv.writer(outfile, delimiter=";")
        print original_header
        writer.writerow(original_header)
        # File from SFTP
        with open(file_path, 'r') as infile:
            reader = csv.reader(infile, delimiter=";")
            first_row = next(reader)
            email = first_row.index('Email')
            contract_detractor1 = first_row.index('Contact Detractor (Q21)')
            contract_detractor2 = first_row.index('Contact Detractor (Q20)')
            contract_detractor3 = first_row.index('Contact Detractor (Q43)')
            contract_detractor4 = first_row.index('Contact Detractor(Q26)')
            contract_detractor5 = first_row.index('Contact Detractor(Q27)')
            contract_detractor6 = first_row.index('Contact Detractor(Q44)')
            indexes = []
            for column_name in header_list:
                ind = first_row.index(column_name)
                indexes.append(ind)

            for row in reader:
                output_row = []
                for ind in indexes:
                    data = row[ind]
                    if ind == email:
                        data = ''
                    elif ind == contract_detractor1:
                        data = ''
                    elif ind == contract_detractor2:
                        data = ''
                    elif ind == contract_detractor3:
                        data = ''
                    elif ind == contract_detractor4:
                        data = ''
                    elif ind == contract_detractor5:
                        data = ''
                    elif ind == contract_detractor6:
                        data = ''
                    output_row.append(data)
                writer.writerow(output_row)
    s3core.upload_files(SPARKY_S3, DESTINATION_PATH, new_temp_file)
    shutil.rmtree(temp_folder)
    shutil.rmtree(new_temp_folder)

Best answer

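'\xef\xbb\xbf' is the UTF-8 encoded form of the Unicode character ZERO WIDTH NO-BREAK SPACE, U+FEFF. It is commonly used as a byte order mark (BOM) at the start of Unicode text files:

  • when the file starts with the 3 bytes '\xef\xbb\xbf', it is UTF-8 encoded
  • when the file starts with the 2 bytes '\xff\xfe', it is UTF-16 little endian
  • when the file starts with the 2 bytes '\xfe\xff', it is UTF-16 big endian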
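The three byte sequences in the list above can be checked directly by encoding U+FEFF, and the standard codecs module exposes them as constants. The sketch below is illustrative only: detect_bom is a hypothetical helper name that does not appear in the original answer, and it deliberately ignores UTF-32, whose little endian BOM also begins with '\xff\xfe'.

```python
import codecs

# U+FEFF encoded in each of the three encodings listed above.
bom_char = u"\ufeff"
utf8_bom = bom_char.encode("utf-8")
utf16_le_bom = bom_char.encode("utf-16-le")
utf16_be_bom = bom_char.encode("utf-16-be")


def detect_bom(raw):
    """Guess a codec name from a leading BOM, or return None.

    Hypothetical helper for illustration; UTF-32 is not handled.
    """
    if raw.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"
    if raw.startswith(codecs.BOM_UTF16_LE):
        return "utf-16-le"
    if raw.startswith(codecs.BOM_UTF16_BE):
        return "utf-16-be"
    return None
```

Applied to the first bytes of the header in the question, detect_bom would report 'utf-8-sig', which matches the encoding the asker names.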

The 'utf-8-sig' encoding explicitly asks for this BOM to be written at the start of the file.
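A small round trip shows the difference between 'utf-8' and 'utf-8-sig' on the header from the question. This is a sketch, not part of the original answer. The last step rests on an assumption: the 'Ôªø' in the asker's output is what the three BOM bytes look like when a viewer interprets them as Mac OS Roman, which would mean the BOM did reach the output file and was only displayed with the wrong encoding. The original post does not confirm which viewer was used.

```python
header = u"ID;timestamp;CustomerID;Email"

# Encoding with utf-8-sig prepends the BOM; plain utf-8 does not.
with_bom = header.encode("utf-8-sig")
without_bom = header.encode("utf-8")

# Decoding with utf-8-sig drops a leading BOM.
decoded_sig = with_bom.decode("utf-8-sig")

# Decoding with plain utf-8 keeps the BOM as the character U+FEFF.
decoded_plain = with_bom.decode("utf-8")

# Assumption: a viewer that reads the BOM bytes as Mac OS Roman
# shows three visible characters instead of hiding the BOM.
mojibake = with_bom[:3].decode("mac_roman")
```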

To handle it automatically when reading a CSV file in Python 2, you can use the codecs module:

with open(file_path, 'r') as infile:
    reader = csv.reader(codecs.EncodedFile(infile, 'utf-8', 'utf-8-sig'), delimiter=";")

EncodedFile wraps the original file object: it decodes the data as utf-8-sig, which skips the BOM, and re-encodes it as UTF-8 without a BOM.
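The effect of EncodedFile can be seen on an in-memory file, without touching the disk. The sketch below uses made-up sample data in place of A.csv, and read_without_bom and write_with_bom are hypothetical helper names. Writing codecs.BOM_UTF8 by hand before the payload is an assumption about what the asker wants (file B starting with the same BOM as file A); the original answer only covers the reading side. The code avoids the csv module so that it behaves the same on Python 2.7 and Python 3.

```python
import codecs
import io

# In-memory stand-in for A.csv: UTF-8 bytes with a leading BOM (sample data).
raw = (b"\xef\xbb\xbfID;timestamp;CustomerID;Email\r\n"
       b"1;2018-05-02;42;a@example.com\r\n")


def read_without_bom(binary_file):
    """Return the content as UTF-8 bytes with a leading BOM removed."""
    recoded = codecs.EncodedFile(binary_file, "utf-8", "utf-8-sig")
    return recoded.read()


def write_with_bom(binary_file, payload):
    """Write the UTF-8 BOM once, then the BOM-free payload."""
    binary_file.write(codecs.BOM_UTF8)
    binary_file.write(payload)


clean = read_without_bom(io.BytesIO(raw))

out = io.BytesIO()
write_with_bom(out, clean)
```

In the function from the question, the equivalent step would be a single outfile.write(codecs.BOM_UTF8) right after opening the output file in 'wb' mode and before creating the csv.writer, with the header row itself passed in without the BOM.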

Regarding Python 2.7 CSV file read/write \xef\xbb\xbf code, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/50130605/
