
python - Troubleshooting encoding/decoding of CSV and JSON files in Python

Reposted. Author: 太空宇宙. Updated: 2023-11-04 05:24:36

I originally dumped a file containing a particular sentence, using:

    with open(labelFile, "wb") as out:
        json.dump(result, out, indent=4)

In the JSON, the sentence looks like this:

"-LSB- 97 -RSB- However , the influx of immigrants from mainland China , approximating NUMBER_SLOT per year , is a significant contributor to its population growth \u00c3 cents \u00c2 $ \u00c2 `` a daily quota of 150 Mainland Chinese with family ties in LOCATION_SLOT are granted a `` one way permit '' .", 

I then loaded it with:

    with open(sys.argv[1]) as sentenceFile:
        sentenceFile = json.loads(sentenceFile.read())

processed it, and then wrote it to a CSV using:

    with open(sys.argv[2], 'wb') as csvfile:
        fieldnames = ['x', 'y', 'z']
        writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
        writer.writeheader()
        for sentence in sentence2locations2values:
            sentence = unicode(sentence['parsedSentence']).encode("utf-8")
            writer.writerow({'x': sentence})
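
For comparison, here is a minimal sketch of the same write step in Python 3, which is an assumption on my part since the snippet above is Python 2 (`unicode`, binary-mode CSV). In Python 3 the `csv` module works with text, so the encoding is chosen when the file is opened rather than by calling `.encode` on each cell. The helper name `write_sentences` and the shortened sample sentence are made up for illustration.

```python
import csv
import io

def write_sentences(csvfile, sentences):
    """Write each sentence into column 'x'; 'y' and 'z' are left empty."""
    fieldnames = ["x", "y", "z"]
    writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
    writer.writeheader()
    for sentence in sentences:
        # Keys missing from the dict ('y', 'z') are written as empty cells.
        writer.writerow({"x": sentence})

# In-memory demonstration. For a real file, the assumed equivalent is
# open(path, "w", encoding="utf-8", newline="").
buffer = io.StringIO(newline="")
write_sentences(buffer, ["growth \u00c3 cents \u00c2 $ \u00c2 `` a daily quota"])
csv_text = buffer.getvalue()
csv_bytes = csv_text.encode("utf-8")  # the bytes a UTF-8 file would contain
```

Each of the characters Ã and Â becomes a two-byte sequence in `csv_bytes`, which is why a program that opens the file with a different assumed encoding shows different characters.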

When the CSV file is opened in Excel for Mac, the sentence is:

-LSB- 97 -RSB- However , the influx of immigrants from mainland China , approximating NUMBER_SLOT per year , is a significant contributor to its population growth à cents  $  `` a daily quota of 150 Mainland Chinese with family ties in LOCATION_SLOT are granted a `` one way permit '' .

I then moved it from Excel for Mac to Google Sheets, where it is:

-LSB- 97 -RSB- However , the influx of immigrants from mainland China , approximating NUMBER_SLOT per year , is a significant contributor to its population growth à cents  $  `` a daily quota of 150 Mainland Chinese with family ties in LOCATION_SLOT are granted a `` one way permit '' .

Note that it is slightly different: the Ã has been replaced.

I then labelled it and brought it back into Excel for Mac, at which point it became:

-LSB- 97 -RSB- However , the influx of immigrants from mainland China , approximating NUMBER_SLOT per year , is a significant contributor to its population growth à cents  $  `` a daily quota of 150 Mainland Chinese with family ties in LOCATION_SLOT are granted a `` one way permit '' .

How can I read the CSV, which contains a sentence such as:

-LSB- 97 -RSB- However , the influx of immigrants from mainland China , approximating NUMBER_SLOT per year , is a significant contributor to its population growth à cents  $  `` a daily quota of 150 Mainland Chinese with family ties in LOCATION_SLOT are granted a `` one way permit '' .

so that its value is:

"-LSB- 97 -RSB- However , the influx of immigrants from mainland China , approximating 45,000 per year , is a significant contributor to its population growth \u00c3 cents \u00c2 $ \u00c2 `` a daily quota of 150 Mainland Chinese with family ties in Hong Kong are granted a `` one way permit '' .", 

and therefore matches what was in the original JSON dump at the start of the question?

EDIT

I looked this up and found that \u00c3 is the code for Ã, which is the form shown in Google Sheets, and that the encoding is actually Latin 8.

EDIT

I ran enca and saw that the original dump file is 7-bit ASCII, whereas my CSV is Unicode. So do I need to load it as Unicode and convert it to 7-bit ASCII?
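
A small check of the ASCII-versus-Unicode observation, sketched in Python 3 (an assumption, since the code in this post is Python 2) with a shortened copy of the sentence. By default the `json` module escapes every non-ASCII character as `\uXXXX`, which would explain why a tool such as enca reports the dump as 7-bit ASCII even though the loaded string contains Ã and Â. If that is what happened here, no conversion to ASCII is needed: the CSV text only has to be decoded from UTF-8 to compare equal to the loaded JSON value.

```python
import json

sentence = "population growth \u00c3 cents \u00c2 $ \u00c2 `` a daily quota"

# With the default ensure_ascii=True, non-ASCII characters become \uXXXX escapes,
# so the serialized text contains only ASCII characters.
dumped = json.dumps(sentence)
is_ascii_only = all(ord(ch) < 128 for ch in dumped)

# Loading turns the escapes back into the original characters.
loaded = json.loads(dumped)

# The same text stored as UTF-8 bytes (as in the CSV) decodes to an equal string.
from_csv = sentence.encode("utf-8").decode("utf-8")
```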

Best answer

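I found a way to solve this. The solution is to decode the CSV file from its stored encoding (identified as UTF-8), after which the sentence returns to its original form. So: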

    csvfile = open(sys.argv[1], 'r')

    fieldnames = ("x", "y", "z")
    reader = csv.DictReader(csvfile, fieldnames)
    next(reader)  # skip the header row

    for i, row in enumerate(reader):
        # Python 2: cells are byte strings, so decode them to unicode
        row['x'] = row['x'].decode("utf-8")

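Very strangely, whenever I edited the CSV file in Excel for Mac and saved it, it seemed to be converted to a different encoding each time. I warn other users to watch out for this, because it is a real headache.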
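
For readers on Python 3, where `str` has no `.decode` method, the same idea might look like the sketch below. It is an assumption-laden illustration rather than the code from this answer: the helper name `read_column_x` and the sample bytes are made up, and the `utf-8-sig` codec is used only as a precaution because it decodes UTF-8 and also drops a leading byte-order mark if a spreadsheet program happened to add one. Whether Excel for Mac does so is not established in this post.

```python
import csv
import io

def read_column_x(raw_bytes):
    """Decode CSV bytes as UTF-8 (dropping a BOM if present) and return column 'x'."""
    text = raw_bytes.decode("utf-8-sig")
    # The header row supplies the field names, so it is not returned as data.
    reader = csv.DictReader(io.StringIO(text, newline=""))
    return [row["x"] for row in reader]

# Made-up sample bytes standing in for the real file contents.
sample = "x,y,z\r\ngrowth \u00c3 cents \u00c2 $ \u00c2 `` a daily quota,,\r\n".encode("utf-8")
values = read_column_x(sample)
values_with_bom = read_column_x(b"\xef\xbb\xbf" + sample)
```

If the file was saved in some other encoding, decoding as UTF-8 may raise `UnicodeDecodeError`, which at least makes the mismatch visible instead of silently producing different characters.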

For more on python - troubleshooting encoding/decoding of CSV and JSON files in Python, see the similar question on Stack Overflow: https://stackoverflow.com/questions/39229646/
