
Python - First word is not included in the search

Reposted. Author: 行者123. Last updated: 2023-12-01 01:44:50

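Why is the first word printed, but never matched when it is looked up in my_dic?

Can anyone tell me the cause, and how to make the lookup match the first word as well?

Here is my code: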

my_dic = {
    "a": "1",
    "b": "2",
    "c": "3",
    "d": "4",
    "e": "5",
}

# Read the file line by line and report every word that is a key of my_dic
with open('c:\\english_text_file.txt', encoding='utf8') as file:
    for line in file:
        for word in line.split():
            print('word from line.split: ', word)
            if word in my_dic.keys():
                print('word from if word in ...', word)

The content of the text file is:

a b c d e

The output is:

word from line.split:  a
word from line.split:  b
word from if word in ... b
word from line.split:  c
word from if word in ... c
word from line.split:  d
word from if word in ... d
word from line.split:  e
word from if word in ... e

Best answer

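This is caused by how some Windows programs save txt files: when saving as UTF-8, they (Notepad, for example) may add a BOM to the beginning of the file.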

What is a BOM?

BOM stands for byte order mark: a short byte sequence at the very start of a file that identifies its Unicode encoding. The values are:

Byte order mark    Description
EF BB BF           UTF-8
FF FE              UTF-16 (or UCS-2), little-endian
FE FF              UTF-16 (or UCS-2), big-endian
FF FE 00 00        UTF-32 (or UCS-4), little-endian
00 00 FE FF        UTF-32 (or UCS-4), big-endian
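
As a rough sketch of how these marks could be recognized in code, the snippet below compares the first bytes of some data against the BOM constants in Python's standard codecs module. The helper name detect_bom and the encoding labels it returns are illustrative choices, not part of the original answer.

```python
import codecs

# Checked longest-first: codecs.BOM_UTF32_LE begins with the same two
# bytes as codecs.BOM_UTF16_LE, so the UTF-32 marks must be tried first.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def detect_bom(data: bytes):
    """Return (encoding label, BOM length), or (None, 0) if no BOM is found.

    detect_bom is a hypothetical helper written for this example.
    """
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name, len(bom)
    return None, 0


# The first bytes of the file discussed in this question
print(detect_bom(b"\xef\xbb\xbfa b c d e\r\n"))
```

Only the first four bytes matter, so in practice it is enough to read a small prefix of the file in binary mode and pass that in.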

Open your english_text_file.txt in any hex editor and you will see that its content is:

efbb bf61 2062 2063 2064 2065 0d0a

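Here, efbb bf is the BOM, and 61 2062 2063 2064 2065 0d0a are the ASCII codes for a b c d e\r\n. When the file is opened with encoding='utf8', the BOM is decoded as the invisible character U+FEFF, so the first word is not 'a' but U+FEFF followed by 'a'. It prints like a, but it does not match the key "a".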
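
The effect can be checked without the file at all. The sketch below rebuilds the bytes from the hex dump above and decodes them the same way the question's code does; the variable names are only for illustration.

```python
# The exact bytes from the hex dump: BOM, then "a b c d e", then CR LF
raw = bytes.fromhex("efbbbf 61 20 62 20 63 20 64 20 65 0d0a")

# Plain UTF-8 decoding keeps the BOM as the character U+FEFF
text = raw.decode("utf-8")
words = text.split()
first = words[0]

print(ascii(first))                    # shows the hidden U+FEFF escape
print(first == "a")                    # the comparison the dict lookup relies on
print(first.lstrip("\ufeff") == "a")   # matches once the BOM is stripped
```

Note that split() does not discard U+FEFF, because Python does not treat that character as whitespace.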

So for a UTF-8 file, we need to check whether it starts with a BOM and, if it does, remove it.

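Here is some sample code for your reference. If you don't mind changing the original file, you can overwrite the old input file directly; here I just write a new file without the BOM.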

import codecs

my_dic = {
    "a": "1",
    "b": "2",
    "c": "3",
    "d": "4",
    "e": "5",
}

# Read the raw bytes so the BOM can be checked before any decoding happens
with open('./english_text_file.txt', 'rb') as src:
    content = src.read()

# Drop the 3-byte UTF-8 BOM if the file starts with one
if content.startswith(codecs.BOM_UTF8):
    content = content[len(codecs.BOM_UTF8):]

# Write the (now BOM-free) bytes to a new file, leaving the original untouched
with open('./changed_english_text_file.txt', 'wb') as dst:
    dst.write(content)

with open('./changed_english_text_file.txt', encoding='utf8') as file:
    for line in file:
        for word in line.split():
            print('word from line.split: ', word)
            if word in my_dic.keys():
                print('word from if word in ...', word)

The output is:

word from line.split:  a
word from if word in ... a
word from line.split:  b
word from if word in ... b
word from line.split:  c
word from if word in ... c
word from line.split:  d
word from if word in ... d
word from line.split:  e
word from if word in ... e
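
If rewriting the file is not desirable, one alternative (not part of the original answer) is Python's built-in utf-8-sig codec, which skips a leading UTF-8 BOM while decoding and reads BOM-free files normally. The sketch below assumes that approach; the function name words_found and the temporary demo file are made up for the example.

```python
import os
import tempfile

my_dic = {"a": "1", "b": "2", "c": "3", "d": "4", "e": "5"}


def words_found(path, known):
    """Return the words in the file at `path` that are keys of `known`.

    words_found is a hypothetical helper written for this example.
    """
    found = []
    # 'utf-8-sig' drops a leading BOM if there is one
    with open(path, encoding="utf-8-sig") as file:
        for line in file:
            for word in line.split():
                if word in known:
                    found.append(word)
    return found


# Demo on a throwaway file that starts with a BOM, like the one in the question
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "with_bom.txt")
    with open(demo_path, "wb") as f:
        f.write(b"\xef\xbb\xbfa b c d e\r\n")
    print(words_found(demo_path, my_dic))
```

With this codec the input file stays untouched, at the cost of only handling the UTF-8 BOM; files in UTF-16 or UTF-32 would still need their own encoding argument.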

Regarding "Python - First word is not included in the search", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/51489432/
