python - Why does this Python 3 code fail to remove Unicode accented characters with str.translate()?

Reposted. Author: 太空宇宙. Updated: 2023-11-03 15:13:11

I'm trying to normalize accented characters in a string in Python 3 like this:

from bs4 import BeautifulSoup
import os

def process_markup():
    # the file is utf-8 encoded
    fn = os.path.join(os.path.dirname(__file__), 'src.txt')
    markup = BeautifulSoup(open(fn), from_encoding="utf-8")

    for player in markup.find_all("div", class_="glossary-player"):
        text = player.span.string
        print(format_filename(text))  # Python console shows mangled characters not in utf-8
        player.span.string.replace_with(format_filename(text))

    dest = open("dest.txt", "w", encoding="utf-8")
    dest.write(str(markup))

def format_filename(s):
    # prepare string
    s = s.strip().lower().replace(" ", "-").strip("'")

    # transliterate accented characters to non-accented versions
    chars_in = "àèìòùáéíóú"
    chars_out = "aeiouaeiou"
    no_accented_chars = str.maketrans(chars_in, chars_out)
    return s.translate(no_accented_chars)

process_markup()
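As a quick sanity check, the translation table in format_filename can be exercised on its own, without BeautifulSoup or any file I/O. This minimal sketch copies the function from the question unchanged; if it gives the expected result on correctly decoded strings, the str.maketrans/str.translate part is probably not where things go wrong.

```python
def format_filename(s):
    # same preparation steps as in the question's code
    s = s.strip().lower().replace(" ", "-").strip("'")
    # one-to-one mapping: each accented vowel -> its unaccented counterpart
    chars_in = "àèìòùáéíóú"
    chars_out = "aeiouaeiou"
    no_accented_chars = str.maketrans(chars_in, chars_out)
    return s.translate(no_accented_chars)

print(format_filename(" Fàilte "))      # failte
print(format_filename(" àèìòùáéíóú "))  # aeiouaeiou
```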

The input file src.txt is UTF-8 encoded:

<div class="glossary-player">
<span class="gd"> Fàilte </span><span class="en"> Welcome </span>
</div>
<div class="glossary-player">
<span class="gd"> àèìòùáéíóú </span><span class="en"> aeiouaeiou </span>
</div>

The output file dest.txt looks like this:

<div class="glossary-player">
<span class="gd">fã ilte</span><span class="en"> Welcome </span>
</div>
<div class="glossary-player">
<span class="gd">ã ã¨ã¬ã²ã¹ã¡ã©ã­ã³ãº</span><span class="en"> aeiouaeiou </span>
</div>
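The mangled text above is consistent with the UTF-8 bytes of src.txt being decoded with a single-byte codec such as cp1252. That is an assumption: in Python 3, open() without an encoding argument decodes with the locale's preferred encoding, and cp1252 is a common default on Windows. The sketch below reproduces the effect with plain bytes: "à" (bytes C3 A0) turns into two characters, "Ã" plus a no-break space, so nothing matches the translation table and translate() leaves the text alone.

```python
# Bytes as they would be stored in a UTF-8 encoded src.txt
raw = " Fàilte ".encode("utf-8")

correct = raw.decode("utf-8")   # what open(fn, encoding="utf-8") would yield
mangled = raw.decode("cp1252")  # assumed: what a cp1252 locale default yields

table = str.maketrans("àèìòùáéíóú", "aeiouaeiou")

print(repr(correct.strip().lower().translate(table)))  # 'failte'
print(repr(mangled.strip().lower().translate(table)))  # 'fã\xa0ilte'
```

If this is the cause, opening the file with open(fn, encoding="utf-8") should avoid it. Passing the raw bytes from open(fn, "rb") to BeautifulSoup is another option, since from_encoding only applies when BeautifulSoup is given bytes rather than an already decoded str.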

I'm trying to get it to look like this:

<div class="glossary-player">
<span class="gd">failte</span><span class="en"> Welcome </span>
</div>
<div class="glossary-player">
<span class="gd">aeiouaeiou</span><span class="en"> aeiouaeiou </span>
</div>

I know there are solutions like unidecode, but I just want to find out what I'm doing wrong here.
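For comparison with unidecode, a standard-library-only alternative (a swapped-in technique, not what the question uses) is to decompose the text with unicodedata.normalize("NFD", ...) and drop the combining marks. It handles any accented letter that has a canonical decomposition, not just the ten in the table, but it does not transliterate letters without one (for example "ø" or "ß"), and it still needs the text to be decoded correctly first. A minimal sketch; strip_accents is a hypothetical helper name:

```python
import unicodedata

def strip_accents(s):
    # NFD splits "à" into "a" followed by U+0300 COMBINING GRAVE ACCENT
    decomposed = unicodedata.normalize("NFD", s)
    # keep every character except nonspacing combining marks (category "Mn")
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")

print(strip_accents("Fàilte"))      # Failte
print(strip_accents("àèìòùáéíóú"))  # aeiouaeiou
```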

Best answer

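chars.translate(no_accented_chars) does not modify chars. It returns a new string with the translation applied. If you want to use the translated string, save it to a variable (possibly the original chars variable):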

chars = chars.translate(no_accented_chars)

Or pass it directly to the write call:

dest.write(chars.translate(no_accented_chars))
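The answer's point can be checked directly: Python strings are immutable, so translate() always returns a new string and never changes the one it is called on. A minimal sketch reusing the chars and no_accented_chars names from the answer; the sample value "fàilte" is made up for illustration:

```python
no_accented_chars = str.maketrans("àèìòùáéíóú", "aeiouaeiou")
chars = "fàilte"

chars.translate(no_accented_chars)  # result is discarded; chars is unchanged
print(chars)                        # fàilte

chars = chars.translate(no_accented_chars)  # rebind the name to the translated copy
print(chars)                        # failte
```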

For the original question "python - Why does this Python 3 code fail to remove Unicode accented characters with str.translate()?", see Stack Overflow: https://stackoverflow.com/questions/24096960/
