python - Unicode strings in urllib.request

Reposted. Author: 太空宇宙. Updated: 2023-11-03 16:45:00
Short version: I have a variable `s = 'bär'`. I need to convert `s` to ASCII so that `s = 'b%C3%A4r'`.
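The target string is the UTF-8 bytes of `bär` written as `%XX` escapes: `ä` (U+00E4) encodes to the two bytes `C3 A4`. A minimal sketch of that conversion, assuming UTF-8 and the RFC 3986 "unreserved" character set; `percent_encode` is a hypothetical helper for illustration, not a library function, and unlike `urllib.parse.quote` (which treats `/` as safe by default) it escapes `/` too:

```python
import string

# RFC 3986 "unreserved" characters are copied through unchanged.
UNRESERVED = set(string.ascii_letters + string.digits + "-._~")


def percent_encode(text: str) -> str:
    """Hypothetical helper: UTF-8 encode, then write each other byte as %XX."""
    out = []
    for byte in text.encode("utf-8"):  # iterating bytes yields ints
        ch = chr(byte)
        if ch in UNRESERVED:
            out.append(ch)
        else:
            out.append("%{:02X}".format(byte))
    return "".join(out)


print("bär".encode("utf-8"))  # b'b\xc3\xa4r'
print(percent_encode("bär"))  # b%C3%A4r
```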

Long version:

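I'm using `urllib.request.urlopen()` to read mp3 pronunciation files from URLs. This works very well, except that I've hit a problem because the URLs often contain Unicode characters, for example the German "Bär". The full URL is https://d7mj4aqfscim2.cloudfront.net/tts/de/token/bär. Typing it into Chrome as a URL works fine and navigates to the mp3 file without any trouble. However, passing the same URL to urllib fails.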

I'm sure this is a Unicode problem, because the stack trace shows:

```
Traceback (most recent call last):
  File "importer.py", line 145, in <module>
    download_file(tuple[1], tuple[0], ".mp3")
  File "importer.py", line 81, in download_file
    with urllib.request.urlopen(url) as in_stream, open(to_fname+ext, 'wb') as out_file: #`with object as name:` safely __enter__() and __exit__() the runtime of object. `as` assigns `name` as referring to the object `object`.
  File "C:\Users\quesm\AppData\Local\Programs\Python\Python35-32\lib\urllib\request.py", line 162, in urlopen
    return opener.open(url, data, timeout)
  File "C:\Users\quesm\AppData\Local\Programs\Python\Python35-32\lib\urllib\request.py", line 465, in open
    response = self._open(req, data)
  File "C:\Users\quesm\AppData\Local\Programs\Python\Python35-32\lib\urllib\request.py", line 483, in _open
    '_open', req)
  File "C:\Users\quesm\AppData\Local\Programs\Python\Python35-32\lib\urllib\request.py", line 443, in _call_chain
    result = func(*args)
  File "C:\Users\quesm\AppData\Local\Programs\Python\Python35-32\lib\urllib\request.py", line 1283, in https_open
    context=self._context, check_hostname=self._check_hostname)
  File "C:\Users\quesm\AppData\Local\Programs\Python\Python35-32\lib\urllib\request.py", line 1240, in do_open
    h.request(req.get_method(), req.selector, req.data, headers)
  File "C:\Users\quesm\AppData\Local\Programs\Python\Python35-32\lib\http\client.py", line 1083, in request
    self._send_request(method, url, body, headers)
  File "C:\Users\quesm\AppData\Local\Programs\Python\Python35-32\lib\http\client.py", line 1118, in _send_request
    self.putrequest(method, url, **skips)
  File "C:\Users\quesm\AppData\Local\Programs\Python\Python35-32\lib\http\client.py", line 960, in putrequest
    self._output(request.encode('ascii'))
UnicodeEncodeError: 'ascii' codec can't encode character '\xfc' in position 19: ordinal not in range(128)
```

...Besides the obvious `UnicodeEncodeError`, I can see that it is trying to `encode()` to ASCII.
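The last frame shows `http.client` calling `request.encode('ascii')` on the request line, and `str.encode` with the `'ascii'` codec raises for any character above U+007F. A small sketch that reproduces this without any network access; the request line below only approximates what `http.client` builds (the exact construction is an internal detail). The traceback above reports `'\xfc'` (ü) rather than `'ä'`, presumably from a different word, but the mechanism is the same:

```python
from urllib.parse import quote

# Approximation of the "METHOD path HTTP-version" line http.client sends.
request_line = "GET /tts/de/token/bär HTTP/1.1"

error = None
try:
    request_line.encode("ascii")
except UnicodeEncodeError as exc:
    error = exc  # keep a reference: `exc` is unbound once the except block ends
    print(exc)   # 'ascii' codec can't encode character '\xe4' in position 19: ordinal not in range(128)

# Once the path segment is percent-encoded, every character is ASCII.
quoted_line = "GET /tts/de/token/" + quote("bär") + " HTTP/1.1"
print(quoted_line.encode("ascii"))  # b'GET /tts/de/token/b%C3%A4r HTTP/1.1'
```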

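Interestingly, when I copy the URL from Chrome (rather than typing it into the Python interpreter myself), it translates `bär` into `b%C3%A4r`. When I pass that to `urllib.request.urlopen()`, it works fine, because all of those characters are ASCII. So my goal is to perform this conversion inside my program. I tried converting the original string to an equivalent Unicode form, but neither `unicodedata.normalize()` nor any of its variants worked. Besides, I'm not sure how to store Unicode as ASCII: Python 3 stores all strings as Unicode, so it never attempts to convert the text.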
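Why `unicodedata.normalize()` cannot help, sketched below: normalization only chooses between composed and decomposed forms of the same characters, so `ä` stays non-ASCII in every form. Percent-encoding is the step that yields an ASCII-only string while still representing the same text. `is_ascii` is a hypothetical helper written for Python 3.5 compatibility (`str.isascii()` only exists from 3.7):

```python
import unicodedata
from urllib.parse import quote, unquote


def is_ascii(text: str) -> bool:
    """Hypothetical helper; equivalent to str.isascii() on Python 3.7+."""
    return all(ord(ch) < 128 for ch in text)


word = "b\u00e4r"  # "bär" with the precomposed U+00E4

# NFC/NFKC keep U+00E4; NFD/NFKD split it into 'a' + U+0308 COMBINING DIAERESIS.
for form in ("NFC", "NFD", "NFKC", "NFKD"):
    normalized = unicodedata.normalize(form, word)
    print(form, ascii(normalized), is_ascii(normalized))

# Percent-encoding the UTF-8 bytes gives ASCII that unquote() reverses.
encoded = quote(word)
print(encoded, is_ascii(encoded))  # b%C3%A4r True
print(unquote(encoded) == word)    # True

# The normalization form still affects the bytes that get encoded.
print(quote(unicodedata.normalize("NFD", word)))  # ba%CC%88r
```

If the server expects a specific form, normalizing to it (typically NFC) before quoting keeps the URLs consistent.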

Best answer

Use `urllib.parse.quote`:

```
>>> import urllib.parse
>>> urllib.parse.quote('bär')
'b%C3%A4r'

>>> urllib.parse.urljoin('https://d7mj4aqfscim2.cloudfront.net/tts/de/token/',
...                      urllib.parse.quote('bär'))
'https://d7mj4aqfscim2.cloudfront.net/tts/de/token/b%C3%A4r'
```
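To apply `quote` to full URLs like the one in the question, one option is to quote only the path component. A sketch under stated assumptions: `quote_url_path` is a hypothetical helper, and `download_file` is a guess at the function from the traceback (the `(url, to_fname, ext)` signature is inferred from line 81 there; the `shutil.copyfileobj` body is an assumption):

```python
import shutil
import urllib.parse
import urllib.request


def quote_url_path(url: str) -> str:
    """Hypothetical helper: percent-encode only the path of an absolute URL.

    quote() keeps '/' by default (safe='/'), so segment separators survive.
    '%' is not safe, so an already-quoted path would be double-encoded:
    only pass raw, unquoted URLs.
    """
    parts = urllib.parse.urlsplit(url)
    return urllib.parse.urlunsplit(
        parts._replace(path=urllib.parse.quote(parts.path)))


def download_file(url: str, to_fname: str, ext: str) -> None:
    """Hypothetical reconstruction of download_file() from the traceback."""
    with urllib.request.urlopen(quote_url_path(url)) as in_stream, \
            open(to_fname + ext, "wb") as out_file:
        shutil.copyfileobj(in_stream, out_file)  # stream the mp3 to disk
```

For example, `download_file('https://d7mj4aqfscim2.cloudfront.net/tts/de/token/bär', 'bär', '.mp3')` would request `.../token/b%C3%A4r`. The sketch leaves the host name (a non-ASCII domain would need IDNA encoding) and the query string untouched.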

Regarding "python - Unicode strings in urllib.request", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/36395705/
