
python - Picasa album title encoding. Not Unicode?

Reposted. Author: 太空宇宙. Updated: 2023-11-04 06:26:53

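I have written a simple client for Google's Picasa service. What I want is to create a folder named after the album title and download the original photos from the service into that folder. If the title contains any non-Latin characters, I get an IOError: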

IOError: [Errno 2] No such file or directory: '\xd0\x9e\xd1\x81\xd0\xb5\xd0\xbd\xd1\x8c\Autumnal-Equinox.jpg'

Code sample:

import gdata.photos.service
import gdata.media
import os
import urllib2

gd_client = gdata.photos.service.PhotosService()

username = 'cha.com.ua'
albums = gd_client.GetUserFeed(user=username)
for album in albums.entry:
    photos = gd_client.GetFeed(
        '/data/feed/api/user/%s/albumid/%s?kind=photo' % (
            username, album.gphoto_id.text))

    for photo in photos.entry:
        destination = os.path.join(album.title.text, photo.title.text)
        out = open(destination, 'wb')
        out.write(urllib2.urlopen(photo.content.src).read())
        out.close()

I tried decoding the title with .decode('utf-8'), but it did not work.

Best answer

You said:

@rocksportrocker repr(album.title.text) returns str:
'\xd0\x92\xd0\xb8\xd0\xb4 \xd0\xb8\xd0\xb7 \xd0\xbe\xd0\xba\xd0\xbd\xd0\xb0'

@d-k Yep, I've tried it. The result is the same.
For example repr(album.title.text.encode('utf-8')) returns str:
'\xd0\x92\xd0\xb8\xd0\xb4 \xd0\xb8\xd0\xb7 \xd0\xbe\xd0\xba\xd0\xbd\xd0\xb0'

That cannot be true. If the first statement were correct, the second would raise:

UnicodeDecodeError: 'ascii' codec can't decode byte 0xd0 in position 0: ordinal not in range(128)
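
The error arises because Python 2 implicitly decodes a byte str with the ASCII codec before it can encode it. The following is a minimal sketch of that implicit step, written in Python 3 syntax (an assumption, since the session in this answer is Python 2); `py2_style_encode` is a hypothetical helper that exists only for illustration.

```python
# Python 3 sketch of the implicit ASCII decode that Python 2 performs
# when .encode() is called on a byte string (str).
raw = b'\xd0\x92\xd0\xb8\xd0\xb4 \xd0\xb8\xd0\xb7 \xd0\xbe\xd0\xba\xd0\xbd\xd0\xb0'


def py2_style_encode(data, encoding='utf-8'):
    """Hypothetical helper mimicking Python 2's str.encode():
    decode with ASCII first, then encode to the target encoding."""
    return data.decode('ascii').encode(encoding)


try:
    py2_style_encode(raw)
    error = None
except UnicodeDecodeError as exc:
    # The first byte, 0xd0, is outside the ASCII range.
    error = exc

print(error)
```

Pure ASCII input passes through this helper unchanged, which is why the problem only shows up for titles with non-Latin characters.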

Your str object appears to be a UTF-8-encoded Cyrillic string:

>>> foo = '\xd0\x92\xd0\xb8\xd0\xb4 \xd0\xb8\xd0\xb7 \xd0\xbe\xd0\xba\xd0\xbd\xd0\xb0'
>>> from unicodedata import name
>>> for uc in foo.decode('utf8'):
...     print "U+%04X" % ord(uc), name(uc)
...
U+0412 CYRILLIC CAPITAL LETTER VE
U+0438 CYRILLIC SMALL LETTER I
U+0434 CYRILLIC SMALL LETTER DE
U+0020 SPACE
U+0438 CYRILLIC SMALL LETTER I
U+0437 CYRILLIC SMALL LETTER ZE
U+0020 SPACE
U+043E CYRILLIC SMALL LETTER O
U+043A CYRILLIC SMALL LETTER KA
U+043D CYRILLIC SMALL LETTER EN
U+0430 CYRILLIC SMALL LETTER A
>>>
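
For readers on Python 3, roughly the same inspection can be sketched as below. This is an assumed port, not part of the original answer: the byte string is written as a `bytes` literal and `print` is a function.

```python
import unicodedata

# The album title as raw UTF-8 bytes, copied from the session above.
raw = b'\xd0\x92\xd0\xb8\xd0\xb4 \xd0\xb8\xd0\xb7 \xd0\xbe\xd0\xba\xd0\xbd\xd0\xb0'

# Decode once, then describe every code point in the resulting text.
text = raw.decode('utf-8')
described = [("U+%04X" % ord(ch), unicodedata.name(ch)) for ch in text]

for code, char_name in described:
    print(code, char_name)
```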

Moreover, the text above is entirely different from the text in the error message: '\xd0\x9e\xd1\x81\xd0\xb5\xd0\xbd\xd1\x8c\Autumnal-Equinox.jpg'

>>> bar =  '\xd0\x9e\xd1\x81\xd0\xb5\xd0\xbd\xd1\x8c\Autumnal-Equinox.jpg'
>>> for uc in bar.decode('utf8'):
...     print "U+%04X" % ord(uc), name(uc)
...
U+041E CYRILLIC CAPITAL LETTER O
U+0441 CYRILLIC SMALL LETTER ES
U+0435 CYRILLIC SMALL LETTER IE
U+043D CYRILLIC SMALL LETTER EN
U+044C CYRILLIC SMALL LETTER SOFT SIGN
U+005C REVERSE SOLIDUS
U+0041 LATIN CAPITAL LETTER A
U+0075 LATIN SMALL LETTER U
U+0074 LATIN SMALL LETTER T
# snipped the remainder
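
To see how the decoded path breaks down, here is a small Python 3 sketch (an assumed port of the Python 2 session) that splits it with the standard `ntpath` module, which applies Windows path rules on any platform. The backslash is doubled in the literal only because Python 3 byte literals require it.

```python
import ntpath

# The path from the error message, as raw UTF-8 bytes.
raw_path = b'\xd0\x9e\xd1\x81\xd0\xb5\xd0\xbd\xd1\x8c\\Autumnal-Equinox.jpg'

# Decode to text, then split on the Windows separator.
path = raw_path.decode('utf-8')
folder, filename = ntpath.split(path)

print(repr(folder), repr(filename))
```

The folder part is the five-letter Cyrillic album title and the file part is the photo title, joined by the separator that `os.path.join` uses on Windows.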

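The REVERSE SOLIDUS (backslash) indicates that you are running on Windows. Windows simply does not understand UTF-8 in file names passed as byte strings. Convert all text to Unicode on input, and use Unicode for all paths and file names. A simple example that works: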

>>> bar =  '\xd0\x9e\xd1\x81\xd0\xb5\xd0\xbd\xd1\x8c.txt'
>>> ubar = bar.decode('utf8')
>>> print repr(ubar)
u'\u041e\u0441\u0435\u043d\u044c.txt'
>>> f = open(ubar, 'wb')
>>> f.write('hello\n')
>>> f.close()
>>> open(ubar, 'rb').read()
'hello\n'
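
Applied to the download loop from the question, the same idea might look like the sketch below. It uses Python 3 syntax and makes no network calls; `save_photo` is a hypothetical helper, and the titles and payload are stand-ins for what the gdata feed would supply. Note that the loop in the question never creates the album folder, so this sketch creates it before opening the file.

```python
import os
import tempfile


def save_photo(root, album_title, photo_title, data):
    """Hypothetical helper: decode UTF-8 titles to text, create the album
    folder if needed, and write the photo bytes into it."""
    # Decode at the input boundary so every path is text, not bytes.
    if isinstance(album_title, bytes):
        album_title = album_title.decode('utf-8')
    if isinstance(photo_title, bytes):
        photo_title = photo_title.decode('utf-8')

    folder = os.path.join(root, album_title)
    os.makedirs(folder, exist_ok=True)  # the folder must exist before open()

    destination = os.path.join(folder, photo_title)
    with open(destination, 'wb') as out:
        out.write(data)
    return destination


# Stand-in data: a temporary root folder and a fake photo payload.
root = tempfile.mkdtemp()
saved = save_photo(
    root,
    b'\xd0\x9e\xd1\x81\xd0\xb5\xd0\xbd\xd1\x8c',
    b'Autumnal-Equinox.jpg',
    b'hello\n',
)

with open(saved, 'rb') as f:
    content = f.read()
print(content)
```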

Regarding "python - Picasa album title encoding. Not Unicode?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/7366197/
