
python - Google App Engine: UnicodeDecodeError in bulk data upload

Reposted. Author: 太空宇宙. Updated: 2023-11-04 06:39:25

I'm getting a strange error during a bulk data upload with the Google App Engine devserver 1.3.5 and Python 2.5.4 on Windows.

Sample row from the CSV:

EQS,550,foobar,"<some><html><garbage /></html></some>",odp,Ti4=,http://url.com,success

Error:

..................................................................................................................[ERROR   ] [Thread-1] WorkerThread:
Traceback (most recent call last):
  File "C:\Program Files\Google\google_appengine\google\appengine\tools\adaptive_thread_pool.py", line 150, in WorkOnItems
    status, instruction = item.PerformWork(self.__thread_pool)
  File "C:\Program Files\Google\google_appengine\google\appengine\tools\bulkloader.py", line 695, in PerformWork
    transfer_time = self._TransferItem(thread_pool)
  File "C:\Program Files\Google\google_appengine\google\appengine\tools\bulkloader.py", line 852, in _TransferItem
    self.request_manager.PostEntities(self.content)
  File "C:\Program Files\Google\google_appengine\google\appengine\tools\bulkloader.py", line 1296, in PostEntities
    datastore.Put(entities)
  File "C:\Program Files\Google\google_appengine\google\appengine\api\datastore.py", line 282, in Put
    req.entity_list().extend([e._ToPb() for e in entities])
  File "C:\Program Files\Google\google_appengine\google\appengine\api\datastore.py", line 687, in _ToPb
    properties = datastore_types.ToPropertyPb(name, values)
  File "C:\Program Files\Google\google_appengine\google\appengine\api\datastore_types.py", line 1499, in ToPropertyPb
    pbvalue = pack_prop(name, v, pb.mutable_value())
  File "C:\Program Files\Google\google_appengine\google\appengine\api\datastore_types.py", line 1322, in PackString
    pbvalue.set_stringvalue(unicode(value).encode('utf-8'))
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe8 in position 36: ordinal not in range(128)
[INFO ] Unexpected thread death: Thread-1
[INFO ] An error occurred. Shutting down...
..[ERROR ] Error in Thread-1: 'ascii' codec can't decode byte 0xe8 in position 36: ordinal not in range(128)

Is the error caused by a problem with the base64 strings, of which there is one in every row?

KGxwMAoobHAxCihTJ0JJT0VFJwpwMgpJMjYxMAp0cDMKYWEu

KGxwMAoobHAxCihTJ01BVEgnCnAyCkkyOTQwCnRwMwphYS4=
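
One way to test the base64 hypothesis is to decode the two values quoted above and check whether the resulting bytes are pure ASCII. The following is a minimal sketch; the helper name is_pure_ascii is hypothetical, and the code assumes Python 2.6+ or Python 3 rather than the Python 2.5 used in the question.

```python
import base64

# The two base64 values quoted in the question.
samples = [
    'KGxwMAoobHAxCihTJ0JJT0VFJwpwMgpJMjYxMAp0cDMKYWEu',
    'KGxwMAoobHAxCihTJ01BVEgnCnAyCkkyOTQwCnRwMwphYS4=',
]


def is_pure_ascii(data):
    """Return True if every byte in data is below 0x80 (hypothetical helper)."""
    return all(byte < 0x80 for byte in bytearray(data))


# Decode each value and record whether the decoded bytes are ASCII-only.
decoded = [base64.b64decode(s) for s in samples]
ascii_only = [is_pure_ascii(d) for d in decoded]
```

Both values decode to ASCII-only bytes that look like text-protocol pickles, so these two samples would not by themselves trigger the 'ascii' codec error. Other rows in the file could of course differ.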

Data loader:

import base64

from google.appengine.tools import bulkloader


class CourseLoader(bulkloader.Loader):
    def __init__(self):
        # Each tuple maps a CSV column to a property name and a converter.
        bulkloader.Loader.__init__(self, 'Course',
                                   [('dept_code', str),
                                    ('number', int),
                                    ('title', str),
                                    ('full_description', str),
                                    ('unparsed_pre_reqs', str),
                                    ('pickled_pre_reqs', lambda x: base64.b64decode(x)),
                                    ('course_catalog_url', str),
                                    ('parse_succeeded', lambda x: x == 'success')
                                   ])

loaders = [CourseLoader]

Is there a way to tell from the traceback which row caused the error?

Update: It looks like two characters are causing the error: è®. How can I get Google App Engine to handle them?

Best answer

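It looks like some row of the CSV contains non-ASCII data (possibly LATIN SMALL LETTER E WITH GRAVE, which is what 0xe8 is in ISO-8859-1, for example), but you are mapping that column to str. It should be unicode, and I believe the CSV should be UTF-8.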
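
To illustrate the point: in the traceback, unicode(value) is called without an encoding, so Python 2 falls back to the ASCII codec and fails on byte 0xe8. The sketch below reproduces that failure on a made-up field value and shows an explicit decode instead. The helper decode_field and the sample value are hypothetical, and the ISO-8859-1 encoding is an assumption about the file. The code is written for Python 2.6+ or Python 3; on Python 2.5, drop the b prefix on the literal.

```python
def decode_field(raw, encoding='utf-8'):
    """Decode a raw CSV field (bytes) to text with an explicit encoding.

    Hypothetical helper; the right encoding depends on how the CSV was saved.
    """
    return raw.decode(encoding)


# Made-up field value: 'e with grave' in ISO-8859-1 is the single byte 0xe8.
latin1_field = b'Caf\xe8'

# Decoding as ASCII fails on 0xe8, which is the failure seen in the traceback.
try:
    latin1_field.decode('ascii')
    ascii_failed = False
except UnicodeDecodeError:
    ascii_failed = True

# Decoding with the encoding the bytes were actually written in succeeds.
text = decode_field(latin1_field, 'iso-8859-1')
```

In the loader, the same idea would mean replacing str for the text columns with a converter along the lines of lambda x: x.decode('utf-8'), assuming the CSV has first been saved as UTF-8.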

To find out whether any line of a text file contains non-ASCII data, a simple Python snippet helps, for example:

>>> f = open('thefile.csv')
>>> prob = []
>>> for i, line in enumerate(f):
...     # i is 0-based; unicode() with no encoding decodes as ASCII
...     try: unicode(line)
...     except UnicodeDecodeError: prob.append(i)
...
>>> print 'Problems in %d lines:' % len(prob)
>>> print prob
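
The interactive snippet above can also be written as a small function that reports where in each line the first offending byte sits, which helps map the error back to a CSV row. This is a sketch under assumptions: the function name find_non_ascii_lines and the sample rows are hypothetical, and it targets Python 2.6+ or Python 3 rather than Python 2.5.

```python
def find_non_ascii_lines(lines):
    """Return (line_number, byte_offset) for each line with a non-ASCII byte.

    lines is any iterable of byte strings, for example a file opened in
    binary mode. Line numbers are 1-based; byte_offset is the index of the
    first non-ASCII byte within that line.
    """
    problems = []
    for index, line in enumerate(lines):
        try:
            line.decode('ascii')
        except UnicodeDecodeError as err:
            problems.append((index + 1, err.start))
    return problems


# Made-up rows standing in for the CSV; the second one contains byte 0xe8.
sample = [
    b'EQS,550,foobar\n',
    b'MATH,101,Caf\xe8\n',
    b'BIO,200,plain\n',
]
report = find_non_ascii_lines(sample)

# With a real file, something like:
#     with open('thefile.csv', 'rb') as f:
#         report = find_non_ascii_lines(f)
```

Opening the file in binary mode keeps the raw bytes intact, so the reported offsets refer to the bytes on disk.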

Regarding python - Google App Engine: UnicodeDecodeError in bulk data upload, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/3170489/
