
python - How to handle accented characters in a bulk database import in Python and Postgres

Reposted. Author: 行者123. Updated: 2023-11-29 12:22:55

When I run a bulk import script in Python (openblock), I get the following error for an accented character: invalid byte sequence for encoding "UTF8": 0xca4e

It shows up as: GRAND-CH?NE, COUR DU

But it is actually "GRAND-CHÊNE, COUR DU".
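The `?` is what a display produces when it meets a byte it cannot decode. Using the raw bytes that the `repr()` output further down shows (`'GRAND-CH\xcaNE, COUR DU'`), this minimal Python 3 sketch contrasts reading them as UTF-8 with reading them as cp1252, which is only an assumption about the shapefile's encoding:

```python
# Raw bytes of the street name, as shown by repr() in the question's output
raw = b"GRAND-CH\xcaNE, COUR DU"

# Read as UTF-8, 0xCA is invalid at this position, so errors="replace"
# substitutes U+FFFD, which terminals render as "?" or "\ufffd"
print(raw.decode("utf-8", errors="replace"))

# Read as cp1252 (assumed encoding), the same byte is the intended letter
print(raw.decode("cp1252"))  # GRAND-CHÊNE, COUR DU
```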

What is the best way to handle this? Ideally I would like to keep the accented characters. I suspect I need to encode it somehow?

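Edit: the ? should actually be Ê. Also note that the variable comes from an ESRI Shapefile. When I try davidcrow's solution, I get "Unicode not supported", probably because the strings without accented characters are already Unicode strings.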
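That message is consistent with decoding a value that is already text: in Python 2, `unicode(u'...', 'utf-8')` raises `TypeError: decoding Unicode is not supported`. One way around it is to decode only values that are still raw bytes. A minimal Python 3 sketch, where `to_text` is a hypothetical helper and cp1252 is an assumed source encoding (in Python 2 the checks would be against `str` and `unicode` instead):

```python
def to_text(value, encoding="cp1252"):
    """Return value as str, decoding it only if it is still raw bytes.

    `encoding` is an assumption about the shapefile; cp1252 is a guess.
    """
    if isinstance(value, bytes):
        return value.decode(encoding)
    # Already text, or a non-string such as an int: leave it untouched
    return value


print(to_text(b"GRAND-CH\xcaNE, COUR DU"))  # GRAND-CHÊNE, COUR DU
print(to_text("MAIN STREET"))              # MAIN STREET
print(to_text(42))                          # 42
```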

Here is the ESRIImporter code I am using:

import re
import sys

from django.contrib.gis.gdal import DataSource

# VALID_FCC_PREFIXES, FIELD_MAP, NAME_FIELD_MAP, Block, make_pretty_name and
# slugify are defined or imported elsewhere in the script (not shown).

class EsriImporter(object):
    def __init__(self, shapefile, city=None, layer_id=0):
        print >> sys.stderr, 'Opening %s' % shapefile
        ds = DataSource(shapefile)

        self.layer = ds[layer_id]
        self.city = "OTTAWA" #city and city or Metro.objects.get_current().name
        self.fcc_pat = re.compile(r'^(' + '|'.join(VALID_FCC_PREFIXES) + r')\d$')

    def save(self, verbose=False):
        alt_names_suff = ('',)
        num_created = 0
        for i, feature in enumerate(self.layer):
            #if not self.fcc_pat.search(feature.get('FCC')):
            #    continue
            parent_id = None
            fields = {}
            for esri_fieldname, block_fieldname in FIELD_MAP.items():
                value = feature.get(esri_fieldname)
                #print >> sys.stderr, 'Looking at %s' % esri_fieldname

                if isinstance(value, basestring):
                    value = value.upper()
                elif isinstance(value, int) and value == 0:
                    value = None
                fields[block_fieldname] = value
            if not ((fields['left_from_num'] and fields['left_to_num']) or
                    (fields['right_from_num'] and fields['right_to_num'])):
                continue
            # Sometimes the "from" number is greater than the "to"
            # number in the source data, so we swap them into proper
            # ordering
            for side in ('left', 'right'):
                from_key, to_key = '%s_from_num' % side, '%s_to_num' % side
                if fields[from_key] > fields[to_key]:
                    fields[from_key], fields[to_key] = fields[to_key], fields[from_key]
            if feature.geom.geom_name != 'LINESTRING':
                continue
            for suffix in alt_names_suff:
                name_fields = {}
                for esri_fieldname, block_fieldname in NAME_FIELD_MAP.items():
                    key = esri_fieldname + suffix
                    name_fields[block_fieldname] = feature.get(key).upper()
                    #if block_fieldname == 'postdir':
                    #    print >> sys.stderr, 'Postdir block %s' % name_fields[block_fieldname]

                if not name_fields['street']:
                    continue
                # Skip blocks with bare number street names and no suffix / type
                if not name_fields['suffix'] and re.search(r'^\d+$', name_fields['street']):
                    continue
                fields.update(name_fields)
                block = Block(**fields)
                block.geom = feature.geom.geos
                print repr(fields['street'])
                print >> sys.stderr, 'Looking at block %s' % unicode(fields['street'], errors='replace')

                street_name, block_name = make_pretty_name(
                    fields['left_from_num'],
                    fields['left_to_num'],
                    fields['right_from_num'],
                    fields['right_to_num'],
                    '',
                    fields['street'],
                    fields['suffix'],
                    fields['postdir']
                )
                block.pretty_name = unicode(block_name)
                #print >> sys.stderr, 'Looking at block pretty name %s' % fields['street']

                block.street_pretty_name = street_name
                block.street_slug = slugify(' '.join((unicode(fields['street'], errors='replace'), fields['suffix'])))
                block.save()
                if parent_id is None:
                    parent_id = block.id
                else:
                    block.parent_id = parent_id
                    block.save()
                num_created += 1
                if verbose:
                    print >> sys.stderr, 'Created block %s' % block
        return num_created
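In the importer, the street name comes from the `NAME_FIELD_MAP` loop (`name_fields['street']`), and both that loop and the `FIELD_MAP` loop call `.upper()` on whatever `feature.get()` returns. One option is to decode byte strings before upper-casing; in Python 3, `bytes.upper()` only changes ASCII letters, so an undecoded accented byte is left as it is. A minimal Python 3 sketch, where `normalize_field` is a hypothetical helper and cp1252 is an assumed encoding:

```python
def normalize_field(value, encoding="cp1252"):
    """Hypothetical per-field cleanup mirroring the importer's loops.

    Decodes raw bytes first (assumed cp1252), then upper-cases text and
    maps integer 0 to None, as the original FIELD_MAP loop does.
    """
    if isinstance(value, bytes):
        value = value.decode(encoding)
    if isinstance(value, str):
        return value.upper()
    if isinstance(value, int) and value == 0:
        return None
    return value


# bytes.upper() leaves the non-ASCII byte 0xEA ("ê" in cp1252) unchanged
print(b"grand-ch\xeane".upper())            # b'GRAND-CH\xeaNE'
print(normalize_field(b"grand-ch\xeane"))  # GRAND-CHÊNE
print(normalize_field(0))                   # None
```

Text produced this way can be passed to the model fields directly, instead of calling `unicode(..., errors='replace')` only for the log message and the slug.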

Output:

'GRAND-CH\xcaNE, COUR DU'
Looking at block GRAND-CH�NE, COUR DU
Traceback (most recent call last):

File "../blocks_ottawa.py", line 144, in <module>
sys.exit(main())
File "../blocks_ottawa.py", line 139, in main
num_created = esri.save(options.verbose)
File "../blocks_ottawa.py", line 114, in save
block.save()
File "/home/chris/openblock/src/django/django/db/models/base.py", line 434, in save
self.save_base(using=using, force_insert=force_insert, force_update=force_update)
File "/home/chris/openblock/src/django/django/db/models/base.py", line 527, in save_base
result = manager._insert(values, return_id=update_pk, using=using)
File "/home/chris/openblock/src/django/django/db/models/manager.py", line 195, in _insert
return insert_query(self.model, values, **kwargs)
File "/home/chris/openblock/src/django/django/db/models/query.py", line 1479, in insert_query
return query.get_compiler(using=using).execute_sql(return_id)
File "/home/chris/openblock/src/django/django/db/models/sql/compiler.py", line 783, in execute_sql
cursor = super(SQLInsertCompiler, self).execute_sql(None)
File "/home/chris/openblock/src/django/django/db/models/sql/compiler.py", line 727, in execute_sql
cursor.execute(sql, params)
File "/home/chris/openblock/src/django/django/db/backends/util.py", line 15, in execute
return self.cursor.execute(sql, params)
File "/home/chris/openblock/src/django/django/db/backends/postgresql_psycopg2/base.py", line 44, in execute
return self.cursor.execute(query, args)

django.db.utils.DatabaseError: invalid byte sequence for encoding "UTF8": 0xca4e
HINT: This error can also happen if the byte sequence does not match the encoding expected by the server, which is controlled by "client_encoding".
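The `0xca4e` in the message is the byte 0xCA (Ê in cp1252/Latin-1) followed by 0x4E (`N`). With `client_encoding` set to UTF8, PostgreSQL treats 0xCA as the start of a two-byte sequence, and `N` is not a valid continuation byte. The check itself runs on the server; this Python 3 sketch only imitates it locally to show where the report comes from (`first_invalid_utf8` is a hypothetical helper, and the server's exact choice of bytes to print may differ). The HINT points at the other option: declaring that the client sends Latin-1 (for example `SET client_encoding TO 'LATIN1'`) so that the server converts the bytes itself.

```python
def first_invalid_utf8(data):
    """Return (offset, hex of the offending bytes), or None if data is valid UTF-8.

    A rough local imitation of the server-side check, not PostgreSQL's code.
    """
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as exc:
        # exc.start is the index of the first byte that could not be decoded
        return exc.start, data[exc.start:exc.start + 2].hex()
    return None


print(first_invalid_utf8(b"GRAND-CH\xcaNE, COUR DU"))     # (8, 'ca4e')
print(first_invalid_utf8("GRAND-CHÊNE".encode("utf-8")))  # None
```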

Best answer

Please provide more information. What platform: Windows/Linux/???

What version of Python?

If you are running Windows, your data is more likely to be encoded in cp1252 or something similar, such as ISO-8859-1. It is definitely not UTF-8.
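When the encoding is unknown, a common heuristic is to try UTF-8 first (8-bit text rarely happens to form valid UTF-8) and then fall back to cp1252. A successful decode does not prove the guess is right, because many bytes are valid in several 8-bit encodings, so check a known sample such as GRAND-CHÊNE. A minimal Python 3 sketch, where `decode_with_fallback` and its candidate list are assumptions:

```python
def decode_with_fallback(raw, encodings=("utf-8", "cp1252")):
    """Try each candidate encoding in turn; return (text, encoding_used)."""
    for enc in encodings:
        try:
            return raw.decode(enc), enc
        except UnicodeDecodeError:
            continue  # not valid in this encoding, try the next one
    # Latin-1 maps every byte to a code point, so this last resort cannot fail
    return raw.decode("latin-1"), "latin-1"


print(decode_with_fallback(b"GRAND-CH\xcaNE"))        # ('GRAND-CHÊNE', 'cp1252')
print(decode_with_fallback("CHÊNE".encode("utf-8")))  # ('CHÊNE', 'utf-8')
```

cp1252 leaves a few bytes (such as 0x81) undefined, which is why the sketch ends with Latin-1 rather than assuming cp1252 always succeeds.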

You will need to: (1) find out how your input data is encoded. Try cp1252; it is the usual suspect. (2) Decode your data to unicode. (3) Encode it as UTF-8.
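The three steps can be sketched in Python 3 on the bytes from the question, with cp1252 as the assumed source encoding from step (1):

```python
raw = b"GRAND-CH\xcaNE, COUR DU"  # bytes as read from the shapefile

# (1) Source encoding: an assumption, following the answer's suggestion
SOURCE_ENCODING = "cp1252"
# (2) Decode the bytes to text
text = raw.decode(SOURCE_ENCODING)
# (3) Encode the text as UTF-8 for the database
utf8_bytes = text.encode("utf-8")

print(text)        # GRAND-CHÊNE, COUR DU
print(utf8_bytes)  # b'GRAND-CH\xc3\x8aNE, COUR DU'
```

Ê becomes the two bytes 0xC3 0x8A in UTF-8. With psycopg2, passing the decoded text is typically enough, since the driver encodes text using the connection's client encoding.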

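How are you getting the data out of the ESRI shapefile? Show your code. Show the full traceback and error message. To avoid visual problems (it's E-grave! No, it's E-acute!), print repr(the_suspect_data) and copy/paste the result into an edit of your question. Feel free to use bold type.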
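The look-alike problem is concrete: in cp1252, the adjacent bytes 0xC8, 0xC9 and 0xCA are È, É and Ê. A small Python 3 sketch pairs each byte's `repr()` with its name from the standard `unicodedata` module, which removes the ambiguity that a rendered glyph leaves:

```python
import unicodedata

# Three cp1252 bytes that render as very similar capital E's
for byte in (b"\xc8", b"\xc9", b"\xca"):
    char = byte.decode("cp1252")
    print(repr(byte), repr(char), unicodedata.name(char))

# b'\xc8' 'È' LATIN CAPITAL LETTER E WITH GRAVE
# b'\xc9' 'É' LATIN CAPITAL LETTER E WITH ACUTE
# b'\xca' 'Ê' LATIN CAPITAL LETTER E WITH CIRCUMFLEX
```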

A similar question about handling accented characters in a bulk database import in Python and Postgres can be found on Stack Overflow: https://stackoverflow.com/questions/4362716/
