python - How to print a Pandas data frame containing some Russian

Reposted. Author: 太空宇宙. Updated: 2023-11-04 08:09:44

I am working with data of the following kind.

       itemid    category     subcategory                title
    1  10000010  Транспорт    Автомобили с пробегом      Toyota Sera, 1991
    2  10000025  Услуги       Предложения услуг          Монтаж кровли
    3  10000094  Личные вещи  Одежда, обувь, аксессуары  Костюм Steilmann
    4  10000101  Транспорт    Автомобили с пробегом      Ford Focus, 2011
    5  10000132  Транспорт    Запчасти и аксессуары      Турбина 3.0 Bar
    6  10000152  Транспорт    Автомобили с пробегом      ВАЗ 2115 Samara, 2005

Now I run the following commands:

    import pandas as pd
    trainingData = pd.read_table("train.tsv", nrows=10, header=0, encoding='utf-8')
    trainingData['itemid'].head()

    0    10000010
    1    10000025
    2    10000094
    3    10000101
    4    10000132
    Name: itemid

Everything is fine at this point, but when I select two columns like this

    trainingData[['itemid','category']].head()

Error:

    ---------------------------------------------------------------------------
    UnicodeDecodeError                        Traceback (most recent call last)
    /home/vikram/Documents/Avito/ in ()
    ----> 1 trainingData[['itemid','category']].head()

    /usr/lib/python2.7/dist-packages/IPython/core/displayhook.pyc in __call__(self, result)
        236             self.start_displayhook()
        237             self.write_output_prompt()
    --> 238             format_dict = self.compute_format_data(result)
        239             self.write_format_data(format_dict)
        240             self.update_user_ns(result)

    /usr/lib/python2.7/dist-packages/IPython/core/displayhook.pyc in compute_format_data(self, result)
        148             MIME type representation of the object.
        149         """
    --> 150         return self.shell.display_formatter.format(result)
        151
        152     def write_format_data(self, format_dict):

    /usr/lib/python2.7/dist-packages/IPython/core/formatters.pyc in format(self, obj, include, exclude)
        124                     continue
        125             try:
    --> 126                 data = formatter(obj)
        127             except:
        128                 # FIXME: log the exception

    /usr/lib/python2.7/dist-packages/IPython/core/formatters.pyc in __call__(self, obj)
        445                 type_pprinters=self.type_printers,
        446                 deferred_pprinters=self.deferred_printers)
    --> 447             printer.pretty(obj)
        448             printer.flush()
        449             return stream.getvalue()

    /usr/lib/python2.7/dist-packages/IPython/lib/pretty.pyc in pretty(self, obj)
        352                 if callable(obj_class._repr_pretty_):
        353                     return obj_class._repr_pretty_(obj, self, cycle)
    --> 354             return _default_pprint(obj, self, cycle)
        355         finally:
        356             self.end_group()

    /usr/lib/python2.7/dist-packages/IPython/lib/pretty.pyc in _default_pprint(obj, p, cycle)
        472     if getattr(klass, '__repr__', None) not in _baseclass_reprs:
        473         # A user-provided repr.
    --> 474         p.text(repr(obj))
        475         return
        476     p.begin_group(1, '

    [...]
        456                 self.to_string(buf=buf)
        457                 value = buf.getvalue()
        458                 if max([len(l) for l in value.split('\n')]) > terminal_width:

    /usr/lib/pymodules/python2.7/pandas/core/frame.pyc in to_string(self, buf, columns, col_space, colSpace, header, index, na_rep, formatters, float_format, sparsify, nanRep, index_names, justify, force_unicode)
       1024                                            index_names=index_names,
       1025                                            header=header, index=index)
    -> 1026         formatter.to_string(force_unicode=force_unicode)
       1027
       1028         if buf is None:

    /usr/lib/pymodules/python2.7/pandas/core/format.pyc in to_string(self, force_unicode)
        176             for i, c in enumerate(self.columns):
        177                 if self.header:
    --> 178                     fmt_values = self._format_col(c)
        179                     cheader = str_columns[i]
        180                     max_len = max(max(len(x) for x in fmt_values),

    /usr/lib/pymodules/python2.7/pandas/core/format.pyc in _format_col(self, col)
        217                             float_format=self.float_format,
        218                             na_rep=self.na_rep,
    --> 219                             space=self.col_space)
        220
        221     def to_html(self):

    /usr/lib/pymodules/python2.7/pandas/core/format.pyc in format_array(values, formatter, float_format, na_rep, digits, space, justify)
        424                         justify=justify)
        425
    --> 426     return fmt_obj.get_result()
        427
        428

    /usr/lib/pymodules/python2.7/pandas/core/format.pyc in get_result(self)
        471                 fmt_values.append(float_format(v))
        472             else:
    --> 473                 fmt_values.append(' %s' % _format(v))
        474
        475         return _make_fixed_width(fmt_values, self.justify)

    /usr/lib/pymodules/python2.7/pandas/core/format.pyc in _format(x)
        457             else:
        458                 # object dtype
    --> 459                 return '%s' % formatter(x)
        460
        461         vals = self.values

    /usr/lib/pymodules/python2.7/pandas/core/common.pyc in _stringify(col)
        503 def _stringify(col):
        504     # unicode workaround
    --> 505     return unicode(col)
        506
        507 def _maybe_make_list(obj):

    UnicodeDecodeError: 'ascii' codec can't decode byte 0xd0 in position 0: ordinal not in range(128)

Please help me "display" the data correctly.

Best answer

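I ran into the same problem: IPython could not display the non-ASCII text returned by the Pandas head() function. It turned out that Python's default encoding was set to 'ascii' on my machine. You can check this with: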

    import sys
    sys.getdefaultencoding()
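
To see why an 'ascii' default leads to the traceback in the question, here is a minimal sketch written for Python 3 (the question itself uses Python 2.7). It decodes the UTF-8 bytes of the first category value with both codecs. The helper name `try_decode` is made up for this illustration and is not part of pandas or IPython.

```python
import sys

# UTF-8 bytes of the first category value from the question's data.
text = "Транспорт"
raw = text.encode("utf-8")


def try_decode(data, encoding):
    """Hypothetical helper: return the decoded text, or the error raised."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError as exc:
        return exc


ascii_result = try_decode(raw, "ascii")
utf8_result = try_decode(raw, "utf-8")

print(sys.getdefaultencoding())  # the value the answer suggests checking
print(ascii_result)              # the error object: the ascii codec rejects the first byte
print(utf8_result == text)       # decoding as UTF-8 round-trips the text
```

The first byte of the encoded string is 0xd0, which is outside the 7-bit ASCII range, so any implicit decode with the 'ascii' codec fails at position 0.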

The solution is to reset the default encoding to UTF-8:

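    import sys
    reload(sys)                      # Python 2 only: restores setdefaultencoding, which is removed at startup
    sys.setdefaultencoding('utf-8')  # make implicit str/unicode conversions use UTF-8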

After this, IPython displays Pandas data frames with non-ASCII characters correctly.

Note that the reload call is required to make the setdefaultencoding function available. Without it you get the error:

    AttributeError: 'module' object has no attribute 'setdefaultencoding'
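
If the same script may also run under Python 3, where `reload` is not a builtin and `sys.setdefaultencoding` does not exist, the fix can be guarded by a version check. The following is only a sketch under that assumption. The function name `force_utf8_default` is hypothetical and does not come from the original answer.

```python
import sys


def force_utf8_default():
    """Hypothetical wrapper: apply the answer's fix only where it exists."""
    if sys.version_info[0] >= 3:
        # Python 3 already reports 'utf-8' as its default encoding,
        # and setdefaultencoding is not available there.
        return sys.getdefaultencoding()
    # Python 2 path, as in the answer: reload(sys) makes
    # setdefaultencoding reachable again before it is called.
    reload(sys)  # builtin on Python 2 only; never reached on Python 3
    sys.setdefaultencoding("utf-8")
    return sys.getdefaultencoding()


current_default = force_utf8_default()
print(current_default)
```

Calling it once at the start of a session, before any data frame is displayed, mirrors where the answer places its three lines.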

Regarding "python - How to print a Pandas data frame containing some Russian", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/24894213/
