
python - ValueError ("No JSON object could be decoded") with Python 2.6 and utf-8

Reposted. Author: 可可西里. Updated: 2023-11-01 15:52:32

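I am trying to write a set of mapper/reducer code for Hadoop that counts the words in tweets, but I have run into some problems. My input is a JSON file of collected tweet information. I first set the default encoding to utf-8, but when I run my code I get the following error: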

Traceback (most recent call last):
  File "./mapperworks2.py", line 211, in <module>
    my_json_dict = json.loads(line)
  File "/usr/lib/python2.6/json/__init__.py", line 307, in loads
    return _default_decoder.decode(s)
  File "/usr/lib/python2.6/json/decoder.py", line 319, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/lib/python2.6/json/decoder.py", line 338, in raw_decode
    raise ValueError("No JSON object could be decoded")
ValueError: No JSON object could be decoded

The code for the program is:

#!/usr/bin/python


import sys

import json

import string

reload(sys)
sys.setdefaultencoding('utf8')

stop_words = ['a',
'about',
'above',
'after',
'again',
'against',
'all',
'am',
'an',
'and',
'any',
'are',
"aren't",
'as',
'at',
'be',
'because',
'been',
'before',
'being',
'below',
'between',
'both',
'but',
'by',
"can't",
'cannot',
'could',
"couldn't",
'did',
"didn't",
'do',
'does',
"doesn't",
'yourselves']

numbers = ["0","1","2","3","4","5","6","7","8","9"]

def clean_word(word):
    # strip punctuation and digits from the word
    for c in string.punctuation:
        word = word.replace(c, "")
    for c in numbers:
        word = word.replace(c, "")
    return word

def dont_stop(word):
    # keep the word only if it is non-empty and not a stop word
    if word in stop_words or word == "":
        return False
    else:
        return True



# input comes from STDIN (standard input)
for line in sys.stdin:
    ############
    ############
    ############
    ############
    my_json_dict = json.loads(line)
    line = my_json_dict['text'].lower()
    ############
    ############
    ############
    ############
    # remove leading and trailing whitespace
    line = line.strip()
    # split the line into words
    words = line.split()
    # increase counters
    for word in words:
        ##################
        ##################
        word = clean_word(word)
        ##################
        ##################
        # write the results to STDOUT (standard output);
        # what we output here will be the input for the
        # Reduce step, i.e. the input for reducer.py
        #
        # tab-delimited; the trivial word count is 1
        ##################
        ##################
        if dont_stop(word):
            print '%s\t%s' % (word, 1)

When I do not switch the encoding (i.e., when I comment out reload(sys) and sys.setdefaultencoding()), I get the following error instead:

Traceback (most recent call last):
  File "./mapperworks2.py", line 236, in <module>
    print '%s\t%s' % (word, 1)
UnicodeEncodeError: 'ascii' codec can't encode character u'\u2026' in position 3: ordinal not in range(128)

I am not sure how to fix this; any help is appreciated.

Best answer

See the discussion here: Setting the correct encoding when piping stdout in Python

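Your error comes from trying to print a Unicode string to the output. When stdout is a pipe rather than a terminal, Python 2 cannot detect an output encoding and falls back to ASCII, so any non-ASCII character such as u'\u2026' raises UnicodeEncodeError.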
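
As a rough illustration of that point, here is a minimal sketch of a mapper loop that encodes each record to UTF-8 bytes itself instead of relying on print and the default encoding. The helper names (parse_tweet_text, format_record, run_mapper) are hypothetical and not part of the original script. The sketch also skips lines that json.loads cannot decode; that is only an assumption about where the "No JSON object could be decoded" error might come from (for example, blank lines in the input), not a confirmed diagnosis. The stop-word and punctuation cleaning from the question is left out for brevity.

```python
import json


def parse_tweet_text(line):
    """Return the lowercased 'text' field of one JSON line, or None.

    Assumption: blank or malformed lines are skipped rather than fatal.
    """
    line = line.strip()
    if not line:
        return None
    try:
        record = json.loads(line)
    except ValueError:  # e.g. "No JSON object could be decoded"
        return None
    if not isinstance(record, dict):
        return None
    text = record.get('text')
    if not hasattr(text, 'lower'):  # missing or non-string field
        return None
    return text.lower()


def format_record(word, count):
    """Build one tab-delimited output record as UTF-8 encoded bytes."""
    return (u'%s\t%d\n' % (word, count)).encode('utf-8')


def run_mapper(lines, out):
    """Write one record per word; `out` must accept byte strings."""
    for line in lines:
        text = parse_tweet_text(line)
        if text is None:
            continue
        for word in text.split():
            out.write(format_record(word, 1))
```

In the mapper script this could be driven with run_mapper(sys.stdin, getattr(sys.stdout, 'buffer', sys.stdout)): on Python 2, sys.stdout accepts byte strings directly, while on Python 3 the byte stream is sys.stdout.buffer. Because the bytes are produced explicitly, the output no longer depends on sys.setdefaultencoding().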

For python - ValueError ("No JSON object could be decoded") with Python 2.6 and utf-8, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/47757464/
