
Python encoding problem when trying to parse JSON tweets

Reposted. Author: 行者123. Updated: 2023-12-01 04:40:03

I am trying to parse the tweet text and username out of the JSON object returned by Twitter, using the following code:

# Imports assumed for tweepy 3.x; ckey, csecret, atoken, asecret and the
# database cursor c are defined elsewhere in the script.
import json
import time

from tweepy import OAuthHandler, Stream
from tweepy.streaming import StreamListener


class listener(StreamListener):

    def on_data(self, data):
        all_data = json.loads(data)
        tweet = all_data["text"]
        username = all_data["user"]["screen_name"]

        c.execute("INSERT INTO tweets (tweet_time, username, tweet) VALUES (%s,%s,%s)",
                  (time.time(), username, tweet))
        print(username, tweet)
        return True

    def on_error(self, status):
        print(status)


auth = OAuthHandler(ckey, csecret)
auth.set_access_token(atoken, asecret)
twitterStream = Stream(auth, listener())
twitterStream.filter(track=["LeBron James"])

However, I get the following error. How can I adjust the code to decode or encode the response correctly?

Traceback (most recent call last):
  File "C:/Users/sagars/PycharmProjects/YouTube NLP Lessons/Twitter Stream to DB.py", line 45, in <module>
    twitterStream.filter(track = ["LeBron James"])
  File "C:\Python34\lib\site-packages\tweepy\streaming.py", line 428, in filter
    self._start(async)
  File "C:\Python34\lib\site-packages\tweepy\streaming.py", line 346, in _start
    self._run()
  File "C:\Python34\lib\site-packages\tweepy\streaming.py", line 286, in _run
    raise exception
  File "C:\Python34\lib\site-packages\tweepy\streaming.py", line 255, in _run
    self._read_loop(resp)
  File "C:\Python34\lib\site-packages\tweepy\streaming.py", line 309, in _read_loop
    self._data(next_status_obj)
  File "C:\Python34\lib\site-packages\tweepy\streaming.py", line 289, in _data
    if self.listener.on_data(data) is False:
  File "C:/Users/sagars/PycharmProjects/YouTube NLP Lessons/Twitter Stream to DB.py", line 36, in on_data
    print (username, tweet)
  File "C:\Python34\lib\encodings\cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode characters in position 0-8: character maps to <undefined>

Best answer

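Unfortunately, the problem is that the text you get from Twitter contains characters (emoji, for example) that the Windows console encoding, cp1252, cannot represent. As the traceback shows, the failure happens inside print, when Python tries to encode the string for the console, and that is what raises the charmap error. To get around it, you can encode the strings to UTF-8 yourself: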

tweet = all_data["text"].encode('utf-8')
username = all_data["user"]["screen_name"].encode('utf-8')

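Note that in Python 3 this gives you bytes rather than text, so you lose the readable form of some emoji and special characters in the tweets: they are printed as escape sequences such as \x899. If you really need that information for sentiment analysis (I gave up on it myself), you will need to install a package that ships a precompiled list to convert them accordingly.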
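As a rough alternative to encoding everything to UTF-8 bytes, the sketch below keeps the tweet as a normal string and only escapes the characters that the console encoding cannot represent, at the moment of printing. The helper name console_safe and the sample tweet text are made up for illustration; they are not part of the original question or of tweepy. It assumes the only failing step is print, as the traceback suggests.

```python
import sys


def console_safe(text, encoding=None):
    """Return text with characters the target encoding cannot represent
    replaced by backslash escapes, so that printing it will not raise."""
    # Fall back to UTF-8 if stdout has no encoding (e.g. when redirected).
    encoding = encoding or sys.stdout.encoding or "utf-8"
    return text.encode(encoding, errors="backslashreplace").decode(encoding)


# Hypothetical tweet text containing an emoji (U+1F451, a crown).
tweet = "King James \U0001F451"

# Reproduce the failure from the traceback: cp1252 has no mapping for the emoji.
try:
    tweet.encode("cp1252")
except UnicodeEncodeError as exc:
    print("cp1252 cannot encode:", ascii(exc.object[exc.start:exc.end]))

# Escaped only for display; the original string is left untouched.
print(console_safe(tweet, "cp1252"))
```

Inside on_data this would mean calling print(console_safe(username), console_safe(tweet)) while still passing the unmodified username and tweet to c.execute, assuming the database connection itself is configured for a Unicode-capable character set.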

For more on this Python encoding problem when trying to parse JSON tweets, see the similar question on Stack Overflow: https://stackoverflow.com/questions/30928334/
