
python - Downloading historical tweets more than a month old


I am trying to download tweets from the past few months within a specific date range. I can only download tweets from the last week, but nothing older than that.

Code:

import sys

import pandas as pd
import tweepy

ckey = 'key'
csecret = 'key'
atoken = 'key'
asecret = 'key'

def toDataFrame(tweets):
    # Flatten a list of tweepy Status objects into one row per tweet
    DataSet = pd.DataFrame()

    DataSet['tweetID'] = [tweet.id for tweet in tweets]
    DataSet['tweetText'] = [tweet.text.encode('utf-8') for tweet in tweets]
    DataSet['tweetRetweetCt'] = [tweet.retweet_count for tweet in tweets]
    DataSet['tweetFavoriteCt'] = [tweet.favorite_count for tweet in tweets]
    DataSet['tweetSource'] = [tweet.source for tweet in tweets]
    DataSet['tweetCreated'] = [tweet.created_at for tweet in tweets]
    DataSet['userID'] = [tweet.user.id for tweet in tweets]
    DataSet['userScreen'] = [tweet.user.screen_name for tweet in tweets]
    DataSet['userName'] = [tweet.user.name for tweet in tweets]
    DataSet['userCreateDt'] = [tweet.user.created_at for tweet in tweets]
    DataSet['userDesc'] = [tweet.user.description for tweet in tweets]
    DataSet['userFollowerCt'] = [tweet.user.followers_count for tweet in tweets]
    DataSet['userFriendsCt'] = [tweet.user.friends_count for tweet in tweets]
    DataSet['userLocation'] = [tweet.user.location for tweet in tweets]
    DataSet['userTimezone'] = [tweet.user.time_zone for tweet in tweets]
    DataSet['Coordinates'] = [tweet.coordinates for tweet in tweets]
    DataSet['GeoEnabled'] = [tweet.user.geo_enabled for tweet in tweets]
    DataSet['Language'] = [tweet.user.lang for tweet in tweets]

    # Not every tweet is geotagged, so the place needs a null check
    tweets_place = []
    for tweet in tweets:
        if tweet.place:
            tweets_place.append(tweet.place.full_name)
        else:
            tweets_place.append('null')
    DataSet['TweetPlace'] = tweets_place

    return DataSet

OAUTH_KEYS = {'consumer_key': ckey, 'consumer_secret': csecret,
              'access_token_key': atoken, 'access_token_secret': asecret}
auth = tweepy.OAuthHandler(OAUTH_KEYS['consumer_key'], OAUTH_KEYS['consumer_secret'])
auth.set_access_token(OAUTH_KEYS['access_token_key'], OAUTH_KEYS['access_token_secret'])
#auth = tweepy.AppAuthHandler('key', 'key')

api = tweepy.API(auth, wait_on_rate_limit=True, wait_on_rate_limit_notify=True)
if not api:
    print("Can't Authenticate")
    sys.exit(-1)
else:
    # I am trying to download from Dec 1st to Dec 7th but I am not able to
    cursor = tweepy.Cursor(api.search,
                           q='#chennairains OR #chennaihelp OR #chennaifloods',
                           since='2015-12-20', until='2015-12-21',
                           lang='en', count=100)
    results = []
    for item in cursor.items():
        results.append(item)

    DataSet = toDataFrame(results)
    DataSet.to_csv('output.csv', index=False)

The program downloads data from within the past week just fine, but cannot download anything older. I did try to consult some related posts here, but most of them went unanswered.
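For what it's worth, the DataFrame-building step can be exercised offline by substituting stub objects for real tweepy Status objects. Below is a minimal sketch using a condensed version of the question's toDataFrame helper (only a subset of the columns) and entirely made-up tweet data:

```python
from datetime import datetime
from types import SimpleNamespace

import pandas as pd

def to_dataframe(tweets):
    # Condensed version of the question's toDataFrame helper: a subset of
    # columns, with the same null check for non-geotagged tweets.
    return pd.DataFrame({
        'tweetID': [t.id for t in tweets],
        'tweetText': [t.text for t in tweets],
        'tweetCreated': [t.created_at for t in tweets],
        'userScreen': [t.user.screen_name for t in tweets],
        'TweetPlace': [t.place.full_name if t.place else 'null' for t in tweets],
    })

# Stub tweets with hypothetical values, mimicking the attributes the helper reads
user = SimpleNamespace(screen_name='alice')
tweets = [
    SimpleNamespace(id=1, text='#chennairains stay safe', user=user,
                    created_at=datetime(2015, 12, 20), place=None),
    SimpleNamespace(id=2, text='#chennaifloods relief info', user=user,
                    created_at=datetime(2015, 12, 20),
                    place=SimpleNamespace(full_name='Chennai, India')),
]

df = to_dataframe(tweets)
print(df['TweetPlace'].tolist())  # ['null', 'Chennai, India']
```

This separates the "shape the data" logic from the "fetch the data" logic, so the former can be verified even while the latter is returning empty results.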

Best Answer

Twitter limits the amount of data returned from its REST API, and Tweepy's API class uses Twitter's REST API.

From https://dev.twitter.com/overview/general/things-every-developer-should-know:

There are pagination limits. Clients may access a theoretical maximum of 3,200 statuses via the page and count parameters for the user_timeline REST API methods. Other timeline methods have a theoretical maximum of 800 statuses. Requests for more than the limit will result in a reply with a status code of 200 and an empty result in the format requested. Twitter still maintains a database of all the tweets sent by a user. However, to ensure performance of the site, this artificial limit is temporarily in place.

If you need a longer lookback, paid services such as Gnip and DataSift can provide this data.
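This matches the behavior in the question: the standard Search API only indexes roughly the last week of tweets, so a `since` date older than that silently yields an empty result rather than an error. A small sketch of the arithmetic (the dates and the helper function are hypothetical, for illustration only):

```python
from datetime import date, timedelta

SEARCH_WINDOW_DAYS = 7  # approximate index depth of Twitter's standard search

def is_searchable(requested_since, today):
    # The standard Search API silently returns nothing for dates
    # older than its index window; it does not raise an error.
    oldest_indexed = today - timedelta(days=SEARCH_WINDOW_DAYS)
    return requested_since >= oldest_indexed

today = date(2015, 12, 27)                       # hypothetical "current" date
print(is_searchable(date(2015, 12, 20), today))  # True: within the window
print(is_searchable(date(2015, 12, 1), today))   # False: query returns empty
```

This is why the Dec 20-21 query in the question succeeds while the Dec 1-7 range returns nothing: by the time the query runs, Dec 1 is outside the search index.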

Regarding "python - Downloading historical tweets more than a month old", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/34423077/
