
google-trends - Automatically extracting CSV files from Google Trends

Reposted · Author: 行者123 · Updated: 2023-12-02 15:19:25

pyGTrends does not seem to work; it throws errors in Python.

pyGoogleTrendsCsvDownloader seems to work: it can log in, but after 1–3 requests (per day!) it reports that the quota is exhausted, even though manual downloads with the same login/IP work perfectly.

Bottom line: neither works. I have searched Stack Overflow: many people have asked about extracting CSV files from Google Trends, but I could not find a working solution...

Thanks in advance to anyone who can help. How should the code be changed? Do you know of another workable solution?

Here is the code of pyGoogleTrendsCsvDownloader.py:

```python
import httplib
import urllib
import urllib2
import re
import csv
import lxml.etree as etree
import lxml.html as html
import traceback
import gzip
import random
import time
import sys

from cookielib import Cookie, CookieJar
from StringIO import StringIO


class pyGoogleTrendsCsvDownloader(object):
    '''
    Google Trends Downloader
    Recommended usage:
    from pyGoogleTrendsCsvDownloader import pyGoogleTrendsCsvDownloader
    r = pyGoogleTrendsCsvDownloader(username, password)
    r.get_csv(cat='0-958', geo='US-ME-500')
    '''
    def __init__(self, username, password):
        '''
        Provide login and password to be used to connect to Google Trends
        All immutable system variables are also defined here
        '''
        # The amount of time (in secs) that the script should wait before
        # making a request. This can be used to throttle the downloading
        # speed to avoid hitting servers too hard. It is further randomized.
        self.download_delay = 0.25

        self.service = "trendspro"
        self.url_service = "http://www.google.com/trends/"
        self.url_download = self.url_service + "trendsReport?"

        self.login_params = {}
        # These headers are necessary, otherwise Google will flag the
        # request at your account level
        self.headers = [('User-Agent', 'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:12.0) Gecko/20100101 Firefox/12.0'),
                        ("Accept", "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8"),
                        ("Accept-Language", "en-gb,en;q=0.5"),
                        ("Accept-Encoding", "gzip, deflate"),
                        ("Connection", "keep-alive")]
        self.url_login = 'https://accounts.google.com/ServiceLogin?service=' + self.service + '&passive=1209600&continue=' + self.url_service + '&followup=' + self.url_service
        self.url_authenticate = 'https://accounts.google.com/accounts/ServiceLoginAuth'
        self.header_dictionary = {}

        self._authenticate(username, password)

    def _authenticate(self, username, password):
        '''
        Authenticate to Google:
        1 - make a GET request to the login page so we can get the login form
        2 - make a POST request with email, password and login form input values
        '''
        # Make sure we get CSV results in English
        ck = Cookie(version=0, name='I4SUserLocale', value='en_US', port=None,
                    port_specified=False, domain='www.google.com',
                    domain_specified=False, domain_initial_dot=False,
                    path='/trends', path_specified=True, secure=False,
                    expires=None, discard=False, comment=None,
                    comment_url=None, rest=None)

        self.cj = CookieJar()
        self.cj.set_cookie(ck)
        self.opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(self.cj))
        self.opener.addheaders = self.headers

        # Get all of the login form input values
        find_inputs = etree.XPath("//form[@id='gaia_loginform']//input")
        try:
            resp = self.opener.open(self.url_login)

            if resp.info().get('Content-Encoding') == 'gzip':
                buf = StringIO(resp.read())
                f = gzip.GzipFile(fileobj=buf)
                data = f.read()
            else:
                data = resp.read()

            xmlTree = etree.fromstring(data, parser=html.HTMLParser(recover=True, remove_comments=True))

            for input in find_inputs(xmlTree):
                name = input.get('name')
                if name:
                    name = name.encode('utf8')
                    value = input.get('value', '').encode('utf8')
                    self.login_params[name] = value
        except:
            print("Exception while parsing: %s\n" % traceback.format_exc())

        self.login_params["Email"] = username
        self.login_params["Passwd"] = password

        params = urllib.urlencode(self.login_params)
        self.opener.open(self.url_authenticate, params)

    def get_csv(self, throttle=False, **kwargs):
        '''
        Download CSV reports
        '''
        # Randomized download delay
        if throttle:
            r = random.uniform(0.5 * self.download_delay, 1.5 * self.download_delay)
            time.sleep(r)

        params = {
            'export': 1
        }
        params.update(kwargs)
        params = urllib.urlencode(params)

        r = self.opener.open(self.url_download + params)

        # Make sure everything is working ;)
        if not r.info().has_key('Content-Disposition'):
            print "You've exceeded your quota. Continue tomorrow..."
            sys.exit(0)

        if r.info().get('Content-Encoding') == 'gzip':
            buf = StringIO(r.read())
            f = gzip.GzipFile(fileobj=buf)
            data = f.read()
        else:
            data = r.read()

        myFile = open('trends_%s.csv' % '_'.join(['%s-%s' % (key, value) for (key, value) in kwargs.items()]), 'w')
        myFile.write(data)
        myFile.close()
```
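Both `_authenticate` and `get_csv` above repeat the same gzip-decoding pattern. That pattern can be sanity-checked locally without touching Google at all; here is a minimal Python 3 sketch using only the standard library (`decode_body` is a hypothetical helper, not part of the class, which is Python 2):

```python
import gzip
import io

def decode_body(raw, content_encoding=None):
    """Decompress an HTTP response body if it was served gzip-encoded."""
    if content_encoding == 'gzip':
        return gzip.GzipFile(fileobj=io.BytesIO(raw)).read()
    return raw

# Simulate a gzip-encoded response body and check both branches.
payload = b'Week,interest\n2013-01-06,42\n'
compressed = gzip.compress(payload)

assert decode_body(compressed, 'gzip') == payload
assert decode_body(payload) == payload
```

If the decoding step passes a test like this, the quota error reported in the question is happening server-side, not in the decompression logic.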

Best answer

Although I don't know Python, I may have a solution. I am currently doing the same thing in C#. While I don't obtain the .csv file directly, I build a custom URL in code, then download that HTML and save it to a text file (also in code). That HTML (on line 12) contains all the information needed to build the chart shown on Google Trends. However, there is a lot of unnecessary text around it that has to be stripped out. Either way, you end up with the same result: Google Trends data. I posted a more detailed answer to my own question here:

Downloading .csv file from Google Trends
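The approach the answer describes, downloading the page HTML and cutting the chart data out of the surrounding markup, can be sketched in Python as follows. The HTML snippet and the `new Date(...)` row format used here are illustrative assumptions, not the exact markup Google serves; the real page would need its own pattern:

```python
import re

def extract_chart_rows(page_html):
    """Pull (date, value) pairs out of JavaScript embedded in a trends page.

    Assumes (hypothetically) that the chart data appears as rows like
    [new Date(2013, 0, 6), 42] inside a <script> block.
    """
    pattern = re.compile(r'\[new Date\((\d+),\s*(\d+),\s*(\d+)\),\s*(\d+)\]')
    rows = []
    for year, month, day, value in pattern.findall(page_html):
        # JavaScript Date months are 0-based, so shift by one.
        rows.append(('%s-%02d-%02d' % (year, int(month) + 1, int(day)), int(value)))
    return rows

sample = '<script>var data = [[new Date(2013, 0, 6), 42], [new Date(2013, 0, 13), 37]];</script>'
print(extract_chart_rows(sample))
# → [('2013-01-06', 42), ('2013-01-13', 37)]
```

The idea is the same as the C# version: ignore the markup, locate the embedded data, and discard the "unnecessary text" around it.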

Regarding "google-trends - Automatically extracting CSV files from Google Trends", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/14772235/
