
python - Importing data from Google Scholar (bibtex) using cookies

Reposted. Author: 行者123. Updated: 2023-12-01 05:05:08

Here is the code:

import cookielib
import urllib2
from bs4 import BeautifulSoup

headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:30.0) Gecko/20100101 Firefox/30.0'}
url='http://scholar.google.co.in/scholar_setprefs?sciifh=1&scisig=AAGBfm0AAAAAU9jcmEN2h2yuBuZqQK8Es5dQG3ksjutw&inststart=0&num=10&scis=yes&scisf=4&hl=en&lang=all&instq=&save='

filename = "cookies.txt"
request = urllib2.Request(url, None, headers)
cookies = cookielib.MozillaCookieJar(filename, None, None)
cookies.load()
cookie_handler= urllib2.HTTPCookieProcessor(cookies)
redirect_handler= urllib2.HTTPRedirectHandler()
opener = urllib2.build_opener(redirect_handler,cookie_handler)
response = opener.open(request)
print response.read()

Error output:

C:\Python27\lib\_MozillaCookieJar.py:109: UserWarning: cookielib bug!
Traceback (most recent call last):
  File "C:\Python27\lib\_MozillaCookieJar.py", line 71, in _really_load
    line.split("\t")
ValueError: need more than 1 value to unpack

  _warn_unhandled_exception()
Traceback (most recent call last):
  File "C:\Users\new user\Desktop\pythonprac\working\googlescholar.py", line 10, in <module>
    cookies.load()
  File "C:\Python27\lib\cookielib.py", line 1763, in load
    self._really_load(f, filename, ignore_discard, ignore_expires)
  File "C:\Python27\lib\_MozillaCookieJar.py", line 111, in _really_load
    (filename, line))
cookielib.LoadError: invalid Netscape format cookies file 'cookies.txt': '.scholar.google.com TRUE / FALSE 2147483647 GSP ID=353e8f974d766dcd:CF=2'

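I took this code from the web. I am trying to download BibTeX data from Google Scholar into a txt file. To do that, I need to save my user settings in a cookie, so I am writing the data to cookies.txt, but I get the error above. Please advise how to handle this cookie error and how to use cookies to save user-defined preferences for scholar.google.com.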

Best answer

May I suggest using a different set of libraries?

from bs4 import BeautifulSoup
import requests

url = ('http://scholar.google.co.in/scholar_setprefs?sciifh=1&'
       'scisig=AAGBfm0AAAAAU9jcmEN2h2yuBuZqQK8Es5dQG3ksjutw'
       '&inststart=0&num=10&scis=yes&scisf=4&hl=en&lang=all&instq=&save=')

# First request: keep whatever cookies the response carries.
page = requests.get(url)
cookies = page.cookies

# Second request: send those cookies back with the same URL.
page = requests.get(url, cookies=cookies)

print(page.content)

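With cookies = page.cookies, I retrieve the cookies and store them in the cookies variable. I then request the same page again, passing that variable. If you have a cookies.txt file, you can load it as a dictionary.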
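
The last point, loading a cookies.txt file as a dictionary, could look roughly like the sketch below. It is written for Python 3, where cookielib is named http.cookiejar. load_cookie_dict is a hypothetical helper name, not part of any library, and the file is assumed to already be a valid Netscape-format cookies file.

```python
import http.cookiejar


def load_cookie_dict(path):
    """Read a Netscape-format cookies file and return {name: value}.

    Hypothetical helper for illustration; assumes the file at `path`
    starts with the Netscape header and uses tab-separated fields.
    """
    jar = http.cookiejar.MozillaCookieJar(path)
    # Keep session cookies and expired cookies so nothing is silently dropped.
    jar.load(ignore_discard=True, ignore_expires=True)
    return {cookie.name: cookie.value for cookie in jar}
```

The returned dictionary can then be passed as the cookies argument, for example requests.get(url, cookies=load_cookie_dict("cookies.txt")).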


If you want to do this with the standard library modules urllib2 and cookielib, make sure the first line of the cookies.txt file is

# Netscape HTTP Cookie File

Otherwise cookielib will not load it: https://stackoverflow.com/a/11536599/1688590
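
Note that the traceback in the question fails at line.split("\t") with "need more than 1 value to unpack", which suggests the cookie line in that file is separated by spaces rather than tabs, so the header alone may not be enough. Below is a minimal sketch of a repair step in Python 3 (http.cookiejar is the Python 3 name for cookielib). normalize_cookie_file is a hypothetical helper, and it assumes every cookie line has exactly seven whitespace-separated fields with a non-empty value.

```python
import http.cookiejar
import os
import tempfile

NETSCAPE_HEADER = "# Netscape HTTP Cookie File"


def normalize_cookie_file(text):
    """Return `text` with the Netscape header and tab-separated fields.

    Hypothetical helper: comment lines (including any existing header)
    are dropped and the header is written once at the top.
    """
    out = [NETSCAPE_HEADER]
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue
        # Seven fields: domain, domain flag, path, secure, expires, name, value.
        # maxsplit=6 keeps any spaces inside the value intact.
        fields = stripped.split(None, 6)
        if len(fields) != 7:
            raise ValueError("unexpected cookie line: %r" % line)
        out.append("\t".join(fields))
    return "\n".join(out) + "\n"


# The cookie line from the error message, with spaces instead of tabs.
broken = ".scholar.google.com TRUE / FALSE 2147483647 GSP ID=353e8f974d766dcd:CF=2\n"
fixed = normalize_cookie_file(broken)

# Write the repaired text to a temporary file and load it with the stdlib jar.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as fh:
    fh.write(fixed)
    path = fh.name

jar = http.cookiejar.MozillaCookieJar(path)
jar.load(ignore_discard=True, ignore_expires=True)
os.remove(path)

names = [cookie.name for cookie in jar]
```

Joining the fields with tabs is what lets the loader unpack each line into its seven values; the header only gets the file past the initial format check.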

Regarding python - Importing data from Google Scholar (bibtex) using cookies, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/25197771/
