
python - How to scrape a website behind a paywall

Reposted. Author: 行者123. Updated: 2023-11-30 23:28:05

I want to scrape news articles from my local newspaper. The articles are behind a paywall, and I have a paid account. How would I automate entering my credentials?

Best answer

Use Scrapy (see the Tutorial) with a FormRequest to submit your credentials via HTTP POST (see the Example):

# Install Scrapy:              pip install Scrapy
# Create the project skeleton: scrapy startproject my_project
# Create ./my_project/spiders/my_spider.py with something like this:

import scrapy


class LoginSpider(scrapy.Spider):
    name = 'example.com'
    start_urls = ['http://www.example.com/users/login.php']

    def parse(self, response):
        # Fill in and submit the login form found on the page
        return scrapy.FormRequest.from_response(
            response,
            formdata={'username': 'john', 'password': 'secret'},
            callback=self.after_login,
        )

    def after_login(self, response):
        # Check that the login succeeded before going on
        # (response.body is bytes, so compare against a bytes literal)
        if b"authentication failed" in response.body:
            self.logger.error("Login failed")
            return

        # continue scraping with the authenticated session...
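If you don't need a full crawling framework, the same login flow can be sketched with only the Python standard library: POST the form fields once, keep the session cookie in a cookie jar, and reuse it for later requests. This is a minimal sketch; the URL and the `username`/`password` field names are hypothetical and must match the site's actual login form.

```python
import urllib.parse
import urllib.request
import http.cookiejar

# Hypothetical login endpoint; inspect the real site's form action
LOGIN_URL = "http://www.example.com/users/login.php"

# A cookie jar so the session cookie set at login is sent on later requests
cookie_jar = http.cookiejar.CookieJar()
opener = urllib.request.build_opener(
    urllib.request.HTTPCookieProcessor(cookie_jar)
)


def login(username, password):
    """POST the credentials; the session cookie lands in cookie_jar."""
    form = urllib.parse.urlencode(
        {"username": username, "password": password}  # hypothetical field names
    ).encode()
    return opener.open(LOGIN_URL, data=form)


# After a successful login, fetch paywalled pages through the same opener:
# html = opener.open("http://www.example.com/news/some-article").read()
```

Note that `urllib` will not fill in hidden form fields (e.g. CSRF tokens) for you; that is exactly what Scrapy's `FormRequest.from_response` automates by parsing the login page first.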

Regarding "python - How to scrape a website behind a paywall", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/21807914/
