
python - Getting the response body with scrapy-splash

Reposted. Author: 行者123. Updated: 2023-12-02 05:42:14


I am using Scrapy 1.6 and Splash 3.2. I have:

import scrapy
import random
from scrapy_splash import SplashRequest
from scrapy.utils.response import open_in_browser
from scrapy.linkextractors import LinkExtractor

USER_AGENT = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:48.0) Gecko/20100101 Firefox/48.0'


class MySpider(scrapy.Spider):

    start_urls = ["http://yahoo.com"]
    name = 'mytest'

    def start_requests(self):
        for url in self.start_urls:
            yield SplashRequest(
                url,
                self.parse,
                endpoint='render.html',
                args={'wait': 2.5},
                headers={
                    'User-Agent': USER_AGENT,
                    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
                },
            )

    def parse(self, response):
        # response.body is a result of render.html call; it
        # contains HTML processed by a browser.
        # from scrapy.http.response.html import HtmlResponse
        # ht = HtmlResponse('jj')
        # ht.body.replace =response
        open_in_browser(response)
        return None

The problem is that when I try to open the response in a browser, it opens in Notepad instead.

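I have been looking at https://splash.readthedocs.io/en/stable/scripting-response-object.html. How can I make response.body usable so that I can open the response in a browser? I want to use the browser's developer tools to work out XPath expressions.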

Best answer

I got it working:

    def parse(self, response):
        # response.body is a result of render.html call; it
        # contains HTML processed by a browser.
        from scrapy.http.response.html import HtmlResponse
        ht = HtmlResponse(url=response.url, body=response.body, encoding="utf-8", request=response.request)
        open_in_browser(ht)
        return None
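
A likely reason the wrapper helps: open_in_browser picks the temporary file's extension from the response class, using .html for an HtmlResponse and .txt for other text responses, so a Splash response that is not an HtmlResponse ends up in the system's default text editor. As a rough alternative, the rendered body can be written to a .html file directly with the standard library. The sketch below is not part of Scrapy or scrapy-splash; the helper name open_body_in_browser and its opener parameter are hypothetical, introduced only for illustration.

```python
import tempfile
import webbrowser
from pathlib import Path


def open_body_in_browser(body: bytes, opener=webbrowser.open) -> str:
    """Write raw HTML bytes to a temporary .html file and open it.

    Hypothetical helper (not a Scrapy API). `opener` is injectable so the
    function can be exercised without launching a real browser.
    """
    # delete=False keeps the file on disk after the handle is closed,
    # so the browser can still read it.
    with tempfile.NamedTemporaryFile(suffix=".html", delete=False) as fh:
        fh.write(body)
        path = Path(fh.name)
    # The .html suffix is what makes the OS hand the file to a browser
    # rather than a text editor.
    opener(path.as_uri())
    return str(path)
```

Inside the spider's parse method this would be called as open_body_in_browser(response.body). Unlike open_in_browser, this sketch does not insert a base tag, so relative links, images and stylesheets in the saved page may not resolve.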

Regarding python - getting the response body with scrapy-splash, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/56744120/
