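I am trying to use scrapy-playwright to extract some data from a website that loads its content dynamically with JavaScript, but I am stuck at the very beginning.
The problems come from my settings.py file, which looks like this:
# Playwright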
DOWNLOAD_HANDLERS = {
    "http": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
    "https": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
}
#TWISTED_REACTOR = 'twisted.internet.asyncioreactor.AsyncioSelectorReactor'
#ASYNCIO_EVENT_LOOP = 'uvloop.Loop'
When I add only the following scrapy-playwright download handlers:
DOWNLOAD_HANDLERS = {
    "http": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
    "https": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
}
I get:
scrapy.exceptions.NotSupported: Unsupported URL scheme 'https': The installed reactor
(twisted.internet.selectreactor.SelectReactor) does not match the requested one (twisted.internet.asyncioreactor.AsyncioSelectorReactor)
When I also set TWISTED_REACTOR:
TWISTED_REACTOR = 'twisted.internet.asyncioreactor.AsyncioSelectorReactor'
I get:
raise TypeError(
TypeError: SelectorEventLoop required, instead got: <ProactorEventLoop running=False closed=False debug=False>
Finally, when I also set ASYNCIO_EVENT_LOOP (to 'uvloop.Loop'),
I get:
ModuleNotFoundError: No module named 'uvloop'
And installing 'uvloop' fails as well:
pip install uvloop
import scrapy
from scrapy_playwright.page import PageCoroutine


class ProductSpider(scrapy.Spider):
    name = 'product'

    def start_requests(self):
        yield scrapy.Request(
            'https://shoppable-campaign-demo.netlify.app/#/',
            meta={
                'playwright': True,
                'playwright_include_page': True,
                'playwright_page_coroutines': [
                    PageCoroutine("wait_for_selector", "div#productListing"),
                ]
            }
        )

    async def parse(self, response):
        pass
        # parses content
Best answer
The developers of scrapy_playwright recommend setting DOWNLOAD_HANDLERS and TWISTED_REACTOR directly in your script.
A similar comment is provided here.
Here is a working script that does this:
import scrapy
from scrapy_playwright.page import PageCoroutine
from scrapy.crawler import CrawlerProcess


class ProductSpider(scrapy.Spider):
    name = 'product'

    def start_requests(self):
        yield scrapy.Request(
            'https://shoppable-campaign-demo.netlify.app/#/',
            callback=self.parse,
            meta={
                'playwright': True,
                'playwright_include_page': True,
                'playwright_page_coroutines': [
                    PageCoroutine("wait_for_selector", "div#productListing"),
                ]
            }
        )

    async def parse(self, response):
        container = response.xpath("(//div[@class='col-md-6'])[1]")
        for items in container:
            yield {
                'products': items.xpath("(//h3[@class='card-title'])[1]//text()").get()
            }
        # parses content


if __name__ == "__main__":
    process = CrawlerProcess(
        settings={
            "TWISTED_REACTOR": "twisted.internet.asyncioreactor.AsyncioSelectorReactor",
            "DOWNLOAD_HANDLERS": {
                "https": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
                "http": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
            },
            "CONCURRENT_REQUESTS": 32,
            "FEED_URI": 'Products.jl',
            "FEED_FORMAT": 'jsonlines',
        }
    )
    process.crawl(ProductSpider)
    process.start()
This gives the following output:
{'products': 'Oxford Loafers'}
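
The TypeError in the question says that a SelectorEventLoop was required but a ProactorEventLoop was found. ProactorEventLoop is asyncio's default loop on Windows since Python 3.8, and uvloop does not support Windows, so both failures would be consistent with the question having been run on Windows. That is an assumption, since the question does not name the platform. The following is a minimal, standard-library-only sketch for checking which loop type asyncio creates by default on a given machine. The function names are hypothetical helpers, not part of Scrapy or scrapy-playwright.

```python
import asyncio
import sys


def default_loop_name() -> str:
    """Return the class name of the event loop asyncio creates by default."""
    loop = asyncio.new_event_loop()
    try:
        return type(loop).__name__
    finally:
        loop.close()


def default_loop_is_selector() -> bool:
    """Return True if the default loop is a SelectorEventLoop.

    This is the loop type the TypeError in the question asks for.
    """
    loop = asyncio.new_event_loop()
    try:
        return isinstance(loop, asyncio.SelectorEventLoop)
    finally:
        loop.close()


if __name__ == "__main__":
    print(f"platform: {sys.platform}")
    print(f"default loop: {default_loop_name()}")
    print(f"selector-based: {default_loop_is_selector()}")
```

If the last line reports False, the default loop is not selector-based, which matches the situation described by the TypeError above.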
Regarding "python - scrapy-playwright: Downloader/handlers: scrapy.exceptions.NotSupported: AsyncioSelectorReactor", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/70275302/