Python + Scrapy + JSON + XPath : How to scrape JSON data with Scrapy

Reposted. Author: 行者123. Updated: 2023-12-01 08:55:48

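I know how to use Scrapy with XPath to get data points out of HTML. But I need to scrape all the event URLs on this page of the site (the start URL), and those URLs are written in JSON format: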

https://highape.com/bangalore/all-events

View source: https://highape.com/bangalore/all-events

I usually write it like this:

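from scrapy import Request

# parse() is a method of the spider class
def parse(self, response):
    events = response.xpath('**What To Write Here?**').extract()

    for event in events:
        # Turn each relative link into an absolute URL before requesting it
        absolute_url = response.urljoin(event)
        yield Request(absolute_url, callback=self.parse_event)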

Please tell me what I should write in the "What To Write Here?" part.

Best answer

View the page source of the URL, copy lines 76 - 9045 and save them to your local drive as data.json, then use this code...

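import json

import requests
from bs4 import BeautifulSoup

# Fetch the listing page and parse its HTML
req = requests.get('https://highape.com/bangalore/all-events')
soup = BeautifulSoup(req.content, 'html.parser')

# Take the text of the sixth <script> tag; this index depends on the page layout
js = soup.find_all('script')[5].text
# strict=False tolerates control characters inside JSON strings
data = json.loads(js, strict=False)

for i in data:
    url = i['url']
    print(url)
    # callback with Scrapy goes here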
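
The same idea can be kept inside Scrapy, without requests or BeautifulSoup. The following is only a minimal sketch: `extract_event_urls` is a hypothetical helper name, and it assumes, as the answer's code does, that one of the page's script tags holds JSON objects with a `url` field. Rather than relying on a fixed script index, it tries every script body and skips the ones that are not JSON.

```python
import json


def extract_event_urls(script_texts):
    """Collect the 'url' field from every script body that parses as JSON.

    Hypothetical helper: assumes the event data is a JSON object, or a list
    of JSON objects, each carrying a 'url' key.
    """
    urls = []
    for text in script_texts:
        try:
            # strict=False tolerates control characters inside JSON strings
            data = json.loads(text, strict=False)
        except ValueError:
            continue  # ordinary JavaScript, not JSON
        if isinstance(data, dict):
            data = [data]  # treat a single object like a one-item list
        if not isinstance(data, list):
            continue
        for item in data:
            if isinstance(item, dict) and "url" in item:
                urls.append(item["url"])
    return urls


# Inside the spider, the callback from the question might then become:
#
#     def parse(self, response):
#         scripts = response.xpath('//script/text()').getall()
#         for url in extract_event_urls(scripts):
#             yield Request(response.urljoin(url), callback=self.parse_event)
```

Under these assumptions, the XPath that fills in the "What To Write Here?" slot would be something like `//script/text()`, with the JSON decoding done in Python afterwards, since XPath itself cannot look inside the JSON text.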

Regarding Python + Scrapy + JSON + XPath : How to scrape JSON data with Scrapy, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/52779161/
