
python - How to scrape a JSON web page

Reposted. Author: 行者123. Updated: 2023-12-01 07:39:10

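Hey, so I have some experience scraping HTML but none scraping JSON, and I need to scrape the following page with Scrapy: http://www.starcitygames.com/buylist/search?search-type=category&id=5061. I found a tutorial online that uses Scrapy and JMESPath to scrape JSON data from the web. I got the tutorial working, but when I tried to adapt it to my site I had no luck: there are no errors, but no data is returned. Any help would be appreciated!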

items.py

import scrapy


class NameItem(scrapy.Item):
    """User item definition for jsonplaceholder /LoginSpider endpoint."""
    name = scrapy.Field()
    condition = scrapy.Field()
    price = scrapy.Field()
    rarity = scrapy.Field()

LoginSpider.py

import scrapy
import json
from scrapy.spiders import Spider
from scrapy_splash import SplashRequest
from ..items import NameItem
from scrapy.loader import ItemLoader
from scrapy.loader.processors import Join, MapCompose, SelectJmes


class UserSpider(scrapy.Spider):
    """Spider to scrape `http://www.starcitygames.com/buylist/search?search-type=category&id=5061`."""
    name = 'LoginSpider'
    allowed_domains = ['http://www.starcitygames.com/buylist/search?search-type=category&id=5061']
    start_urls = ['http://www.starcitygames.com/buylist/search?search-type=category&id=5061']
    # dictionary to map UserItem fields to Jmes query paths
    jmes_paths = {
        'name': 'name',
        'condition': 'condition',
        'price': 'price',
        'rarity': 'rarity',
    }

    def parse(self, response):
        jsonresponse = json.loads(response.body_as_unicode())
        for user in jsonresponse:
            loader = ItemLoader(item=NameItem())  # create an ItemLoader to populate a NameItem
            loader.default_input_processor = MapCompose(str)  # apply str conversion on each value
            loader.default_output_processor = Join(' ')
            for (field, path) in self.jmes_paths.items():
                loader.add_value(field, SelectJmes(path)(user))
            yield loader.load_item()

Best answer

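The response from this URL, http://www.starcitygames.com/buylist/search?search-type=category&id=5061, has 3 levels:

  1. 'ok'
  2. 'search'
  3. 'results' ## this contains the data

The 'results' key holds multiple values, and you should iterate over them; the data is inside each value. Try this code; I hope it helps.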
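As a quick illustration of that structure, here is a minimal plain-Python sketch with no Scrapy involved. The payload is hypothetical: the card names, conditions, prices, and the value of 'search' are made up, and only the three top-level keys and the nesting under 'results' follow the description above. It also shows why a loop over the decoded response itself finds nothing: iterating a dict yields only its keys.

```python
import json

# Hypothetical payload mimicking the three top-level keys described above;
# every value here is invented for illustration.
SAMPLE = json.dumps({
    "ok": True,
    "search": "category 5061",
    "results": [
        [
            {"name": "Card A", "condition": "NM/M", "price": "1.00", "rarity": "R"},
            {"name": "Card A", "condition": "PL", "price": "0.70", "rarity": "R"},
        ],
        [
            {"name": "Card B", "condition": "NM/M", "price": "0.25", "rarity": "C"},
        ],
    ],
})

data = json.loads(SAMPLE)

# Iterating over the decoded dict yields only its keys (plain strings),
# so a loop such as `for user in data` never reaches the card records.
top_level = list(data)

# Walking data["results"], then each inner list, reaches the records.
rows = [
    (card["name"], card["condition"], card["price"], card["rarity"])
    for group in data["results"]
    for card in group
]

print(top_level)
for row in rows:
    print(row)
```

The nested comprehension is the same double loop the spider needs: one level for the values under 'results', one for the records inside each value.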

This is the items.py module:

import scrapy


class SoResponseItem(scrapy.Item):
    name = scrapy.Field()
    condition = scrapy.Field()
    price = scrapy.Field()
    rarity = scrapy.Field()

This is the spider:

import scrapy
import json
from SO_response.items import SoResponseItem


class LoginspiderSpider(scrapy.Spider):
    name = 'LoginSpider'
    allowed_domains = ['www.starcitygames.com']  # domain only, not a full URL
    url = 'http://www.starcitygames.com/'

    def start_requests(self):
        yield scrapy.Request(url=self.url, callback=self.parse)

    def parse(self, response):
        # Build the buylist search URL relative to the site root.
        url = response.urljoin('buylist/search?search-type=category&id=5061')
        yield scrapy.Request(url=url, callback=self.parse_data)

    def parse_data(self, response):
        jsonresponse = json.loads(response.body)
        # 'results' holds several values; each one is indexed to reach its records.
        for result in jsonresponse['results']:
            for index in range(len(result)):
                items = SoResponseItem()
                items['name'] = result[index]['name']
                items['condition'] = result[index]['condition']
                items['price'] = result[index]['price']
                items['rarity'] = result[index]['rarity']
                yield items

Try it in your shell: scrapy crawl LoginSpider -o jmes.json
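If the crawl still comes back empty, the extraction step can be checked offline, without Scrapy or a network call. The sketch below is one possible way to do that, not part of the original answer: `iter_cards` and `FIELDS` are hypothetical names, and the sample bytes are invented. It uses `.get()` so a missing key becomes `None` instead of raising, which is an assumption about how you would want gaps handled.

```python
import json
from typing import Any, Dict, Iterator

# The four fields collected by the item class above.
FIELDS = ("name", "condition", "price", "rarity")


def iter_cards(body: bytes) -> Iterator[Dict[str, Any]]:
    """Yield one dict per record found under the 'results' key.

    `body` is the raw response body; json.loads accepts bytes directly.
    """
    payload = json.loads(body)
    # A response without 'results' simply yields nothing.
    for group in payload.get("results", []):
        for card in group:
            # Missing fields are returned as None rather than raising KeyError.
            yield {field: card.get(field) for field in FIELDS}


# Invented sample body: two groups, the second record lacks most fields.
sample_body = (
    b'{"ok": true, "search": "x", "results": '
    b'[[{"name": "X", "condition": "NM", "price": "1", "rarity": "R"}], '
    b'[{"name": "Y"}]]}'
)
cards = list(iter_cards(sample_body))
for card in cards:
    print(card)
```

Inside the spider, `parse_data` could call such a helper with `response.body` and copy each dict into a `SoResponseItem`, which keeps the JSON handling testable on its own.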

Regarding "python - How to scrape a JSON web page", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/56811879/
