python - Scrapy: trying to scrape business name hrefs in Python

Reposted. Author: 行者123. Updated: 2023-12-01 01:37:06

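I'm trying to scrape the href of each business listed on Yellow Pages. I'm very new to Scrapy; this is my second day with it. I use requests to obtain the actual URL for the spider to crawl. What is my code doing wrong? Eventually I want Scrapy to visit each business and scrape its address and other information.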

# -*- coding: utf-8 -*-
import scrapy
import requests

search = "Plumbers"
location = "Hammond, LA"
url = "https://www.yellowpages.com/search"
q = {'search_terms': search, 'geo_location_terms': location}
page = requests.get(url, params=q)
page = page.url


class YellowpagesSpider(scrapy.Spider):
    name = 'quotes'
    allowed_domains = ['yellowpages.com']
    start_urls = [page]

    def parse(self, response):
        self.log("I just visited: " + response.url)
        items = response.css('span.text::text')
        for item in items:
            print(item)
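A side note on the setup above: the requests call is used only to obtain the encoded search URL, so the same string can probably be assembled locally without an HTTP request at import time. The following is a minimal sketch, assuming the endpoint and parameter names from the snippet; `build_search_url` is a hypothetical helper name, not part of Scrapy or the original code.

```python
from urllib.parse import urlencode

# Endpoint taken from the question's snippet.
SEARCH_URL = "https://www.yellowpages.com/search"


def build_search_url(search, location):
    """Hypothetical helper: return the search URL with an encoded query string."""
    query = {"search_terms": search, "geo_location_terms": location}
    # urlencode percent-encodes the comma and turns the space into '+'.
    return SEARCH_URL + "?" + urlencode(query)


start_url = build_search_url("Plumbers", "Hammond, LA")
print(start_url)
```

The resulting string could then be placed in `start_urls` in place of `page`.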

Best answer

To get the name, use:

response.css('a[class=business-name]::text')

To get the href, use:

response.css('a[class=business-name]::attr(href)')

Put together in the final loop, it looks like this:

    for bas in response.css('a[class=business-name]'):
        item = {'name': bas.css('a[class=business-name]::text').extract_first(),
                'url': bas.css('a[class=business-name]::attr(href)').extract_first()}
        yield item

Result:

2018-09-13 04:12:49 [quotes] DEBUG: I just visited: https://www.yellowpages.com/search?search_terms=Plumbers&geo_location_terms=Hammond%2C+LA
2018-09-13 04:12:49 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.yellowpages.com/search?search_terms=Plumbers&geo_location_terms=Hammond%2C+LA>
{'name': 'Roto-Rooter Plumbing & Water Cleanup', 'url': '/new-orleans-la/mip/roto-rooter-plumbing-water-cleanup-21804163?lid=149760174'}
2018-09-13 04:12:49 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.yellowpages.com/search?search_terms=Plumbers&geo_location_terms=Hammond%2C+LA>
{'name': "AJ's Plumbing And Heating Inc", 'url': '/new-orleans-la/mip/ajs-plumbing-and-heating-inc-16078566?lid=1001789407686'}
...
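The `url` values in the output above are site-relative paths, so they would need to be resolved against the page URL before each business page can be requested. A minimal sketch using only the standard library, with the page URL and one href copied from the log; the variable names are illustrative:

```python
from urllib.parse import urljoin

# Page URL and relative href copied from the log output above.
page_url = ("https://www.yellowpages.com/search"
            "?search_terms=Plumbers&geo_location_terms=Hammond%2C+LA")
href = "/new-orleans-la/mip/roto-rooter-plumbing-water-cleanup-21804163?lid=149760174"

# A leading '/' makes the href replace the path and query of the base URL.
absolute_url = urljoin(page_url, href)
print(absolute_url)
```

Inside a Scrapy callback, `response.urljoin(href)` performs the same resolution against `response.url`.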

Regarding "python - Scrapy: trying to scrape business name hrefs in Python", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/52298479/
