javascript - python scrapy : scraping dynamic information

Reposted · Author: 行者123 · Updated: 2023-11-28 08:11:59
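I am trying to scrape information from http://www.qchp.org.qa/en/Pages/searchpractitioners.aspx. I want to do the following:
- Select "Dentist" from the drop-down at the top of the page
- Click Search
- Note that the information at the bottom of the page changes dynamically using javascript
- Click the hyperlink on a practitioner's name, which opens a pop-up window
- Save all the information for each practitioner in a json/csv file
- Also collect the information on the other pages linked at the bottom of the page, which changes the information in the same div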

I am very new to scrapy and have only just looked into selenium, because I read somewhere that you need selenium to get dynamic information.

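So I am using Selenium inside my scrapy application. I am not sure whether this is correct, and I don't know what the best approach is. I have the following code so far. I am getting this error:

sch_spider.py", line 21, in DmozSpider
    all_options = element.find_elements_by_tag_name("option")
NameError: name 'element' is not defined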

sch_spider.py

from scrapy.spider import Spider
from scrapy.selector import Selector
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from scrapytutorial.items import SchItem
from selenium.webdriver.support.ui import Select

class DmozSpider(Spider):
    name = "sch"

    driver = webdriver.Firefox()
    driver.get("http://www.qchp.org.qa/en/Pages/searchpractitioners.aspx")
    select = Select(driver.find_element_by_name('ctl00$m$g_28bc0e11_4b8f_421f_84b7_d671de504bc3$ctl00$drp_practitionerType'))
    all_options = element.find_elements_by_tag_name("option")

    for option in all_options:
        if option.get_attribute("value") == "4": #Dentist
            option.click()
            ends
            break

    driver.find_element_by_name("ctl00$m$g_28bc0e11_4b8f_421f_84b7_d671de504bc3$ctl00$Searchbtn").click()


    def parse(self, response):

        all_docs = element.find_elements_by_tag_name("td")
        for name in all_docs:
            name.click()
            alert = driver.switch_to_alert()
        sel = Selector(response)
        ma = sel.xpath('//table')
        items = []
        for site in ma:
            item = SchItem()
            item['name'] = site.xpath("//span[@id='PractitionerDetails1_lbl_Name']/text()").extract()
            item['profession'] = site.xpath("//span[@id='PractitionerDetails1_lbl_Profession']/text()").extract()
            item['scope_of_practise'] = site.xpath("//span[@id='PractitionerDetails1_lbl_sop']/text()").extract()
            item['instituition'] = site.xpath("//span[@id='PractitionerDetails1_lbl_institution']/text()").extract()
            item['license'] = site.xpath("//span[@id='PractitionerDetails1_lbl_LicenceNo']/text()").extract()
            item['license_expiry_date'] = site.xpath("//span[@id='PractitionerDetails1_lbl_LicenceExpiry']/text()").extract()
            item['qualification'] = site.xpath("//span[@id='PractitionerDetails1_lbl_Qualification']/text()").extract()

            items.append(item)
        return items

items.py

from scrapy.item import Item, Field

class SchItem(Item):

    name = Field()
    profession = Field()
    scope_of_practise = Field()
    instituition = Field()
    license = Field()
    license_expiry_date = Field()
    qualification = Field()

Best answer

Shouldn't you change element.find_elements.. to select.find_element.. in the code below?

select = Select(driver.find_element_by_name('ctl00$m$g_28bc0e11_4b8f_421f_84b7_d671de504bc3$ctl00$drp_practitionerType'))
all_options = element.find_elements_by_tag_name("option")
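
The error follows from the names in use: the wrapped drop-down is bound to `select`, and nothing is ever assigned to `element`. Because the lookup sits directly in the class body, Python evaluates it while the class is being defined, which is why the traceback reports it "in DmozSpider" rather than inside a method. A minimal sketch that reproduces the same kind of failure without Scrapy or Selenium (the `DemoSpider` class and `define_spider` function are hypothetical, for illustration only):

```python
def define_spider():
    # Statements in a class body run as soon as the class is defined,
    # not when a method is later called.
    class DemoSpider:
        name = "sch"
        select = object()      # stands in for the Select(...) wrapper
        all_options = element  # 'element' was never assigned anywhere

    return DemoSpider


try:
    define_spider()
    message = None
except NameError as exc:
    # Same category of error as in the question's traceback.
    message = str(exc)

print(message)
```

Moving the browser calls into a method such as `__init__` or `parse` would not fix the name itself, but it would at least stop the spider module from opening a browser on import.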

Or rather, shouldn't you use select.options?
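
As far as I know, Selenium's `Select` wrapper exposes the option elements through its `options` attribute and does not itself provide `find_elements_by_tag_name`, so `select.options` is probably the simpler route. A rough sketch, assuming a recent Selenium release where lookups go through `driver.find_element(By.NAME, ...)`; the helper names `click_option_with_value` and `choose_practitioner_type` are made up for this example:

```python
def click_option_with_value(select, value):
    """Click the first option whose "value" attribute equals `value`.

    `select` only needs an `options` list of element-like objects,
    which is what selenium's Select wrapper provides.
    Returns True if an option was clicked, False otherwise.
    """
    for option in select.options:
        if option.get_attribute("value") == value:
            option.click()
            return True
    return False


def choose_practitioner_type(driver, dropdown_name, value="4"):
    """Wrap the named drop-down in Select and click the matching option.

    Assumption: "4" is the value for Dentist, as in the question's code.
    """
    # Imported lazily so the helper above works without Selenium installed.
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import Select

    select = Select(driver.find_element(By.NAME, dropdown_name))
    return click_option_with_value(select, value)
```

With a live driver this would be called with the drop-down's `name` attribute from the question. `Select` also offers `select_by_value("4")`, which may replace the loop entirely.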

Regarding javascript - python scrapy : scraping dynamic information, a similar question was found on Stack Overflow: https://stackoverflow.com/questions/24080204/
