
python - Handling pagination on a website that uses input buttons

Reposted. Author: 太空宇宙. Updated: 2023-11-04 00:11:37

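I am trying to scrape this website with selenium.

My code works, but it currently scrapes only the first page. The site uses input buttons to move between pages, so I tried clicking each button in turn, but that does not work. Does anyone know another way to handle this kind of pagination?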

import requests
from selenium import webdriver
from selenium.webdriver.support.ui import Select
from selenium.webdriver.chrome.options import Options

options = Options()
# options.add_argument('--headless')
options.add_argument("start-maximized")
options.add_argument('disable-infobars')
driver = webdriver.Chrome(chrome_options=options,
                          executable_path=r'/Users/liban/Downloads/chromedriver')

url = 'http://www.boston.gov.uk/index.aspx?articleid=6207&ShowAdvancedSearch=true'
driver.get(url)


def get_Data():
    data = []
    divs = driver.find_element_by_xpath('//*[@id="content"]/form').find_elements_by_tag_name('div')
    for div in divs:
        app_number = driver.find_element_by_xpath('//div[ contains( concat( " ", normalize-space( @class ), " "), " grid_13 ") ]/form/div[1]/h4/a').text
        address = driver.find_element_by_xpath('//div[ contains( concat( " ", normalize-space( @class ), " "), " grid_13 ") ]/form/div[1]/p[5]').text
        status = driver.find_element_by_xpath('//div[ contains( concat( " ", normalize-space( @class ), " "), " grid_13 ") ]/form/div[1]/p[1]/strong').text
        link = driver.find_element_by_xpath('//div[ contains( concat( " ", normalize-space( @class ), " "), " grid_13 ") ]/form/div[1]/h4/a').get_attribute("href")
        proposals = driver.find_element_by_xpath('//div[ contains( concat( " ", normalize-space( @class ), " "), " grid_13 ") ]/form/div[1]/p[3]').text

        data.append({"caseRef": app_number, "propDesc": proposals, "address": address, "caseUrl": link, "status": status})
    print(data)
    return data

def navigation():
    data = []
    total_inputs = driver.find_element_by_xpath('//div[ contains( concat( " ", normalize-space( @class ), " "), " grid_13 ") ]/form/table/tbody/tr/td/input')
    for input in total_inputs:
        input.click()
        data.extend(get_Data())

def main():
    all_data = []
    select = Select(driver.find_element_by_xpath('//*[@id="DatePresets"]'))
    select.select_by_index(7)
    search_by = driver.find_element_by_xpath('//*[@id="radio-ReceivedDate"]')
    search_by.click()
    show = Select(driver.find_element_by_xpath('//*[@id="ResultSize"]'))
    show.select_by_index(4)
    search_button = driver.find_element_by_xpath('//*[@id="content"]/form/input[3]')
    search_button.click()

    all_data.extend(navigation())

if __name__ == "__main__":
    main()

How the site handles pagination:

  <td align="center">
<input type="submit" class="pageNumberButton selected" name="searchResults_Page" value="1" disabled="disabled"/>
<input type="submit" class="pageNumberButton " name="searchResults_Page" value="2" />
<input type="submit" class="pageNumberButton " name="searchResults_Page" value="3" />
<input type="submit" class="pageNumberButton " name="searchResults_Page" value="4" />
<input type="submit" class="pageNumberButton " name="searchResults_Page" value="5" />
<input type="submit" class="pageNumberButton " name="searchResults_Page" value="6" />
<input type="submit" class="pageNumberButton " name="searchResults_Page" value="7" />
<input type="submit" class="pageNumberButton " name="searchResults_Page" value="8" />
<input type="submit" class="pageNumberButton " name="searchResults_Page" value="9" />
<input type="submit" class="pageNumberButton " name="searchResults_Page" value="10" />
</td>
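
To make the structure of that pager concrete, here is a small sketch that reads the same kind of markup with Python's standard `html.parser` and separates the current page from the pages still to visit. `PagerParser` and `PAGER_HTML` are hypothetical names, the sample is shortened to three buttons, and the sketch assumes (as the markup above suggests) that the disabled button marks the page being displayed.

```python
from html.parser import HTMLParser

# Shortened copy of the pager markup quoted above (assumption: same structure).
PAGER_HTML = """
<td align="center">
<input type="submit" class="pageNumberButton selected" name="searchResults_Page" value="1" disabled="disabled"/>
<input type="submit" class="pageNumberButton " name="searchResults_Page" value="2" />
<input type="submit" class="pageNumberButton " name="searchResults_Page" value="3" />
</td>
"""


class PagerParser(HTMLParser):
    """Collects the page buttons named searchResults_Page."""

    def __init__(self):
        super().__init__()
        self.current = None      # value of the disabled (current) button
        self.other_pages = []    # values of the clickable buttons

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag != "input" or attrs.get("name") != "searchResults_Page":
            return
        if "disabled" in attrs:
            self.current = attrs.get("value")
        else:
            self.other_pages.append(attrs.get("value"))


parser = PagerParser()
parser.feed(PAGER_HTML)
print(parser.current, parser.other_pages)
```

Every button shares the same `name`, so the `value` attribute is the only thing that tells the pages apart.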

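Manual steps:

  1. Select date preset = 'Last month'
  2. Search by = 'Both dates'
  3. Click Search
  4. After scraping each page, go to the next one, and so on until there are no more pages; then return to the original URL.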

Best answer

Try find_elements_by_xpath instead of find_element_by_xpath; it returns a list.

PS: I have not run your code locally, but the fix above should address the error you described.
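
Building on that suggestion, here is a minimal sketch of a page loop; it has not been tested against the site itself. `scrape_all_pages`, `PAGE_BUTTONS_XPATH` and the `scrape_page` callback are hypothetical names, and the XPath is derived from the `name="searchResults_Page"` attribute in the markup quoted in the question. It assumes that the button for the displayed page is rendered disabled and that clicking a button reloads the results, so the buttons are looked up again after every click instead of being reused.

```python
# Assumption: every page button carries name="searchResults_Page".
PAGE_BUTTONS_XPATH = '//input[@name="searchResults_Page"]'


def scrape_all_pages(driver, scrape_page):
    """Scrape the displayed page, then click through the remaining page buttons.

    driver      -- a Selenium 3 style WebDriver (passed in, not created here)
    scrape_page -- callable returning the list of rows on the current page
    """
    data = []
    visited = set()
    while True:
        data.extend(scrape_page())

        # Look the buttons up again on every pass: elements found before a
        # click belong to the previous page load and would be stale.
        buttons = driver.find_elements_by_xpath(PAGE_BUTTONS_XPATH)

        # The disabled button is taken to be the page currently shown.
        visited.update(b.get_attribute("value") for b in buttons
                       if not b.is_enabled())

        pending = [b for b in buttons
                   if b.get_attribute("value") not in visited]
        if not pending:
            return data

        next_button = pending[0]
        visited.add(next_button.get_attribute("value"))
        next_button.click()
```

With the question's code, `main()` could call `scrape_all_pages(driver, get_Data)` after clicking the search button. `find_elements_by_xpath` matches the Selenium 3 style used in the question; Selenium 4.3 and later removed it in favour of `driver.find_elements(By.XPATH, ...)`.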

Regarding "python - Handling pagination on a website that uses input buttons", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/52364188/
