
python - Automatically visiting multiple pages to scrape data based on hrefs

Reposted · Author: 太空宇宙 · Updated: 2023-11-03 20:48:00

I am having trouble automating multiple pages with Selenium WebDriver and Python. My code clicks through pages automatically up to page 10, but beyond page 10 it stops working: pages from page 11 onward are never clicked.

import urllib.request
from bs4 import BeautifulSoup
import csv
import os
from selenium import webdriver
from selenium.webdriver.support.select import Select
from selenium.webdriver.common.keys import Keys
import time
import pandas as pd


url = 'http://www.igrmaharashtra.gov.in/eASR/eASRCommon.aspx?hDistName=Buldhana'
chrome_path = r'C:/Users/User/AppData/Local/Programs/Python/Python36/Scripts/chromedriver.exe'
d = webdriver.Chrome(executable_path=chrome_path)
d.implicitly_wait(10)
d.get(url)

Select(d.find_element_by_name('ctl00$ContentPlaceHolder5$ddlTaluka')).select_by_value('7')
Select(d.find_element_by_name('ctl00$ContentPlaceHolder5$ddlVillage')).select_by_value('1464')

while True:
    pages = [page.get_attribute('href') for page in
             d.find_elements_by_css_selector(
                 "#ctl00_ContentPlaceHolder5_grdUrbanSubZoneWiseRate [href*='Page$']")]

    for script_page in pages:
        d.execute_script(script_page)
        #print(script_page)
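The imports above pull in BeautifulSoup, csv and pandas, but the snippet never actually extracts any data from the pages it visits. As an illustrative sketch (the table id is taken from the CSS selectors above; everything else about the markup, including the column layout, is an assumption), each page's rows could be parsed out of `d.page_source` like this:

```python
from bs4 import BeautifulSoup

def parse_grid_rows(html):
    """Return the text of each <td> cell, row by row, from the rates grid.
    The table id comes from the selectors used above; the rest of the
    markup structure is assumed for illustration."""
    soup = BeautifulSoup(html, 'html.parser')
    table = soup.find(
        'table', id='ctl00_ContentPlaceHolder5_grdUrbanSubZoneWiseRate')
    if table is None:
        return []
    rows = []
    for tr in table.find_all('tr'):
        cells = [td.get_text(strip=True) for td in tr.find_all('td')]
        if cells:  # skip header-only or empty rows
            rows.append(cells)
    return rows

# Inside the paging loop one would call, for example:
#     rows = parse_grid_rows(d.page_source)
# and append the rows to a csv file or a pandas DataFrame.
```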

Best Answer

Try using a page index and checking whether that page's link is available; you have to click each page and continue. Try the code below.

from selenium import webdriver
from selenium.webdriver.support.select import Select

url = 'http://www.igrmaharashtra.gov.in/eASR/eASRCommon.aspx?hDistName=Buldhana'
chrome_path = r'C:/Users/User/AppData/Local/Programs/Python/Python36/Scripts/chromedriver.exe'
d = webdriver.Chrome(executable_path=chrome_path)
d.implicitly_wait(10)
d.get(url)
Select(d.find_element_by_name('ctl00$ContentPlaceHolder5$ddlTaluka')).select_by_value('7')
Select(d.find_element_by_name('ctl00$ContentPlaceHolder5$ddlVillage')).select_by_value('1464')
i = 2
while True:
    links = d.find_elements_by_css_selector(
        "#ctl00_ContentPlaceHolder5_grdUrbanSubZoneWiseRate a[href*='Page${}']".format(i))
    if len(links) > 0:
        print(links[0].get_attribute('href'))
        links[0].click()
        i += 1
    else:
        break
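Each pager link's href is a `javascript:` URL following ASP.NET's `__doPostBack` pattern, as the output below shows. A small helper (illustrative, not part of the answer) can pull the page number back out of such an href, e.g. to verify which pages were visited:

```python
import re

def page_number(href):
    """Extract N from a GridView pager href of the form
    javascript:__doPostBack('<target>','Page$N').
    Return None if the href does not match that pattern."""
    m = re.search(r"__doPostBack\('[^']*','Page\$(\d+)'\)", href)
    return int(m.group(1)) if m else None
```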

Output (since I started from page 2):

javascript:__doPostBack('ctl00$ContentPlaceHolder5$grdUrbanSubZoneWiseRate','Page$2')
javascript:__doPostBack('ctl00$ContentPlaceHolder5$grdUrbanSubZoneWiseRate','Page$3')
javascript:__doPostBack('ctl00$ContentPlaceHolder5$grdUrbanSubZoneWiseRate','Page$4')
javascript:__doPostBack('ctl00$ContentPlaceHolder5$grdUrbanSubZoneWiseRate','Page$5')
javascript:__doPostBack('ctl00$ContentPlaceHolder5$grdUrbanSubZoneWiseRate','Page$6')
javascript:__doPostBack('ctl00$ContentPlaceHolder5$grdUrbanSubZoneWiseRate','Page$7')
javascript:__doPostBack('ctl00$ContentPlaceHolder5$grdUrbanSubZoneWiseRate','Page$8')
javascript:__doPostBack('ctl00$ContentPlaceHolder5$grdUrbanSubZoneWiseRate','Page$9')
javascript:__doPostBack('ctl00$ContentPlaceHolder5$grdUrbanSubZoneWiseRate','Page$10')
javascript:__doPostBack('ctl00$ContentPlaceHolder5$grdUrbanSubZoneWiseRate','Page$11')
javascript:__doPostBack('ctl00$ContentPlaceHolder5$grdUrbanSubZoneWiseRate','Page$12')
javascript:__doPostBack('ctl00$ContentPlaceHolder5$grdUrbanSubZoneWiseRate','Page$13')
javascript:__doPostBack('ctl00$ContentPlaceHolder5$grdUrbanSubZoneWiseRate','Page$14')
javascript:__doPostBack('ctl00$ContentPlaceHolder5$grdUrbanSubZoneWiseRate','Page$15')
javascript:__doPostBack('ctl00$ContentPlaceHolder5$grdUrbanSubZoneWiseRate','Page$16')
javascript:__doPostBack('ctl00$ContentPlaceHolder5$grdUrbanSubZoneWiseRate','Page$17')
javascript:__doPostBack('ctl00$ContentPlaceHolder5$grdUrbanSubZoneWiseRate','Page$18')
javascript:__doPostBack('ctl00$ContentPlaceHolder5$grdUrbanSubZoneWiseRate','Page$19')
javascript:__doPostBack('ctl00$ContentPlaceHolder5$grdUrbanSubZoneWiseRate','Page$20')
javascript:__doPostBack('ctl00$ContentPlaceHolder5$grdUrbanSubZoneWiseRate','Page$21')
javascript:__doPostBack('ctl00$ContentPlaceHolder5$grdUrbanSubZoneWiseRate','Page$22')
javascript:__doPostBack('ctl00$ContentPlaceHolder5$grdUrbanSubZoneWiseRate','Page$23')
javascript:__doPostBack('ctl00$ContentPlaceHolder5$grdUrbanSubZoneWiseRate','Page$24')
javascript:__doPostBack('ctl00$ContentPlaceHolder5$grdUrbanSubZoneWiseRate','Page$25')
javascript:__doPostBack('ctl00$ContentPlaceHolder5$grdUrbanSubZoneWiseRate','Page$26')
javascript:__doPostBack('ctl00$ContentPlaceHolder5$grdUrbanSubZoneWiseRate','Page$27')
javascript:__doPostBack('ctl00$ContentPlaceHolder5$grdUrbanSubZoneWiseRate','Page$28')
javascript:__doPostBack('ctl00$ContentPlaceHolder5$grdUrbanSubZoneWiseRate','Page$29')
javascript:__doPostBack('ctl00$ContentPlaceHolder5$grdUrbanSubZoneWiseRate','Page$30')
javascript:__doPostBack('ctl00$ContentPlaceHolder5$grdUrbanSubZoneWiseRate','Page$31')
javascript:__doPostBack('ctl00$ContentPlaceHolder5$grdUrbanSubZoneWiseRate','Page$32')
javascript:__doPostBack('ctl00$ContentPlaceHolder5$grdUrbanSubZoneWiseRate','Page$33')
javascript:__doPostBack('ctl00$ContentPlaceHolder5$grdUrbanSubZoneWiseRate','Page$34')
javascript:__doPostBack('ctl00$ContentPlaceHolder5$grdUrbanSubZoneWiseRate','Page$35')
javascript:__doPostBack('ctl00$ContentPlaceHolder5$grdUrbanSubZoneWiseRate','Page$36')
javascript:__doPostBack('ctl00$ContentPlaceHolder5$grdUrbanSubZoneWiseRate','Page$37')
javascript:__doPostBack('ctl00$ContentPlaceHolder5$grdUrbanSubZoneWiseRate','Page$38')
javascript:__doPostBack('ctl00$ContentPlaceHolder5$grdUrbanSubZoneWiseRate','Page$39')
javascript:__doPostBack('ctl00$ContentPlaceHolder5$grdUrbanSubZoneWiseRate','Page$40')
javascript:__doPostBack('ctl00$ContentPlaceHolder5$grdUrbanSubZoneWiseRate','Page$41')
javascript:__doPostBack('ctl00$ContentPlaceHolder5$grdUrbanSubZoneWiseRate','Page$42')

Process finished with exit code 0

Regarding python - automatically visiting multiple pages to scrape data based on hrefs, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/56442665/
