
python - Selenium stops working after a while

Reposted — Author: 太空宇宙 · Updated: 2023-11-03 21:44:02

I want to extract data from 9,000 pages. After extracting roughly 1,700 pages it stops working; when I want it to continue, it starts over from the beginning and runs for only about 1,000 pages. In this code I also have to select the region manually. How can I scrape the data for all the pages? Does a chromedriver session have a time limit?

import requests
from bs4 import BeautifulSoup
from selenium import webdriver
import time
import csv

url = "https://www.mcg.gov.in/default1.aspx?HTT=B"

driver = webdriver.Chrome(executable_path = 'D:/Python_module/chromedriver_win32/chromedriver.exe')
driver.get(url)
time.sleep(4)

driver.find_element_by_xpath('//*[@id="CphContentPlaceHolderbody_mcg"]/section/div[1]/div/a[1]/div').click()
time.sleep(2)

driver.find_element_by_xpath('//*[@id="CphContentPlaceHolderbody_lnkViewSurveyDataBtn"]').click()
time.sleep(4)

driver.find_element_by_xpath('//*[@id="CphContentPlaceHolderbody_PropertySearchControl1_btnSearch"]').click()
time.sleep(4)

page_no = 1  # page counter (was never initialized, which raises a NameError below)

#-----------------This is for extracting the data of page-1-----------------------------------

driver.find_element_by_xpath('//*[@id="form"]/div[4]/div[11]/table/tbody/tr/td[12]/a').click()
time.sleep(1)
print("If you are in second page then the code is fine.")
soup = BeautifulSoup(driver.page_source, 'html.parser')
current_url = driver.current_url
table = soup.find('table', {'class':'table table-hover table-bordered'})

#divs = soup.find('div', {'id':'CphContentPlaceHolderbody_PropertySearchControl1_upTop'})
print(table)

for row in table.findAll('tr')[1:]:
    raw_data = row.findAll('td')[0:]
    property_id = raw_data[0].text
    ward_no = raw_data[1].text
    owner = raw_data[2].text
    print(owner)

page_no = page_no + 1

try:
    while True:
        driver.find_element_by_xpath('//*[@id="form"]/div[4]/div[11]/table/tbody/tr/td[14]/a').click()
        time.sleep(1)
        print("If you are in second page then the code is fine.")

        soup = BeautifulSoup(driver.page_source, 'html.parser')
        current_url = driver.current_url
        table = soup.find('table', {'class':'table table-hover table-bordered'})
        #divs = soup.find('div', {'id':'CphContentPlaceHolderbody_PropertySearchControl1_upTop'})
        #print(table)

        for row in table.findAll('tr')[1:]:
            raw_data = row.findAll('td')[0:]
            property_id = raw_data[0].text
            ward_no = raw_data[1].text
            owner = raw_data[2].text
            print(owner)
        page_no = page_no + 1
except:
    while True:
        driver.find_element_by_xpath('//*[@id="form"]/div[4]/div[11]/table/tbody/tr/td[19]/a').click()
        time.sleep(1)
        print("If you are in second page then the code is fine.")

        soup = BeautifulSoup(driver.page_source, 'html.parser')
        current_url = driver.current_url
        table = soup.find('table', {'class':'table table-hover table-bordered'})
        #divs = soup.find('div', {'id':'CphContentPlaceHolderbody_PropertySearchControl1_upTop'})
        #print(table)

        for row in table.findAll('tr')[1:]:
            raw_data = row.findAll('td')[0:]
            owner = raw_data[2].text
            print(owner)
        page_no = page_no + 1

print("Successfully scraped the data")
driver.quit()

It gives the following error:

Traceback (most recent call last):
  File "D:\C Folder\program\scrap\scrap_mcg.py", line 64, in <module>
    element = wait.until(EC.element_to_be_clickable((By.XPATH, '//*[@id="form"]/div[4]/div[11]/table/tbody/tr/td[14]/a')))
  File "C:\Users\asn\AppData\Local\Programs\Python\Python36-32\lib\site-packages\selenium\webdriver\support\wait.py", line 80, in until
    raise TimeoutException(message, screen, stacktrace)
selenium.common.exceptions.TimeoutException: Message:


During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "D:\C Folder\program\scrap\scrap_mcg.py", line 90, in <module>
    soup = BeautifulSoup(driver.page_source, 'html.parser')
  File "C:\Users\asn\AppData\Local\Programs\Python\Python36-32\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 670, in page_source
    return self.execute(Command.GET_PAGE_SOURCE)['value']
  File "C:\Users\asn\AppData\Local\Programs\Python\Python36-32\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 312, in execute
    self.error_handler.check_response(response)
  File "C:\Users\asn\AppData\Local\Programs\Python\Python36-32\lib\site-packages\selenium\webdriver\remote\errorhandler.py", line 242, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.NoSuchWindowException: Message: no such window: target window already closed
from unknown error: web view not found
  (Session info: chrome=69.0.3497.100)
  (Driver info: chromedriver=2.37.544315 (730aa6a5fdba159ac9f4c1e8cbc59bf1b5ce12b7),platform=Windows NT 6.1.7601 SP1 x86_64)

Best answer

title=" Next to Page 336">... is not clickable at point (988, 604). Other element would receive the click: ...

As the error says, the element cannot receive the click. Any number of things may be blocking the element you are trying to click:

  • The element is not visible yet.
  • The element has not fully loaded because of a page-load delay.
  • The element is wrapped, and the wrapper is intercepting the click.

A few ways to debug the problem:

  • Try an explicit wait, e.g. wait for the element to be clickable (element_to_be_clickable) before performing the click.
  • Click the button with JavaScript; there are plenty of examples around.
  • Check the page for an ajax loader or other custom loader.
  • Change/improve the locator.
  • Wrap the click in an exception handler (try/except) and retry the click, or take some other recovery action, so the script can resume instead of starting from scratch.
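The wait-and-retry ideas above can be sketched as a small helper. This is a minimal illustration, not code from the question: the function name, retry count, and the commented-out locator are all hypothetical.

```python
import time

def click_with_retry(do_click, attempts=3, delay=2.0):
    """Retry a click action a few times instead of letting one
    failure abort the whole scrape.  `do_click` is any callable
    that raises on failure (e.g. a Selenium click)."""
    last_error = None
    for _ in range(attempts):
        try:
            return do_click()
        except Exception as err:      # WebDriverException in practice
            last_error = err
            time.sleep(delay)         # give the page time to settle
    raise last_error

# With Selenium, do_click would combine an explicit wait with a
# JavaScript click fallback, e.g. (locator not verified against the site):
#
# from selenium.webdriver.common.by import By
# from selenium.webdriver.support.ui import WebDriverWait
# from selenium.webdriver.support import expected_conditions as EC
#
# def do_click():
#     el = WebDriverWait(driver, 10).until(
#         EC.element_to_be_clickable((By.LINK_TEXT, 'Next')))
#     driver.execute_script("arguments[0].click();", el)
#
# click_with_retry(do_click)
```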

Let us know if there is anything else we can help with.
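To avoid starting over from page 1 after a crash (the behaviour described in the question), one common pattern is to checkpoint the last finished page to disk and resume from it in a fresh session. A minimal sketch; the file name and helper names are illustrative:

```python
import os

CHECKPOINT = 'last_page.txt'   # hypothetical checkpoint file

def save_page(page_no, path=CHECKPOINT):
    # Record the last fully scraped page so a restart can resume.
    with open(path, 'w') as f:
        f.write(str(page_no))

def load_page(path=CHECKPOINT):
    # Return the page to resume from, or 1 on a first run.
    if not os.path.exists(path):
        return 1
    with open(path) as f:
        return int(f.read().strip())

# In the scraping loop: call save_page(page_no) after each page, and on
# startup skip ahead to load_page() (via the pager) before extracting,
# instead of re-scraping everything from the beginning.
```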

Regarding "python - Selenium stops working after a while", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/52627163/
