
pandas - How to include try and Exceptions testing in a program of thousands of downloads that uses selenium and requests?

Reposted. Author: 行者123. Updated: 2023-12-01 18:28:38

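I have a program that downloads photos from various websites. Each URL is formed by appending a code to the end of the address, and the codes are stored in a dataframe with 8,583 rows.

The sites use JavaScript, so I use selenium to get the src of each photo, and I download it with urllib.request.urlretrieve.

Example of a photo page: http://divulgacandcontas.tse.jus.br/divulga/#/candidato/2018/2022802018/PB/150000608817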

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
import pandas as pd
from bs4 import BeautifulSoup
import time
import urllib.request, urllib.parse, urllib.error

# Root URL of the site that is accessed to fetch the photo link
url_raiz = 'http://divulgacandcontas.tse.jus.br/divulga/#/candidato/2018/2022802018/'

# Accesses the dataframe that has the "sequencial" type codes
candidatos = pd.read_excel('candidatos_2018.xlsx',sheet_name='Sheet1', converters={'sequencial': lambda x: str(x), 'cpf': lambda x: str(x),'numero_urna': lambda x: str(x)})

# Function that opens each page and takes the link from the photo
def pegalink(url):
    profile = webdriver.FirefoxProfile()
    browser = webdriver.Firefox(profile)

    browser.get(url)
    time.sleep(10)

    html = browser.page_source
    soup = BeautifulSoup(html, "html.parser")
    browser.close()

    link = soup.find("img", {"class": "img-thumbnail img-responsive dvg-cand-foto"})['src']

    return link

# Function that downloads the photo and saves it with the code name "cpf"
def baixa_foto(nome, url):
    urllib.request.urlretrieve(url, nome)


# Iteration in the dataframe
for num, row in candidatos.iterrows():
    cpf = (row['cpf']).strip()
    uf = (row['uf']).strip()
    print(cpf)
    print("-/-")
    sequencial = (row['sequencial']).strip()

    # Creates full page address
    url = url_raiz + uf + '/' + sequencial

    link_foto = pegalink(url)

    baixa_foto(cpf, link_foto)

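I would like guidance on two things:

  • Adding a try/except that waits for the page to load (I get an error when reading the src: after many hits, the site takes more than ten seconds to load)

  • Recording every error, in a file or a dataframe, by writing down the "sequencial" code that failed and then continuing the program

Does anyone know how to do this? The guides below were very useful, but I could not get any further.

I put part of the data and the program I use in a folder, in case you want to look: https://drive.google.com/drive/folders/1lAnODBgC5ZUDINzGWMcvXKTzU7tVZXsj?usp=sharing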

Best answer

Put your code inside:

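try:
    # Wait up to 30 seconds for the page to report that it has finished loading
    WebDriverWait(browser, 30).until(page_has_loaded)
    # here goes your code
except Exception:
    print("This is an unexpected condition!")

For waitForPageToLoad:

def page_has_loaded(browser):
    # WebDriverWait.until() passes the driver to this function on every poll
    page_state = browser.execute_script(
        'return document.readyState;'
    )
    return page_state == 'complete'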

The 30 above is the timeout in seconds. Adjust it as needed.

Method 2:
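
Note that `document.readyState` reaching `complete` only says the initial document has loaded; on a JavaScript-rendered page the photo element may be added later. A minimal sketch of an alternative, assuming the photo keeps the CSS classes used in the question: poll the page with `execute_script` until the `<img>` has a non-empty `src`. The name `espera_src_foto`, the selector constant, and the default timings are hypothetical and not part of the original answer.

```python
import time

# Assumption: the photo keeps the CSS classes shown in the question
FOTO_SELECTOR = "img.img-thumbnail.img-responsive.dvg-cand-foto"

# JavaScript that returns the src of the first matching <img>,
# or null when the element is missing
JS_SRC = (
    "var img = document.querySelector(arguments[0]);"
    "return img ? img.getAttribute('src') : null;"
)


def espera_src_foto(browser, timeout=30, intervalo=0.5):
    """Poll until the photo <img> has a non-empty src; raise TimeoutError otherwise."""
    limite = time.monotonic() + timeout
    while True:
        src = browser.execute_script(JS_SRC, FOTO_SELECTOR)
        if src:
            return src
        if time.monotonic() >= limite:
            raise TimeoutError("photo not found after %s seconds" % timeout)
        time.sleep(intervalo)
```

Inside `pegalink`, a call such as `link = espera_src_foto(browser)` placed after `browser.get(url)` could stand in for the fixed `time.sleep(10)`; the `TimeoutError` can then be caught by the surrounding try/except.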

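class wait_for_page_load(object):

    def __init__(self, browser):
        self.browser = browser

    def __enter__(self):
        # Remember the <html> element of the page that is currently loaded
        self.old_page = self.browser.find_element(By.TAG_NAME, 'html')

    def page_has_loaded(self):
        # A different <html> element id means the browser has moved to a new page
        new_page = self.browser.find_element(By.TAG_NAME, 'html')
        return new_page.id != self.old_page.id

    def __exit__(self, *_):
        # Poll for up to 30 seconds until the new page has replaced the old one
        WebDriverWait(self.browser, 30).until(lambda _: self.page_has_loaded())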


def pegalink(url):
    profile = webdriver.FirefoxProfile()
    browser = webdriver.Firefox(profile)

    try:
        # Navigate inside the context manager so that it waits for the new page
        with wait_for_page_load(browser):
            browser.get(url)
        html = browser.page_source
        soup = BeautifulSoup(html, "html.parser")
        link = soup.find("img", {"class": "img-thumbnail img-responsive dvg-cand-foto"})['src']

    except Exception:
        print("This is an unexpected condition!")
        print("Erro em: ", url)
        link = "Erro"

    finally:
        # Close the browser even when an error occurred
        browser.close()

    return link
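
To record failures and keep going, as the question asks, one option is to give each row its own try/except and append the failing "sequencial" code to a CSV file. This is only a sketch under assumptions: `processa_candidatos`, the `erros.csv` file name, and the log columns are hypothetical; `pega_link` and `baixa` stand for functions like `pegalink` and `baixa_foto` above, passed in as arguments so the loop can be tried without a browser. Any exception raised by either step is logged.

```python
import csv


def processa_candidatos(linhas, url_raiz, pega_link, baixa, log_path="erros.csv"):
    """Download one photo per row; log failures and continue with the next row.

    `linhas` is an iterable of dict-like rows with 'cpf', 'uf' and 'sequencial'.
    Returns the list of 'sequencial' codes that failed.
    """
    falhas = []
    with open(log_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["sequencial", "url", "erro"])
        for row in linhas:
            sequencial = str(row["sequencial"]).strip()
            url = url_raiz + str(row["uf"]).strip() + "/" + sequencial
            try:
                link_foto = pega_link(url)
                baixa(str(row["cpf"]).strip(), link_foto)
            except Exception as exc:  # deliberately broad: log the error, then move on
                falhas.append(sequencial)
                writer.writerow([sequencial, url, repr(exc)])
                f.flush()  # keep the log usable if the run is interrupted
    return falhas
```

With the question's dataframe, the rows could be supplied as `(row for _, row in candidatos.iterrows())`, with `pegalink` and `baixa_foto` as the two callables. The returned list can be fed back into a second pass that retries only the failed codes.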

Regarding "pandas - How to include try and Exceptions testing in a program of thousands of downloads that uses selenium and requests?", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/51977310/
