
python - Make Selenium get a list of URLs from a .txt file

Reposted. Author: 塔克拉玛干. Updated: 2023-11-03 02:02:56

I have some code that returns the titles for a list of URLs. I'd like to extend it in a few ways.

Here is the code:

from pyvirtualdisplay import Display
from time import sleep
import sys
reload(sys)  # Python 2 only
sys.setdefaultencoding('utf-8')
from selenium import webdriver
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.firefox.options import Options

display = Display(visible=0, size=(800, 600))
display.start()
urls = ["https://google.com", "https://youtube.com"]
driver = webdriver.Firefox(executable_path='/usr/local/lib/geckodriver/geckodriver')
driver.set_page_load_timeout(60)
for url in urls:
    try:
        driver.get(url)
        print(driver.title)
    except TimeoutException as e:
        print("Timeout")
driver.quit()

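With this in place, I'd like to do the following. First, instead of hard-coding the URL list like that, I want to read the URLs from a .txt file. Second, when checking each URL, I want the script to wait for the title to change from "Loading..." to something else, and then print the new title. For that, I tried this: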

while driver.title == 'Loading...':
    pass
print(driver.title)

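The problem is that sometimes the title never changes from "Loading...", so the program hangs there forever. I'd like it so that if the title still hasn't changed after 10 seconds, it prints "Title not loaded" and moves on to the next URL in the list.

There is one last thing I'd like to add, but I'm not sure how. The title is printed with print(driver.title). I want to add a number after the title, e.g. print(driver.title, number). The number is there to show how many URLs have been processed so far, but it should not start at 1. It should start at a higher number, such as 50, so that on the 5th URL the output would be "URL title, 55". How can I do that?

Thanks.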

Best answer

Here is the script, updated according to your requirements.

from pyvirtualdisplay import Display
import time
import sys
reload(sys)  # Python 2 only; drop these two lines on Python 3
sys.setdefaultencoding('utf-8')
from selenium import webdriver
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.firefox.options import Options
from datetime import datetime

# Poll driver.title every `interval` seconds for at most `maxTime` seconds;
# return as soon as the title is no longer 'Loading...'.
def wait_until_browser_loaded(interval, maxTime):
    start_time = datetime.now()
    while (datetime.now() - start_time).total_seconds() < maxTime:
        time.sleep(interval)
        if driver.title != 'Loading...':
            return

display = Display(visible=0, size=(800, 600))
display.start()
# Read the URLs from an external input file, one URL per line.
with open("urls_file_path_goes_here", "r") as urlsFile:
    urls = [line.strip() for line in urlsFile if line.strip()]
    # For comma-separated URLs, use this instead:
    # urls = [u.strip() for u in urlsFile.read().split(",") if u.strip()]

# executable_path is the Selenium 3 form; in Selenium 4 the driver path is passed via a Service object.
driver = webdriver.Firefox(executable_path='/usr/local/lib/geckodriver/geckodriver')
driver.set_page_load_timeout(60)
titleAppendNumber = 50  # number printed after the first title
for url in urls:
    try:
        driver.get(url)

        title = driver.title
        if title == "Loading...":
            wait_until_browser_loaded(5, 10)
            title = driver.title  # re-read the title after waiting
        if title == "Loading...":
            print("Title not loaded" + " - " + str(titleAppendNumber))
        else:
            print(title + " - " + str(titleAppendNumber))
        titleAppendNumber += 1
    except TimeoutException as e:
        print("Timeout")
driver.quit()
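
The three requirements (reading URLs from a file, waiting with a time limit, and numbering the output) can also be separated from the browser so each piece is easy to check on its own. The following is only a sketch under assumptions, not part of the original answer: the helper names load_urls, wait_for_title and format_result are hypothetical, and the title is obtained through a callable such as lambda: driver.title rather than from the driver directly.

```python
import time


def load_urls(path):
    """Return the non-empty lines of a text file, one URL per line."""
    with open(path, "r") as handle:
        return [line.strip() for line in handle if line.strip()]


def wait_for_title(get_title, placeholder="Loading...", timeout=10.0, interval=0.5):
    """Poll get_title() until it differs from placeholder or timeout elapses.

    Returns the new title, or None if the title never changed in time.
    """
    deadline = time.monotonic() + timeout
    while True:
        title = get_title()
        if title != placeholder:
            return title
        if time.monotonic() >= deadline:
            return None
        time.sleep(interval)


def format_result(title, number):
    """Build the output line: the title (or a fallback text) followed by the counter."""
    label = title if title is not None else "Title not loaded"
    return "{} - {}".format(label, number)


# Hypothetical use inside the Selenium loop, with driver created as in the script above:
# for number, url in enumerate(load_urls("urls_file_path_goes_here"), start=50):
#     driver.get(url)
#     print(format_result(wait_for_title(lambda: driver.title), number))
```

With enumerate(..., start=50) the fifth URL is printed with 54; use start=51 if the fifth URL should show 55, as in the question. Selenium's own WebDriverWait provides a similar poll-until-timeout pattern and may be preferable when the driver is always available.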

Regarding python - making Selenium get a list of URLs from a .txt file, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/55426577/
