
python-3.x - Selenium: element.text is very slow, and I don't know why

Reposted. Author: 行者123. Updated: 2023-12-02 03:20:14

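from time import time

from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()  # browser choice assumed; the answer uses Chrome
driver.get('https://nameberry.com/popular_names/US')
boys_names = driver.find_elements(By.CSS_SELECTOR, "tr.even>.boys")
girls_names = driver.find_elements(By.CSS_SELECTOR, "tr.even>.girls")
# locating the elements goes quickly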

def list_gen(ls):
    hugo = []
    for i in ls:
        hugo.append(i.text)
    return hugo


i = time()
boys_names = list_gen(boys_names)  # turns each element found above into its text:
# everything CONTAINED between the opening and closing tags (NOT the attributes)
e = time()
print(e-i) # gives ~ 50 sec

i = time()
girls_names = list_gen(girls_names) # same thing but with girl names
e = time()
print(e-i) # gives ~ 80 sec
# those timings are consistent even though the number of boys and girls is the same,
# which is also weird
# (there are 1000 of each, so that is quite a lot)

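So basically I'm confused about why this takes so long. I've concluded that, for some reason, element.text is what takes most of the time. Is there a way to speed this up without importing other modules?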

Best answer

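I think your code takes so long because the loop in list_gen sends a bunch of requests to the web page as it iterates. If you set a breakpoint in the loop and watch the Network tab of the browser's developer tools while it runs, you will see a large number of requests start as soon as the loop does. I believe this happens because the page loads new elements as Selenium scrolls down. As far as I know, if you want it to be faster you should use something else. My suggestion is Beautiful Soup.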

from selenium import webdriver  
from time import time
from bs4 import BeautifulSoup

driver = webdriver.Chrome()

i = time()
driver.get('https://nameberry.com/popular_names/US')
soup = BeautifulSoup(driver.page_source, 'html5lib')

boys_names = [x.get_text() for x in soup.find_all("td", class_="boys")]
girls_names = [x.get_text() for x in soup.find_all("td", class_="girls")]

e = time()
print(e - i) # gives ~ 14 sec for me

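This fetches the page's entire source in one go and parses it, instead of working through the list of WebDriver element objects returned by the CSS selector.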
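To see the parse-once idea in isolation, without a browser or a network connection, here is a minimal sketch that runs over an inline HTML sample. It swaps Beautiful Soup for the standard library's html.parser so the snippet has no third-party dependency. The sample markup, the class name CellTextCollector, and the helper cell_texts are all hypothetical, and the real page's structure may differ.

```python
from html.parser import HTMLParser

# Hypothetical sample markup; the real page's structure may differ.
SAMPLE_HTML = """
<table>
  <tr class="even"><td class="boys"><a href="/b/liam">Liam</a></td>
                   <td class="girls"><a href="/g/olivia">Olivia</a></td></tr>
  <tr class="even"><td class="boys"><a href="/b/noah">Noah</a></td>
                   <td class="girls"><a href="/g/emma">Emma</a></td></tr>
</table>
"""


class CellTextCollector(HTMLParser):
    """Collect the text of every <td> whose class list contains target_class."""

    def __init__(self, target_class):
        super().__init__()
        self.target_class = target_class
        self.texts = []
        self._parts = None  # text chunks gathered while inside a matching <td>

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            classes = (dict(attrs).get("class") or "").split()
            if self.target_class in classes:
                self._parts = []

    def handle_data(self, data):
        if self._parts is not None:
            self._parts.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._parts is not None:
            self.texts.append("".join(self._parts).strip())
            self._parts = None


def cell_texts(html, target_class):
    """Parse the whole document once and return the matching cell texts."""
    parser = CellTextCollector(target_class)
    parser.feed(html)
    parser.close()
    return parser.texts


boys_names = cell_texts(SAMPLE_HTML, "boys")
girls_names = cell_texts(SAMPLE_HTML, "girls")
print(boys_names, girls_names)
```

With a live browser, the same helper could presumably be fed driver.page_source instead of the sample string. All of the work then happens in the Python process, with no per-element calls to the browser.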

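If you are not using the Selenium browser for anything else and only want the names, you can use requests to get the page source even faster, since you do not need to start a Selenium browser.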

from time import time

import requests
from bs4 import BeautifulSoup

i = time()

response = requests.get('https://nameberry.com/popular_names/US')
soup = BeautifulSoup(response.content, 'html5lib')
boys_names = [x.get_text() for x in soup.find_all("td", class_="boys")]
girls_names = [x.get_text() for x in soup.find_all("td", class_="girls")]

e = time()
print(e - i) # gives ~ 3.2 sec
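
The question asks for a speed-up without importing other modules. One option that stays inside Selenium is to collect all the text in a single execute_script call, because each element.text access is sent to the browser as a separate WebDriver command. The sketch below is not part of the original answer; the helper name texts_in_one_call and the constant COLLECT_TEXT_JS are hypothetical. It also reads textContent, which can differ from element.text: textContent includes text that is hidden on the page, whereas element.text returns only the rendered text.

```python
# JavaScript run inside the page: gather the text of all matching elements
# and return it as one list, so only a single WebDriver command is issued.
COLLECT_TEXT_JS = """
return Array.from(document.querySelectorAll(arguments[0]))
            .map(function (el) { return el.textContent.trim(); });
"""


def texts_in_one_call(driver, css_selector):
    """Return the text of every element matching css_selector.

    `driver` is expected to be a Selenium WebDriver; any object with a
    compatible execute_script method works.
    """
    return driver.execute_script(COLLECT_TEXT_JS, css_selector)


# Usage with the question's driver and selectors (requires a live browser):
# boys_names = texts_in_one_call(driver, "tr.even>.boys")
# girls_names = texts_in_one_call(driver, "tr.even>.girls")
```

Whether this removes most of the delay depends on the actual cause. If the page really is loading extra content while the loop runs, as the answer suspects, the timings may still vary.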

Regarding "python-3.x - Selenium: element.text is very slow, and I don't know why", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/55054467/
