python - Scraping dynamic content with Selenium

Reposted. Author: 太空宇宙. Last updated: 2023-11-03 21:29:47

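I am trying to learn how to scrape content from the web. In an earlier attempt I found what I thought was dynamic content, but it turned out to be hidden under tags that did appear in the page source. Thanks to the community here, I was able to get that data easily with Beautiful Soup and pandas.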

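For my next challenge, I am trying to get data from a site that really is generated dynamically, and the data does not seem to be in the page source at all. With the code below I can pull the container that holds the dynamic content, but it is empty. When I inspect the page with the developer tools, I can see that the div with class="event 2-2-1 row" contains data, but every time I try to access those tags, they cannot be found.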
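One thing to keep in mind when targeting a div by a class attribute like this one: an XPath test of the form @class='...' (as the answer below uses) compares the whole attribute string, so it breaks if the site adds a class, reorders the tokens, or changes a generated-looking token such as 2-2-1. A common XPath 1.0 idiom matches individual class tokens instead. The sketch below only builds that expression as a string; class_token_xpath and has_classes are hypothetical helper names, and which tokens are stable on this particular site is an assumption you would need to confirm in the developer tools. It does not address the timing problem, which is separate.

```python
def class_token_xpath(tag, *classes):
    """Return an XPath 1.0 expression selecting <tag> elements whose class
    attribute contains every name in `classes` as a whole, space-separated token.
    Hypothetical helper; assumes the class names contain no quote characters."""
    tests = " and ".join(
        f"contains(concat(' ', normalize-space(@class), ' '), ' {name} ')"
        for name in classes
    )
    return f"//{tag}[{tests}]"


def has_classes(class_attr, *classes):
    """Pure-Python mirror of the same token test, for checking it without a browser."""
    # normalize-space() collapses whitespace runs; padding with spaces makes
    # " event " match only the whole token, never "eventful".
    padded = " " + " ".join(class_attr.split()) + " "
    return all(f" {name} " in padded for name in classes)


print(class_token_xpath("div", "event", "row"))
print(has_classes("event 2-2-1 row", "event", "row"))
```

With Selenium 4 the expression would be passed as driver.find_elements(By.XPATH, class_token_xpath("div", "event", "row")), which returns every matching div rather than only the one whose class string happens to match exactly.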

Can someone point me in the right direction? I have searched this forum but have not found an answer yet.

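from selenium import webdriver
from selenium.webdriver.common.by import By
import re
from bs4 import BeautifulSoup


start_url = "https://www.tissottiming.com/Live/Index?id=0003100005010105FFFFFFFFFFFFFFF2&style=Tissot"  # input("Enter the results URL: ")
driver = webdriver.Chrome()
driver.implicitly_wait(10)  # makes element lookups retry for up to 10 s; it does not wait for scripts to finish
driver.get(start_url)
# find_element_by_xpath() was removed in Selenium 4.3; find_element(By.XPATH, ...) is the current form
content = driver.find_element(By.XPATH, '//*[@id="container-fluid"]')
print(content)  # prints the WebElement object itself, not its HTML or text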

This is what I get from the print statement:

<selenium.webdriver.remote.webelement.WebElement (session="99ca6419fd181c0bdd39797e20c739df", element="0.7688034456332402-1")>
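A likely reason the container comes back empty: driver.implicitly_wait(10) only makes find_element keep retrying until the element it is looking for exists. The container div is presumably present as soon as the page loads, so the lookup returns immediately, before the page's JavaScript has filled in the rows. The print also shows only the WebElement's repr; content.get_attribute('innerHTML') or content.text would show what is inside. An explicit wait instead polls a condition until it holds. The sketch below is a simplified, pure-Python illustration of that polling idea, in the spirit of Selenium's WebDriverWait(driver, timeout).until(condition); wait_until and WaitTimeout are hypothetical names, not Selenium APIs. The real WebDriverWait polls every 0.5 s by default, ignores NoSuchElementException between polls, and raises selenium's TimeoutException.

```python
import time


class WaitTimeout(Exception):
    """Hypothetical stand-in for selenium's TimeoutException."""


def wait_until(condition, timeout=15.0, poll_frequency=0.5, ignored_exceptions=()):
    """Call condition() until it returns a truthy value, then return that value.

    Simplified sketch of explicit-wait polling; not Selenium's implementation.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            value = condition()
        except ignored_exceptions:
            value = None  # treat "not rendered yet" errors as "keep polling"
        if value:
            return value  # e.g. the element that finally appeared
        if time.monotonic() >= deadline:
            raise WaitTimeout(f"condition not met within {timeout} s")
        time.sleep(poll_frequency)


if __name__ == "__main__":
    # Fake lookup that only "finds" the element on the third poll, standing in
    # for a DOM that JavaScript fills in some time after the page has loaded.
    attempts = []

    def find_heat():
        attempts.append(1)
        if len(attempts) < 3:
            raise LookupError("not rendered yet")
        return "<heat div>"

    print(wait_until(find_heat, timeout=2, poll_frequency=0.01,
                     ignored_exceptions=(LookupError,)), len(attempts))
```

The key design point is that the condition is re-evaluated on every poll, so the wait ends as soon as the dynamic content exists rather than after a fixed sleep.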

Best answer

I managed to parse the dynamic content with the following code:

from selenium.webdriver.common.by import By
from selenium import webdriver
from selenium.webdriver.support.wait import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC  # available since 2.26.0

start_url = "https://www.tissottiming.com/Live/Index?id=0003100005010105FFFFFFFFFFFFFFF2&style=Tissot"  # input("Enter the results URL: ")
driver = webdriver.Chrome()
driver.get(start_url)
heat_xpath = "//div[@class='heat 2_2_1_1_1 row']"
# Explicit wait: poll for up to 15 s until the page's JavaScript has added the heat div to the DOM
WebDriverWait(driver, 15).until(EC.presence_of_element_located((By.XPATH, heat_xpath)))

x = driver.find_element(By.XPATH, heat_xpath)  # Selenium 4 form of find_element_by_xpath()
print(x.get_attribute('innerHTML'))  # the HTML rendered inside the heat div
Output:
<div class="name row"><span>HEAT 1</span></div><div class="heatsheaders row rowtitle"><div class="col-xs-05 rank">Rank</div><div class="col-xs-05 bib">Bib</div><div class="col-xs-3 longname">Name</div><div class="col-xs-1 nation">Nat</div><div class="col-xs-5 run_title"><div class="RunName col-xs-4">1ST RACE</div><div class="RunName col-xs-4">2ND RACE</div><div class="RunName col-xs-4">DECIDER</div></div><div class="col-xs-1 qualified"></div><div class="col-xs-1 points">Time</div></div><div class="rider 2_2_1_1_1_1_1 row" data-sortorder="1" data-inter-pos-x="2" data-inter-pos-y="342" data-final-pos-x="2" data-final-pos-y="342" style="transition: all 600ms ease 0ms, opacity 600ms linear; display: block; transform: translate(0px, 0px);" data-bound="true"><div class="rank col-xs-05"><span>1</span></div><div class="bib col-xs-05"><span>52</span></div><div class="longname col-xs-3"><span>GLAETZER Matthew</span><div class="teamname "><span>AUSTRALIA</span></div></div><div class="nation col-xs-1"><span><div class="img_flag">AUS<img src="/Content/images/flags/AUS.png" alt="AUS national flag"></div></span></div><div class="run_group col-xs-5"><div class="run 2_2_1_1_1_1_1_1_1 col-xs-4"><div class="time row"><span>10.218</span></div><div class="points row"><span>70,464</span></div></div><div class="run 2_2_1_1_1_1_1_1_2 col-xs-4"><div class="time row"><span>0.000</span></div><div class="points row"><span>0,000</span></div></div><div class="run 2_2_1_1_1_1_1_1_3 col-xs-4"><div class="time row"><span></span></div><div class="points row"><span></span></div></div></div><div class="qualified col-xs-1"><span>QG</span></div><div class="points col-xs-1"><span></span></div></div><div class="rider 2_2_1_1_1_1_2 row" data-sortorder="2" data-inter-pos-x="2" data-inter-pos-y="422" data-final-pos-x="2" data-final-pos-y="422" style="transition: all 600ms ease 0ms, opacity 600ms linear; display: block; transform: translate(0px, 0px);" data-bound="true"><div class="rank col-xs-05"><span>2</span></div><div class="bib col-xs-05"><span>53</span></div><div class="longname col-xs-3"><span>HART Nathan</span><div class="teamname "><span>AUSTRALIA</span></div></div><div class="nation col-xs-1"><span><div class="img_flag">AUS<img src="/Content/images/flags/AUS.png" alt="AUS national flag"></div></span></div><div class="run_group col-xs-5"><div class="run 2_2_1_1_1_1_2_1_1 col-xs-4"><div class="time row"><span>+0.028</span></div><div class="points row"><span></span></div></div><div class="run 2_2_1_1_1_1_2_1_2 col-xs-4"><div class="time row"><span>+0.000</span></div><div class="points row"><span></span></div></div><div class="run 2_2_1_1_1_1_2_1_3 col-xs-4"><div class="time row"><span></span></div><div class="points row"><span></span></div></div></div><div class="qualified col-xs-1"><span>QB</span></div><div class="points col-xs-1"><span></span></div></div>
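Once the heat's innerHTML is available, the rider rows can be turned into records. The sketch below uses only the standard library's html.parser (BeautifulSoup, which the question already imports, would work just as well) and runs on a trimmed copy of the output above: the header row is shortened, and the inline style, data-* attributes and numeric tokens on the run divs are dropped. The class-to-field mapping (rank, bib, longname, teamname, img_flag, time, qualified) is read off that output and is an assumption about how the site marks up every heat; HeatParser and parse_heat are hypothetical names.

```python
from html.parser import HTMLParser

# Trimmed copy of the innerHTML printed above.
SAMPLE = (
    '<div class="name row"><span>HEAT 1</span></div>'
    '<div class="heatsheaders row rowtitle"><div class="col-xs-05 rank">Rank</div>'
    '<div class="col-xs-05 bib">Bib</div><div class="col-xs-3 longname">Name</div></div>'
    '<div class="rider 2_2_1_1_1_1_1 row"><div class="rank col-xs-05"><span>1</span></div>'
    '<div class="bib col-xs-05"><span>52</span></div>'
    '<div class="longname col-xs-3"><span>GLAETZER Matthew</span>'
    '<div class="teamname "><span>AUSTRALIA</span></div></div>'
    '<div class="nation col-xs-1"><span><div class="img_flag">AUS'
    '<img src="/Content/images/flags/AUS.png" alt="AUS national flag"></div></span></div>'
    '<div class="run_group col-xs-5">'
    '<div class="run col-xs-4"><div class="time row"><span>10.218</span></div>'
    '<div class="points row"><span>70,464</span></div></div>'
    '<div class="run col-xs-4"><div class="time row"><span>0.000</span></div>'
    '<div class="points row"><span>0,000</span></div></div>'
    '<div class="run col-xs-4"><div class="time row"><span></span></div>'
    '<div class="points row"><span></span></div></div></div>'
    '<div class="qualified col-xs-1"><span>QG</span></div>'
    '<div class="points col-xs-1"><span></span></div></div>'
    '<div class="rider 2_2_1_1_1_1_2 row"><div class="rank col-xs-05"><span>2</span></div>'
    '<div class="bib col-xs-05"><span>53</span></div>'
    '<div class="longname col-xs-3"><span>HART Nathan</span>'
    '<div class="teamname "><span>AUSTRALIA</span></div></div>'
    '<div class="nation col-xs-1"><span><div class="img_flag">AUS'
    '<img src="/Content/images/flags/AUS.png" alt="AUS national flag"></div></span></div>'
    '<div class="run_group col-xs-5">'
    '<div class="run col-xs-4"><div class="time row"><span>+0.028</span></div>'
    '<div class="points row"><span></span></div></div>'
    '<div class="run col-xs-4"><div class="time row"><span>+0.000</span></div>'
    '<div class="points row"><span></span></div></div>'
    '<div class="run col-xs-4"><div class="time row"><span></span></div>'
    '<div class="points row"><span></span></div></div></div>'
    '<div class="qualified col-xs-1"><span>QB</span></div>'
    '<div class="points col-xs-1"><span></span></div></div>'
)

# Class token on the enclosing <div> -> key in the rider dict (assumed mapping).
FIELDS = {"rank": "rank", "bib": "bib", "longname": "name",
          "teamname": "team", "img_flag": "nation", "qualified": "qualified"}


class HeatParser(HTMLParser):
    """Collects one dict per <div class="rider ..."> found in a heat's innerHTML."""

    def __init__(self):
        super().__init__()
        self.stack = []           # class-token sets of the currently open <div>s
        self.rider_depth = None   # stack depth at which the current rider div opened
        self.current = None       # dict being filled for the current rider
        self.riders = []

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return  # only <div> nesting matters; <span> and void <img> are ignored
        classes = set((dict(attrs).get("class") or "").split())
        self.stack.append(classes)
        if "rider" in classes:
            self.current = {"times": []}
            self.rider_depth = len(self.stack)
        elif self.current is not None and "time" in classes:
            self.current["times"].append("")  # placeholder keeps empty runs aligned

    def handle_endtag(self, tag):
        if tag != "div" or not self.stack:
            return
        self.stack.pop()
        if self.current is not None and len(self.stack) < self.rider_depth:
            self.riders.append(self.current)  # the rider's own </div> just closed
            self.current = None

    def handle_data(self, data):
        text = data.strip()
        if self.current is None or not text:
            return  # skip the heat title, header row and whitespace
        for classes in reversed(self.stack):  # innermost <div> first
            if "time" in classes:
                self.current["times"][-1] = text
                return
            for cls, key in FIELDS.items():
                if cls in classes:
                    self.current[key] = text
                    return


def parse_heat(inner_html):
    parser = HeatParser()
    parser.feed(inner_html)
    parser.close()
    return parser.riders


for rider in parse_heat(SAMPLE):
    print(rider["rank"], rider["bib"], rider["name"], rider["nation"],
          rider["times"], rider["qualified"])
# 1 52 GLAETZER Matthew AUS ['10.218', '0.000', ''] QG
# 2 53 HART Nathan AUS ['+0.028', '+0.000', ''] QB
```

Against the live page, parse_heat(x.get_attribute('innerHTML')) would apply the same logic to the element from the answer, and the resulting list of dicts can be passed directly to pandas.DataFrame.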

A similar question to "python - Scraping dynamic content with Selenium" can be found on Stack Overflow: https://stackoverflow.com/questions/53599121/
