python - Selenium-based scraping code fails with NoSuchElementException

Reposted. Author: 行者123. Updated: 2023-12-01 02:10:24

I have Python code that scrapes various data. For example, it scrapes the Website link from this HTML code:

<a data-ix="show-popup-on-click" target="_blank" rel="nofollow" href="https://mylink.org/" class="button full w-button" style="transition: all 0.4s ease 0s;">Website</a>

It used to work fine, but now it fails with this error:

NoSuchElementException: Message: {"errorMessage":"Unable to find element with link text 'Website'","request":{"headers":{"Accept":"application/json","Accept-Encoding":"identity","Connection":"close","Content-Length":"95","Content-Type":"application/json;charset=UTF-8","Host":"127.0.0.1:40581","User-Agent":"Python http auth"},"httpVersion":"1.1","method":"POST","post":"{\"using\": \"link text\", \"sessionId\": \"a7a441f0-0f6a-11e8-ad3a-6121f74a30f4\", \"value\": \"Website\"}","url":"/element","urlParsed":{"anchor":"","query":"","file":"element","directory":"/","path":"/element","relative":"/element","port":"","host":"","password":"","user":"","userInfo":"","authority":"","protocol":"","source":"/element","queryKey":{},"chunks":["element"]},"urlOriginal":"/session/a7a441f0-0f6a-11e8-ad3a-6121f74a30f4/element"}} Screenshot: available via screen

This is my code:

import requests
from bs4 import BeautifulSoup
from selenium import webdriver

driver = webdriver.PhantomJS()
driver.set_window_size(1120, 550)
driver.get(link)
driver.implicitly_wait(10)

website = driver.find_element_by_link_text("Website").get_attribute("href")

What am I doing wrong?

Update:

<div class="column-space w-col w-col-4">
<a data-ix="show-popup-on-click" target="_blank"
rel="nofollow" href="https://example.com/"
class="button full w-button"
style="transition: all 0.4s ease 0s;">Website</a>

<div class="space big"></div>
<a target="_blank" rel="nofollow"
href="https://example.com/storage/b/2/0/2/WhitepaperLive.pdf"
class="button-2 w-button">Whitepaper</a>
<div class="space big"></div>
<a class="button-2 w-condition-invisible w-button">Program</a>
<div class="space big w-condition-invisible"></div>
<div>
<div class="div-block-4 w-clearfix">
<div class="div-block-2">Token:</div>
<div class="div-block-5 w-clearfix">
<div class="text-block-12">UTC</div>
</div>
</div>
<div class="div-block-4 w-clearfix">
<div class="div-block-2">Price:</div>
<div class="div-block-5 w-clearfix">
<div class="text-block-12">1 LUC=0,05 USD</div>
</div>
</div>
<div class="div-block-4 w-clearfix">
<div class="div-block-2">Buy with:</div>
<div class="div-block-5 w-clearfix">
<div class="text-block-12">USD, EUR</div>
</div>
</div>
<div class="div-block-4 w-clearfix">
<div class="div-block-2">Platform:</div>
<div class="div-block-5 w-clearfix">
<div class="text-block-12">MyPlatform</div>
</div>
</div>
<div class="div-block-4 w-clearfix w-condition-invisible">
<div class="div-block-2">KYC:</div>
<div class="div-block-5 w-clearfix">
<div class="text-block-12">No</div>
</div>
</div>
<div class="div-block-4 w-clearfix">
<div class="div-block-2">KYC:</div>
<div class="div-block-5 w-clearfix">
<div class="text-block-12">Yes</div>
</div>
</div>
<div class="div-block-4 w-clearfix">
<div class="div-block-2">Location:</div>
<div class="div-block-5 w-clearfix">
<div class="text-block-12">Malta</div>
</div>
</div>
<div class="div-block-4 w-clearfix">
<div class="div-block-2">Can't join:</div>
<div class="div-block-5 w-clearfix">
<div class="text-block-12">USA</div>
</div>
</div>
<div class="space big"></div>
<div class="div-block-4 w-clearfix">
<div class="div-block-2">Start:</div>
<div class="div-block-5 w-clearfix">
<div class="text-block-12">January 25, 2018</div>
</div>
</div>
<div class="div-block-4 w-clearfix">
<div class="div-block-2">End:</div>
<div class="div-block-5 w-clearfix">
<div class="text-block-12">February 5, 2018</div>
</div>
</div>
<div class="space big"></div>
<div class="div-block-4 w-clearfix">
<div class="div-block-2">Start2:</div>
<div class="div-block-5 w-clearfix">
<div class="text-block-12">February 12, 2018</div>
</div>
</div>
<div class="div-block-4 w-clearfix">
<div class="div-block-2">End2:</div>
<div class="div-block-5 w-clearfix">
<div class="text-block-12">March 5, 2018</div>
</div>
</div>
<div>
<div class="div-block-33">
<div class="space big"></div>
<div>
<a target="_blank" rel="nofollow"
class="button green full w-condition-invisible w-button">JOIN WHITELIST NOW »</a>
<div class="div-block-34">
<a target="_blank" rel="nofollow" href="http://we-do-not-have-slack.com"
class="link-block-2 w-inline-block">
<img src="https://global-uploads.webflow.com/903_slack-symbol.png" alt="ICO Slack link">
</a>
<a target="_blank" rel="nofollow" href="https://twitter.com/live" class="link-block-2 w-inline-block">
<img src="https://global-uploads.webflow.com/f4000142b091_twitter%20(1).png" width="16" alt="ICO Twitter link">
</a>
<a target="_blank" rel="nofollow" href="https://t.me/live" class="link-block-2 w-inline-block">
<img src="https://global-uploads.webflow.com/790001798dfe_telegram.png" alt="ICO Telegram link">
</a>
<a target="_blank" rel="nofollow" href="http://we-do-not-have-GitHub.com" class="link-block-2 w-inline-block">
<img src="https://global-uploads.webflow.com/59cf77c1fb0edc0001b4b26a_github-logo.png" alt="ICO GitHun link">
</a>
<a target="_blank" rel="nofollow" href="https://www.facebook.com/Play2Live-504880049864038/" class="link-block-2 w-inline-block">
<img src="https://global-uploads.webflow.com/59cf77c1fb0edc0001b4b117/59d510290116ac0001964c8e_facebook.png" alt="Facebook link">
</a>
<a target="_blank" rel="nofollow" href="https://talk.org/index.php?topic=2381679.0" class="link-block-2 w-inline-block">
<img src="https://global-uploads.webflow.com/0011f8c3c_talk.jpg" alt="Talk link">
</a>
</div>
</div>
</div>
</div>
</div>
</div>

Best answer

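There is no problem in your code. When I inspect the Website link on the page, I see the text "Website", but when I use that same text to find the element by link text, as below, I get a NoSuchElementException: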

website = driver.find_element_by_link_text("Website").get_attribute("href")
print(website)

I tried adding waits and using partial_link_text, but no luck.

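Then I tried getting all elements with the tag name "a" and printing their text with the code below.

# Collect every <a> element on the page and print its rendered text
elements = driver.find_elements_by_tag_name("a")
for element in elements:
    print(element.text)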

That is how I found out the text is not "Website" but "WEBSITE". I am not sure why it behaves like that, though.
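
One likely explanation, offered here as an assumption rather than something confirmed for this page, is a CSS `text-transform: uppercase` rule on the button: Selenium's link-text locators compare against the rendered text, not the text written in the HTML source. The sketch below gathers the three values that would confirm or rule this out for a given link. `explain_link_text` is a hypothetical helper name; `text`, `get_attribute` and `value_of_css_property` are standard Selenium `WebElement` members.

```python
def explain_link_text(anchor):
    """Collect the values that show why rendered link text may differ from the source.

    `anchor` is expected to be a Selenium WebElement for an <a> tag.
    """
    return {
        # Text as rendered on the page; this is what link-text locators compare against
        "rendered": anchor.text,
        # Text as it appears in the DOM, before any CSS transformation
        "source": anchor.get_attribute("textContent"),
        # Computed CSS value; "uppercase" would explain "Website" showing as "WEBSITE"
        "text_transform": anchor.value_of_css_property("text-transform"),
    }
```

Calling it on each element returned by `driver.find_elements_by_tag_name("a")` and printing the result should make any mismatch between the source and rendered text visible.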

After changing all the characters of Website to uppercase, I was able to identify the element and get the href from it:

driver.get("https://topicolist.com/ico/adhive")
website = driver.find_element_by_link_text("WEBSITE").get_attribute("href")
print(website)
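
Hard-coding "WEBSITE" will break again if the site changes its styling. As a minimal sketch of a lookup that does not depend on letter case, the hypothetical helper `find_link_href` below scans every `<a>` element and compares the rendered text case-insensitively. It assumes `driver` is a Selenium WebDriver; the string `"tag name"` is the value of Selenium's `By.TAG_NAME` constant, passed directly so the snippet needs no imports.

```python
def find_link_href(driver, text):
    """Return the href of the first <a> whose rendered text equals `text`,
    ignoring case and surrounding whitespace. Return None if nothing matches.
    """
    wanted = text.strip().casefold()
    # "tag name" is the string value of selenium's By.TAG_NAME
    for anchor in driver.find_elements("tag name", "a"):
        if anchor.text.strip().casefold() == wanted:
            return anchor.get_attribute("href")
    return None
```

With the driver from the question, `find_link_href(driver, "Website")` would then match whether the page renders the text as "Website" or "WEBSITE". Returning `None` instead of raising is a design choice for this sketch; callers must check for it.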

Hope this resolves your issue.

Regarding python - Selenium-based scraping code fails with NoSuchElementException, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/48736196/
