Python - Get the text between HTML tags

Reposted. Author: 太空宇宙. Updated: 2023-11-03 14:23:29

Below you can see my code. It iterates over a list of items and produces a table as output.

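import requests
from bs4 import BeautifulSoup
from prettytable import PrettyTable
from pyvirtualdisplay import Display
from selenium import webdriver
from selenium.common.exceptions import NoSuchElementException
from selenium.webdriver.support.wait import WebDriverWait

# Output table, one row per ticket
x = PrettyTable(["Soli", "Zusammenfassung", "Bearbeiter", "Status", "Termin"])

# Virtual display so Chrome can run without a screen
display = Display()
display.start()
driver = webdriver.Chrome()
driver.maximize_window()
driver.get('https://www.example.com')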

for j in range(0,len(total_tickets)):
    url = driver.current_url
    r = requests.get(url)
    html_content = r.text
    soup = BeautifulSoup(html_content, 'lxml')

    ticket = driver.find_elements_by_xpath("//*[@id='ghx-issues-in-epic-table']/tbody/tr/td[2]/a")
    ticket[j].click()

    driver.get_screenshot_as_file("test.png")
    worker = driver.find_element_by_xpath("//*[@id='peopledetails']/li/dl[1]/dd").find_element_by_class_name("user-hover").get_attribute("rel")
    Soli = driver.find_element_by_xpath("//*[@id='key-val']").get_attribute("data-issue-key")
    driver.find_element_by_xpath("//*[@id='summary-val']/span").click()
    conclusion = driver.find_element_by_xpath("//*[@id='summary']").get_attribute("value")
    status = soup.find('span',{'class':'classname'}).get_text
    try:
        termin = driver.find_element_by_xpath("//*[@id='datesmodule']").find_element_by_xpath("//*[@id='customfield_10090-val']/span[1]/time").get_attribute("datetime")
    except NoSuchElementException:
        termin = "No Deadline"

    x.add_row([Soli, conclusion, worker, status, termin])

x.padding_width = 1
with open('file', 'w') as w:
    w.write(str(x))

First problem: I get this error:

Traceback (most recent call last):
  File "save.py", line 104, in <module>
    status = soup.find('span',{'class':'classname'}).get_text
AttributeError: 'NoneType' object has no attribute 'get_text'

If I remove the "get_text" attribute, the Status column always shows "None" as output.

This is the HTML the text should be taken from. I want it to show the text "Neu" that sits between the span tags.

<li class="item item-right">
  <div class="wrap">
    <strong class="name">
      Status:
    </strong>
    <span id="status-val" class="value">
      <span class="classname" original-title="">
        Neu
      </span>
    </span>
    <span class="status-view">(<a href="#" class="classname">Arbeitsablauf anzeigen</a>)
    </span>
  </div>
</li>

Best answer

You can use the select method in BeautifulSoup, which takes a CSS selector:

soup.select("div#id")[0].text

[0] picks the first matching element.

"#id" matches the id of the div.

".class" matches the class of the div.
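
Applied to the HTML from the question, the selector approach might look like the sketch below. The helper name extract_status and the fallback string "No Status" are made up for illustration and do not come from the original post. The sketch parses with the built-in "html.parser" rather than lxml so it has one dependency less. It also covers two pitfalls visible in the question: get_text is a method and has to be called with parentheses, and a lookup that matches nothing (find returns None, select returns an empty list) has to be handled before reading the text. One possible reason for the missing match, not confirmed by the source, is that the page fetched with requests differs from the one the Selenium browser is showing.

```python
from bs4 import BeautifulSoup

# HTML fragment copied from the question.
HTML = """
<li class="item item-right">
  <div class="wrap">
    <strong class="name">Status:</strong>
    <span id="status-val" class="value">
      <span class="classname" original-title="">
        Neu
      </span>
    </span>
    <span class="status-view">(<a href="#" class="classname">Arbeitsablauf anzeigen</a>)
    </span>
  </div>
</li>
"""


def extract_status(html, default="No Status"):
    """Hypothetical helper: text of the span inside #status-val, or default."""
    soup = BeautifulSoup(html, "html.parser")
    # "span#status-val" matches by tag and id; "span.classname" by tag and class.
    matches = soup.select("span#status-val span.classname")
    if not matches:
        # select() returns an empty list when nothing matches,
        # so indexing with [0] would raise IndexError here.
        return default
    # get_text is a method and needs parentheses;
    # strip=True trims the whitespace around the text.
    return matches[0].get_text(strip=True)


print(extract_status(HTML))
print(extract_status("<p>no status here</p>"))
```

In the question's loop, the same helper could be fed the HTML of the page the browser is currently on (Selenium exposes it as driver.page_source) instead of the response from requests.get, assuming the status span is only present in the logged-in browser session.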

Regarding "Python - Get the text between HTML tags", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/47787289/
