
Python BeautifulSoup: get the text under a tag

Reposted. Author: 行者123. Updated: 2023-12-05 09:33:09

I'm trying to get all the links, titles and dates for a specific month (for example, March) from a website. I'm using BeautifulSoup to do this:

from bs4 import BeautifulSoup
import requests

html_link = 'https://www.pds.com.ph/index.html%3Fpage_id=3261.html'
html = requests.get(html_link).text
soup = BeautifulSoup(html, 'html.parser')
for link in soup.find_all('td'):
    # Text contains 'March'
    # Get the link title & link & date
    pass  # placeholder: this is the part the question is asking about

I'm new to BeautifulSoup. In Selenium I used the XPath "//td[contains(text(),'Mar')]". How can I do the same with BeautifulSoup?
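The XPath predicate contains(text(), 'Mar') has several BeautifulSoup counterparts. The sketch below runs against a small inline HTML snippet that is a hypothetical stand-in for the real page (its markup is an assumption), and shows three equivalents: a CSS selector using :-soup-contains() (which needs soupsieve 2.1 or newer), a find_all() filter function, and a string= regular expression. Note that string= only looks at a tag's .string, so it only matches tags whose content is a single string.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical sample markup standing in for the real page
html = """
<table>
  <tr><td><a href="index.html%3Fp=1.html">Bond A</a></td><td>March 31, 2021</td></tr>
  <tr><td><a href="index.html%3Fp=2.html">Bond B</a></td><td>February 2, 2021</td></tr>
  <tr><td><a href="index.html%3Fp=3.html">Bond C</a></td><td>March 1, 2021</td></tr>
</table>
"""
soup = BeautifulSoup(html, "html.parser")

# 1. CSS selector: :-soup-contains() is a substring match on the tag's text
by_css = soup.select("td:-soup-contains('Mar')")

# 2. Filter function: find_all() calls it for every tag and keeps the True ones
by_func = soup.find_all(lambda tag: tag.name == "td" and "Mar" in tag.get_text())

# 3. string= with a compiled regex: BeautifulSoup uses re.search on td.string
by_regex = soup.find_all("td", string=re.compile("Mar"))

for cells in (by_css, by_func, by_regex):
    print([td.get_text() for td in cells])
```

All three print the same two date cells for this sample. The CSS form is the closest in spirit to the XPath; the filter function is the most flexible when the condition gets more complex.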

Best answer

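To get all links and titles whose "date" contains the text "March":

  1. Find the "dates": find all <td> elements whose text contains "March".

  2. Use the .find_previous() method to find the preceding <a> tag, which holds the desired title and link.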
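The two steps above can be tried on a small sample. find_previous() walks backwards through the document from the starting tag and returns the first matching tag, so for a date cell it returns the nearest <a> that appears earlier in the page. This sketch assumes (the answer does not confirm the layout) that each title link sits in the same table row, before its date cell; the HTML is hypothetical.

```python
from bs4 import BeautifulSoup

# Hypothetical row layout: title link first, date cell second
html = """
<table>
  <tr><td><a href="index.html%3Fp=1.html">Bond A</a></td><td>March 31, 2021</td></tr>
  <tr><td><a href="index.html%3Fp=2.html">Bond B</a></td><td>March 16, 2021</td></tr>
</table>
"""
soup = BeautifulSoup(html, "html.parser")

pairs = []
# Step 1: date cells whose text contains "March"
for date_td in soup.select("td:-soup-contains('March')"):
    # Step 2: walk backwards in document order to the closest preceding <a>
    a_tag = date_td.find_previous("a")
    pairs.append((date_td.get_text(), a_tag.get_text(), a_tag["href"]))

for pair in pairs:
    print(pair)
```

One caveat of this approach: if a row has no link, find_previous() keeps walking and returns the previous row's link. A stricter variant would look only inside the row, e.g. date_td.find_parent("tr").find("a").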


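import requests
from urllib.parse import urljoin
from bs4 import BeautifulSoup


url = "https://www.pds.com.ph/index.html%3Fpage_id=3261.html"
soup = BeautifulSoup(requests.get(url).content, "html.parser")

fmt_string = "{:<20} {:<130} {}"
print(fmt_string.format("Date", "Title", "Link"))
print("-" * 200)

# :-soup-contains() is the current soupsieve name for the deprecated :contains()
for tag in soup.select("td:-soup-contains('March')"):
    # The title link appears before the date cell in document order
    a_tag = tag.find_previous("a")
    if a_tag is None:
        continue
    print(
        fmt_string.format(
            # urljoin resolves the relative href against the page URL
            tag.text.strip(), a_tag.text.strip(), urljoin(url, a_tag["href"]),
        )
    )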

Output (truncated):

Date                 Title                                                                                                                              Link
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
March 31, 2021 RCBC Lists PHP 17.87257 Billion ASEAN Sustainability Bonds on PDEx https://www.pds.com.ph/index.html%3Fp=87239.html
March 16, 2021 Aboitiz Power Corporation Raises 8 Billion Fixed Rate Bonds on PDEx https://www.pds.com.ph/index.html%3Fp=86743.html
March 1, 2021 Century Properties Group, Inc Returns to PDEx with PHP 3 Billion Fixed Rate Bonds https://www.pds.com.ph/index.html%3Fp=86366.html
March 27, 2020 BPI Lists Over PhP 33 Billion of Fixed Rate Bonds on PDEx https://www.pds.com.ph/index.html%3Fp=74188.html
March 25, 2020 SM Prime Raises PHP 15 Billion Fixed Rate Bonds on PDEx https://www.pds.com.ph/index.html%3Fp=74082.html
...
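Matching on the substring "March" also picks up any cell that merely mentions March, such as a title like "March Madness Bonds". A stricter sketch parses each cell as a date and compares the month number. It assumes the dates use the "March 31, 2021" format seen in the output above and runs against hypothetical sample rows; collect_by_month is a hypothetical helper name, not part of BeautifulSoup.

```python
from datetime import datetime
from urllib.parse import urljoin
from bs4 import BeautifulSoup

BASE_URL = "https://www.pds.com.ph/index.html%3Fpage_id=3261.html"

# Hypothetical sample rows; the real page layout is assumed to be similar
html = """
<table>
  <tr><td><a href="index.html%3Fp=1.html">March Madness Bonds</a></td><td>February 2, 2021</td></tr>
  <tr><td><a href="index.html%3Fp=2.html">Bond B</a></td><td>March 16, 2021</td></tr>
  <tr><td><a href="index.html%3Fp=3.html">Bond C</a></td><td>March 27, 2020</td></tr>
</table>
"""


def collect_by_month(soup, base_url, month):
    """Hypothetical helper: return (date, title, link) for cells dated in `month`."""
    rows = []
    for td in soup.find_all("td"):
        text = td.get_text(strip=True)
        try:
            # Only cells that parse as a full "Month D, YYYY" date count as dates
            date = datetime.strptime(text, "%B %d, %Y")
        except ValueError:
            continue
        if date.month != month:
            continue
        a_tag = td.find_previous("a")
        if a_tag is not None:
            link = urljoin(base_url, a_tag["href"])
            rows.append((date.date(), a_tag.get_text(strip=True), link))
    return rows


soup = BeautifulSoup(html, "html.parser")
for row in collect_by_month(soup, BASE_URL, 3):
    print(row)
```

Here the "March Madness Bonds" title cell is skipped because it does not parse as a date, and the February row is skipped by the month check. Note that %B expects English month names under the default C locale.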

For "Python BeautifulSoup: get the text under a tag", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/67394695/
