Python and BeautifulSoup: Searching only in a certain class

Reposted by: 太空狗 · Updated: 2023-10-30 01:31:56

I wrote a script to scrape the independence dates of several countries from Wikipedia.

For example, Kazakhstan:

import re

import requests
from bs4 import BeautifulSoup

URL_QS = 'https://en.wikipedia.org/wiki/Kazakhstan'
r = requests.get(URL_QS)
soup = BeautifulSoup(r.text, 'lxml')

# Only keep the infobox (top right)
infobox = soup.find("table", class_="infobox geography vcard")

if infobox:
    formation = infobox.find_next(text = re.compile("Formation"))

    if formation: 
        independence = formation.find_next(text = re.compile("independence")) 

        if independence:
            independ_date = independence.find_next("td").text
        else:
            independence = formation.find_next(text = re.compile("Independence"))

            if independence:
                independ_date = independence.find_next("td").text


print(independ_date)

I get the following output:

Almaty

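This value does not come from the infobox but from the text that follows it. That is because "formation.find_next(text = re.compile("independence"))" found something outside the infobox, but I don't understand why the search is not limited to the infobox. How can I search only within this element?

Thanks in advance for your help!

Best answer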

It's because "formation.find_next(text = re.compile("independence"))" found something outside of the infobox

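Add .extract() to your soup.find() call so that the search stays inside the infobox geography vcard element. find_next walks every element that follows in document order, not only the descendants of infobox; .extract() detaches the table from the rest of the parse tree, so there is nothing after the infobox left for find_next to reach.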

infobox = soup.find("table", class_="infobox geography vcard").extract()
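
To see the difference without fetching Wikipedia, here is a minimal sketch run against a small, made-up HTML fragment. The HTML content, the helper name independence_cell, and the detach flag are assumptions made for illustration; they are not part of the original script. It also uses the built-in "html.parser" instead of 'lxml' and the string= keyword, which newer Beautiful Soup releases prefer over text=.

```python
import re

from bs4 import BeautifulSoup

# Hypothetical, simplified page: the infobox has no "independence" row,
# but the body text after it mentions independence.
HTML = """
<html><body>
<table class="infobox geography vcard">
  <tr><th>Formation</th><td></td></tr>
  <tr><th>Capital</th><td>Astana</td></tr>
</table>
<p>After independence the capital was</p>
<table><tr><td>Almaty</td></tr></table>
</body></html>
"""


def independence_cell(html, detach):
    """Return the text of the <td> following an 'independence' label, or None.

    With detach=True the infobox is removed from the tree via extract(),
    so find_next cannot continue into the rest of the page.
    """
    soup = BeautifulSoup(html, "html.parser")
    infobox = soup.find("table", class_="infobox geography vcard")
    if infobox is None:
        return None
    if detach:
        infobox = infobox.extract()

    formation = infobox.find_next(string=re.compile("Formation"))
    if formation is None:
        return None

    # One pattern covers both "independence" and "Independence".
    label = formation.find_next(string=re.compile("[Ii]ndependence"))
    if label is None:
        return None

    cell = label.find_next("td")
    return cell.get_text(strip=True) if cell else None


leaked = independence_cell(HTML, detach=False)  # search runs past the infobox
scoped = independence_cell(HTML, detach=True)   # search stops at the infobox end
print(leaked)
print(scoped)
```

Note that extract() also removes the table from soup, so any later search on soup will no longer see the infobox.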

Regarding "Python and BeautifulSoup: Searching only in a certain class", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/47441585/
