
python-2.7 - Python 2.7 BeautifulSoup: find the table that contains a specific string

Reposted. Author: 行者123. Updated: 2023-12-04 16:21:19

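After searching a BeautifulSoup document for a string, how do I get the table that contains that string? I have a solution that works for one table whose structure I already know.

My code is as follows: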

import mechanize
from bs4 import BeautifulSoup

sitemap_url = "https://www.rbi.org.in/scripts/sitemap.aspx"

br = mechanize.Browser()
br.addheaders = [
    ('User-agent',
     'Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.0.1) Gecko/2008071615 Fedora/3.0.1-1.fc9 Firefox/3.0.1'),
    ('accept', 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8'),
]

response = br.open(sitemap_url)
text = response.read()
br.close()

soup = BeautifulSoup(text, 'lxml')

# Find the table containing the financial intermediaries.

# First I find "Financial Intermediaries" in soup.
fin_str = soup.find(text="Financial Intermediaries")

# Next I step out through the parents
# until it turns out that I have found the table.
fin_tbl = fin_str.parent.parent.parent.parent

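The problem with this approach is that I have to inspect the result each time I step out one level in the document. How can I keep applying .parent until I reach the table?

Best answer

Append the following code to the program: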

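from bs4 import element

# The first tag around the string is the parent.
fn_in = fin_str.parent

# Step out one level through the parents.
def step_out(i):
    # Strings and tags both expose .parent, so a NavigableString
    # needs no special handling here.
    if isinstance(i, element.NavigableString):
        pass
    return i.parent

# Continue until 'table' is in the name of the tag.
# Stop if we run past the document root (no enclosing table found).
while fn_in is not None and 'table' not in fn_in.name:
    fn_in = step_out(fn_in)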
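
As an alternative to the hand-written loop, Beautiful Soup's own `find_parent()` does the same upward walk and returns `None` when no matching ancestor exists. The following is only a minimal sketch under stated assumptions: `SAMPLE_HTML` and the helper name `enclosing_table` are made up for illustration and are not part of the original question, and the built-in `html.parser` is used instead of `lxml` so that the snippet has no extra dependency. `string=` is the newer spelling of the `text=` argument used in the question (available since Beautiful Soup 4.4.0).

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in markup; the real page is fetched with mechanize.
SAMPLE_HTML = """
<html><body>
<table id="outer"><tr><td>
  <table id="inner"><tr><td><a href="#"><b>Financial Intermediaries</b></a></td></tr></table>
</td></tr></table>
<p>No table here</p>
</body></html>
"""

def enclosing_table(soup, needle):
    """Return the nearest <table> around the exact string `needle`, or None."""
    hit = soup.find(string=needle)  # a NavigableString, or None if not found
    if hit is None:
        return None
    # find_parent walks up through .parent until a tag named 'table' is found.
    return hit.find_parent('table')

soup = BeautifulSoup(SAMPLE_HTML, 'html.parser')
table = enclosing_table(soup, "Financial Intermediaries")
print(table['id'])
```

With nested tables, this returns the innermost table around the string. If the outer one is wanted, `find_parents('table')` returns all enclosing tables, ordered from innermost to outermost.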

Regarding "python-2.7 - Python 2.7 BeautifulSoup: find the table that contains a specific string", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/40271663/
