python - Beautifulsoup4 - Identifying information by strong tag value only works for some values of the tag

Reposted. Author: 太空宇宙. Updated: 2023-11-03 14:04:18

I'm working with the following HTML "block":

<div class="marketing-directories-results">
<ul>
<li>
<div class="contact-details">
<h2>
A I I Insurance Brokerage of Massachusetts Inc
</h2>
<br/>
<address>
183 Davis St
<br/>
East Douglas
<br/>
Massachusetts
<br/>
U S A
<br/>
MA 01516-113
</address>
<p>
<a href="http://www.agencyint.com">
www.agencyint.com
</a>
</p>
</div>
<span data-toggle=".info-cov-0">
Additional trading information
<i class="icon plus">
</i>
</span>
<ul class="result-info info-cov-0 cc">
<li>
<strong>
Accepts Business From:
</strong>
<ul class="cc">
<li>
U.S.A
</li>
</ul>
</li>
<li>
<strong>
Classes of business
</strong>
<ul class="cc">
<li>
Engineering
</li>
<li>
NM General Liability (US direct)
</li>
<li>
Property D&amp;F (US binder)
</li>
<li>
Terrorism
</li>
</ul>
</li>
<li>
<strong>
Disclaimer:
</strong>
<p>
Please note that while coverholders may have been approved by Lloyd's to accept business from the regions shown:
</p>
<p>
it is the responsibility of the parties, including the coverholder and any Lloyd's managing agent appointing them to ensure that the coverholder complies with all local regulatory and legal requirements; and
</p>
<p>
the coverholder may not provide cover for all classes they are approved to underwrite in all territories where they have approval.
</p>
</li>
</ul>
</li>
<li>
<div class="contact-details">
<h2>
ABCO Insurance Underwriters Inc
</h2>
<br/>
<address>
ABCO Building, 350 Sevilla Avenue, Suite 201
<br/>
Coral Gables
<br/>
Florida
<br/>
U S A
<br/>
33134
</address>
<p>
<a href="http://www.abcoins.com">
www.abcoins.com
</a>
</p>
</div>
<span data-toggle=".info-cov-1">
Additional trading information
<i class="icon plus">
</i>
</span>
<ul class="result-info info-cov-1 cc">
<li>
<strong>
Accepts Business From:
</strong>
<ul class="cc">
<li>
U.S.A
</li>
</ul>
</li>
<li>
<strong>
Classes of business
</strong>
<ul class="cc">
<li>
Property D&amp;F (US binder)
</li>
<li>
Terrorism
</li>
</ul>
</li>
<li>
<strong>
Disclaimer:
</strong>
<p>
Please note that while coverholders may have been approved by Lloyd's to accept business from the regions shown:
</p>
<p>
it is the responsibility of the parties, including the coverholder and any Lloyd's managing agent appointing them to ensure that the coverholder complies with all local regulatory and legal requirements; and
</p>
<p>
the coverholder may not provide cover for all classes they are approved to underwrite in all territories where they have approval.
</p>
</li>
</ul>
</li>
</ul>
</div>

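I'm pulling several data points out of this HTML. The ones giving me trouble are the "Accepts Business From:" and "Classes of business" values. I can get the "Accepts Business From:" value, regardless of the order in which it appears, with the following: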

try:
    li_area = company.find('ul', class_='result-info info-cov-' +
                           str(company_counter) + ' cc')
    li_stuff = li_area.find_all('li')
    for li in li_stuff:
        if li.strong.text.strip() == 'Accepts Business From:':
            business_final = li.find('li').text.strip()
except AttributeError:
    pass

Note: the "company" variable is the BeautifulSoup object containing the HTML I pasted above.

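Note: the class name changes for every record on the page (info-cov-0, info-cov-1, and so on), which is why the code builds it from company_counter. To keep this concise, I've included only two records in the HTML sample.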
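Since only the info-cov-N part of the class changes, one option (not from the question) is to match on the stable result-info class alone: BeautifulSoup's class_ filter matches an element if any one of its classes equals the given value. The sketch below assumes each record has exactly one such ul, and uses a trimmed copy of the sample HTML; the variable names are illustrative. Inside the question's per-record loop, the equivalent would be company.find('ul', class_='result-info').

```python
from bs4 import BeautifulSoup

# Trimmed copy of the two records from the question (only what the
# selection needs).
HTML = """
<div class="marketing-directories-results"><ul>
<li><div class="contact-details"><h2>A I I Insurance Brokerage of Massachusetts Inc</h2></div>
<ul class="result-info info-cov-0 cc"><li><strong>Accepts Business From:</strong>
<ul class="cc"><li>U.S.A</li></ul></li></ul></li>
<li><div class="contact-details"><h2>ABCO Insurance Underwriters Inc</h2></div>
<ul class="result-info info-cov-1 cc"><li><strong>Accepts Business From:</strong>
<ul class="cc"><li>U.S.A</li></ul></li></ul></li>
</ul></div>
"""

soup = BeautifulSoup(HTML, "html.parser")

# class_="result-info" matches any <ul> whose class list contains
# "result-info", so the changing "info-cov-N" class need not be rebuilt.
info_lists = soup.find_all("ul", class_="result-info")

# The CSS-selector form finds the same elements.
via_css = soup.select("ul.result-info")

classes = [ul["class"] for ul in info_lists]
for c in classes:
    print(c)
```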

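When I try the same block of code, this time replacing 'Accepts Business From:' in li.strong.text.strip() == 'Accepts Business From:' with 'Classes of business', the code doesn't seem to detect that strong tag; it only ever detects "Accepts Business From:". Is my for loop wrong, so that it isn't actually iterating over every <li> tag containing these different strong tags? Or is the real value of this strong tag something other than "Classes of business"? (I did copy the value directly from the site's HTML.)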

Any help you can provide is greatly appreciated.

Best answer

The reason you get the text for 'Accepts Business From:' but not for 'Classes of business' is that your try-except is in the wrong place.

for li in li_stuff: 的第二次迭代中循环,li变成<li>U.S.A</li> ,这将抛出和 AttributeError调用li.strong因为没有<strong>存在标签。并且,根据您当前的try-except ,错误在 for 之外被捕获循环,是 pass编辑。因此,循环不会到达第三次迭代,在第三次迭代中它应该获取“Classes of Business”文本。

To keep looping after the error is caught, move the try-except inside the loop:

for li in li_stuff:
    try:
        if li.strong.text.strip() == 'Accepts Business From:':
            business_final = li.find('li').text.strip()
            print('Accepts Business From:', business_final)
        if li.strong.text.strip() == 'Classes of business':
            business_final = li.find('li').text.strip()
            print('Classes of business:', business_final)
    except AttributeError:
        pass  # or you can use 'continue' too.

Output:

Accepts Business From: U.S.A
Classes of business: Engineering
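As an alternative to catching the exception (a variation, not the answer's code), the loop can skip the value <li> tags explicitly by testing whether li.strong is None. A narrow check like this keeps unexpected structure visible: a wanted section with no nested <li> would raise instead of being silently skipped. The HTML is a trimmed copy of the sample, and the wanted and results names are illustrative.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the first record's "result-info" list from the question.
HTML = """
<ul class="result-info info-cov-0 cc">
  <li><strong>Accepts Business From:</strong>
    <ul class="cc"><li>U.S.A</li></ul></li>
  <li><strong>Classes of business</strong>
    <ul class="cc"><li>Engineering</li><li>Terrorism</li></ul></li>
  <li><strong>Disclaimer:</strong>
    <p>Please note that while coverholders may have been approved by Lloyd's to accept business from the regions shown:</p></li>
</ul>
"""

li_area = BeautifulSoup(HTML, "html.parser").find("ul")
wanted = ("Accepts Business From:", "Classes of business")

results = {}
for li in li_area.find_all("li"):
    # Nested value <li> tags have no <strong>: skip them explicitly instead
    # of letting a broad try/except swallow every AttributeError.
    if li.strong is None:
        continue
    label = li.strong.text.strip()
    if label in wanted:
        # First value only, like the answer's li.find('li') lookup.
        results[label] = li.find("li").text.strip()
        print(f"{label.rstrip(':')}: {results[label]}")
```

Run as-is, it prints the same two lines as the answer's output above.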

However, since "Classes of business" has several values, you can change that part of the code to this to get all of them:

if li.strong.text.strip() == 'Classes of business':
    business_final = ', '.join([x.text.strip() for x in li.find_all('li')])
    print('Classes of business:', business_final)

Output:

Accepts Business From: U.S.A
Classes of business: Engineering, NM General Liability (US direct), Property D&F (US binder), Terrorism
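Another variation, sketched under the assumption that every top-level <li> in the list holds a <strong> label (true for the sample): iterate only over the direct children with find_all('li', recursive=False), so the nested value tags are never visited, and collect every section into a dict mapping each label to its list of values. The sections name is illustrative; sections without a nested list, such as "Disclaimer:", end up with an empty list.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the second record's "result-info" list from the question.
HTML = """
<ul class="result-info info-cov-1 cc">
  <li><strong>Accepts Business From:</strong>
    <ul class="cc"><li>U.S.A</li></ul></li>
  <li><strong>Classes of business</strong>
    <ul class="cc"><li>Property D&amp;F (US binder)</li><li>Terrorism</li></ul></li>
  <li><strong>Disclaimer:</strong>
    <p>Please note that while coverholders may have been approved by Lloyd's to accept business from the regions shown:</p></li>
</ul>
"""

li_area = BeautifulSoup(HTML, "html.parser").find("ul")

sections = {}
# recursive=False returns only the direct <li> children of li_area (the
# section items), so the nested value <li> tags never reach .strong.
for section in li_area.find_all("li", recursive=False):
    label = section.strong.get_text(strip=True)
    sections[label] = [v.get_text(strip=True) for v in section.find_all("li")]

print(sections)
print(", ".join(sections["Classes of business"]))
```

Joining a section's list with ", " gives the same comma-separated string as the answer's version.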

For the original "python - Beautifulsoup4 - Identifying information by strong tag value only works for some values of the tag" question on Stack Overflow, see: https://stackoverflow.com/questions/49018025/