gpt4 book ai didi

python - BeautifulSoup soup.select 切断子标签

转载 作者:太空宇宙 更新时间:2023-11-04 02:03:39 25 4
gpt4 key购买 nike

当运行一个脚本来检索所有带有“FlatParagraph”类的 blockquote 标签时,我似乎切断了 Blockquote 标签中的一些子标签。是否有包含所有子标签的查询?问题似乎与 <blockquote><i><a>text<a/><i/> 有关标签集。所以不是所有 child 的问题。

我正在使用下面的代码

import urllib


from urllib.request import urlopen
from bs4 import BeautifulSoup

fhand = urllib.request.urlopen('https://www.legislation.qld.gov.au/view/whole/html/2018-07-01/sl-2006-0200').read()

soup = BeautifulSoup(fhand, 'html.parser')
fp = soup.select('blockquote[class="FlatParagraph"]')
for i in fp:
print(i.text)
print('---------')

然后我使用 for 循环从每一行中检索文本

changedfplist = list()
for i in fp:
changedfplist.append(i.text.replace(u'\xa0', ' ').encode('utf-8'))

这是我正在解析的示例 -

<blockquote class="FlatParagraph"><blockquote class="Paragraph"><span class="ListNumber">(1)</span>This section applies if—<blockquote class="Paragraph List"><span class="ListNumber">(a)</span>before the commencement—<blockquote class="Paragraph List"><span class="ListNumber">(i)</span>a person applied under <a href="#sec.28">section&nbsp;28</a>(1) of the repealed regulation for approval of a proposed fire engineering design brief for stated building work; and</blockquote>
<blockquote class="Paragraph List"><span class="ListNumber">(ii)</span>an authorised representative of the service attended a former fire engineering brief meeting relating to the approval of the proposed fire engineering design brief; and</blockquote>
<blockquote class="Paragraph List"><span class="ListNumber">(iii)</span>the service had not decided whether or not to approve the proposed fire engineering design brief; and</blockquote>
</blockquote>
<blockquote class="Paragraph List"><span class="ListNumber">(b)</span>the person has not paid the former fire engineering design brief meeting fee for the attendance of the representative of the service at the former fire engineering brief meeting.</blockquote>
</blockquote><blockquote class="Paragraph"><span class="ListNumber">(2)</span>For assessing the fire engineering design brief for the stated building work—<blockquote class="Paragraph List"><span class="ListNumber">(a)</span><a href="#sec.61">section&nbsp;61</a> applies as if the reference to a fire engineering brief were a reference to the proposed fire engineering design brief; and</blockquote>
<blockquote class="Paragraph List"><span class="ListNumber">(b)</span><a href="#sec.62">section&nbsp;62</a>(1)(d) applies as if the reference to each fire engineering brief meeting included a reference to each former fire engineering brief meeting; and</blockquote>
<blockquote class="Paragraph List"><span class="ListNumber">(c)</span><a href="#sch.2">schedule&nbsp;2</a>, <a href="#sch.2-pt.3">part&nbsp;3</a>, item 3 applies as if a reference to a meeting included a reference to a former fire engineering brief meeting.</blockquote>
</blockquote><blockquote class="Paragraph"><span class="ListNumber">(3)</span>In this section—<blockquote class="Paragraph-No-Number"><b><i><a name="sec.90-ssec.3-def.formerfireengineeringbriefmeeting"></a>former fire engineering brief meeting</i></b> means a fire engineering brief meeting under <a href="#sec.28">section&nbsp;28</a>(2)(d) of the repealed regulation.</blockquote><blockquote class="Paragraph-No-Number"><b><i><a name="sec.90-ssec.3-def.formerfireengineeringdesignbriefmeetingfee"></a>former fire engineering design brief meeting fee</i></b> means the fire engineering design brief meeting fee stated in <a href="#sch.3">schedule&nbsp;3</a> of the repealed regulation.</blockquote></blockquote></blockquote>

当我解析这个时,我得到了

(1)This section applies if - (a)before the commencement - (i)a person applied under section 28(1) of the repealed regulation for approval of a proposed fire engineering design brief for stated building work; and

(ii)an authorised representative of the service attended a former fire engineering brief meeting relating to the approval of the proposed fire engineering design brief; and

(iii)the service had not decided whether or not to approve the proposed fire engineering design brief; and

(b)the person has not paid the former fire engineering design brief meeting fee for the attendance of the representative of the service at the former fire engineering brief meeting.

(2)For assessing the fire engineering design brief for the stated building work - (a)section 61 applies as if the reference to a fire engineering brief were a reference to the proposed fire engineering design brief; and

(b)section 62(1)(d) applies as if the reference to each fire engineering brief meeting included a reference to each former fire engineering brief meeting; and

(c)schedule 2, part 3, item 3 applies as if a reference to a meeting included a reference to a former fire engineering brief meeting.

(3)In this section - former fire engineering brief meeting

最后一行的末尾缺少文本。它已被切断

<blockquote class="Paragraph-No-Number"><b><i><a name="sec.90-ssec.3-def.formerfireengineeringbriefmeeting"></a>former fire engineering brief meeting</i></b> 

更新 - 有一个类我试图避免使用 .FlatParagraph 没有用。我试图避免使用 class=FlatParagraph view-history-note。 FlatParagraph view-history-note 是 Fl​​atParagraph 类标签的子标签类。

我用 lxml 和 html.parser 尝试了上面的代码,我得到了 lxml 的所有文本,以及 html.parser 的截断文本。如果有人知道原因,我很想听听!

最佳答案

您可以使用select()find() 查看下面的代码,我正在获取全文!

html = '''
<blockquote class="FlatParagraph"><blockquote class="Paragraph"><span class="ListNumber">(1)</span>This section applies if—<blockquote class="Paragraph List"><span class="ListNumber">(a)</span>before the commencement—<blockquote class="Paragraph List"><span class="ListNumber">(i)</span>a person applied under <a href="#sec.28">section&nbsp;28</a>(1) of the repealed regulation for approval of a proposed fire engineering design brief for stated building work; and</blockquote>
<blockquote class="Paragraph List"><span class="ListNumber">(ii)</span>an authorised representative of the service attended a former fire engineering brief meeting relating to the approval of the proposed fire engineering design brief; and</blockquote>
<blockquote class="Paragraph List"><span class="ListNumber">(iii)</span>the service had not decided whether or not to approve the proposed fire engineering design brief; and</blockquote>
</blockquote>
<blockquote class="Paragraph List"><span class="ListNumber">(b)</span>the person has not paid the former fire engineering design brief meeting fee for the attendance of the representative of the service at the former fire engineering brief meeting.</blockquote>
</blockquote><blockquote class="Paragraph"><span class="ListNumber">(2)</span>For assessing the fire engineering design brief for the stated building work—<blockquote class="Paragraph List"><span class="ListNumber">(a)</span><a href="#sec.61">section&nbsp;61</a> applies as if the reference to a fire engineering brief were a reference to the proposed fire engineering design brief; and</blockquote>
<blockquote class="Paragraph List"><span class="ListNumber">(b)</span><a href="#sec.62">section&nbsp;62</a>(1)(d) applies as if the reference to each fire engineering brief meeting included a reference to each former fire engineering brief meeting; and</blockquote>
<blockquote class="Paragraph List"><span class="ListNumber">(c)</span><a href="#sch.2">schedule&nbsp;2</a>, <a href="#sch.2-pt.3">part&nbsp;3</a>, item 3 applies as if a reference to a meeting included a reference to a former fire engineering brief meeting.</blockquote>
</blockquote><blockquote class="Paragraph"><span class="ListNumber">(3)</span>In this section—<blockquote class="Paragraph-No-Number"><b><i><a name="sec.90-ssec.3-def.formerfireengineeringbriefmeeting"></a>former fire engineering brief meeting</i></b> means a fire engineering brief meeting under <a href="#sec.28">section&nbsp;28</a>(2)(d) of the repealed regulation.</blockquote><blockquote class="Paragraph-No-Number"><b><i><a name="sec.90-ssec.3-def.formerfireengineeringdesignbriefmeetingfee"></a>former fire engineering design brief meeting fee</i></b> means the fire engineering design brief meeting fee stated in <a href="#sch.3">schedule&nbsp;3</a> of the repealed regulation.</blockquote></blockquote></blockquote>
'''
soup = BeautifulSoup(html,'lxml')
fp = soup.select('.FlatParagraph')
for i in fp:
print(i.text)

fp = soup.find('blockquote',attrs={'class':'FlatParagraph'})
print(fp.text)

输出:

(1)This section applies if—(a)before the commencement—(i)a person applied under section 28(1) of the repealed regulation for approval of a proposed fire engineering design brief for stated building work; and
(ii)an authorised representative of the service attended a former fire engineering brief meeting relating to the approval of the proposed fire engineering design brief; and
(iii)the service had not decided whether or not to approve the proposed fire engineering design brief; and

(b)the person has not paid the former fire engineering design brief meeting fee for the attendance of the representative of the service at the former fire engineering brief meeting.
(2)For assessing the fire engineering design brief for the stated building work—(a)section 61 applies as if the reference to a fire engineering brief were a reference to the proposed fire engineering design brief; and
(b)section 62(1)(d) applies as if the reference to each fire engineering brief meeting included a reference to each former fire engineering brief meeting; and
(c)schedule 2, part 3, item 3 applies as if a reference to a meeting included a reference to a former fire engineering brief meeting.
(3)In this section—former fire engineering brief meeting means a fire engineering brief meeting under section 28(2)(d) of the repealed regulation.former fire engineering design brief meeting fee means the fire engineering design brief meeting fee stated in schedule 3 of the repealed regulation.

关于python - BeautifulSoup soup.select 切断子标签,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/55202258/

25 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com