- android - 多次调用 OnPrimaryClipChangedListener
- android - 无法更新 RecyclerView 中的 TextView 字段
- android.database.CursorIndexOutOfBoundsException : Index 0 requested, 光标大小为 0
- android - 使用 AppCompat 时,我们是否需要明确指定其 UI 组件(Spinner、EditText)颜色
当运行一个脚本来检索所有带有“FlatParagraph”类的 blockquote 标签时,我似乎切断了 Blockquote 标签中的一些子标签。是否有包含所有子标签的查询?问题似乎与 <blockquote><i><a>text<a/><i/>
有关标签集。所以不是所有 child 的问题。
我正在使用下面的代码
import urllib
from urllib.request import urlopen
from bs4 import BeautifulSoup
fhand = urllib.request.urlopen('https://www.legislation.qld.gov.au/view/whole/html/2018-07-01/sl-2006-0200').read()
soup = BeautifulSoup(fhand, 'html.parser')
fp = soup.select('blockquote[class="FlatParagraph"]')
for i in fp:
print(i.text)
print('---------')
然后我使用 for 循环从每一行中检索文本
changedfplist = list()
for i in fp:
changedfplist.append(i.text.replace(u'\xa0', ' ').encode('utf-8'))
这是我正在解析的示例 -
<blockquote class="FlatParagraph"><blockquote class="Paragraph"><span class="ListNumber">(1)</span>This section applies if—<blockquote class="Paragraph List"><span class="ListNumber">(a)</span>before the commencement—<blockquote class="Paragraph List"><span class="ListNumber">(i)</span>a person applied under <a href="#sec.28">section 28</a>(1) of the repealed regulation for approval of a proposed fire engineering design brief for stated building work; and</blockquote>
<blockquote class="Paragraph List"><span class="ListNumber">(ii)</span>an authorised representative of the service attended a former fire engineering brief meeting relating to the approval of the proposed fire engineering design brief; and</blockquote>
<blockquote class="Paragraph List"><span class="ListNumber">(iii)</span>the service had not decided whether or not to approve the proposed fire engineering design brief; and</blockquote>
</blockquote>
<blockquote class="Paragraph List"><span class="ListNumber">(b)</span>the person has not paid the former fire engineering design brief meeting fee for the attendance of the representative of the service at the former fire engineering brief meeting.</blockquote>
</blockquote><blockquote class="Paragraph"><span class="ListNumber">(2)</span>For assessing the fire engineering design brief for the stated building work—<blockquote class="Paragraph List"><span class="ListNumber">(a)</span><a href="#sec.61">section 61</a> applies as if the reference to a fire engineering brief were a reference to the proposed fire engineering design brief; and</blockquote>
<blockquote class="Paragraph List"><span class="ListNumber">(b)</span><a href="#sec.62">section 62</a>(1)(d) applies as if the reference to each fire engineering brief meeting included a reference to each former fire engineering brief meeting; and</blockquote>
<blockquote class="Paragraph List"><span class="ListNumber">(c)</span><a href="#sch.2">schedule 2</a>, <a href="#sch.2-pt.3">part 3</a>, item 3 applies as if a reference to a meeting included a reference to a former fire engineering brief meeting.</blockquote>
</blockquote><blockquote class="Paragraph"><span class="ListNumber">(3)</span>In this section—<blockquote class="Paragraph-No-Number"><b><i><a name="sec.90-ssec.3-def.formerfireengineeringbriefmeeting"></a>former fire engineering brief meeting</i></b> means a fire engineering brief meeting under <a href="#sec.28">section 28</a>(2)(d) of the repealed regulation.</blockquote><blockquote class="Paragraph-No-Number"><b><i><a name="sec.90-ssec.3-def.formerfireengineeringdesignbriefmeetingfee"></a>former fire engineering design brief meeting fee</i></b> means the fire engineering design brief meeting fee stated in <a href="#sch.3">schedule 3</a> of the repealed regulation.</blockquote></blockquote></blockquote>
当我解析这个时,我得到了
(1)This section applies if - (a)before the commencement - (i)a person applied under section 28(1) of the repealed regulation for approval of a proposed fire engineering design brief for stated building work; and
(ii)an authorised representative of the service attended a former fire engineering brief meeting relating to the approval of the proposed fire engineering design brief; and
(iii)the service had not decided whether or not to approve the proposed fire engineering design brief; and
(b)the person has not paid the former fire engineering design brief meeting fee for the attendance of the representative of the service at the former fire engineering brief meeting.
(2)For assessing the fire engineering design brief for the stated building work - (a)section 61 applies as if the reference to a fire engineering brief were a reference to the proposed fire engineering design brief; and
(b)section 62(1)(d) applies as if the reference to each fire engineering brief meeting included a reference to each former fire engineering brief meeting; and
(c)schedule 2, part 3, item 3 applies as if a reference to a meeting included a reference to a former fire engineering brief meeting.
(3)In this section - former fire engineering brief meeting
最后一行的末尾缺少文本。它已被切断
<blockquote class="Paragraph-No-Number"><b><i><a name="sec.90-ssec.3-def.formerfireengineeringbriefmeeting"></a>former fire engineering brief meeting</i></b>
更新 - 有一个类我试图避免使用 .FlatParagraph 没有用。我试图避免使用 class=FlatParagraph view-history-note。 FlatParagraph view-history-note 是 FlatParagraph 类标签的子标签类。
我用 lxml 和 html.parser 尝试了上面的代码,我得到了 lxml 的所有文本,以及 html.parser 的截断文本。如果有人知道原因,我很想听听!
最佳答案
您可以使用select()
或find()
查看下面的代码,我正在获取全文!
html = '''
<blockquote class="FlatParagraph"><blockquote class="Paragraph"><span class="ListNumber">(1)</span>This section applies if—<blockquote class="Paragraph List"><span class="ListNumber">(a)</span>before the commencement—<blockquote class="Paragraph List"><span class="ListNumber">(i)</span>a person applied under <a href="#sec.28">section 28</a>(1) of the repealed regulation for approval of a proposed fire engineering design brief for stated building work; and</blockquote>
<blockquote class="Paragraph List"><span class="ListNumber">(ii)</span>an authorised representative of the service attended a former fire engineering brief meeting relating to the approval of the proposed fire engineering design brief; and</blockquote>
<blockquote class="Paragraph List"><span class="ListNumber">(iii)</span>the service had not decided whether or not to approve the proposed fire engineering design brief; and</blockquote>
</blockquote>
<blockquote class="Paragraph List"><span class="ListNumber">(b)</span>the person has not paid the former fire engineering design brief meeting fee for the attendance of the representative of the service at the former fire engineering brief meeting.</blockquote>
</blockquote><blockquote class="Paragraph"><span class="ListNumber">(2)</span>For assessing the fire engineering design brief for the stated building work—<blockquote class="Paragraph List"><span class="ListNumber">(a)</span><a href="#sec.61">section 61</a> applies as if the reference to a fire engineering brief were a reference to the proposed fire engineering design brief; and</blockquote>
<blockquote class="Paragraph List"><span class="ListNumber">(b)</span><a href="#sec.62">section 62</a>(1)(d) applies as if the reference to each fire engineering brief meeting included a reference to each former fire engineering brief meeting; and</blockquote>
<blockquote class="Paragraph List"><span class="ListNumber">(c)</span><a href="#sch.2">schedule 2</a>, <a href="#sch.2-pt.3">part 3</a>, item 3 applies as if a reference to a meeting included a reference to a former fire engineering brief meeting.</blockquote>
</blockquote><blockquote class="Paragraph"><span class="ListNumber">(3)</span>In this section—<blockquote class="Paragraph-No-Number"><b><i><a name="sec.90-ssec.3-def.formerfireengineeringbriefmeeting"></a>former fire engineering brief meeting</i></b> means a fire engineering brief meeting under <a href="#sec.28">section 28</a>(2)(d) of the repealed regulation.</blockquote><blockquote class="Paragraph-No-Number"><b><i><a name="sec.90-ssec.3-def.formerfireengineeringdesignbriefmeetingfee"></a>former fire engineering design brief meeting fee</i></b> means the fire engineering design brief meeting fee stated in <a href="#sch.3">schedule 3</a> of the repealed regulation.</blockquote></blockquote></blockquote>
'''
soup = BeautifulSoup(html,'lxml')
fp = soup.select('.FlatParagraph')
for i in fp:
print(i.text)
或
fp = soup.find('blockquote',attrs={'class':'FlatParagraph'})
print(fp.text)
输出:
(1)This section applies if—(a)before the commencement—(i)a person applied under section 28(1) of the repealed regulation for approval of a proposed fire engineering design brief for stated building work; and
(ii)an authorised representative of the service attended a former fire engineering brief meeting relating to the approval of the proposed fire engineering design brief; and
(iii)the service had not decided whether or not to approve the proposed fire engineering design brief; and
(b)the person has not paid the former fire engineering design brief meeting fee for the attendance of the representative of the service at the former fire engineering brief meeting.
(2)For assessing the fire engineering design brief for the stated building work—(a)section 61 applies as if the reference to a fire engineering brief were a reference to the proposed fire engineering design brief; and
(b)section 62(1)(d) applies as if the reference to each fire engineering brief meeting included a reference to each former fire engineering brief meeting; and
(c)schedule 2, part 3, item 3 applies as if a reference to a meeting included a reference to a former fire engineering brief meeting.
(3)In this section—former fire engineering brief meeting means a fire engineering brief meeting under section 28(2)(d) of the repealed regulation.former fire engineering design brief meeting fee means the fire engineering design brief meeting fee stated in schedule 3 of the repealed regulation.
关于python - BeautifulSoup soup.select 切断子标签,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/55202258/
我不知道“汤”字面意思在与计算机图形相关的“三角形汤”或“多边形汤”中使用时是什么意思。是不是和我们用勺子吃饭的“汤”有关? (我的母语不是英语。) 最佳答案 维基百科来拯救! A polygon s
我们正在废弃 Amazon.in 网站以检索任何产品的价格。所有产品在“span”标签中的“id”属性都具有不同的值,例如; id = 'priceblock_ourprice', id = 'p
我有一个这样的模板: 和这样的输入 HTML COMPLEX HTML 其中 COMPLEX_HTML 是很多子标签(很干净 - 验证) 我试图将输入 HTML 的 body 标记内的 HTML
我对 soup('tag_name') 和 soup.find_all('tag_name') 之间的区别感到困惑。下面是一个包含一小段 html 的示例: from bs4 import Beaut
我正在尝试使用 css 选择器解析 html 页面 import requests import webbrowser from bs4 import BeautifulSoup page = req
这是网页 HTML 源代码的一部分: apple banana cherry melon 我想提取我想要的网址,比如以/Result 开头的网址?我刚刚了解到您可以在 beautiful soup
我注意到一个非常烦人的错误:BeautifulSoup4(包:bs4)经常发现比以前版本(包:BeautifulSoup)更少的标签。 这是该问题的一个可重现的实例: import requests
所以我一直在试图弄清楚如何抓取一个购买/销售网站的网站,我发现了 HTML 中的所有内容,但该类包含不同的随机数,例如:
关闭。这个问题不符合Stack Overflow guidelines .它目前不接受答案。 这个问题似乎不是关于 a specific programming problem, a softwa
我正在尝试加载 html 页面并输出文本,即使我正确获取网页,BeautifulSoup 以某种方式破坏了编码。 来源: # -*- coding: utf-8 -*- import requests
题目地址:https://leetcode.com/problems/soup-servings/description/ 题目描述: There are two types of soup: t
您好,我正在尝试从网站获取一些信息。请原谅我,如果我的格式有任何错误,这是我第一次发布到 SO。 soup.find('div', {"class":"stars"}) 从这里我收到 我需要 “
我想从 Google Arts & Culture 检索信息使用 BeautifulSoup。我检查了许多 stackoverflow 帖子( [1] , [2] , [3] , [4] , [5]
我决定学习 Python,因为我现在有更多时间(由于大流行)并且一直在自学 Python。 我试图从一个网站上刮取税率,几乎可以获得我需要的一切。下面是来自我的 Soup 变量以及相关 Python
我正在使用 beautifulsoup 从页面中获取所有链接。我的代码是: import requests from bs4 import BeautifulSoup url = 'http://ww
我正在尝试根据部分属性值来识别 html 文档中的标签。 例如,如果我有一个 Beautifulsoup 对象: import bs4 as BeautifulSoup r = requests.ge
Показать телефон 如何在 Beautiful Soup 中找到上述元素? 我尝试了以下方法,但没有奏效: show = soup.find('div', {'class': 'acti
我如何获得结果网址:https://www.sec.gov/Archives/edgar/data/1633917/000163391718000094/0001633917-18-000094-in
我是 python 新手,尝试从页面中提取表格,但无法使用 BS4 找到该表格。你能告诉我我哪里出错了吗? import requests from bs4 import BeautifulSoup
我有一个巨大的 XML 文件(1.2 G),其中包含数百万个 MusicAlbums 的信息,每个 MusicAlbums 都具有如下简单格式 P 22 Exitos
我是一名优秀的程序员,十分优秀!