
python - Get nested div content starting from a div with a specified id

Reposted. Author: 行者123. Updated: 2023-12-04 15:04:34

I have the following div with id="participant":

<div id="participant" class="panel-collapse collapse in" role="tabpanel" aria-expanded="true" aria-labelledby="headingOne" style="">
<div class="panel-body">
<div class="row">
<div class="col-sm-12">
<div class="question-container">
<div class="question-group">
<h5 class="question">
Organisation
</h5>
<div class="answer">
<p>Ministerio de Hacienda [Ministry of Finance]</p>
<p>Consejo de Contadores Públicos del Paraguay (Consejo) [Council of Public Accountants of Paraguay]</p>
<p>Central Bank of Paraguay – Superintendence of Banks</p>
<br>
</div>
</div>
<div class="question-group">
<h5 class="question">
Role of the organisation
</h5>
<div class="answer">
<p>The Ministry of Finance has authority to establish accounting standards for all entities in Paraguay other than banks and financial institutions.&nbsp; </p>
<p>The Consejo is the professional association of public accountants in Paraguay.&nbsp; The Consejo advises the Ministry of Finance with regard to accounting standards.</p>
<p>Accounting standards for banks and other financial institutions are established by the Central Bank of Paraguay.</p>
</div>
</div>
<div class="question-group">
<h5 class="question">
Website
</h5>
<div class="answer">
<p>Ministry of Finance: <a href="http://www.hacienda.gov.py" target="_blank">http://www.hacienda.gov.py</a></p>
<p>Consejo: <a href="http://www.consejo.com.py" target="_blank">www.consejo.com.py</a></p>
<p>Central Bank: <a href="http://www/bcp.gov.py" target="_blank">http://www/bcp.gov.py</a></p>
</div>
</div>
<div class="question-group">
<h5 class="question">
Email contact
</h5>
<div class="answer">
<p>Consejo: <a href="mailto:consejo@consejo.com.py">consejo@consejo.com.py</a><br>
Central Bank:
</p>
<ul>
<li><a href="mailto:afranco@bcp.gov.py">afranco@bcp.gov.py</a> and <a href="hcentu@bcp.gov.py">hcentu@bcp.gov.py</a></li>
<li><a href="mailto:jjimenez@bcp.gov.py">jjimenez@bcp.gov.py</a></li>
<li><a href="mailto:hcolman@bcp.gov.py">hcolman@bcp.gov.py</a></li>
</ul>
</div>
</div>
</div>
</div>
</div>

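I want to get the content of every element with class="question" and class="answer", starting from <div id="participant">. I have many divs with the same structure and CSS classes, so the id is the only way I can tell them apart.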

This is my expected output:

Organisation              Ministerio de Hacienda [Ministry of Finance]
                          Consejo de Contadores Públicos del Paraguay (Consejo) [Council of Public Accountants of Paraguay]
                          Central Bank of Paraguay – Superintendence of Banks
Role of the organisation  The Ministry of Finance has authority to establish accounting standards for all entities in Paraguay other than banks and financial institutions.
                          The Consejo is the professional association of public accountants in Paraguay. The Consejo advises the Ministry of Finance with regard to accounting standards.
                          Accounting standards for banks and other financial institutions are established by the Central Bank of Paraguay.
Website                   Ministry of Finance: http://www.hacienda.gov.py
                          Consejo: www.consejo.com.py
                          Central Bank: http://www/bcp.gov.py
Email contact             Consejo: consejo@consejo.com.py
                          Central Bank:
                          afranco@bcp.gov.py and hcentu@bcp.gov.py
                          jjimenez@bcp.gov.py
                          hcolman@bcp.gov.py

This is what I have so far:

from bs4 import BeautifulSoup
import requests
import pandas as pd
import re
# Site URL
url = "https://www.ifrs.org/use-around-the-world/use-of-ifrs-standards-by-jurisdiction/paraguay"
# Make a GET request to fetch the raw HTML content
html_content = requests.get(url).text
# Parse HTML code for the entire site
soup = BeautifulSoup(html_content, "lxml")
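# Collect every div with id="participant" (ids should be unique, so normally one)
divs = soup.find_all("div", attrs={"id": "participant"})
disp = []
d = []
# find() returns only the first question-group in each div
for c in divs:
    disp.append(c.find("div", attrs={"class": "question-group"}))
# Keep the stripped heading text of each group found
for t in disp:
    d.append(t.h5.text.strip())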

Best answer

Setting aside the final print formatting, something like this should work:

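# Select only elements nested under the div with id="participant"
questions = [q.text.strip() for q in soup.select('div#participant h5.question')]
answers = [a.text.strip() for a in soup.select('div#participant div.answer')]
# Pair each question with the answer at the same position
for q, a in zip(questions, answers):
    print(q, ": ", a)
    print('---')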

Output:

Organisation :  Ministerio de Hacienda [Ministry of Finance]
Consejo de Contadores Públicos del Paraguay (Consejo) [Council of Public Accountants of Paraguay]
Central Bank of Paraguay – Superintendence of Banks
---

And so on.
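
The zip approach above assumes every question has exactly one answer and that both lists come back in the same order. As a rough sketch of a variation, the code below pairs them group by group (one div.question-group at a time) and approximates the two-column layout from the question. The inline HTML sample, the helper names extract_pairs and format_pairs, and the "html.parser" backend are assumptions made for illustration; they are not part of the original answer, and the real page may need adjustments.

```python
from bs4 import BeautifulSoup

# Trimmed-down stand-in for the real page (assumption: same class names).
HTML = """
<div id="other">
  <div class="question-group">
    <h5 class="question">Ignored</h5>
    <div class="answer"><p>Not under #participant</p></div>
  </div>
</div>
<div id="participant">
  <div class="question-group">
    <h5 class="question">
      Organisation
    </h5>
    <div class="answer">
      <p>Ministerio de Hacienda [Ministry of Finance]</p>
      <p>Central Bank of Paraguay</p>
    </div>
  </div>
  <div class="question-group">
    <h5 class="question">Website</h5>
    <div class="answer">
      <p>Consejo: <a href="http://www.consejo.com.py">www.consejo.com.py</a></p>
    </div>
  </div>
</div>
"""


def extract_pairs(html, section_id):
    """Return [(question, [answer lines])] for one section, paired per group."""
    soup = BeautifulSoup(html, "html.parser")
    pairs = []
    # Restrict the search to groups nested under the div with the given id.
    for group in soup.select(f"div#{section_id} div.question-group"):
        question = group.select_one("h5.question")
        answer = group.select_one("div.answer")
        if question is None:
            continue  # skip malformed groups instead of misaligning pairs
        q_text = " ".join(question.get_text().split())
        lines = []
        if answer is not None:
            # One output line per paragraph or list item in the answer.
            for block in answer.find_all(["p", "li"]):
                text = " ".join(block.get_text().split())
                if text:
                    lines.append(text)
        pairs.append((q_text, lines))
    return pairs


def format_pairs(pairs):
    """Render the pairs as two columns: question label, then answer lines."""
    width = max((len(q) for q, _ in pairs), default=0) + 2
    out = []
    for q, lines in pairs:
        first, *rest = lines or [""]
        out.append(f"{q:<{width}}{first}".rstrip())
        out.extend(" " * width + line for line in rest)
    return "\n".join(out)


pairs = extract_pairs(HTML, "participant")
print(format_pairs(pairs))
```

To run this against the live page, the HTML string could be replaced with requests.get(url).text from the question. Pairing inside each group means a question without an answer yields an empty answer list rather than shifting every later answer up by one.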

For "python - Get nested div content starting from a div with a specified id", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/66400783/
