
python - Selenium - Web scraping: how to get specific tags using Selenium?

Reposted. Author: 行者123. Updated: 2023-12-02 02:36:02

I am scraping different courses from a university website.

The HTML for the relevant part of the site is:

<div>
<h2>About the programme</h2>
<p>The National&nbsp;Joint&nbsp;PhD Programme in Nautical Operations&nbsp;is organised as a joint degree between the following four national higher education institutions offering professional maritime education:</p>
<ul>
<li>Universtity of Troms&oslash; - The Arctic University of Norway (UiT)</li>
<li>University of&nbsp;South-Eastern&nbsp;Norway (USN)</li>
<li>Western Norway University of Applied Sciences (HVL)</li>
<li>Norwegian University of Science and Technology (NTNU)</li>
</ul>
<p>
The National&nbsp;Joint&nbsp;PhD Programme in Nautical Operations will educate qualified candidates for research, teaching, dissemination and innovation work, and other activities requiring scientific insight and an operational
maritime focus.&nbsp;
</p>
<p>
Implementation of complex nautical operations today requires interdisciplinarity and differentiated competence, including research expertise, for the safe and efficient planning, implementation and evaluation of nautical
operations.&nbsp;
</p>
<p>The programme has the following&nbsp;vision: to create an internationally recognized national PhD degree in nautical operations.</p>
<p>This vision will be achieved through the following overall objectives:</p>
<ol>
<li>Strengthen the multidisciplinary national expertise in nautical operations through collaboration between the four higher education institutions in Norway with professional maritime education.</li>
<li>The PhD Programme in Nautical Operations is the preferred Programme in the field and attracts good applicants nationally and internationally from major maritime nations.</li>
<li>Individuals graduating from the Programme are in demand both nationally and internationally because they have a strong and relevant research-based expertise and the ability to innovate and adapt.</li>
<li>Increase value creation and innovation through close cooperation between academia, maritime industry and public sector.</li>
<li>The multidisciplinary national competence related to nautical operations constitutes an internationally recognised professional environment that sets the terms for the development of knowledge in the field.</li>
</ol>
<h2>Academic content</h2>
<p>Nautical operations consist of two subject areas:</p>
<ul>
<li>
Nautical studies&nbsp;that include navigation, maneuvering and transport of floating craft, and operations, indicating that the PhD program will focus on applied research to support, improve and develop the activities
undertaken.
</li>
<li>
The operational perspective&nbsp;includes strategic, tactical and operational aspects.&nbsp;Strategic levels include the choice of type and size of a ship fleet.&nbsp;Tactical aspects concern the design of individual ships and
the selection of equipment and staff.&nbsp;The operational aspects include planning, implementation and evaluation of nautical operations.
</li>
</ul>
<p>There is a compulsory&nbsp;joint maritime course offered at all the four institutions.</p>

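Website link: https://www.usn.no/english/research/postgraduate-studies-phd/our-phd-programmes/nautical-operations/

I am trying to get the course_description/about_the_course and academic_content text that sits under the 'h2' tags above. I have no idea how to write generic code that scrapes the tag text based on an h2 tag.

Also, I don't think indexing will help, because the order of the <p> and <li> tags varies from course to course.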

Best answer

You can use .get_text() with separator='\n':

import requests
from bs4 import BeautifulSoup


url = 'https://www.usn.no/english/research/postgraduate-studies-phd/our-phd-programmes/nautical-operations/'
soup = BeautifulSoup(requests.get(url).content, 'html.parser')

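# Find the <h2> whose text contains the heading. `string=` replaces the
# deprecated `text=` argument; `t` is None for tags without a single string.
desc = soup.find('h2', string=lambda t: t and 'About the programme' in t)
# Print all text inside the heading's parent, one text fragment per line.
print(desc.parent.get_text(strip=True, separator='\n'))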

Prints:

About the programme
The National Joint PhD Programme in Nautical Operations is organised as a joint degree between the following four national higher education institutions offering professional maritime education:
Universtity of Tromsø
- The Arctic University of Norway (UiT)
University of South-Eastern Norway (USN)
Western Norway University of Applied Sciences
(HVL)
Norwegian University of Science and Technology
(NTNU)
The National Joint PhD Programme in Nautical Operations will educate qualified candidates for research, teaching, dissemination and innovation work, and other activities requiring scientific insight and an operational maritime focus.
Implementation of complex nautical operations today requires interdisciplinarity and differentiated competence, including research expertise, for the safe and efficient planning, implementation and evaluation of nautical operations.
The programme has the following vision: to create an internationally recognized national PhD degree in nautical operations.
This vision will be achieved through the following overall objectives:
Strengthen the multidisciplinary national expertise in nautical operations through collaboration between the four higher education institutions in Norway with professional maritime education.
The PhD Programme in Nautical Operations is the preferred Programme in the field and attracts good applicants nationally and internationally from major maritime nations.
Individuals graduating from the Programme are in demand both nationally and internationally because they have a strong and relevant research-based expertise and the ability to innovate and adapt.
Increase value creation and innovation through close cooperation between academia, maritime industry and public sector.
The multidisciplinary national competence related to nautical operations constitutes an internationally recognised professional environment that sets the terms for the development of knowledge in the field.
Academic content
Nautical operations consist of two subject areas:
Nautical studies that include navigation, maneuvering and transport of floating craft, and operations, indicating that the PhD program will focus on applied research to support, improve and develop the activities undertaken.
The operational perspective includes strategic, tactical and operational aspects. Strategic levels include the choice of type and size of a ship fleet. Tactical aspects concern the design of individual ships and the selection of equipment and staff. The operational aspects include planning, implementation and evaluation of nautical operations.
There is a compulsory joint maritime course offered at all the four institutions.
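
Note that the output above contains both sections, because desc.parent is the <div> that wraps every heading. If the text is needed separately per h2, one possible approach is to walk the siblings that follow each heading and stop at the next h2. The following is a minimal sketch, not part of the original answer: SAMPLE_HTML is a shortened, hypothetical stand-in for the page, sections_by_h2 is an illustrative helper name, and it assumes the headings and their content are siblings inside the same parent, as in the HTML shown in the question.

```python
from bs4 import BeautifulSoup

# Hypothetical, shortened stand-in for the page's markup (assumption).
SAMPLE_HTML = """
<div>
<h2>About the programme</h2>
<p>The programme is a joint degree between four institutions:</p>
<ul>
<li>University of South-Eastern Norway (USN)</li>
<li>Norwegian University of Science and Technology (NTNU)</li>
</ul>
<h2>Academic content</h2>
<p>Nautical operations consist of two subject areas:</p>
<ul>
<li>Nautical studies</li>
<li>The operational perspective</li>
</ul>
</div>
"""


def sections_by_h2(html):
    """Map each <h2> heading to the text of the sibling tags that follow it."""
    soup = BeautifulSoup(html, "html.parser")
    sections = {}
    for heading in soup.find_all("h2"):
        parts = []
        # Walk forward through the sibling tags until the next <h2>.
        for sibling in heading.find_next_siblings():
            if sibling.name == "h2":
                break
            text = sibling.get_text(separator="\n", strip=True)
            if text:
                parts.append(text)
        sections[heading.get_text(strip=True)] = "\n".join(parts)
    return sections


sections = sections_by_h2(SAMPLE_HTML)
for title, body in sections.items():
    print(f"== {title} ==")
    print(body)
```

Because the grouping depends only on where the next h2 appears, the order and number of <p> and <li> tags in each section should not matter. For the live page, the same helper could be given requests.get(url).content instead of SAMPLE_HTML.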

Regarding "python - Selenium - Web scraping: how to get specific tags using Selenium?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/64306680/
