I am scraping different courses from a university website.
The HTML for the relevant part of the site is:
<div>
<h2>About the programme</h2>
<p>The National Joint PhD Programme in Nautical Operations is organised as a joint degree between the following four national higher education institutions offering professional maritime education:</p>
<ul>
<li>Universtity of Tromsø - The Arctic University of Norway (UiT)</li>
<li>University of South-Eastern Norway (USN)</li>
<li>Western Norway University of Applied Sciences (HVL)</li>
<li>Norwegian University of Science and Technology (NTNU)</li>
</ul>
<p>
The National Joint PhD Programme in Nautical Operations will educate qualified candidates for research, teaching, dissemination and innovation work, and other activities requiring scientific insight and an operational
maritime focus.
</p>
<p>
Implementation of complex nautical operations today requires interdisciplinarity and differentiated competence, including research expertise, for the safe and efficient planning, implementation and evaluation of nautical
operations.
</p>
<p>The programme has the following vision: to create an internationally recognized national PhD degree in nautical operations.</p>
<p>This vision will be achieved through the following overall objectives:</p>
<ol>
<li>Strengthen the multidisciplinary national expertise in nautical operations through collaboration between the four higher education institutions in Norway with professional maritime education.</li>
<li>The PhD Programme in Nautical Operations is the preferred Programme in the field and attracts good applicants nationally and internationally from major maritime nations.</li>
<li>Individuals graduating from the Programme are in demand both nationally and internationally because they have a strong and relevant research-based expertise and the ability to innovate and adapt.</li>
<li>Increase value creation and innovation through close cooperation between academia, maritime industry and public sector.</li>
<li>The multidisciplinary national competence related to nautical operations constitutes an internationally recognised professional environment that sets the terms for the development of knowledge in the field.</li>
</ol>
<h2>Academic content</h2>
<p>Nautical operations consist of two subject areas:</p>
<ul>
<li>
Nautical studies that include navigation, maneuvering and transport of floating craft, and operations, indicating that the PhD program will focus on applied research to support, improve and develop the activities
undertaken.
</li>
<li>
The operational perspective includes strategic, tactical and operational aspects. Strategic levels include the choice of type and size of a ship fleet. Tactical aspects concern the design of individual ships and
the selection of equipment and staff. The operational aspects include planning, implementation and evaluation of nautical operations.
</li>
</ul>
<p>There is a compulsory joint maritime course offered at all the four institutions.</p>
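Link to the website: https://www.usn.no/english/research/postgraduate-studies-phd/our-phd-programmes/nautical-operations/
I am trying to get the course_description/about_the_course and academic_content text that sits under the "h2" tags above. I have no idea how to write generic code that scrapes the tag text based on the h2 tags.
Also, I don't think indexing will help, because the order of the <p> and <li> tags varies from course to course.
Best answer
You can use .get_text() with separator='\n':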
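import requests
from bs4 import BeautifulSoup
url = 'https://www.usn.no/english/research/postgraduate-studies-phd/our-phd-programmes/nautical-operations/'
soup = BeautifulSoup(requests.get(url).content, 'html.parser')
# Find the <h2> whose text contains the section title; `t and ...` skips headings with no single string.
desc = soup.find('h2', string=lambda t: t and 'About the programme' in t)
# Print all text inside the heading's parent container, one text fragment per line.
print(desc.parent.get_text(strip=True, separator='\n'))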
Prints:
About the programme
The National Joint PhD Programme in Nautical Operations is organised as a joint degree between the following four national higher education institutions offering professional maritime education:
Universtity of Tromsø
- The Arctic University of Norway (UiT)
University of South-Eastern Norway (USN)
Western Norway University of Applied Sciences
(HVL)
Norwegian University of Science and Technology
(NTNU)
The National Joint PhD Programme in Nautical Operations will educate qualified candidates for research, teaching, dissemination and innovation work, and other activities requiring scientific insight and an operational maritime focus.
Implementation of complex nautical operations today requires interdisciplinarity and differentiated competence, including research expertise, for the safe and efficient planning, implementation and evaluation of nautical operations.
The programme has the following vision: to create an internationally recognized national PhD degree in nautical operations.
This vision will be achieved through the following overall objectives:
Strengthen the multidisciplinary national expertise in nautical operations through collaboration between the four higher education institutions in Norway with professional maritime education.
The PhD Programme in Nautical Operations is the preferred Programme in the field and attracts good applicants nationally and internationally from major maritime nations.
Individuals graduating from the Programme are in demand both nationally and internationally because they have a strong and relevant research-based expertise and the ability to innovate and adapt.
Increase value creation and innovation through close cooperation between academia, maritime industry and public sector.
The multidisciplinary national competence related to nautical operations constitutes an internationally recognised professional environment that sets the terms for the development of knowledge in the field.
Academic content
Nautical operations consist of two subject areas:
Nautical studies that include navigation, maneuvering and transport of floating craft, and operations, indicating that the PhD program will focus on applied research to support, improve and develop the activities undertaken.
The operational perspective includes strategic, tactical and operational aspects. Strategic levels include the choice of type and size of a ship fleet. Tactical aspects concern the design of individual ships and the selection of equipment and staff. The operational aspects include planning, implementation and evaluation of nautical operations.
There is a compulsory joint maritime course offered at all the four institutions.
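The output above contains both sections in a single block of text. If each section is needed separately, keyed by its h2 heading, one possible approach is to walk the siblings that follow each heading and stop at the next h2. The following is a minimal sketch, not part of the original answer: `SAMPLE_HTML` is a shortened stand-in for the page, `sections_by_h2` is a hypothetical helper name, and it assumes the headings and their content are siblings inside the same parent, as in the HTML shown in the question.

```python
from bs4 import BeautifulSoup

# Shortened stand-in for the page's HTML (an assumption for illustration).
SAMPLE_HTML = """
<div>
  <h2>About the programme</h2>
  <p>The programme is organised as a joint degree.</p>
  <ul><li>University of South-Eastern Norway (USN)</li></ul>
  <h2>Academic content</h2>
  <p>Nautical operations consist of two subject areas:</p>
</div>
"""


def sections_by_h2(soup):
    """Map each <h2> heading to the text of the sibling tags that follow it."""
    sections = {}
    for heading in soup.find_all("h2"):
        parts = []
        # With no arguments, find_next_siblings() returns only tags, in document order.
        for sibling in heading.find_next_siblings():
            if sibling.name == "h2":  # the next section starts here
                break
            text = sibling.get_text(" ", strip=True)
            if text:
                parts.append(text)
        sections[heading.get_text(strip=True)] = "\n".join(parts)
    return sections


sections = sections_by_h2(BeautifulSoup(SAMPLE_HTML, "html.parser"))
print(sections["About the programme"])
print(sections["Academic content"])
```

Because the loop does not depend on how many `<p>`, `<ul>` or `<ol>` tags follow a heading, or in what order, it should tolerate pages where that order differs from course to course. On the live site, the `soup` built from `requests.get(url).content` could be passed in place of the sample.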
Regarding "python - Selenium - web scraping; how to get a specific tag using selenium?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/64306680/