python - Options for using BeautifulSoup with a basic table - no class or ID

Reposted. Author: 行者123. Last updated: 2023-12-01 04:19:13

Is there a recommended way to use BeautifulSoup 4 in Python when your tables have no class or distinguishing attribute values?

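I was thinking of just dumping the text with get_text(), but how should I proceed if I want to pick out individual values or split the table into more discrete parts?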

<table cellpadding="0" cellspacing="0" id="programmeDescriptor" width="100%">
<tr>
<td>
<table cellpadding="5" cellspacing="0" class="borders" width="100%">
<tr>
<th colspan="1">
Awards
</th>
</tr>
<tr>
</tr>
<tr>
<td>
Ordinary Bachelor Degree
</td>
</tr>
</table>
<table border="0" cellpadding="0" cellspacing="0" width="100%">
<tr>
<td>
<table cellpadding="5" cellspacing="0" class="borders">
<tr>
<th width="160">
Programme Code:
</th>
<td width="150">
CodeValue
</td>
</tr>
</table>
</td>
<td width="5">
</td>
<td>
<table cellpadding="5" cellspacing="0" class="borders">
<tr>
<th width="160">
Mode of Delivery:
</th>
<td width="150">
Full Time
</td>
</tr>
</table>
</td>
<td width="5">
</td>
<td>
<table cellpadding="5" cellspacing="0" class="borders">
<tr>
<th width="160">
No. of Semesters:
</th>
<td width="150">
6
</td>
</tr>
</table>
</td>
</tr>
<tr>
<td>
<table cellpadding="5" cellspacing="0" class="borders">
<tr>
<th width="160">
NFQ Level:
</th>
<td width="150">
7
</td>
</tr>
</table>
</td>
</tr>
<tr>
<td>
<table cellpadding="5" cellspacing="0" class="borders">
<tr>
<th width="160">
Embedded Award:
</th>
<td width="150">
No
</td>
</tr>
</table>
</td>
</tr>
</table>
<table cellpadding="5" cellspacing="0" class="borders" width="100%">
<tr>
<th width="160">
Department:
</th>
<td>
Computing
</td>
</tr>
</table>
<div class="pageBreak">
</div>
<h3>
Programme Outcomes
</h3>
<p class="info">
On successful completion of this programme the learner will be able to :
</p>
<table cellpadding="5" cellspacing="0" class="borders" width="100%">
<tr>
<th width="30">
PO1
</th>
<td class="head" colspan="2">
Knowledge - Breadth
</td>
</tr>
<tr>
<td class="head" width="30">
</td>
<td class="head" width="30">
(a)
</td>
<td>
• Some block of text
</td>
</tr>
<tr>
<th width="30">
PO2
</th>
<td class="head" colspan="2">
Knowledge - Kind
</td>
</tr>
<tr>
<td class="head" width="30">
</td>
<td class="head" width="30">
(a)
</td>
<td>
• Some block of text
</td>
</tr>
<tr>
<th width="30">
PO3
</th>
<td class="head" colspan="2">
Skill - Range
</td>
</tr>
<tr>
<td class="head" width="30">
</td>
<td class="head" width="30">
(a)
</td>
<td>
• Some block of text
</td>
</tr>
<tr>
<th width="30">
PO4
</th>
<td class="head" colspan="2">
Skill - Selectivity
</td>
</tr>
<tr>
<td class="head" width="30">
</td>
<td class="head" width="30">
(a)
</td>
<td>
• Some block of text
</td>
</tr>
<tr>
<th width="30">
PO5
</th>
<td class="head" colspan="2">
Competence - Context
</td>
</tr>
<tr>
<td class="head" width="30">
</td>
<td class="head" width="30">
(a)
</td>
<td>
Some block of text
</td>
</tr>
<tr>
<th width="30">
PO6
</th>
<td class="head" colspan="2">
Competence - Role
</td>
</tr>
<tr>
<td class="head" width="30">
</td>
<td class="head" width="30">
(a)
</td>
<td>
• Some block of text
</td>
</tr>
<tr>
<th width="30">
PO7
</th>
<td class="head" colspan="2">
Competence - Learning to Learn
</td>
</tr>
<tr>
<td class="head" width="30">
</td>
<td class="head" width="30">
(a)
</td>
<td>
• Some block of text
</td>
</tr>
<tr>
<th width="30">
PO8
</th>
<td class="head" colspan="2">
Competence - Insight
</td>
</tr>
<tr>
<td class="head" width="30">
</td>
<td class="head" width="30">
(a)
</td>
<td>
• The graduate will demonstrate the ability to specify, design and build an IT system or research &amp; report on a current IT topic
</td>
</tr>
</table>
<div class="pageBreak">
</div>
<h3>
Semester Schedules
</h3>
<table cellpadding="0" cellspacing="0" width="100%">
<tr>
<td colspan="2">
<h4>
Stage 1 / Semester 1
</h4>
</td>
</tr>
<tr>
<td colspan="2">
<table cellpadding="5" cellspacing="0" class="borders" width="100%">
<tr>
<td class="head" colspan="2">
Mandatory
</td>
</tr>
<tr>
<th width="50">
Module Code
</th>
<th>
Module Title
</th>
</tr>
<tr>
<td>
Code
</td>
<td>
<a href="index.cfm/page/module/moduleId/3897" target="_blank">
Web &amp; User Experience
</a>
</td>
</tr>
<tr>
<td>
Code
</td>
<td>
<a href="index.cfm/page/module/moduleId/3881" target="_blank">
Software Development 1
</a>
</td>
</tr>
<tr>
<td>
Code
</td>
<td>
<a href="index.cfm/page/module/moduleId/1645" target="_blank">
Computer Architecture
</a>
</td>
</tr>
<tr>
<td>
Code
</td>
<td>
<a href="index.cfm/page/module/moduleId/2328" target="_blank">
Discrete Mathematics 1
</a>
</td>
</tr>
<tr>
<td>
Code
</td>
<td>
<a href="index.cfm/page/module/moduleId/3848" target="_blank">
Business &amp; Information Systems
</a>
</td>
</tr>
<tr>
<td>
Code
</td>
<td>
<a href="index.cfm/page/module/moduleId/2054" target="_blank">
Learning to Learn at Third Level
</a>
</td>
</tr>
</table>
</td>
</tr>
</table>
<table cellpadding="0" cellspacing="0" width="100%">
<tr>
<td colspan="2">
<h4>
Stage 1 / Semester 2
</h4>
</td>
</tr>
<tr>
<td colspan="2">
<table cellpadding="5" cellspacing="0" class="borders" width="100%">
<tr>
<td class="head" colspan="2">
Mandatory
</td>
</tr>
<tr>
<th width="50">
Module Code
</th>
<th>
Module Title
</th>
</tr>
<tr>
<td>
Code
</td>
<td>
<a href="index.cfm/page/module/moduleId/3886" target="_blank">
Software Development 2
</a>
</td>
</tr>
<tr>
<td>
Code
</td>
<td>
<a href="index.cfm/page/module/moduleId/3895" target="_blank">
Object Oriented Systems Analysis
</a>
</td>
</tr>
<tr>
<td>
Code
</td>
<td>
<a href="index.cfm/page/module/moduleId/3875" target="_blank">
Database Fundamentals
</a>
</td>
</tr>
<tr>
<td>
Code
</td>
<td>
<a href="index.cfm/page/module/moduleId/3874" target="_blank">
Operating Systems Fundamentals
</a>
</td>
</tr>
<tr>
<td>
Code
</td>
<td>
<a href="index.cfm/page/module/moduleId/2330" target="_blank">
Statistics
</a>
</td>
</tr>
<tr>
<td>
Code
</td>
<td>
<a href="index.cfm/page/module/moduleId/2527" target="_blank">
Social Media Communications
</a>
</td>
</tr>
</table>
</td>
</tr>
</table>
<div class="pageBreak">
</div>
<table cellpadding="0" cellspacing="0" width="100%">
<tr>
<td colspan="2">
<h4>
Stage 2 / Semester 1
</h4>
</td>
</tr>
<tr>
<td colspan="2">
<table cellpadding="5" cellspacing="0" class="borders" width="100%">
<tr>
<td class="head" colspan="2">
Mandatory
</td>
</tr>
<tr>
<th width="50">
Module Code
</th>
<th>
Module Title
</th>
</tr>
<tr>
<td>
Code
</td>
<td>
<a href="index.cfm/page/module/moduleId/3877" target="_blank">
Web &amp; Mobile Design &amp; Development
</a>
</td>
</tr>
<tr>
<td>
Code
</td>
<td>
<a href="index.cfm/page/module/moduleId/3876" target="_blank">
Database Design And Programming
</a>
</td>
</tr>
<tr>
<td>
Code
</td>
<td>
<a href="index.cfm/page/module/moduleId/3869" target="_blank">
Software Development 3
</a>
</td>
</tr>
<tr>
<td>
Code
</td>
<td>
<a href="index.cfm/page/module/moduleId/3873" target="_blank">
Software Quality Assurance and Testing
</a>
</td>
</tr>
<tr>
<td>
Code
</td>
<td>
<a href="index.cfm/page/module/moduleId/3629" target="_blank">
Networking 1
</a>
</td>
</tr>
<tr>
<td>
Code
</td>
<td>
<a href="index.cfm/page/module/moduleId/2477" target="_blank">
Discrete Mathematics 2
</a>
</td>
</tr>
</table>
</td>
</tr>
</table>
<table cellpadding="0" cellspacing="0" width="100%">
<tr>
<td colspan="2">
<h4>
Stage 2 / Semester 2
</h4>
</td>
</tr>
<tr>
<td colspan="2">
<table cellpadding="5" cellspacing="0" class="borders" width="100%">
<tr>
<td class="head" colspan="2">
Mandatory
</td>
</tr>
<tr>
<th width="50">
Module Code
</th>
<th>
Module Title
</th>
</tr>
<tr>
<td>
Code
</td>
<td>
<a href="index.cfm/page/module/moduleId/3862" target="_blank">
Project
</a>
</td>
</tr>
<tr>
<td>
Code
</td>
<td>
<a href="index.cfm/page/module/moduleId/3911" target="_blank">
Object Oriented Analysis &amp; Design 1
</a>
</td>
</tr>
<tr>
<td>
Code
</td>
<td>
<a href="index.cfm/page/module/moduleId/3877" target="_blank">
Web &amp; Mobile Design &amp; Development
</a>
</td>
</tr>
<tr>
<td>
Code
</td>
<td>
<a href="index.cfm/page/module/moduleId/3630" target="_blank">
Networking 2
</a>
</td>
</tr>
<tr>
<td>
Code
</td>
<td>
<a href="index.cfm/page/module/moduleId/3870" target="_blank">
Software Development 4
</a>
</td>
</tr>
<tr>
<td>
Code
</td>
<td>
<a href="index.cfm/page/module/moduleId/2476" target="_blank">
Management Science
</a>
</td>
</tr>
</table>
</td>
</tr>
</table>
<div class="pageBreak">
</div>
<table cellpadding="0" cellspacing="0" width="100%">
<tr>
<td colspan="2">
<h4>
Stage 3 / Semester 1
</h4>
</td>
</tr>
<tr>
<td colspan="2">
<table cellpadding="5" cellspacing="0" class="borders" width="100%">
<tr>
<td class="head" colspan="2">
Mandatory
</td>
</tr>
<tr>
<th width="50">
Module Code
</th>
<th>
Module Title
</th>
</tr>
<tr>
<td>
Code
</td>
<td>
<a href="index.cfm/page/module/moduleId/3911" target="_blank">
Object Oriented Analysis &amp; Design 1
</a>
</td>
</tr>
<tr>
<td>
Code
</td>
<td>
<a href="index.cfm/page/module/moduleId/3899" target="_blank">
Operating Systems
</a>
</td>
</tr>
<tr>
<td>
Code
</td>
<td>
<a href="index.cfm/page/module/moduleId/1721" target="_blank">
Cloud Services &amp; Distributed Computing
</a>
</td>
</tr>
<tr>
<td>
Code
</td>
<td>
<a href="index.cfm/page/module/moduleId/2580" target="_blank">
Innovation &amp; Entrepreneurship
</a>
</td>
</tr>
<tr>
<td>
Code
</td>
<td>
<a href="index.cfm/page/module/moduleId/3878" target="_blank">
Web Application Development
</a>
</td>
</tr>
<tr>
<td>
Code
</td>
<td>
<a href="index.cfm/page/module/moduleId/1689" target="_blank">
Algorithms and Data Structures 1
</a>
</td>
</tr>
<tr>
<td>
Code
</td>
<td>
<a href="index.cfm/page/module/moduleId/2025" target="_blank">
Logic and Problem Solving
</a>
</td>
</tr>
<tr>
<td>
Code
</td>
<td>
<a href="index.cfm/page/module/moduleId/3896" target="_blank">
Advanced Databases
</a>
</td>
</tr>
</table>
</td>
</tr>
</table>
<table cellpadding="0" cellspacing="0" width="100%">
<tr>
<td colspan="2">
<h4>
Stage 3 / Semester 2
</h4>
</td>
</tr>
<tr>
<td colspan="2">
<table cellpadding="5" cellspacing="0" class="borders" width="100%">
<tr>
<td class="head" colspan="2">
Mandatory
</td>
</tr>
<tr>
<th width="50">
Module Code
</th>
<th>
Module Title
</th>
</tr>
<tr>
<td>
Code
</td>
<td>
<a href="index.cfm/page/module/moduleId/2465" target="_blank">
Project
</a>
</td>
</tr>
<tr>
<td>
Code
</td>
<td>
<a href="index.cfm/page/module/moduleId/1728" target="_blank">
Algorithms and Data Structures 2
</a>
</td>
</tr>
<tr>
<td>
Code
</td>
<td>
<a href="index.cfm/page/module/moduleId/1675" target="_blank">
Network Management
</a>
</td>
</tr>
<tr>
<td>
Code
</td>
<td>
<a href="index.cfm/page/module/moduleId/2025" target="_blank">
Logic and Problem Solving
</a>
</td>
</tr>
<tr>
<td>
Code
</td>
<td>
<a href="index.cfm/page/module/moduleId/3899" target="_blank">
Operating Systems
</a>
</td>
</tr>
<tr>
<td>
Code
</td>
<td>
<a href="index.cfm/page/module/moduleId/2580" target="_blank">
Innovation &amp; Entrepreneurship
</a>
</td>
</tr>
<tr>
<td>
Code
</td>
<td>
<a href="index.cfm/page/module/moduleId/1679" target="_blank">
Object Oriented Analysis &amp; Design 2
</a>
</td>
</tr>
</table>
</td>
</tr>
</table>
</td>
</tr>
</table>

Best answer

First, the outer table (the parent of all the other tables) has an id attribute, so let's use that as the basis for the search:

super_table = soup.find("table", id="programmeDescriptor")
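
The line above assumes a `soup` object already exists. A minimal sketch of the full setup, assuming the built-in `html.parser` backend and a trimmed-down stand-in for the page in the question (the `html` string and the `inner_tables` name are illustrative, not from the original answer):

```python
from bs4 import BeautifulSoup

# Hypothetical, shortened version of the markup from the question.
html = """
<table id="programmeDescriptor">
<tr><td>
<table class="borders"><tr><th>Awards</th></tr>
<tr><td>Ordinary Bachelor Degree</td></tr></table>
<table class="borders"><tr><th>Department:</th><td>Computing</td></tr></table>
</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")

# The id is the only reliable hook, so anchor every later search on it.
super_table = soup.find("table", id="programmeDescriptor")

# find_all() searches all descendants by default, so nested tables are included.
inner_tables = super_table.find_all("table")
print(len(inner_tables))
```

Searching from `super_table` rather than from `soup` keeps later lookups from picking up unrelated tables elsewhere on the page.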

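Then, based on what you mentioned in the comments, it looks like you can tell the inner tables apart by their headers. One way to implement this is to find the header cell and then use find_parent() to get the table that contains it: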

def get_table_by_header_name(super_table, header):
    # Find the <th> whose text equals `header`, then climb to its enclosing table.
    return super_table.find("th", string=header).find_parent("table")
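
In the sample HTML each header's text sits on its own line inside the `<th>`, so an exact comparison against a bare label such as "Awards" may not match, in which case `find()` returns `None` and the chained call fails. A minimal sketch of a whitespace-tolerant variant, assuming the `html.parser` backend and a shortened sample; `get_table_by_header_text` is a hypothetical name, not part of the original answer:

```python
from bs4 import BeautifulSoup

def get_table_by_header_text(super_table, header):
    """Return the table enclosing the first <th> whose stripped text equals `header`."""
    for th in super_table.find_all("th"):
        # get_text(strip=True) drops the newlines that surround the label.
        if th.get_text(strip=True) == header:
            return th.find_parent("table")
    return None  # no matching header found

# Hypothetical, shortened version of the markup from the question.
html = """
<table id="programmeDescriptor"><tr><td>
<table class="borders">
<tr><th colspan="1">
Awards
</th></tr>
<tr><td>
Ordinary Bachelor Degree
</td></tr>
</table>
<table class="borders"><tr><th>
Department:
</th><td>
Computing
</td></tr></table>
</td></tr></table>
"""

soup = BeautifulSoup(html, "html.parser")
super_table = soup.find("table", id="programmeDescriptor")
found = get_table_by_header_text(super_table, "Awards")
print(found.get_text(" ", strip=True))
```

Returning `None` for a missing header lets the caller decide how to handle it instead of raising an `AttributeError` halfway through the lookup.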

Usage:

desired_table = get_table_by_header_name(super_table, "Awards")
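
To pick out individual values once a table has been located, one option is to lean on the row structure instead of attributes. The sketch below is an illustration under assumptions, not part of the original answer: it pairs each `<th>` label with the `<td>` that follows it in the same row, and groups module links under the nearest `<h4>` heading. The helper names `header_value_pairs` and `modules_by_semester` and the shortened `html` sample are hypothetical.

```python
from bs4 import BeautifulSoup

def header_value_pairs(table):
    """Map each <th> label to the text of the <td> that follows it in the same row."""
    pairs = {}
    for th in table.find_all("th"):
        td = th.find_next_sibling("td")
        if td is not None:  # skip headers that are not followed by a value cell
            label = th.get_text(strip=True).rstrip(":")
            pairs[label] = td.get_text(strip=True)
    return pairs

def modules_by_semester(table):
    """Group (title, href) pairs of module links under their <h4> heading."""
    schedule = {}
    for h4 in table.find_all("h4"):
        # The heading and the module list share the same wrapper table.
        wrapper = h4.find_parent("table")
        schedule[h4.get_text(strip=True)] = [
            (a.get_text(strip=True), a.get("href")) for a in wrapper.find_all("a")
        ]
    return schedule

# Hypothetical, shortened version of the markup from the question.
html = """
<table id="programmeDescriptor"><tr><td>
<table class="borders"><tr><th>
Programme Code:
</th><td>
CodeValue
</td></tr></table>
<table><tr><td><h4>
Stage 1 / Semester 1
</h4></td></tr>
<tr><td><table class="borders">
<tr><td>Code</td><td><a href="index.cfm/page/module/moduleId/3881">
Software Development 1
</a></td></tr>
</table></td></tr></table>
</td></tr></table>
"""

soup = BeautifulSoup(html, "html.parser")
super_table = soup.find("table", id="programmeDescriptor")
details = header_value_pairs(super_table)
schedule = modules_by_semester(super_table)
print(details)
print(schedule)
```

Both helpers depend only on tag names and position, which is about all this markup offers; if the page layout changes, the pairing logic would need to be revisited.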

Source: a similar question on Stack Overflow, "python - Options for using BeautifulSoup with a basic table - no class or ID": https://stackoverflow.com/questions/33919806/
