
python - Shaping pandas read_html results into a simpler structure

Reposted. Author: 太空宇宙. Updated: 2023-11-03 15:46:58

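I'm hoping someone can suggest how to create a pandas DataFrame that contains only the text of the second column, without the first two rows and without the text in the left column. The solution needs to work across multiple similar tables.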

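I thought that pd.read_html(LOTable.prettify(),skiprows=2,flavor='bs4'), which builds a list of DataFrames from the HTML (skipping 2 rows), would be the way to go, but the resulting data structure is too confusing for this newbie to understand or to reshape into something simpler.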

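Does anyone have a way of handling the resulting structure, or an alternative way of refining the data, so that I end up with a single column containing only the text I need?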

Sample table

<table cellpadding="5" cellspacing="0" class="borders" width="100%">
<tr>
<th colspan="2">
Learning Outcomes
</th>
</tr>
<tr>
<td class="info" colspan="2">
On successful completion of this module the learner will be able to:
</td>
</tr>
<tr>
<td style="width:10%;">
LO1
</td>
<td>
Demonstrate an awareness of the important role of Financial Accounting information as an input into the decision making process.
</td>
</tr>
<tr>
<td style="width:10%;">
LO2
</td>
<td>
Display an understanding of the fundamental accounting concepts, principles and conventions that underpin the preparation of Financial statements.
</td>
</tr>
<tr>
<td style="width:10%;">
LO3
</td>
<td>
Understand the various formats in which information in relation to transactions or events is recorded and classified.
</td>
</tr>
<tr>
<td style="width:10%;">
LO4
</td>
<td>
Apply a knowledge of accounting concepts,conventions and techniques such as double entry to the posting of recorded information to the T accounts in the Nominal Ledger.
</td>
</tr>
<tr>
<td style="width:10%;">
LO5
</td>
<td>
Prepare and present the financial statements of a Sole Trader in prescribed format from a Trial Balance accompanies by notes with additional information.
</td>
</tr>
</table>

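Best answer

Option 1
Use iloc

This should do it: read_html returns a list of DataFrames, so take the first one and let iloc drop its first column.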

pd.read_html(LOTable.prettify(),skiprows=2, flavor='bs4')[0].iloc[:, 1:]

Explanation

...iloc[:, 1:]
#       ^  ^
#       |   \
#  says to   says to take columns
#  take all  starting with one and on
#  rows

You could also take just the single column, which gives a Series rather than a DataFrame:

pd.read_html(LOTable.prettify(),skiprows=2, flavor='bs4')[0].iloc[:, 1]
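The difference between the slice `1:` and the plain integer `1` can be checked on a small frame. The sketch below does not parse any HTML; `df` is a hypothetical stand-in for one table from the list that read_html returns, and its cell text is made up for illustration.

```python
import pandas as pd

# Hypothetical stand-in for one parsed table (first column: IDs, second: text).
df = pd.DataFrame(
    [
        ["LO1", "first outcome text"],
        ["LO2", "second outcome text"],
    ]
)

as_frame = df.iloc[:, 1:]   # slice of columns -> still a DataFrame (one column)
as_series = df.iloc[:, 1]   # single integer position -> a Series

print(type(as_frame).__name__, as_frame.shape)
print(type(as_series).__name__, as_series.shape)
```

Either form drops the left column; which one is more convenient depends on whether later steps expect a DataFrame or a Series.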

Working code that I ran

import pandas as pd

htm = """<table cellpadding="5" cellspacing="0" class="borders" width="100%">
<tr>
<th colspan="2">
Learning Outcomes
</th>
</tr>
<tr>
<td class="info" colspan="2">
On successful completion of this module the learner will be able to:
</td>
</tr>
<tr>
<td style="width:10%;">
LO1
</td>
<td>
Demonstrate an awareness of the important role of Financial Accounting information as an input into the decision making process.
</td>
</tr>
<tr>
<td style="width:10%;">
LO2
</td>
<td>
Display an understanding of the fundamental accounting concepts, principles and conventions that underpin the preparation of Financial statements.
</td>
</tr>
<tr>
<td style="width:10%;">
LO3
</td>
<td>
Understand the various formats in which information in relation to transactions or events is recorded and classified.
</td>
</tr>
<tr>
<td style="width:10%;">
LO4
</td>
<td>
Apply a knowledge of accounting concepts,conventions and techniques such as double entry to the posting of recorded information to the T accounts in the Nominal Ledger.
</td>
</tr>
<tr>
<td style="width:10%;">
LO5
</td>
<td>
Prepare and present the financial statements of a Sole Trader in prescribed format from a Trial Balance accompanies by notes with additional information.
</td>
</tr>
</table> """

pd.read_html(htm,skiprows=2, flavor='bs4')[0].iloc[:, 1:]
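The question also asks for something that works across several similar tables. One possible sketch is below; it is not part of the original answer. It assumes every table is a two-column DataFrame whose first column holds IDs such as LO1, and it swaps skiprows for a filter on that ID pattern, so any heading rows are dropped no matter how they were parsed. The function name `second_column_text`, the `id_pattern` parameter, and the output name "outcome" are all hypothetical.

```python
import pandas as pd


def second_column_text(tables, id_pattern=r"LO\d+$"):
    """Collect second-column text from each table in `tables`.

    Assumes each table has at least two columns and that wanted rows
    carry an ID like "LO1" in the first column (an assumption, not
    something the original answer specifies).
    """
    pieces = []
    for table in tables:
        # Normalise the ID column so stray whitespace or NaN cannot match.
        ids = table.iloc[:, 0].astype(str).str.strip()
        keep = ids.str.match(id_pattern)
        pieces.append(table.loc[keep].iloc[:, 1])
    # pd.concat raises ValueError on an empty list, so pass at least one table.
    return pd.concat(pieces, ignore_index=True).rename("outcome")


# Hand-built stand-ins for two parsed tables, including heading rows.
t1 = pd.DataFrame(
    [
        ["Learning Outcomes", "Learning Outcomes"],
        ["LO1", "text a"],
        ["LO2", "text b"],
    ]
)
t2 = pd.DataFrame([["LO1", "text c"]])

result = second_column_text([t1, t2])
print(result.tolist())
```

With real pages, `tables` would be the list returned by pd.read_html. Recent pandas releases deprecate passing a literal HTML string to read_html, so wrapping the string in io.StringIO may be needed there.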


Regarding python - Shaping pandas read_html results into a simpler structure, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/41663182/
