
Python Selenium: scraping tbody

Reposted. Author: 太空宇宙. Updated: 2023-11-03 16:21:24

Below is the HTML I am trying to scrape:

<div class="data-point-container section-break">
    # some other HTML div classes here which I don't need
    <table class data-bind="showHidden: isData">
        <!-- ko foreach : sections -->
        <thead>...</thead>
        <tbody>...</tbody>
        <thead>...</thead>
        <tbody>...</tbody>
        <thead>...</thead>
        <tbody>...</tbody>
        <thead>...</thead>
        <tbody>...</tbody>
        <thead>...</thead>
        <tbody>...</tbody>
        <!-- /ko -->
    </table>
</div>

How can I use pandas.read_html to scrape all of this, with each thead as the header and each tbody as the values?

Edit:

This is the website I am trying to scrape, with the data extracted into a pandas DataFrame. Link here

Best answer

Strictly speaking, one should not have more than one thead element per table, according to the table element specification.

If you nevertheless have this structure of a thead followed by its corresponding tbody, I would parse it iteratively, giving each such pair its own DataFrame.

Working example:

from io import StringIO

import pandas as pd
from bs4 import BeautifulSoup

data = """
<div class="data-point-container section-break">
    <table class data-bind="showHidden: isData">

        <thead>
            <tr><th>Customer</th><th>Order</th><th>Month</th></tr>
        </thead>
        <tbody>
            <tr><td>Customer 1</td><td>#1</td><td>January</td></tr>
            <tr><td>Customer 2</td><td>#2</td><td>April</td></tr>
            <tr><td>Customer 3</td><td>#3</td><td>March</td></tr>
        </tbody>

        <thead>
            <tr><th>Customer</th></tr>
        </thead>
        <tbody>
            <tr><td>Customer 4</td></tr>
            <tr><td>Customer 5</td></tr>
            <tr><td>Customer 6</td></tr>
        </tbody>

    </table>
</div>
"""

soup = BeautifulSoup(data, "html.parser")
for thead in soup.select(".data-point-container table thead"):
    # Pair each thead with the tbody that follows it
    tbody = thead.find_next_sibling("tbody")

    # Wrap the pair in its own <table> so read_html sees a single, well-formed table
    table = "<table>%s</table>" % (str(thead) + str(tbody))

    # Recent pandas versions expect a file-like object rather than a raw HTML string
    df = pd.read_html(StringIO(table))[0]
    print(df)
    print("-----")

This prints 2 DataFrames, one for each thead & tbody pair in the sample input HTML:

     Customer Order    Month
0  Customer 1    #1  January
1  Customer 2    #2    April
2  Customer 3    #3    March
-----
     Customer
0  Customer 4
1  Customer 5
2  Customer 6
-----

Note that, for demonstration purposes, I deliberately made the number of header and data cells differ between the blocks.
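
If you want to keep the frames instead of printing them, the same thead/tbody pairing can be wrapped in a small helper. This is only a sketch of a variation, not the code from the answer: the function name `sections_to_frames` and its `table_selector` parameter are made up for illustration, and it builds each DataFrame directly from the cell text rather than calling pandas.read_html. It assumes every row in a tbody has as many td cells as its thead has th cells; otherwise the DataFrame constructor will raise. With Selenium, the HTML string would presumably be `driver.page_source`, taken after the page has finished rendering the sections.

```python
import pandas as pd
from bs4 import BeautifulSoup


def sections_to_frames(html, table_selector=".data-point-container table"):
    """Return one DataFrame per thead/tbody pair (hypothetical helper)."""
    soup = BeautifulSoup(html, "html.parser")
    frames = []
    for thead in soup.select(f"{table_selector} thead"):
        tbody = thead.find_next_sibling("tbody")
        if tbody is None:
            continue  # a header with no body after it: nothing to tabulate
        # Header cells become the column names
        columns = [th.get_text(strip=True) for th in thead.find_all("th")]
        # Each tr in the body becomes one row of cell text
        rows = [
            [td.get_text(strip=True) for td in tr.find_all("td")]
            for tr in tbody.find_all("tr")
        ]
        frames.append(pd.DataFrame(rows, columns=columns))
    return frames


# Small made-up sample in the same shape as the page in the question
sample = """
<div class="data-point-container section-break">
    <table class data-bind="showHidden: isData">
        <thead><tr><th>Customer</th><th>Order</th><th>Month</th></tr></thead>
        <tbody>
            <tr><td>Customer 1</td><td>#1</td><td>January</td></tr>
            <tr><td>Customer 2</td><td>#2</td><td>April</td></tr>
            <tr><td>Customer 3</td><td>#3</td><td>March</td></tr>
        </tbody>
        <thead><tr><th>Customer</th></tr></thead>
        <tbody>
            <tr><td>Customer 4</td></tr>
            <tr><td>Customer 5</td></tr>
            <tr><td>Customer 6</td></tr>
        </tbody>
    </table>
</div>
"""

frames = sections_to_frames(sample)
for frame in frames:
    print(frame)
    print("-----")
```

Returning a list keeps the sections separate, which matters here because they do not share the same columns; concatenating them would only make sense if the headers matched.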

Regarding Python Selenium scraping tbody, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/38417462/
