python - pd.read_html changes the number format

Reposted. Author: 行者123. Updated: 2023-12-04 07:24:59

I can't get the 1,2,3,4,5,6 values in the CCCCCCC column as they appear in the HTML: pd.read_html changes them to 123456. My expected result should keep 1,2,3,4,5,6.

HTML code
html = """<html>
<body>
<div id="MMMMMMMM" class="MMMMMMMMMMM" style="">
<table class="OOOOOOOO" style="">
<thead>
<tr class="PPPPPPPPPP">
<td colspan="3" style="font-size:14px;font-weight:bold;" class="QQQQQQQQQQ">AAAAAAA</td>
</tr>
<tr class="RRRRRRRRRR">
<td>BBBBBB</td>
<td>CCCCCCC</td>
<td>AAAAAAA</td>
</tr>
</thead>
<tbody>
<tr class="SSSSSSSS">
<td rowspan="1">DDDDDD</td>
<td class="L_LLLL67">1,2,3,4,5,6</td>
<td class="L_LLLL67 f_tar">1234.56</td>
</tr>
<tr class="">
<td rowspan="3">EEEEEEEEE</td>
<td class="L_LLLL67">1,2,3,4,5,6</td>
<td class="L_LLLL67 f_tar">1234.56</td>
</tr>
<tr class="">
<td class="L_LLLL67">1,2,3,4,5,6</td>
<td class="L_LLLL67 f_tar">1234.56</td>
</tr>
<tr class="">
<td class="L_LLLL67">1,2,3,4,5,6</td>
<td class="L_LLLL67 f_tar">1234.56</td>
</tr>
<tr class="">
<td rowspan="1">FFFFFFFFF</td>
<td class="L_LLLL67">1,2,3,4,5,6</td>
<td class="L_LLLL67 f_tar">1234.56</td>
</tr>
<tr class="TTTTTT">
<td rowspan="1">GGGGGGGGG</td>
<td class="L_LLLL67">1,2,3,4,5,6</td>
<td class="L_LLLL67 f_tar">1234.56</td>
</tr>
<tr class="">
<td rowspan="1">HHHHHHHHH</td>
<td class="L_LLLL67">1,2,3,4,5,6</td>
<td class="L_LLLL67 f_tar">1234.56</td>
</tr>
<tr class="TTTTTTT">
<td rowspan="1">IIIIIIIIII</td>
<td class="L_LLLL67">1,2,3,4,5,6</td>
<td class="L_LLLL67 f_tar">1234.56</td>
</tr>
<tr class="">
<td rowspan="1">JJJJJJJJ</td>
<td class="L_LLLL67">1,2,3,4,5,6</td>
<td class="L_LLLL67 f_tar">1234.56</td>
</tr>
<tr class="TTTTT">
<td rowspan="2">KKKKKKKK</td>
<td class="L_LLLL67">1/2/3/4/5/6</td>
<td class="L_LLLL67 f_tar">1234.56</td>
</tr>
<tr class="TTTTTT">
<td class="L_LLLL67">1/2/3/4/5/6</td>
<td class="L_LLLL67 f_tar">1234.56</td>
</tr>
</tbody>
</table>
</div>
</body>
</html>"""
Python code
from bs4 import BeautifulSoup
import pandas as pd

soup = BeautifulSoup(html,'html.parser')
table = soup.find('div', attrs={'id':'MMMMMMMM'})
df_list = pd.read_html(str(table), header=1)
df_list
Actual result
[        BBBBBB      CCCCCCC  AAAAAAA
0       DDDDDD       123456  1234.56
1    EEEEEEEEE       123456  1234.56
2    EEEEEEEEE       123456  1234.56
3    EEEEEEEEE       123456  1234.56
4    FFFFFFFFF       123456  1234.56
5    GGGGGGGGG       123456  1234.56
6    HHHHHHHHH       123456  1234.56
7   IIIIIIIIII       123456  1234.56
8     JJJJJJJJ       123456  1234.56
9     KKKKKKKK  1/2/3/4/5/6  1234.56
10    KKKKKKKK  1/2/3/4/5/6  1234.56]
Expected result
[        BBBBBB      CCCCCCC  AAAAAAA
0       DDDDDD  1,2,3,4,5,6  1234.56
1    EEEEEEEEE  1,2,3,4,5,6  1234.56
2    EEEEEEEEE  1,2,3,4,5,6  1234.56
3    EEEEEEEEE  1,2,3,4,5,6  1234.56
4    FFFFFFFFF  1,2,3,4,5,6  1234.56
5    GGGGGGGGG  1,2,3,4,5,6  1234.56
6    HHHHHHHHH  1,2,3,4,5,6  1234.56
7   IIIIIIIIII  1,2,3,4,5,6  1234.56
8     JJJJJJJJ  1,2,3,4,5,6  1234.56
9     KKKKKKKK  1/2/3/4/5/6  1234.56
10    KKKKKKKK  1/2/3/4/5/6  1234.56]

Best answer

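You need to pass the thousands parameter and set it to None. Its default is ',', so read_html treats the commas in 1,2,3,4,5,6 as thousands separators and strips them.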

from bs4 import BeautifulSoup
import pandas as pd

soup = BeautifulSoup(html, 'html.parser')
table = soup.find('div', attrs={'id': 'MMMMMMMM'})
# thousands=None: do not treat ',' as a thousands separator, so the cell text stays as-is
df_list = pd.read_html(str(table), header=1, thousands=None)
df_list
Output:
[        BBBBBB      CCCCCCC  AAAAAAA
0       DDDDDD  1,2,3,4,5,6  1234.56
1    EEEEEEEEE  1,2,3,4,5,6  1234.56
2    EEEEEEEEE  1,2,3,4,5,6  1234.56
3    EEEEEEEEE  1,2,3,4,5,6  1234.56
4    FFFFFFFFF  1,2,3,4,5,6  1234.56
5    GGGGGGGGG  1,2,3,4,5,6  1234.56
6    HHHHHHHHH  1,2,3,4,5,6  1234.56
7   IIIIIIIIII  1,2,3,4,5,6  1234.56
8     JJJJJJJJ  1,2,3,4,5,6  1234.56
9     KKKKKKKK  1/2/3/4/5/6  1234.56
10    KKKKKKKK  1/2/3/4/5/6  1234.56]
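
To see the effect in isolation, here is a minimal sketch that parses one small table twice, once with the default thousands=',' and once with thousands=None. The table, the column names (name, codes, amount) and the variable names are made up for illustration and do not come from the question. The markup is wrapped in StringIO because recent pandas releases deprecate passing a literal HTML string to read_html. The sketch assumes a parser backend such as lxml (or bs4 with html5lib) is installed.

```python
from io import StringIO

import pandas as pd

# Hypothetical table, trimmed down from the shape of the question's HTML.
sample_html = """
<table>
  <thead>
    <tr><th>name</th><th>codes</th><th>amount</th></tr>
  </thead>
  <tbody>
    <tr><td>DDDDDD</td><td>1,2,3,4,5,6</td><td>1234.56</td></tr>
    <tr><td>KKKKKKKK</td><td>1/2/3/4/5/6</td><td>1234.56</td></tr>
  </tbody>
</table>
"""

# Default behaviour: ',' is treated as a thousands separator, so a cell that
# looks numeric once the commas are removed loses its commas.
default_df = pd.read_html(StringIO(sample_html))[0]

# thousands=None switches that handling off, so the cell text is kept verbatim.
verbatim_df = pd.read_html(StringIO(sample_html), thousands=None)[0]

print(default_df["codes"].astype(str).tolist())
print(verbatim_df["codes"].tolist())
```

If other columns in the same table really are comma-grouped numbers, one option is to read everything with thousands=None and convert just those columns afterwards, for example by removing the commas with Series.str.replace and then calling pd.to_numeric.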

Regarding python - pd.read_html changes the number format, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/68264711/