
python - Getting an HTML table with pandas read_html doesn't work

Reposted. Author: 行者123. Updated: 2023-12-01 08:03:53

What works

I managed to get data from an HTML table with pd.read_html, like this:

In[1]:

import numpy as np
import pandas as pd
from tabulate import tabulate

URL = "https://coinmarketcap.com/all/views/all/"
df_in_list = pd.read_html(URL, attrs = {'id': 'currencies-all'})

# df_in_list has the df in element 0
df_raw = df_in_list[0]
df = df_in_list[0]

df = df[['#', 'Name', 'Symbol', 'Market Cap', 'Price' ]]

print(tabulate(df.head(), headers='keys', tablefmt='psql'))
Out[1]:

+----+-----+------------------+----------+-----------------+-----------+
|    |   # | Name             | Symbol   | Market Cap      | Price     |
|----+-----+------------------+----------+-----------------+-----------|
|  0 |   1 | BTC Bitcoin      | BTC      | $95,224,161,781 | $5398.69  |
|  1 |   2 | ETH Ethereum     | ETH      | $19,256,205,102 | $182.34   |
|  2 |   3 | XRP XRP          | XRP      | $15,031,762,618 | $0.359679 |
|  3 |   4 | LTC Litecoin     | LTC      | $5,530,275,811  | $90.24    |
|  4 |   5 | BCH Bitcoin Cash | BCH      | $5,514,209,793  | $311.17   |
+----+-----+------------------+----------+-----------------+-----------+

I found the table's id with Chrome DevTools:

<table class="table floating-header summary-table 
js-summary-table dataTable no-footer"
id="currencies-all" <!-- this is what I need -->
style="font-size: 14px; width: 100%;" role="grid">
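This works because pd.read_html matches attrs against the attributes of each table tag itself, and the id above sits on the table tag. A minimal offline sketch, assuming a made-up, trimmed-down copy of the markup (the real page has many more columns and rows); flavor="lxml" is pinned so the result does not depend on BeautifulSoup/html5lib being installed:

```python
import io

import pandas as pd

# Made-up, simplified markup: two tables, only one carries id="currencies-all".
HTML = """
<table id="other-table">
  <tr><th>Foo</th></tr>
  <tr><td>1</td></tr>
</table>
<table class="table summary-table" id="currencies-all">
  <thead><tr><th>#</th><th>Name</th><th>Symbol</th></tr></thead>
  <tbody>
    <tr><td>1</td><td>Bitcoin</td><td>BTC</td></tr>
    <tr><td>2</td><td>Ethereum</td><td>ETH</td></tr>
  </tbody>
</table>
"""

# attrs is compared with the attributes of each <table> tag, so only the
# second table survives the filter.
tables = pd.read_html(io.StringIO(HTML), attrs={"id": "currencies-all"}, flavor="lxml")
df = tables[0]
print(df)
```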

What doesn't work

Now I'm trying to get data from a different URL, without success. The URL is:

https://coinmarketcap.com/currencies/bitcoin/historical-data/?start=20130428&end=20190410

The table is inside this div:

<div id="historical-data" class="tab-pane active">

My code is:


In[2]:

import numpy as np
import pandas as pd
from tabulate import tabulate

URL = "https://coinmarketcap.com/currencies/bitcoin/historical-data/?start=20130428&end=20190410"
df_in_list = pd.read_html(URL, attrs = {'id': 'historical-data'})

# df_in_list has the df in element 0
df_raw = df_in_list[0]
df = df_in_list[0]

df = df[['#', 'Name', 'Symbol', 'Market Cap', 'Price' ]]

print(tabulate(df.head(), headers='keys', tablefmt='psql'))
Out[2]:

ValueError: No tables found

What am I missing?

EDIT

Apparently there is no table tag in the div I'm interested in:

<div id="historical-data" class="tab-pane active">

Is this what causes the error?

If so, how else can I get the data inside that div?

EDIT 2

I know coinmarketcap.com has an API, but I'd rather get the data from their website.
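One way to answer this is to check which table elements, if any, actually sit inside the div and what attributes they carry, since those attributes (not the div's id) are what read_html's attrs can match. A stdlib-only sketch using html.parser; TableFinder is a hypothetical helper, and the markup (including the inner wrapper div and the cell values) is a made-up stand-in for the real page:

```python
from html.parser import HTMLParser


class TableFinder(HTMLParser):
    """Collect the attributes of every <table> nested inside <div id=target_id>.

    Hypothetical helper for illustration. It tracks <div> nesting depth so it
    knows when the target div has been closed again.
    """

    def __init__(self, target_id):
        super().__init__()
        self.target_id = target_id
        self.depth = 0    # div nesting depth inside the target div (0 = outside it)
        self.tables = []  # one attribute dict per <table> found inside the target div

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "div":
            if self.depth:
                self.depth += 1
            elif attrs.get("id") == self.target_id:
                self.depth = 1
        elif tag == "table" and self.depth:
            self.tables.append(attrs)

    def handle_endtag(self, tag):
        if tag == "div" and self.depth:
            self.depth -= 1


# Made-up markup mirroring the structure described in the question.
HTML = """
<div id="historical-data" class="tab-pane active">
  <div class="table-responsive">
    <table class="table">
      <tr><th>Date</th><th>Close</th></tr>
      <tr><td>Apr 10, 2019</td><td>1.0</td></tr>
    </table>
  </div>
</div>
<table id="unrelated"><tr><td>x</td></tr></table>
"""

finder = TableFinder("historical-data")
finder.feed(HTML)
print(finder.tables)  # [{'class': 'table'}]
```

Whatever attributes show up here are the candidates to pass as attrs to pd.read_html.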

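Best answer

Yes, you are selecting the wrong element. read_html's attrs are matched against the attributes of table tags, and historical-data is the id of the surrounding div, not of a table.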

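If you change the df_in_list line to df_in_list = pd.read_html(URL, attrs = {'class': 'table'}), it should work.

You will also have to change the df = df[['#', 'Name', 'Symbol', 'Market Cap', 'Price' ]] line, because those columns don't exist in the new table you're scraping.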
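To see why the div id fails and the class works, here is a minimal offline sketch on made-up markup that mimics the structure described in the question (a table with class="table" inside div#historical-data). flavor="lxml" is pinned so a failed match surfaces as the ValueError directly, rather than pandas falling back to the bs4 parser; read is a hypothetical helper:

```python
import io

import pandas as pd

# Made-up markup: the id lives on the <div>, the <table> only has class="table".
HTML = """
<div id="historical-data" class="tab-pane active">
  <table class="table">
    <thead><tr><th>Date</th><th>Open</th><th>Close</th></tr></thead>
    <tbody><tr><td>Apr 10, 2019</td><td>1.0</td><td>2.0</td></tr></tbody>
  </table>
</div>
"""


def read(attrs):
    """Return the list of matching tables, or the ValueError message if none match."""
    try:
        return pd.read_html(io.StringIO(HTML), attrs=attrs, flavor="lxml")
    except ValueError as exc:
        return str(exc)


by_div_id = read({"id": "historical-data"})  # no <table> has this id -> error message
by_class = read({"class": "table"})          # matches the <table> tag itself

print(by_div_id)
print(by_class[0])
```

As far as I can tell from pandas' lxml backend, attrs values are compared against the whole attribute string, so {'class': 'table'} matches class="table" but not class="table summary-table"; the bs4/html5lib flavor matches individual class names instead. If a selector unexpectedly finds nothing, compare it with the exact attribute value shown in DevTools.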
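A quick way to pick the right columns is to print the columns of the newly scraped table before selecting any. Sketch on made-up rows; the column names Date, Open, Close and Volume are assumptions standing in for whatever the real historical-data table uses:

```python
import io

import pandas as pd

# Made-up stand-in for the historical-data table; real column names may differ.
HTML = """
<table class="table">
  <thead><tr><th>Date</th><th>Open</th><th>Close</th><th>Volume</th></tr></thead>
  <tbody>
    <tr><td>Apr 10, 2019</td><td>1.5</td><td>2.5</td><td>100</td></tr>
    <tr><td>Apr 09, 2019</td><td>1.0</td><td>1.5</td><td>200</td></tr>
  </tbody>
</table>
"""

df = pd.read_html(io.StringIO(HTML), attrs={"class": "table"}, flavor="lxml")[0]

# Look at the real column names instead of reusing the ones from the other table.
print(list(df.columns))  # ['Date', 'Open', 'Close', 'Volume']

wanted = ["Date", "Close"]
missing = [c for c in wanted if c not in df.columns]
if missing:
    raise KeyError(f"columns not in this table: {missing}")
df = df[wanted]
print(df)
```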

Regarding "python - Getting an HTML table with pandas read_html doesn't work", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/55620196/
