python - How to parse BTC historical data from Coinmarketcap?

Reposted. Author: 太空宇宙. Updated: 2023-11-04 08:24:46

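I am trying to learn how to scrape BTC historical data from Coinmarketcap.com using Python, requests, and BeautifulSoup.

I would like to parse the following:

1) Date

2) Close

3) Volume

4) Market Cap

Here is my code so far: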

import requests
from bs4 import BeautifulSoup
from fake_useragent import UserAgent

ua = UserAgent()
header = {'user-agent': ua.chrome}
response = requests.get('https://coinmarketcap.com/currencies/bitcoin/historical-data/', headers=header)

# html.parser
soup = BeautifulSoup(response.content,'lxml')

tags = soup.find_all('td')
print(tags)

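I am able to fetch the data I need, but I am not sure how to parse it properly. I would prefer the dates to go back as far as possible ("All Time"). Any advice would be appreciated. Thanks in advance!

Best answer

Edit

CoinMarketCap appears to have changed their DOM, so here is an update: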

import lxml.html
import requests
from typing import Dict, List


def coinmarketcap_get_btc(start_date: str, end_date: str) -> List[Dict]:
    # Build the url
    url = f'https://coinmarketcap.com/currencies/bitcoin/historical-data/?start={start_date}&end={end_date}'
    # Make the request and parse the tree
    response = requests.get(url, timeout=5)
    tree = lxml.html.fromstring(response.text)
    # Extract table and raw data
    table = tree.find_class('cmc-table')[0]
    xpath_0, xpath_1 = 'div[3]/div/table/thead/tr', 'div[3]/div/table/tbody/tr/td[%d]/div'
    cols = [_.text_content() for _ in table.xpath(xpath_0 + '/th')]
    dates = (_.text_content() for _ in table.xpath(xpath_1 % 1))
    m = map(lambda d: (float(_.text_content().replace(',', '')) for _ in table.xpath(xpath_1 % d)),
            range(2, 8))
    return [{k: v for k, v in zip(cols, _)} for _ in zip(dates, *m)]

Getting a DataFrame from the result is as simple as calling pd.DataFrame.from_dict.
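The question only asks for the date, close, volume, and market cap, so the list of dicts can be trimmed before it is turned into a DataFrame. Below is a minimal sketch of that step. It assumes the header text matches the example output further down ('Open*', 'Close**', 'Market Cap'). The helper name select_columns and the hard-coded sample rows are hypothetical stand-ins for a real response.

```python
from typing import Dict, List, Sequence

# Hypothetical sample rows shaped like the return value of coinmarketcap_get_btc;
# the header text (e.g. 'Close**') is assumed from the example output.
rows = [
    {'Date': 'Oct 19 2019', 'Open*': 7973.80, 'High': 8082.63, 'Low': 7944.78,
     'Close**': 7988.56, 'Volume': 1.379783e10, 'Market Cap': 1.438082e11},
    {'Date': 'Oct 18 2019', 'Open*': 8100.93, 'High': 8138.41, 'Low': 7902.16,
     'Close**': 7973.21, 'Volume': 1.565159e10, 'Market Cap': 1.435176e11},
]

WANTED = ('Date', 'Close', 'Volume', 'Market Cap')


def select_columns(rows: List[Dict], wanted: Sequence[str] = WANTED) -> List[Dict]:
    """Keep only the wanted columns, ignoring footnote asterisks in the headers."""
    selected = []
    for row in rows:
        # 'Close**' -> 'Close', 'Open*' -> 'Open'
        cleaned = {key.rstrip('*'): value for key, value in row.items()}
        selected.append({name: cleaned[name] for name in wanted})
    return selected


trimmed = select_columns(rows)
print(trimmed[0])
```

The trimmed list has the same shape as before, so it can be passed to pandas in the same way.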


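Original

You can use requests and lxml for this:

Here is a function, coinmarketcap_get_btc, that takes the start and end dates as arguments and collects the relevant data: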

import lxml.html
import pandas
import requests


def float_helper(string):
    try:
        return float(string)
    except ValueError:
        return None


def coinmarketcap_get_btc(start_date: str, end_date: str) -> pandas.DataFrame:
    # Build the url
    url = f'https://coinmarketcap.com/currencies/bitcoin/historical-data/?start={start_date}&end={end_date}'
    # Make the request and parse the tree
    response = requests.get(url, timeout=5)
    tree = lxml.html.fromstring(response.text)
    # Extract table and raw data
    table = tree.find_class('table-responsive')[0]
    raw_data = [_.text_content() for _ in table.find_class('text-right')]
    # Process the data
    col_names = ['Date'] + raw_data[:6]
    row_list = []
    for x in raw_data[6:]:
        _, date, _open, _high, _low, _close, _vol, _m_cap, _ = x.replace(',', '').split('\n')
        row_list.append([date, float_helper(_open), float_helper(_high), float_helper(_low),
                         float_helper(_close), float_helper(_vol), float_helper(_m_cap)])
    return pandas.DataFrame(data=row_list, columns=col_names)

You can always ignore the columns you are not interested in and add more functionality (for example, accepting datetime.datetime objects as dates).
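One way to accept datetime objects is to normalize both arguments before the URL is built. Below is a minimal sketch of that idea. It assumes the 'YYYYMMDD' format used in the example call is what the URL expects. The helper name to_cmc_date is hypothetical and not part of the original answer.

```python
import datetime
from typing import Union

DateLike = Union[str, datetime.date]


def to_cmc_date(value: DateLike) -> str:
    """Return a 'YYYYMMDD' string; strings are passed through unchanged."""
    # datetime.datetime is a subclass of datetime.date, so this covers both.
    if isinstance(value, datetime.date):
        return value.strftime('%Y%m%d')
    return value


print(to_cmc_date(datetime.datetime(2013, 4, 28)))  # 20130428
print(to_cmc_date('20191020'))                      # 20191020
```

Calling to_cmc_date on start_date and end_date at the top of coinmarketcap_get_btc would let the function accept either type without changing the rest of its body.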

Note that the f-string used to build the URL requires Python 3.6 or later, so if you are on an older version, use either 'string{var}'.format(var=var) or 'string%s' % var instead.
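For reference, the three ways of building the URL can be compared side by side. This is a small sketch. The constant BASE and the variable names are illustrative only, and the f-string line itself needs Python 3.6+ to parse.

```python
BASE = 'https://coinmarketcap.com/currencies/bitcoin/historical-data/'
start_date, end_date = '20130428', '20191020'

# Python 3.6+ only
url_fstring = f'{BASE}?start={start_date}&end={end_date}'
# Works on older versions as well
url_format = '{base}?start={start}&end={end}'.format(base=BASE, start=start_date, end=end_date)
url_percent = '%s?start=%s&end=%s' % (BASE, start_date, end_date)

# All three spellings produce the same string
assert url_fstring == url_format == url_percent
print(url_percent)
```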

Example

df = coinmarketcap_get_btc(start_date='20130428', end_date='20191020')
df
# Date Open* High Low Close** Volume Market Cap
# 0 Oct 19 2019 7973.80 8082.63 7944.78 7988.56 1.379783e+10 1.438082e+11
# 1 Oct 18 2019 8100.93 8138.41 7902.16 7973.21 1.565159e+10 1.435176e+11
# 2 Oct 17 2019 8047.81 8134.83 8000.94 8103.91 1.431305e+10 1.458540e+11
# 3 Oct 16 2019 8204.67 8216.81 7985.09 8047.53 1.607165e+10 1.448240e+11
# 4 Oct 15 2019 8373.46 8410.71 8182.71 8205.37 1.522041e+10 1.476501e+11
# ... ... ... ... ... ... ... ...
# 2361 May 02 2013 116.38 125.60 92.28 105.21 NaN 1.168517e+09
# 2362 May 01 2013 139.00 139.89 107.72 116.99 NaN 1.298955e+09
# 2363 Apr 30 2013 144.00 146.93 134.05 139.00 NaN 1.542813e+09
# 2364 Apr 29 2013 134.44 147.49 134.00 144.54 NaN 1.603769e+09
# 2365 Apr 28 2013 135.30 135.98 132.10 134.21 NaN 1.488567e+09
#
# [2366 rows x 7 columns]
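The Date column above holds plain strings such as 'Oct 19 2019', listed newest first. If real dates are needed, for sorting or filtering, they can be parsed with the standard library. This is a minimal sketch, assuming English month abbreviations and the default C locale. The helper name parse_cmc_date is hypothetical.

```python
import datetime


def parse_cmc_date(text: str) -> datetime.date:
    """Parse a string like 'Oct 19 2019' into a datetime.date."""
    return datetime.datetime.strptime(text.strip(), '%b %d %Y').date()


# A few date strings taken from the example output, newest first
dates = ['Oct 19 2019', 'Oct 18 2019', 'Apr 28 2013']
parsed = sorted(parse_cmc_date(d) for d in dates)  # oldest first
print(parsed[0].isoformat())  # 2013-04-28
```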

Regarding "python - How to parse BTC historical data from Coinmarketcap?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/58471421/
