
python - Parsing a table from a website

Reposted. Author: 太空宇宙. Updated: 2023-11-03 21:23:54

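There is a website, https://ru.myip.ms/browse/market_bitcoin/%D0%91%D0%B8%D1%82%D0%BA%D0%BE%D0%B8%D0%BD_%D0%B8%D1%81%D1%82%D0%BE%D1%80%D0%B8%D1%8F_%D1%86%D0%B5%D0%BD.html#a , with a table of BTC prices further down the page, and I need to parse that table. I tried the code below, but for some reason the prices in the printed table show up as dots.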

from time import sleep
import pandas as pd
import requests

host = 'ru.myip.ms'
index_url = 'https://ru.myip.ms'
home_url = "https://ru.myip.ms/browse/market_bitcoin/%D0%91%D0%B8%D1%82%D0%BA%D0%BE%D0%B8%D0%BD_%D0%B8%D1%81%D1%82%D0%BE%D1%80%D0%B8%D1%8F_%D1%86%D0%B5%D0%BD.html#a"
base_ajax_url = "https://ru.myip.ms/ajax_table/market_bitcoin/{page}"


with requests.Session() as session:
    session.headers = {
        'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/71.0.3578.98 Safari/537.36',
        'Host': host
    }

    # visit home page and parse the initial dataframe
    response = session.get(home_url)

    df = pd.read_html(response.text, attrs={"id": "market_bitcoin_tbl"})[0]
    df = df.rename(columns=lambda x: x.strip())  # remove extra newlines from the column names

    sleep(2)

    # start paginating through the AJAX endpoint
    page = 1
    while True:
        url = base_ajax_url.format(page=page)
        print("Processing {url}...".format(url=url))

        response = session.post(url,
                                data={'getpage': 'yes', 'lang': 'ru'},
                                headers={
                                    'X-Requested-With': 'XMLHttpRequest',
                                    'Origin': index_url,
                                    'Referer': home_url
                                })

        # add data to the existing dataframe
        try:
            new_df = pd.read_html("<table>{0}</table>".format(response.text))[0]
        except ValueError:  # could not extract data from HTML - last page?
            break

        new_df.columns = df.columns
        df = pd.concat([df, new_df])

        page += 1
        sleep(1)


print(df)

Best answer

You are doing it correctly, and you already have the result. Try this to see it:

print(df['Bitcoin Price'])

You see the dots only because df is large and pandas cannot display all of it when printing, but the data is really there.
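To make the point above concrete, here is a minimal sketch of how pandas abbreviates a long DataFrame when printing, and two ways to see every row. The data is synthetic and stands in for the scraped table: the `Day` column and all values are assumptions made for illustration, and only the `Bitcoin Price` column name comes from the answer.

```python
import pandas as pd

# Synthetic stand-in for the scraped table; the values are made up for illustration.
df = pd.DataFrame({
    "Day": range(200),
    "Bitcoin Price": [3000.0 + i for i in range(200)],
})

# Default repr: pandas abbreviates frames longer than display.max_rows
# (60 by default) and puts a row of dots in the middle.
short_text = repr(df)

# Option 1: render every row to a string, regardless of display options.
full_text = df.to_string()

# Option 2: lift the row limit temporarily, only inside this block.
with pd.option_context("display.max_rows", None):
    context_text = repr(df)

print(len(short_text.splitlines()), "lines in the abbreviated output")
print(len(full_text.splitlines()), "lines in the full output")
```

For the real table, `print(df.to_string())` or `pd.set_option('display.max_rows', None)` before `print(df)` should show all rows. Writing the frame out with `df.to_csv(...)` is usually more practical for large results.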

Regarding python - Parsing a table from a website, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/53999112/
