
python - Get a list from HTML source using python3

Reposted. Author: 行者123. Updated: 2023-12-04 10:14:07

I am trying to get the Cases list of COVID-19 positive cases from https://www.worldometers.info/, for example this page.
The relevant part of the page source looks like this (around line 700):

<script type="text/javascript">
Highcharts.chart('coronavirus-cases-linear', {
chart: {
type: 'line'
},
title: {
text: 'Total Cases'
},

subtitle: {
text: '(Linear Scale)'
},

xAxis: {
categories: ["Feb 15","Feb 16","Feb 17","Feb 18","Feb 19","Feb 20","Feb 21","Feb 22","Feb 23","Feb 24","Feb 25","Feb 26","Feb 27","Feb 28","Feb 29","Mar 01","Mar 02","Mar 03","Mar 04","Mar 05","Mar 06","Mar 07","Mar 08","Mar 09","Mar 10","Mar 11","Mar 12","Mar 13","Mar 14","Mar 15","Mar 16","Mar 17","Mar 18","Mar 19","Mar 20","Mar 21","Mar 22","Mar 23","Mar 24","Mar 25","Mar 26","Mar 27","Mar 28","Mar 29","Mar 30","Mar 31","Apr 01","Apr 02","Apr 03","Apr 04","Apr 05","Apr 06","Apr 07","Apr 08","Apr 09","Apr 10","Apr 11"] },

yAxis: {
title: {
text: 'Total Coronavirus Cases'
}


},
legend: {
layout: 'vertical',
align: 'right',
verticalAlign: 'middle'
},

credits: {
enabled: false
},


series: [{
name: 'Cases',
color: '#33CCFF',
lineWidth: 5,
## I NEED THIS LIST
data: [2,2,2,2,2,2,2,2,2,3,9,13,25,33,58,84,120,165,228,282,401,525,674,1231,1695,2277,3146,5232,6391,7988,9942,11826,14769,18077,21571,25496,28768,35136,42058,49515,57786,65719,73235,80110,87956,95923,104118,112065,119199,126168,131646,136675,141942,148220,153222,158273,163027] }],
responsive: {
rules: [{
condition: {
maxWidth: 800
},
chartOptions: {
legend: {
layout: 'horizontal',
align: 'center',
verticalAlign: 'bottom'
}
}
}]
}

});
I am using bs4 like this:
#!/usr/bin/env python3
import requests as req
from bs4 import BeautifulSoup as bs

resp = req.get("https://www.worldometers.info/coronavirus/country/spain/")
soup = bs(resp.text, 'lxml')
scripts = soup.find_all("script")
for script in scripts:
    # script.string holds the inline JavaScript (None for external scripts)
    if script.string and "Cases" in script.string:
        print(script.string)
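This gets me to the script, but I do not know how to extract the data after that.
The list I am looking for is marked with the comment ## I NEED THIS LIST. Please help.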

Best answer

You can write a regular expression for it:

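import requests as req
import re

resp = req.get("https://www.worldometers.info/coronavirus/country/spain/")
# Starting right after name: 'Cases', lazily skip to the next "data: [...]" and capture it
p = re.compile(r"(?<=name:\s'Cases')[\s\S]+?data:\s(\[.*?\])")
# findall returns the captured list as a string, e.g. "[2,2,2,...]"
print(p.findall(resp.text)[0])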
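
The regular expression yields the list as text rather than as a Python list. The sketch below shows one way to convert it, assuming the captured text is a valid JSON array (which holds for the sample in the question). `SAMPLE_HTML` is a trimmed stand-in for the real page source and `extract_cases` is a hypothetical helper name, neither comes from the original answer. It runs offline, so the pattern can be checked without fetching the page.

```python
import json
import re

# Trimmed stand-in for the page source (assumption: the real page is much
# longer but keeps the same "name: 'Cases' ... data: [...]" layout).
SAMPLE_HTML = """
<script type="text/javascript">
Highcharts.chart('coronavirus-cases-linear', {
    xAxis: {
        categories: ["Feb 15","Feb 16","Feb 17"]
    },
    series: [{
        name: 'Cases',
        color: '#33CCFF',
        lineWidth: 5,
        data: [2,3,9]
    }]
});
</script>
"""

# Same pattern as in the answer: starting right after name: 'Cases', lazily
# skip ahead to the first "data: [...]" and capture the bracketed list.
CASES_PATTERN = re.compile(r"(?<=name:\s'Cases')[\s\S]+?data:\s(\[.*?\])")


def extract_cases(html):
    """Return the 'Cases' series as a Python list (hypothetical helper)."""
    match = CASES_PATTERN.search(html)
    if match is None:
        raise ValueError("no 'Cases' series found in the page source")
    # The captured text is assumed to be a JSON array, so json.loads parses it.
    return json.loads(match.group(1))


cases = extract_cases(SAMPLE_HTML)
print(cases)  # prints [2, 3, 9]
```

With the live page, the same helper could be called as `extract_cases(resp.text)`. Raising `ValueError` when nothing matches is a design choice here, so that a change in the page layout fails loudly instead of producing an `IndexError` on an empty `findall` result.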


Regarding "python - Get a list from HTML source using python3", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/61167275/
