
javascript - How to read a periodically generated innerHTML element with BeautifulSoup?

Reposted. Author: 行者123. Updated: 2023-11-28 03:19:09

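intraday.pro has an online status indicator that is updated repeatedly at regular intervals. The element is generated dynamically by JavaScript code that sets its innerHTML.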

I checked the HTML with the browser's Inspect Element, and this is the code:

<div id="is_online">
    <font color="green">Online</font>
</div>

I use the code below, but it returns None and cannot find the online status.

from bs4 import BeautifulSoup
import requests

r = requests.get("http://intraday.pro/")
soup = BeautifulSoup(r.text, 'html.parser')

is_online = True
while is_online:
    # soup is parsed once from the static HTML, so every pass sees the same content
    items = soup.find_all("div", {"id": "is_online"})[0].decode_contents()
    if items:
        print(items)
        is_online = False

I have also tried:

items = soup.find_all("font")
for item in items:
    print(item.get_text())

but I still can't find the online status.

And this is the JavaScript code that generates the online status:

<script type="text/javascript">

var errtime = 0;
var ftime = 1;
var lastPair = '';

function subscribe(url) {

    var xhr = new XMLHttpRequest();

    if(ftime == 1)
        xhr.open('GET', '/script/table.php?ft=1', true);
    else
        xhr.open('GET', '/script/table.php', true);

    xhr.send();
    xhr.onreadystatechange = function()
    {
        if (xhr.readyState != 4) return;

        var isonline = document.getElementById('is_online');

        if (xhr.status != 200) {
            errtime += 1;
            if(errtime < 3)
            {
                setTimeout( subscribe('/script/table.php') , 30000);
            } else {
                // offline
                isonline.innerHTML = "<font color='red'><b>Offline</b>. Please refresh this page after few minutes</font>";
            }
        } else {
            // online
            isonline.innerHTML = "<font color='green'>online</font>";

            var result = JSON.parse(xhr.responseText);

            var stat24h = document.getElementById('stat24h');
            stat24h.innerHTML = result.stat;

            var table1 = result.table;

            var last1 = result.last;
            var tsumm = 0;
            for(var i=3;i<21;i++)
            {
                for(var j=1;j<14;j++)
                {
                    tsumm = 100*i + j;

                    var test = document.getElementById(i+"_"+j);

                    if(table1[tsumm] != null && test)
                    {
                        test.innerHTML = table1[tsumm];
                    } else {
                        if(test)
                            test.innerHTML = " ";
                    }
                }
            }

            errtime = 0;
            ftime = 2;
            subscribe('/script/table.php');

            if(lastPair != last1 && lastPair != "")
            {
                lastPair = last1;
                soundClick();
            } else {
                lastPair = last1;
            }
        }
    }
}

function soundClick() {
    var audio = new Audio();
    audio.src = '/libs/sounds/sound1.mp3';
    audio.autoplay = true;
}

</script>
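Because the status is simply the outcome of the script's XHR call, another option is to skip the browser and make that request yourself. The sketch below uses only the Python standard library and mirrors the script's logic: a 200 reply from /script/table.php means online, and three consecutive failures mean offline. It assumes (unverified) that the endpoint answers a plain GET without the cookies or headers a browser would send, and that a successful reply is the JSON object with stat, table and last fields that the script reads. The helper names http_get, poll_once and table_cells are hypothetical. The 30-second delay follows the 30000 in the script's setTimeout call; note that, as written, setTimeout( subscribe(...) , 30000) calls subscribe immediately and passes its return value to setTimeout, so the page itself retries without waiting.

```python
import json
import time
import urllib.error
import urllib.request
from typing import Callable, Dict, Optional, Tuple

BASE_URL = "http://intraday.pro"  # site from the question

# A fetcher returns (HTTP status, body text); status 0 means no reply at all.
Fetcher = Callable[[str], Tuple[int, str]]


def http_get(path: str) -> Tuple[int, str]:
    """GET BASE_URL + path (hypothetical helper).

    Assumption: the endpoint accepts a plain GET without browser cookies/headers.
    """
    try:
        with urllib.request.urlopen(BASE_URL + path, timeout=10) as resp:
            return resp.status, resp.read().decode("utf-8", errors="replace")
    except urllib.error.HTTPError as err:  # non-2xx replies
        return err.code, ""
    except OSError:  # URLError, timeouts, refused connections
        return 0, ""


def poll_once(fetch: Fetcher = http_get, max_errors: int = 3,
              delay: float = 30.0) -> Tuple[str, Optional[dict]]:
    """Mirror the page's script: a 200 reply means 'online' and carries JSON;
    max_errors consecutive failures mean 'offline'."""
    errors = 0
    while True:
        # The script keeps sending ?ft=1 until its first successful reply.
        status, body = fetch("/script/table.php?ft=1")
        if status == 200:
            return "online", json.loads(body)
        errors += 1
        if errors >= max_errors:
            return "offline", None
        time.sleep(delay)


def table_cells(result: dict) -> Dict[str, str]:
    """Map result['table'] onto the page's cell ids 'i_j' using the script's
    key 100*i + j, with a single space for missing or null entries."""
    table = result.get("table") or {}
    cells = {}
    for i in range(3, 21):
        for j in range(1, 14):
            key = 100 * i + j
            if isinstance(table, list):
                value = table[key] if key < len(table) else None
            else:  # JSON object keys arrive as strings
                value = table.get(str(key))
            cells[f"{i}_{j}"] = value if value is not None else " "
    return cells
```

Usage (this makes a real request): state, data = poll_once(), then print(state); when state is "online", table_cells(data) gives the same cell values the script writes into the page. Passing a fake fetch function lets the retry logic be exercised without touching the site.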

Is there any way in BeautifulSoup to get the HTML element once it has been generated by JavaScript?

Thanks

Best answer

from selenium import webdriver
from bs4 import BeautifulSoup
from selenium.webdriver.firefox.options import Options
import time

# Run Firefox without opening a window
options = Options()
options.add_argument('--headless')

driver = webdriver.Firefox(options=options)
driver.get('http://intraday.pro/')
# Give the page's JavaScript time to fill in #is_online
time.sleep(3)
# page_source reflects the DOM as the browser currently has it, including JS changes
html = driver.page_source
soup = BeautifulSoup(html, 'html.parser')
status = soup.find('div', {'id': 'is_online'})
print(status.text)

driver.quit()

Output:

online
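The fixed time.sleep(3) above assumes the script's first request completes within three seconds. A variant is to poll until the script has actually written a status. The sketch below does this with Selenium's WebDriverWait and a callable condition, assuming Selenium 4 with Firefox and geckodriver installed, as the answer already requires; normalize_status and read_status are hypothetical helper names. Because the condition only accepts text starting with "online" or "offline" (the two strings the script writes), it should not be fooled by any placeholder text the div might start with.

```python
from typing import Optional


def normalize_status(text: str) -> Optional[str]:
    """Return 'online' or 'offline' once the script has written a status,
    otherwise None (div still empty or holding something else)."""
    words = text.strip().lower()
    if words.startswith("online"):
        return "online"
    if words.startswith("offline"):
        return "offline"
    return None


def read_status(url: str = "http://intraday.pro/", timeout: float = 15.0) -> str:
    """Load the page in headless Firefox and wait until #is_online holds a status."""
    # Imported here so normalize_status() can be used without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.firefox.options import Options
    from selenium.webdriver.support.ui import WebDriverWait

    options = Options()
    options.add_argument("--headless")
    driver = webdriver.Firefox(options=options)
    try:
        driver.get(url)
        # until() re-runs the callable (every 0.5 s by default) until it returns
        # a truthy value, then returns that value; it raises TimeoutException
        # if that never happens within `timeout` seconds.
        return WebDriverWait(driver, timeout).until(
            lambda d: normalize_status(d.find_element(By.ID, "is_online").text)
        )
    finally:
        driver.quit()
```

With a working browser setup, print(read_status()) should print online or offline; if neither appears within the timeout, Selenium raises TimeoutException instead of printing stale or empty text.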

Regarding "javascript - How to read a periodically generated innerHTML element with BeautifulSoup?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/59338432/
