
python - Web scraping dynamic content in Python

Reposted. Author: 行者123. Updated: 2023-12-01 06:55:20

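I am trying to get a specific number from this URL, 'https://www.ulb.uni-muenster.de/', via web scraping. The number is dynamic. Unfortunately, when I search for the number I only get the class, not the number itself. When I inspect the page in Chrome, I can clearly see the number in the source. I have tried two approaches: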

import seaborn as sns
from urllib.request import urlopen
from bs4 import BeautifulSoup

url = 'https://www.ulb.uni-muenster.de/'
html = urlopen(url)
soup = BeautifulSoup(html, 'lxml')
tags = soup.find('span', {'class': 'seatingsCounter'})
print(tags)

Output: <span class="seatingsCounter"></span>

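import requests
from bs4 import BeautifulSoup

r = requests.get('https://www.ulb.uni-muenster.de/')
data = BeautifulSoup(r.content, 'lxml')
examples = []
for d in data.find_all('a'):
    examples.append(d)
# Search the freshly parsed document rather than `soup` from the first attempt
my_as = data.find_all("span", {"class": "seatingsCounter"})
print(my_as)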

Output: [<span class="seatingsCounter"></span>]

Neither of them works: the output is always just the empty span with its class, never the number.

Best answer

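If you look at the page source, you will see that the number of free seats is filled in by the JavaScript function showMessage, which is why the span is empty in the HTML that urlopen and requests receive: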

var showMessage = function(data) {
    var locations = [ "ZB_LS", "ZB_RS" ];
    var free = 0;
    var total = 0;
    var open = true;
    $('.availableSeatings .spinner').remove();
    $('.availableSeatings .error').data('counter', 0);
    $.each(data.locations, function( key, value ) {
        if ($.inArray( value.id, locations) !== -1)
        {
            free = free + Math.round((100 - value.quota) * value.places/100);
            total = total + value.places;
            open = open && value.open;
        }
    });

    if (open)
    {
        $('.availableSeatings .message').show().siblings().hide();
        quota = Math.round(free/total * 100);
        result = free + '<span class="quota">(' + quota + '%)</span>';
        date = $.format.date(data.datetime, "dd.MM.yyyy, HH:mm");
        $('.availableSeatings .seatingsCounter').html(result); // <- HERE!!
        $('.availableSeatings .updated .datetime').text(date);
        $('.availableSeatings .updated').show();
    } else {
        $('.availableSeatings .closed').show().siblings().hide();
    }
};
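To make the arithmetic in showMessage concrete, here is a rough Python port of just the counting part. It is a sketch, not the site's code: the names summarize_seatings, js_round and MAIN_LOCATIONS are made up for illustration, and the sample values are illustrative. It assumes, as the JavaScript does, that "quota" is the occupied percentage of "places".

```python
import math

# Location ids that showMessage sums up (taken from the JavaScript above).
MAIN_LOCATIONS = ("ZB_LS", "ZB_RS")


def js_round(x):
    """Round half up like JavaScript's Math.round (Python's round() rounds half to even)."""
    return math.floor(x + 0.5)


def summarize_seatings(data, locations=MAIN_LOCATIONS):
    """Return (free_seats, free_percent, is_open) for the selected locations."""
    free = 0
    total = 0
    is_open = True
    for loc in data["locations"]:
        if loc["id"] in locations:
            # quota is treated as the occupied share, so 100 - quota is the free share.
            free += js_round((100 - loc["quota"]) * loc["places"] / 100)
            total += loc["places"]
            is_open = is_open and loc["open"]
    free_percent = js_round(free / total * 100) if total else 0
    return free, free_percent, is_open


# Illustrative input with the same shape as the data passed to showMessage.
sample = {
    "locations": [
        {"id": "ZB_LS", "open": True, "quota": 99, "places": 678},
        {"id": "ZB_RS", "open": True, "quota": 94, "places": 154},
        {"id": "VSTH", "open": True, "quota": 56, "places": 145},  # ignored: not in MAIN_LOCATIONS
    ]
}

print(summarize_seatings(sample))  # (16, 2, True)
```

With these values the page would show 16 free seats (2%): 7 from ZB_LS plus 9 from ZB_RS, out of 832 places.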

Further down in the source you will see this call:

$.ajax({
    dataType: "json",
    url: "/available-seatings.json", // <-- THIS LOOKS INTERESTING
    timeout: 40000,
    success: function(data) { showMessage(data); },
    error: function() {
        counter = $('.availableSeatings .error').data('counter');
        if (isNaN(counter) || counter >= 3)
        {
            showError();
        } else {
            $('.availableSeatings .error').data('counter', counter + 1);
        }
    },
    complete: function() {
        setTimeout(worker, 60000);
    }
});

If we go to https://www.ulb.uni-muenster.de/available-seatings.json we see something like this:

{"datetime":"2019-11-13 13:49:46","locations":[{"id":"ZB_LS","label":"Zentralbibliothek Lesesaal","open":true,"quota":99,"places":678},{"id":"ZB_RS","label":"Zentralbibliothek Recherchesaal","open":true,"quota":94,"places":154},{"id":"VSTH","label":"Bibliothek im Vom-Stein-Haus","open":true,"quota":56,"places":145},{"id":"RWS1","label":"Bibliothek im Rechtswissenschaftlichen Seminar I \/ Einzelarbeitszone","open":true,"quota":98,"places":352},{"id":"RWS1_G","label":"Bibliothek im Rechtswissenschaftlichen Seminar I \/ Gruppenarbeitszone","open":true,"quota":30,"places":40},{"id":"RWS2","label":"Bibliothek im Rechtswissenschaftlichen Seminar II","open":true,"quota":54,"places":162},{"id":"WIWI","label":"Fachbereichsbibliothek Wirtschaftswissenschaften \/ Einzelarbeitszone","open":true,"quota":71,"places":132},{"id":"WIWI_G","label":"Fachbereichsbibliothek Wirtschaftswissenschaften \/ Gruppenarbeitszone","open":true,"quota":98,"places":45},{"id":"ZBSOZ","label":"Zweigbibliothek Sozialwissenschaften","open":true,"quota":74,"places":129},{"id":"FHAUS","label":"Gemeinschaftsbibliothek im F\u00fcrstenberghaus","open":true,"quota":68,"places":197},{"id":"IFE","label":"Bibliothek des Instituts f\u00fcr Erziehungswissenschaft","open":true,"quota":47,"places":183},{"id":"PHI","label":"Bibliotheken im Philosophikum (Domplatz 23)","open":true,"quota":68,"places":98}]}

Voilà: reading that JSON with Python's json module is probably easier than rewriting the scraper to use Selenium, although that would work too.
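A minimal sketch of that approach, using only the standard library, might look like the following. It assumes the endpoint still exists and still returns the shape shown above; the function names parse_seatings and fetch_seatings are hypothetical, and the 40-second timeout simply mirrors the page's own AJAX setting.

```python
import json
from urllib.request import urlopen

# Endpoint found in the page's $.ajax call; assumed to be still available.
SEATINGS_URL = "https://www.ulb.uni-muenster.de/available-seatings.json"


def parse_seatings(text):
    """Parse the JSON body and map each location id to its estimated free seats."""
    data = json.loads(text)
    return {
        # Python's round() differs from Math.round only on exact .5 values.
        loc["id"]: round((100 - loc["quota"]) * loc["places"] / 100)
        for loc in data["locations"]
    }


def fetch_seatings(url=SEATINGS_URL, timeout=40):
    """Download the endpoint and return the per-location free-seat estimate."""
    with urlopen(url, timeout=timeout) as response:
        return parse_seatings(response.read().decode("utf-8"))
```

Calling fetch_seatings() returns a dictionary keyed by location id, so the number shown on the page would be roughly the sum of the "ZB_LS" and "ZB_RS" entries. Keeping the parsing separate from the download means parse_seatings can be tried on a saved copy of the response without touching the network.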

This post, python - Web scraping dynamic content in Python, is based on a similar question found on Stack Overflow: https://stackoverflow.com/questions/58837231/
