
python - Beautiful Soup or the Python requests library does not detect certain tags

Reposted. Author: 行者123. Updated: 2023-12-01 07:28:18

I have the following code, which tells me that there is no 'tbody' tag inside the 'table' tag with id='md_7_1' on the page being scraped:

from bs4 import BeautifulSoup
import requests
import re


url = "https://www.uefa.com/uefaeuro/season=2016/matches/all/index.html"

html = requests.request(method='GET', url=url).text
soup = BeautifulSoup(html, 'lxml')

matches_index = soup.body.find('div', id=re.compile('matchesindex')).find('div', class_='session').find('table', id='md_7_1')
tbody_tags = matches_index.find_all('tbody')
print(len(tbody_tags))

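However, inspecting the page's HTML source in the browser shows that the 'tbody' tag does exist (the original post included a snapshot of the inspector here). I don't quite understand why this happens. How can I retrieve the information inside the 'tbody' tag?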
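One way to narrow this kind of problem down is to count the tags in the HTML text the server actually returns, since the browser inspector shows the DOM after scripts have run. The following is a minimal sketch using only the standard library. `TagCounter`, `count_start_tags`, and the sample markup are hypothetical; the markup stands in for the text returned by `requests` and is not copied from the real page.

```python
from html.parser import HTMLParser


class TagCounter(HTMLParser):
    """Count how many times each start tag occurs in an HTML string."""

    def __init__(self):
        super().__init__()
        self.counts = {}

    def handle_starttag(self, tag, attrs):
        self.counts[tag] = self.counts.get(tag, 0) + 1


def count_start_tags(html, tag):
    """Return the number of <tag> start tags literally present in html."""
    counter = TagCounter()
    counter.feed(html)
    counter.close()
    return counter.counts.get(tag, 0)


# Hypothetical stand-in for requests.get(url).text: the table is an empty
# placeholder that a script would fill in later.
raw_html = """
<div id="matchesindex">
  <table id="md_7_1"></table>
  <script>/* rows are fetched and inserted here at runtime */</script>
</div>
"""

print(count_start_tags(raw_html, "table"))
print(count_start_tags(raw_html, "tbody"))
```

If the count for 'tbody' is zero in the raw text while the inspector shows the tag, the rows are most likely added after the initial page load, and the parser is not at fault.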

Best answer

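The data is loaded asynchronously via Ajax, so the HTML that requests receives does not yet contain the table rows. You can, however, use requests to retrieve the site's HTML fragments directly (here I only get the score and the team names, but you can select more information from each fragment):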

import re
from bs4 import BeautifulSoup
import requests

url = 'https://www.uefa.com/uefaeuro/season=2016/matches/all/index.html'

data_url = 'https://www.uefa.com/{}/season={}/matches/library/fixtures/day={}/session={}/_matchesbydate.html'

soup = BeautifulSoup(requests.get(url).text, 'lxml')

cupfolder = re.findall(r"var cupfolder.*?'(.*?)'", str(soup))[0]
season = re.findall(r"var season.*?'(.*?)'", str(soup))[0]

for table in soup.select('table[id^="md_"]'):
    # Table ids have the form "md_<day>_<session>"
    _, day, session = table['id'].split('_')
    # Fetch the HTML fragment that the page itself loads via Ajax
    s = BeautifulSoup(requests.get(data_url.format(cupfolder, season, day, session)).content, 'lxml')
    h, a, score = s.select_one('td.home').text, s.select_one('td.away').text, s.select_one('td.score').text
    match_url = s.select_one('a.sc')
    print('{: <30}{: ^10}{: >30}'.format(h, score, a))
    print('Match url = {}'.format('https://www.uefa.com' + match_url['href']))
    print('-' * 70)

Prints:

Portugal                         1-0                            France
Match url = https://www.uefa.com/uefaeuro/season=2016/matches/round=2000451/match=2017907/index.html
----------------------------------------------------------------------
Germany                          0-2                            France
Match url = https://www.uefa.com/uefaeuro/season=2016/matches/round=2000450/match=2017906/index.html
----------------------------------------------------------------------
Portugal                         2-0                             Wales
Match url = https://www.uefa.com/uefaeuro/season=2016/matches/round=2000450/match=2017905/index.html
----------------------------------------------------------------------
France                           5-2                           Iceland
Match url = https://www.uefa.com/uefaeuro/season=2016/matches/round=2000449/match=2017904/index.html
----------------------------------------------------------------------
Germany                          1-1                             Italy
Match url = https://www.uefa.com/uefaeuro/season=2016/matches/round=2000449/match=2017903/index.html
----------------------------------------------------------------------
Wales                            3-1                           Belgium
Match url = https://www.uefa.com/uefaeuro/season=2016/matches/round=2000449/match=2017902/index.html
----------------------------------------------------------------------

...and so on.
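The two steps the answer relies on, reading the JavaScript variables from the page source and building the fragment URL from a table id, can be isolated into small helpers and checked without network access. This is only a sketch under stated assumptions: `extract_js_var`, `fragment_url`, and `sample_source` are hypothetical names, and the sample script line is invented for illustration, so the real page may lay out its inline script differently.

```python
import re

# Same URL template as in the answer above.
DATA_URL = ('https://www.uefa.com/{}/season={}/matches/library/fixtures/'
            'day={}/session={}/_matchesbydate.html')


def extract_js_var(page_source, name):
    """Return the single-quoted value assigned to `var <name>`, or None."""
    match = re.search(r"var {}.*?'(.*?)'".format(re.escape(name)), page_source)
    return match.group(1) if match else None


def fragment_url(cupfolder, season, table_id):
    """Build the fragment URL for a table id of the form 'md_<day>_<session>'."""
    _, day, session = table_id.split('_')
    return DATA_URL.format(cupfolder, season, day, session)


# Hypothetical sample; not copied from the real page.
sample_source = "<script>var cupfolder = 'uefaeuro'; var season = '2016';</script>"

cupfolder = extract_js_var(sample_source, 'cupfolder')
season = extract_js_var(sample_source, 'season')
print(fragment_url(cupfolder, season, 'md_7_1'))
```

Unlike the `re.findall(...)[0]` calls in the answer, which raise an IndexError when the variable is missing, `extract_js_var` returns None, so a changed page layout can be detected before any fragment request is made.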

Regarding python - Beautiful Soup or the Python requests library does not detect certain tags, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/57340792/
