
Scrape SofaScore with Python for info on team lineups and votes

Reposted. Author: bug小助手. Updated: 2023-10-28 11:51:10



I'm in search of some help. I would like to scrape quantitative information from SofaScore (https://www.sofascore.com/) about Serie A teams, specifically the starting lineups, the ratings assigned by the website, and possibly some more advanced statistics. However, my knowledge of HTML and web scraping is limited, and I'm struggling to extract this information from the site.



Currently, I'm attempting to extract this data for a single game, but I'm unsure how to generalize the code to collect information for all the rounds and teams.



Below is the code I've written so far, but it seems that the part with BeautifulSoup's find method is not targeting the correct section of the website.



import bs4
from bs4 import BeautifulSoup as bs
import requests
import webbrowser

link='https://www.sofascore.com/sassuolo-atalanta/LdbsTfb'
user_agent = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/116.0.0.0 Safari/537.36 Edg/116.0.1938.691'

response=requests.get(link, headers={'user-agent':user_agent})
response.raise_for_status()

soup=bs(response.text, 'html.parser')
div_voti=soup.find('div', class_="sc-fqkvVR eeeBnr sc-d8bc48b6-2 cUcAWg")
print(div_voti)

I understand this might be a basic question, but I'm feeling a bit lost. Thank you to anyone who can provide assistance!



Recommended answer

The data you see on the page is loaded from an external URL via JavaScript, so BeautifulSoup never sees it in the HTML that requests downloads. To simulate these requests, you can call the API endpoint directly, as in this example:



from itertools import zip_longest

import requests
from bs4 import BeautifulSoup

headers = {
"User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/117.0"
}

url = "https://www.sofascore.com/sassuolo-atalanta/LdbsTfb"
soup = BeautifulSoup(requests.get(url, headers=headers).content, "html.parser")
id_ = soup.select_one('link[href*="android-app:"]')["href"].split("/")[-1]

lineups_url = f"https://api.sofascore.com/api/v1/event/{id_}/lineups"

# for goals, substitutions etc use this url:
# incidents_url = "https://api.sofascore.com/api/v1/event/11407341/incidents"

lineups = requests.get(lineups_url, headers=headers).json()
for h, a in zip_longest(lineups["home"]["players"], lineups["away"]["players"]):
    if h:
        h = h["player"]["name"] + f" ({h['player']['position']})"
    else:
        h = "-"

    if a:
        a = a["player"]["name"] + f" ({a['player']['position']})"
    else:
        a = "-"

    print(f"{h:<50} {a:<50}")

Prints:



Andrea Consigli (G)                                Juan Musso (G)
Jeremy Toljan (D)                                  Berat Djimsiti (D)
Martin Erlić (D)                                   Giorgio Scalvini (D)
Mattia Viti (D)                                    Sead Kolašinac (D)
Matías Viña (D)                                    Davide Zappacosta (M)
Matheus Henrique (M)                               Marten de Roon (M)
Maxime López (M)                                   Teun Koopmeiners (M)
Grégoire Defrel (F)                                Mario Pašalić (F)
Nedim Bajrami (M)                                  Matteo Ruggeri (M)
Armand Laurienté (F)                               Ademola Lookman (F)
Andrea Pinamonti (F)                               Duván Zapata (F)
Filippo Missori (D)                                Éderson (M)
Kristian Thorstvedt (M)                            Charles De Ketelaere (M)
Kevin Miranda (D)                                  Gianluca Scamacca (F)
Cristian Volpato (M)                               Nadir Zortea (D)
Samuele Mulattieri (F)                             Michel Ndary Adopo (M)
Gianluca Pegolo (G)                                Francesco Rossi (G)
Alessio Cragno (G)                                 Marco Carnesecchi (G)
Yeferson Paz (M)                                   Rafael Tolói (D)
Luca Lipani (M)                                    Caleb Okoli (D)
Daniel Boloca (M)                                  Mitchel Bakker (M)
Emil Konradsen Ceide (F)                           Luis Muriel (F)
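
The question also asks for starting lineups and ratings, which the loop above does not separate out. Here is a minimal sketch of how that could look, assuming each entry in the "players" list also carries a "substitute" flag and a "statistics" object with a "rating" value. Those two field names are assumptions (they do not appear in the code above), so check them against the JSON you actually receive. The helper name lineup_rows and the sample data are made up for illustration.

```python
def lineup_rows(lineups, side):
    """Return (name, position, is_starter, rating) tuples for "home" or "away"."""
    rows = []
    for entry in lineups[side]["players"]:
        player = entry["player"]
        # ASSUMPTION: "substitute" and "statistics" -> "rating" are field names
        # that the answer above does not show. With .get(), a missing flag
        # counts the player as a starter and a missing rating becomes None.
        is_starter = not entry.get("substitute", False)
        rating = entry.get("statistics", {}).get("rating")
        rows.append((player["name"], player.get("position"), is_starter, rating))
    return rows


# Hand-made sample in the shape the answer's code relies on (not real data).
sample = {
    "home": {"players": [
        {"player": {"name": "Player A", "position": "G"},
         "substitute": False, "statistics": {"rating": 6.8}},
        {"player": {"name": "Player B", "position": "F"}, "substitute": True},
    ]},
    "away": {"players": []},
}

starters = [row for row in lineup_rows(sample, "home") if row[2]]
print(starters)
```

With real data you would pass the dictionary returned by requests.get(lineups_url, headers=headers).json() instead of the sample.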

More replies

Thank you very much! I will now try to generalize it over the rounds to see what happens.
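
One possible way to start generalizing is to wrap the two steps from the answer (find the event id on the match page, then build the lineups URL) into small functions that can be called for every match page you collect. The sketch below swaps BeautifulSoup for a regular expression so that it runs with the standard library only. The function names and the sample HTML are hypothetical, and the regex assumes the id is the last path segment of the android-app link, as the answer's split("/")[-1] implies.

```python
import re

# URL pattern taken from the answer above.
LINEUPS_TEMPLATE = "https://api.sofascore.com/api/v1/event/{event_id}/lineups"


def event_id_from_html(html):
    """Return the numeric id at the end of the android-app link, or None."""
    # ASSUMPTION: the href is double-quoted and ends in "/<digits>".
    match = re.search(r'href="android-app:[^"]*/(\d+)"', html)
    return match.group(1) if match else None


def lineups_url(event_id):
    """Build the lineups API URL for one event id."""
    return LINEUPS_TEMPLATE.format(event_id=event_id)


# Hand-made snippet for illustration; the real page markup may differ.
sample_html = (
    '<link rel="alternate" '
    'href="android-app://com.sofascore.results/https/www.sofascore.com/event/11407341">'
)

print(lineups_url(event_id_from_html(sample_html)))
```

To cover several matches, you could loop over a list of match page URLs, download each page with requests.get(url, headers=headers).text, pass the text to event_id_from_html, and then request the resulting lineups URL, ideally with a short pause between requests. How to obtain the list of match pages for every round is not covered in this thread.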

