
Python loop and web scraping | BeautifulSoup

Reposted. Author: 行者123. Updated: 2023-11-28 00:40:56

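I'm currently trying to loop over the web scrape below.

My problem is that I only get the first football player from the table (the table HTML is included below) instead of all 10 players. My first thought is that the loop isn't working, but I'm not sure where I went wrong. I'm using BeautifulSoup to collect the data.

TL;DR: only 1 player appears in my CSV file, rather than the 10 players available in the HTML.

Python code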

from urllib.request import urlopen as uReq
from urllib.request import Request
from bs4 import BeautifulSoup as soup

my_url = "https://www.fctables.com/teams/stoke-194901/"

#opening up connection , grabbing page
uClient = uReq(my_url)
page_html = uClient.read()
uClient.close()

#html parsing
page_soup = soup(page_html, "html.parser")

topScorers = page_soup.findAll("table",{"class":"table table-striped table-bordered table-hover stage-table table-condensed top_scores"})


filename = "stokeGoals.csv"
f = open(filename, "w")

headers = "player, goal_scored, average_goal"

f.write(headers)

for topScorer in topScorers:
    #top 10 players who scored
    player = topScorer.a["title"]

    #top 10 goalscorers for the team
    goalpp = topScorer.findAll("div", {"class": "progress"})

    #average goal per game
    avg = topScorer.findAll("div", {"class": "label label-primary"})
    avgpp = avg[0].text.strip()


    print("player: " + player)
    print("goal_scored: " + goalpp)
    print("AVG: "+ avgpp)

    f.write(player + "," +goalpp.replace("," , "|")+ "," + avgpp +"\n")

f.close()

HTML code: the table/website I'm scraping the data from

<table class="table table-striped table-bordered table-hover stage-table table-condensed top_scores">
<thead>
<tr>
<th>#</th>
<th class="tl">Player</th>
<th data-toggle="tooltip" title="Goals scores by player / Goals scores by his team">goals</th>
<th data-toggle="tooltip" title="Average goals">
Avg
</th>
</tr>
</thead>
<tbody>
<tr>
<td>1</td>
<td class="tl psh" data-id="212996">
<img alt="Benik Afobe" class="img-circle" height="20" src="https://static.fctables.com/upload/images/20x20/s4/s4glg58a2350823d58/benik-afobe.png" width="20" /> <a href="/players/benik_afobe-212996/" title="Benik Afobe">Afobe</a>
<div class="slider">
<div class="inner"></div>
</div>
</td>
<td width="30%">
<div class="progress">
<div aria-valuemax="100" aria-valuemin="0" aria-valuenow="55" class="progress-bar progress-bar-primary" role="progressbar" style="width: 100%;">
<span class="goal_p">6</span>
</div>
</div>
</td>
<td>
<div class="label label-primary">0.4</div>
</td>
</tr>
<tr>
<td>2</td>
<td class="tl psh" data-id="320050">
<img alt="Thomas Ince" class="img-circle" height="20" src="https://static.fctables.com/upload/images/20x20/t5/t5ni157c703a92110b/thomas-ince.jpg" width="20" /> <a href="/players/thomas_ince-320050/" title="Thomas Ince">Ince</a>
<div class="slider">
<div class="inner"></div>
</div>
</td>
<td width="30%">
<div class="progress">
<div aria-valuemax="100" aria-valuemin="0" aria-valuenow="55" class="progress-bar progress-bar-primary" role="progressbar" style="width: 83.333333333333%;">
<span class="goal_p">5</span>
</div>
</div>
</td>
<td>
<div class="label label-primary">0.6</div>
</td>
</tr>
<tr>
<td>3</td>
<td class="tl psh" data-id="308648">
<img alt="Saido Berahino" class="img-circle" height="20" src="https://static.fctables.com/upload/images/20x20/po/poyhu58a234e0da106/saido-berahino.png" width="20" /> <a href="/players/saido_berahino-308648/" title="Saido Berahino">Berahino</a>
<div class="slider">
<div class="inner"></div>
</div>
</td>
<td width="30%">
<div class="progress">
<div aria-valuemax="100" aria-valuemin="0" aria-valuenow="55" class="progress-bar progress-bar-primary" role="progressbar" style="width: 66.666666666667%;">
<span class="goal_p">4</span>
</div>
</div>
</td>
<td>
<div class="label label-primary">0.3</div>
</td>
</tr>
<tr>
<td>4</td>
<td class="tl psh" data-id="257340">
<img alt="Joe Allen" class="img-circle" height="20" src="https://static.fctables.com/upload/images/20x20/6w/6w45558a234deae78e/joe-allen.png" width="20" /> <a href="/players/joe_allen-257340/" title="Joe Allen">Allen</a>
<div class="slider">
<div class="inner"></div>
</div>
</td>
<td width="30%">
<div class="progress">
<div aria-valuemax="100" aria-valuemin="0" aria-valuenow="55" class="progress-bar progress-bar-primary" role="progressbar" style="width: 50%;">
<span class="goal_p">3</span>
</div>
</div>
</td>
<td>
<div class="label label-primary">0.4</div>
</td>
</tr>
<tr>
<td>5</td>
<td class="tl psh" data-id="234407">
<img alt="Erik Pieters" class="img-circle" height="20" src="https://static.fctables.com/upload/images/20x20/et/et08558a234dd63b68/erik-pieters.png" width="20" /> <a href="/players/erik_pieters-234407/" title="Erik Pieters">Pieters</a>
<div class="slider">
<div class="inner"></div>
</div>
</td>
<td width="30%">
<div class="progress">
<div aria-valuemax="100" aria-valuemin="0" aria-valuenow="55" class="progress-bar progress-bar-primary" role="progressbar" style="width: 50%;">
<span class="goal_p">3</span>
</div>
</div>
</td>
<td>
<div class="label label-primary">0.4</div>
</td>
</tr>
<tr>
<td>6</td>
<td class="tl psh" data-id="299368">
<img alt="Peter Crouch" class="img-circle" height="20" src="https://static.fctables.com/upload/images/20x20/qp/qptn558a234df86f1f/peter-crouch.png" width="20" /> <a href="/players/peter_crouch-299368/" title="Peter Crouch">Crouch</a>
<div class="slider">
<div class="inner"></div>
</div>
</td>
<td width="30%">
<div class="progress">
<div aria-valuemax="100" aria-valuemin="0" aria-valuenow="55" class="progress-bar progress-bar-primary" role="progressbar" style="width: 33.333333333333%;">
<span class="goal_p">2</span>
</div>
</div>
</td>
<td>
<div class="label label-primary">0.3</div>
</td>
</tr>
<tr>
<td>7</td>
<td class="tl psh" data-id="214479">
<img alt="Bojan Krkic" class="img-circle" height="20" src="https://static.fctables.com/upload/images/20x20/pl/pleyv57eaedf0afeac/bojan-krkic.jpg" width="20" /> <a href="/players/bojan_krkic-214479/" title="Bojan Krkic">Krkic</a>
<div class="slider">
<div class="inner"></div>
</div>
</td>
<td width="30%">
<div class="progress">
<div aria-valuemax="100" aria-valuemin="0" aria-valuenow="55" class="progress-bar progress-bar-primary" role="progressbar" style="width: 33.333333333333%;">
<span class="goal_p">2</span>
</div>
</div>
</td>
<td>
<div class="label label-primary">0.4</div>
</td>
</tr>
<tr>
<td>8</td>
<td class="tl psh" data-id="253114">
<img alt="James McClean" class="img-circle" height="20" src="https://static.fctables.com/upload/images/20x20/gb/gbjmm58a234f55a560/james-mcclean.png" width="20" /> <a href="/players/james_mcclean-253114/" title="James McClean">McClean</a>
<div class="slider">
<div class="inner"></div>
</div>
</td>
<td width="30%">
<div class="progress">
<div aria-valuemax="100" aria-valuemin="0" aria-valuenow="55" class="progress-bar progress-bar-primary" role="progressbar" style="width: 16.666666666667%;">
<span class="goal_p">1</span>
</div>
</div>
</td>
<td>
<div class="label label-primary">0.1</div>
</td>
</tr>
<tr>
<td>9</td>
<td class="tl psh" data-id="309022">
<img alt="Sam Clucas" class="img-circle" height="20" src="https://static.fctables.com/upload/images/20x20/g7/g7dig58a234cb144a3/sam-clucas.png" width="20" /> <a href="/players/sam_clucas-309022/" title="Sam Clucas">Clucas</a>
<div class="slider">
<div class="inner"></div>
</div>
</td>
<td width="30%">
<div class="progress">
<div aria-valuemax="100" aria-valuemin="0" aria-valuenow="55" class="progress-bar progress-bar-primary" role="progressbar" style="width: 16.666666666667%;">
<span class="goal_p">1</span>
</div>
</div>
</td>
<td>
<div class="label label-primary">0.3</div>
</td>
</tr>
<tr>
<td>10</td>
<td class="tl psh" data-id="215724">
<img alt="Bruno Martins Indi" class="img-circle" height="20" src="https://static.fctables.com/upload/images/20x20/hk/hkung58a234de0dfaa/bruno-martins-indi.png" width="20" /> <a href="/players/bruno_martins_indi-215724/" title="Bruno Martins Indi">Indi</a>
<div class="slider">
<div class="inner"></div>
</div>
</td>
<td width="30%">
<div class="progress">
<div aria-valuemax="100" aria-valuemin="0" aria-valuenow="55" class="progress-bar progress-bar-primary" role="progressbar" style="width: 16.666666666667%;">
<span class="goal_p">1</span>
</div>
</div>
</td>
<td>
<div class="label label-primary">0.2</div>
</td>
</tr>
</tbody>

Best answer

The page you specified loads its data through an XMLHttpRequest.

You can get the HTML directly from:

https://www.fctables.com/xml/table_participant/?template_id=&season_id=52%2C38%2C88&type_home=overall&type=top_score&lang_id=2&team_id=194901&limit=10

With the URL above you get all the information you need without the extra HTML noise, i.e.:

from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as soup

my_url = "https://www.fctables.com/xml/table_participant/?template_id=&season_id=52%2C38%2C88&type_home=overall&type=top_score&lang_id=2&team_id=194901&limit=10"

# fetch the HTML fragment returned by the XHR endpoint
uClient = uReq(my_url)
page_html = uClient.read()
uClient.close()

page_soup = soup(page_html, "html.parser")

# each list holds one element per player, in table order
_names = page_soup.findAll("img",{"class":"img-circle"})
_goals = page_soup.findAll("span",{"class":"goal_p"})
_avg = page_soup.findAll("div",{"class":"label label-primary"})

# walk the three lists in step and print name, average, goals
for name, avg, goals in zip(_names, _avg, _goals):
    print(name['alt'], avg.get_text(), goals.get_text())

Benik Afobe 0.4 6
Thomas Ince 0.6 5
Saido Berahino 0.3 4
Joe Allen 0.4 3
Erik Pieters 0.4 3
Peter Crouch 0.3 2
Bojan Krkic 0.4 2
James McClean 0.1 1
Sam Clucas 0.3 1
Bruno Martins Indi 0.2 1
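
As for why the question's loop produced a single player: findAll("table", ...) most likely returns a list with just one table element, so the loop body runs once and topScorer.a picks up only the first link. An alternative to the parallel lists above is to iterate over the table rows. The following is a minimal sketch, assuming the fetched markup has the same shape as the table pasted in the question. SAMPLE_HTML is two rows trimmed from that table, and parse_scorers and scorers_to_csv are hypothetical helper names, not part of the original answer.

```python
import csv
import io

from bs4 import BeautifulSoup

# Two rows copied from the table in the question, trimmed to the relevant tags.
SAMPLE_HTML = """
<table class="table top_scores">
<thead><tr><th>#</th><th>Player</th><th>goals</th><th>Avg</th></tr></thead>
<tbody>
<tr>
<td>1</td>
<td class="tl psh"><a href="/players/benik_afobe-212996/" title="Benik Afobe">Afobe</a></td>
<td><div class="progress"><span class="goal_p">6</span></div></td>
<td><div class="label label-primary">0.4</div></td>
</tr>
<tr>
<td>2</td>
<td class="tl psh"><a href="/players/thomas_ince-320050/" title="Thomas Ince">Ince</a></td>
<td><div class="progress"><span class="goal_p">5</span></div></td>
<td><div class="label label-primary">0.6</div></td>
</tr>
</tbody>
</table>
"""


def parse_scorers(html):
    """Return one (player, goals, average) tuple per data row."""
    page = BeautifulSoup(html, "html.parser")
    scorers = []
    for row in page.find_all("tr"):
        link = row.find("a", title=True)
        goals = row.find("span", class_="goal_p")
        avg = row.find("div", class_="label-primary")
        # The header row has none of these tags, so it is skipped here.
        if not (link and goals and avg):
            continue
        scorers.append(
            (link["title"], goals.get_text(strip=True), avg.get_text(strip=True))
        )
    return scorers


def scorers_to_csv(scorers):
    """Serialize the tuples as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["player", "goal_scored", "average_goal"])
    writer.writerows(scorers)
    return buffer.getvalue()


rows = parse_scorers(SAMPLE_HTML)
print(scorers_to_csv(rows))
```

Using csv.writer instead of manual string concatenation also takes care of the newline after the header and of quoting any value that contains a comma.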

Note:

Adjust the URL values as needed; you can change type (top_score here), team_id, limit, etc.
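
One way to vary those values without editing the query string by hand is to build it with urllib.parse.urlencode. This is only a sketch: build_table_url is a hypothetical helper, the parameter names and defaults are copied from the URL above, and which other values the endpoint accepts is not something the answer states.

```python
from urllib.parse import urlencode

BASE_URL = "https://www.fctables.com/xml/table_participant/"


def build_table_url(team_id, limit=10, table_type="top_score", season_id="52,38,88"):
    """Assemble the endpoint URL; defaults mirror the URL used in the answer."""
    params = {
        "template_id": "",
        "season_id": season_id,  # the commas are percent-encoded as %2C
        "type_home": "overall",
        "type": table_type,
        "lang_id": 2,
        "team_id": team_id,
        "limit": limit,
    }
    return BASE_URL + "?" + urlencode(params)


print(build_table_url(194901))
```

Calling build_table_url(194901, limit=5) would request a shorter list for the same team, assuming the server honours the limit parameter.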

Regarding Python loop and web scraping | BeautifulSoup, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/53598517/
