
python - Formatting BeautifulSoup output

Reposted. Author: 行者123. Last updated: 2023-12-01 08:08:41

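After reading the BeautifulSoup documentation, I managed to write a short Python script that scrapes a table and prints it, but I can't figure out how to format the output as a table. The end goal is to get football match predictions from the website https://afootballreport.com/predictions/over-1.5-goals/ and save them to a text file.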

Here is the code I have written so far:

import urllib.request
from bs4 import BeautifulSoup

def make_soup(url):
    # Download the page and parse it with the built-in HTML parser
    thepage = urllib.request.urlopen(url)
    soupdata = BeautifulSoup(thepage, "html.parser")
    return soupdata

soup = make_soup("https://afootballreport.com/predictions/over-1.5-goals/")
# Print the text of every cell in every table row, one cell per line
for record in soup.findAll('tr'):
    for data in record.findAll('td'):
        print(data.text.strip())

This is the output:

03/28
17:30
Iceland Reykjavik Youth Cup


Fjölnir / Vængir U19
Valur / KH U19
Over 1.5
Valur / KH U19 have over 1.5 goals in 100% of their games in the last 2 months (total games 6).
03/28
17:30
Saudi Arabia Pro League


Al Ittifaq
Al Quadisiya
Over 1.5
Al Ittifaq have over 1.5 goals in 100% of their games in the last 2 months (total games 8).

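I would like each row on a single line, with one column each for date, time, football league, home team, away team, tip, and description. Like this: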

Date, Time, Football League, HomeTeam, AwayTeam, Tip, Description
03/28, 17:30, Iceland Reykjavik Youth Cup, Fjölnir / Vængir U19, Valur / KH U19, Over 1.5, Valur / KH U19 have over 1.5 goals in 100% of their games in the last 2 months (total games 6).

Can anyone help me?

Accepted answer

You've done a lot of work. Whenever I see a <table> tag, I first try pandas' .read_html(). It does most of the work for you, and you can then manipulate the dataframe as needed.

import pandas as pd

# read_html returns a list with one DataFrame per <table> on the page
tables = pd.read_html('https://afootballreport.com/predictions/over-1.5-goals/')
table = tables[0]

# Some cells hold several fields. The separator is assumed to be two spaces,
# since a single space would also split names such as "Al Ittifaq".
table[['Date', 'Time']] = table['Home team - Away team'].str.split('  ', expand=True)
table = table.drop(['Home team - Away team'], axis=1)
table = table.rename(columns={"Unnamed: 3": "Description"})

table[['Football League', 'Home Team', 'Away Team']] = table['Tip'].str.split('  ', expand=True)
table = table.drop(['Tip'], axis=1)

Output:

print(table.head(5).to_string())
   Logic     Description                                        Date   Time   Football League              Home Team             Away Team
0  Over 1.5  Valur / KH U19 have over 1.5 goals in 100% of ...  03/28  17:30  Iceland Reykjavik Youth Cup  Fjölnir / Vængir U19  Valur / KH U19
1  Over 1.5  Al Ittifaq have over 1.5 goals in 100% of thei...  03/28  17:30  Saudi Arabia Pro League      Al Ittifaq            Al Quadisiya
2  Over 1.5  Sarreguemines have over 1.5 goals in 100% of t...  03/28  19:00  France National 3            Sarreguemines         Strasbourg II
3  Over 1.5  Mons Calpe have over 1.5 goals in 100% of thei...  03/28  19:29  Gibraltar Premier Division   Mons Calpe            Glacis United
4  Over 1.5  Glacis United have over 1.5 goals in 100% of t...  03/28  19:29  Gibraltar Premier Division   Mons Calpe            Glacis United
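
To get the exact column order the question asks for, and a comma-separated text form that can be saved, something along these lines may work. This is only a sketch: the small `raw` frame, its `when` and `match` column names, the `tidy` helper, and the two-space separator are all assumptions standing in for whatever `read_html` actually returns for the live page.

```python
import io

import pandas as pd

# Hypothetical sample standing in for the scraped table; the real column
# names and separators on the page may differ.
raw = pd.DataFrame({
    "when": ["03/28  17:30", "03/28  17:30"],
    "match": [
        "Iceland Reykjavik Youth Cup  Fjölnir / Vængir U19  Valur / KH U19",
        "Saudi Arabia Pro League  Al Ittifaq  Al Quadisiya",
    ],
    "Tip": ["Over 1.5", "Over 1.5"],
    "Description": [
        "Valur / KH U19 have over 1.5 goals in 100% of their games in the last 2 months (total games 6).",
        "Al Ittifaq have over 1.5 goals in 100% of their games in the last 2 months (total games 8).",
    ],
})

WANTED = ["Date", "Time", "Football League", "HomeTeam", "AwayTeam", "Tip", "Description"]


def tidy(df):
    """Split the combined cells and return the columns in the requested order."""
    out = df.copy()
    # Splitting on two spaces (assumed separator) keeps multi-word names intact
    out[["Date", "Time"]] = out["when"].str.split("  ", expand=True)
    out[["Football League", "HomeTeam", "AwayTeam"]] = out["match"].str.split("  ", expand=True)
    # Selecting with a list both reorders and drops the helper columns
    return out[WANTED]


tidy_table = tidy(raw)

# Write to an in-memory buffer here; a file path works the same way
buffer = io.StringIO()
tidy_table.to_csv(buffer, index=False)
csv_text = buffer.getvalue()
print(csv_text)
```

Passing a file name (for example a hypothetical `predictions.txt`) to `to_csv` instead of the buffer would write the same text to disk.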

Edit:

If you are using pandas version 0.24.2:

import pandas as pd

# read_html returns a list with one DataFrame per <table> on the page
tables = pd.read_html('https://afootballreport.com/predictions/over-1.5-goals/')
table = tables[0]

# Some cells hold several fields. The separator is assumed to be two spaces,
# since a single space would also split names such as "Al Ittifaq".
table[['Date', 'Time']] = table['Home team - Away team'].str.split('  ', expand=True)
table = table.drop(['Home team - Away team'], axis=1)
table = table.rename(columns={"Logic": "Description"})

table[['Football League', 'Home Team', 'Away Team']] = table['Home team - Away team.1'].str.split('  ', expand=True)
table = table.drop(['Home team - Away team.1'], axis=1)
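
If you would rather stay with BeautifulSoup, the one-line-per-row format from the question can be had by collecting the cells of each row and joining them. A minimal sketch, assuming markup roughly like the hypothetical `SAMPLE_HTML` below (the real page's structure is not shown in the question, and `rows_as_lines` is a made-up helper name):

```python
from bs4 import BeautifulSoup

# Hypothetical markup imitating one row of the predictions table,
# including the two empty cells that show up as blank lines in the output.
SAMPLE_HTML = """
<table>
  <tr><th>Date</th><th>Time</th><th>League</th></tr>
  <tr>
    <td>03/28</td><td>17:30</td><td>Iceland Reykjavik Youth Cup</td>
    <td></td><td></td>
    <td>Fjölnir / Vængir U19</td><td>Valur / KH U19</td><td>Over 1.5</td>
    <td>Valur / KH U19 have over 1.5 goals in 100% of their games in the last 2 months (total games 6).</td>
  </tr>
</table>
"""


def rows_as_lines(html):
    """Return one comma-separated string per table row that has data cells."""
    soup = BeautifulSoup(html, "html.parser")
    lines = []
    for record in soup.find_all("tr"):
        # Collect the stripped text of every <td> in this row
        cells = [td.get_text(strip=True) for td in record.find_all("td")]
        # Drop empty cells so they do not produce stray separators
        cells = [cell for cell in cells if cell]
        if cells:  # header rows contain only <th>, so they are skipped
            lines.append(", ".join(cells))
    return lines


lines = rows_as_lines(SAMPLE_HTML)
for line in lines:
    print(line)
```

The resulting list can then be written out with an ordinary `open(..., "w", encoding="utf-8")` and one `write` per line. Note that a description containing a comma would make the plain join ambiguous, in which case the standard `csv` module is the safer choice.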

Regarding "python - Formatting BeautifulSoup output", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/55402116/
