
python - Split a string by timestamps

Reposted. Author: 太空宇宙. Updated: 2023-11-03 18:25:17

Hi, I have scraped some information from the web and normalized it to remove all the HTML etc., leaving a string of the form

foo XX:XX +XX:XX bar XX:XX +X:XX bar2 XX:XX +X:XX bar3 XX:XX bar4 XX:XX bar5

where foo is not preceded by a timestamp. Keeping or dropping foo is fine either way, since it is always repeated as the first bar.

I want to split on XX:XX but not on +XX:XX. Each bar can be preceded either by XX:XX +XX:XX or by just XX:XX.

I also want to keep the timestamps when splitting, so that I end up with a list of strings such as:

XX:XX +XX:XX bar
XX:XX +XX:XX bar2
.....
XX:XX bar5
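To illustrate why splitting directly on the timestamp pattern falls short, here is a minimal sketch using a made-up sample string (12:34, +1:23 and 45:00 are placeholders for the XX:XX markers above, not real data). re.split throws away whatever the pattern matches, so the timestamps are lost, and the digits inside +1:23 are treated as a separator as well.

```python
import re

# Hypothetical stand-in for the scraped text: concrete times instead of XX:XX.
sample = "foo 12:34 +1:23 bar 45:00 bar2"

# Splitting *on* the timestamp pattern consumes each match, so the timestamps
# disappear, and "1:23" inside "+1:23" is also used as a split point.
naive = re.split(r'\d+:\d\d', sample)
print(naive)  # ['foo ', ' +', ' bar ', ' bar2']
```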

To help make sense of this: the text comes from the HTML commentary of football matches on the BBC website, e.g. http://www.bbc.co.uk/sport/0/football/27092972

The regex I was trying to use as a starting point is

(?(name)\d+:\d\d|\+\d+:\d\d)

which is clearly wrong, given that it doesn't compile. It follows the form:

(?(id/name)yes-pattern|no-pattern)

where the yes-pattern is

\d+:\d\d (1 or more digits, colon, 2 digits)

and the no-pattern is

\+\d+:\d\d (same as the yes-pattern, but preceded by a + sign)
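In Python's re module, a conditional (?(id/name)yes-pattern|no-pattern) only checks whether a group defined earlier in the same pattern took part in the match, so name must refer to an actual group. The pattern above defines no group called name, which is why re.compile raises re.error. The sketch below confirms that, then shows a different technique (a negative lookbehind, not a conditional) for matching a timestamp only when it is not directly preceded by a + sign. The sample string and the MAIN_STAMP name are made up for illustration.

```python
import re

# The question's pattern: the condition refers to a group called "name",
# but no such group exists earlier in the pattern, so compilation fails.
try:
    re.compile(r'(?(name)\d+:\d\d|\+\d+:\d\d)')
    compiles = True
except re.error as exc:
    compiles = False
    print("re.error:", exc)

# Swapped-in technique: (?<!\+) rejects a match that starts right after '+',
# and \b prevents the match from starting mid-number (e.g. the "2" of "+12:34").
MAIN_STAMP = re.compile(r'(?<!\+)\b\d+:\d\d')

sample = "foo 12:34 +1:23 bar 45:00 +0:10 bar2 46:12 bar3"
print(MAIN_STAMP.findall(sample))  # ['12:34', '45:00', '46:12']
```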

I will be using re.split(expression).
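re.split has a documented behaviour that fits the "keep the timestamps" requirement: if the pattern contains capturing groups, the text matched by the groups is also returned in the result list. A sketch of that alternative, assuming the hypothetical sample string and STAMP pattern below (a timestamp not preceded by +, optionally followed by a +X:XX part), which then glues each captured timestamp onto the text that follows it:

```python
import re

# Hypothetical sample standing in for the scraped text.
sample = "foo 12:34 +1:23 bar 45:00 bar2"

# The outer parentheses are a capturing group, so re.split keeps each matched
# timestamp (with its optional "+M:SS" part) as its own list item.
STAMP = r'((?<!\+)\b\d+:\d\d(?:\s+\+\d+:\d\d)?)'
parts = re.split(STAMP, sample)
print(parts)  # ['foo ', '12:34 +1:23', ' bar ', '45:00', ' bar2']

# With one group, captured timestamps sit at the odd indices; join each one
# with the text that follows it and trim the surrounding whitespace.
entries = [parts[0].strip()] + [
    (parts[i] + parts[i + 1]).strip() for i in range(1, len(parts), 2)
]
print(entries)  # ['foo', '12:34 +1:23 bar', '45:00 bar2']
```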

For more context, I plan to convert the timestamps to seconds afterwards, so I will be adding up XX:XX+XX:XX and YY:YY later.
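For the planned conversion to seconds, a minimal sketch, assuming each entry after splitting starts with its timestamp (as in the desired list) and that the +X:XX stoppage-time offset is simply added to the base clock. The to_seconds helper and the STAMP pattern are hypothetical names, not part of the question or the answer.

```python
import re

# Leading "MM:SS", optionally followed by a "+M:SS" stoppage-time offset.
STAMP = re.compile(r'^(\d+):(\d{2})(?:\s+\+(\d+):(\d{2}))?')


def to_seconds(entry):
    """Return the match time in seconds for one entry, or None if it has no leading timestamp."""
    m = STAMP.match(entry)
    if m is None:
        return None
    minutes, seconds, extra_min, extra_sec = m.groups()
    total = int(minutes) * 60 + int(seconds)
    if extra_min is not None:
        # Add the offset to the base clock, e.g. 90:00 + 4:09.
        total += int(extra_min) * 60 + int(extra_sec)
    return total


print(to_seconds("90:00 +4:09 Full time"))        # 5649
print(to_seconds("89:31 Corner, Swansea City."))  # 5371
print(to_seconds("Full Time Match ends"))         # None
```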

Here is an example string that my program currently produces:

Full Time Match ends, Everton 3, Swansea City 1. 90:00 +4:09 Full time Full Time Second Half ends, Everton 3, Swansea City 1. 90:00 +2:47 Attempt blocked. Nathan Dyer (Swansea City) right footed shot from the centre of the box is blocked. Assisted by Pablo Hernández. 90:00 +0:18 Offside, Swansea City. Leroy Lita tries a through ball, but Ashley Williams is caught offside. 89:31 Corner, Swansea City. Conceded by Leighton Baines. 88:42 Foul by James McCarthy (Everton). 

So I would like to get the list

Full Time Match ends, Everton 3, Swansea City 1.
90:00 +4:09 Full time Full Time Second Half ends, Everton 3, Swansea City 1.
90:00 +2:47 Attempt blocked. Nathan Dyer (Swansea City) right footed shot from the centre of the box is blocked. Assisted by Pablo Hernández.
90:00 +0:18 Offside, Swansea City. Leroy Lita tries a through ball, but Ashley Williams is caught offside.
89:31 Corner, Swansea City. Conceded by Leighton Baines.
88:42 Foul by James McCarthy (Everton).

Best answer

You can use a positive lookahead here:

import re
results = re.split(r'\s+(?=\d+:\d{2})', s)

Regex breakdown:

\s+            # whitespace (\n, \r, \t, \f, and " ") (1 or more times)
(?=            # look ahead to see if there is:
  \d+          #   digits (0-9) (1 or more times)
  :            #   ':'
  \d{2}        #   digits (0-9) (2 times)
)              # end of look-ahead
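The annotated breakdown maps almost line for line onto a re.VERBOSE pattern, in which whitespace and # comments inside the pattern are ignored. This sketch (sample string made up for illustration) checks that the verbose and compact forms split identically, and shows why the lookahead matters: it is zero-width, so the timestamp stays at the start of the next piece instead of being consumed as part of the separator.

```python
import re

# Verbose form of the answer's pattern; layout and comments are ignored.
SPLIT_BEFORE_STAMP = re.compile(r"""
    \s+         # whitespace (1 or more times)
    (?=         # look ahead to see if there is:
        \d+     #   digits (1 or more times)
        :       #   ':'
        \d{2}   #   digits (2 times)
    )           # end of look-ahead
""", re.VERBOSE)

sample = "foo 12:34 +1:23 bar 45:00 bar2"
verbose_parts = SPLIT_BEFORE_STAMP.split(sample)
compact_parts = re.split(r'\s+(?=\d+:\d{2})', sample)
print(verbose_parts)  # ['foo', '12:34 +1:23 bar', '45:00 bar2']

# Without the lookahead, the timestamp is part of the separator and is lost.
consumed = re.split(r'\s+\d+:\d{2}', sample)
print(consumed)  # ['foo', ' +1:23 bar', ' bar2']
```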

Output

[
'Full Time Match ends, Everton 3, Swansea City 1.',
'90:00 +4:09 Full time Full Time Second Half ends, Everton 3, Swansea City 1.',
'90:00 +2:47 Attempt blocked. Nathan Dyer (Swansea City) right footed shot from the centre of the box is blocked. Assisted by Pablo Hernández.',
'90:00 +0:18 Offside, Swansea City. Leroy Lita tries a through ball, but Ashley Williams is caught offside.',
'89:31 Corner, Swansea City. Conceded by Leighton Baines.',
'88:42 Foul by James McCarthy (Everton). '
]
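A self-contained, runnable version against the sample string from the question, written for Python 3. One small change from the answer: the input is stripped first, so the last entry does not keep the trailing space.

```python
import re

# Sample string from the question, split across lines only for readability.
s = ("Full Time Match ends, Everton 3, Swansea City 1. "
     "90:00 +4:09 Full time Full Time Second Half ends, Everton 3, Swansea City 1. "
     "90:00 +2:47 Attempt blocked. Nathan Dyer (Swansea City) right footed shot "
     "from the centre of the box is blocked. Assisted by Pablo Hernández. "
     "90:00 +0:18 Offside, Swansea City. Leroy Lita tries a through ball, "
     "but Ashley Williams is caught offside. "
     "89:31 Corner, Swansea City. Conceded by Leighton Baines. "
     "88:42 Foul by James McCarthy (Everton). ")

# Strip the whole string first, then split on whitespace that is
# immediately followed by a "M:SS" timestamp (the timestamp itself is kept).
results = re.split(r'\s+(?=\d+:\d{2})', s.strip())
for entry in results:
    print(entry)
```

The "+X:XX" parts are never split off because the whitespace before them is followed by "+", not by a digit.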

A similar question, python - Split a string by timestamps, can be found on Stack Overflow: https://stackoverflow.com/questions/23324751/
