python - How can I extract http or https after tokenization?

Reposted. Author: 行者123. Updated: 2023-11-28 19:06:28

I have a text file containing text like this:

>  because she s the worst 
i am referring to this http iimgurcom5srylmijpg does it have any deeper meaning or does it signify anything i just do nt get it why she d do that
cheating but zoldycks must have a great time at thanksgiving
kurosaki ichigo http images5fanpopcomimagephotos29000000ichigowallpaperkurosakiichigo290694271024768jpg and kurosaki mea http staticzerochannetkurosakimeafull1689483jpg
there are a shit ton of koutarous but the presence of one https smediacacheak0pinimgcomoriginals1219ed1219ed717fc2bfce372759bba2fe1cfegif is enough to make it the most interesting party.

I extract tokens by first collapsing runs of whitespace into a single space, because the spacing in the file is not uniform:

words = re.sub(r'\s+', ' ', sentence).strip()
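As a minimal sketch of that normalization step, the snippet below applies the same substitution to one line and then splits it into tokens. The sample string is modelled on the question's data with uneven spacing added for illustration, and the name `tokens` is an assumption, not something from the original post.

```python
import re

# Sample line modelled on the question's data; the uneven spacing is added for illustration.
sentence = "  i am referring   to this http \t iimgurcom5srylmijpg  does it have any deeper meaning "

# Collapse every run of whitespace (spaces, tabs, newlines) into one space, then trim the ends.
words = re.sub(r"\s+", " ", sentence).strip()

# Split the normalized string into individual tokens.
tokens = words.split(" ")
print(tokens)
```

If only the tokens are needed, `sentence.split()` with no arguments also splits on any run of whitespace and yields the same list.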

Now I want to pick out only the http or https tokens, because, as you can see, the text no longer contains well-formed URLs.

I tried (http|https)\s, but it did not work.
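The question does not say how the pattern was applied, so the cause of the failure is unknown. As a hedged sketch, the snippet below shows that the pattern itself does match when the whole string is scanned, while `re.match` returns nothing because it only tries the pattern at the start of the string. The shortened sample string is an assumption made for illustration.

```python
import re

# Shortened sample modelled on the question's data (an assumption for illustration).
words = "i am referring to this http iimgurcom5srylmijpg and one https smediacacheak0pinimgcom is enough"

pattern = r"(http|https)\s"

# re.match only tries the pattern at position 0, so it finds nothing here.
anchored = re.match(pattern, words)

# re.findall scans the whole string and returns the captured group for each match.
found = re.findall(pattern, words)
print(anchored, found)
```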

Is there an alternative?

Best answer

To find http or https, use the regular expression http(s)?(\s+) (the original answer links to a working regex demo).

To also capture the matched http or https in a group, use (http(s)?(\s+)) (the original answer links to a second regex demo).
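A minimal sketch of how such a pattern might be used from Python follows. It writes the optional s as `https?` and keeps a single capture group so that `re.findall` returns just the scheme. The leading `\b` and the second pattern, which also captures the token after the scheme, are additions for illustration and are not part of the original answer. The names `scheme_re` and `pair_re` are hypothetical, and the sample string is shortened from the question's data.

```python
import re

# Shortened sample modelled on the question's data (an assumption for illustration).
words = ("kurosaki ichigo http images5fanpopcomimagephotos and kurosaki mea "
         "http staticzerochannetkurosakimeafull1689483jpg the presence of one "
         "https smediacacheak0pinimgcomoriginals is enough")

# One capture group for the scheme, followed by the whitespace that separates it
# from the next token. The \b (an addition, not in the original answer) stops a
# longer word that merely ends in "http" from matching.
scheme_re = re.compile(r"\b(https?)\s+")
schemes = scheme_re.findall(words)

# Variant (an assumption): also capture the token that follows the scheme,
# which is the remainder of the stripped URL.
pair_re = re.compile(r"\b(https?)\s+(\S+)")
pairs = pair_re.findall(words)

print(schemes)
print(pairs)
```

With one group in the pattern, `re.findall` returns a list of strings. With two groups it returns a list of tuples, one per match.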

Regarding "python - How can I extract http or https after tokenization?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/46137789/
