
python - How do I split a string, match, and output specific patterns?

Reposted. Author: 太空宇宙. Updated: 2023-11-04 00:00:24

I'm trying to solve a problem that I already solved in PHP, but I'm not sure how to do it in Python.

In the three rows below, we'd like to match URLs according to these two rules:

  • Only vine.co and twitter.com URLs (other domains should be ignored)

  • Only URLs followed by a comma , (the last URL in each row should be ignored)

Input

Row 1: https://vine.co/v/5W2Dg3XPX7a,https://vine.co/v/5W2Dg3XPX7a
Row 2: https://twitter.com/dog_rates/status/836677758902222849/photo/1,https://twitter.com/dog_rates/status/836677758902222849/photo/1
Row 3: https://www.gofundme.com/lolas-life-saving-surgery-funds,https://twitter.com/dog_rates/status/835264098648616962/photo/1,https://twitter.com/dog_rates/status/835264098648616962/photo/1

The output should be a Python list (the output below comes from PHP):

array(3) {
  [0]=>
  string(29) "https://vine.co/v/5W2Dg3XPX7a"
  [1]=>
  string(63) "https://twitter.com/dog_rates/status/836677758902222849/photo/1"
  [2]=>
  string(63) "https://twitter.com/dog_rates/status/835264098648616962/photo/1"
}

PHP code:

$input = 'Row 1: https://vine.co/v/5W2Dg3XPX7a,https://vine.co/v/5W2Dg3XPX7a
Row 2: https://twitter.com/dog_rates/status/836677758902222849/photo/1,https://twitter.com/dog_rates/status/836677758902222849/photo/1
Row 3: https://www.gofundme.com/lolas-life-saving-surgery-funds,https://twitter.com/dog_rates/status/835264098648616962/photo/1,https://twitter.com/dog_rates/status/835264098648616962/photo/1';

// Split the input into rows on the "Row N: " prefixes
$array = preg_split('/Row\s\d:\s/s', $input);

$output = array();
foreach ($array as $row) {
    // Skip the empty piece that precedes "Row 1: "
    if (strlen($row) > 1) {
        $URL_arrays = explode(',', $row);
        foreach ($URL_arrays as $key => $url) {
            // Ignore the last URL in each row ("==" compares; a single "=" would assign)
            if ($key == sizeof($URL_arrays) - 1) {
                continue;
            }
            // Keep only twitter.com and vine.co URLs
            if (preg_match('/twitter\.com|vine\.co/', $url)) {
                array_push($output, $url);
            }
        }
    }
}

var_dump($output);

This question is based on this RegEx problem; you can answer either one.

Best answer

You can use this regex to capture every URL on the vine.co or twitter.com domain that is followed by a comma:

https:\/\/(?:www\.)?(?:vine\.co|twitter\.com)[^,\s]*(?=,)

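For your requirement, the key part is the positive lookahead (?=,), which ensures that the URL is immediately followed by a comma without including the comma in the match.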
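To see what the lookahead contributes, here is a small sketch (not part of the original answer) that runs the same pattern, written with re.VERBOSE so each piece can be commented, next to a copy with the (?=,) removed. The names WITH_LOOKAHEAD, WITHOUT_LOOKAHEAD and row are illustrative only; row is Row 3 of the input.

```python
import re

# Row 3 from the question's input.
row = ('https://www.gofundme.com/lolas-life-saving-surgery-funds,'
       'https://twitter.com/dog_rates/status/835264098648616962/photo/1,'
       'https://twitter.com/dog_rates/status/835264098648616962/photo/1')

# Same pattern as the answer, written with re.VERBOSE so each part can be commented.
WITH_LOOKAHEAD = re.compile(r'''
    https://                    # scheme
    (?:www\.)?                  # optional www. prefix
    (?:vine\.co|twitter\.com)   # only these two domains
    [^,\s]*                     # rest of the URL, up to a comma or whitespace
    (?=,)                       # a comma must follow, but it is not consumed
''', re.VERBOSE)

# The same pattern without the lookahead, for comparison.
WITHOUT_LOOKAHEAD = re.compile(r'https://(?:www\.)?(?:vine\.co|twitter\.com)[^,\s]*')

print(WITH_LOOKAHEAD.findall(row))     # only the twitter URL that is followed by a comma
print(WITHOUT_LOOKAHEAD.findall(row))  # both twitter URLs, including the last one
```

Because the lookahead is zero-width, the comma is checked but never becomes part of the returned string, so no capture group or post-processing is needed to strip it.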


Python code that extracts the URLs with re.findall:

import re

s = '''Row 1: https://vine.co/v/5W2Dg3XPX7a,https://vine.co/v/5W2Dg3XPX7a
Row 2: https://twitter.com/dog_rates/status/836677758902222849/photo/1,https://twitter.com/dog_rates/status/836677758902222849/photo/1
Row 3: https://www.gofundme.com/lolas-life-saving-surgery-funds,https://twitter.com/dog_rates/status/835264098648616962/photo/1,https://twitter.com/dog_rates/status/835264098648616962/photo/1'''

print(re.findall(r'https:\/\/(?:www\.)?(?:vine\.co|twitter\.com)[^,\s]*(?=,)', s))

Output:

['https://vine.co/v/5W2Dg3XPX7a', 'https://twitter.com/dog_rates/status/836677758902222849/photo/1', 'https://twitter.com/dog_rates/status/835264098648616962/photo/1']
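For comparison, the split-based approach from the PHP code in the question can also be ported to Python without a lookahead. This is a sketch under stated assumptions, not the answerer's method: extract_urls is a hypothetical helper name, and \d+ is used instead of \d (an assumption) so that a prefix such as "Row 10: " would also split correctly.

```python
import re

# Sample input from the question.
s = '''Row 1: https://vine.co/v/5W2Dg3XPX7a,https://vine.co/v/5W2Dg3XPX7a
Row 2: https://twitter.com/dog_rates/status/836677758902222849/photo/1,https://twitter.com/dog_rates/status/836677758902222849/photo/1
Row 3: https://www.gofundme.com/lolas-life-saving-surgery-funds,https://twitter.com/dog_rates/status/835264098648616962/photo/1,https://twitter.com/dog_rates/status/835264098648616962/photo/1'''


def extract_urls(text):
    """Hypothetical helper: split rows, drop each row's last URL, keep vine.co/twitter.com URLs."""
    output = []
    # Split on the "Row N: " prefixes, like preg_split in the PHP code
    # (\d+ instead of \d is an assumption, so "Row 10: " would also split).
    for row in re.split(r'Row\s\d+:\s', text):
        row = row.strip()  # drop the trailing newline left on each row
        if not row:
            continue  # skip the empty piece before "Row 1: "
        urls = row.split(',')[:-1]  # every URL except the last one in the row
        output.extend(u for u in urls if re.search(r'twitter\.com|vine\.co', u))
    return output


print(extract_urls(s))  # same three URLs as the re.findall version
```

The list slice [:-1] plays the role of the PHP "skip the last key" check, and re.search mirrors preg_match. Note that this version matches the domain anywhere in the URL, while the answer's regex also anchors on https:// and the domain position.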

Regarding "python - How do I split a string, match, and output specific patterns?", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/55887644/
