gpt4 book ai didi

java - 匹配特定 url 的正则表达式模式

转载 作者:行者123 更新时间:2023-12-01 11:36:45 24 4
gpt4 key购买 nike

我有一个很大的文本,我只想使用其中的某些信息。文本如下所示:

Some random text here
http://xxx-f.xxx.net/i/xx/open/xxxx/1370235-005A/EPISOD-1370235-005A-xxx_,892,144,252,360,540,1584,xxxx,.mp4.csmil/index_0_av.m3u8
More random text here
http://xxx-f.xxx.net/i/xx/open/xxxx/1370235-005A/EPISOD-1370235-005A-xxx_,892,144,252,360,540,1584,xxxx,.mp4.csmil/index_1_av.m3u8
More random text here
http://xxx-f.xxx.net/i/xx/open/xxxx/1370235-005A/EPISOD-1370235-005A-xxx_,892,144,252,360,540,1584,xxxx,.mp4.csmil/index_2_av.m3u8
More random text here
http://xxx-f.xxx.net/i/xx/open/xxxx/1370235-005A/EPISOD-1370235-005A-xxx_,892,144,252,360,540,1584,xxxx,.mp4.csmil/index_3_av.m3u8

我只想要 http 文本。文中有好几个,但我只需要其中之一。正则表达式应该是“以http开头,以.m3u8结尾”。

我查看了所有不同表达的词汇表,但它让我感到非常困惑。我尝试了 "/^(https?:\/\/)?([\da-z\.-]+)\.([a-z\.]{12,30})([\/\w\.-]*)*\/?$/" 作为我的模式。但这就足够了吗?

感谢所有帮助。谢谢。

最佳答案

假设您的示例中的文本在每一行表示中都是行分隔的,下面是一个可行的代码片段:

String text = 
"Some random text here" +
System.getProperty("line.separator") +
"http://xxx-f.xxx.net/i/xx/open/xxxx/1370235-005A/EPISOD-1370235-005A-xxx_,892,144,252,360,540,1584,xxxx,.mp4.csmil/index_0_av.m3u8" +
System.getProperty("line.separator") +
"More random text here" +
System.getProperty("line.separator") +
"http://xxx-f.xxx.net/i/xx/open/xxxx/1370235-005A/EPISOD-1370235-005A-xxx_,892,144,252,360,540,1584,xxxx,.mp4.csmil/index_0_av.m3u8" +
System.getProperty("line.separator") +
// removed some for brevity
"More random text here" +
System.getProperty("line.separator") +
// added counter-example ending with "NOPE"
"http://xxx-f.xxx.net/i/xx/open/xxxx/1370235-005A/EPISOD-1370235-005A-xxx_,892,144,252,360,540,1584,xxxx,.mp4.csmil/index_0_av.NOPE";

// Multi-line pattern:
// ┌ line starts with http
// | ┌ any 1+ character reluctantly quantified
// | | ┌ dot escape
// | | | ┌ ending text
// | | | | ┌ end of line marker
// | | | | |
Pattern p = Pattern.compile("^http.+?\\.m3u8$", Pattern.MULTILINE);
Matcher m = p.matcher(text);
while (m.find()) {
System.out.println(m.group());
}

输出

http://xxx-f.xxx.net/i/xx/open/xxxx/1370235-005A/EPISOD-1370235-005A-xxx_,892,144,252,360,540,1584,xxxx,.mp4.csmil/index_0_av.m3u8
http://xxx-f.xxx.net/i/xx/open/xxxx/1370235-005A/EPISOD-1370235-005A-xxx_,892,144,252,360,540,1584,xxxx,.mp4.csmil/index_0_av.m3u8

编辑

要通过 URL 的 “index_x” 文件进行精细“过滤”,您只需将其添加到协议(protocol)和行尾之间的 Pattern 中即可,例如:

Pattern.compile("^http.+?index_0.+?\\.m3u8$", Pattern.MULTILINE);

关于java - 匹配特定 url 的正则表达式模式,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/29895620/

24 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com