
java - Regular expression to extract the path in a URL after the scheme and authority

Reposted. Author: 行者123. Updated: 2023-12-01 12:13:31

I am trying to write a regular expression that extracts everything after the scheme and authority in a URL. For example, if I have

http://myHost:8080/Starter/docs/start.jsp

I need a regular expression in Java that yields "Starter/docs/start.jsp".

Thanks in advance for your help!

Best answer

The actual official standard for URLs, RFC 3986, includes a sample regular expression for parsing:

Appendix B. Parsing a URI Reference with a Regular Expression

As the "first-match-wins" algorithm is identical to the "greedy" disambiguation method used by POSIX regular expressions, it is natural and commonplace to use a regular expression for parsing the potential five components of a URI reference.

The following line is the regular expression for breaking-down a well-formed URI reference into its components.

    ^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?
     12            3  4          5       6  7        8 9

The numbers in the second line above are only to assist readability; they indicate the reference points for each subexpression (i.e., each paired parenthesis). We refer to the value matched for subexpression <n> as $<n>. For example, matching the above expression to

 http://www.ics.uci.edu/pub/ietf/uri/#Related

results in the following subexpression matches:

 $1 = http:
 $2 = http
 $3 = //www.ics.uci.edu
 $4 = www.ics.uci.edu
 $5 = /pub/ietf/uri/
 $6 = <undefined>
 $7 = <undefined>
 $8 = #Related
 $9 = Related

where <undefined> indicates that the component is not present, as is the case for the query component in the above example. Therefore, we can determine the value of the five components as

 scheme    = $2
 authority = $4
 path      = $5
 query     = $7
 fragment  = $9
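
As a minimal sketch of how the RFC expression could be used from Java for the question above: the class name UriPathExtractor and its methods are hypothetical and not part of the original answer. The pattern is the Appendix B expression with Java string escaping. Because group 5 keeps the leading slash of the path, the sketch strips it to get the form the question asks for. A component that is not present (shown as <undefined> in the RFC) comes back as null from Matcher.group.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; the class and method names are illustrative only.
final class UriPathExtractor {

    // RFC 3986, Appendix B expression; backslashes doubled for a Java string literal.
    private static final Pattern URI_PATTERN = Pattern.compile(
            "^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\\?([^#]*))?(#(.*))?");

    private static Matcher match(String uri) {
        Matcher m = URI_PATTERN.matcher(uri);
        if (!m.matches()) {
            // Can happen, for example, when the fragment contains a line break.
            throw new IllegalArgumentException("Not a URI reference: " + uri);
        }
        return m;
    }

    // Group 5 is the path; the leading slash is removed (an assumption
    // based on the output the question asks for).
    static String pathAfterAuthority(String uri) {
        String path = match(uri).group(5);
        return path.startsWith("/") ? path.substring(1) : path;
    }

    // Group 7 is the query; null when the URI has no query component.
    static String query(String uri) {
        return match(uri).group(7);
    }

    public static void main(String[] args) {
        // prints "Starter/docs/start.jsp"
        System.out.println(pathAfterAuthority("http://myHost:8080/Starter/docs/start.jsp"));
    }
}
```

Note that this returns only the path component. If "everything after the authority" should also include the query and fragment, groups 6 and 8 would need to be appended when they are not null.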

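If you are looking for fuzzy matching that can cope with malformed URLs, there are many open-source URI parsers (at least for JavaScript, for example parseuri) whose source you can inspect to see how their regular expressions work.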

Regarding java - Regular expression to extract the path in a URL after the scheme and authority, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/27136217/
