gpt4 book ai didi

java正则表达式从文本中检索链接

转载 作者:行者123 更新时间:2023-11-30 01:59:17 24 4
gpt4 key购买 nike

我的输入String为:

String text = "Some content which contains link as <A HREF=\"/relative-path/fruit.cgi?param1=abc&param2=xyz\">URL Label</A> and some text after it";

我想将此文本转换为:

Some content which contains link as http://www.google.com/relative-path/fruit.cgi?param1=abc&param2=xyz&myParam=pqr (URL Label) and some text after it

所以这里:

1) 我想用普通链接替换链接标签。如果标签包含标签,那么它应该放在 URL 后面的大括号中。

2) 如果 URL 是相对的,我想为基本 URL 添加前缀 ( http://www.google.com )。

3) 我想在 URL 中附加一个参数。 (&myParam=pqr)

我在检索包含 URL 和标签的标签并替换它时遇到问题。

我写了这样的内容:

public static void main(String[] args) {
String text = "String text = "Some content which contains link as <A HREF=\"/relative-path/fruit.cgi?param1=abc&param2=xyz\">URL Label</A> and some text after it";";
text = text.replaceAll("&lt;", "<");
text = text.replaceAll("&gt;", ">");
text = text.replaceAll("&amp;", "&");

// this is not working
Pattern p = Pattern.compile("href=\"(.*?)\"");
Matcher m = p.matcher(text);
String url = null;
if (m.find()) {
url = m.group(1);

}
}

// helper method to append new query params once I have the url
public static URI appendQueryParams(String uriToUpdate, String queryParamsToAppend) throws URISyntaxException {
URI oldUri = new URI(uriToUpdate);
String newQueryParams = oldUri.getQuery();
if (newQueryParams == null) {
newQueryParams = queryParamsToAppend;
} else {
newQueryParams += "&" + queryParamsToAppend;
}
URI newUri = new URI(oldUri.getScheme(), oldUri.getAuthority(),
oldUri.getPath(), newQueryParams, oldUri.getFragment());
return newUri;
}

编辑1:

Pattern p = Pattern.compile("HREF=\"(.*?)\"");

这有效。但我希望它与大小写无关。 Href、HRef、href、hrEF 等都应该可以工作。

此外,如果我的文本有多个 URL,我该如何处理。

编辑2:

一些进展。

Pattern p = Pattern.compile("href=\"(.*?)\"");
Matcher m = p.matcher(text);
String url = null;
while (m.find()) {
url = m.group(1);
System.out.println(url);
}

这可以处理多个 URL 的情况。

最后一个悬而未决的问题是,如何获取标签并将原始文本中的 href 标签替换为 URL 和标签。

编辑3:

通过多个 URL 情况,我的意思是给定文本中存在多个 URL。

String text = "Some content which contains link as &lt;A HREF=\"/relative-path/fruit.cgi?param1=abc&amp;param2=xyz\"&gt;URL Label&lt;/A&gt; and some text after it and another link &lt;A HREF=\"/relative-path/vegetables.cgi?param1=abc&amp;param2=xyz\"&gt;URL2 Label&lt;/A&gt; and some more text";

Pattern p = Pattern.compile("href=\"(.*?)\"", Pattern.CASE_INSENSITIVE);
Matcher m = p.matcher(text);
String url = null;
while (m.find()) {
url = m.group(1); // this variable should contain the link URL
url = appendBaseURI(url);
url = appendQueryParams(url, "license=ABCXYZ");
System.out.println(url);
}

最佳答案

public static void main(String args[]) {
String text = "Some content which contains link as &lt;A HREF=\"/relative-path/fruit.cgi?param1=abc&amp;param2=xyz\"&gt;URL Label&lt;/A&gt; and some text after it and another link &lt;A HREF=\"/relative-path/vegetables.cgi?param1=abc&amp;param2=xyz\"&gt;URL2 Label&lt;/A&gt; and some more text";
text = StringEscapeUtils.unescapeHtml4(text);
Pattern p = Pattern.compile("<a href=\"(.*?)\">(.*?)</a>", Pattern.CASE_INSENSITIVE);
Matcher m = p.matcher(text);
while (m.find()) {
text = text.replace(m.group(0), cleanUrlPart(m.group(1), m.group(2)));
}
System.out.println(text);
}

private static String cleanUrlPart(String url, String label) {
if (!url.startsWith("http") && !url.startsWith("www")) {
if (url.startsWith("/")) {
url = "http://www.google.com" + url;
} else {
url = "http://www.google.com/" + url;
}
}
url = appendQueryParams(url, "myParam=pqr").toString();
if (label != null && !label.isEmpty()) url += " (" + label + ")";
return url;
}

输出

Some content which contains link as http://www.google.com/relative-path/fruit.cgi?param1=abc&param2=xyz&myParam=pqr (URL Label) and some text after it and another link http://www.google.com/relative-path/vegetables.cgi?param1=abc&param2=xyz&myParam=pqr (URL2 Label) and some more text

关于java正则表达式从文本中检索链接,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/53423132/

24 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com