How can I unescape only the tags and not the content? Let me explain with an example...
Here is the original raw response:
```
&lt;GetWhoISResponse xmlns="http://www.webservicex.net"&gt;
&lt;GetWhoISResult&gt;Whois Server Version 2.0
To single out one record, look it up with "xxx", where xxx is one of the
of the records displayed above. If the records are the same, look them up
with "=xxx" to receive a full display for each record.
&gt;&gt;&gt; Last update of whois database: Mon, 30 Dec 2013 08:20:00 UTC &lt;&lt;&lt;
NOTICE: The expiration date displayed in this record is the date the
registrar's sponsorship of the domain name registration in the registry is
currently set to expire. This date does not necessarily reflect the expiration
date of the domain name registrant's agreement with the sponsoring
registrar. Users may consult the sponsoring registrar's Whois database to
view the registrar's reported date of expiration for this registration.
&lt;/GetWhoISResult&gt;
&lt;/GetWhoISResponse&gt;
```
If I use StringEscapeUtils to unescape the text (unescapeXml), I get:
```
<GetWhoISResponse xmlns="http://www.webservicex.net">
<GetWhoISResult>Whois Server Version 2.0
To single out one record, look it up with "xxx", where xxx is one of the
of the records displayed above. If the records are the same, look them up
with "=xxx" to receive a full display for each record.
>>> Last update of whois database: Mon, 30 Dec 2013 08:20:00 UTC <<<
NOTICE: The expiration date displayed in this record is the date the
registrar's sponsorship of the domain name registration in the registry is
currently set to expire. This date does not necessarily reflect the expiration
date of the domain name registrant's agreement with the sponsoring
registrar. Users may consult the sponsoring registrar's Whois database to
view the registrar's reported date of expiration for this registration.
</GetWhoISResult>
</GetWhoISResponse>
```
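The problem is in the middle: on the line containing `>>>` and `<<<`, those characters were unescaped as well. I need only the tags unescaped because I want to convert the XML to JSON, but now I get a parse error.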
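To see why the parse fails, here is a minimal sketch using only the JDK. `naiveUnescape` is a hypothetical, simplified stand-in for `unescapeXml` (it handles only the five predefined XML entities, not numeric references, and is not the Commons implementation), and `isWellFormed` runs the built-in DOM parser. The input is a shortened, hypothetical stand-in for the raw response. Note that a bare `>` is legal in XML character data; it is the bare `<<<` that makes the parser reject the document.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

final class UnescapeDemo {
    // Hypothetical, simplified stand-in for unescapeXml: handles only the five
    // predefined entities (no numeric references). "&amp;" goes last so that
    // "&amp;lt;" becomes "&lt;" rather than "<".
    static String naiveUnescape(String s) {
        return s.replace("&lt;", "<")
                .replace("&gt;", ">")
                .replace("&quot;", "\"")
                .replace("&apos;", "'")
                .replace("&amp;", "&");
    }

    // True if the JDK's DOM parser accepts the string as a well-formed document.
    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors instead of printing them to stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (Exception e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Shortened, hypothetical stand-in for the raw (escaped) response.
        String raw = "&lt;GetWhoISResult&gt;record. &gt;&gt;&gt; Last update"
                + " &lt;&lt;&lt; NOTICE&lt;/GetWhoISResult&gt;";
        String unescaped = naiveUnescape(raw);
        System.out.println(unescaped);
        // <GetWhoISResult>record. >>> Last update <<< NOTICE</GetWhoISResult>
        System.out.println(isWellFormed(unescaped)); // false: "<<<" is read as markup
        // A bare '>' is legal character data, so ">>>" on its own parses fine.
        System.out.println(isWellFormed("<GetWhoISResult>record. >>> Last update</GetWhoISResult>")); // true
    }
}
```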
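This is an interesting problem. I tried lenient XML parsers, but they don't seem able to parse broken XML. The next best option is regular expressions. I managed to parse the given XML with them, with the caveat that less-than and greater-than signs in the text must not form a tag-like pattern, for example:

```
< some random text here and >
```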
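A different technique from the one this answer goes on to use (a sketch, not the answerer's code): skip the full unescape and, on the still-escaped string, turn back into markup only the `&lt;...&gt;` sequences that look like tags. Every other entity, including the `&lt;&lt;&lt;` in the text, stays escaped, so the result should already be parseable. The class name and the tag regex are hypothetical; the regex assumes an XML-style name followed by optional attributes that contain no `&`, i.e. that attribute quotes arrive as literal `"` rather than `&quot;`. It shares the caveat above: escaped text that happens to look like a tag, such as `&lt;b&gt;`, would be turned into markup too.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class TagOnlyUnescape {
    // Hypothetical pattern for an escaped tag: "&lt;" + optional "/" + name
    // + optional attributes (no '&' allowed) + optional "/" + "&gt;".
    private static final Pattern ESCAPED_TAG =
            Pattern.compile("&lt;(/?[A-Za-z_][\\w.:-]*(?:\\s[^&]*?)?/?)&gt;");

    // Unescapes tag-shaped sequences only; all other entities are left untouched.
    static String unescapeTagsOnly(String escaped) {
        Matcher m = ESCAPED_TAG.matcher(escaped);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            // quoteReplacement keeps '$' or '\' inside the tag from being read as group references
            m.appendReplacement(out, Matcher.quoteReplacement("<" + m.group(1) + ">"));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        // Shortened, hypothetical stand-in for the raw response.
        String escaped = "&lt;GetWhoISResponse xmlns=\"http://www.webservicex.net\"&gt;"
                + "&lt;GetWhoISResult&gt;record. &gt;&gt;&gt; Last update &lt;&lt;&lt; NOTICE"
                + "&lt;/GetWhoISResult&gt;&lt;/GetWhoISResponse&gt;";
        System.out.println(unescapeTagsOnly(escaped));
        // <GetWhoISResponse xmlns="http://www.webservicex.net"><GetWhoISResult>record. &gt;&gt;&gt; Last update &lt;&lt;&lt; NOTICE</GetWhoISResult></GetWhoISResponse>
    }
}
```

The design choice is to never create the bare `<<<` in the first place, so no second repair pass is needed; the trade-off is that the tag regex has to anticipate the attribute syntax that actually appears in the response.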
After some research, I settled on two regex patterns for the given XML (they may also work for the general format):
```java
public static final String LESSER_STRING = "<(.[^>]*)(<)+";
public static final String GREATER_STRING = ">[^<](.[^<]*)(>)+";
```
These strings are compiled into the regex patterns that the `Matcher` uses to scan the input.
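To see what each pattern actually captures, here is a small inspection sketch. The class `PatternPeek`, the helper `matches` and the toy strings are hypothetical; the two patterns are copied unchanged. Roughly, `GREATER_STRING` starts at a `>` that is followed by text rather than `<` and runs to the last `>` before the next `<`; `LESSER_STRING` starts at a `<` that is followed by another `<` before any `>` and runs to the last such `<`, which is normally the one opening the next tag. The second input shows the caveat in action.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class PatternPeek {
    // The two patterns from the answer, unchanged.
    static final String LESSER_STRING = "<(.[^>]*)(<)+";
    static final String GREATER_STRING = ">[^<](.[^<]*)(>)+";

    // Collects every group(0) that the pattern finds in the input, left to right.
    static List<String> matches(String regex, String input) {
        List<String> found = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            found.add(m.group(0));
        }
        return found;
    }

    public static void main(String[] args) {
        // Hypothetical toy input shaped like the unescaped response.
        String toy = "<r>a >>> b <<< c</r>";
        System.out.println(matches(GREATER_STRING, toy)); // [>a >>>]
        System.out.println(matches(LESSER_STRING, toy));  // [<<< c<]

        // The caveat: "< b and c >" looks like a tag, so neither pattern matches
        // and the bare '<' is left in place.
        String tagLike = "<r>if a < b and c > d</r>";
        System.out.println(matches(GREATER_STRING, tagLike)); // []
        System.out.println(matches(LESSER_STRING, tagLike));  // []
    }
}
```

In the answer's code below, the first match keeps its leading `>` (the tag's) and escapes the rest, and the second match keeps its trailing `<` (the next tag's) and escapes the rest.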
Here is the working code with its output:
```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import org.apache.commons.lang3.StringEscapeUtils; // Apache Commons Lang (2.x: org.apache.commons.lang)

public class EscapeContentBrackets { // class name is illustrative

    public static final String LESSER_STRING = "<(.[^>]*)(<)+";
    public static final String GREATER_STRING = ">[^<](.[^<]*)(>)+";
    public static final String ESCAPED_XML = "&lt;GetWhoISResponse xmlns=\"http://www.webservicex.net\"&gt;&lt;GetWhoISResult&gt;Whois Server Version 2.0 To single out one record, look it up with \"xxx\", where xxx is one of the of the records displayed above. If the records are the same, look them up with \"=xxx\" to receive a full display for each record. &gt;&gt;&gt; Last update of whois database: Mon, 30 Dec 2013 08:20:00 UTC &lt;&lt;&lt; NOTICE: The expiration date displayed in this record is the date the registrar's sponsorship of the domain name registration in the registry is currently set to expire. This date does not necessarily reflect the expiration date of the domain name registrant's agreement with the sponsoring registrar. Users may consult the sponsoring registrar's Whois database to view the registrar's reported date of expiration for this registration.&lt;/GetWhoISResult&gt;&lt;/GetWhoISResponse&gt;";

    public static void main(String[] args) {
        // Unescape everything; this also unescapes the ">>>" and "<<<" in the text.
        String xml = StringEscapeUtils.unescapeXml(ESCAPED_XML);

        // Pass 1: find the first greater-than sign that opens text content, assuming
        // greater-than and less-than signs in the text do not form a 'tag' pattern.
        // The match runs from the '>' of the last opened tag to the last '>' before
        // the next '<'; keep that first '>' (substring 1) and escape the rest.
        Matcher matcher = Pattern.compile(GREATER_STRING).matcher(xml);
        StringBuffer str = new StringBuffer();
        while (matcher.find()) {
            String alter = ">" + matcher.group(0).substring(1).replaceAll(">", "&gt;");
            // quoteReplacement stops '$' or '\' in the text being read as group references
            matcher.appendReplacement(str, Matcher.quoteReplacement(alter));
        }
        matcher.appendTail(str);

        // Pass 2: find a stray less-than sign, under the same assumption. The match
        // runs from the stray '<' to the '<' that opens the next tag; drop that last
        // character so the tag is not replaced, escape the rest, then add the '<' back.
        matcher = Pattern.compile(LESSER_STRING).matcher(str);
        StringBuffer jsonString = new StringBuffer();
        while (matcher.find()) {
            String match = matcher.group(0);
            String alter = match.substring(0, match.length() - 1).replaceAll("<", "&lt;") + "<";
            matcher.appendReplacement(jsonString, Matcher.quoteReplacement(alter));
        }
        matcher.appendTail(jsonString);

        System.out.println(jsonString);
    }
}
```
Output:
```
<GetWhoISResponse xmlns="http://www.webservicex.net"><GetWhoISResult>Whois Server Version 2.0 To single out one record, look it up with "xxx", where xxx is one of the of the records displayed above. If the records are the same, look them up with "=xxx" to receive a full display for each record. &gt;&gt;&gt; Last update of whois database: Mon, 30 Dec 2013 08:20:00 UTC &lt;&lt;&lt; NOTICE: The expiration date displayed in this record is the date the registrar's sponsorship of the domain name registration in the registry is currently set to expire. This date does not necessarily reflect the expiration date of the domain name registrant's agreement with the sponsoring registrar. Users may consult the sponsoring registrar's Whois database to view the registrar's reported date of expiration for this registration.</GetWhoISResult></GetWhoISResponse>
```
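One way to confirm that the repaired string is usable (a sketch, not part of the original answer) is to parse it with the JDK's DOM parser and read the element text back: the parser turns `&gt;&gt;&gt;` and `&lt;&lt;&lt;` back into `>>>` and `<<<` in the text content, so nothing is lost before the JSON conversion. The class `RoundTripCheck`, the helper `textOf` and the shortened input are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

final class RoundTripCheck {
    // Parses the XML and returns the text content of the first element with the
    // given tag name. The factory is not namespace-aware by default, so the plain
    // tag name matches even though the root declares a default xmlns.
    static String textOf(String xml, String tagName) {
        Document doc;
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            doc = builder.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) { // ParserConfigurationException, SAXException, IOException
            throw new IllegalArgumentException("could not parse XML", e);
        }
        Node node = doc.getElementsByTagName(tagName).item(0);
        if (node == null) {
            throw new IllegalArgumentException("no element named " + tagName);
        }
        return node.getTextContent();
    }

    public static void main(String[] args) {
        // Shortened, hypothetical stand-in for the repaired output.
        String repaired = "<GetWhoISResponse xmlns=\"http://www.webservicex.net\">"
                + "<GetWhoISResult>record. &gt;&gt;&gt; Last update &lt;&lt;&lt; NOTICE</GetWhoISResult>"
                + "</GetWhoISResponse>";
        System.out.println(textOf(repaired, "GetWhoISResult")); // record. >>> Last update <<< NOTICE
    }
}
```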