
java - Decoding an escaped string from VBScript in Java

Reposted. Author: 行者123. Updated: 2023-11-30 03:58:01

I am trying to decode the following string:

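// Imports: org.apache.commons.lang.StringEscapeUtils (Commons Lang 2.x),
//          java.io.UnsupportedEncodingException
String str = "AT%26amp%3BT%20Network%20Client%20%u2013%20IBM";

// Decodes HTML entities only, so the %-escapes come back unchanged
System.out.println(StringEscapeUtils.unescapeHtml(str));
try {
    // Throws IllegalArgumentException on the nonstandard %u2013 escape
    System.out.println("res:" + java.net.URLDecoder.decode(str, "UTF-8"));
} catch (UnsupportedEncodingException e) {
    e.printStackTrace();
}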

Both approaches fail, as shown below:

AT%26amp%3BT%20Network%20Client%20%u2013%20IBM
Exception in thread "main" java.lang.IllegalArgumentException: URLDecoder: Illegal hex characters in escape (%) pattern - For input string: "u2"
at java.net.URLDecoder.decode(URLDecoder.java:173)
at decrypt.DecryptHtml.main(DecryptHtml.java:19)
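A minimal check makes the cause visible, using only the standard `java.net.URLDecoder`; the class `UrlDecoderCheck` and its `tryDecode` method are hypothetical names for illustration. `URLDecoder` requires exactly two hex digits after every `%`, so the `%u2013` escape alone makes it throw, while the same string without that escape decodes. The third call hints at a second mismatch: with UTF-8, `URLDecoder` reads each `%XX` as one byte of a UTF-8 sequence, whereas VBScript's `%XX` stands for a character value up to 255 (see the documentation quoted in the answer), so `%E9` does not come back as é.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

// Hypothetical helper: shows which inputs java.net.URLDecoder rejects or mangles.
class UrlDecoderCheck
{
    // Returns the decoded string, or null when URLDecoder rejects the input.
    static String tryDecode(final String s)
    {
        try {
            return URLDecoder.decode(s, "UTF-8");
        } catch (IllegalArgumentException e) {
            // Thrown for a '%' that is not followed by two hex digits, e.g. "%u2013"
            return null;
        } catch (UnsupportedEncodingException e) {
            throw new AssertionError("UTF-8 is always supported", e);
        }
    }

    public static void main(final String... args)
    {
        // The question's string: rejected because of %u2013, so this prints "null"
        System.out.println(tryDecode("AT%26amp%3BT%20Network%20Client%20%u2013%20IBM"));
        // Without the %u escape the rest decodes to "AT&amp;T Network Client IBM"
        System.out.println(tryDecode("AT%26amp%3BT%20Network%20Client%20IBM"));
        // %E9 is treated as a lone, incomplete UTF-8 byte, so the result is not e-acute
        System.out.println(tryDecode("%E9"));
    }
}
```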

The string comes from a VBScript script that uses the Escape function. How can I decode this string?

Best answer

Unfortunately, judging from the documentation, Microsoft Has Done It Again (tm): "nonstandard xxx", where "xxx" is "escape format".

Specifically, the documentation of the VBScript function says:

[...]Unicode characters that have a value greater than 255 are stored using the %uxxxx format.

(Hey, MS: there is no such thing as a "Unicode character"; they are called code points.)
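To make the documented format concrete, here is a rough sketch of the encoding side. `VbsEscapeSketch.escape` is a hypothetical re-implementation, not Microsoft's code, and it assumes that only ASCII letters and digits are left unescaped; the quoted documentation does not list the exact set of characters VBScript keeps as-is. Values up to 255 become `%XX` and anything above becomes `%uXXXX`.

```java
// Hypothetical sketch of the documented escape format, not Microsoft's implementation.
class VbsEscapeSketch
{
    static String escape(final String s)
    {
        final StringBuilder sb = new StringBuilder(s.length() * 3);
        for (int i = 0; i < s.length(); i++) {
            final char c = s.charAt(i);
            if ((c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') || (c >= '0' && c <= '9')) {
                // Assumption: ASCII letters and digits pass through unchanged
                sb.append(c);
            } else if (c <= 0xFF) {
                // Values up to 255: two hex digits, e.g. '&' becomes %26
                sb.append(String.format("%%%02X", (int) c));
            } else {
                // Values above 255: the nonstandard four-digit form, e.g. %u2013
                sb.append(String.format("%%u%04X", (int) c));
            }
        }
        return sb.toString();
    }

    public static void main(final String... args)
    {
        // Prints AT%26amp%3BT%20Network%20Client%20%u2013%20IBM
        System.out.println(escape("AT&amp;T Network Client \u2013 IBM"));
    }
}
```

Under that assumption, escaping `AT&amp;T Network Client – IBM` reproduces the question's input string exactly.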

Great. So you need your own decoding function.

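Fortunately, we are using Java. This proprietary escape sequence only covers Unicode code points in the Basic Multilingual Plane (U+0000 to U+FFFF), a Java char is a UTF-16 code unit, and every BMP code point is encoded as exactly one UTF-16 code unit. Together, these make our job slightly easier: each %uXXXX escape decodes to exactly one char.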
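The reasoning above can be checked with a few lines of plain JDK code (the class name `BmpCheck` and method `fromHex4` are just for illustration): four hex digits parse to one `char`, while a code point outside the BMP, such as U+1F600, needs two `char`s, so a single `%uXXXX` escape cannot hold it on its own.

```java
class BmpCheck
{
    // Parses the four hex digits of a %uXXXX escape into a single UTF-16 code unit
    static char fromHex4(final String hex)
    {
        return (char) Integer.parseInt(hex, 16);
    }

    public static void main(final String... args)
    {
        // U+2013 (EN DASH) lies in the BMP: it fits in one char
        System.out.println(fromHex4("2013") == '\u2013');   // true
        System.out.println(Character.charCount(0x2013));    // 1
        // U+1F600 lies outside the BMP: it needs a surrogate pair
        System.out.println(Character.charCount(0x1F600));   // 2
        System.out.println(Character.isHighSurrogate(Character.toChars(0x1F600)[0])); // true
    }
}
```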

Here is the code:

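import java.nio.CharBuffer;

/**
 * Decodes strings produced by the VBScript Escape function:
 * "%XX" holds a character value up to 255, "%uXXXX" a UTF-16 code unit above 255.
 */
public final class MSUnescaper
{
    private static final char PERCENT = '%';
    private static final char NONSTANDARD_PCT_ESCAPE = 'u';

    private MSUnescaper()
    {
    }

    public static String unescape(final String input)
    {
        final StringBuilder sb = new StringBuilder(input.length());
        final CharBuffer buf = CharBuffer.wrap(input);

        char c;

        while (buf.hasRemaining()) {
            c = buf.get();
            // Anything other than '%' is copied through unchanged
            if (c != PERCENT) {
                sb.append(c);
                continue;
            }
            // A '%' at the very end of the input is malformed
            if (!buf.hasRemaining())
                throw new IllegalArgumentException("truncated escape at end of input");
            c = buf.get();
            // "%u" starts the nonstandard four-digit form; anything else is "%XX"
            sb.append(c == NONSTANDARD_PCT_ESCAPE
                ? msEscape(buf) : standardEscape(buf, c));
        }

        return sb.toString();
    }

    // "%XX": c is the first hex digit, the second one is read from the buffer
    private static char standardEscape(final CharBuffer buf, final char c)
    {
        if (!buf.hasRemaining())
            throw new IllegalArgumentException("truncated %XX escape");
        final char[] array = { c, buf.get() };
        return parseHex(array);
    }

    // "%uXXXX": the four hex digits after 'u' form one UTF-16 code unit
    private static char msEscape(final CharBuffer buf)
    {
        if (buf.remaining() < 4)
            throw new IllegalArgumentException("truncated %uXXXX escape");
        final char[] array = new char[4];
        buf.get(array);
        return parseHex(array);
    }

    // Unlike Integer.parseInt, this rejects sign characters such as "%+1"
    private static char parseHex(final char[] digits)
    {
        int value = 0;
        for (final char d : digits) {
            final int v = Character.digit(d, 16);
            if (v < 0)
                throw new IllegalArgumentException("illegal hex digit: " + d);
            value = (value << 4) | v;
        }
        return (char) value;
    }

    public static void main(final String... args)
    {
        final String input = "AT%26amp%3BT%20Network%20Client%20%u2013%20IBM";
        System.out.println(unescape(input));
    }
}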

Output:

AT&amp;T Network Client – IBM
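The output still contains `&amp;`: the escaped text itself carried an HTML entity, so getting `AT&T` takes a second pass over the result of `MSUnescaper.unescape`, for instance the question's `StringEscapeUtils.unescapeHtml` from Apache Commons Lang. As a dependency-free sketch, the hypothetical `BasicEntities.unescapeBasic` below covers only five common entities; a real HTML unescaper handles far more, including arbitrary numeric entities.

```java
// Hypothetical minimal entity decoder; covers only &lt; &gt; &quot; &#39; and &amp;.
class BasicEntities
{
    static String unescapeBasic(final String s)
    {
        // &amp; is replaced last, so "&amp;lt;" becomes "&lt;" rather than "<"
        return s.replace("&lt;", "<")
                .replace("&gt;", ">")
                .replace("&quot;", "\"")
                .replace("&#39;", "'")
                .replace("&amp;", "&");
    }

    public static void main(final String... args)
    {
        // Returns "AT&T Network Client", an en dash (U+2013), then "IBM"
        System.out.println(unescapeBasic("AT&amp;T Network Client \u2013 IBM"));
    }
}
```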

Regarding java - Decoding an escaped string from VBScript in Java, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/22628163/
