java - How do I unescape a Java string literal in Java?

Reposted by: IT老高. Updated: 2023-10-28 13:52:05

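I'm processing some Java source code using Java. I'm extracting the string literals and feeding them to a function that takes a String. The problem is that I need to pass the unescaped version of the String to the function (that is, converting \n to a newline, \\ to a single \, and so on).

Is there a function in the Java API that does this? If not, can I get such functionality from some library? Obviously the Java compiler has to do this conversion.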

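Best answer

The problem

The org.apache.commons.lang.StringEscapeUtils.unescapeJava() given here as another answer is really very little help at all.

It forgets about \0 for null.
It doesn't handle octal at all.
It can't handle the sorts of escapes admitted by java.util.regex.Pattern.compile() and everything that uses it, including \a, \e, and especially \cX.
It has no support for logical Unicode code points by number, only for UTF-16.
This looks like UCS-2 code, not UTF-16 code: they use the char-based charAt interface instead of the codePoint interface, thus promulgating the delusion that a Java char is guaranteed to hold a Unicode character. It's not. They only get away with this because no UTF-16 surrogate will ever match anything they are looking for.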
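
To make the last point concrete, here is a minimal sketch of the difference between indexing by char and by code point. The class name CodePointDemo and its two helper methods are made up for this illustration; only the java.lang.String and java.lang.Character calls are real API.

```java
public class CodePointDemo {

    /** Number of UTF-16 code units, which is what length() and charAt() index. */
    static int codeUnits(String s) {
        return s.length();
    }

    /** Number of logical Unicode code points in the string. */
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        // U+1D11E MUSICAL SYMBOL G CLEF lies outside the BMP, so UTF-16
        // has to store it as two char values (a surrogate pair).
        String clef = new String(Character.toChars(0x1D11E));

        System.out.println(codeUnits(clef));               // 2
        System.out.println(codePoints(clef));              // 1
        System.out.printf("%X%n", (int) clef.charAt(0));   // D834, only the high surrogate
        System.out.printf("%X%n", clef.codePointAt(0));    // 1D11E, the whole character
    }
}
```

A single char here holds only half of the character, which is why code that walks a string with charAt alone cannot be trusted with code points above U+FFFF.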

The solution

I wrote a string unescaper which solves the OP's question without all the irritations of the Apache code.

/*
 *
 * unescape_perl_string()
 *
 *      Tom Christiansen <tchrist@perl.com>
 *      Sun Nov 28 12:55:24 MST 2010
 *
 * It's completely ridiculous that there's no standard
 * unescape_java_string function.  Since I have to do the
 * damn thing myself, I might as well make it halfway useful
 * by supporting things Java was too stupid to consider in
 * strings:
 * 
 *   => "?" items  are additions to Java string escapes
 *                 but normal in Java regexes
 *
 *   => "!" items  are also additions to Java regex escapes
 *   
 * Standard singletons: ?\a ?\e \f \n \r \t
 * 
 *      NB: \b is unsupported as backspace so it can pass-through
 *          to the regex translator untouched; I refuse to make anyone
 *          doublebackslash it as doublebackslashing is a Java idiocy
 *          I desperately wish would die out.  There are plenty of
 *          other ways to write it:
 *
 *              \cH, \10, \010, \x08 \x{8}, \u0008, \U00000008
 *
 * Octal escapes: \0 \0N \0NN \N \NN \NNN
 *    Can range up to !\777 not \377
 *    
 *      TODO: add !\o{NNNNN}
 *          last Unicode is 4177777
 *          maxint is 37777777777
 *
 * Control chars: ?\cX
 *      Means: ord(X) ^ ord('@')
 *
 * Old hex escapes: \xXX
 *      unbraced must be 2 xdigits
 *
 * Perl hex escapes: !\x{XXX} braced may be 1-8 xdigits
 *       NB: proper Unicode never needs more than 6, as highest
 *           valid codepoint is 0x10FFFF, not maxint 0xFFFFFFFF
 *
 * Lame Java escape: \[IDIOT JAVA PREPROCESSOR]uXXXX must be
 *                   exactly 4 xdigits;
 *
 *       I can't write XXXX in this comment where it belongs
 *       because the damned Java Preprocessor can't mind its
 *       own business.  Idiots!
 *
 * Lame Python escape: !\UXXXXXXXX must be exactly 8 xdigits
 * 
 * TODO: Perl translation escapes: \Q \U \L \E \[IDIOT JAVA PREPROCESSOR]u \l
 *       These are not so important to cover if you're passing the
 *       result to Pattern.compile(), since it handles them for you
 *       further downstream.  Hm, what about \[IDIOT JAVA PREPROCESSOR]u?
 *
 */

public final static
String unescape_perl_string(String oldstr) {

    /*
     * In contrast to fixing Java's broken regex charclasses,
     * this one need be no bigger, as unescaping shrinks the string
     * here, where in the other one, it grows it.
     */

    /* local and single-threaded, so the unsynchronized builder suffices */
    StringBuilder newstr = new StringBuilder(oldstr.length());

    boolean saw_backslash = false;

    for (int i = 0; i < oldstr.length(); i++) {
        int cp = oldstr.codePointAt(i);
        if (oldstr.codePointAt(i) > Character.MAX_VALUE) {
            i++; /****WE HATES UTF-16! WE HATES IT FOREVERSES!!!****/
        }

        if (!saw_backslash) {
            if (cp == '\\') {
                saw_backslash = true;
            } else {
                newstr.append(Character.toChars(cp));
            }
            continue; /* switch */
        }

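        if (cp == '\\') {
            saw_backslash = false;
            /*
             * NB: an escaped backslash is passed through as TWO
             * backslashes, presumably for the same reason as \b above:
             * so a regex translator downstream still sees it escaped.
             * For plain Java-literal semantics, where a doubled
             * backslash becomes a single one, drop one of the appends.
             */
            newstr.append('\\');
            newstr.append('\\');
            continue; /* switch */
        }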

        switch (cp) {

            case 'r':  newstr.append('\r');
                       break; /* switch */

            case 'n':  newstr.append('\n');
                       break; /* switch */

            case 'f':  newstr.append('\f');
                       break; /* switch */

            /* PASS a \b THROUGH!! */
            case 'b':  newstr.append("\\b");
                       break; /* switch */

            case 't':  newstr.append('\t');
                       break; /* switch */

            case 'a':  newstr.append('\007');
                       break; /* switch */

            case 'e':  newstr.append('\033');
                       break; /* switch */

            /*
             * A "control" character is what you get when you xor its
             * codepoint with '@'==64.  This only makes sense for ASCII,
             * and may not yield a "control" character after all.
             *
             * Strange but true: "\c{" is ";", "\c}" is "=", etc.
             */
            case 'c':   {
                if (++i == oldstr.length()) { die("trailing \\c"); }
                cp = oldstr.codePointAt(i);
                /*
                 * don't need to grok surrogates, as next line blows them up
                 */
                if (cp > 0x7f) { die("expected ASCII after \\c"); }
                newstr.append(Character.toChars(cp ^ 64));
                break; /* switch */
            }

            case '8':
            case '9': die("illegal octal digit");
                      /* NOTREACHED */

            /*
             * may be 0 to 2 octal digits following this one
             * so back up one for fallthrough to next case;
             * unread this digit and fall through to next case.
             */
            case '1':
            case '2':
            case '3':
            case '4':
            case '5':
            case '6':
            case '7': --i;
                      /* FALLTHROUGH */

            /*
             * Can have 0, 1, or 2 octal digits following a 0
             * this permits larger values than octal 377, up to
             * octal 777.
             */
            case '0': {
                if (i+1 == oldstr.length()) {
                    /* found \0 at end of string */
                    newstr.append(Character.toChars(0));
                    break; /* switch */
                }
                i++;
                int digits = 0;
                int j;
                for (j = 0; j <= 2; j++) {
                    if (i+j == oldstr.length()) {
                        break; /* for */
                    }
                    /* safe because will unread surrogate */
                    int ch = oldstr.charAt(i+j);
                    if (ch < '0' || ch > '7') {
                        break; /* for */
                    }
                    digits++;
                }
                if (digits == 0) {
                    --i;
                    newstr.append('\0');
                    break; /* switch */
                }
                int value = 0;
                try {
                    value = Integer.parseInt(
                                oldstr.substring(i, i+digits), 8);
                } catch (NumberFormatException nfe) {
                    die("invalid octal value for \\0 escape");
                }
                newstr.append(Character.toChars(value));
                i += digits-1;
                break; /* switch */
            } /* end case '0' */

            case 'x':  {
                /* i indexes the 'x'; at least two more chars must follow */
                if (i+2 >= oldstr.length()) {
                    die("string too short for \\x escape");
                }
                i++;
                boolean saw_brace = false;
                if (oldstr.charAt(i) == '{') {
                        /* ^^^^^^ ok to ignore surrogates here */
                    i++;
                    saw_brace = true;
                }
                int j;
                for (j = 0; j < 8; j++) {

                    if (!saw_brace && j == 2) {
                        break;  /* for */
                    }

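                    /*
                     * a braced escape may run off the end of the
                     * string before its closing brace shows up
                     */
                    if (i+j >= oldstr.length()) {
                        die("unterminated \\x{} escape");
                    }

                    /*
                     * ASCII test also catches surrogates
                     */
                    int ch = oldstr.charAt(i+j);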
                    if (ch > 127) {
                        die("illegal non-ASCII hex digit in \\x escape");
                    }

                    if (saw_brace && ch == '}') { break; /* for */ }

                    if (! ( (ch >= '0' && ch <= '9')
                                ||
                            (ch >= 'a' && ch <= 'f')
                                ||
                            (ch >= 'A' && ch <= 'F')
                          )
                       )
                    {
                        die(String.format(
                            "illegal hex digit #%d '%c' in \\x", ch, ch));
                    }

                }
                if (j == 0) { die("empty braces in \\x{} escape"); }
                int value = 0;
                try {
                    value = Integer.parseInt(oldstr.substring(i, i+j), 16);
                } catch (NumberFormatException nfe) {
                    die("invalid hex value for \\x escape");
                }
                newstr.append(Character.toChars(value));
                if (saw_brace) { j++; }
                i += j-1;
                break; /* switch */
            }

            case 'u': {
                /* i indexes the 'u'; exactly four xdigits must follow */
                if (i+4 >= oldstr.length()) {
                    die("string too short for \\u escape");
                }
                i++;
                int j;
                for (j = 0; j < 4; j++) {
                    /* this also handles the surrogate issue */
                    if (oldstr.charAt(i+j) > 127) {
                        die("illegal non-ASCII hex digit in \\u escape");
                    }
                }
                int value = 0;
                try {
                    value = Integer.parseInt( oldstr.substring(i, i+j), 16);
                } catch (NumberFormatException nfe) {
                    die("invalid hex value for \\u escape");
                }
                newstr.append(Character.toChars(value));
                i += j-1;
                break; /* switch */
            }

            case 'U': {
                /* i indexes the 'U'; exactly eight xdigits must follow */
                if (i+8 >= oldstr.length()) {
                    die("string too short for \\U escape");
                }
                i++;
                int j;
                for (j = 0; j < 8; j++) {
                    /* this also handles the surrogate issue */
                    if (oldstr.charAt(i+j) > 127) {
                        die("illegal non-ASCII hex digit in \\U escape");
                    }
                }
                int value = 0;
                try {
                    value = Integer.parseInt(oldstr.substring(i, i+j), 16);
                } catch (NumberFormatException nfe) {
                    die("invalid hex value for \\U escape");
                }
                newstr.append(Character.toChars(value));
                i += j-1;
                break; /* switch */
            }

            default:   newstr.append('\\');
                       newstr.append(Character.toChars(cp));
           /*
            * say(String.format(
            *       "DEFAULT unrecognized escape %c passed through",
            *       cp));
            */
                       break; /* switch */

        }
        saw_backslash = false;
    }

    /* weird to leave one at the end */
    if (saw_backslash) {
        newstr.append('\\');
    }

    return newstr.toString();
}

/*
 * Return a string "U+XX.XXX.XXXX" etc, where each XX set is the
 * xdigits of the logical Unicode code point. No bloody brain-damaged
 * UTF-16 surrogate crap, just true logical characters.
 */
 public final static
 String uniplus(String s) {
     if (s.length() == 0) {
         return "";
     }
     /* This is just the minimum; sb will grow as needed. */
     StringBuilder sb = new StringBuilder(2 + 3 * s.length());
     sb.append("U+");
     for (int i = 0; i < s.length(); i++) {
         sb.append(String.format("%X", s.codePointAt(i)));
         if (s.codePointAt(i) > Character.MAX_VALUE) {
             i++; /****WE HATES UTF-16! WE HATES IT FOREVERSES!!!****/
         }
         if (i+1 < s.length()) {
             sb.append(".");
         }
     }
     return sb.toString();
 }

private static final
void die(String foa) {
    throw new IllegalArgumentException(foa);
}

private static final
void say(String what) {
    System.out.println(what);
}
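
The \cX rule from the header comment (the code of X xor'ed with the code of '@') is easy to check in isolation. The following is only a sketch: the class ControlEscapeDemo and the method control are names invented for this example and are not part of the code above.

```java
public class ControlEscapeDemo {

    /** Returns the character that \cX denotes: the code of X xor 64 ('@'). */
    static char control(char x) {
        if (x > 0x7f) {
            throw new IllegalArgumentException("expected ASCII after \\c");
        }
        return (char) (x ^ '@');
    }

    public static void main(String[] args) {
        System.out.println((int) control('H'));  // 8, backspace
        System.out.println((int) control('['));  // 27, escape
        System.out.println(control('{'));        // ;
        System.out.println(control('}'));        // =
    }
}
```

The last two lines show the "strange but true" cases mentioned in the comment: outside the range @ through _, the xor does not produce a control character at all.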

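If this code helps others, you're welcome to it, no strings attached. If you improve it, I'd love for you to mail me your enhancements, but you certainly don't have to.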
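
For readers who only need what the question literally asks for (turning \n into a newline, \\ into a single \, and so on), a much smaller decoder may be enough. The following is a rough sketch, not the answer's code: the names JavaLiteralUnescaper and unescapeJavaLiteral are hypothetical, and it assumes the input is the body of a string literal without the surrounding quotes. It handles \b \t \n \f \r \" \' \\, octal escapes up to \377, and the four-digit Unicode escape. It does not handle \s, text-block line continuations, or the compiler's rule that Unicode escapes are translated before other escapes are recognized. On Java 15 and later, String.translateEscapes() covers the non-Unicode escapes out of the box.

```java
public class JavaLiteralUnescaper {

    /** Hypothetical helper: decodes the escapes in the body of a Java string literal. */
    static String unescapeJavaLiteral(String s) {
        StringBuilder out = new StringBuilder(s.length());
        int i = 0;
        while (i < s.length()) {
            char c = s.charAt(i++);
            if (c != '\\') {
                out.append(c);  // surrogate halves are copied through unchanged
                continue;
            }
            if (i == s.length()) {
                throw new IllegalArgumentException("trailing backslash");
            }
            char e = s.charAt(i++);
            switch (e) {
                case 'b':  out.append('\b'); break;
                case 't':  out.append('\t'); break;
                case 'n':  out.append('\n'); break;
                case 'f':  out.append('\f'); break;
                case 'r':  out.append('\r'); break;
                case '"':  out.append('"');  break;
                case '\'': out.append('\''); break;
                case '\\': out.append('\\'); break;
                case 'u': {
                    // The language allows the letter u to be repeated.
                    while (i < s.length() && s.charAt(i) == 'u') {
                        i++;
                    }
                    if (i + 4 > s.length()) {
                        throw new IllegalArgumentException("truncated Unicode escape");
                    }
                    int value = 0;
                    for (int k = 0; k < 4; k++) {
                        char h = s.charAt(i++);
                        // Accept ASCII hex digits only.
                        int d = (h < 128) ? Character.digit(h, 16) : -1;
                        if (d < 0) {
                            throw new IllegalArgumentException("bad hex digit in Unicode escape");
                        }
                        value = value * 16 + d;
                    }
                    out.append((char) value);
                    break;
                }
                default: {
                    if (e < '0' || e > '7') {
                        throw new IllegalArgumentException("unknown escape: " + e);
                    }
                    // Octal: up to three digits if the first is 0-3, otherwise up to two.
                    int max = (e <= '3') ? 3 : 2;
                    int value = e - '0';
                    int digits = 1;
                    while (digits < max && i < s.length()
                            && s.charAt(i) >= '0' && s.charAt(i) <= '7') {
                        value = value * 8 + (s.charAt(i++) - '0');
                        digits++;
                    }
                    out.append((char) value);
                }
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // At run time raw holds 13 characters: a \ n b \ \ c \ u 0 0 4 1
        String raw = "a\\nb\\\\c\\u0041";
        String cooked = unescapeJavaLiteral(raw);
        System.out.println(raw.length());     // 13
        System.out.println(cooked.length());  // 6: a, newline, b, backslash, c, A
    }
}
```

Unlike the unescaper above, this sketch rejects unknown escapes instead of passing them through, which is closer to what the compiler does but less forgiving when the result is headed for Pattern.compile().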

Regarding "java - How do I unescape a Java string literal in Java?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/3537706/
