I cannot compile this:

String[][] UMLAUT_REPLACEMENTS = {{"\u0022", "&#34;"},{"\u0021", "&#33;"}};

I tried escaping the special character with \\, but it had no effect.

This is the error output:

Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:3.1:compile (default-compile) on project opk-application-util: Compilation failure: Compilation failure: 
[ERROR] /C:/eplatform/git-repos/opk-backend/opk-application-util/src/main/java/util/SonderZeichenFilter.java:[50,41] '}' expected
[ERROR] /C:/eplatform/git-repos/opk-backend/opk-application-util/src/main/java/ch/opk/util/SonderZeichenFilter.java:[50,45] ';' expected
[ERROR] /C:/eplatform/git-repos/opk-backend/opk-application-util/src/main/java/ch/opk/util/SonderZeichenFilter.java:[50,46] illegal character: '#'
[ERROR] /C:/eplatform/git-repos/opk-backend/opk-application-util/src/main/java/ch/opk/util/SonderZeichenFilter.java:[50,47] ';' expected
[ERROR] /C:/eplatform/git-repos/opk-backend/opk-application-util/src/main/java/opk/util/SonderZeichenFilter.java:[50,50] unclosed string literal
  • I guess there's no need to escape the ampersand character. Commented Sep 4, 2020 at 10:07
  • Yes, this was an editing mistake here. It fails this way: String[][] UMLAUT_REPLACEMENTS = {{"\u0022", "&#34;"},{"\u0021", "&#33;"}}; Commented Sep 4, 2020 at 10:59

3 Answers

In Java, Unicode escape sequences (\uXXXX) are translated in a pre-processing step, before String literal escape sequences are processed. So when the compiler processes "\u0022", it actually sees """: one empty String literal (two double quotes) followed by the opening quote of another String literal. That produces the error "unclosed string literal", because the code contains an odd number of double quotes.

This is a somewhat common cause for malformed Javadoc (when the author wants to write literally \uXXXX but the resulting HTML instead contains the respective Unicode character) and most IDEs are confused by this as well (e.g. \u0063lass MyClass {} is valid Java source code; \u0063 = c).
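
To make the early translation visible, here is a minimal sketch. The class and field names are made up for illustration. It uses a Unicode escape once inside a keyword and once inside a String literal:

```java
public class UnicodeEscapeDemo {

    // The escape in the next line is translated before the compiler
    // tokenizes the file, so this declares an ordinary "char" field.
    static \u0063har firstLetter = 'c';

    // The same early translation happens inside a string literal:
    // the compiler only ever sees the one-character string "!".
    static String bang = "\u0021";

    public static void main(String[] args) {
        System.out.println(firstLetter);       // c
        System.out.println(bang.equals("!"));  // true
        System.out.println(bang.length());     // 1
    }
}
```

Since both spellings end up as the same character in the compiled class, the escape only changes how the source file looks, not what the program does.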

In your case you can use the special escape sequence \" to write a literal ". This will also improve readability because not everyone is familiar with the Unicode code point of ". Similarly \u0021 could be written as ! since that character has no special meaning inside a Java String. Your code could therefore be written like this:

String[][] UMLAUT_REPLACEMENTS = {{"\"", "&#34;"},{"!", "&#33;"}};

If you want the literal \uXXXX inside a Java String, you will have to escape the backslash by preceding it with another \: "\\uXXXX"
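
A small sketch of the two spellings side by side, assuming nothing beyond the standard library (the class and constant names are hypothetical):

```java
public class EscapeLiteralDemo {

    // One double-quote character, written with the string escape sequence.
    static final String QUOTE = "\"";

    // Six characters: backslash, 'u', '0', '0', '2', '1'. The doubled
    // backslash stops the compiler from treating this as a Unicode escape.
    static final String LITERAL_ESCAPE = "\\u0021";

    public static void main(String[] args) {
        System.out.println(QUOTE.length());           // 1
        System.out.println((int) QUOTE.charAt(0));    // 34, i.e. 0x22
        System.out.println(LITERAL_ESCAPE.length());  // 6
        System.out.println(LITERAL_ESCAPE);           // backslash followed by u0021
    }
}
```

The first constant is the character itself; the second is just text that looks like an escape and would still need to be unescaped at runtime to become a "!".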

5 Comments

Hi! Sorry, I only placed the backslash there for editing reasons. I actually use exactly this code sequence and get the error above: String[][] UMLAUT_REPLACEMENTS = {{"\u0022", "&#34;"},{"\u0021", "&#33;"}};
@FrancescoRovetto, thanks for the clarification. I had misunderstood your problem and have now updated my answer accordingly.
Thank you so much!! \\uXXXX would remove the error, but then Java somehow doesn't recognize it as Unicode anymore. But we are close to the solution :-) Unfortunately, we cannot use the plain characters like ! ", etc. I'm looking for a way to use Unicode escapes in a String array...
@FrancescoRovetto could you please describe the desired outcome a little bit more in detail then? Note that e.g. "\u0021" and "!" are in the compiled class and at runtime exactly the same, so if you only want to create a String containing "!" then there is no need to use a unicode escape. If you want \u0021 to be the literal value of the String, then you need to escape it. Just try it out, e.g. use System.out.println("\\u0021");
Yes actually there are only problems with the characters used by Java. And then, for example \u0022 throws an error. But I found now a solution which I will present in the answer. :-)

The issue seems to be the "\u0022" string: the Java compiler converts Unicode escape sequences into the characters they stand for before it parses the code, which sometimes leads to errors like this one.

https://docs.oracle.com/javase/specs/jls/se8/html/jls-3.html#jls-3.10.6

Compile time error while adding unicode \u0022

So, "\u0022" must be replaced with "\""

I found the solution!

The reason String[][] UMLAUT_REPLACEMENTS = {{"\u0022", "&#34;"},{"\u0021", "&#33;"}}; did not compile is that \u0022 is already interpreted as " during compilation. The literal then becomes """, whose middle quote would need to be escaped.

But if you escape the backslash (\\u0022), the sequence is no longer recognized as a character.

There is still a solution, though, which I applied.

By the way, this solution is meant to mask all special characters in the Latin ASCII range except the very simple ones.

First, you declare a String array:

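    // Needs StringEscapeUtils from Apache Commons (commons-text or commons-lang3) on the classpath.
    public String escapeHtml(String input) {

        String escapedHtml = input;

        // Escaped Unicode sequences mapped to their HTML entities.
        // '&' comes first so that the '&' of entities added later is not escaped again.
        String[][] UMLAUT_REPLACEMENTS = {
                {"\\u0026", "&#38;"},
                {"\\u0021", "&#33;"},
                {"\\u0022", "&#34;"},
                {"\\u0024", "&#36;"},
                {"\\u0025", "&#37;"},
                {"\\u0027", "&#39;"},
                {"\\u0028", "&#40;"},
        };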

Then you look for the characters and replace them with the HTML entities, using StringEscapeUtils.unescapeJava(INPUT) to unescape the \uXXXX sequences first:

        for (int i = 0; i < UMLAUT_REPLACEMENTS.length; i++) {
            // Turn the escaped sequence back into the actual character.
            String unescapedSign = StringEscapeUtils.unescapeJava(UMLAUT_REPLACEMENTS[i][0]);
            escapedHtml = escapedHtml.replace(unescapedSign, UMLAUT_REPLACEMENTS[i][1]);
        }

        return escapedHtml;
    }
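
If the aim is only to keep the characters out of the source file, one possible alternative (not what the answer above does, just a sketch with hypothetical names) is to write the code points as numbers. That needs neither Unicode escapes nor a library, and a single pass over the input avoids escaping an already inserted '&' a second time:

```java
public class HtmlEntityMasker {

    // Characters to mask, given as numeric code points. This selection is an
    // assumption that mirrors the table in the answer above.
    private static final int[] MASKED = {0x21, 0x22, 0x24, 0x25, 0x26, 0x27, 0x28};

    private static boolean isMasked(char c) {
        for (int codePoint : MASKED) {
            if (codePoint == c) {
                return true;
            }
        }
        return false;
    }

    // Single pass over the input: each masked character becomes a decimal
    // numeric character reference, everything else is copied unchanged.
    public static String escapeHtml(String input) {
        StringBuilder out = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (isMasked(c)) {
                out.append("&#").append((int) c).append(';');
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // prints: Tom &#38; &#34;Jerry&#34;&#33;
        System.out.println(escapeHtml("Tom & \"Jerry\"!"));
    }
}
```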


Thank you for your help!!

3 Comments

Have you tested whether the other answers are not working for you? Because what you are doing is "\\u0021" -StringEscapeUtils.unescapeJava(...)-> "!" -> replace. This seems to complicate things unnecessarily when you can just write "!" in the first place, omitting unescapeJava (unless there is other code using UMLAUT_REPLACEMENTS which is not shown here).
Yeah, it would work as long as the source is only deployed locally. Unicode escapes are just safer: if something changes with the encoding, all the characters are gone.
Both ! and " are ASCII characters so I doubt that there is any (commonly used) encoding which could mess them up. Note that in Java Strings are always UTF-16, so unless you are talking about encoding issues affecting the complete source code (and not only the String literals), there should not be any issues.
