Not able to save HTML Entity in Java String - illegal character
# Question
I cannot compile this:
``String[][] UMLAUT_REPLACEMENTS = {{"\u0022", "&#34;"},{"\u0021", "&#33;"}};``
I tried to escape the special character using `\\`, but it had no effect.
This is the error output:
Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:3.1:compile (default-compile) on project opk-application-util: Compilation failure: Compilation failure:
[ERROR] /C:/eplatform/git-repos/opk-backend/opk-application-util/src/main/java/util/SonderZeichenFilter.java:[50,41] '}' expected
[ERROR] /C:/eplatform/git-repos/opk-backend/opk-application-util/src/main/java/ch/opk/util/SonderZeichenFilter.java:[50,45] ';' expected
[ERROR] /C:/eplatform/git-repos/opk-backend/opk-application-util/src/main/java/ch/opk/util/SonderZeichenFilter.java:[50,46] illegal character: '#'
[ERROR] /C:/eplatform/git-repos/opk-backend/opk-application-util/src/main/java/ch/opk/util/SonderZeichenFilter.java:[50,47] ';' expected
[ERROR] /C:/eplatform/git-repos/opk-backend/opk-application-util/src/main/java/opk/util/SonderZeichenFilter.java:[50,50] unclosed string literal
# Answer 1
**Score**: 2
In Java, Unicode escape sequences (<code>\u<i>XXXX</i></code>) are handled as part of [pre-processing](https://docs.oracle.com/javase/specs/jls/se11/html/jls-3.html#jls-3.3), before the [escape sequences of String literals](https://docs.oracle.com/javase/specs/jls/se11/html/jls-3.html#jls-3.10.6) are processed. Therefore, when the compiler processes `"\u0022"`, it actually sees `"""`: one empty String literal (two double quotes) followed by the opening quote of another String literal. The line then contains an odd number of double quotes, which produces the errors shown, ending in "unclosed string literal".
This is a somewhat common cause of malformed Javadoc (when the author wants to write a literal <code>\u<i>XXXX</i></code> but the resulting HTML contains the corresponding Unicode character instead), and most IDEs are confused by it as well (e.g. `\u0063lass MyClass {}` is valid Java source code, because `\u0063` = `c`).
In your case, you can use the escape sequence `\"` to write a literal `"`. This also improves readability, because not everyone knows the Unicode code point of `"`. Similarly, `\u0021` can be written as `!`, since that character has no special meaning inside a Java String. Your code could therefore be written like this:
```java
String[][] UMLAUT_REPLACEMENTS = {{"\"", "&#34;"}, {"!", "&#33;"}};
```
If you want the literal <code>\u<i>XXXX</i></code> inside a Java String, you have to escape the backslash by preceding it with another `\`: <code>"\\u<i>XXXX</i>"</code>.
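The difference between the two kinds of escapes can be checked with a small program. The following is a minimal sketch; the class name `UnicodeEscapeDemo` and its constants are made up for illustration and do not come from the question's code.

```java
public class UnicodeEscapeDemo {
    // A Unicode escape for a harmless character is simply that character.
    static final String BANG = "\u0021";

    // The double quote needs the string escape; its Unicode form would end the literal early.
    static final String QUOTE = "\"";

    // A doubled backslash keeps the six characters (backslash, u, 0, 0, 2, 2) as plain text.
    static final String LITERAL_ESCAPE = "\\u0022";

    // Inside a char literal the Unicode escape for the double quote is harmless.
    static final char QUOTE_CHAR = '\u0022';

    public static void main(String[] args) {
        System.out.println(BANG.equals("!"));               // true
        System.out.println(QUOTE.length());                 // 1
        System.out.println(QUOTE.charAt(0) == QUOTE_CHAR);  // true
        System.out.println(LITERAL_ESCAPE.length());        // 6
    }
}
```

The char literal compiles because the translated source reads `'"'`, and a double quote does not terminate a char literal.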
# Answer 2
**Score**: 0
The problem seems to be the `"\u0022"` string: the Java compiler translates Unicode escape sequences into the corresponding characters before it parses the code, which can lead to errors like this one.
https://docs.oracle.com/javase/specs/jls/se8/html/jls-3.html#jls-3.10.6
https://stackoverflow.com/questions/26968213/compile-time-error-while-adding-unicode-u0022
So `"\u0022"` must be replaced with `"\""`.
# Answer 3
**Score**: 0
I found the solution!
The reason why ``String[][] UMLAUT_REPLACEMENTS = {{"\u0022", "&#34;"},{"\u0021", "&#33;"}};`` did not work is that `\u0022` is already interpreted as `"` at compile time. This causes an error, because the resulting `"""` is not a valid String literal; the quote would have to be escaped.
But if you escape the backslash (`"\\u0022"`), the sequence is no longer recognized as a character.
There is still a way to make this work, which is what I applied.
Incidentally, the goal is to mask all special ASCII characters except the very simple ones.
First, you declare a String array:
```java
// StringEscapeUtils is assumed to come from Apache Commons (e.g. org.apache.commons.text.StringEscapeUtils)
public String escapeHtml(String input) {
    String escapedHtml = input;
    String[][] UMLAUT_REPLACEMENTS =
        {
            // '&' goes first so that the '&' of entities inserted later is not escaped again
            {"\\u0026", "&#38;"},
            {"\\u0021", "&#33;"},
            {"\\u0022", "&#34;"},
            {"\\u0024", "&#36;"},
            {"\\u0025", "&#37;"},
            {"\\u0027", "&#39;"},
            {"\\u0028", "&#40;"},
        };
```
Then you look for the characters and replace them with the HTML entities, using `StringEscapeUtils.unescapeJava(INPUT)` to unescape each <code>\u<i>XXXX</i></code> sequence:
```java
    for (int i = 0; i < UMLAUT_REPLACEMENTS.length; i++) {
        // Turn the escaped text (backslash, u, four hex digits) into the actual character
        String unescapedSign = StringEscapeUtils.unescapeJava(UMLAUT_REPLACEMENTS[i][0]);
        escapedHtml = escapedHtml.replace(unescapedSign, UMLAUT_REPLACEMENTS[i][1]);
    }
    return escapedHtml;
}
```
Thank you for your help!!
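As a possible alternative that needs neither the escaped table nor an extra library, the numeric entity could be computed from each character's code point in a single pass. This is only a sketch under assumptions: the class name `NumericEntityEscaper` is hypothetical, and the set of characters to mask simply mirrors the table above.

```java
public class NumericEntityEscaper {
    // Characters to mask; this set mirrors the table above and is an assumption.
    private static final String TO_ESCAPE = "!\"$%&'(";

    public static String escapeHtml(String input) {
        StringBuilder out = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (TO_ESCAPE.indexOf(c) >= 0) {
                // Build the numeric entity from the code point, e.g. 34 for the double quote.
                out.append("&#").append((int) c).append(';');
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Prints: Say &#34;hi&#34; &#38; go&#33;
        System.out.println(escapeHtml("Say \"hi\" & go!"));
    }
}
```

Because every input character is visited exactly once, an `&` produced by an earlier replacement is never escaped a second time, so the order of the characters in the set does not matter.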