Are these characters valid for XML?

Question


Is this valid XML data (the value of the messageContent in particular)?

I am getting it from an API.

I then get an error when I pass this XML down to a Postgres function for saving to the Postgres DB.

<rows>
<row messageDateUTC="2020-06-01T21:20:37.120" texterAddress="" texterStreet="" messageContent="Hey beautiful it&apos;s Scott!&#55357;&#56842;" />
</rows>

I wonder if it's an API issue, or a problem with the client-side module which generates the XML, or maybe Postgres has an issue and is not able to handle these characters.

Error here:

Caused by: org.postgresql.util.PSQLException: ERROR: invalid XML content
  Detail: line 5: xmlParseCharRef: invalid xmlChar value 55357
ddress="" texterStreet="" messageContent="Hey beautiful it&apos;s Scott!&#55357;
                                                                               ^
line 5: xmlParseCharRef: invalid xmlChar value 56842
" texterStreet="" messageContent="Hey beautiful it&apos;s Scott!&#55357;&#56842;
                                                                               ^
line 23: chunk is not well balanced

Answer 1

Score: 4


tl;dr: No, they are not valid. Whatever did the encoding is either buggy or was given the wrong encoding information about the input.

55357 and 56842 are 0xD83D and 0xDE0A in hex respectively.

In Unicode they are in ranges called "High Surrogate" and "Low Surrogate" respectively.

That means they do not represent characters on their own. UTF-16 uses them in pairs to encode a single Unicode code point that doesn't fit into 16 bits (i.e. one outside the Basic Multilingual Plane).
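
To see why the parser rejects these references, it may help to check the values against the `Char` production of XML 1.0, which excludes the surrogate range U+D800 to U+DFFF. Below is a minimal sketch in Python, used here only for illustration since the question does not show the client code. The helper name `is_xml10_char` is made up for this example.

```python
def is_xml10_char(code_point: int) -> bool:
    """Return True if code_point matches the XML 1.0 `Char` production."""
    return (
        code_point in (0x9, 0xA, 0xD)
        or 0x20 <= code_point <= 0xD7FF
        or 0xE000 <= code_point <= 0xFFFD
        or 0x10000 <= code_point <= 0x10FFFF
    )


# The two values from the error message, plus the combined code point.
for value in (55357, 56842, 128522):
    print(f"{value} (0x{value:X}): valid XML char = {is_xml10_char(value)}")
```

Both 55357 and 56842 fall into the gap between 0xD7FF and 0xE000, so a conforming parser has to reject `&#55357;` and `&#56842;`, while 128522 is accepted.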

These two specific values decode to U+1F60A SMILING FACE WITH SMILING EYES. The correct decimal character reference for that would be &#128522;.
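
The arithmetic behind that decoding can be sketched as follows. This is the standard UTF-16 surrogate pair formula; the function name `combine_surrogates` is a hypothetical helper for this illustration.

```python
def combine_surrogates(high: int, low: int) -> int:
    """Combine a UTF-16 high/low surrogate pair into one Unicode code point."""
    if not 0xD800 <= high <= 0xDBFF:
        raise ValueError(f"not a high surrogate: {high:#x}")
    if not 0xDC00 <= low <= 0xDFFF:
        raise ValueError(f"not a low surrogate: {low:#x}")
    # Each surrogate carries 10 bits of the value above 0x10000.
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


code_point = combine_surrogates(55357, 56842)
print(hex(code_point))     # 0x1f60a
print(f"&#{code_point};")  # &#128522;
```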

The most likely cause is that whatever escaped the text either doesn't know about UTF-16 or assumed the text was not UTF-16, and so escaped each 16-bit code unit separately (even in that case it should have detected that those values are invalid and reported an error).
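
If the producer of the XML cannot be fixed, one possible workaround is to repair the text before handing it to Postgres. The sketch below is an assumption about how that could be done, not something from the question: it merges adjacent decimal references that form a valid surrogate pair into a single reference. The name `repair_surrogate_refs` is hypothetical. Hexadecimal references such as `&#xD83D;` and lone surrogates are left untouched, so the latter would still be rejected.

```python
import re

# A single decimal character reference, e.g. "&#55357;".
_REF = re.compile(r"&#(\d+);")


def repair_surrogate_refs(xml_text: str) -> str:
    """Merge adjacent high+low surrogate references into one reference."""
    out = []
    pos = 0
    while True:
        first = _REF.search(xml_text, pos)
        if first is None:
            out.append(xml_text[pos:])
            break
        out.append(xml_text[pos:first.start()])
        high = int(first.group(1))
        # Only a reference that starts right after the first one can pair with it.
        second = _REF.match(xml_text, first.end())
        if (
            0xD800 <= high <= 0xDBFF
            and second is not None
            and 0xDC00 <= int(second.group(1)) <= 0xDFFF
        ):
            low = int(second.group(1))
            code_point = 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)
            out.append(f"&#{code_point};")
            pos = second.end()
        else:
            out.append(first.group(0))
            pos = first.end()
    return "".join(out)


broken = 'messageContent="Hey beautiful it&apos;s Scott!&#55357;&#56842;"'
print(repair_surrogate_refs(broken))
```

Fixing the component that generates the references is still the better option, since a text-level repair like this cannot tell whether other data was damaged by the same bug.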

huangapple
  • Posted on July 28, 2020, 18:05:01
  • When reposting, please keep the link to this article: https://go.coder-hub.com/63131658.html