Does UTF-8 have more than one version?

Question

I read the following in the PHP Manual, Language Reference > Types, section "Details of the String Type":

> Given that PHP does not dictate a specific encoding for strings, one
> might wonder how string literals are encoded. For instance, is the
> string "á" equivalent to "\xE1" (ISO-8859-1), "\xC3\xA1" (UTF-8, C
> form), "\x61\xCC\x81" (UTF-8, D form) or any other possible
> representation?

What do "UTF-8, C form" and "UTF-8, D form" mean? Are they two versions of UTF-8?
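
The three byte sequences in the quoted passage can be reproduced with a minimal sketch using Python's standard library (Python is chosen only for illustration; the question itself is about PHP, and the variable names are hypothetical). `str.encode` produces the ISO-8859-1 or UTF-8 bytes, and `unicodedata.normalize` with "NFC" or "NFD" selects the composed or decomposed code point sequence before encoding:

```python
import unicodedata

# "á" written as the single precomposed code point U+00E1
a_acute = "\u00e1"

# ISO-8859-1: one byte per character
latin1 = a_acute.encode("iso-8859-1")

# UTF-8 of the NFC (composed) form: U+00E1 -> two bytes
utf8_c = unicodedata.normalize("NFC", a_acute).encode("utf-8")

# UTF-8 of the NFD (decomposed) form: U+0061 U+0301 -> three bytes
utf8_d = unicodedata.normalize("NFD", a_acute).encode("utf-8")

print(latin1)  # b'\xe1'
print(utf8_c)  # b'\xc3\xa1'
print(utf8_d)  # b'a\xcc\x81' (0x61 is printed as the ASCII letter a)
```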

Answer 1

Score: 1

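UTF-8 C form and UTF-8 D form are not two versions of UTF-8; UTF-8 itself encodes each code point in only one way. "C" and "D" refer to the Unicode normalization forms NFC (Normalization Form C, canonical composition) and NFD (Normalization Form D, canonical decomposition), which are two alternate code point sequences for the same character. C form uses a single precomposed code point where one exists, while D form uses a base letter followed by one or more combining marks, so encoding each form in UTF-8 produces different bytes. Example:

  • (é) in UTF-8 C form is the single code point U+00E9, represented as two bytes: 0xC3 and 0xA9
  • (é) in UTF-8 D form is two code points, U+0065 (e) followed by U+0301 (combining acute accent), represented as three bytes: 0x65, 0xCC and 0x81

The single byte 0xE9 is the ISO-8859-1 encoding of é, which corresponds to the "\xE1" case for á in the quoted passage; it is not a UTF-8 form.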
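
A minimal Python sketch of the é example (Python is used only for illustration, the variable names are hypothetical, and `bytes.hex` with a separator needs Python 3.8 or later) shows the code points and UTF-8 bytes of each form, and why the two forms compare unequal until both are normalized to the same form:

```python
import unicodedata

# Same visible character "é", built two different ways
composed = "\u00e9"     # NFC: single code point U+00E9
decomposed = "e\u0301"  # NFD: U+0065 followed by combining acute U+0301

# Code points of each form
print([f"U+{ord(ch):04X}" for ch in composed])    # ['U+00E9']
print([f"U+{ord(ch):04X}" for ch in decomposed])  # ['U+0065', 'U+0301']

# UTF-8 bytes of each form
print(composed.encode("utf-8").hex(" "))    # c3 a9
print(decomposed.encode("utf-8").hex(" "))  # 65 cc 81

# The raw strings differ, so a naive comparison fails...
print(composed == decomposed)  # False

# ...but they are equal once both are normalized to the same form
print(unicodedata.normalize("NFC", decomposed) == composed)  # True
print(unicodedata.normalize("NFD", composed) == decomposed)  # True
```

In practice this is why text from different sources is usually normalized (commonly to NFC) before it is compared or used as a key.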

huangapple
  • This article was published on February 24, 2023 04:05:47
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75549809.html