Does UTF-8 have more than one version?
Question
I read the following in PHP Manual > Language Reference > Types: Details of the String Type:
> Given that PHP does not dictate a specific encoding for strings, one
> might wonder how string literals are encoded. For instance, is the
> string "á" equivalent to "\xE1" (ISO-8859-1), "\xC3\xA1" (UTF-8, C
> form), "\x61\xCC\x81" (UTF-8, D form) or any other possible
> representation?
What do "UTF-8, C form" and "UTF-8, D form" mean? Are they two versions of UTF-8?
Answer 1
Score: 1
No, there is only one UTF-8. "C form" and "D form" refer to the Unicode normalization forms NFC (composed) and NFD (decomposed), which are two ways of writing the same text as a sequence of Unicode code points. Either sequence is then encoded with the same UTF-8 rules, so the resulting bytes differ only because the code points differ. Example:
- (é) in C form is the single code point U+00E9, which UTF-8 encodes as two bytes: 0xC3 0xA9
- (é) in D form is two code points, U+0065 (e) followed by U+0301 (combining acute accent), which UTF-8 encodes as three bytes: 0x65 0xCC 0x81

The single byte 0xE9 is (é) in ISO-8859-1, not in either UTF-8 form.
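
The byte sequences quoted from the PHP manual can be checked with a short script. The following is only a sketch, written in Python rather than PHP because its standard-library `unicodedata` module exposes the Unicode normalization forms directly; the helper name `utf8_hex` is made up for this illustration. It takes the character á, normalizes it to the composed form (NFC) and the decomposed form (NFD), and prints the code points and UTF-8 bytes of each.

```python
import unicodedata

# The same character in the two Unicode normalization forms.
composed = unicodedata.normalize("NFC", "\u00e1")    # one code point: á
decomposed = unicodedata.normalize("NFD", "\u00e1")  # a + combining acute accent


def utf8_hex(text: str) -> str:
    """Hypothetical helper: UTF-8 bytes of `text` as space-separated hex."""
    return " ".join(f"{byte:02x}" for byte in text.encode("utf-8"))


# Code points and UTF-8 bytes of each form.
print([f"U+{ord(ch):04X}" for ch in composed], utf8_hex(composed))
# → ['U+00E1'] c3 a1
print([f"U+{ord(ch):04X}" for ch in decomposed], utf8_hex(decomposed))
# → ['U+0061', 'U+0301'] 61 cc 81

# For comparison, the ISO-8859-1 encoding of the composed form.
print(" ".join(f"{byte:02x}" for byte in composed.encode("latin-1")))
# → e1
```

Both strings are encoded by the same UTF-8 encoder; the bytes differ because the underlying code point sequences differ. Normalizing the decomposed string back with `unicodedata.normalize("NFC", decomposed)` yields the composed string again.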