英文:
How to output multibyte unicode character?
问题
I want to output superscript characters in a scientific report. For example, I want to display 'A/cm²' (with '²' as a superscript character). I found a Unicode character map, and in this case, the superscript character '²' is represented as a 3-byte character.
Environment
- Windows 10 64-bit
- Python v3.10.10
What I did
I read the Python documentation for the chr
function, and I tried the following command:
chr(0xe28282)
However, Python gave me the following error:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: chr() arg not in range(0x110000)
How can I output a 3-byte character in Python?
英文:
What I want to do
I want to output superscript character in making a scientific report. It is like 'A/cm2'('2' in this case). I managed to find a character map in unicode shown below. As for this case superscript charcter of '2' is shown to be a 3-byte character.
Environment
- Windows10 64bit
- Python v3.10.10
What I did
I read the reference of Python (description for chr function), I tried following command.
chr(0xe28282)
Python showed me following error.
Traceback (most recent call last)
File "<stdin>" line 1 in <module>
ValueError: chr() arg not in range(0x110000)
How to describe to output a 3-byte character in Python ?
答案1
得分: 1
Unicode代码点(或字符)在第二行中给出,你可以在Python中使用chr
函数来表示它。第一行是UTF-8编码(可以从第二行派生出来)。你可以将它用作二进制字符串,例如b'\xe2\x82\x82'
。
但是对于Unicode(而Python中的源默认为UTF-8),你可以直接使用"₂"
。这比在表格中来回转换要容易得多。让你的编辑器/界面和Python来为你完成这项工作。
英文:
You are confusing several concepts (and probably the reason was the original site you used).
Unicode codepoint (or character) is given in the second line, and you should use that for chr
in Python. The first line is the UTF-8 encoding (which can be derived from the second line). You can use it as binary string e.g. b'\xe2\x82\x82'
.
But with Unicode (and sources in Python are by default UTF-8), you can directly use "₂"
. It is much easier then going back and forth with tables. Let's your editor/GUI and Python do the job for you.
答案2
得分: 0
U+2082
是您想要的Unicode代码点。 可以使用转义代码,直接粘贴到您的代码中,或者在数字上使用chr()
:
print('\u2082')
₂
print('₂')
₂
print(chr(0x2082))
₂
''.join(chr(i) for i in range(0x2070, 0x209d) if i not in (0x2072, 0x2073, 0x208f))
'⁰ⁱ⁴⁵⁶⁷⁸⁹⁺⁻⁼⁽⁾ⁿ₀₁₂₃₄₅₆₇₈₉₊₋₌₍₎ₐₑₒₓₔₕₖₗₘₙₚₛₜ'
从Python 3.3开始,Windows Unicode API用于在cmd.exe窗口中打印,因此终端的编码不重要,但您需要有一个支持该字形的字体;否则,您可能会看到该字体的Unicode REPLACEMENT CHARACTER(�)。在Chrome浏览器的字体上,上面的所有字符在Windows上都正确呈现。
以上代码是在Windows上使用Python 3.11从cmd.exe运行的,join
中的最后八个字符在我使用的Consolas字体中呈现为替代字符:
参考链接:Unicode.org: Superscripts and Subscripts
英文:
U+2082
is the Unicode code point you want. Use an escape code, paste it directly into your code, or use chr()
on the number:
>>> print('\u2082')
₂
>>> print('₂')
₂
>>> print(chr(0x2082))
₂
>>> ''.join(chr(i) for i in range(0x2070, 0x209d) if i not in (0x2072, 0x2073, 0x208f))
'⁰ⁱ⁴⁵⁶⁷⁸⁹⁺⁻⁼⁽⁾ⁿ₀₁₂₃₄₅₆₇₈₉₊₋₌₍₎ₐₑₒₓₔₕₖₗₘₙₚₛₜ'
As of Python 3.3 Windows Unicode APIs are used to print to a cmd.exe window, so the encoding of the terminal won't matter, but you will need a font that supports the glyph; otherwise, you'll likely see the font's Unicode REPLACEMENT CHARACTER (�). All the characters above rendered correctly on Windows in the Chrome browser's font.
Above code was run from cmd.exe on Windows with Python 3.11. The last eight characters from the join
were rendered as replacement characters in the Consolas font I was using:
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论