Python: How to convert string of an encoded Unicode variable to a binary variable
Question
I am building an app to transliterate Myanmar (Burmese) text to the International Phonetic Alphabet. I have found that it's easier to manipulate combining Unicode characters in my dictionaries as binary variables (bytes objects), like this:
    b'\xe1\x80\xad\xe1\x80\xaf\xe1\x80\x84': 'áɪŋ̃',  # 'ိုင'
because otherwise they glue to neighboring characters like apostrophes and I can't get them unstuck.
I am translating UTF-8 Burmese characters to binary hex variables to build my dictionaries. Sometimes I want to convert backwards. I wrote a simple program:
    while True:
        binary = input("Enter binary Unicode string or 'exit' to exit: ")
        if binary == 'exit':
            break
        else:
            print(binary.decode('utf-8'))
Obviously this will not work for an input such as b'\xe1\x80\xad', since input() returns a string and strings have no decode() method.
I would like to know the quickest way to convert the string "b'\xe1\x80\xad'" to its Unicode form ' ိ '. My only idea is:
binary = bin(int(binary, 16))
but that raises:
    ValueError: invalid literal for int() with base 16: "b'\\xe1\\x80\\xad'"
Please help! Thanks.
Answer 1
Score: 2
The issue is that the input string "b'\xe1\x80\xad'" is not a hexadecimal number, so int(binary, 16) cannot parse it. It is the text of a Python bytes literal: the characters b, a quote, backslashes, and so on. You can use the ast module to safely evaluate that text as a Python literal and get the corresponding bytes object. Here's an example:
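    import ast
    # The text as input() would return it: literal backslashes, not raw bytes
    binary = "b'\\xe1\\x80\\xad'"
    # Parse the text into a real bytes object without running arbitrary code
    bytes_obj = ast.literal_eval(binary)
    # Decode the UTF-8 bytes into a str
    unicode_str = bytes_obj.decode('utf-8')
    print(unicode_str)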
This should output the Unicode character 'ိ'. The ast.literal_eval() function safely evaluates the input string as a Python literal, which in this case is a bytes object. You can then decode the bytes object to get the corresponding Unicode string.
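Applied to the loop from the question, the same idea might look like the sketch below. The helper names decode_bytes_literal and prompt_loop are made up for illustration, and the type check and error handling are assumptions about what is wanted: ast.literal_eval() accepts any Python literal, so input that parses to something other than bytes is rejected explicitly.

```python
import ast


def decode_bytes_literal(text: str, encoding: str = "utf-8") -> str:
    """Turn the text of a bytes literal, e.g. b'\\xe1\\x80\\xad', into a str."""
    # literal_eval only parses literals; it never executes arbitrary code.
    value = ast.literal_eval(text.strip())
    if not isinstance(value, bytes):
        raise ValueError(f"expected a bytes literal, got {type(value).__name__}")
    return value.decode(encoding)


def prompt_loop() -> None:
    """Hypothetical replacement for the loop in the question."""
    while True:
        text = input("Enter bytes literal or 'exit' to exit: ")
        if text == "exit":
            break
        try:
            print(decode_bytes_literal(text))
        except (ValueError, SyntaxError) as err:
            # Malformed literals raise ValueError or SyntaxError;
            # invalid UTF-8 raises UnicodeDecodeError, a ValueError subclass.
            print(f"Could not decode input: {err}")
```

Calling prompt_loop() starts the interactive prompt. Going the other way, 'ိ'.encode('utf-8') yields the bytes object, whose repr is the literal form used as dictionary keys in the question.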