2023年6月9日 05:36:15go评论91阅读模式

英文:

Python: How to convert string of an encoded Unicode variable to a binary variable

问题

I am building an app to transliterate Myanmar (Burmese) text to International Phonetic Alphabet. I have found that it's easier to manipulate combining Unicode characters in my dictionaries as binary variables like this:

b'\xe1\x80\xad\xe1\x80\xaf\xe1\x80\x84': 'ိုင', # 'ိုင'

because otherwise they glue to neighboring characters like apostrophes and I can't get them unstuck.

I am translating UTF-8 Burmese characters to binary hex variables to build my dictionaries. Sometimes I want to convert backwards. I wrote a simple program:

while True:
    binary = input("Enter binary Unicode string or 'exit' to exit: ")
    if binary == 'exit':
        break
    else:
        print(binary.decode('utf-8'))

Obviously this will not work for an input such as b'\xe1\x80\xad', since input() returns a string.

I would like to know the quickest way to convert the string "b'\xe1\x80\xad'" to its Unicode form ' ိ '. My only idea is:

binary = bin(int(binary, 16))

but that returns:

ValueError: invalid literal for int() with base 16: "b'\xe1\x80\xad'"

Please help! Thanks.

英文:

b&#39;\xe1\x80\xad\xe1\x80\xaf\xe1\x80\x84&#39;: &#39;&#225;ɪŋ̃&#39;,	#&#39;ိုင&#39;

because otherwise they glue to neighboring characters like apostrophes and I can't get them unstuck.
I am translating UTF-8 Burmese characters to binary hex variables to build my dictionaries. Sometimes I want to convert backwards. I wrote a simple program:

while True:
	binary = input(&quot;Enter binary Unicode string or &#39;exit&#39; to exit: &quot;)
	if binary == &#39;exit&#39;:
		break
	else:
		print(binary.decode(&#39;utf-8&#39;))

Obviously this will not work for an input such as b'\xe1\x80\xad', since input() returns a string.
I would like to know the quickest way to convert the string "b'\xe1\x80\xad'" to it's Unicode form ' ိ '. My only idea is

binary = bin(int(binary, 16))

but that returns

ValueError: invalid literal for int() with base 16: &quot;b&#39;\\xe1\\x80\\xad\\&#39;&quot;

Please help! Thanks.

答案1

得分: 2

问题在于输入字符串“b'\xe1\x80\xad'”不是一个有效的十六进制字符串，无法直接转换为Unicode。然而，您可以使用ast模块安全地将该字符串评估为Python字面值，并获取相应的字节对象。这里是一个示例：

import ast
binary = "b'\\xe1\\x80\\xad'"
bytes_obj = ast.literal_eval(binary)
unicode_str = bytes_obj.decode('utf-8')
print(unicode_str)

这应该输出Unicode字符'ိ'。ast.literal_eval()函数安全地评估输入字符串作为Python字面值，这在本例中是一个字节对象。然后，您可以解码字节对象以获取相应的Unicode字符串。

英文:

The issue is that the input string "b'\xe1\x80\xad'" is not a valid hexadecimal string that can be converted to Unicode directly. However, you can use the ast module to safely evaluate the string as a Python literal and get the corresponding bytes object. Here's an example:

import ast
binary = &quot;b&#39;\\xe1\\x80\\xad&#39;&quot;
bytes_obj = ast.literal_eval(binary)
unicode_str = bytes_obj.decode(&#39;utf-8&#39;)
print(unicode_str)

This should output the Unicode character 'ိ'. The ast.literal_eval() function safely evaluates the input string as a Python literal, which in this case is a bytes object. You can then decode the bytes object to get the corresponding Unicode string.

通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库，让每个人都能够通过互相帮助和分享经验来进步。

Python：如何将编码的Unicode变量的字符串转换为二进制变量

问题

答案1

python CSV writer, pandas , How to write the csv file in two column using the same text file has only one coulmn

支持多种方言的实体

Python使用pbkdf2进行密码哈希处理。

如何导入一个init.py来为其编写单元测试？

如何在Playwright视觉比较中屏蔽多个定位器？

在C++中，可以使用可变模板参数来检索类型的内部类型。

selenium.common.exceptions.StaleElementReferenceException: Message: stale element reference: stale element not found

Creating and opening a URL to log in to Website via Basic Auth with Robot Framework/Selenium (Python)

AG Grid 在上下文菜单中以大文本形式打开

What's the correct way to type hint an empty list as a literal in python?

如何在Highcharts Gantt中更改本地化的星期名称

如何在同一个流中使用多个过滤器和映射函数？

如何使用Map/Set来将代码优化到O(n)？

.NET MAUI Android在GitHub Actions上构建失败，错误代码为1。