January 4, 2020 00:06:58

How to recover symbols from utf-8

Question

Encoding in Python 2.7 is very hard to understand. Can someone explain how to get the symbols in these strings?

Here is my unicode string:

my_str = u'MFADCINEMve000301119 FACTURE EFAD CIN\u2019troD+000000035165 EUR FACTURE EFAD CIN\u2019trop\xe9MA SAS 2019/10198'

I want to convert it to get the actual characters behind "\u2019" and "\xe9".

I already tried my_str.encode('utf-8'), but it gives me this:

'MFADCINEMve000301119 FACTURE EFAD CIN\xe2\x80\x99troD+000000035165 EUR FACTURE EFAD CIN\xe2\x80\x99trop\xc3\xa9MA SAS 2019/10198'

which just shows other escape sequences. I don't understand that; I just want to replace them with the ' and é symbols...

UPDATE:

UPDATE 2:

Here is my code:

day = datetime.now().day
month = datetime.now().strftime("%b")
year = datetime.now().strftime("%Y")
filename = "ventes{0}{1}{2}.csv".format(day, month, year)
with io.open(filename, 'w', encoding='utf-8') as file_data:
    csvwriter = csv.writer(file_data, delimiter=',', quotechar="", quoting=csv.QUOTE_NONE)
    for line in res:
        csvwriter.writerow([x for x in line])  # The error below occurs here
file_data.seek(0)
out = base64.encodestring(file_data.read())

It raises this error (the traceback is not very explicit):

Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/openerp/http.py", line 546, in _handle_exception
    return super(JsonRequest, self)._handle_exception(exception)
  File "/usr/lib/python2.7/dist-packages/openerp/http.py", line 583, in dispatch
    result = self._call_function(**self.params)
  File "/usr/lib/python2.7/dist-packages/openerp/http.py", line 319, in _call_function
    return checked_call(self.db, *args, **kwargs)
  File "/usr/lib/python2.7/dist-packages/openerp/service/model.py", line 118, in wrapper
    return f(dbname, *args, **kwargs)
  File "/usr/lib/python2.7/dist-packages/openerp/http.py", line 316, in checked_call
    return self.endpoint(*a, **kw)
  File "/usr/lib/python2.7/dist-packages/openerp/http.py", line 812, in __call__
    return self.method(*args, **kw)
  File "/usr/lib/python2.7/dist-packages/openerp/http.py", line 412, in response_wrap
    response = f(*args, **kw)
  File "/usr/lib/python2.7/dist-packages/openerp/addons/web/controllers/main.py", line 953, in call_button
    action = self._call_kw(model, method, args, {})
  File "/usr/lib/python2.7/dist-packages/openerp/addons/web/controllers/main.py", line 941, in _call_kw
    return getattr(request.registry.get(model), method)(request.cr, request.uid, *args, **kwargs)
  File "/usr/lib/python2.7/dist-packages/openerp/api.py", line 268, in wrapper
    return old_api(self, *args, **kwargs)
  File "/usr/lib/python2.7/dist-packages/openerp/api.py", line 399, in old_api
    result = method(recs, *args, **kwargs)
  File "/usr/lib/python2.7/dist-packages/openerp/addons_eggs/adquat_export_CEGID/models/export_cegid.py", line 31, in validate
    move_ids = self._context.get('active_ids', [])
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9' in position 136: ordinal not in range(128)

What's wrong with this code? Please help!

Answer 1

Score: 1

By default, Python 2 displays string representations (repr()) using ASCII only. Any character outside the ASCII range (0-127) is displayed as an escape code (\xnn or \unnnn). A character is only displayed correctly when you print it, and even then only if the terminal encoding and font support it.

For example:

>>> s = u'\xe9'
>>> s             # This is a representation of the string useful for debugging.
u'\xe9'
>>> len(s)        # It is still only length 1.
1
>>> print(s)      # It displays correctly when printed.
é

My terminal's encoding doesn't support all Unicode characters by default, so your other example doesn't print. The debug representation still works, however:

>>> s = u'\u2019'
>>> s
u'\u2019'
>>> len(s)
1
>>> print(s)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "D:\dev\Python27\lib\encodings\cp437.py", line 12, in encode
    return codecs.charmap_encode(input,errors,encoding_map)
UnicodeEncodeError: 'charmap' codec can't encode character u'\u2019' in position 0: character maps to <undefined>
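To see that the escapes exist only in the representation, here is a minimal sketch in Python 3 (an assumption: the question uses Python 2.7, whose repr() of a unicode object escapes non-ASCII characters much like Python 3's ascii()). It also shows that .encode('utf-8') returns bytes, which is why the escapes in the question turned into \xe2\x80\x99 and \xc3\xa9, and that .decode('utf-8') recovers the original text.

```python
# Python 3 sketch: escapes like \u2019 and \xe9 are one character each.
s = 'CIN\u2019trop\xe9MA'

# ascii() escapes non-ASCII characters, similar to Python 2's repr() of unicode.
escaped = ascii(s)

# Each escape sequence stands for a single character in the string itself.
length = len(s)

# Encoding turns the text into UTF-8 bytes; their repr shows multi-byte escapes.
encoded = s.encode('utf-8')

# Decoding those bytes with the same codec gives back the original text.
decoded = encoded.decode('utf-8')

print(escaped)       # 'CIN\u2019trop\xe9MA'
print(length)        # 11
print(encoded)       # b'CIN\xe2\x80\x99trop\xc3\xa9MA'
print(decoded == s)  # True
```

Nothing is lost in either direction: the escapes and the UTF-8 bytes are two different ways of displaying the same two characters.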

If you write a Unicode string to a file, you have to encode it. Open a file with the encoding you want and write the Unicode string. It's best to use UTF-8 as the encoding, as it supports all Unicode characters. Use io.open. It is compatible with Python 3 (which you should switch to ASAP) and supports the encoding parameter.

import io
my_str = u'MFADCINEMve000301119 FACTURE EFAD CIN\u2019troD+000000035165 EUR FACTURE EFAD CIN\u2019trop\xe9MA SAS 2019/10198'
with io.open('out.txt', 'w', encoding='utf8') as f:
    f.write(my_str)
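To check what io.open(..., encoding='utf8') actually puts on disk, the file can be read back both as raw bytes and as text. A minimal sketch, written to run under Python 2.7 and Python 3; it writes to a temporary directory instead of out.txt, which is only an illustrative choice.

```python
import io
import os
import tempfile

my_str = u'MFADCINEMve000301119 FACTURE EFAD CIN\u2019troD+000000035165 EUR FACTURE EFAD CIN\u2019trop\xe9MA SAS 2019/10198'

# Illustrative location: a fresh temporary directory rather than out.txt.
path = os.path.join(tempfile.mkdtemp(), 'out.txt')

# Text mode with an explicit encoding: io.open encodes the Unicode string for us.
with io.open(path, 'w', encoding='utf8') as f:
    f.write(my_str)

# Binary mode shows the stored bytes: U+2019 -> E2 80 99, U+00E9 -> C3 A9.
with io.open(path, 'rb') as f:
    raw = f.read()

# Reading in text mode with the same encoding restores the original string.
with io.open(path, 'r', encoding='utf8') as f:
    text = f.read()
```

The file is five bytes longer than the string has characters: each \u2019 takes three bytes and the \xe9 takes two.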

Note that you have to view the file in an editor that supports UTF-8. For example, in my terminal with its default cp437 encoding, it looks like this:

C:\>type out.txt
MFADCINEMve000301119 FACTURE EFAD CINΓÇÖtroD+000000035165 EUR FACTURE EFAD CINΓÇÖtrop├⌐MA SAS 2019/10198

But if I change the encoding to cp65001 (UTF-8):

C:\>chcp 65001
Active code page: 65001
C:\>type out.txt
MFADCINEMve000301119 FACTURE EFAD CIN’troD+000000035165 EUR FACTURE EFAD CIN’tropéMA SAS 2019/10198
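The same rule applies to the CSV export in UPDATE 2: build Unicode text and encode it to UTF-8 once, at the end. Two details in that code are worth noting. First, file_data.seek(0) runs after the with block has already closed the file, and a file opened with 'w' cannot be read back anyway. Second, the Python 2 csv module does not support Unicode input, which is a likely source of the 'ascii' codec error. Below is a minimal Python 3 sketch under these assumptions: rows stands in for res as an iterable of rows of text values; the CSV is built in memory with io.StringIO rather than on disk; base64.b64encode replaces base64.encodestring, which was deprecated and then removed in Python 3.9; and the default csv quoting replaces quotechar=""/QUOTE_NONE. export_csv is a hypothetical helper name.

```python
import base64
import csv
import io
from datetime import datetime


def export_csv(rows):
    """Hypothetical helper: build the CSV in memory, return (filename, base64 payload)."""
    now = datetime.now()
    filename = "ventes{0}{1}{2}.csv".format(now.day, now.strftime("%b"), now.strftime("%Y"))

    # Python 3's csv module writes str (Unicode) text; newline='' disables newline translation.
    buffer = io.StringIO(newline='')
    csvwriter = csv.writer(buffer, delimiter=',')
    for line in rows:
        csvwriter.writerow(line)

    # Encode to UTF-8 exactly once, then base64-encode the resulting bytes.
    data = buffer.getvalue().encode('utf-8')
    return filename, base64.b64encode(data)


filename, out = export_csv([['CIN\u2019trop\xe9MA', '2019/10198']])
print(filename)
print(base64.b64decode(out).decode('utf-8') == 'CIN\u2019trop\xe9MA,2019/10198\r\n')
```

Building the file in memory sidesteps the closed-file problem entirely. If the CSV must also be written to disk, opening it with open(filename, 'w', encoding='utf-8', newline='') is the usual Python 3 approach.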

Answer 2

Score: -1

You just need to do print(my_str.encode('utf-8'))

This will give you the output:

> MFADCINEMve000301119 FACTURE EFAD CIN’troD+000000035165 EUR FACTURE EFAD CIN’tropéMA SAS 2019/10198
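Whether this output appears correctly depends on the terminal. Under Python 2, print writes the encoded byte string unchanged, and the console then decodes those bytes with its own code page. Here is a minimal Python 3 sketch of that last step, using the two code pages from Answer 1 (cp437 and UTF-8):

```python
# Python 3 sketch: the same UTF-8 bytes interpreted with two different code pages.
data = 'CIN\u2019trop\xe9MA'.encode('utf-8')

as_utf8 = data.decode('utf-8')    # what a UTF-8 console (chcp 65001) would display
as_cp437 = data.decode('cp437')   # what a cp437 console would display (mojibake)

# ascii() keeps the output printable on any console.
print(ascii(as_utf8))   # 'CIN\u2019trop\xe9MA'
print(ascii(as_cp437))  # 'CIN\u0393\xc7\xd6trop\u251c\u2310MA'
```

The cp437 result is "CINΓÇÖtrop├⌐MA", which matches the garbled type out.txt output in Answer 1.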
