How to recover symbols from utf-8

Question

Encoding in Python 2.7 is very hard to understand. Can someone explain to me how to get the symbols in this string?

Here is my unicode string:

    my_str = u'MFADCINEMve000301119 FACTURE EFAD CIN\u2019troD+000000035165 EUR FACTURE EFAD CIN\u2019trop\xe9MA SAS 2019/10198'

I want to convert it so that "\u2019" and "\xe9" come out as the actual symbols.

I already tried my_str.encode('utf-8'), but that gives me this:

    'MFADCINEMve000301119 FACTURE EFAD CIN\xe2\x80\x99troD+000000035165 EUR FACTURE EFAD CIN\xe2\x80\x99trop\xc3\xa9MA SAS 2019/10198'

with the symbols encoded differently. I don't understand that; I just want to replace them with the ' and é symbols...

UPDATE:

[Image from the original post; not preserved]

UPDATE 2:

Here is my code:

    day = datetime.now().day
    month = datetime.now().strftime("%b")
    year = datetime.now().strftime("%Y")
    filename = "ventes{0}{1}{2}.csv".format(day, month, year)
    with io.open(filename, 'w', encoding='utf-8') as file_data:
        csvwriter = csv.writer(file_data, delimiter=',', quotechar="", quoting=csv.QUOTE_NONE)
        for line in res:
            csvwriter.writerow([x for x in line])  # the error below occurs here
        file_data.seek(0)
        out = base64.encodestring(file_data.read())

This raises the following error (the traceback is not very explicit):

    Traceback (most recent call last):
      File "/usr/lib/python2.7/dist-packages/openerp/http.py", line 546, in _handle_exception
        return super(JsonRequest, self)._handle_exception(exception)
      File "/usr/lib/python2.7/dist-packages/openerp/http.py", line 583, in dispatch
        result = self._call_function(**self.params)
      File "/usr/lib/python2.7/dist-packages/openerp/http.py", line 319, in _call_function
        return checked_call(self.db, *args, **kwargs)
      File "/usr/lib/python2.7/dist-packages/openerp/service/model.py", line 118, in wrapper
        return f(dbname, *args, **kwargs)
      File "/usr/lib/python2.7/dist-packages/openerp/http.py", line 316, in checked_call
        return self.endpoint(*a, **kw)
      File "/usr/lib/python2.7/dist-packages/openerp/http.py", line 812, in __call__
        return self.method(*args, **kw)
      File "/usr/lib/python2.7/dist-packages/openerp/http.py", line 412, in response_wrap
        response = f(*args, **kw)
      File "/usr/lib/python2.7/dist-packages/openerp/addons/web/controllers/main.py", line 953, in call_button
        action = self._call_kw(model, method, args, {})
      File "/usr/lib/python2.7/dist-packages/openerp/addons/web/controllers/main.py", line 941, in _call_kw
        return getattr(request.registry.get(model), method)(request.cr, request.uid, *args, **kwargs)
      File "/usr/lib/python2.7/dist-packages/openerp/api.py", line 268, in wrapper
        return old_api(self, *args, **kwargs)
      File "/usr/lib/python2.7/dist-packages/openerp/api.py", line 399, in old_api
        result = method(recs, *args, **kwargs)
      File "/usr/lib/python2.7/dist-packages/openerp/addons_eggs/adquat_export_CEGID/models/export_cegid.py", line 31, in validate
        move_ids = self._context.get('active_ids', [])
    UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9' in position 136: ordinal not in range(128)

What's wrong with this code? Please help!

Answer 1

Score: 1

Python 2 by default displays string representations (repr()) as ASCII-only. Any character outside the ASCII range (0-127) is displayed as an escape code (\xnn or \unnnn). The character is only displayed correctly visually if you print the character, and then only if the terminal encoding and font support the character.

For example:

    >>> s = u'\xe9'
    >>> s  # This is a representation of the string useful for debugging.
    u'\xe9'
    >>> len(s)  # It is still only length 1.
    1
    >>> print(s)  # It displays correctly when printed.
    é

My terminal's encoding doesn't support all Unicode characters by default, so your other example doesn't print. The debug representation does, however:

    >>> s = u'\u2019'
    >>> s
    u'\u2019'
    >>> len(s)
    1
    >>> print(s)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "D:\dev\Python27\lib\encodings\cp437.py", line 12, in encode
        return codecs.charmap_encode(input,errors,encoding_map)
    UnicodeEncodeError: 'charmap' codec can't encode character u'\u2019' in position 0: character maps to <undefined>
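
To confirm that the escapes in the repr are only a display form, you can inspect the characters themselves. The following is a small sketch using the standard unicodedata module; the shortened sample string and the variable names are illustrative and not taken from the question.

```python
import unicodedata

# Shortened sample of the string in the question (illustrative).
my_str = u'CIN\u2019trop\xe9MA'

# Collect each non-ASCII character with its code point, Unicode name,
# and the bytes it becomes when encoded as UTF-8.
non_ascii = [
    (ch, 'U+%04X' % ord(ch), unicodedata.name(ch), ch.encode('utf-8'))
    for ch in my_str
    if ord(ch) > 127
]

for ch, code_point, name, utf8_bytes in non_ascii:
    print('%s %s %r' % (code_point, name, utf8_bytes))
```

Each entry is a single character in the Unicode string. The multi-byte sequences such as \xe2\x80\x99 only appear once the string has been encoded to UTF-8.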

If you write a Unicode string to a file, you have to encode it. Open a file with the encoding you want and write the Unicode string. It's best to use UTF-8 as the encoding, as it supports all Unicode characters. Use io.open. It is compatible with Python 3 (which you should switch to ASAP) and supports the encoding parameter.

    import io
    my_str = u'MFADCINEMve000301119 FACTURE EFAD CIN\u2019troD+000000035165 EUR FACTURE EFAD CIN\u2019trop\xe9MA SAS 2019/10198'
    with io.open('out.txt','w',encoding='utf8') as f:
        f.write(my_str)
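
The same rule applies to the CSV export in the question. The Python 2 csv module does not support Unicode input, which is probably what triggers the 'ascii' codec error when a cell contains é. One possible workaround is to encode each cell to UTF-8 yourself and write bytes. In Python 3 the module accepts text directly. The sketch below covers both cases under those assumptions. rows_to_base64_csv and the sample rows are hypothetical names, not part of the original code, and the CSV is built in memory instead of being re-read from the file.

```python
import base64
import csv
import io
import sys

PY2 = sys.version_info[0] == 2


def rows_to_base64_csv(rows):
    """Serialize rows of text cells to UTF-8 CSV and return it base64-encoded."""
    if PY2:
        # Python 2: csv writes byte strings, so encode unicode cells first.
        buf = io.BytesIO()
        writer = csv.writer(buf, delimiter=',')
        for row in rows:
            writer.writerow([
                cell.encode('utf-8') if isinstance(cell, unicode) else cell  # noqa: F821
                for cell in row
            ])
        data = buf.getvalue()
    else:
        # Python 3: csv writes text; encode the whole document once at the end.
        buf = io.StringIO(newline='')
        writer = csv.writer(buf, delimiter=',')
        writer.writerows(rows)
        data = buf.getvalue().encode('utf-8')
    return base64.b64encode(data)


# Hypothetical sample data standing in for `res` from the question.
rows = [[u'FACTURE EFAD CIN\u2019trop\xe9MA SAS', u'2019/10198']]
encoded = rows_to_base64_csv(rows)
decoded = base64.b64decode(encoded).decode('utf-8')
```

base64.b64encode is used here instead of base64.encodestring because the latter was removed in Python 3.9. Building the data in memory also avoids calling read() on a file that was opened in write-only mode.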

Note you have to view the file in an editor that supports UTF-8. For example, on my terminal with its default cp437 encoding it looks like:

    C:\>type out.txt
    MFADCINEMve000301119 FACTURE EFAD CIN’troD+000000035165 EUR FACTURE EFAD CIN’tropéMA SAS 2019/10198

But if I change the encoding to cp65001 (UTF-8):

    C:\>chcp 65001
    Active code page: 65001
    C:\>type out.txt
    MFADCINEMve000301119 FACTURE EFAD CINtroD+000000035165 EUR FACTURE EFAD CINtropéMA SAS 2019/10198
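
The file holds the same bytes in both cases; only the code page used to interpret them changes. Here is a small sketch of that effect, decoding a shortened sample with Python's built-in codecs (the sample and variable names are illustrative):

```python
# UTF-8 bytes for a shortened sample of the string written to out.txt.
data = u'CIN\u2019trop\xe9MA'.encode('utf-8')

# Decode the identical bytes with three different code pages.
as_utf8 = data.decode('utf-8')      # the intended text
as_cp437 = data.decode('cp437')     # each byte mapped to a cp437 character
as_cp1252 = data.decode('cp1252')   # each byte mapped to a cp1252 character

for label, text in [('utf-8', as_utf8), ('cp437', as_cp437), ('cp1252', as_cp1252)]:
    # repr() keeps the output ASCII-safe on any terminal.
    print('%s: %r' % (label, text))
```

Only the UTF-8 decoding returns the original eleven characters. The single-byte code pages turn each of the five UTF-8 bytes into a separate, unrelated character.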


Answer 2

Score: -1

You just need to do print(my_str.encode('utf-8'))

This will give you the output:

> MFADCINEMve000301119 FACTURE EFAD CIN’troD+000000035165 EUR FACTURE EFAD CIN’tropéMA SAS 2019/10198

huangapple
  • Published on January 4, 2020, 00:06:58
  • When reposting, please keep the link to this article: https://go.coder-hub.com/59581715.html