How to recover symbols from utf-8
Question
Encoding in Python 2.7 is very hard to understand. Can someone explain to me how to get the symbols in this string?
Here is my unicode string:
my_str = u'MFADCINEMve000301119 FACTURE EFAD CIN\u2019troD+000000035165 EUR FACTURE EFAD CIN\u2019trop\xe9MA SAS 2019/10198'
I want to convert it so that "\u2019" and "\xe9" become the actual characters.
I already tried my_str.encode('utf-8'), but this gives me:
'MFADCINEMve000301119 FACTURE EFAD CIN\xe2\x80\x99troD+000000035165 EUR FACTURE EFAD CIN\xe2\x80\x99trop\xc3\xa9MA SAS 2019/10198'
with other encoded symbols. I don't understand that; I just want to replace them with the ' and é symbols...
UPDATE:
UPDATE 2:
Here is my code:
import base64
import csv
import io
from datetime import datetime

day = datetime.now().day
month = datetime.now().strftime("%b")
year = datetime.now().strftime("%Y")
filename = "ventes{0}{1}{2}.csv".format(day, month, year)
with io.open(filename, 'w', encoding='utf-8') as file_data:
    csvwriter = csv.writer(file_data, delimiter=',', quotechar="", quoting=csv.QUOTE_NONE)
    for line in res:
        csvwriter.writerow([x for x in line])  # The error below is raised here
    file_data.seek(0)
    out = base64.encodestring(file_data.read())
This raises the following error (the traceback is not very explicit):
Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/openerp/http.py", line 546, in _handle_exception
    return super(JsonRequest, self)._handle_exception(exception)
  File "/usr/lib/python2.7/dist-packages/openerp/http.py", line 583, in dispatch
    result = self._call_function(**self.params)
  File "/usr/lib/python2.7/dist-packages/openerp/http.py", line 319, in _call_function
    return checked_call(self.db, *args, **kwargs)
  File "/usr/lib/python2.7/dist-packages/openerp/service/model.py", line 118, in wrapper
    return f(dbname, *args, **kwargs)
  File "/usr/lib/python2.7/dist-packages/openerp/http.py", line 316, in checked_call
    return self.endpoint(*a, **kw)
  File "/usr/lib/python2.7/dist-packages/openerp/http.py", line 812, in __call__
    return self.method(*args, **kw)
  File "/usr/lib/python2.7/dist-packages/openerp/http.py", line 412, in response_wrap
    response = f(*args, **kw)
  File "/usr/lib/python2.7/dist-packages/openerp/addons/web/controllers/main.py", line 953, in call_button
    action = self._call_kw(model, method, args, {})
  File "/usr/lib/python2.7/dist-packages/openerp/addons/web/controllers/main.py", line 941, in _call_kw
    return getattr(request.registry.get(model), method)(request.cr, request.uid, *args, **kwargs)
  File "/usr/lib/python2.7/dist-packages/openerp/api.py", line 268, in wrapper
    return old_api(self, *args, **kwargs)
  File "/usr/lib/python2.7/dist-packages/openerp/api.py", line 399, in old_api
    result = method(recs, *args, **kwargs)
  File "/usr/lib/python2.7/dist-packages/openerp/addons_eggs/adquat_export_CEGID/models/export_cegid.py", line 31, in validate
    move_ids = self._context.get('active_ids', [])
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9' in position 136: ordinal not in range(128)
What's wrong with this code? Please help!
Answer 1
Score: 1
Python 2 by default displays string representations (repr()) as ASCII-only. Any character outside the ASCII range (0-127) is displayed as an escape code (\xnn or \unnnn). The character is only displayed correctly if you print the character, and even then only if the terminal encoding and font support it.
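The question's encode('utf-8') output follows from the same distinction. This is a minimal sketch; the helper name utf8_bytes is hypothetical, and the code is written to run on both Python 2.7 and 3. It shows that each of the two characters is a single code point in the Unicode string. Encoding turns each one into several bytes, and those bytes are exactly the \xe2\x80\x99 and \xc3\xa9 sequences that appear in the encoded string's repr.

```python
import unicodedata


def utf8_bytes(ch):
    """Return the UTF-8 encoding of a single character as a list of ints."""
    # bytearray yields ints on both Python 2 and Python 3
    return list(bytearray(ch.encode('utf-8')))


for ch in (u'\u2019', u'\xe9'):
    # One code point in the Unicode string, several bytes once encoded
    print('%s %d %s' % (unicodedata.name(ch), len(ch),
                        [hex(b) for b in utf8_bytes(ch)]))

# Decoding the bytes gives back the original one-character string
assert bytearray(utf8_bytes(u'\xe9')).decode('utf-8') == u'\xe9'
```

Nothing is lost by encoding. The escapes are only how repr() shows bytes outside the ASCII range.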
For example:
>>> s = u'\xe9'
>>> s # This is a representation of the string useful for debugging.
u'\xe9'
>>> len(s) # It is still only length 1.
1
>>> print(s) # It displays correctly when printed.
é
My terminal's encoding doesn't support all Unicode characters by default, so your other example doesn't print. The debug representation does, however:
>>> s = u'\u2019'
>>> s
u'\u2019'
>>> len(s)
1
>>> print(s)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "D:\dev\Python27\lib\encodings\cp437.py", line 12, in encode
    return codecs.charmap_encode(input,errors,encoding_map)
UnicodeEncodeError: 'charmap' codec can't encode character u'\u2019' in position 0: character maps to <undefined>
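The failure comes from the console's code page, not from the string itself. This small sketch uses a hypothetical helper, cp437_bytes, and runs on Python 2.7 and 3. It shows that é has a byte in cp437 while U+2019 has none. Passing errors='replace' substitutes ? instead of raising.

```python
def cp437_bytes(text, errors='strict'):
    """Encode text with the cp437 code page used by the console above."""
    return text.encode('cp437', errors)


# e-acute exists in cp437 (byte 0x82), so it can be printed there
print(repr(cp437_bytes(u'\xe9')))

try:
    cp437_bytes(u'\u2019')  # RIGHT SINGLE QUOTATION MARK has no cp437 byte
except UnicodeEncodeError as exc:
    print('cannot encode: %s' % exc)

# With errors='replace' the unmappable character becomes '?'
print(repr(cp437_bytes(u'\u2019', 'replace')))
```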
If you write a Unicode string to a file, you have to encode it. Open a file with the encoding you want and write the Unicode string. It's best to use UTF-8 as the encoding, as it supports all Unicode characters. Use io.open. It is compatible with Python 3 (which you should switch to ASAP) and supports the encoding parameter.
import io
my_str = u'MFADCINEMve000301119 FACTURE EFAD CIN\u2019troD+000000035165 EUR FACTURE EFAD CIN\u2019trop\xe9MA SAS 2019/10198'
with io.open('out.txt','w',encoding='utf8') as f:
    f.write(my_str)
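The same rule (write Unicode text to an output with an explicit encoding) also applies to the CSV export in UPDATE 2. Python 2's csv module does not support Unicode input, and it converts Unicode fields with the default ascii codec, which is where the 'ascii' codec error comes from. The following is a minimal sketch that assumes Python 3, as recommended above, where csv.writer works on text streams:

- It builds the CSV in an io.StringIO buffer.
- It encodes the text once to UTF-8.
- It base64-encodes the result with base64.b64encode. (base64.encodestring is deprecated and was removed in Python 3.9.)
- It avoids reading back from a file opened with mode 'w', which io.open does not allow.

The function name export_rows, the escapechar setting, and returning bytes are assumptions. Whether the OpenERP field expects bytes or str is not shown in the question.

```python
import base64
import csv
import io
from datetime import datetime


def export_rows(rows):
    """Serialize rows of text fields to CSV and return (filename, base64 bytes).

    Python 3 only: csv.writer writes str (Unicode) to a text stream.
    """
    now = datetime.now()
    # Same naming scheme as the question's code
    filename = "ventes{0}{1}{2}.csv".format(now.day, now.strftime("%b"), now.strftime("%Y"))

    buf = io.StringIO()
    # QUOTE_NONE as in the question; escapechar (an assumption) stops csv
    # from raising an error on fields that contain the delimiter
    writer = csv.writer(buf, delimiter=',', quoting=csv.QUOTE_NONE, escapechar='\\')
    for line in rows:
        writer.writerow(list(line))

    # Encode the whole text once, explicitly, as UTF-8
    data = buf.getvalue().encode('utf-8')
    return filename, base64.b64encode(data)


filename, out = export_rows([[u'CIN\u2019trop\xe9MA', u'35165', u'EUR']])
```

If you must stay on Python 2, the csv documentation's advice is to encode each field to UTF-8 bytes before passing it to the writer and to write to a file opened in binary mode.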
Note you have to view the file in an editor that supports UTF-8. For example, on my terminal with its default cp437 encoding it looks like:
C:\>type out.txt
MFADCINEMve000301119 FACTURE EFAD CINΓÇÖtroD+000000035165 EUR FACTURE EFAD CINΓÇÖtrop├⌐MA SAS 2019/10198
But if I change the encoding to cp65001 (UTF-8):
C:\>chcp 65001
Active code page: 65001
C:\>type out.txt
MFADCINEMve000301119 FACTURE EFAD CIN’troD+000000035165 EUR FACTURE EFAD CIN’tropéMA SAS 2019/10198
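Under cp437, each byte of the UTF-8 data is shown as a separate cp437 character. The following sketch reproduces that; both helper names are hypothetical. Because cp437 maps all 256 byte values, the process can also be reversed: encoding the garbled text back to cp437 restores the original UTF-8 bytes, which then decode correctly. This recovery only works if you know the text was mis-decoded in exactly this way.

```python
def as_seen_in_cp437(text):
    """Simulate a cp437 console displaying UTF-8 encoded text."""
    return text.encode('utf-8').decode('cp437')


def recover_from_cp437(garbled):
    """Undo as_seen_in_cp437: get the original bytes back, then decode as UTF-8."""
    return garbled.encode('cp437').decode('utf-8')


original = u'CIN\u2019trop\xe9MA'
garbled = as_seen_in_cp437(original)
# One displayed character per UTF-8 byte: +2 for U+2019, +1 for e-acute
assert len(garbled) == len(original) + 3
assert recover_from_cp437(garbled) == original
```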
More reading:
Answer 2
Score: -1
You just need to do print(my_str.encode('utf-8'))
This will give you the output:
> MFADCINEMve000301119 FACTURE EFAD CIN’troD+000000035165 EUR FACTURE EFAD CIN’tropéMA SAS 2019/10198