
Issue reading Portuguese text file with "ê : você" character in Python

Question


I have a Portuguese text file, created with PHP, that contains sentences with the character "ê" (e with a circumflex accent). I'm trying to read this file in Python, but I run into problems specifically with the "ê" character. I have made sure that both the PHP file and the Python script use UTF-8 encoding.

The Python script works fine in the terminal, but when I call it from PHP's exec() or shell_exec() function, Python does not read the text file content properly and prints this error:

'ascii' codec can't encode character '\xea' in position 6: ordinal not in range(128)
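This is the message of the UnicodeEncodeError that Python raises when text containing a character above code point 127 is encoded with the ASCII codec; print() performs that encoding step whenever sys.stdout was opened with an ASCII encoding. A minimal reproduction that needs neither PHP nor the text file, using the sample sentence from the question:

```python
# Minimal reproduction of the error message, without PHP or the text file.
# print() performs the same encoding step when sys.stdout uses ASCII.
text = "Se você tem 1 laranja e 1 limão faça esse delicioso bolo!"

try:
    text.encode("ascii")
except UnicodeEncodeError as exc:
    error_message = str(exc)
    bad_index = exc.start                  # index of the first non-ASCII char
    bad_code_point = ord(text[exc.start])  # U+00EA, i.e. 'ê'

# Only ASCII is printed here, so these lines work on any stdout.
print(error_message)
print(bad_index, hex(bad_code_point))
```

The reported position 6 is simply the index of "ê" in "Se você", which suggests the text itself was read correctly and only the output step failed.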

What could be causing this issue and how can I resolve it?

I have already tried the following steps:

  1. Saving the PHP file with UTF-8 encoding.
  2. Specifying the UTF-8 encoding explicitly when opening the file in Python.
  3. Verifying that the default encoding in Python is set to UTF-8.

operating system: Linux

Python default encoding: utf-8
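The "Python default encoding: utf-8" value most likely comes from sys.getdefaultencoding(), which is always 'utf-8' on Python 3 and does not control print(). What print() uses is sys.stdout.encoding, chosen when the interpreter starts. A small check, meant to be run once from the terminal and once through shell_exec() so the two outputs can be compared (this comparison is a suggested debugging step, not part of the original question):

```python
import locale
import sys

# sys.getdefaultencoding() is fixed to 'utf-8' on Python 3; it does not
# decide how print() encodes text.
default_encoding = sys.getdefaultencoding()

# print() writes through sys.stdout, whose encoding is chosen at startup
# from the locale (LC_ALL / LC_CTYPE / LANG) unless PYTHONIOENCODING or
# Python's UTF-8 mode overrides it.
stdout_encoding = getattr(sys.stdout, "encoding", None)
locale_encoding = locale.getpreferredencoding(False)

# Only ASCII is printed, so this runs even under an ASCII-only stdout.
print("default encoding:", default_encoding)
print("stdout encoding: ", stdout_encoding)
print("locale encoding: ", locale_encoding)
```

If the environment is the cause, the "default encoding" line stays utf-8 in both runs while the "stdout encoding" line differs.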

text file content:

Se você tem 1 laranja e 1 limão faça esse delicioso bolo!

Python code:

filename = "newfile.txt"
with open(filename, "r", encoding="utf-8") as file:
    # Read the first line of the text file
    file_content = file.readline().strip()
    print(file_content)
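While debugging, a variant of the script that never prints non-ASCII text can show what was actually read from the file. ascii() escapes every non-ASCII character ("ê" becomes \xea, a byte order mark becomes \ufeff), so its output survives even an ASCII-only stdout. A sketch; the helper name describe_first_line and the temporary sample file are hypothetical, for illustration only:

```python
import os
import tempfile


def describe_first_line(filename: str) -> str:
    """Return an ASCII-only representation of the file's first line."""
    with open(filename, "r", encoding="utf-8") as file:
        # Drop only the line ending, so a leading BOM would stay visible.
        first_line = file.readline().rstrip("\r\n")
    # ascii() escapes every non-ASCII character, so the result is pure ASCII.
    return ascii(first_line)


# Demo on a throwaway copy of the sample sentence (hypothetical file).
with tempfile.TemporaryDirectory() as tmp_dir:
    sample_path = os.path.join(tmp_dir, "newfile.txt")
    with open(sample_path, "w", encoding="utf-8") as f:
        f.write("Se você tem 1 laranja e 1 limão faça esse delicioso bolo!\n")
    sample_description = describe_first_line(sample_path)

print(sample_description)
```

In read.py itself, print(describe_first_line("newfile.txt")) could temporarily replace print(file_content): if the escaped text looks right, the file was decoded correctly and only the output encoding is at fault.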

terminal print:

[screenshot of the terminal output not available]

PHP file code:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta content="text/html; charset=utf-8" http-equiv="Content-Type" />
<title>read python</title>
</head>
<body>
<?php
$pythonScript = "read.py";
$command = "python3 " . $pythonScript;
$output = shell_exec($command);
echo $output;
?>
</body>
</html>

I appreciate any insights or suggestions on how to handle this issue. Thank you!

Answer 1

Score: 1


> The initial \ufeff in the ASCII string is the byte order mark (BOM) character sometimes used as a signature for a UTF-8 file. Use encoding='utf-8-sig' to remove that. The rest of the string is correct so the problem is the encoding of the display, not Python. If your terminal isn't configured for UTF-8 it will mis-decode the result. On Windows with Python 3.11 in the command prompt a string with that content prints correctly: Se você tem 1 laranja e 1 limão faça esse delicioso bolo!.
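The BOM remark in the comment can be checked directly: a file that starts with the UTF-8 BOM bytes EF BB BF decodes to a leading '\ufeff' with encoding="utf-8", while encoding="utf-8-sig" drops it. A sketch that builds such a file in a temporary directory (the file contents are assumed for illustration):

```python
import codecs
import os
import tempfile

sentence = "Se você tem 1 laranja e 1 limão faça esse delicioso bolo!"

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "newfile.txt")
    # Write the UTF-8 BOM followed by the UTF-8 encoded sentence, as some
    # tools do when saving "UTF-8 with BOM".
    with open(path, "wb") as f:
        f.write(codecs.BOM_UTF8 + sentence.encode("utf-8"))

    with open(path, "r", encoding="utf-8") as f:
        with_bom = f.readline()      # keeps the BOM as '\ufeff'
    with open(path, "r", encoding="utf-8-sig") as f:
        without_bom = f.readline()   # the codec strips the BOM

# ascii() keeps the printed output pure ASCII.
print(ascii(with_bom[:3]))
print(ascii(without_bom[:2]))
```

Because utf-8-sig also reads files without a BOM unchanged, it is a reasonable choice when it is unclear whether the writing side (here PHP) adds one.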

@MarkTolonen is right: the terminal was not configured for UTF-8. I set a UTF-8 locale before calling exec() in PHP, and now it works.

PHP code:

$locale = 'pt_BR.UTF-8';
setlocale(LC_ALL, $locale);
putenv('LC_ALL=' . $locale);
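The putenv() call matters because a process started by shell_exec() inherits the environment of the PHP process, and Python picks the encoding of sys.stdout from that environment at startup. The same mechanism can be reproduced from Python with subprocess. To keep the result independent of which locales are installed, this sketch switches the child's stdout encoding with the PYTHONIOENCODING variable instead of LC_ALL; that substitution and the helper name run_child are assumptions for illustration:

```python
import os
import subprocess
import sys

# Child script that prints a non-ASCII word, like read.py does.
# The source is kept ASCII-only ("\u00ea" is 'ê') so argv encoding is moot.
CHILD_CODE = 'print("voc\\u00ea")'


def run_child(io_encoding: str) -> subprocess.CompletedProcess:
    """Run the child with PYTHONIOENCODING overridden; capture raw bytes."""
    env = dict(os.environ, PYTHONIOENCODING=io_encoding)
    return subprocess.run(
        [sys.executable, "-c", CHILD_CODE],
        env=env,
        capture_output=True,
    )


ascii_run = run_child("ascii")  # stands in for the call without the fix
utf8_run = run_child("utf-8")   # stands in for the call after putenv()

print("ascii child exit code:", ascii_run.returncode)
print("utf-8 child exit code:", utf8_run.returncode)
print("utf-8 child output:", ascii(utf8_run.stdout.decode("utf-8")))
```

The ASCII run fails with the same UnicodeEncodeError as in the question, while the UTF-8 run prints the word; in the PHP setup, the LC_ALL value passed through putenv() plays the role that PYTHONIOENCODING plays here.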

huangapple
  • Posted on June 2, 2023, 05:18:44
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76385776.html