2023-02-08 15:07:55

Python write bytes to file using redirect of print

Question

Using Perl:

$ perl -e 'print "\xca"' > out
$ xxd out
00000000: ca

But with Python, I tried:

$ python3 -c 'print("\xca", end="")' > out
$ xxd out
00000000: c38a

I'm not sure what is going on.


Answer 1

Score: 3

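In Python, a str object is a sequence of Unicode code points. How it is turned into bytes when printed depends on the encoding of your sys.stdout, which is chosen based on your locale (various environment variables can also affect it, but by default it is your locale). So yours must be set to UTF-8. That's my default too: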

(py311) Juans-MBP:~ juan$ locale
LANG="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_CTYPE="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_ALL=
(py311) Juans-MBP:~ juan$ python -c "print('\xca', end='')" | xxd
00000000: c38a
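
To make the point above concrete, here is a minimal sketch showing that "\xca" is a single code point and that the bytes it produces depend on the codec. The variable names are illustrative only and are not part of the original answer.

```python
import sys

# "\xca" in a str literal is ONE code point, U+00CA, not one byte.
text = "\xca"

# The same code point becomes different bytes depending on the codec.
utf8_bytes = text.encode("utf-8")      # two bytes
latin1_bytes = text.encode("latin-1")  # one byte

print(len(text), hex(ord(text)))             # 1 0xca
print(utf8_bytes.hex(), latin1_bytes.hex())  # c38a ca

# The codec print() uses in this process; it depends on locale and environment.
print(getattr(sys.stdout, "encoding", None))
```

With a UTF-8 locale, print() applies the first encoding, which is where c38a comes from.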

However, if I override my locale and tell it to use en_US.ISO8859-1 (latin-1), a single-byte encoding, we get what you expect:

(py311) Juans-MBP:~ juan$ LC_ALL="en_US.ISO8859-1" python -c "print('\xca', end='')" | xxd
00000000: ca
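
The effect of the stream encoding can also be reproduced without touching the locale. The sketch below wraps an in-memory byte buffer in a text stream with an explicit encoding and prints into it. The helper name printed_bytes is hypothetical and only serves this illustration.

```python
import io


def printed_bytes(text: str, encoding: str) -> bytes:
    """Return the bytes print() emits through a text stream using `encoding`."""
    raw = io.BytesIO()
    # TextIOWrapper plays the role of sys.stdout; raw plays the role of the file.
    stream = io.TextIOWrapper(raw, encoding=encoding, newline="")
    print(text, end="", file=stream)
    stream.flush()  # push the encoded bytes down into raw
    return raw.getvalue()


print(printed_bytes("\xca", "utf-8").hex())    # c38a
print(printed_bytes("\xca", "latin-1").hex())  # ca
```

In a real script, sys.stdout.reconfigure(encoding="latin-1") (Python 3.7+) or the PYTHONIOENCODING environment variable should have a similar effect on sys.stdout itself, though writing bytes explicitly is usually the clearer choice.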

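The solution is to work with raw bytes if you want raw bytes. In Python source code, that means using a bytes literal (or a string literal that you then .encode). We can write to the raw buffer at sys.stdout.buffer: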

(py311) Juans-MBP:~ juan$ python -c "import sys; sys.stdout.buffer.write(b'\xca')" | xxd
00000000: ca

Or by encoding a string to a bytes object:

(py311) Juans-MBP:~ juan$ python -c "import sys; sys.stdout.buffer.write('\xca'.encode('latin'))" | xxd
00000000: ca
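
If the goal is simply to get the byte into a file, another option is to skip stdout and shell redirection and open the file in binary mode. This is a sketch rather than part of the original answer; the temporary directory and the file name out are assumptions chosen for the example.

```python
import os
import tempfile

payload = bytes([0xCA])  # same value as the literal b"\xca"

# Hypothetical output location: a file named "out" in a fresh temp directory.
path = os.path.join(tempfile.mkdtemp(), "out")

# Binary mode ("wb") writes the bytes unchanged; no text encoding is involved.
with open(path, "wb") as f:
    f.write(payload)

# Read the file back to confirm exactly one byte was stored.
with open(path, "rb") as f:
    written = f.read()

print(written.hex())  # ca
```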


Answer 2

Score: 1

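> In Python, "\xca" in a str literal is the single code point U+00CA. When it is written to a file through a UTF-8 text stream, it is automatically encoded as the two bytes c3 8a.
>
> In Perl, "\xca" is interpreted as a single byte with the hexadecimal value 0xca, so when the value is written to the file it is saved without encoding.

You can check more details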

