Python write bytes to file using redirect of print

Question

Using Perl,

$ perl -e 'print "\xca"' > out

now running

$ xxd out

gives

00000000: ca

But with Python, I tried

$ python3 -c 'print("\xca", end="")' > out
$ xxd out

What I got is

00000000: c38a

I'm not sure what is going on.

Answer 1

Score: 3

In Python, a str object is a sequence of Unicode code points. Which bytes print writes for it depends on the encoding of your sys.stdout. By default this is picked from your locale (environment variables such as PYTHONIOENCODING can override it). So yours must be set to UTF-8, which encodes U+00CA as the two bytes c3 8a. That's my default too:

(py311) Juans-MBP:~ juan$ locale
LANG="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_CTYPE="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_ALL=
(py311) Juans-MBP:~ juan$ python -c "print('\xca', end='')" | xxd
00000000: c38a
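The effect of the stream encoding can be reproduced without touching the locale. The following is a minimal sketch, not part of the original answer: it wraps an in-memory byte buffer in an io.TextIOWrapper, which stands in for sys.stdout here, and the helper name printed_bytes is hypothetical.

```python
import io


def printed_bytes(text: str, encoding: str) -> bytes:
    """Return the bytes print() emits on a text stream that uses `encoding`."""
    raw = io.BytesIO()
    # TextIOWrapper plays the role of sys.stdout: it encodes str to bytes.
    stream = io.TextIOWrapper(raw, encoding=encoding, newline="")
    print(text, end="", file=stream)
    stream.flush()  # push the encoded bytes down into the BytesIO
    return raw.getvalue()


utf8_out = printed_bytes("\xca", "utf-8")
latin1_out = printed_bytes("\xca", "latin-1")
print(utf8_out.hex(), latin1_out.hex())  # c38a ca
```

The same one-character str produces two bytes on a UTF-8 stream and one byte on a Latin-1 stream, which matches the two xxd outputs in this thread.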

However, if I override my locale to use en_US.ISO8859-1 (Latin-1), a single-byte encoding, we get what you expect:

(py311) Juans-MBP:~ juan$ LC_ALL="en_US.ISO8859-1" python -c "print('\xca', end='')" | xxd
00000000: ca
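The en_US.ISO8859-1 locale may not be installed on every system. As an alternative that the answer above does not use, the PYTHONIOENCODING environment variable sets the encoding of the standard streams directly. The sketch below assumes that approach; it launches a child interpreter with the variable set and captures the bytes it prints. The names CHILD_CODE and run_with_io_encoding are illustrative only.

```python
import os
import subprocess
import sys

# Code run by the child interpreter; the raw string keeps \xca as an escape
# sequence for the child to interpret.
CHILD_CODE = r"print('\xca', end='')"


def run_with_io_encoding(encoding: str) -> bytes:
    """Run CHILD_CODE with PYTHONIOENCODING set and return its raw stdout."""
    env = dict(os.environ, PYTHONIOENCODING=encoding)
    result = subprocess.run(
        [sys.executable, "-c", CHILD_CODE],
        env=env,
        capture_output=True,
        check=True,
    )
    return result.stdout


print(run_with_io_encoding("latin-1").hex())  # ca
print(run_with_io_encoding("utf-8").hex())    # c38a
```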

The solution is to work with raw bytes if you want raw bytes. In Python source code, that means using a bytes literal (or a string literal that you then .encode). Bytes can be written directly to the binary buffer underlying stdout, sys.stdout.buffer:

(py311) Juans-MBP:~ juan$ python -c "import sys; sys.stdout.buffer.write(b'\xca')" | xxd
00000000: ca

Or by encoding a str to a bytes object first ('latin' is an alias for Latin-1):

(py311) Juans-MBP:~ juan$ python -c "import sys; sys.stdout.buffer.write('\xca'.encode('latin'))" | xxd
00000000: ca
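If the goal is only to get the byte into a file, shell redirection can be skipped by opening the file in binary mode. This is a sketch of that variant, which the answer does not show; the helper write_raw and the temporary path are assumptions made for the example.

```python
import os
import tempfile


def write_raw(path: str, data: bytes) -> None:
    """Write bytes to `path` unchanged; binary mode applies no text encoding."""
    with open(path, "wb") as f:
        f.write(data)


# A throwaway location standing in for the `out` file from the question.
out_path = os.path.join(tempfile.mkdtemp(), "out")
write_raw(out_path, b"\xca")

with open(out_path, "rb") as f:
    contents = f.read()
print(contents.hex())  # ca
```

Because the file object is opened with "wb", it accepts only bytes, so the locale and sys.stdout encoding do not come into play.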

Answer 2

Score: 1

> In Python, "\xca" in a str literal is the single code point U+00CA, not a
> byte. When print writes it to a UTF-8 stdout, that code point is encoded
> as the two bytes c3 8a, which is what ends up in the file.
>
> In Perl, "\xca" is a character with ordinal 0xCA, and because STDOUT has
> no encoding layer by default, print writes it as the single byte ca
> without any further encoding.
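The difference between the str literal and the bytes literal can be checked in an interpreter. This is a small illustrative sketch using only the standard library; the variable names are arbitrary.

```python
import unicodedata

text = "\xca"   # str literal: one code point, U+00CA
data = b"\xca"  # bytes literal: one byte, 0xCA

char_name = unicodedata.name(text)
utf8_bytes = text.encode("utf-8")      # two bytes
latin1_bytes = text.encode("latin-1")  # one byte

print(len(text), hex(ord(text)), char_name)
print(utf8_bytes.hex(), latin1_bytes.hex())
print(latin1_bytes == data)
```

Latin-1 maps code points 0 to 255 straight to the byte of the same value, which is why encoding with it reproduces what Perl writes.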


huangapple
  • Posted on 2023-02-08 15:07:55
  • Original link: https://go.coder-hub.com/75382397.html