Why does the image size differ between vertical and horizontal orientations?

Question


I tried to create a random image with PIL, as in this example:

import numpy
from PIL import Image

a = numpy.random.rand(48, 84)
img = Image.fromarray(a.astype('uint8')).convert('1')
print(len(img.tobytes()))

This code outputs 528.
When we swap the dimensions of the numpy array:

a = numpy.random.rand(84, 48)

the output is 504.
Why is that?

I expected the byte count to be the same, since the two numpy arrays contain the same number of elements.

Answer 1

Score: 5


When you call tobytes() on the boolean array*, the data is packed one row at a time. In your second example, each row of img holds 48 booleans, so each row fits in exactly 6 bytes (48 bits), and 6 bytes * 84 rows = 504 bytes. In your first example, each row holds 84 pixels, which is not divisible by 8. The encoder therefore uses 11 bytes (88 bits) per row, leaving 4 bits of padding at the end of each row, and the total is 11 bytes * 48 rows = 528 bytes.
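The arithmetic above can be written as a small helper. This is a minimal sketch, assuming 8 pixels per byte and each row rounded up to a whole byte; `packed_size` is a hypothetical name used for illustration, not a Pillow function.

```python
def packed_size(width: int, height: int) -> int:
    """Bytes needed when every row is packed 8 pixels per byte and
    padded up to a whole byte (hypothetical helper, not part of Pillow)."""
    bytes_per_row = (width + 7) // 8  # ceiling of width / 8
    return bytes_per_row * height


# A NumPy array of shape (rows, cols) becomes an image of width=cols, height=rows.
print(packed_size(84, 48))  # shape (48, 84): 11 bytes per row * 48 rows
print(packed_size(48, 84))  # shape (84, 48): 6 bytes per row * 84 rows
```

Note that the array shape is (rows, columns), so swapping the two numbers changes the row length, and the row length is what decides how much padding is added.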

If you test a range of input shapes for a 2D boolean array, you will find that when the number of elements per row is divisible by 8, the total number of bytes in the encoding equals width * height / 8. When the row length is not divisible by 8, the encoding contains more bytes, because each row has to be padded with between 1 and 7 bits.
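To see where the extra bytes come from without depending on Pillow, here is a pure-Python sketch that packs a 2D list of booleans row by row. It only imitates the layout described here; `pack_rows` and `padding_bits` are hypothetical helpers, and the most-significant-bit-first order is an assumption that does not affect the byte count.

```python
def pack_rows(pixels):
    """Pack a 2D list of booleans row by row, 8 pixels per byte.

    Bit order is assumed to be most significant bit first. The last byte
    of each row is zero-padded, so a row never shares a byte with the next.
    """
    out = bytearray()
    for row in pixels:
        for start in range(0, len(row), 8):
            byte = 0
            for i, bit in enumerate(row[start:start + 8]):
                if bit:
                    byte |= 0x80 >> i
            out.append(byte)
    return bytes(out)


def padding_bits(width):
    """Unused bits at the end of each packed row (0 to 7)."""
    return (-width) % 8


# Try a few (width, height) pairs, including the two from the question.
for width, height in [(48, 84), (84, 48), (8, 3), (9, 3), (15, 2), (16, 2)]:
    image = [[True] * width for _ in range(height)]
    size = len(pack_rows(image))
    print(f"{width}x{height}: {size} bytes, "
          f"{padding_bits(width)} padding bits per row")
```

Widths that are multiples of 8 report zero padding bits; every other width wastes between 1 and 7 bits on each row.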

In summary: ideally we would store eight boolean values per byte, but the row length isn't always divisible by 8, and because the encoder serializes the array row by row, each row is rounded up to a whole number of bytes.

Edit for clarification: *the PIL.Image object in mode "1" (binary or "bilevel" image) effectively represents a boolean array. Calling convert('1') reduces the original image (in this case, the one built from the numpy array a) to a binary image; by default Pillow does this with Floyd-Steinberg dithering rather than a plain threshold.

huangapple
  • Posted on March 4, 2023, 09:13:37
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75633075.html