Why does np.astype('uint8') give different results on Windows versus Mac?
Question
I have a (1000,1000,3) shaped numpy array (dtype='float32') and when I cast it to dtype='uint8' I get different results on Windows versus Mac.
Array is available here: https://www.dropbox.com/s/jrs4n2ayh86s0fn/image.npy?dl=0
On Mac
>>> import numpy as np
>>> X = np.load('image.npy')
>>> X = X.astype('uint8')
>>> X.sum()
167942490
On Windows
>>> import numpy as np
>>> X = np.load('image.npy')
>>> X = X.astype('uint8')
>>> X.sum()
323510676
Also reproduces with this array:
import numpy as np
X = np.array([
[[46410., 42585., 32640.],
[45645., 41820., 31875.],
[45390., 41310., 32130.]],
[[44880., 41055., 31110.],
[44115., 40290., 30345.],
[46410., 42330., 33150.]],
[[45390., 41310., 32130.],
[46155., 42075., 32895.],
[42840., 38760., 30090.]]], dtype=np.float32)
print(X.sum(), X.astype('uint8').sum())
This prints 1065135.0 2735 on Windows and 1065135.0 1860 on Mac.
Here are the results for different combinations of OS, Python version and NumPy version:
Python 3.8.8 (Win) Numpy 1.22.4 => 1065135.0 2735
Python 3.10.6 (Mac) Numpy 1.24.2 => 1065135.0 2735
Python 3.7.12 (Mac) Numpy 1.21.6 => 1065135.0 1860
Answer 1
Score: 5
This problem is due to a bad conversion that overflows the target integer type. Indeed, NumPy uses C casts to convert values, but converting a float outside the range 0-255 to an 8-bit unsigned integer is undefined behaviour in C, so the result can differ between platforms, compilers and NumPy builds. We tried to do our best to report errors in this case without impacting performance, but this is not possible in all cases. The latest versions of NumPy should fix this, but the issue is still partially unsolved. See the 1.24.0 release notes, this issue and this one, as well as this PR (AFAIK, the first reference to this issue is found here).
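Since the cast itself may not report anything, one option is to check the range explicitly before converting. The following is a minimal sketch, not part of NumPy: `fits_in_dtype` is a hypothetical helper name, and the check is deliberately conservative (it only asks whether every value already lies inside the target integer range).

```python
import numpy as np

def fits_in_dtype(values, dtype):
    """Hypothetical helper: True if all values are finite and inside the integer dtype's range."""
    values = np.asarray(values)
    if values.size == 0:
        return True  # nothing to overflow
    info = np.iinfo(dtype)  # min/max representable by the integer dtype
    if not np.all(np.isfinite(values)):
        return False  # NaN and inf have no integer equivalent
    return bool(values.min() >= info.min and values.max() <= info.max)

# A few values taken from the question: they are far above 255.
bad = np.array([46410., 42585., 32640.], dtype=np.float32)
good = np.array([0., 128., 255.], dtype=np.float32)

print(fits_in_dtype(bad, np.uint8))   # False: astype('uint8') would be unreliable here
print(fits_in_dtype(good, np.uint8))  # True: the cast is well defined
```

Running such a check right before `astype('uint8')` turns a silent, platform-dependent result into an explicit condition the code can handle.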
Anyway, while the error may not be detected on your target machine, casting floating-point numbers outside the range 0-255 to uint8 is unsafe and you should not expect a correct result. You need to adapt your code so that there is no overflow in the first place. I also advise you to use at least version 1.24.0 of NumPy to better track such errors.
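As a rough illustration of what "no overflow in the first place" can look like, here are two well-defined alternatives applied to the array from the question. Which one is appropriate depends on what the data means, so treat both as sketches: clipping saturates at 255, while the explicit modulo makes the wrap-around deterministic by going through a wide integer type first.

```python
import numpy as np

X = np.array([
    [[46410., 42585., 32640.],
     [45645., 41820., 31875.],
     [45390., 41310., 32130.]],
    [[44880., 41055., 31110.],
     [44115., 40290., 30345.],
     [46410., 42330., 33150.]],
    [[45390., 41310., 32130.],
     [46155., 42075., 32895.],
     [42840., 38760., 30090.]]], dtype=np.float32)

# Option 1: saturate. Every value is forced into 0-255 before the cast,
# so the float -> uint8 conversion never sees an out-of-range input.
clipped = np.clip(X, 0, 255).astype(np.uint8)

# Option 2: wrap explicitly. float32 -> int64 is in range for these values,
# and the modulo is plain integer arithmetic, so the result does not
# depend on how the platform handles an out-of-range float cast.
wrapped = (X.astype(np.int64) % 256).astype(np.uint8)

print(clipped.sum())  # 6885 (27 elements, all saturated to 255)
print(wrapped.sum())  # 2735
```

The explicit wrap gives 2735, the same value reported for Windows and for the newer NumPy on Mac in the question, but here it is obtained without relying on undefined behaviour.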
Related post: Why does numpy handle overflows inconsistently?