英文:
Cannot cast array data from dtype('O') in np.bincount
问题
很抱歉,我无法提供代码的翻译,但我可以为您解释问题。这段代码似乎在某些输入文件上运行正常,但在一部分文件上出现错误。错误消息是关于无法将数组数据从 dtype('O') 转换为 dtype('int64')。这通常是由于在NumPy数组中包含了不同类型的数据造成的。
为了解决这个问题,您可以尝试以下步骤:
- 检查输入数据:确保
labels
数组中的所有元素都是整数类型。如果包含其他类型的数据,可以尝试将它们转换为整数。 - 数据清洗:如果输入数据包含非整数类型的值,您可以尝试清洗数据,只保留整数值。
- 显式类型转换:在执行
np.bincount(labels)
之前,可以尝试将labels
数组显式转换为整数类型,例如labels.astype(int)
。 - 检查数据来源:确保数据加载和处理的过程不会将非整数数据引入
labels
数组中。
请注意,解决此问题的确切步骤可能会根据您的数据和代码上下文而有所不同。希望这些提示能帮助您找到解决问题的方向。
英文:
Unfortunately I cannot share the data I am now using, so this question will not contain an MWE.
I have this code:
def baseline(labels):
# dummy classifier returning the most common label in labels
print(labels.shape)
print(type(labels))
print(type(labels[0]))
print(type(labels[2]))
print(labels)
counts = np.bincount(labels)
value = np.argmax(counts)
This code runs fine with most input files containing the labels
. However, on a subset of files, I get the error:
Cannot cast array data from dtype('O') to dtype('int64') according to the rule 'safe'
that I cannot understand. Output is:
(891,)
<class 'numpy.ndarray'>
<class 'int'>
<class 'int'>
[0 0 1 0 0 1 0 1 0 0 0 0 0 0 1 0 0 0 0 1 1 1 1 0 0 1 0 1 0 0 0 1 1 0 1 0 0
0 1 1 0 1 0 0 0 1 0 1 0 1 1 1 0 1 0 0 1 0 0 1 0 0 1 0 0 1 0 1 1 1 0 0 0 1
0 0 0 0 1 0 1 1 0 0 1 0 0 0 1 1 0 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 1 1 0 1 1
1 1 1 1 0 0 0 1 0 1 1 0 0 0 1 1 1 0 1 0 0 0 0 1 1 1 1 1 1 0 0 1 1 1 0 1 1
0 0 0 0 0 1 0 1 1 0 0 1 1 1 1 0 1 1 0 0 0 0 1 0 0 1 0 0 0 0 1 1 0 0 1 0 0
0 0 0 1 1 1 0 1 1 0 0 1 0 1 0 0 1 0 1 0 0 1 1 0 1 0 1 0 0 1 0 1 0 0 0 0 0
0 1 1 1 0 1 0 0 0 1 0 0 0 0 1 0 1 0 0 0 0 1 0 0 1 0 0 0 0 0 0 1 1 1 0 1 1
1 0 0 1 0 1 0 0 0 1 0 0 1 1 1 0 1 0 0 1 0 1 1 0 1 0 0 1 0 1 0 0 1 0 0 1 0
1 0 0 1 0 1 0 0 0 0 0 1 1 0 0 1 1 1 1 0 0 0 1 1 0 0 1 1 0 1 0 0 0 0 1 0 0
1 0 1 0 1 1 1 1 0 1 0 0 1 0 0 0 0 0 0 1 0 1 0 0 1 0 1 1 0 1 0 1 0 1 1 0 1
0 1 1 1 1 1 1 0 0 1 1 1 1 0 1 1 1 1 1 1 1 1 0 1 0 0 1 1 0 1 1 1 1 0 0 0 0
1 1 1 1 1 0 0 0 1 1 1 1 1 1 1 1 0 1 0 1 1 1 1 1 1 0 1 1 1 0 1 0 1 1 0 1 1
0 0 1 1 0 0 0 0 1 1 1 0 1 0 0 1 0 1 1 1 0 1 0 0 1 1 1 0 1 1 0 1 0 0 0 0 1
1 1 1 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 0 0 1 0 0 0 1 1 1 0 1 1 0
1 0 1 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 0 0 1 0 1 0 1 0 0 0 1 1 0
1 1 1 1 1 1 0 1 0 0 0 0 0 0 0 0 1 0 1 0 1 1 1 1 1 1 1 1 0 1 1 1 1 0 1 0 0
0 0 1 1 1 0 1 1 1 1 0 0 0 1 0 0 1 0 0 1 0 1 1 0 1 0 0 1 0 0 0 0 0 1 1 1 0
1 0 0 1 0 0 1 0 0 1 1 0 0 0 1 0 0 0 0 0 1 0 1 0 0 1 0 1 0 0 1 1 0 1 0 0 0
1 1 0 0 0 1 0 0 0 1 0 1 0 1 1 0 1 0 1 1 1 1 0 1 1 1 0 0 1 0 1 0 1 0 1 0 0
0 0 0 1 0 1 0 0 0 0 0 0 0 1 0 0 0 0 1 0 1 1 0 1 0 0 1 1 1 1 1 1 1 1 0 0 0
0 0 1 1 0 1 0 1 0 1 0 0 0 0 1 1 0 0 1 1 1 1 0 0 1 1 0 1 1 0 1 0 0 1 0 0 1
0 0 1 0 1 1 0 1 1 1 0 1 1 1 0 1 0 0 1 1 0 1 0 1 1 0 0 0 1 1 0 1 0 1 1 1 0
0 0 0 1 1 1 0 1 0 1 0 1 1 0 0 1 1 1 0 1 1 0 1 0 1 0 0 1 1 0 0 1 1 1 1 0 1
1 0 0 1 0 1 0 1 0 1 1 1 0 0 0 1 1 1 1 1 0 0 0 0 1 0 1 1 1 1 1 0 0 0 0 1 0
0 0 1]
Traceback (most recent call last):
File "07_training_test.py", line 577, in <module>
fire.Fire(main)
File "/home/user/miniconda3/envs/proj/lib/python3.8/site-packages/fire/core.py", line 141, in Fire
component_trace = _Fire(component, args, parsed_flag_args, context, name)
File "/home/user/miniconda3/envs/proj/lib/python3.8/site-packages/fire/core.py", line 466, in _Fire
component, remaining_args = _CallAndUpdateTrace(
File "/home/user/miniconda3/envs/proj/lib/python3.8/site-packages/fire/core.py", line 681, in _CallAndUpdateTrace
component = fn(*varargs, **kwargs)
File "07_training_test.py", line 554, in main
res = process_file(fn, parameters, config)
File "07_training_test.py", line 434, in process_file
value_train, train_acc = utils.baseline(full_labels.loc[train_i].to_numpy())
File "/home/user/workspace/proj/src/pipeline_paper/utils.py", line 186, in baseline
counts = np.bincount(labels)
File "<__array_function__ internals>", line 5, in bincount
TypeError: Cannot cast array data from dtype('O') to dtype('int64') according to the rule 'safe'
There are other questions on this error, but in different contexts so I was not able to solve the issue following the answers.
答案1
得分: 1
这是您可以重现错误的方法:
>>> arr = np.random.binomial(n=1, p=0.5, size=891)
>>> arr.dtype
dtype('int64')
>>> baseline(arr) # 正常工作
>>> arr = arr.astype(object)
>>> arr.dtype
dtype('O')
>>> baseline(arr)
TypeError: 无法根据规则 'safe' 将数组数据从 dtype('O') 转换为 dtype('int64')
所以,即使您的数组包含整数,但具有 "object" 类型,np.bincount
会引发错误。在运行代码之前,请尝试手动将 labels
强制转换为 numpy 整数数组。例如:
arr = arr.astype(np.int8)
英文:
This is how you can reproduce your error:
>>> arr = np.random.binomial(n=1, p=0.5, size=891)
>>> arr.dtype
dtype('int64')
>>> baseline(arr) # Works fine
>>> arr = arr.astype(object)
>>> arr.dtype
dtype('O')
>>> baseline(arr)
TypeError: Cannot cast array data from dtype('O') to dtype('int64') according to the rule 'safe'
So even if your array contains integers but has a type "object", np.bincount
throws the error. Try to cast labels
to numpy integer array manually before running the code. E.g.:
arr = arr.astype(np.int8)
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论