Cannot cast array data from dtype('O') in np.bincount

Question

Unfortunately I cannot share the data I am now using, so this question will not contain an MWE.

I have this code:

import numpy as np

def baseline(labels):
    # dummy classifier returning the most common label in labels
    print(labels.shape)
    print(type(labels))
    print(type(labels[0]))
    print(type(labels[2]))
    print(labels)
    counts = np.bincount(labels)
    value = np.argmax(counts)

This code runs fine with most input files containing the labels. However, on a subset of files, I get the error:

Cannot cast array data from dtype('O') to dtype('int64') according to the rule 'safe'

which I cannot understand. The output is:

(891,)
<class 'numpy.ndarray'>
<class 'int'>
<class 'int'>
[0 0 1 0 0 1 0 1 0 0 0 0 0 0 1 0 0 0 0 1 1 1 1 0 0 1 0 1 0 0 0 1 1 0 1 0 0
 0 1 1 0 1 0 0 0 1 0 1 0 1 1 1 0 1 0 0 1 0 0 1 0 0 1 0 0 1 0 1 1 1 0 0 0 1
 0 0 0 0 1 0 1 1 0 0 1 0 0 0 1 1 0 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 1 1 0 1 1
 1 1 1 1 0 0 0 1 0 1 1 0 0 0 1 1 1 0 1 0 0 0 0 1 1 1 1 1 1 0 0 1 1 1 0 1 1
 0 0 0 0 0 1 0 1 1 0 0 1 1 1 1 0 1 1 0 0 0 0 1 0 0 1 0 0 0 0 1 1 0 0 1 0 0
 0 0 0 1 1 1 0 1 1 0 0 1 0 1 0 0 1 0 1 0 0 1 1 0 1 0 1 0 0 1 0 1 0 0 0 0 0
 0 1 1 1 0 1 0 0 0 1 0 0 0 0 1 0 1 0 0 0 0 1 0 0 1 0 0 0 0 0 0 1 1 1 0 1 1
 1 0 0 1 0 1 0 0 0 1 0 0 1 1 1 0 1 0 0 1 0 1 1 0 1 0 0 1 0 1 0 0 1 0 0 1 0
 1 0 0 1 0 1 0 0 0 0 0 1 1 0 0 1 1 1 1 0 0 0 1 1 0 0 1 1 0 1 0 0 0 0 1 0 0
 1 0 1 0 1 1 1 1 0 1 0 0 1 0 0 0 0 0 0 1 0 1 0 0 1 0 1 1 0 1 0 1 0 1 1 0 1
 0 1 1 1 1 1 1 0 0 1 1 1 1 0 1 1 1 1 1 1 1 1 0 1 0 0 1 1 0 1 1 1 1 0 0 0 0
 1 1 1 1 1 0 0 0 1 1 1 1 1 1 1 1 0 1 0 1 1 1 1 1 1 0 1 1 1 0 1 0 1 1 0 1 1
 0 0 1 1 0 0 0 0 1 1 1 0 1 0 0 1 0 1 1 1 0 1 0 0 1 1 1 0 1 1 0 1 0 0 0 0 1
 1 1 1 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 0 0 1 0 0 0 1 1 1 0 1 1 0
 1 0 1 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 0 0 1 0 1 0 1 0 0 0 1 1 0
 1 1 1 1 1 1 0 1 0 0 0 0 0 0 0 0 1 0 1 0 1 1 1 1 1 1 1 1 0 1 1 1 1 0 1 0 0
 0 0 1 1 1 0 1 1 1 1 0 0 0 1 0 0 1 0 0 1 0 1 1 0 1 0 0 1 0 0 0 0 0 1 1 1 0
 1 0 0 1 0 0 1 0 0 1 1 0 0 0 1 0 0 0 0 0 1 0 1 0 0 1 0 1 0 0 1 1 0 1 0 0 0
 1 1 0 0 0 1 0 0 0 1 0 1 0 1 1 0 1 0 1 1 1 1 0 1 1 1 0 0 1 0 1 0 1 0 1 0 0
 0 0 0 1 0 1 0 0 0 0 0 0 0 1 0 0 0 0 1 0 1 1 0 1 0 0 1 1 1 1 1 1 1 1 0 0 0
 0 0 1 1 0 1 0 1 0 1 0 0 0 0 1 1 0 0 1 1 1 1 0 0 1 1 0 1 1 0 1 0 0 1 0 0 1
 0 0 1 0 1 1 0 1 1 1 0 1 1 1 0 1 0 0 1 1 0 1 0 1 1 0 0 0 1 1 0 1 0 1 1 1 0
 0 0 0 1 1 1 0 1 0 1 0 1 1 0 0 1 1 1 0 1 1 0 1 0 1 0 0 1 1 0 0 1 1 1 1 0 1
 1 0 0 1 0 1 0 1 0 1 1 1 0 0 0 1 1 1 1 1 0 0 0 0 1 0 1 1 1 1 1 0 0 0 0 1 0
 0 0 1]
Traceback (most recent call last):
  File "07_training_test.py", line 577, in <module>
    fire.Fire(main)
  File "/home/user/miniconda3/envs/proj/lib/python3.8/site-packages/fire/core.py", line 141, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "/home/user/miniconda3/envs/proj/lib/python3.8/site-packages/fire/core.py", line 466, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
  File "/home/user/miniconda3/envs/proj/lib/python3.8/site-packages/fire/core.py", line 681, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
  File "07_training_test.py", line 554, in main
    res = process_file(fn, parameters, config)
  File "07_training_test.py", line 434, in process_file
    value_train, train_acc = utils.baseline(full_labels.loc[train_i].to_numpy())
  File "/home/user/workspace/proj/src/pipeline_paper/utils.py", line 186, in baseline
    counts = np.bincount(labels)
  File "<__array_function__ internals>", line 5, in bincount
TypeError: Cannot cast array data from dtype('O') to dtype('int64') according to the rule 'safe'

There are other questions about this error, but they arise in different contexts, so I was not able to solve the issue by following their answers.

Answer 1

Score: 1

This is how you can reproduce your error:

>>> import numpy as np
>>> arr = np.random.binomial(n=1, p=0.5, size=891)
>>> arr.dtype
dtype('int64')

>>> baseline(arr)  # Works fine
>>> arr = arr.astype(object)
>>> arr.dtype
dtype('O')
>>> baseline(arr)
TypeError: Cannot cast array data from dtype('O') to dtype('int64') according to the rule 'safe'

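So even if your array contains only integers, np.bincount raises this error when the array's dtype is object: it has to cast its input to an integer type under the 'safe' casting rule, and object to int64 is not a safe cast. The output in your question shows the symptom: type(labels[0]) is the plain Python <class 'int'>, whereas an element of an int64 array would be <class 'numpy.int64'>. Cast labels to a NumPy integer array explicitly before calling np.bincount. For example (np.int8 is enough for 0/1 labels; use np.int64 if label values can exceed 127):

arr = arr.astype(np.int8)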
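
If it is unclear why some files produce an object array, it may help to check the elements before casting, since a blind astype would fail on (or hide) stray strings or None. The following is only a sketch under assumptions, not the asker's actual function: the validation step, the ValueError message, and returning a single value are illustrative choices, and the sample arrays are made up.

```python
import numpy as np


def baseline(labels):
    """Return the most common label; also accepts object arrays of integers."""
    labels = np.asarray(labels)
    if labels.dtype == object:
        # Report elements that are not integers instead of casting blindly.
        bad = [(i, v) for i, v in enumerate(labels)
               if not isinstance(v, (int, np.integer))]
        if bad:
            raise ValueError(f"non-integer labels at (index, value): {bad[:5]}")
        # Every element is an integer, so an explicit cast is safe here.
        labels = labels.astype(np.int64)
    counts = np.bincount(labels)
    return int(np.argmax(counts))


int_labels = np.array([0, 1, 1, 0, 1])
obj_labels = int_labels.astype(object)  # same values, but dtype('O')

print(int_labels.dtype, type(int_labels[0]))  # NumPy integer dtype and scalar
print(obj_labels.dtype, type(obj_labels[0]))  # object dtype, plain Python int
print(baseline(obj_labels))                   # most common label: 1
```

Printing labels.dtype next to type(labels[0]) in the failing run should show which files yield object arrays. The cause is then likely upstream, for example a pandas column that was read with mixed types.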

huangapple
  • Published on 2023-03-20 22:41:20
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75791698.html