Pandas groupby "nominal" column (one hot list)

Question

I'm attempting to group a dataframe by class, where the class is represented as a one-hot encoding. I build the one-hot encoding as a list in the column. However, when I attempt a groupby it raises this error:

> TypeError: unhashable type: 'list'

Here is a minimal reproducible example:

import pandas

data = pandas.DataFrame({"first": [0, 1, 2, 3, 4],
                         "class": [1, 0, 0, 0, 1]})

class_groups = [data for group, data in data.groupby("class")]
print(class_groups)

data = pandas.DataFrame({"first": [0, 1, 2, 3, 4],
                         "class": [[0, 1], [1, 0], [1, 0], [0, 1], [1, 0]]})

class_groups = [data for group, data in data.groupby("class")]
print(class_groups)

The first example uses an ordinal class column and works. The second uses the same classes as one-hot lists and raises the error. There may be a better way to format the column so that groupby works, but it does need to stay a one-hot encoding.

Answer 1

Score: 1

The grouping keys must be hashable, and lists are not because they are mutable. One option is to convert each list to a tuple and group by the resulting Series:

class_groups = [data for group, data in data.groupby(data["class"].apply(tuple))]

Output:

[   first   class
0      0  [0, 1]
3      3  [0, 1],
    first   class
1      1  [1, 0]
2      2  [1, 0]
4      4  [1, 0]]
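
As a small illustration of why the tuple conversion helps, the sketch below first checks that a list has no hash, then groups by the tuple keys and records which row labels land in each group. The names `list_is_hashable`, `keys`, and `group_indices` are made up for this example and are not part of the original answer.

```python
import pandas as pd

data = pd.DataFrame({"first": [0, 1, 2, 3, 4],
                     "class": [[0, 1], [1, 0], [1, 0], [0, 1], [1, 0]]})

# Lists are mutable and define no hash, so groupby cannot use them as keys.
try:
    hash([0, 1])
    list_is_hashable = True
except TypeError:
    list_is_hashable = False

# Tuples of ints are hashable, so a Series of tuples works as the grouper.
keys = data["class"].apply(tuple)

# Map each group key to the row labels that belong to it.
group_indices = {key: frame.index.tolist() for key, frame in data.groupby(keys)}

print(list_is_hashable)
print(group_indices)
```

The original `class` column is left untouched, so each sub-frame still carries the one-hot lists; only the temporary `keys` Series holds tuples.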

Answer 2

Score: 1

You can do:

[data for group, data in data.groupby(data['class'].map(tuple))]

Output:

[   first   class
 0      0  [0, 1]
 3      3  [0, 1],
    first   class
 1      1  [1, 0]
 2      2  [1, 0]
 4      4  [1, 0]]

Edit:

To handle both of your examples (scalar classes and list-valued classes), you can branch on the type of the first element with if/else:

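# .iloc[0] reads the first row by position, so this also works if the index has no label 0
if isinstance(data['class'].iloc[0], list):
    # lists are unhashable, so group by a Series of tuples instead
    class_groups = [data for group, data in data.groupby(data['class'].map(tuple))]
else:
    class_groups = [data for group, data in data.groupby("class")]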

Case 1:

import pandas

data = pandas.DataFrame({"first": [0, 1, 2, 3, 4],
                         "class": [1, 0, 0, 0, 1]})
# run the if/else block above on this data, then:
print(class_groups)

[   first  class
 1      1      0
 2      2      0
 3      3      0,
    first  class
 0      0      1
 4      4      1]

Case 2:

data = pandas.DataFrame({"first": [0, 1, 2, 3, 4],
                         "class": [[0, 1], [1, 0], [1, 0], [0, 1], [1, 0]]})
# run the if/else block above on this data, then:
print(class_groups)

[   first   class
 0      0  [0, 1]
 3      3  [0, 1],
    first   class
 1      1  [1, 0]
 2      2  [1, 0]
 4      4  [1, 0]]
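
If the same check is needed in more than one place, the if/else can be wrapped in a function. The following is only a sketch: `split_by_class` is a hypothetical helper name, and it assumes that the column holds either all lists or all hashable scalars.

```python
import pandas as pd


def split_by_class(frame, column="class"):
    """Return one sub-frame per distinct value in `column` (hypothetical helper)."""
    values = frame[column]
    # Assumption: the column is homogeneous, so checking the first row is enough.
    if len(values) > 0 and isinstance(values.iloc[0], list):
        keys = values.map(tuple)  # lists -> hashable tuples
    else:
        keys = values             # scalars are already hashable
    return [group for _, group in frame.groupby(keys)]


ordinal = pd.DataFrame({"first": [0, 1, 2, 3, 4],
                        "class": [1, 0, 0, 0, 1]})
one_hot = pd.DataFrame({"first": [0, 1, 2, 3, 4],
                        "class": [[0, 1], [1, 0], [1, 0], [0, 1], [1, 0]]})

ordinal_groups = split_by_class(ordinal)
one_hot_groups = split_by_class(one_hot)

# Show which row labels ended up in each group.
print([g.index.tolist() for g in ordinal_groups])
print([g.index.tolist() for g in one_hot_groups])
```

Using distinct names for the input frame and the loop variable also avoids reusing `data` for both, which the one-line comprehension does.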

huangapple
  • Posted on June 13, 2023, 02:43:52
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76459455.html