Pandas groupby "nominal" column (one hot list)

Question

I'm attempting to group a dataframe by class, where the class is represented as a one-hot encoding. I build the one-hot encoding as a list in the column. However, when I attempt a groupby, it raises an error:

> TypeError: unhashable type: 'list'

Here is a minimal reproducible example:

    import pandas

    # Ordinal labels: grouping works
    data = pandas.DataFrame({"first": [0, 1, 2, 3, 4],
                             "class": [1, 0, 0, 0, 1]})
    class_groups = [data for group, data in data.groupby("class")]
    print(class_groups)

    # One-hot labels stored as lists: raises TypeError
    data = pandas.DataFrame({"first": [0, 1, 2, 3, 4],
                             "class": [[0, 1], [1, 0], [1, 0], [0, 1], [1, 0]]})
    class_groups = [data for group, data in data.groupby("class")]
    print(class_groups)

The first is an ordinal example, which works well; the second is a similar nominal format, which throws an error. Maybe there is another way I should be formatting this that would be easier to group by, but it does need to be a one-hot encoding.

Answer 1

Score: 1

You need hashable objects in the grouping Series; lists are mutable and therefore cannot be hashed. One option is to convert each list to a tuple:

    class_groups = [data for group, data in data.groupby(data["class"].apply(tuple))]

Output:

    [   first   class
    0      0  [0, 1]
    3      3  [0, 1],
        first   class
    1      1  [1, 0]
    2      2  [1, 0]
    4      4  [1, 0]]
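
To make the reason behind the tuple conversion concrete, here is a minimal sketch using the same data as the question. It checks that a list cannot be hashed and then builds a dictionary from each tuple key to the `first` values in that group. The names `list_is_hashable`, `keys`, and `groups` are illustrative only and do not come from the original answer.

```python
import pandas

data = pandas.DataFrame({"first": [0, 1, 2, 3, 4],
                         "class": [[0, 1], [1, 0], [1, 0], [0, 1], [1, 0]]})

# Lists are mutable, so Python refuses to hash them.
try:
    hash([0, 1])
    list_is_hashable = True
except TypeError:
    list_is_hashable = False

# Tuples of ints are immutable and hashable, so they can serve as group keys.
keys = data["class"].apply(tuple)

# Map each tuple key to the values of "first" that fall in that group.
groups = {key: frame["first"].tolist() for key, frame in data.groupby(keys)}

print(list_is_hashable)  # False
print(groups)            # {(0, 1): [0, 3], (1, 0): [1, 2, 4]}
```

The original `class` column is left untouched, so each group still carries the one-hot lists; only the grouping key is converted.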

Answer 2

Score: 1

You can do:

    [data for group, data in data.groupby(data['class'].map(tuple))]

    # output
    [   first   class
    0      0  [0, 1]
    3      3  [0, 1],
        first   class
    1      1  [1, 0]
    2      2  [1, 0]
    4      4  [1, 0]]

Edit:

With your data, you can handle both class formats with an if/else:

    if isinstance(data['class'][0], list):
        class_groups = [data for group, data in data.groupby(data['class'].map(tuple))]
    else:
        class_groups = [data for group, data in data.groupby("class")]
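
The same branch can be wrapped in a small function so both formats go through one call. This is only a sketch: `split_by_class` is a hypothetical helper name, it assumes the frame is non-empty, and it assumes the column holds either all lists or all hashable values. It uses `.iloc[0]` rather than `[0]` so that the check still works when the index has no label `0`.

```python
import pandas


def split_by_class(frame, column="class"):
    """Return one sub-frame per distinct value in `column` (hypothetical helper)."""
    keys = frame[column]
    # Inspect the first row by position; assumes the frame is non-empty.
    if isinstance(keys.iloc[0], list):
        # Lists are unhashable, so group on tuple copies instead.
        keys = keys.map(tuple)
    return [group_df for _, group_df in frame.groupby(keys)]


ordinal = pandas.DataFrame({"first": [0, 1, 2, 3, 4],
                            "class": [1, 0, 0, 0, 1]})
one_hot = pandas.DataFrame({"first": [0, 1, 2, 3, 4],
                            "class": [[0, 1], [1, 0], [1, 0], [0, 1], [1, 0]]})

ordinal_groups = split_by_class(ordinal)
one_hot_groups = split_by_class(one_hot)

print([g["first"].tolist() for g in ordinal_groups])  # [[1, 2, 3], [0, 4]]
print([g["first"].tolist() for g in one_hot_groups])  # [[0, 3], [1, 2, 4]]
```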

Case 1:

    import pandas

    data = pandas.DataFrame({"first": [0, 1, 2, 3, 4],
                             "class": [1, 0, 0, 0, 1]})
    # ... build class_groups with the if/else above ...
    print(class_groups)

    # output
    [   first  class
    1      1      0
    2      2      0
    3      3      0,
        first  class
    0      0      1
    4      4      1]

Case 2:

    data = pandas.DataFrame({"first": [0, 1, 2, 3, 4],
                             "class": [[0, 1], [1, 0], [1, 0], [0, 1], [1, 0]]})
    # ... build class_groups with the if/else above ...
    print(class_groups)

    # output
    [   first   class
    0      0  [0, 1]
    3      3  [0, 1],
        first   class
    1      1  [1, 0]
    2      2  [1, 0]
    4      4  [1, 0]]
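
The question also asks whether another format would be easier to group by. One possibility, not taken from either answer, is to keep the one-hot column as it is and derive an integer label from it only for grouping. The sketch below assumes every list contains exactly one `1`; `labels` and `groups` are illustrative names.

```python
import pandas

data = pandas.DataFrame({"first": [0, 1, 2, 3, 4],
                         "class": [[0, 1], [1, 0], [1, 0], [0, 1], [1, 0]]})

# Position of the single 1 in each one-hot list (assumes exactly one 1 per row).
labels = data["class"].map(lambda one_hot: one_hot.index(1))

# Group on the derived integer label; the one-hot column itself is unchanged.
groups = {label: frame["first"].tolist() for label, frame in data.groupby(labels)}

print(groups)  # {0: [1, 2, 4], 1: [0, 3]}
```

The groups are the same as with tuple keys, but they are ordered by class index rather than by tuple comparison, so `[1, 0]` (index 0) comes before `[0, 1]` (index 1).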

huangapple
  • Posted on June 13, 2023, 02:43:52
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76459455.html