Pandas groupby "nominal" column (one hot list)
Question
I'm attempting to group a dataframe by class, where the class is represented as a one-hot encoding. I build the one-hot encoding as a list in the column. However, when I attempt a groupby, it raises an error:
> TypeError: unhashable type: 'list'
Here is a minimal reproducible example:
import pandas

# Ordinal class column: groupby works
data = pandas.DataFrame({"first": [0, 1, 2, 3, 4],
                         "class": [1, 0, 0, 0, 1]})
class_groups = [data for group, data in data.groupby("class")]
print(class_groups)

# One-hot class column stored as lists: groupby raises TypeError
data = pandas.DataFrame({"first": [0, 1, 2, 3, 4],
                         "class": [[0, 1], [1, 0], [1, 0], [0, 1], [1, 0]]})
class_groups = [data for group, data in data.groupby("class")]
print(class_groups)
The first example uses an ordinal class column and works as expected; the second uses a similar nominal (one-hot) format and raises the error. There may be a better way to format this column so that it is easier to group by, but it does need to remain a one-hot encoding.
Answer 1
Score: 1
The keys in the grouping Series must be hashable, and lists are not because they are mutable. One option is to convert each list to a tuple:
class_groups = [data for group, data in data.groupby(data["class"].apply(tuple))]
Output:
[   first   class
0      0  [0, 1]
3      3  [0, 1],
   first   class
1      1  [1, 0]
2      2  [1, 0]
4      4  [1, 0]]
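To make the reasoning above concrete, here is a minimal sketch that first checks that a list cannot be hashed and then groups on a separate Series of tuples. The names `list_is_hashable`, `keys`, and `groups` are chosen for illustration only and are not part of the original answer.

```python
import pandas as pd

data = pd.DataFrame({"first": [0, 1, 2, 3, 4],
                     "class": [[0, 1], [1, 0], [1, 0], [0, 1], [1, 0]]})

# Lists are mutable, so Python refuses to hash them.
try:
    hash([0, 1])
    list_is_hashable = True
except TypeError:
    list_is_hashable = False

# Build a separate key Series of tuples; the "class" column itself
# still holds the original lists.
keys = data["class"].apply(tuple)

# Map each one-hot tuple to the "first" values that fall in that group.
groups = {key: frame["first"].tolist() for key, frame in data.groupby(keys)}

print(list_is_hashable)
print(groups)
```

Because the tuples live only in the key Series, the one-hot lists in the dataframe stay untouched, which is why the grouped output still shows `[0, 1]` and `[1, 0]`.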
Answer 2
Score: 1
You can do:
[data for group, data in data.groupby(data['class'].map(tuple))]
#output
[   first   class
0      0  [0, 1]
3      3  [0, 1],
   first   class
1      1  [1, 0]
2      2  [1, 0]
4      4  [1, 0]]
Edit:
With your data, you can handle both class formats with an if/else:
if isinstance(data['class'][0], list):
    class_groups = [data for group, data in data.groupby(data['class'].map(tuple))]
else:
    class_groups = [data for group, data in data.groupby("class")]
Case 1:
import pandas
data = pandas.DataFrame({"first": [0, 1, 2, 3, 4],
                         "class": [1, 0, 0, 0, 1]})
# run the if/else above to build class_groups, then:
print(class_groups)
[   first  class
1      1      0
2      2      0
3      3      0,
   first  class
0      0      1
4      4      1]
Case 2:
data = pandas.DataFrame({"first": [0, 1, 2, 3, 4],
                         "class": [[0, 1], [1, 0], [1, 0], [0, 1], [1, 0]]})
# run the if/else above to build class_groups, then:
print(class_groups)
[   first   class
0      0  [0, 1]
3      3  [0, 1],
   first   class
1      1  [1, 0]
2      2  [1, 0]
4      4  [1, 0]]
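The if/else above only inspects the first row. As a rough sketch, the same idea could be wrapped in a helper that converts any list value to a tuple and leaves other values alone. The function name `split_by_class` and its `column` parameter are hypothetical, introduced here for illustration.

```python
import pandas as pd


def split_by_class(frame, column="class"):
    """Return one sub-DataFrame per distinct value in `column`.

    Hypothetical helper: list values are turned into tuples so they can
    serve as group keys; any other value is used unchanged.
    """
    keys = frame[column].map(lambda v: tuple(v) if isinstance(v, list) else v)
    return [group for _, group in frame.groupby(keys)]


ordinal = pd.DataFrame({"first": [0, 1, 2, 3, 4],
                        "class": [1, 0, 0, 0, 1]})
nominal = pd.DataFrame({"first": [0, 1, 2, 3, 4],
                        "class": [[0, 1], [1, 0], [1, 0], [0, 1], [1, 0]]})

ordinal_groups = split_by_class(ordinal)
nominal_groups = split_by_class(nominal)

print([g["first"].tolist() for g in ordinal_groups])
print([g["first"].tolist() for g in nominal_groups])
```

Checking each value rather than only row 0 costs a little more work per row, but it avoids depending on the first row being representative of the whole column.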