df.str.get_dummies() vs pd.get_dummies() (Python)
Question
I have a series like so:
0 mcdonalds, popeyes
1 wendys
2 popeyes
3 mcdonalds
4 mcdonalds
I use the following code:
df.str.get_dummies(sep = ', ')
to get the following data frame:
popeyes wendys mcdonalds
1 0 1
0 1 0
1 0 0
0 0 1
0 0 1
I want to drop one column, though, to avoid the dummy variable trap. How do I do this, as with the drop_first argument of pd.get_dummies()?
The expected output might look something like this, but I don't want to hardcode which column to drop:
popeyes wendys
1 0
0 1
1 0
0 0
0 0
Answer 1
Score: 1
You can explode your Series before using pd.get_dummies:
>>> (pd.get_dummies(df.str.split(', ').explode(), drop_first=True)
.groupby(level=0).max())
popeyes wendys
0 1 0
1 0 1
2 1 0
3 0 0
4 0 0
Details:
>>> df.str.split(', ').explode()
0 mcdonalds
0 popeyes
1 wendys
2 popeyes
3 mcdonalds
4 mcdonalds
dtype: object
>>> pd.get_dummies(df.str.split(', ').explode(), drop_first=True)
popeyes wendys
0 0 0
0 1 0
1 0 1
2 1 0
3 0 0
4 0 0
Alternative, dropping the column named after the first value in the first row:
>>> df.str.get_dummies(sep=', ').drop(columns=df.iloc[0].split(', ')[0])
popeyes wendys
0 1 0
1 0 1
2 1 0
3 0 0
4 0 0
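For anyone who wants to run the explode approach end to end, here is a minimal self-contained sketch. The sample data is rebuilt from the question, and the names s, exploded, dummies and result are only illustrative. It also passes dtype=int, which is an addition not in the original answer: recent pandas versions return booleans from pd.get_dummies by default, so this keeps the 0/1 output shown above.

```python
import pandas as pd

# Sample data reconstructed from the question; `s` is a hypothetical name.
s = pd.Series(["mcdonalds, popeyes", "wendys", "popeyes", "mcdonalds", "mcdonalds"])

# One row per restaurant; the original index is repeated so rows can be regrouped.
exploded = s.str.split(", ").explode()

# drop_first=True drops the first category in sorted order ("mcdonalds" here).
# dtype=int is an assumption to force 0/1 instead of True/False.
dummies = pd.get_dummies(exploded, drop_first=True, dtype=int)

# Collapse back to one row per original index; max() acts as a logical OR.
result = dummies.groupby(level=0).max()
print(result)
```

The groupby(level=0).max() step is what merges the two exploded rows for index 0 back into a single row.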
Answer 2
Score: 0
You can slice to remove the first column:
s.str.get_dummies(', ').iloc[:, 1:]
Output:
popeyes wendys
0 1 0
1 0 1
2 1 0
3 0 0
4 0 0
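A runnable sketch of the slicing approach, assuming the same sample data as in the question (the names s, dummies, dropped and reduced are illustrative). As far as I know, str.get_dummies returns its columns in sorted order, so slicing off the first column removes the alphabetically first category, which is the same one drop_first=True would remove.

```python
import pandas as pd

# Sample data reconstructed from the question; `s` is a hypothetical name.
s = pd.Series(["mcdonalds, popeyes", "wendys", "popeyes", "mcdonalds", "mcdonalds"])

# Full indicator matrix, one column per distinct restaurant.
dummies = s.str.get_dummies(sep=", ")

# Record which column is about to be dropped, so nothing is hardcoded.
dropped = dummies.columns[0]

# Keep every column except the first one.
reduced = dummies.iloc[:, 1:]

print("dropped:", dropped)
print(reduced)
```

Keeping the dropped name around is useful later, since that category becomes the baseline when interpreting model coefficients.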