df.str.get_dummies() vs pd.get_dummies() (Python)

Question

I have a series like so:

0 mcdonalds, popeyes
1 wendys
2 popeyes
3 mcdonalds
4 mcdonalds

I use the following code:

df.str.get_dummies(sep = ', ')

to get the following data frame:

popeyes wendys mcdonalds
1       0      1
0       1      0
1       0      0
0       0      1
0       0      1

I want to drop one column to avoid the dummy variable trap. How do I do this, the way the drop_first argument of pd.get_dummies() does?

The expected output might look something like this, but I don't want to hardcode dropping an arbitrary column:

popeyes wendys 
1       0      
0       1      
1       0      
0       0      
0       0      
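
To make the question reproducible, here is a minimal sketch of the setup. The variable name `s` is an assumption (the question calls its Series `df`). Note that `Series.str.get_dummies` returns its columns in sorted order, so the column order differs from the table shown above.

```python
import pandas as pd

# Sample data from the question; `s` is an assumed name for the Series.
s = pd.Series([
    "mcdonalds, popeyes",
    "wendys",
    "popeyes",
    "mcdonalds",
    "mcdonalds",
])

# One indicator column per distinct value, splitting each entry on ", ".
dummies = s.str.get_dummies(sep=", ")
print(dummies)
```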

Answer 1

Score: 1

You can explode your Series before using pd.get_dummies:

>>> (pd.get_dummies(df.str.split(', ').explode(), drop_first=True)
       .groupby(level=0).max())

   popeyes  wendys
0        1       0
1        0       1
2        1       0
3        0       0
4        0       0

Details:

>>> df.str.split(', ').explode()
0    mcdonalds
0      popeyes
1       wendys
2      popeyes
3    mcdonalds
4    mcdonalds
dtype: object

>>> pd.get_dummies(df.str.split(', ').explode(), drop_first=True)
   popeyes  wendys
0        0       0
0        1       0
1        0       1
2        1       0
3        0       0
4        0       0
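
The steps above can be put together into a self-contained script. This is a sketch, assuming the Series is named `s`. Passing `dtype=int` is an addition that is not in the original answer: pandas 2.0 and later return boolean columns from `pd.get_dummies` by default, whereas the output shown here uses 0/1.

```python
import pandas as pd

s = pd.Series(["mcdonalds, popeyes", "wendys", "popeyes", "mcdonalds", "mcdonalds"])

# One row per value; rows from the same original entry share an index label.
exploded = s.str.split(", ").explode()

# drop_first=True drops the first level in sorted order ("mcdonalds" here).
# dtype=int keeps the 0/1 output on pandas versions that default to bool.
dummies = pd.get_dummies(exploded, drop_first=True, dtype=int)

# Collapse back to one row per original entry.
result = dummies.groupby(level=0).max()
print(result)
```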

Alternative:

>>> df.str.get_dummies(sep=', ').drop(columns=df.iloc[0].split(', ')[0])

   popeyes  wendys
0        1       0
1        0       1
2        1       0
3        0       0
4        0       0
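
The alternative above takes the column to drop from the first row, so the result depends on row order. If a rule that ignores row order is preferred, one possible convention (an assumption here, not part of the original answer) is to drop the most frequent value as the baseline. The name `s` is again assumed.

```python
import pandas as pd

s = pd.Series(["mcdonalds, popeyes", "wendys", "popeyes", "mcdonalds", "mcdonalds"])

dummies = s.str.get_dummies(sep=", ")

# Column sums count how often each value occurs; idxmax returns the label
# of the most frequent one (the first such label if there is a tie).
baseline = dummies.sum().idxmax()

result = dummies.drop(columns=baseline)
print(baseline)
print(result)
```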

Answer 2

Score: 0

You can slice to remove the first column:

s.str.get_dummies(', ').iloc[:, 1:]

Output:

   popeyes  wendys
0        1       0
1        0       1
2        1       0
3        0       0
4        0       0
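
Slicing by position works here because `Series.str.get_dummies` returns its columns in sorted order, so `iloc[:, 1:]` drops the first value in that order, which on this data is also the one `drop_first=True` drops in the first answer. A small sketch that checks the two approaches agree on the sample data (the name `s` follows this answer; `dtype=int` is an added assumption to get 0/1 output on newer pandas versions):

```python
import pandas as pd

s = pd.Series(["mcdonalds, popeyes", "wendys", "popeyes", "mcdonalds", "mcdonalds"])

# Answer 2: drop the first column by position.
sliced = s.str.get_dummies(", ").iloc[:, 1:]

# Answer 1: explode, encode with drop_first, then recombine per original row.
exploded = (
    pd.get_dummies(s.str.split(", ").explode(), drop_first=True, dtype=int)
    .groupby(level=0)
    .max()
)

# Compare column labels and 0/1 values of the two results.
same = (
    list(sliced.columns) == list(exploded.columns)
    and sliced.values.tolist() == exploded.values.tolist()
)
print(same)
```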

huangapple
  • Published on June 8, 2023, 02:47:50
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76426243.html