df.str.get_dummies() vs pd.get_dummies() (Python)

Question

I have a series like so:

    0    mcdonalds, popeyes
    1                wendys
    2               popeyes
    3             mcdonalds
    4             mcdonalds

I use the following code:

    df.str.get_dummies(sep=', ')

to get the following data frame:

    popeyes  wendys  mcdonalds
          1       0          1
          0       1          0
          1       0          0
          0       0          1
          0       0          1

I want to remove one column, though, to account for the dummy variable trap. How do I do this, the way the drop_first argument does in pd.get_dummies()?

The expected output might look something like this, but I don't want to hard-code dropping an arbitrary column:

    popeyes  wendys
          1       0
          0       1
          1       0
          0       0
          0       0
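
For anyone who wants to reproduce the setup, here is a minimal sketch. It assumes the Series is built from a plain list of strings; the name `df` follows the question even though it holds a Series, and `dummies` is just an illustrative name.

```python
import pandas as pd

# The question's data; `df` is a Series despite its name.
df = pd.Series(
    ["mcdonalds, popeyes", "wendys", "popeyes", "mcdonalds", "mcdonalds"]
)

# One indicator column per distinct label, splitting each string on ", ".
dummies = df.str.get_dummies(sep=", ")
print(dummies)
```

pandas returns the indicator columns in sorted order, so on this data `mcdonalds` is the first column, ahead of `popeyes` and `wendys`.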

Answer 1

Score: 1

You can explode your Series before using pd.get_dummies:

    >>> # dtype=int keeps the 0/1 output on pandas >= 2.0, where the default is bool
    >>> (pd.get_dummies(df.str.split(', ').explode(), drop_first=True, dtype=int)
    ...    .groupby(level=0).max())
       popeyes  wendys
    0        1       0
    1        0       1
    2        1       0
    3        0       0
    4        0       0

Details:

    >>> df.str.split(', ').explode()
    0    mcdonalds
    0      popeyes
    1       wendys
    2      popeyes
    3    mcdonalds
    4    mcdonalds
    dtype: object
    >>> pd.get_dummies(df.str.split(', ').explode(), drop_first=True, dtype=int)
       popeyes  wendys
    0        0       0
    0        1       0
    1        0       1
    2        1       0
    3        0       0
    4        0       0
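
The steps above can be put together in a self-contained sketch. The intermediate names (`exploded`, `per_label`, `result`) are illustrative only, and `dtype=int` is an assumption made here to get 0/1 columns regardless of pandas version.

```python
import pandas as pd

df = pd.Series(
    ["mcdonalds, popeyes", "wendys", "popeyes", "mcdonalds", "mcdonalds"]
)

# One row per label; the original row number is repeated in the index.
exploded = df.str.split(", ").explode()

# drop_first=True omits the first category in sorted order ("mcdonalds").
# dtype=int requests 0/1 columns; recent pandas versions default to bool.
per_label = pd.get_dummies(exploded, drop_first=True, dtype=int)

# Collapse back to one row per original index: 1 if any label matched.
result = per_label.groupby(level=0).max()
print(result)
```

The `groupby(level=0).max()` step matters because row 0 holds two labels: after exploding it occupies two rows, and taking the maximum merges them back into a single row.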

Alternative:

    >>> df.str.get_dummies(sep=', ').drop(columns=df.iloc[0].split(', ')[0])
       popeyes  wendys
    0        1       0
    1        0       1
    2        1       0
    3        0       0
    4        0       0
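
Note that this alternative picks the label to drop from the first row of the data, not from the sorted column order, so it can drop a different column than `drop_first=True` would. A small sketch with hypothetical reordered data (the names `s`, `dropped_label`, and `result` are illustrative):

```python
import pandas as pd

# Hypothetical data in which the first row does not hold the
# alphabetically first label.
s = pd.Series(["wendys", "mcdonalds, popeyes", "popeyes"])

# The alternative drops the first label found in the first row.
dropped_label = s.iloc[0].split(", ")[0]
result = s.str.get_dummies(sep=", ").drop(columns=dropped_label)

print(dropped_label)
print(result)
```

Here `wendys` is dropped, whereas `drop_first=True` on the exploded Series would drop `mcdonalds`. Either choice leaves one label as the baseline; they simply pick different ones.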

Answer 2

Score: 0

You can slice to remove the first column:

    s.str.get_dummies(', ').iloc[:, 1:]

Output:

       popeyes  wendys
    0        1       0
    1        0       1
    2        1       0
    3        0       0
    4        0       0
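
If this is needed in more than one place, the slice could be wrapped in a small helper. This is only a sketch: `str_get_dummies_drop_first` is a hypothetical name and not part of pandas.

```python
import pandas as pd


def str_get_dummies_drop_first(s: pd.Series, sep: str = ", ") -> pd.DataFrame:
    """Hypothetical helper: str.get_dummies without its first column."""
    # str.get_dummies sorts the labels, so the first column is the
    # alphabetically first label, which is also what drop_first=True omits.
    return s.str.get_dummies(sep=sep).iloc[:, 1:]


s = pd.Series(
    ["mcdonalds, popeyes", "wendys", "popeyes", "mcdonalds", "mcdonalds"]
)
trimmed = str_get_dummies_drop_first(s)
print(trimmed)
```

Because the slice is positional, nothing about the data has to be hard-coded: whichever label sorts first becomes the baseline.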

huangapple
  • Posted on June 8, 2023 at 02:47:50
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76426243.html