June 8, 2023 02:47:50


df.str.get_dummies() vs pd.get_dummies() (Python)

Question


I have a series like so:

0 mcdonalds, popeyes
1 wendys
2 popeyes
3 mcdonalds
4 mcdonalds

I use the following code:

df.str.get_dummies(sep=', ')

to get the following data frame:

   mcdonalds  popeyes  wendys
0          1        1       0
1          0        0       1
2          0        1       0
3          1        0       0
4          1        0       0
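A minimal, self-contained reproduction of the setup above, assuming the data is a pandas Series named df as in the question. Note that Series.str.get_dummies orders the resulting columns alphabetically, not in order of first appearance:

```python
import pandas as pd

# Rebuild the data from the question (called df there, although it is a Series)
df = pd.Series(["mcdonalds, popeyes", "wendys", "popeyes", "mcdonalds", "mcdonalds"])

# One 0/1 indicator column per label found after splitting on ", "
dummies = df.str.get_dummies(sep=", ")

# Columns come out sorted: mcdonalds, popeyes, wendys
print(dummies)
```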

I want to remove one column, though, to avoid the dummy variable trap. How do I do this, like the drop_first argument of pd.get_dummies()?

The expected output might look something like this, but I don't want to hard-code which column to drop:

   popeyes  wendys
0        1       0
1        0       1
2        1       0
3        0       0
4        0       0

Answer 1

Score: 1


You can explode your Series before using pd.get_dummies, then collapse the repeated index back to one row per entry with groupby(level=0).max():

>>> (pd.get_dummies(df.str.split(', ').explode(), drop_first=True, dtype=int)
       .groupby(level=0).max())

   popeyes  wendys
0        1       0
1        0       1
2        1       0
3        0       0
4        0       0
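The same pipeline as a self-contained script, rebuilding the question's data. It passes dtype=int because, since pandas 2.0, pd.get_dummies returns boolean columns by default, so without it the result would show True/False instead of 1/0:

```python
import pandas as pd

df = pd.Series(["mcdonalds, popeyes", "wendys", "popeyes", "mcdonalds", "mcdonalds"])

# 1. One row per label; multi-label entries repeat their original index
exploded = df.str.split(", ").explode()

# 2. Encode each label and drop the alphabetically first category (mcdonalds)
encoded = pd.get_dummies(exploded, drop_first=True, dtype=int)

# 3. Collapse the repeated index back to one row per original entry
result = encoded.groupby(level=0).max()
print(result)
```

Rows whose only label was the dropped category (indexes 3 and 4) still survive as all-zero rows, because they are present in the exploded Series before encoding.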

Details:

>>> df.str.split(', ').explode()
0    mcdonalds
0      popeyes
1       wendys
2      popeyes
3    mcdonalds
4    mcdonalds
dtype: object

>>> pd.get_dummies(df.str.split(', ').explode(), drop_first=True, dtype=int)
   popeyes  wendys
0        0       0
0        1       0
1        0       1
2        1       0
3        0       0
4        0       0
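Why max rather than sum for the collapse step: max keeps each column a 0/1 indicator even if a label is repeated within one entry, whereas sum would count occurrences. A small sketch with hypothetical data in which wendys appears twice in the first entry:

```python
import pandas as pd

# Hypothetical data: "wendys" is repeated inside the first entry
s = pd.Series(["wendys, wendys, popeyes", "mcdonalds"])
encoded = pd.get_dummies(s.str.split(", ").explode(), drop_first=True, dtype=int)

as_max = encoded.groupby(level=0).max()  # indicator: is the label present?
as_sum = encoded.groupby(level=0).sum()  # count: how often does it appear?

print(as_max)
print(as_sum)
```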

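Alternative (this drops the first label of the first row, which coincides with drop_first here only because mcdonalds also sorts first):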

>>> df.str.get_dummies(sep=', ').drop(columns=df.iloc[0].split(', ')[0])

   popeyes  wendys
0        1       0
1        0       1
2        1       0
3        0       0
4        0       0
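To see where the two can differ, here is a sketch with hypothetical data whose first row starts with a label that does not sort first. Dropping dummies.columns[0] instead tracks drop_first, because str.get_dummies sorts its columns:

```python
import pandas as pd

# Hypothetical data whose first row starts with "wendys"
s = pd.Series(["wendys", "mcdonalds, popeyes", "popeyes"])
dummies = s.str.get_dummies(sep=", ")  # columns: mcdonalds, popeyes, wendys

# Drops "wendys" (first label of the first row), not the first sorted column
by_first_row = dummies.drop(columns=s.iloc[0].split(", ")[0])

# Drops the first sorted column ("mcdonalds"), mirroring drop_first=True
by_first_column = dummies.drop(columns=dummies.columns[0])

print(list(by_first_row.columns))     # ['mcdonalds', 'popeyes']
print(list(by_first_column.columns))  # ['popeyes', 'wendys']
```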

Answer 2

Score: 0


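You can slice to remove the first column. Because str.get_dummies sorts its columns alphabetically, this drops the same category that drop_first would: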

df.str.get_dummies(', ').iloc[:, 1:]

Output:

   popeyes  wendys
0        1       0
1        0       1
2        1       0
3        0       0
4        0       0
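A quick sanity check, using the question's data, that the slice and Answer 1's explode pipeline produce the same columns and values. It compares plain lists rather than using DataFrame.equals, since the two results can differ in index type and integer dtype:

```python
import pandas as pd

df = pd.Series(["mcdonalds, popeyes", "wendys", "popeyes", "mcdonalds", "mcdonalds"])

# Answer 2: drop the first (alphabetically smallest) column by position
sliced = df.str.get_dummies(", ").iloc[:, 1:]

# Answer 1: explode, encode with drop_first, collapse back to one row per index
exploded = (pd.get_dummies(df.str.split(", ").explode(), drop_first=True, dtype=int)
              .groupby(level=0).max())

same_columns = list(sliced.columns) == list(exploded.columns)
same_values = sliced.values.tolist() == exploded.values.tolist()
print(same_columns, same_values)  # True True
```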
