February 6, 2023, 06:07:32

How do I get columns that are generated by pandas.get_dummies()?

Question

I have the following dataframe:

>>> df
   n1  n2   dense c1   c2     c3
0   1   4  [1, 4]  a   h1     tt
1   2   5  [2, 5]  b  bbw   ebay
2   3   6  [3, 6]  c   we  yahoo

I want to create one-hot encoded columns for c1, c2, and c3:

>>> df_updated = pd.get_dummies(df, prefix_sep='_', dummy_na=True, columns=['c1', 'c2', 'c3'])
>>> df_updated
   n1  n2   dense  c1_a  c1_b  c1_c  c1_nan  c2_bbw  c2_h1  c2_we  c2_nan  c3_ebay  c3_tt  c3_yahoo  c3_nan
0   1   4  [1, 4]     1     0     0       0       0      1      0       0        0      1         0       0
1   2   5  [2, 5]     0     1     0       0       1      0      0       0        1      0         0       0
2   3   6  [3, 6]     0     0     1       0       0      0      1       0        0      0         1       0

How can I get a list of the columns generated by get_dummies()?

For example: ['c1_a', 'c1_b', 'c1_c', 'c1_nan', 'c2_bbw', 'c2_h1', 'c2_we', 'c2_nan', 'c3_ebay', 'c3_tt', 'c3_yahoo', 'c3_nan']

I know that list(set(df_updated.columns) - set(df.columns)) works, but is there a better way?

Answer 1

Score: 0

One way is to store the names of the columns to encode in a variable and then select the generated columns with filter:

import pandas as pd

cols, sep = ['c1', 'c2', 'c3'], '_'
df_updated = pd.get_dummies(df, prefix_sep=sep,
                            dummy_na=True, columns=cols)
# Group the prefixes so that ^ and the separator apply to each of them
df_dum = df_updated.filter(regex=rf'^(?:{"|".join(cols)}){sep}', axis=1)
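A caveat with regex-based selection: in a pattern such as `^c1|c2|c3_`, the anchor binds only to the first alternative and the separator only to the last, so an unrelated column that merely contains `c2` is selected too. The sketch below illustrates this with a hypothetical column `abc2` that is not part of the question's data; the names `frame`, `loose`, and `strict` are likewise made up for the illustration.

```python
import re
import pandas as pd

# Hypothetical frame: 'abc2' is an ordinary column whose name contains "c2".
frame = pd.DataFrame({"abc2": [0], "c1_a": [1], "c2_h1": [1], "c3_tt": [1]})
cols, sep = ["c1", "c2", "c3"], "_"

# Ungrouped: parsed as (^c1) | (c2) | (c3_), so "c2" may match anywhere.
loose = "^" + "|".join(cols) + sep
# Grouped and escaped: each prefix must start the name and be followed by sep.
strict = "^(?:" + "|".join(map(re.escape, cols)) + ")" + re.escape(sep)

loose_cols = list(frame.filter(regex=loose).columns)
strict_cols = list(frame.filter(regex=strict).columns)
print(loose_cols)   # includes 'abc2'
print(strict_cols)  # only the prefixed dummy columns
```

Escaping with re.escape is a precaution for column names that contain regex metacharacters; it makes no difference for the names used here.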

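Or, more simply, use Index.difference. It sorts its result by default, so pass sort=False to keep the columns in their original order:

cols_dum = list(df_updated.columns.difference(df.columns, sort=False))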

Output:

print(list(df_dum.columns))  # or print(cols_dum)
['c1_a', 'c1_b', 'c1_c', 'c1_nan', 'c2_bbw', 'c2_h1',
 'c2_we', 'c2_nan', 'c3_ebay', 'c3_tt', 'c3_yahoo', 'c3_nan']
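
For a self-contained check, the sketch below rebuilds the frame from the question and collects the generated names with a plain list comprehension. This is an alternative to the calls shown above, not the answer's method; it assumes only that the generated names are exactly those absent from the input frame, and it keeps them in the order get_dummies produced.

```python
import pandas as pd

# Rebuild the example frame from the question.
df = pd.DataFrame({
    "n1": [1, 2, 3],
    "n2": [4, 5, 6],
    "dense": [[1, 4], [2, 5], [3, 6]],
    "c1": ["a", "b", "c"],
    "c2": ["h1", "bbw", "we"],
    "c3": ["tt", "ebay", "yahoo"],
})
cols = ["c1", "c2", "c3"]
df_updated = pd.get_dummies(df, prefix_sep="_", dummy_na=True, columns=cols)

# Keep the names that were not in the input frame, in their output order.
original = set(df.columns)
cols_dum = [c for c in df_updated.columns if c not in original]
print(cols_dum)
```

Unlike a set difference, the comprehension walks df_updated.columns from left to right, so the result follows the column order of the encoded frame.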
