March 7, 2023 22:23:37

Subsetting a pandas dataframe based on the match between df's column names and a list of column names with repeated names

Question

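I have a list of column names that contains repeated names, for example:
names = ['col1', 'col2', 'col1']

and a pandas DataFrame in which some of the column names match the list items:

   col1  col2  col3  col4
0    a     c     e     g
1    b     d     f     h

I would like to subset the DataFrame so that the result has col1, col2, and then col1 again:

   col1  col2  col1
0    a     c    a
1    b     d    b

I tried the following:

import pandas as pd

names = ['col1', 'col4', 'col1']
df = pd.DataFrame({'col1': ['a', 'b'], 'col2': ['c', 'd'], 'col3': ['e', 'f'], 'col4': ['g', 'h']})
# Intersect the DataFrame's columns with the list (a set operation in older pandas versions).
df2 = df[df.columns & names]
df2

This returns only col1 and col4, without repeating col1:

   col1 col4
0    a    g
1    b    h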

Answer 1

Score: 1

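Simply pass a list containing all the columns you want to extract (including repeats) to the DataFrame indexing operator.

# The names of the columns to extract, including repeats.
names = ['col1', 'col2', 'col1']

# Create a new DataFrame made up only of the columns in the list.
df2 = df[names]

print(df2)

Output:

  col1 col2 col1
0    a    c    a
1    b    d    b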
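
If the list might also contain names that are not in the DataFrame, one possible variation is to filter the list itself rather than intersect it with the columns. The sketch below is only an illustration: the name 'col9' and the variables present and unique_matches are made up for the example. It contrasts a set-style intersection, which keeps each matching name at most once, with a list filter that preserves both order and repeats.

```python
import pandas as pd

df = pd.DataFrame({
    'col1': ['a', 'b'],
    'col2': ['c', 'd'],
    'col3': ['e', 'f'],
    'col4': ['g', 'h'],
})

# Hypothetical request list: repeats 'col1' and includes a name the frame lacks.
names = ['col1', 'col2', 'col1', 'col9']

# A set-style intersection keeps each matching name at most once,
# which is why the repeated 'col1' disappears in the question's attempt.
unique_matches = df.columns.intersection(names)

# Filtering the list itself keeps its order and repeats, and skips unknown names.
present = [name for name in names if name in df.columns]
df2 = df[present]

print(list(unique_matches))
print(df2)
```

Note that once a DataFrame has repeated column names, df2['col1'] returns a DataFrame holding both col1 columns; a positional lookup such as df2.iloc[:, 2] picks out a single one.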
