Subsetting a pandas dataframe based on the match between df's column names and a list of column names with repeated names
Question
I have a list of column names that contains duplicates, for example:
list = ['col1', 'col2', 'col1']
and a pandas DataFrame in which some of the column names match the list items:
  col1 col2 col3 col4
0    a    c    e    g
1    b    d    f    h
I would like to subset the DataFrame so that the result has col1, col2, and then col1 again:
  col1 col2 col1
0    a    c    a
1    b    d    b
I tried:
import pandas as pd
names = ['col1', 'col4', 'col1']
df = pd.DataFrame({'col1': ['a', 'b'], 'col2': ['c', 'd'], 'col3': ['e', 'f'], 'col4': ['g', 'h']})
df2 = df[df.columns & names]
df2
This returns only col1 and col4, without repeating col1:
  col1 col4
0    a    g
1    b    h
Answer 1
Score: 1
Simply pass a list containing every column you want to extract, repeats included, to the DataFrame indexing operator. Columns are returned in the order they appear in the list.
# The names of the columns to be extracted, including duplicates.
names = ['col1', 'col2', 'col1']
# Create a new DataFrame made up only of the columns from the list.
df2 = df[names]
print(df2)
Output:
  col1 col2 col1
0    a    c    a
1    b    d    b
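One caveat: df[names] raises a KeyError if the list contains a name that is not a column of df. If the goal is to keep only the names that actually match, a list comprehension can filter the list first while preserving order and repeats. In pandas versions where df.columns & names runs as shown in the question, it behaves like a set intersection, which is why the repeat was lost there. The snippet below is a minimal sketch of the filtering approach; the name 'col9' and the variable present are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'col1': ['a', 'b'],
    'col2': ['c', 'd'],
    'col3': ['e', 'f'],
    'col4': ['g', 'h'],
})

# Hypothetical request list: 'col9' is not a column of df.
names = ['col1', 'col9', 'col2', 'col1']

# Keep only the requested names that exist in df.
# A list comprehension preserves both the order and the repeats.
present = [name for name in names if name in df.columns]

# Index with the filtered list; repeated names yield repeated columns.
df2 = df[present]
print(df2)
```

Note that the result has two columns both named col1, so df2['col1'] returns a two-column DataFrame rather than a Series.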