Pandas: move selected columns to the front and keep the rest in their original order
Question
I have a dataframe with many columns. The exact number of columns is not fixed, e.g. somewhere between 10 and 20.
The column names are as follows:
RecordID, price, company, date, feature1, return, some_inf, feature2, feature3, ...
Sample data:
import pandas as pd

column_names = ["RecordID", "price", "company", "date", "feature1", "return", "some_inf", "feature2", "feature3"]
values = [1, 9.99, "ABC", 20230101, 888, 0.666, "happy_everyday", "helloworld", "test"]
df = pd.DataFrame(values).T
df.columns = column_names
Among all these columns, I would like to pick out some columns (if they exist) and put them at the beginning, with the remaining columns following in their original order. For example, if I want to select date, volume, price, return, then the output (with re-ordered columns) will be:
date, price, return, RecordID, company, feature1, some_inf, feature2, feature3, ...
The volume column does not exist in the original dataframe, so it should not appear in the final output either. That is, the output dataframe should start with the columns from the selection list (those that also exist in the original dataframe), followed by the columns not in the list, in their original order.
Is there a fast way to implement this?
Answer 1
Score: 3
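Use Index.intersection to pick the selected columns that actually exist in the dataframe (in the order of the selection list), then use Index.append to add the remaining columns obtained with Index.difference. Passing sort=False keeps the existing order in both steps:
cols = ['date', 'volume', 'price', 'return']
# selected columns that exist, followed by all other columns in original order
new = (pd.Index(cols).intersection(df.columns, sort=False)
         .append(df.columns.difference(cols, sort=False)))
df = df[new]
print(df)
       date price return RecordID company feature1        some_inf  \
0  20230101  9.99  0.666        1     ABC      888  happy_everyday

     feature2 feature3
0  helloworld     test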
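As a possible alternative, the same reordering can be sketched with plain list comprehensions instead of Index set operations. This is only a minimal sketch: move_to_front is a hypothetical helper name (not part of pandas), and it assumes the dataframe's column names are unique.

```python
import pandas as pd


def move_to_front(df, cols):
    """Return df with the columns from `cols` that exist moved to the front.

    Hypothetical helper for illustration; assumes unique column names in df.
    """
    # Selected columns that exist in df, in the order given by `cols`.
    # dict.fromkeys drops repeated names while keeping first-seen order.
    front = [c for c in dict.fromkeys(cols) if c in df.columns]
    # All remaining columns, in their original order.
    rest = [c for c in df.columns if c not in front]
    return df[front + rest]


column_names = ["RecordID", "price", "company", "date", "feature1",
                "return", "some_inf", "feature2", "feature3"]
values = [1, 9.99, "ABC", 20230101, 888, 0.666,
          "happy_everyday", "helloworld", "test"]
df = pd.DataFrame([values], columns=column_names)

result = move_to_front(df, ["date", "volume", "price", "return"])
print(list(result.columns))
# ['date', 'price', 'return', 'RecordID', 'company', 'feature1', 'some_inf', 'feature2', 'feature3']
```

The missing volume column is silently skipped because only names found in df.columns are kept.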