Pandas: move selected columns to the front and keep the remaining columns in their original order

Question

For example, I have a DataFrame with many columns. The exact number of columns is not known in advance, e.g. somewhere between 10 and 20.

The column names are as follows:

RecordID, price, company, date, feature1, return, some_inf, feature2, feature3, ...

Sample data:

import pandas as pd

column_names = ["RecordID", "price", "company", "date", "feature1", "return", "some_inf", "feature2", "feature3"]
values = [1, 9.99, "ABC", 20230101, 888, 0.666, "happy_everyday", "helloworld", "test"]
# Build a one-row DataFrame: the values form a single column, which is transposed into a row
df = pd.DataFrame(values).T
df.columns = column_names

Among these columns, I would like to pick out some columns (if they exist) and put them at the beginning, with the remaining columns following in their original order. For example, if I want to select date, volume, price, return, then the output (with reordered columns) should be:

date, price, return, RecordID, company, feature1, some_inf, feature2, feature3, ...

The volume column does not exist in the original DataFrame, so it should not appear in the final output either. In other words, the output DataFrame should start with the columns from the selection list (those that also exist in the original DataFrame), followed by the columns that are not in the list, in their original order.

Is there a fast way to implement this?

Answer 1

Score: 3

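Use Index.intersection to pick the selected columns that actually exist in the DataFrame, then use Index.append to add the remaining columns obtained from Index.difference. Passing sort=False tells pandas not to sort the resulting labels:

cols = ['date', 'volume', 'price', 'return']
# Selected columns present in df, followed by every other column of df
new = (pd.Index(cols).intersection(df.columns, sort=False)
         .append(df.columns.difference(cols, sort=False)))
df = df[new]
print(df)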
       date price return RecordID company feature1        some_inf  \
0  20230101  9.99  0.666        1     ABC      888  happy_everyday   

     feature2 feature3  
0  helloworld     test
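
For comparison, the same reordering can be sketched with plain Python list comprehensions, which makes the ordering rule explicit instead of relying on how the Index set operations order their results. This is only a sketch, not part of the answer above: the helper name move_to_front is hypothetical (it is not a pandas function), and it assumes the column labels of df are unique.

```python
import pandas as pd


def move_to_front(df, cols):
    """Return df with the existing columns from `cols` placed first.

    Hypothetical helper for illustration; assumes df has unique column labels.
    """
    # Selected columns that exist in df, in the order given by `cols`.
    # dict.fromkeys drops repeated names in `cols` while keeping their order.
    front = [c for c in dict.fromkeys(cols) if c in df.columns]
    # All remaining columns, in their original order.
    front_set = set(front)
    rest = [c for c in df.columns if c not in front_set]
    return df[front + rest]


column_names = ["RecordID", "price", "company", "date", "feature1",
                "return", "some_inf", "feature2", "feature3"]
values = [1, 9.99, "ABC", 20230101, 888, 0.666,
          "happy_everyday", "helloworld", "test"]
df = pd.DataFrame([values], columns=column_names)

result = move_to_front(df, ["date", "volume", "price", "return"])
print(list(result.columns))
```

Here volume is skipped because it is not a column of df, and the columns that were not selected keep their relative order from the original DataFrame.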

Posted by huangapple on February 6, 2023 at 14:07:19
When reposting, please keep the link to this article: https://go.coder-hub.com/75357850.html