Create new column based on missing values

Question

I want to create a new column based on the other columns. product5 is the best source and product2 is the second-best, so the new column should use product5 if it is available. If not, it should fall back to product2, and so on. This needs to generalise to many columns, following the order of the items in this list:

cols_pref_inorder = ["product5", "product2", "product"...]
df   product  product2 product5
0    apple    Appl     Apple
1    banan    Banan     NaN    

I tried:

def create(x):
  if pd.notnull(df["product5"]):
     return df["product5"]
  ...

df["Product_final"] = df.apply(create, axis=1)

Expected Output

df   product  product2 product5 Product_final
0    apple    Appl     Apple     Apple          (Product_final uses product5 since available)
1    banan    Banan    NaN       Banan          (Product_final uses product2 since product5 is missing)

Answer 1

Score: 2

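You can reindex the columns into preference order, bfill along each row so that the first non-missing value ends up in the first column, and then slice that column with iloc: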

df['Product_final'] = (df.reindex(cols_pref_inorder, axis=1)
                         .bfill(axis=1).iloc[:, 0]
                      )

Output:

   df product product2 product5 Product_final
0   0   apple     Appl    Apple         Apple
1   1   banan    Banan      NaN         Banan

Intermediate (reindexed and bfilled) before slicing:

  product5 product2 product
0    Apple     Appl   apple
1    Banan    Banan   banan
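
For anyone who wants to run this end to end, here is a minimal self-contained sketch. The sample frame is rebuilt by hand from the question, and the "..." in the preference list is simply left out, so the three-column list and the intermediate name `ordered` are assumptions for illustration only:

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the question (assumed to be plain string columns).
df = pd.DataFrame({
    "product": ["apple", "banan"],
    "product2": ["Appl", "Banan"],
    "product5": ["Apple", np.nan],
})

# Preference order, best column first (the question's list is elided; this
# three-item version is an assumption).
cols_pref_inorder = ["product5", "product2", "product"]

# 1. reindex puts the columns in preference order.
# 2. bfill(axis=1) fills each NaN from the next column to its right.
# 3. iloc[:, 0] keeps the leftmost column, i.e. the best available value.
ordered = df.reindex(cols_pref_inorder, axis=1)
df["Product_final"] = ordered.bfill(axis=1).iloc[:, 0]

print(df["Product_final"].tolist())  # ['Apple', 'Banan']
```

One thing to be aware of: reindex adds an all-NaN column for any name in the list that the frame does not have, so a misspelled column name is skipped silently rather than raising an error. A row where every listed column is missing ends up as NaN in Product_final.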

huangapple
  • Posted on July 3, 2023 at 17:20:41
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76603429.html