Create new column based on missing values

Question

I want to create a new column that is based on the other columns. product5 is the best, product2 is the second-best. So the new column should use product5 if available. If not, then try product2 etc. It needs to generalise to many columns based on the order of the items specified in the list:

    cols_pref_inorder = ["product5", "product2", "product"...]
    df  product  product2  product5
    0   apple    Appl      Apple
    1   banan    Banan     NaN

I tried:

    def create(x):
        if pd.notnull(df["product5"]):
            return df["product5"]
        ...
    df["Product_final"] = df.apply(create, axis=1)

Expected Output

    df  product  product2  product5  Product_final
    0   apple    Appl      Apple     Apple          (Product_final uses product5 since available)
    1   banan    Banan     NaN       Banan          (Product_final uses product2 since product5 is missing)

Answer 1

Score: 2

You can reindex the columns into preference order, back-fill along each row with bfill, and select the first column with iloc:

    df['Product_final'] = (df.reindex(cols_pref_inorder, axis=1)
                             .bfill(axis=1).iloc[:, 0]
                          )

Output:

       df product product2 product5 Product_final
    0   0   apple     Appl    Apple         Apple
    1   1   banan    Banan      NaN         Banan

Intermediate (reindexed and bfilled) before slicing:

      product5 product2 product
    0    Apple     Appl   apple
    1    Banan    Banan   banan
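
For anyone who wants to try this end to end, here is a minimal, self-contained sketch. It rebuilds the sample data from the question and assumes the preference list contains only the three columns shown (the original list is truncated). The helper name first_available and the extra column Product_final_rowwise are hypothetical, added only to show a row-wise equivalent of what the question attempted with apply.

```python
import numpy as np
import pandas as pd

# Sample data from the question. The original list is truncated ("..."),
# so only the three columns shown there are assumed here.
cols_pref_inorder = ["product5", "product2", "product"]
df = pd.DataFrame({
    "product": ["apple", "banan"],
    "product2": ["Appl", "Banan"],
    "product5": ["Apple", np.nan],
})

# Vectorized approach from the answer: put the columns in preference order,
# back-fill across each row, then keep the first column, which now holds
# the first non-missing value per row.
ordered = df.reindex(cols_pref_inorder, axis=1)
df["Product_final"] = ordered.bfill(axis=1).iloc[:, 0]


def first_available(row, cols=cols_pref_inorder):
    """Hypothetical row-wise helper: first non-missing value in preference order."""
    for col in cols:
        if pd.notnull(row[col]):
            return row[col]
    return np.nan


# Row-wise equivalent. Note that it reads from the row passed in by apply,
# not from the whole DataFrame.
df["Product_final_rowwise"] = df.apply(first_available, axis=1)

print(df)
```

The vectorized version is usually the better choice on large frames, since apply with axis=1 calls the Python function once per row. One thing to be aware of: reindex silently creates an all-NaN column for any name in the list that is not present in df, rather than raising an error.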

huangapple
  • Published on July 3, 2023 at 17:20:41
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76603429.html