July 3, 2023 17:20:41

Create new column based on missing values

Question

I want to create a new column that is based on the other columns. product5 is the best, product2 is the second-best. So the new column should use product5 if available. If not, then try product2 etc. It needs to generalise to many columns based on the order of the items specified in the list:

cols_pref_inorder = ["product5", "product2", "product"...]
df   product  product2 product5
0    apple    Appl     Apple
1    banan    Banan    NaN

I tried:

def create(x):
  if pd.notnull(df["product5"]):
     return df["product5"]
  ...
df["Product_final"] = df.apply(create, axis=1)

Expected output:

df   product  product2 product5 Product_final
0    apple    Appl     Apple     Apple          (Product_final uses product5 since available)
1    banan    Banan    NaN       Banan          (Product_final uses product2 since product5 is missing)

Answer 1

Score: 2

You can reindex the columns into preference order, bfill along each row, and slice the first column with iloc:

df['Product_final'] = (df.reindex(cols_pref_inorder, axis=1)
                         .bfill(axis=1).iloc[:, 0]
                      )

Output:

   df product product2 product5 Product_final
0   0   apple     Appl    Apple         Apple
1   1   banan    Banan      NaN         Banan

Intermediate (reindexed and bfilled) before slicing:

  product5 product2 product
0    Apple     Appl   apple
1    Banan    Banan   banan
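
For anyone who wants to run the approach end to end, here is a minimal self-contained sketch. The sample frame is rebuilt by hand from the question, and the preference list is cut down to the three columns shown there (the original list is elided, so that part is an assumption). The names `ordered` and `filled` are introduced only for illustration.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the question; NaN marks the missing product5 value.
df = pd.DataFrame({
    "product": ["apple", "banan"],
    "product2": ["Appl", "Banan"],
    "product5": ["Apple", np.nan],
})

# Preference order, best column first (assumed: only these three columns).
cols_pref_inorder = ["product5", "product2", "product"]

# Step 1: put the columns in preference order.
ordered = df.reindex(cols_pref_inorder, axis=1)

# Step 2: back-fill along each row, so a missing value is replaced by
# the next non-missing value to its right (the next-best column).
filled = ordered.bfill(axis=1)

# Step 3: after back-filling, the first column holds the best available value.
df["Product_final"] = filled.iloc[:, 0]

print(df)
```

Because reindex is used rather than plain column selection, a name in the preference list that is absent from the frame becomes an all-NaN column instead of raising a KeyError, and the back-fill simply skips over it. A row with no value in any of the listed columns ends up as NaN in Product_final.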
