June 15, 2023 05:14:54

Function on pandas columns in series

Question

I am looking to perform a function on a set of columns. The table setup is as follows:

Column_A	Column_B	Column_C	Column_D
123	456	null	null
258	null	456	null
null	785	null	null
null	null	794	null

I am essentially looking to perform an operation (for the purpose of this example, let's say I am trying to subtract 10 from the value) on the columns in order of "importance" and store the result in Column_D. For example:

If Column_A is not null, then Column_D = Column_A - 10.
If Column_A is null, then Column_D = Column_B - 10.
If Column_A and Column_B are both null, then Column_D = Column_C - 10.

Column_B and Column_C should never both be populated; it is one or the other, but either can be populated together with Column_A. This dataset is extremely large, so iterating through the rows is not an option. Does anyone have any suggestions?

So far I have tried if/elif/else logic, but that doesn't apply the values to the whole dataframe in my case. Maybe I am doing it incorrectly. I will try whatever solutions people have, even if it is an if/elif/else statement.

I am planning on trying the following after the code finishes:


df.Column_D = df.Column_A.combine_first(df.Column_B)
df['Column_E'] = df.Column_A.combine_first(df.Column_C)
df.Column_D = df.Column_D.fillna(df.Column_E)

Answer 1

Score: 1

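You can use bfill along the column axis (axis=1), which fills each missing value with the next non-null value to its right. If you list the columns in order of importance, the first column of the result then holds the first non-null value in each row. For example: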

>>> import numpy as np
>>> import pandas as pd
>>> df = pd.DataFrame({"Column_A": [123, 258, np.nan, np.nan],
...                    "Column_B": [456, np.nan, 785, np.nan],
...                    "Column_C": [np.nan, 456, np.nan, 794]})
>>> # Back-fill across columns, keep the first column (first non-null value per row), subtract 10
>>> df['Column_D'] = df[['Column_A', 'Column_B', 'Column_C']].bfill(axis=1).iloc[:, 0] - 10
>>> df
   Column_A  Column_B  Column_C  Column_D
0     123.0     456.0       NaN     113.0
1     258.0       NaN     456.0     248.0
2       NaN     785.0       NaN     775.0
3       NaN       NaN     794.0     784.0
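
An alternative that stays closer to the attempt in the question is to chain Series.combine_first, which takes the caller's value where it is present and falls back to the argument where it is null. The following is only a minimal sketch on the same sample data as this answer. The intermediate name first_valid is made up for illustration, and how this compares in speed to bfill on a very large frame has not been measured here.

```python
import numpy as np
import pandas as pd

# Sample data copied from the answer above.
df = pd.DataFrame({
    "Column_A": [123, 258, np.nan, np.nan],
    "Column_B": [456, np.nan, 785, np.nan],
    "Column_C": [np.nan, 456, np.nan, 794],
})

# Vectorized equivalent of if/elif/else over whole columns:
# use Column_A where present, otherwise Column_B, otherwise Column_C.
first_valid = (
    df["Column_A"]
    .combine_first(df["Column_B"])
    .combine_first(df["Column_C"])
)

# Apply the example operation (subtract 10) to the selected values.
df["Column_D"] = first_valid - 10

print(df["Column_D"].tolist())
```

Rows where all three columns are null would stay NaN in Column_D, because there is nothing to fall back to.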
