Function on pandas columns in series
Question
I am looking to perform a function on a set of columns. The table is set up as follows:
Column_A | Column_B | Column_C | Column_D |
---|---|---|---|
123 | 456 | null | null |
258 | null | 456 | null |
null | 785 | null | null |
null | null | 794 | null |
I am essentially looking to perform an operation (for the purpose of this example, let's say I am trying to subtract 10 from the value) in order of "importance" of the column and store the result in Column_D. For example:
- If Column_A is not null, then Column_D = Column_A - 10.
- If Column_A is null and Column_B is not null, then Column_D = Column_B - 10.
- If Column_A and Column_B are both null, then Column_D = Column_C - 10.
Column_B and Column_C should never both be populated; it is one or the other, but either can be populated at the same time as Column_A. This dataset is absurdly large, so iterating through the rows is not an option. Does anyone have any suggestions?
So far I have tried if/elif/else logic, but that doesn't apply the values to the whole dataframe in my case. Maybe I am doing it incorrectly. I will try whatever solutions people have, even if it is an if/elif/else statement.
I am planning on trying the following after the code finishes:
df.Column_D = df.Column_A.combine_first(df.Column_B)
df['Column_E'] = df.Column_A.combine_first(df.Column_C)
df.Column_D = df.Column_D.fillna(df.Column_E)
Answer 1
Score: 1
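You can use bfill along the column axis. With axis=1, each null cell is filled from the next non-null value to its right, so once the columns are listed in priority order, the first column of the result holds the highest-priority non-null value in every row. For example: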
>>> import numpy as np
>>> import pandas as pd
>>> df = pd.DataFrame({"Column_A": [123, 258, np.nan, np.nan],
...                    "Column_B": [456, np.nan, 785, np.nan],
...                    "Column_C": [np.nan, 456, np.nan, 794]})
>>> # Back-fill across columns in priority order, keep the first column, subtract 10
>>> df['Column_D'] = df[['Column_A', 'Column_B', 'Column_C']].bfill(axis=1).iloc[:, 0] - 10
>>> df
   Column_A  Column_B  Column_C  Column_D
0     123.0     456.0       NaN     113.0
1     258.0       NaN     456.0     248.0
2       NaN     785.0       NaN     775.0
3       NaN       NaN     794.0     784.0
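As an alternative closer to the plan in the question, the same priority rule can probably be written as a chain of Series.combine_first calls. The following is a minimal sketch rather than a definitive implementation: it assumes the sample data above, and the names both_populated, first_valid, and via_bfill are illustrative only. It also checks the stated constraint that Column_B and Column_C are never populated in the same row.

```python
import numpy as np
import pandas as pd

# Sample data from the question (assumed; Column_D is computed below).
df = pd.DataFrame({
    "Column_A": [123, 258, np.nan, np.nan],
    "Column_B": [456, np.nan, 785, np.nan],
    "Column_C": [np.nan, 456, np.nan, 794],
})

# Optional sanity check of the stated constraint:
# Column_B and Column_C should never both be populated in one row.
both_populated = df["Column_B"].notna() & df["Column_C"].notna()
assert not both_populated.any()

# Take Column_A where present, otherwise Column_B, otherwise Column_C.
first_valid = (
    df["Column_A"]
    .combine_first(df["Column_B"])
    .combine_first(df["Column_C"])
)
df["Column_D"] = first_valid - 10

# The row-wise back-fill from the answer, computed for comparison.
via_bfill = df[["Column_A", "Column_B", "Column_C"]].bfill(axis=1).iloc[:, 0] - 10
```

Both approaches are vectorized, so neither iterates over rows in Python. The chained form states the column priority explicitly, while the bfill form extends more easily to a long list of columns.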