Function on pandas columns in series

Question

I am looking to perform a function on a set of columns. The table is set up as follows:

Column_A  Column_B  Column_C  Column_D
123       456       null      null
258       null      456       null
null      785       null      null
null      null      794       null

Essentially, I want to perform an operation (for the purpose of this example, let's say subtracting 10 from the value) on the columns in order of their "importance" and store the result in Column_D. For example:

  • If Column_A is not null, then Column_D = Column_A - 10.
  • If Column_A is null and Column_B is not null, then Column_D = Column_B - 10.
  • If Column_A and Column_B are both null, then Column_D = Column_C - 10.

Column_B and Column_C should never both be populated (it is one or the other), but either can be populated at the same time as Column_A. The dataset is extremely large, so iterating through the rows is not an option. Does anyone have any suggestions?

So far I have tried if/elif/else logic, but in my case that does not apply the values across the whole dataframe. Maybe I am doing it incorrectly. I will try whatever solutions people suggest, even if it is an if/elif/else statement.

I am planning on trying the following after the code finishes:


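# Prefer Column_A, falling back to Column_B where Column_A is null
df.Column_D = df.Column_A.combine_first(df.Column_B)

# Same idea with Column_C as the fallback
df['Column_E'] = df.Column_A.combine_first(df.Column_C)

# Fill whatever is still null in Column_D from Column_E
df.Column_D = df.Column_D.fillna(df.Column_E)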
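
For reference, a minimal sketch of how this plan could be collapsed into a single chain, without the helper Column_E. It assumes the pandas method Series.combine_first and uses sample data that mirrors the table above; the subtraction of 10 is the example operation from the question, not part of the original snippet.

```python
import numpy as np
import pandas as pd

# Sample data mirroring the table in the question (null -> NaN)
df = pd.DataFrame({
    "Column_A": [123, 258, np.nan, np.nan],
    "Column_B": [456, np.nan, 785, np.nan],
    "Column_C": [np.nan, 456, np.nan, 794],
})

# Take Column_A where present, else Column_B, else Column_C,
# then apply the example operation (subtract 10) to the whole column.
df["Column_D"] = (
    df["Column_A"]
    .combine_first(df["Column_B"])
    .combine_first(df["Column_C"])
    - 10
)
print(df)
```

Because each combine_first call works on whole columns, no row iteration is involved.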

Answer 1

Score: 1

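You can use bfill along the columns (axis=1). Each missing value is filled from the next non-null value to its right, so after back-filling, the first column holds the first non-null value in the order in which you listed the columns. See the example below: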

>>> import numpy as np
>>> import pandas as pd
>>> df = pd.DataFrame({"Column_A": [123, 258, np.nan, np.nan],
...                    "Column_B": [456, np.nan, 785, np.nan],
...                    "Column_C": [np.nan, 456, np.nan, 794]})
>>> # Back-fill across each row, then take the first column (first non-null of A, B, C)
>>> df['Column_D'] = df[['Column_A', 'Column_B', 'Column_C']].bfill(axis=1).iloc[:, 0] - 10
>>> df
   Column_A  Column_B  Column_C  Column_D
0     123.0     456.0       NaN     113.0
1     258.0       NaN     456.0     248.0
2       NaN     785.0       NaN     775.0
3       NaN       NaN     794.0     784.0
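
Since the question asks about if/elif/else logic, here is a hedged sketch of a vectorized equivalent using numpy.select. This is an alternative to the bfill approach above, not part of the original answer; the sample data is the same and the column names come from the question.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Column_A": [123, 258, np.nan, np.nan],
    "Column_B": [456, np.nan, 785, np.nan],
    "Column_C": [np.nan, 456, np.nan, 794],
})

# Conditions are checked in order, like if / elif / elif;
# the first one that is True for a row decides which value is used.
conditions = [
    df["Column_A"].notna(),
    df["Column_B"].notna(),
    df["Column_C"].notna(),
]
choices = [df["Column_A"], df["Column_B"], df["Column_C"]]

# Rows where all three columns are null fall through to NaN (the "else" branch).
df["Column_D"] = np.select(conditions, choices, default=np.nan) - 10
print(df)
```

The order of the conditions list encodes the column priority, so changing the "importance" order only requires reordering the two lists together.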
