How to create a new column in a Pandas dataframe based on conditions using two existing columns i.e., multiple and/or operators in each condition?


Question


I am working on improving some existing Python code that uses NumPy and pandas. I am still learning, so I need some help with this scenario. The existing code is very verbose, and I imagine there is a way to trim down the amount of code while also making the script more efficient.

The dataset I am working with needs a new column that determines whether accounting transactions should be "posted" or "not posted", based on combinations of values from the columns "condition_code" and "trans_type". Below is a mockup of what the existing code generally looks like.

conditions = [df['trans_type'].eq('D67'), 
  (df['condition_code'].eq('H')) & 
  (df['trans_type'].eq('D4S') | df['trans_type'].eq('D4U') | ... 
  many more .eq statements),
  other conditions in the same format...]

This code works as is, but each condition chains many or statements, resulting in almost 200 lines of code.

My intent was to use the in operator with a list of trans_type values, but the second excerpt of code below does not work because it operates on the whole column of values at once instead of iterating through each row. Any advice or help will be much appreciated. I would love to make this code easier to read.

conditions = [df['trans_type'] == 'D67', 
  (df['condition_code'] in ['A', 'H']) & (df['trans_type'] in 
  ['D4S', 'D4U', 'D4V', ...]), 
  more similar conditions...]

I now know why my approach does not work, but I have no clue how to tackle the problem. Is there any advice, recommendation, or other method I should look into?
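
For reference, the failure in the second excerpt can be reproduced on a toy frame. This is only a minimal sketch: the sample rows are made up for illustration, and only the column names come from the question. Python's in asks the list whether it contains the Series as a single object, which forces an element-wise comparison result to collapse into one True/False, and pandas refuses to do that.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df = pd.DataFrame({
    "condition_code": ["A", "H", "B"],
    "trans_type": ["D4S", "D67", "D4U"],
})

error_message = None
try:
    # `in` tests the Series as one object against the list,
    # so it never yields one result per row.
    mask = df["condition_code"] in ["A", "H"]
except ValueError as exc:
    # pandas raises because a whole Series has no single truth value.
    error_message = str(exc)
    print(error_message)
```

A per-row membership test needs a vectorized method that returns one boolean per row instead.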

Answer 1

Score: 0


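pandas has a built-in isin method for this. It tests each row against a list of values and returns a boolean Series, so it can replace the long chains of .eq(...) | .eq(...). Keep each condition as its own list entry, as in the question:

conditions = [
    df["trans_type"].eq("D67"),
    df["condition_code"].eq("H") & df["trans_type"].isin(["D4S", "D4U"]),
    # ... other conditions
]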
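
To show how the isin conditions could feed the new column, here is a minimal end-to-end sketch. Several things in it are assumptions rather than part of the original post: the sample rows, the new column name post_status, the use of np.select to turn the conditions into labels, and the choice that both example conditions mean "posted".

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df = pd.DataFrame({
    "condition_code": ["A", "H", "H", "B"],
    "trans_type": ["D67", "D4S", "XYZ", "D4U"],
})

# Each list entry is one boolean Series (one True/False per row).
conditions = [
    df["trans_type"].eq("D67"),
    df["condition_code"].isin(["A", "H"])
    & df["trans_type"].isin(["D4S", "D4U", "D4V"]),
]

# Assumed label for each condition, in the same order as `conditions`.
choices = ["posted", "posted"]

# Rows that match no condition fall back to the default label.
df["post_status"] = np.select(conditions, choices, default="not posted")
print(df)
```

np.select takes the first matching condition for each row, so the order of the list matters when conditions overlap. If every condition maps to the same label, combining them with | into a single mask and using np.where would work just as well.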

huangapple
  • Posted on March 12, 2023, 11:47:48
  • Please keep the link to this article when reposting: https://go.coder-hub.com/75710957.html