How to create a new column in a Pandas dataframe based on conditions using two existing columns i.e., multiple and/or operators in each condition?
Question
I am working on improving some existing Python code using Numpy and Pandas and am still learning so need some help with this scenario. The existing code is very verbose and I imagine there is a way to trim down on the amount of code written in addition to making the script more efficient.
The dataset I am working with needs a column created to determine if accounting transactions should be "posted" or "not posted" based on combinations of values from columns "condition_code" and "trans_type". Below is a mockup of what the existing code generally looks like.
conditions = [
    df['trans_type'].eq('D67'),
    (df['condition_code'].eq('H'))
    & (df['trans_type'].eq('D4S') | df['trans_type'].eq('D4U') | ...),  # many more .eq statements
    # other conditions in the same format...
]
This code works as is, but there are many or statements for each of the conditions resulting in almost 200 lines of code.
My intent was to use the in operator with a list of trans_type values, but the second excerpt of code does not work because in is applied to the whole column of values at once instead of to each row. Any advice or help will be much appreciated. I would love for this code to be easier to read.
conditions = [
    df['trans_type'] == 'D67',
    (df['condition_code'] in ['A', 'H'])
    & (df['trans_type'] in ['D4S', 'D4U', 'D4V', ...]),
    # more similar conditions...
]
I know now why my approach does not work, but have no clue how to tackle this now. Any advice, recommendations, or other methods I should look into?
Answer 1
Score: 0
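pandas has a built-in isin method for this. It tests each row for membership in a list, so it replaces a chain of .eq(...) | .eq(...) calls. Keep the conditions as separate list entries rather than joining them all with &, since a row cannot have trans_type equal to "D67" and also in ["D4S", "D4U"]:
conditions = [
    df["trans_type"].eq("D67"),
    df["condition_code"].eq("H") & df["trans_type"].isin(["D4S", "D4U"]),
    # ... other conditions
]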
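For a fuller picture, here is a minimal runnable sketch of how the conditions list might feed numpy.select to build the new column. The column names come from the question; the sample rows, the "status" column name, and the labels assigned to each condition are assumptions made up for illustration, not the asker's actual rules.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: column names are from the question,
# the rows themselves are invented for this example.
df = pd.DataFrame({
    "condition_code": ["A", "H", "H", "B", "A"],
    "trans_type": ["D67", "D4S", "D4U", "D4V", "XYZ"],
})

# Each entry is one boolean Series. isin() checks membership row by row,
# which is what `df[col] in [...]` was meant to do but cannot.
conditions = [
    df["trans_type"].eq("D67"),
    df["condition_code"].isin(["A", "H"])
    & df["trans_type"].isin(["D4S", "D4U", "D4V"]),
]

# Assumed labels, one per condition, in the same order as `conditions`.
choices = ["not posted", "posted"]

# np.select picks the choice of the first condition that is True for a row
# and falls back to `default` when none match.
df["status"] = np.select(conditions, choices, default="not posted")

print(df)
```

Because np.select takes the first matching condition for each row, the order of the list matters when conditions overlap. Long code lists can also be kept in named variables (for example a list of trans_type codes per rule) and passed to isin, which keeps each condition to a single line.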