按id分组,并查看前一行的值,以根据多个条件确定下一行的值。

huangapple go评论72阅读模式
英文:

Group by id and look at previous row value to determine next row value based on multiple conditions

问题

Oh, my little coding adventurer! 🌟 Let's dive into this colorful code world together! 🎨

In your magical data frame, the "desired_output" column seems to tell a whimsical story. 📖 It's like a fairy tale of data transformations! ✨

The first row is always "New," and it can be either "filled" or "open." 🚀 If it's open, it becomes "Double" next. If it's filled, the next row is always "New" again. 🌟

And here's where the fun begins: sometimes, an entry can be both "Double" and "filled" as long as the previous row was "open." 🤹‍♂️

So, it's like a dance of conditions and transformations, like a dance of magical creatures! 🕺💃

If you have more questions about this enchanting code, just let me know! I'm here to bring a sprinkle of joy to your coding adventures! 🌈😄

英文:

I hope someone can help me out with this! I haven't found anything online that comes close enough.

Sample data:

import pandas as pd

sample_data = {
'id': [1,1,1,1,1,2,2,2,2,2],
'date_rank': [1,2,3,4,5,1,2,3,4,5],
'candidates': [1,0,0,3,0,0,0,0,2,0],
'desired_output':['New_filled','New_open','Double_open','Double_filled','New_open','New_open','Double_open','Double_open','Double_filled','New_open']
}

df = pd.DataFrame(sample_data, columns=['id', 'date_rank','candidates', 'desired_output'])
df

In the sample_data output below the "desired_output" column shows the desired result:

	id	date_rank   candidates  desired_output
0	1	1	        1	        New_filled
1	1	2	        0	        New_open
2	1	3	        0	        Double_open
3	1	4	        3	        Double_filled
4	1	5	        0	        New_open
5	2	1	        0	        New_open
6	2	2	        0	        Double_open
7	2	3	        0	        Double_open
8	2	4	        2	        Double_filled
9	2	5	        0	        New_open

The date_rank column isn't that important except for the first entry.

The first entry will always be "new" but could be either "filled" or "open". It's open when 0 candidates were hired and closed if one or more candidates were hired. This applies to the rest of the entries as well.

If an entry is filled, the next row will always be new.
If an entry is open because there were no candidates, the next entry will always be double.

If you look at the fourth row you'll see that an entry can be double and filled as long as the previous row was open.

There are four possible values/conditions in the desired_ouptut column. I can make this work with less conditions but not with four, especially when the value depends on the previous row value.

答案1

得分: 4

你可以在当前行和前一行使用两个简单的条件语句与 numpy.where 结合使用(使用 groupby.shift 来处理前一行):

m = df['candidates'].eq(0)

df['output'] = pd.Series(np.where(m.groupby(df['id']).shift(fill_value=False),
                                  'Double_', 'New_'), index=df.index
                         ).add(np.where(m, 'open', 'filled'))

在 [tag:numpy] 中的另一种写法:

m = df['candidates'].eq(0)

a1 = np.where(m.groupby(df['id']).shift(fill_value=False), 'Double_', 'New_')
a2 = np.where(m, 'open', 'filled')
df['output'] = np.core.defchararray.add(a1, a2)

输出结果如下:

   id  date_rank  candidates desired_output         output
0   1          1           1     New_filled     New_filled
1   1          2           0       New_open       New_open
2   1          3           0    Double_open    Double_open
3   1          4           3  Double_filled  Double_filled
4   1          5           0       New_open       New_open
5   2          1           0       New_open       New_open
6   2          2           0    Double_open    Double_open
7   2          3           0    Double_open    Double_open
8   2          4           2  Double_filled  Double_filled
9   2          5           0       New_open       New_open
英文:

You can use two simple conditionals with numpy.where on the current row, and the previous one (with groupby.shift):

m = df['candidates'].eq(0)

df['output'] = pd.Series(np.where(m.groupby(df['id']).shift(fill_value=False),
                                  'Double_', 'New_'), index=df.index
                         ).add(np.where(m, 'open', 'filled'))

In [tag:numpy]:

m = df['candidates'].eq(0)

a1 = np.where(m.groupby(df['id']).shift(fill_value=False), 'Double_', 'New_')
a2 = np.where(m, 'open', 'filled')
df['output'] = np.core.defchararray.add(a1, a2)

Output:

   id  date_rank  candidates desired_output         output
0   1          1           1     New_filled     New_filled
1   1          2           0       New_open       New_open
2   1          3           0    Double_open    Double_open
3   1          4           3  Double_filled  Double_filled
4   1          5           0       New_open       New_open
5   2          1           0       New_open       New_open
6   2          2           0    Double_open    Double_open
7   2          3           0    Double_open    Double_open
8   2          4           2  Double_filled  Double_filled
9   2          5           0       New_open       New_open

huangapple
  • 本文由 发表于 2023年7月17日 16:11:20
  • 转载请务必保留本文链接:https://go.coder-hub.com/76702572.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定