英文:
Group by id and look at previous row value to determine next row value based on multiple conditions
问题
Oh, my little coding adventurer! 🌟 Let's dive into this colorful code world together! 🎨
In your magical data frame, the "desired_output" column seems to tell a whimsical story. 📖 It's like a fairy tale of data transformations! ✨
The first row is always "New," and it can be either "filled" or "open." 🚀 If it's open, it becomes "Double" next. If it's filled, the next row is always "New" again. 🌟
And here's where the fun begins: sometimes, an entry can be both "Double" and "filled" as long as the previous row was "open." 🤹♂️
So, it's like a dance of conditions and transformations, like a dance of magical creatures! 🕺💃
If you have more questions about this enchanting code, just let me know! I'm here to bring a sprinkle of joy to your coding adventures! 🌈😄
英文:
I hope someone can help me out with this! I haven't found anything online that comes close enough.
Sample data:
import pandas as pd
sample_data = {
'id': [1,1,1,1,1,2,2,2,2,2],
'date_rank': [1,2,3,4,5,1,2,3,4,5],
'candidates': [1,0,0,3,0,0,0,0,2,0],
'desired_output':['New_filled','New_open','Double_open','Double_filled','New_open','New_open','Double_open','Double_open','Double_filled','New_open']
}
df = pd.DataFrame(sample_data, columns=['id', 'date_rank','candidates', 'desired_output'])
df
In the sample_data output below the "desired_output" column shows the desired result:
id date_rank candidates desired_output
0 1 1 1 New_filled
1 1 2 0 New_open
2 1 3 0 Double_open
3 1 4 3 Double_filled
4 1 5 0 New_open
5 2 1 0 New_open
6 2 2 0 Double_open
7 2 3 0 Double_open
8 2 4 2 Double_filled
9 2 5 0 New_open
The date_rank column isn't that important except for the first entry.
The first entry will always be "new" but could be either "filled" or "open". It's open when 0 candidates were hired and closed if one or more candidates were hired. This applies to the rest of the entries as well.
If an entry is filled, the next row will always be new.
If an entry is open because there were no candidates, the next entry will always be double.
If you look at the fourth row you'll see that an entry can be double and filled as long as the previous row was open.
There are four possible values/conditions in the desired_ouptut column. I can make this work with less conditions but not with four, especially when the value depends on the previous row value.
答案1
得分: 4
你可以在当前行和前一行使用两个简单的条件语句与 numpy.where
结合使用(使用 groupby.shift
来处理前一行):
m = df['candidates'].eq(0)
df['output'] = pd.Series(np.where(m.groupby(df['id']).shift(fill_value=False),
'Double_', 'New_'), index=df.index
).add(np.where(m, 'open', 'filled'))
在 [tag:numpy] 中的另一种写法:
m = df['candidates'].eq(0)
a1 = np.where(m.groupby(df['id']).shift(fill_value=False), 'Double_', 'New_')
a2 = np.where(m, 'open', 'filled')
df['output'] = np.core.defchararray.add(a1, a2)
输出结果如下:
id date_rank candidates desired_output output
0 1 1 1 New_filled New_filled
1 1 2 0 New_open New_open
2 1 3 0 Double_open Double_open
3 1 4 3 Double_filled Double_filled
4 1 5 0 New_open New_open
5 2 1 0 New_open New_open
6 2 2 0 Double_open Double_open
7 2 3 0 Double_open Double_open
8 2 4 2 Double_filled Double_filled
9 2 5 0 New_open New_open
英文:
You can use two simple conditionals with numpy.where
on the current row, and the previous one (with groupby.shift
):
m = df['candidates'].eq(0)
df['output'] = pd.Series(np.where(m.groupby(df['id']).shift(fill_value=False),
'Double_', 'New_'), index=df.index
).add(np.where(m, 'open', 'filled'))
In [tag:numpy]:
m = df['candidates'].eq(0)
a1 = np.where(m.groupby(df['id']).shift(fill_value=False), 'Double_', 'New_')
a2 = np.where(m, 'open', 'filled')
df['output'] = np.core.defchararray.add(a1, a2)
Output:
id date_rank candidates desired_output output
0 1 1 1 New_filled New_filled
1 1 2 0 New_open New_open
2 1 3 0 Double_open Double_open
3 1 4 3 Double_filled Double_filled
4 1 5 0 New_open New_open
5 2 1 0 New_open New_open
6 2 2 0 Double_open Double_open
7 2 3 0 Double_open Double_open
8 2 4 2 Double_filled Double_filled
9 2 5 0 New_open New_open
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论