2023-07-17 16:11:20

Group by id and look at previous row value to determine next row value based on multiple conditions

Question


I hope someone can help me out with this! I haven't found anything online that comes close enough.

Sample data:

import pandas as pd
sample_data = {
'id': [1,1,1,1,1,2,2,2,2,2],
'date_rank': [1,2,3,4,5,1,2,3,4,5],
'candidates': [1,0,0,3,0,0,0,0,2,0],
'desired_output':['New_filled','New_open','Double_open','Double_filled','New_open','New_open','Double_open','Double_open','Double_filled','New_open']
}
df = pd.DataFrame(sample_data, columns=['id', 'date_rank','candidates', 'desired_output'])
df

In the output of sample_data below, the "desired_output" column shows the desired result:

   id  date_rank  candidates desired_output
0   1          1           1     New_filled
1   1          2           0       New_open
2   1          3           0    Double_open
3   1          4           3  Double_filled
4   1          5           0       New_open
5   2          1           0       New_open
6   2          2           0    Double_open
7   2          3           0    Double_open
8   2          4           2  Double_filled
9   2          5           0       New_open

The date_rank column isn't that important except for the first entry.

The first entry will always be "New" but could be either "filled" or "open". An entry is open when 0 candidates were hired and filled if one or more candidates were hired. This applies to the rest of the entries as well.

If an entry is filled, the next row will always be new.
If an entry is open because there were no candidates, the next entry will always be double.

If you look at the fourth row you'll see that an entry can be double and filled as long as the previous row was open.
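The rules above can be spelled out as a plain loop, which may be useful as a reference to check any vectorized solution against. This is only a sketch: label_rows is a hypothetical helper name, and it assumes the rows are already ordered by id and then by date_rank.

```python
def label_rows(ids, candidates):
    """Label each row from the stated rules (hypothetical helper).

    Assumes rows are ordered by id, then by date_rank.
    """
    labels = []
    prev_id = None
    prev_open = False
    for row_id, hired in zip(ids, candidates):
        if row_id != prev_id:
            # The first entry of an id has no previous row, so it is "New".
            prev_open = False
        is_open = hired == 0
        prefix = "Double" if prev_open else "New"
        suffix = "open" if is_open else "filled"
        labels.append(f"{prefix}_{suffix}")
        prev_id, prev_open = row_id, is_open
    return labels


ids = [1, 1, 1, 1, 1, 2, 2, 2, 2, 2]
candidates = [1, 0, 0, 3, 0, 0, 0, 0, 2, 0]
labels = label_rows(ids, candidates)
print(labels)
```

Written this way, the prefix depends only on whether the previous row of the same id was open, and the suffix depends only on the current row.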

There are four possible values/conditions in the desired_output column. I can make this work with fewer conditions but not with four, especially when the value depends on the previous row's value.

Answer 1

Score: 4

You can use two simple conditions with numpy.where: one on the current row, and one on the previous row of the same id (obtained with groupby.shift):

import numpy as np

# m is True where the row is open (no candidates hired)
m = df['candidates'].eq(0)
# Prefix comes from the previous row within each id, suffix from the current row
df['output'] = pd.Series(np.where(m.groupby(df['id']).shift(fill_value=False),
                                  'Double_', 'New_'), index=df.index
                         ).add(np.where(m, 'open', 'filled'))
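To see why this works, it can help to look at the two boolean masks on their own. The sketch below rebuilds only the id and candidates columns from the question; the names is_open and prev_open are illustrative and not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({
    "id": [1, 1, 1, 1, 1, 2, 2, 2, 2, 2],
    "candidates": [1, 0, 0, 3, 0, 0, 0, 0, 2, 0],
})

# True where the current row is open (no candidates hired).
is_open = df["candidates"].eq(0)

# True where the previous row of the same id was open.
# The first row of each id has no previous row, so it is filled with False.
prev_open = is_open.groupby(df["id"]).shift(fill_value=False)

print(pd.DataFrame({"id": df["id"], "is_open": is_open, "prev_open": prev_open}))
```

A row gets the "Double_" prefix exactly where prev_open is True, and the "open" suffix exactly where is_open is True. Because the shift is done per group, the last row of id 1 does not leak into the first row of id 2.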

The same idea, using numpy for the string concatenation:

m = df['candidates'].eq(0)
a1 = np.where(m.groupby(df['id']).shift(fill_value=False), 'Double_', 'New_')
a2 = np.where(m, 'open', 'filled')
# np.char.add is the public name for np.core.defchararray.add
df['output'] = np.char.add(a1, a2)

Output:

   id  date_rank  candidates desired_output         output
0   1          1           1     New_filled     New_filled
1   1          2           0       New_open       New_open
2   1          3           0    Double_open    Double_open
3   1          4           3  Double_filled  Double_filled
4   1          5           0       New_open       New_open
5   2          1           0       New_open       New_open
6   2          2           0    Double_open    Double_open
7   2          3           0    Double_open    Double_open
8   2          4           2  Double_filled  Double_filled
9   2          5           0       New_open       New_open
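Since the question frames the problem as four conditions, the same result can also be written with numpy.select, one condition per label. This is a sketch rather than part of the original answer: the names is_open and prev_open and the explicit sort step are assumptions, the latter added because groupby.shift follows row order and would give wrong labels on unsorted data.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "id": [1, 1, 1, 1, 1, 2, 2, 2, 2, 2],
    "date_rank": [1, 2, 3, 4, 5, 1, 2, 3, 4, 5],
    "candidates": [1, 0, 0, 3, 0, 0, 0, 0, 2, 0],
})

# groupby.shift relies on row order, so sort within each id first.
df = df.sort_values(["id", "date_rank"]).reset_index(drop=True)

is_open = df["candidates"].eq(0)
prev_open = is_open.groupby(df["id"]).shift(fill_value=False)

# One condition per label; the four conditions are mutually exclusive.
conditions = [
    ~prev_open & ~is_open,
    ~prev_open & is_open,
    prev_open & is_open,
    prev_open & ~is_open,
]
labels = ["New_filled", "New_open", "Double_open", "Double_filled"]

df["output"] = np.select(conditions, labels, default="unknown")
print(df)
```

The two-part np.where version is shorter because the prefix and suffix are independent of each other. The np.select form may be easier to extend if a label ever needs a rule that does not split cleanly into prefix and suffix.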
