2023年2月10日 13:11:46go评论95阅读模式

英文:

R create shift ID where counter increases based on change in row values by group

问题

以下是您要翻译的内容：

原始数据集：

user  datetime              shift_change_ind
1	  2017-08-24 22:42:52	0		
1	  2017-08-24 22:53:52	0		
1	  2017-08-24 22:59:37	0		
1	  2017-09-01 22:34:56	1		
1	  2017-09-01 22:42:22	0		
1	  2017-09-01 22:48:49	0		
1	  2017-09-01 22:51:53	0		
1	  2017-09-02 00:27:09	1		
1	  2017-10-26 22:11:35	1		
1	  2017-10-26 22:12:44	0		
1	  2017-10-26 22:13:10	0		
1	  2017-10-26 22:22:20	0		
1	  2017-10-27 03:50:05	1		
1	  2017-11-10 23:47:55	1		
1	  2018-03-02 09:14:40	1		
1	  2018-03-02 09:36:17	0		
1	  2018-03:02 09:38:33	0		
2	  2017-07-10 20:30:52	0		
2	  2017-07-10 20:49:48	0		
2	  2017-07-10 20:52:37	0		
2	  2017-07-12 17:13:11	1		
2	  2017-07-12 17:19:52	0		
2	  2017-07-12 19:14:21	1		
2	  2017-07-12 19:17:12	0

需要创建的“shift ID”列：

user  datetime              shift_change_ind  shift_id
1	  2017-08-24 22:42:52	0	              1	
1	  2017-08-24 22:53:52	0	              1	
1	  2017-08-24 22:59:37	0	              1	
1	  2017-09-01 22:34:56	1	              2	
1	  2017-09-01 22:42:22	0	              2	
1	  2017-09-01 22:48:49	0	              2	
1	  2017-09-01 22:51:53	0	              2	
1	  2017-09-02 00:27:09	1	              3	
1	  2017-10-26 22:11:35	1	              4	
1	  2017-10-26 22:12:44	0	              4	
1	  2017-10-26 22:13:10	0	              4	
1	  2017-10-26 22:22:20	0	              4	
1	  2017-10-27 03:50:05	1	              5	
1	  2017-11-10 23:47:55	1	              6	
1	  2018-03-02 09:14:40	1	              7	
1	  2018-03-02 09:36:17	0	              7	
1	  2018-03:02 09:38:33	0	              7	
2	  2017-07-10 20:30:52	0	              1	
2	  2017-07-10 20:49:48	0	              1	
2	  2017-07-10 20:52:37	0	              1	
2	  2017-07-12 17:13:11	1	              2	
2	  2017-07-12 17:19:52	0	              2	
2	  2017-07-12 19:14:21	1	              3	
2	  2017-07-12 19:17:12	0	              3

如果您需要更多帮助，请告诉我。

英文:

What I have are time stamps for different "users" and an indicator for when there is a 1+ hour gap between a user's time stamps (indicating a new "shift"). The dataset looks like this:

user  datetime              shift_change_ind
1	  2017-08-24 22:42:52	0		
1	  2017-08-24 22:53:52	0		
1	  2017-08-24 22:59:37	0		
1	  2017-09-01 22:34:56	1		
1	  2017-09-01 22:42:22	0		
1	  2017-09-01 22:48:49	0		
1	  2017-09-01 22:51:53	0		
1	  2017-09-02 00:27:09	1		
1	  2017-10-26 22:11:35	1		
1	  2017-10-26 22:12:44	0		
1	  2017-10-26 22:13:10	0		
1	  2017-10-26 22:22:20	0		
1	  2017-10-27 03:50:05	1		
1	  2017-11-10 23:47:55	1		
1	  2018-03-02 09:14:40	1		
1	  2018-03-02 09:36:17	0		
1	  2018-03-02 09:38:33	0		
2	  2017-07-10 20:30:52	0		
2	  2017-07-10 20:49:48	0		
2	  2017-07-10 20:52:37	0		
2	  2017-07-12 17:13:11	1		
2	  2017-07-12 17:19:52	0		
2	  2017-07-12 19:14:21	1		
2	  2017-07-12 19:17:12	0

Code here:

data = structure(list(user = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 
1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2), datetime = structure(c(1503614572.35, 
1503615232.527, 1503615577.937, 1504305296.2, 1504305742.53, 
1504306129.867, 1504306313.847, 1504312029.627, 1509055895.44, 
1509055964.003, 1509055990.587, 1509056540.84, 1509076205.797, 
1510357675.767, 1519982080, 1519983377, 1519983513, 1499718652.61, 
1499719788.737, 1499719957.883, 1499879591.997, 1499879992.94, 
1499886861.447, 1499887032.547), class = c(&quot;POSIXct&quot;, &quot;POSIXt&quot;
), tzone = &quot;UTC&quot;), shift_change_ind = c(0, 0, 0, 1, 0, 
0, 0, 1, 1, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 1, 0, 1, 0)), row.names = c(NA, 
-24L), class = c(&quot;tbl_df&quot;, &quot;tbl&quot;, &quot;data.frame&quot;))

What I need is to create a "shift ID" column, grouped by the user, that increases the ID counter anytime a 1 appears, resulting in a dataset like this:

user  datetime              shift_change_ind  shift_id
1	  2017-08-24 22:42:52	0	              1	
1	  2017-08-24 22:53:52	0	              1	
1	  2017-08-24 22:59:37	0	              1	
1	  2017-09-01 22:34:56	1	              2	
1	  2017-09-01 22:42:22	0	              2	
1	  2017-09-01 22:48:49	0	              2	
1	  2017-09-01 22:51:53	0	              2	
1	  2017-09-02 00:27:09	1	              3	
1	  2017-10-26 22:11:35	1	              4	
1	  2017-10-26 22:12:44	0	              4	
1	  2017-10-26 22:13:10	0	              4	
1	  2017-10-26 22:22:20	0	              4	
1	  2017-10-27 03:50:05	1	              5	
1	  2017-11-10 23:47:55	1	              6	
1	  2018-03-02 09:14:40	1	              7	
1	  2018-03-02 09:36:17	0	              7	
1	  2018-03-02 09:38:33	0	              7	
2	  2017-07-10 20:30:52	0	              1	
2	  2017-07-10 20:49:48	0	              1	
2	  2017-07-10 20:52:37	0	              1	
2	  2017-07-12 17:13:11	1	              2	
2	  2017-07-12 17:19:52	0	              2	
2	  2017-07-12 19:14:21	1	              3	
2	  2017-07-12 19:17:12	0	              3

Code here:

new_data = structure(list(user = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 
1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2), datetime = structure(c(1503614572.35, 
1503615232.527, 1503615577.937, 1504305296.2, 1504305742.53, 
1504306129.867, 1504306313.847, 1504312029.627, 1509055895.44, 
1509055964.003, 1509055990.587, 1509056540.84, 1509076205.797, 
1510357675.767, 1519982080, 1519983377, 1519983513, 1499718652.61, 
1499719788.737, 1499719957.883, 1499879591.997, 1499879992.94, 
1499886861.447, 1499887032.547), class = c(&quot;POSIXct&quot;, &quot;POSIXt&quot;
), tzone = &quot;UTC&quot;), shift_change_ind = c(0, 0, 0, 1, 0, 
0, 0, 1, 1, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 1, 0, 1, 0), shift_id = c(1, 
1, 1, 2, 2, 2, 2, 3, 4, 4, 4, 4, 5, 6, 7, 7, 7, 1, 1, 1, 2, 2, 
3, 3)), class = c(&quot;tbl_df&quot;, &quot;tbl&quot;, &quot;data.frame&quot;), row.names = c(NA, 
-24L))

I have millions of rows so a for-loop seems like a nightmare. I have tried using the rleid() as a starting place for a shift_id column, with ifelse() conditions on whether a leading or lagging 1 or 0 occurs in the shift_change_ind column to adjust the counter in the shift_id column, but am running into issues when there are repeating 1's (in the shift_change_ind column). Plus, I know that my approach is hacky and that there's got to be a cleaner and better way of going about this. Any help is much appreciated.

答案1

得分: 1

我们可以在按'user'分组后使用cumsum：

library(dplyr)
out <- data %>%
  group_by(user) %>%
  mutate(shift_id = cumsum(shift_change_ind) + 1) %>%
  ungroup

输出：

as.data.frame(out)
 user            datetime shift_change_ind shift_id
1     1 2017-08-24 22:42:52                0        1
2     1 2017-08-24 22:53:52                0        1
3     1 2017-08-24 22:59:37                0        1
4     1 2017-09-01 22:34:56                1        2
5     1 2017-09-01 22:42:22                0        2
6     1 2017-09-01 22:48:49                0        2
7     1 2017-09-01 22:51:53                0        2
8     1 2017-09-02 00:27:09                1        3
9     1 2017-10-26 22:11:35                1        4
10    1 2017-10-26 22:12:44                0        4
11    1 2017-10-26 22:13:10                0        4
12    1 2017-10-26 22:22:20                0        4
13    1 2017-10-27 03:50:05                1        5
14    1 2017-11-10 23:47:55                1        6
15    1 2018-03-02 09:14:40                1        7
16    1 2018-03-02 09:36:17                0        7
17    1 2018-03-02 09:38:33                0        7
18    2 2017-07-10 20:30:52                0        1
19    2 2017-07-10 20:49:48                0        1
20    2 2017-07-10 20:52:37                0        1
21    2 2017-07-12 17:13:11                1        2
22    2 2017-07-12 17:19:52                0        2
23    2 2017-07-12 19:14:21                1        3
24    2 2017-07-12 19:17:12                0        3

英文:

We could use cumsum after grouping by 'user'

library(dplyr)
out &lt;- data %&gt;% 
group_by(user) %&gt;%
mutate(shift_id = cumsum(shift_change_ind)+1) %&gt;%
ungroup

-output

as.data.frame(out)
user            datetime shift_change_ind shift_id
1     1 2017-08-24 22:42:52                0        1
2     1 2017-08-24 22:53:52                0        1
3     1 2017-08-24 22:59:37                0        1
4     1 2017-09-01 22:34:56                1        2
5     1 2017-09-01 22:42:22                0        2
6     1 2017-09-01 22:48:49                0        2
7     1 2017-09-01 22:51:53                0        2
8     1 2017-09-02 00:27:09                1        3
9     1 2017-10-26 22:11:35                1        4
10    1 2017-10-26 22:12:44                0        4
11    1 2017-10-26 22:13:10                0        4
12    1 2017-10-26 22:22:20                0        4
13    1 2017-10-27 03:50:05                1        5
14    1 2017-11-10 23:47:55                1        6
15    1 2018-03-02 09:14:40                1        7
16    1 2018-03-02 09:36:17                0        7
17    1 2018-03-02 09:38:33                0        7
18    2 2017-07-10 20:30:52                0        1
19    2 2017-07-10 20:49:48                0        1
20    2 2017-07-10 20:52:37                0        1
21    2 2017-07-12 17:13:11                1        2
22    2 2017-07-12 17:19:52                0        2
23    2 2017-07-12 19:14:21                1        3
24    2 2017-07-12 19:17:12                0        3

通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库，让每个人都能够通过互相帮助和分享经验来进步。

创建一个按组计算的，基于行值变化递增的班次ID。

问题

答案1

如何使用dplyr库选择特定变量，然后将一个变量与特定值匹配。

在同一图中绘制两个带权重的直方图。

为Seurat对象中特定子集添加元数据

基于同一数据框中的多个列的非数字变量创建新列。

如何在Playwright视觉比较中屏蔽多个定位器？

在C++中，可以使用可变模板参数来检索类型的内部类型。

selenium.common.exceptions.StaleElementReferenceException: Message: stale element reference: stale element not found

Creating and opening a URL to log in to Website via Basic Auth with Robot Framework/Selenium (Python)

AG Grid 在上下文菜单中以大文本形式打开

What's the correct way to type hint an empty list as a literal in python?

如何在Highcharts Gantt中更改本地化的星期名称

如何在同一个流中使用多个过滤器和映射函数？

如何使用Map/Set来将代码优化到O(n)？

.NET MAUI Android在GitHub Actions上构建失败，错误代码为1。