英文:
R create shift ID where counter increases based on change in row values by group
问题
以下是您要翻译的内容:
原始数据集:
user datetime shift_change_ind
1 2017-08-24 22:42:52 0
1 2017-08-24 22:53:52 0
1 2017-08-24 22:59:37 0
1 2017-09-01 22:34:56 1
1 2017-09-01 22:42:22 0
1 2017-09-01 22:48:49 0
1 2017-09-01 22:51:53 0
1 2017-09-02 00:27:09 1
1 2017-10-26 22:11:35 1
1 2017-10-26 22:12:44 0
1 2017-10-26 22:13:10 0
1 2017-10-26 22:22:20 0
1 2017-10-27 03:50:05 1
1 2017-11-10 23:47:55 1
1 2018-03-02 09:14:40 1
1 2018-03-02 09:36:17 0
1 2018-03:02 09:38:33 0
2 2017-07-10 20:30:52 0
2 2017-07-10 20:49:48 0
2 2017-07-10 20:52:37 0
2 2017-07-12 17:13:11 1
2 2017-07-12 17:19:52 0
2 2017-07-12 19:14:21 1
2 2017-07-12 19:17:12 0
需要创建的“shift ID”列:
user datetime shift_change_ind shift_id
1 2017-08-24 22:42:52 0 1
1 2017-08-24 22:53:52 0 1
1 2017-08-24 22:59:37 0 1
1 2017-09-01 22:34:56 1 2
1 2017-09-01 22:42:22 0 2
1 2017-09-01 22:48:49 0 2
1 2017-09-01 22:51:53 0 2
1 2017-09-02 00:27:09 1 3
1 2017-10-26 22:11:35 1 4
1 2017-10-26 22:12:44 0 4
1 2017-10-26 22:13:10 0 4
1 2017-10-26 22:22:20 0 4
1 2017-10-27 03:50:05 1 5
1 2017-11-10 23:47:55 1 6
1 2018-03-02 09:14:40 1 7
1 2018-03-02 09:36:17 0 7
1 2018-03:02 09:38:33 0 7
2 2017-07-10 20:30:52 0 1
2 2017-07-10 20:49:48 0 1
2 2017-07-10 20:52:37 0 1
2 2017-07-12 17:13:11 1 2
2 2017-07-12 17:19:52 0 2
2 2017-07-12 19:14:21 1 3
2 2017-07-12 19:17:12 0 3
如果您需要更多帮助,请告诉我。
英文:
What I have are time stamps for different "users" and an indicator for when there is a 1+ hour gap between a user's time stamps (indicating a new "shift"). The dataset looks like this:
user datetime shift_change_ind
1 2017-08-24 22:42:52 0
1 2017-08-24 22:53:52 0
1 2017-08-24 22:59:37 0
1 2017-09-01 22:34:56 1
1 2017-09-01 22:42:22 0
1 2017-09-01 22:48:49 0
1 2017-09-01 22:51:53 0
1 2017-09-02 00:27:09 1
1 2017-10-26 22:11:35 1
1 2017-10-26 22:12:44 0
1 2017-10-26 22:13:10 0
1 2017-10-26 22:22:20 0
1 2017-10-27 03:50:05 1
1 2017-11-10 23:47:55 1
1 2018-03-02 09:14:40 1
1 2018-03-02 09:36:17 0
1 2018-03-02 09:38:33 0
2 2017-07-10 20:30:52 0
2 2017-07-10 20:49:48 0
2 2017-07-10 20:52:37 0
2 2017-07-12 17:13:11 1
2 2017-07-12 17:19:52 0
2 2017-07-12 19:14:21 1
2 2017-07-12 19:17:12 0
Code here:
data = structure(list(user = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2), datetime = structure(c(1503614572.35,
1503615232.527, 1503615577.937, 1504305296.2, 1504305742.53,
1504306129.867, 1504306313.847, 1504312029.627, 1509055895.44,
1509055964.003, 1509055990.587, 1509056540.84, 1509076205.797,
1510357675.767, 1519982080, 1519983377, 1519983513, 1499718652.61,
1499719788.737, 1499719957.883, 1499879591.997, 1499879992.94,
1499886861.447, 1499887032.547), class = c("POSIXct", "POSIXt"
), tzone = "UTC"), shift_change_ind = c(0, 0, 0, 1, 0,
0, 0, 1, 1, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 1, 0, 1, 0)), row.names = c(NA,
-24L), class = c("tbl_df", "tbl", "data.frame"))
What I need is to create a "shift ID" column, grouped by the user, that increases the ID counter anytime a 1 appears, resulting in a dataset like this:
user datetime shift_change_ind shift_id
1 2017-08-24 22:42:52 0 1
1 2017-08-24 22:53:52 0 1
1 2017-08-24 22:59:37 0 1
1 2017-09-01 22:34:56 1 2
1 2017-09-01 22:42:22 0 2
1 2017-09-01 22:48:49 0 2
1 2017-09-01 22:51:53 0 2
1 2017-09-02 00:27:09 1 3
1 2017-10-26 22:11:35 1 4
1 2017-10-26 22:12:44 0 4
1 2017-10-26 22:13:10 0 4
1 2017-10-26 22:22:20 0 4
1 2017-10-27 03:50:05 1 5
1 2017-11-10 23:47:55 1 6
1 2018-03-02 09:14:40 1 7
1 2018-03-02 09:36:17 0 7
1 2018-03-02 09:38:33 0 7
2 2017-07-10 20:30:52 0 1
2 2017-07-10 20:49:48 0 1
2 2017-07-10 20:52:37 0 1
2 2017-07-12 17:13:11 1 2
2 2017-07-12 17:19:52 0 2
2 2017-07-12 19:14:21 1 3
2 2017-07-12 19:17:12 0 3
Code here:
new_data = structure(list(user = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2), datetime = structure(c(1503614572.35,
1503615232.527, 1503615577.937, 1504305296.2, 1504305742.53,
1504306129.867, 1504306313.847, 1504312029.627, 1509055895.44,
1509055964.003, 1509055990.587, 1509056540.84, 1509076205.797,
1510357675.767, 1519982080, 1519983377, 1519983513, 1499718652.61,
1499719788.737, 1499719957.883, 1499879591.997, 1499879992.94,
1499886861.447, 1499887032.547), class = c("POSIXct", "POSIXt"
), tzone = "UTC"), shift_change_ind = c(0, 0, 0, 1, 0,
0, 0, 1, 1, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 1, 0, 1, 0), shift_id = c(1,
1, 1, 2, 2, 2, 2, 3, 4, 4, 4, 4, 5, 6, 7, 7, 7, 1, 1, 1, 2, 2,
3, 3)), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA,
-24L))
I have millions of rows so a for-loop seems like a nightmare. I have tried using the rleid()
as a starting place for a shift_id
column, with ifelse()
conditions on whether a leading or lagging 1 or 0 occurs in the shift_change_ind
column to adjust the counter in the shift_id
column, but am running into issues when there are repeating 1's (in the shift_change_ind
column). Plus, I know that my approach is hacky and that there's got to be a cleaner and better way of going about this. Any help is much appreciated.
答案1
得分: 1
我们可以在按'user'分组后使用cumsum
:
library(dplyr)
out <- data %>%
group_by(user) %>%
mutate(shift_id = cumsum(shift_change_ind) + 1) %>%
ungroup
输出:
as.data.frame(out)
user datetime shift_change_ind shift_id
1 1 2017-08-24 22:42:52 0 1
2 1 2017-08-24 22:53:52 0 1
3 1 2017-08-24 22:59:37 0 1
4 1 2017-09-01 22:34:56 1 2
5 1 2017-09-01 22:42:22 0 2
6 1 2017-09-01 22:48:49 0 2
7 1 2017-09-01 22:51:53 0 2
8 1 2017-09-02 00:27:09 1 3
9 1 2017-10-26 22:11:35 1 4
10 1 2017-10-26 22:12:44 0 4
11 1 2017-10-26 22:13:10 0 4
12 1 2017-10-26 22:22:20 0 4
13 1 2017-10-27 03:50:05 1 5
14 1 2017-11-10 23:47:55 1 6
15 1 2018-03-02 09:14:40 1 7
16 1 2018-03-02 09:36:17 0 7
17 1 2018-03-02 09:38:33 0 7
18 2 2017-07-10 20:30:52 0 1
19 2 2017-07-10 20:49:48 0 1
20 2 2017-07-10 20:52:37 0 1
21 2 2017-07-12 17:13:11 1 2
22 2 2017-07-12 17:19:52 0 2
23 2 2017-07-12 19:14:21 1 3
24 2 2017-07-12 19:17:12 0 3
英文:
We could use cumsum
after grouping by 'user'
library(dplyr)
out <- data %>%
group_by(user) %>%
mutate(shift_id = cumsum(shift_change_ind)+1) %>%
ungroup
-output
as.data.frame(out)
user datetime shift_change_ind shift_id
1 1 2017-08-24 22:42:52 0 1
2 1 2017-08-24 22:53:52 0 1
3 1 2017-08-24 22:59:37 0 1
4 1 2017-09-01 22:34:56 1 2
5 1 2017-09-01 22:42:22 0 2
6 1 2017-09-01 22:48:49 0 2
7 1 2017-09-01 22:51:53 0 2
8 1 2017-09-02 00:27:09 1 3
9 1 2017-10-26 22:11:35 1 4
10 1 2017-10-26 22:12:44 0 4
11 1 2017-10-26 22:13:10 0 4
12 1 2017-10-26 22:22:20 0 4
13 1 2017-10-27 03:50:05 1 5
14 1 2017-11-10 23:47:55 1 6
15 1 2018-03-02 09:14:40 1 7
16 1 2018-03-02 09:36:17 0 7
17 1 2018-03-02 09:38:33 0 7
18 2 2017-07-10 20:30:52 0 1
19 2 2017-07-10 20:49:48 0 1
20 2 2017-07-10 20:52:37 0 1
21 2 2017-07-12 17:13:11 1 2
22 2 2017-07-12 17:19:52 0 2
23 2 2017-07-12 19:14:21 1 3
24 2 2017-07-12 19:17:12 0 3
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论