创建一个按组计算的,基于行值变化递增的班次ID。

huangapple go评论95阅读模式
英文:

R create shift ID where counter increases based on change in row values by group

问题

以下是您要翻译的内容:

原始数据集:

  1. user datetime shift_change_ind
  2. 1 2017-08-24 22:42:52 0
  3. 1 2017-08-24 22:53:52 0
  4. 1 2017-08-24 22:59:37 0
  5. 1 2017-09-01 22:34:56 1
  6. 1 2017-09-01 22:42:22 0
  7. 1 2017-09-01 22:48:49 0
  8. 1 2017-09-01 22:51:53 0
  9. 1 2017-09-02 00:27:09 1
  10. 1 2017-10-26 22:11:35 1
  11. 1 2017-10-26 22:12:44 0
  12. 1 2017-10-26 22:13:10 0
  13. 1 2017-10-26 22:22:20 0
  14. 1 2017-10-27 03:50:05 1
  15. 1 2017-11-10 23:47:55 1
  16. 1 2018-03-02 09:14:40 1
  17. 1 2018-03-02 09:36:17 0
  18. 1 2018-03:02 09:38:33 0
  19. 2 2017-07-10 20:30:52 0
  20. 2 2017-07-10 20:49:48 0
  21. 2 2017-07-10 20:52:37 0
  22. 2 2017-07-12 17:13:11 1
  23. 2 2017-07-12 17:19:52 0
  24. 2 2017-07-12 19:14:21 1
  25. 2 2017-07-12 19:17:12 0

需要创建的“shift ID”列:

  1. user datetime shift_change_ind shift_id
  2. 1 2017-08-24 22:42:52 0 1
  3. 1 2017-08-24 22:53:52 0 1
  4. 1 2017-08-24 22:59:37 0 1
  5. 1 2017-09-01 22:34:56 1 2
  6. 1 2017-09-01 22:42:22 0 2
  7. 1 2017-09-01 22:48:49 0 2
  8. 1 2017-09-01 22:51:53 0 2
  9. 1 2017-09-02 00:27:09 1 3
  10. 1 2017-10-26 22:11:35 1 4
  11. 1 2017-10-26 22:12:44 0 4
  12. 1 2017-10-26 22:13:10 0 4
  13. 1 2017-10-26 22:22:20 0 4
  14. 1 2017-10-27 03:50:05 1 5
  15. 1 2017-11-10 23:47:55 1 6
  16. 1 2018-03-02 09:14:40 1 7
  17. 1 2018-03-02 09:36:17 0 7
  18. 1 2018-03:02 09:38:33 0 7
  19. 2 2017-07-10 20:30:52 0 1
  20. 2 2017-07-10 20:49:48 0 1
  21. 2 2017-07-10 20:52:37 0 1
  22. 2 2017-07-12 17:13:11 1 2
  23. 2 2017-07-12 17:19:52 0 2
  24. 2 2017-07-12 19:14:21 1 3
  25. 2 2017-07-12 19:17:12 0 3

如果您需要更多帮助,请告诉我。

英文:

What I have are time stamps for different "users" and an indicator for when there is a 1+ hour gap between a user's time stamps (indicating a new "shift"). The dataset looks like this:

  1. user datetime shift_change_ind
  2. 1 2017-08-24 22:42:52 0
  3. 1 2017-08-24 22:53:52 0
  4. 1 2017-08-24 22:59:37 0
  5. 1 2017-09-01 22:34:56 1
  6. 1 2017-09-01 22:42:22 0
  7. 1 2017-09-01 22:48:49 0
  8. 1 2017-09-01 22:51:53 0
  9. 1 2017-09-02 00:27:09 1
  10. 1 2017-10-26 22:11:35 1
  11. 1 2017-10-26 22:12:44 0
  12. 1 2017-10-26 22:13:10 0
  13. 1 2017-10-26 22:22:20 0
  14. 1 2017-10-27 03:50:05 1
  15. 1 2017-11-10 23:47:55 1
  16. 1 2018-03-02 09:14:40 1
  17. 1 2018-03-02 09:36:17 0
  18. 1 2018-03-02 09:38:33 0
  19. 2 2017-07-10 20:30:52 0
  20. 2 2017-07-10 20:49:48 0
  21. 2 2017-07-10 20:52:37 0
  22. 2 2017-07-12 17:13:11 1
  23. 2 2017-07-12 17:19:52 0
  24. 2 2017-07-12 19:14:21 1
  25. 2 2017-07-12 19:17:12 0

Code here:

  1. data = structure(list(user = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
  2. 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2), datetime = structure(c(1503614572.35,
  3. 1503615232.527, 1503615577.937, 1504305296.2, 1504305742.53,
  4. 1504306129.867, 1504306313.847, 1504312029.627, 1509055895.44,
  5. 1509055964.003, 1509055990.587, 1509056540.84, 1509076205.797,
  6. 1510357675.767, 1519982080, 1519983377, 1519983513, 1499718652.61,
  7. 1499719788.737, 1499719957.883, 1499879591.997, 1499879992.94,
  8. 1499886861.447, 1499887032.547), class = c("POSIXct", "POSIXt"
  9. ), tzone = "UTC"), shift_change_ind = c(0, 0, 0, 1, 0,
  10. 0, 0, 1, 1, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 1, 0, 1, 0)), row.names = c(NA,
  11. -24L), class = c("tbl_df", "tbl", "data.frame"))

What I need is to create a "shift ID" column, grouped by the user, that increases the ID counter anytime a 1 appears, resulting in a dataset like this:

  1. user datetime shift_change_ind shift_id
  2. 1 2017-08-24 22:42:52 0 1
  3. 1 2017-08-24 22:53:52 0 1
  4. 1 2017-08-24 22:59:37 0 1
  5. 1 2017-09-01 22:34:56 1 2
  6. 1 2017-09-01 22:42:22 0 2
  7. 1 2017-09-01 22:48:49 0 2
  8. 1 2017-09-01 22:51:53 0 2
  9. 1 2017-09-02 00:27:09 1 3
  10. 1 2017-10-26 22:11:35 1 4
  11. 1 2017-10-26 22:12:44 0 4
  12. 1 2017-10-26 22:13:10 0 4
  13. 1 2017-10-26 22:22:20 0 4
  14. 1 2017-10-27 03:50:05 1 5
  15. 1 2017-11-10 23:47:55 1 6
  16. 1 2018-03-02 09:14:40 1 7
  17. 1 2018-03-02 09:36:17 0 7
  18. 1 2018-03-02 09:38:33 0 7
  19. 2 2017-07-10 20:30:52 0 1
  20. 2 2017-07-10 20:49:48 0 1
  21. 2 2017-07-10 20:52:37 0 1
  22. 2 2017-07-12 17:13:11 1 2
  23. 2 2017-07-12 17:19:52 0 2
  24. 2 2017-07-12 19:14:21 1 3
  25. 2 2017-07-12 19:17:12 0 3

Code here:

  1. new_data = structure(list(user = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
  2. 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2), datetime = structure(c(1503614572.35,
  3. 1503615232.527, 1503615577.937, 1504305296.2, 1504305742.53,
  4. 1504306129.867, 1504306313.847, 1504312029.627, 1509055895.44,
  5. 1509055964.003, 1509055990.587, 1509056540.84, 1509076205.797,
  6. 1510357675.767, 1519982080, 1519983377, 1519983513, 1499718652.61,
  7. 1499719788.737, 1499719957.883, 1499879591.997, 1499879992.94,
  8. 1499886861.447, 1499887032.547), class = c("POSIXct", "POSIXt"
  9. ), tzone = "UTC"), shift_change_ind = c(0, 0, 0, 1, 0,
  10. 0, 0, 1, 1, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 1, 0, 1, 0), shift_id = c(1,
  11. 1, 1, 2, 2, 2, 2, 3, 4, 4, 4, 4, 5, 6, 7, 7, 7, 1, 1, 1, 2, 2,
  12. 3, 3)), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA,
  13. -24L))

I have millions of rows so a for-loop seems like a nightmare. I have tried using the rleid() as a starting place for a shift_id column, with ifelse() conditions on whether a leading or lagging 1 or 0 occurs in the shift_change_ind column to adjust the counter in the shift_id column, but am running into issues when there are repeating 1's (in the shift_change_ind column). Plus, I know that my approach is hacky and that there's got to be a cleaner and better way of going about this. Any help is much appreciated.

答案1

得分: 1

我们可以在按'user'分组后使用cumsum

  1. library(dplyr)
  2. out <- data %>%
  3. group_by(user) %>%
  4. mutate(shift_id = cumsum(shift_change_ind) + 1) %>%
  5. ungroup

输出:

  1. as.data.frame(out)
  2. user datetime shift_change_ind shift_id
  3. 1 1 2017-08-24 22:42:52 0 1
  4. 2 1 2017-08-24 22:53:52 0 1
  5. 3 1 2017-08-24 22:59:37 0 1
  6. 4 1 2017-09-01 22:34:56 1 2
  7. 5 1 2017-09-01 22:42:22 0 2
  8. 6 1 2017-09-01 22:48:49 0 2
  9. 7 1 2017-09-01 22:51:53 0 2
  10. 8 1 2017-09-02 00:27:09 1 3
  11. 9 1 2017-10-26 22:11:35 1 4
  12. 10 1 2017-10-26 22:12:44 0 4
  13. 11 1 2017-10-26 22:13:10 0 4
  14. 12 1 2017-10-26 22:22:20 0 4
  15. 13 1 2017-10-27 03:50:05 1 5
  16. 14 1 2017-11-10 23:47:55 1 6
  17. 15 1 2018-03-02 09:14:40 1 7
  18. 16 1 2018-03-02 09:36:17 0 7
  19. 17 1 2018-03-02 09:38:33 0 7
  20. 18 2 2017-07-10 20:30:52 0 1
  21. 19 2 2017-07-10 20:49:48 0 1
  22. 20 2 2017-07-10 20:52:37 0 1
  23. 21 2 2017-07-12 17:13:11 1 2
  24. 22 2 2017-07-12 17:19:52 0 2
  25. 23 2 2017-07-12 19:14:21 1 3
  26. 24 2 2017-07-12 19:17:12 0 3
英文:

We could use cumsum after grouping by 'user'

  1. library(dplyr)
  2. out &lt;- data %&gt;%
  3. group_by(user) %&gt;%
  4. mutate(shift_id = cumsum(shift_change_ind)+1) %&gt;%
  5. ungroup

-output

  1. as.data.frame(out)
  2. user datetime shift_change_ind shift_id
  3. 1 1 2017-08-24 22:42:52 0 1
  4. 2 1 2017-08-24 22:53:52 0 1
  5. 3 1 2017-08-24 22:59:37 0 1
  6. 4 1 2017-09-01 22:34:56 1 2
  7. 5 1 2017-09-01 22:42:22 0 2
  8. 6 1 2017-09-01 22:48:49 0 2
  9. 7 1 2017-09-01 22:51:53 0 2
  10. 8 1 2017-09-02 00:27:09 1 3
  11. 9 1 2017-10-26 22:11:35 1 4
  12. 10 1 2017-10-26 22:12:44 0 4
  13. 11 1 2017-10-26 22:13:10 0 4
  14. 12 1 2017-10-26 22:22:20 0 4
  15. 13 1 2017-10-27 03:50:05 1 5
  16. 14 1 2017-11-10 23:47:55 1 6
  17. 15 1 2018-03-02 09:14:40 1 7
  18. 16 1 2018-03-02 09:36:17 0 7
  19. 17 1 2018-03-02 09:38:33 0 7
  20. 18 2 2017-07-10 20:30:52 0 1
  21. 19 2 2017-07-10 20:49:48 0 1
  22. 20 2 2017-07-10 20:52:37 0 1
  23. 21 2 2017-07-12 17:13:11 1 2
  24. 22 2 2017-07-12 17:19:52 0 2
  25. 23 2 2017-07-12 19:14:21 1 3
  26. 24 2 2017-07-12 19:17:12 0 3

huangapple
  • 本文由 发表于 2023年2月10日 13:11:46
  • 转载请务必保留本文链接:https://go.coder-hub.com/75407201.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定