How to create a new column that identifies the last and second last row in longitudinal data using dplyr


Question

I have data in long format with several observations ("Visit") per individual (identified by "ID"). The number of observations per individual varies. I would like to create a new column with the last visit, which I have accomplished, and a column with the second last visit.

My data looks like this:

```R
ID <- c(1000,1000,1000,1001,1001,1001,1001,1002,1002,1002,1002,1002)
Visit <- c("BL","V02","V03","BL","V02","V03","V04","BL","V02","V03","V04","V05")
df <- data.frame(ID,Visit)
```

The `lastVisit` column is created by the following code:

```R
library(dplyr)

df <- df %>%
  group_by(ID) %>%
  mutate(lastVisit = last(Visit))
```

The desired output is like this:

```
  ID Visit lastVisit secondlastVisit
1000    BL       V03             V02
1000   V02       V03             V02
1000   V03       V03             V02
1001    BL       V04             V03
1001   V02       V04             V03
1001   V03       V04             V03
1001   V04       V04             V03
1002    BL       V05             V04
1002   V02       V05             V04
1002   V03       V05             V04
1002   V04       V05             V04
1002   V05       V05             V04
```

I have tried using `secondlastVisit = lag(Visit)`, but this does not give the desired output. A method using `dplyr::mutate` is preferred.

Thanks!
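For context, here is a minimal sketch of why `lag()` does not produce the desired column: it shifts `Visit` down by one row within each group, so every row gets its own previous visit, not a single value per ID. The `visits` data frame below is a reduced, illustrative subset of the data.

```r
library(dplyr)

# Illustrative subset: a single ID with three visits
visits <- data.frame(
  ID    = c(1000, 1000, 1000),
  Visit = c("BL", "V02", "V03")
)

# lag() shifts Visit down by one row within each ID
lagged <- visits %>%
  group_by(ID) %>%
  mutate(secondlastVisit = lag(Visit)) %>%
  ungroup()

lagged$secondlastVisit
#> [1] NA    "BL"  "V02"
```

Only the final row of the group happens to hold the second last visit (`"V02"`); the other rows hold `NA` and `"BL"`.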
# Answer 1

**Score**: 2

The function `dplyr::nth()` does what you want; the negative index tells it to count from the end.

``` r
library("magrittr")
library("dplyr")

ID <- c(1000,1000,1000,1001,1001,1001,1001,1002,1002,1002,1002,1002)
Visit <- c("BL","V02","V03","BL","V02","V03","V04","BL","V02","V03","V04","V05")
df <- data.frame(ID,Visit)

df <- df %>%
  group_by(ID) %>%
  mutate(lastVisit = last(Visit)) %>%
  mutate(secondlastVisit = nth(Visit, -2L))

df
#> # A tibble: 12 x 4
#> # Groups:   ID [3]
#>       ID Visit lastVisit secondlastVisit
#>    <dbl> <chr> <chr>     <chr>
#>  1  1000 BL    V03       V02
#>  2  1000 V02   V03       V02
#>  3  1000 V03   V03       V02
#>  4  1001 BL    V04       V03
#>  5  1001 V02   V04       V03
#>  6  1001 V03   V04       V03
#>  7  1001 V04   V04       V03
#>  8  1002 BL    V05       V04
#>  9  1002 V02   V05       V04
#> 10  1002 V03   V05       V04
#> 11  1002 V04   V05       V04
#> 12  1002 V05   V05       V04
```

<sup>Created on 2023-04-13 with reprex v2.0.2</sup>
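One edge case the question does not cover is an ID with only a single visit. The sketch below assumes such an ID could occur (the `visits` data frame and ID `2` are made up for illustration) and shows that `nth()` then yields a missing value for that group instead of raising an error.

```r
library(dplyr)

# Hypothetical data: ID 1 has three visits, ID 2 has only one
visits <- data.frame(
  ID    = c(1, 1, 1, 2),
  Visit = c("BL", "V02", "V03", "BL")
)

result <- visits %>%
  group_by(ID) %>%
  mutate(
    lastVisit       = last(Visit),
    # Position -2 does not exist for ID 2, so nth() returns NA there
    secondlastVisit = nth(Visit, -2L)
  ) %>%
  ungroup()

result
```

If `NA` is not a suitable placeholder, the `default` argument of `nth()` can supply a different fallback value of the same type.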

# Answer 2

**Score**: 1

``` r
df <- df %>%
  group_by(ID) %>%
  mutate(secondLastVisit = Visit[which(lastVisit == Visit) - 1])
```
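Answer 2 relies on the `lastVisit` column already existing and on each visit label occurring exactly once per ID. As a possible variation (not part of the original answer), the sketch below indexes by position with `n()` instead. The `visits` data frame, the single-visit ID `1003`, and the `NA` fallback are illustrative assumptions.

```r
library(dplyr)

# Hypothetical data: ID 1003 has only one visit
visits <- data.frame(
  ID    = c(1000, 1000, 1000, 1003),
  Visit = c("BL", "V02", "V03", "BL")
)

result <- visits %>%
  group_by(ID) %>%
  mutate(
    # Take the element just before the last row of the group;
    # fall back to NA when the group has fewer than two rows
    secondLastVisit = if (n() >= 2) Visit[n() - 1] else NA_character_
  ) %>%
  ungroup()

result
```

Without the guard, `Visit[n() - 1]` would evaluate to a zero-length vector for a single-row group, which `mutate()` cannot recycle to the group size.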

Posted by huangapple on 2023-04-13 19:32:02
When reposting, please keep the link to this article: https://go.coder-hub.com/76004919.html