How to create a new column that identifies the last and second-last row in longitudinal data using dplyr
# Question
I have data in long format with several observations ("Visit") per individual (identified by "ID"). The number of observations per individual varies. I would like to create a new column containing the last visit, which I have accomplished, and a column containing the second-last visit.

My data looks like this:

```r
ID <- c(1000, 1000, 1000, 1001, 1001, 1001, 1001, 1002, 1002, 1002, 1002, 1002)
Visit <- c("BL", "V02", "V03", "BL", "V02", "V03", "V04", "BL", "V02", "V03", "V04", "V05")
df <- data.frame(ID, Visit)
```

The `lastVisit` column is created by the following code:

```r
library(dplyr)

df <- df %>%
  group_by(ID) %>%
  mutate(lastVisit = last(Visit))
```

The desired output looks like this:

```
ID    Visit  lastVisit  secondlastVisit
1000  BL     V03        V02
1000  V02    V03        V02
1000  V03    V03        V02
1001  BL     V04        V03
1001  V02    V04        V03
1001  V03    V04        V03
1001  V04    V04        V03
1002  BL     V05        V04
1002  V02    V05        V04
1002  V03    V05        V04
1002  V04    V05        V04
1002  V05    V05        V04
```

I have tried using `secondlastVisit = lag(Visit)`, but this does not give the desired output. A method using `dplyr::mutate` is preferred.

Thanks!
# Answer 1
**Score**: 2

The function `dplyr::nth()` does what you want; the negative index tells it to count from the end.

```r
library("magrittr")
library("dplyr")

ID <- c(1000, 1000, 1000, 1001, 1001, 1001, 1001, 1002, 1002, 1002, 1002, 1002)
Visit <- c("BL", "V02", "V03", "BL", "V02", "V03", "V04", "BL", "V02", "V03", "V04", "V05")
df <- data.frame(ID, Visit)

df <- df %>%
  group_by(ID) %>%
  mutate(lastVisit = last(Visit)) %>%
  mutate(secondlastVisit = nth(Visit, -2L))

df
#> # A tibble: 12 x 4
#> # Groups:   ID [3]
#>       ID Visit lastVisit secondlastVisit
#>    <dbl> <chr> <chr>     <chr>
#>  1  1000 BL    V03       V02
#>  2  1000 V02   V03       V02
#>  3  1000 V03   V03       V02
#>  4  1001 BL    V04       V03
#>  5  1001 V02   V04       V03
#>  6  1001 V03   V04       V03
#>  7  1001 V04   V04       V03
#>  8  1002 BL    V05       V04
#>  9  1002 V02   V05       V04
#> 10  1002 V03   V05       V04
#> 11  1002 V04   V05       V04
#> 12  1002 V05   V05       V04
```

<sup>Created on 2023-04-13 with reprex v2.0.2</sup>
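
As a side note, `nth()` and `last()` pick values by row position, so the result relies on the rows already being in visit order within each ID. The sketch below uses a small made-up data frame (the name `toy` and the single-visit ID 2000 are illustrative assumptions, not part of the question) to show what happens when an individual has fewer than two visits, and why `lag()` does not give a per-group constant:

```r
library(dplyr)

# Hypothetical toy data: ID 2000 has only one visit
toy <- data.frame(
  ID    = c(1000, 1000, 1000, 2000),
  Visit = c("BL", "V02", "V03", "BL"),
  stringsAsFactors = FALSE
)

res <- toy %>%
  group_by(ID) %>%
  mutate(
    lastVisit       = last(Visit),      # one value per group, repeated on every row
    secondlastVisit = nth(Visit, -2L),  # NA when the group has fewer than two rows
    prevVisit       = lag(Visit)        # shifts values down one row, so it differs per row
  ) %>%
  ungroup()

res
```

If the rows are not guaranteed to be sorted, arranging by ID and by whatever column records the visit order before grouping would be a sensible precaution.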
# Answer 2
**Score**: 1

```r
# Assumes df already contains the lastVisit column created in the question
df <- df %>%
  group_by(ID) %>%
  # Find the row holding the last visit, then take the value one row above it
  mutate(secondLastVisit = Visit[which(lastVisit == Visit) - 1])
```
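
To make the indexing in this answer easier to follow, here is a minimal base R sketch of the same expression on a single individual's visits. The vectors and variable names (`pos`, `second_last`, `one_visit`, `short_result`) are made up for illustration. It also shows that with only one visit the index becomes 0 and the result is a zero-length vector rather than `NA`, so single-visit individuals may need separate handling:

```r
# Visits for one hypothetical individual, already in chronological order
Visit <- c("BL", "V02", "V03", "V04")
lastVisit <- Visit[length(Visit)]

# which() finds the position of the last visit; subtracting 1 steps back one row
pos <- which(Visit == lastVisit)
second_last <- Visit[pos - 1]
second_last

# With a single visit the index is 0, which selects nothing
one_visit <- "BL"
short_result <- one_visit[which(one_visit == "BL") - 1]
length(short_result)
```

This approach also assumes each visit label occurs only once per individual; otherwise `which()` would return more than one position.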