How to create a new column that identifies the last and second-last row in longitudinal data using dplyr

# Question


I have data in long format with several observations ("Visit") per individual (identified by "ID"). The number of observations per individual varies. I would like to create a new column with the last visit, which I have accomplished, and a column with the second-last visit.

My data looks like this:

```R
library(dplyr)  # provides %>%, group_by(), mutate() and last()

ID <- c(1000,1000,1000,1001,1001,1001,1001,1002,1002,1002,1002,1002)
Visit <- c("BL","V02","V03","BL","V02","V03","V04","BL","V02","V03","V04","V05")
df <- data.frame(ID,Visit)
```

The `lastVisit` column is created by the following code:

```R
df <- df %>%
  group_by(ID) %>%
  mutate(lastVisit = last(Visit))
```

The desired output is like this:

```
ID   Visit lastVisit secondlastVisit
1000 BL    V03       V02
1000 V02   V03       V02
1000 V03   V03       V02
1001 BL    V04       V03
1001 V02   V04       V03
1001 V03   V04       V03
1001 V04   V04       V03
1002 BL    V05       V04
1002 V02   V05       V04
1002 V03   V05       V04
1002 V04   V05       V04
1002 V05   V05       V04
```

I have tried using `secondlastVisit = lag(Visit)`, but this does not give the desired output. A method using `dplyr::mutate` is preferred.

Thanks!


# Answer 1
**Score**: 2

The function `dplyr::nth()` does what you want; the negative index tells it to count from the end.

``` r
library("magrittr")
library("dplyr")

ID <- c(1000, 1000, 1000, 1001, 1001, 1001, 1001, 1002, 1002, 1002, 1002, 1002)
Visit <- c("BL", "V02", "V03", "BL", "V02", "V03", "V04", "BL", "V02", "V03", "V04", "V05")
df <- data.frame(ID, Visit)

df <- df %>%
  group_by(ID) %>%
  mutate(lastVisit = last(Visit)) %>%
  mutate(secondlastVisit = nth(Visit, -2L))

df
#> # A tibble: 12 x 4
#> # Groups:   ID [3]
#>       ID Visit lastVisit secondlastVisit
#>    <dbl> <chr> <chr>     <chr>
#>  1  1000 BL    V03       V02
#>  2  1000 V02   V03       V02
#>  3  1000 V03   V03       V02
#>  4  1001 BL    V04       V03
#>  5  1001 V02   V04       V03
#>  6  1001 V03   V04       V03
#>  7  1001 V04   V04       V03
#>  8  1002 BL    V05       V04
#>  9  1002 V02   V05       V04
#> 10  1002 V03   V05       V04
#> 11  1002 V04   V05       V04
#> 12  1002 V05   V05       V04
```
<sup>Created on 2023-04-13 with reprex v2.0.2</sup>
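One case the question's data does not cover is an individual with only a single visit. The sketch below is an illustration rather than part of the original answer: the data frame `df2` and the extra ID `1003` are hypothetical, and the rows are assumed to be already sorted by visit within each ID. It shows that `nth()` falls back to a missing value when the requested position does not exist, and it calls `ungroup()` at the end so later steps are not silently computed per ID.

```r
library(dplyr)

# Hypothetical data: ID 1003 has a single visit (not in the original question)
df2 <- data.frame(
  ID    = c(1000, 1000, 1000, 1003),
  Visit = c("BL", "V02", "V03", "BL")
)

out <- df2 %>%
  group_by(ID) %>%
  mutate(
    lastVisit       = last(Visit),
    # position -2 does not exist for ID 1003, so nth() returns NA there
    secondlastVisit = nth(Visit, -2L)
  ) %>%
  ungroup()  # drop the grouping once the per-ID columns exist

out
```

If a placeholder other than `NA` is wanted for short groups, `nth()` also accepts a `default` argument.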

# Answer 2
**Score**: 1

``` r
# Assumes df already contains the lastVisit column created in the question
df <- df %>%
  group_by(ID) %>%
  mutate(secondLastVisit = Visit[which(lastVisit == Visit) - 1])
```
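The positional lookup above depends on the `lastVisit` column already existing and on visit labels being unique within each ID. As a hedged alternative sketch (not the answerer's code), the same position can be taken directly from the group size with `n()`, assuming the rows are already ordered by visit within each ID. The guard for groups with fewer than two rows is an addition for illustration.

```r
library(dplyr)

ID <- c(1000, 1000, 1000, 1001, 1001, 1001, 1001)
Visit <- c("BL", "V02", "V03", "BL", "V02", "V03", "V04")
df <- data.frame(ID, Visit)

res <- df %>%
  group_by(ID) %>%
  mutate(
    # n() is the number of rows in the current group, so n() - 1 is the
    # position of the second-last row; single-row groups get NA instead
    secondLastVisit = if (n() >= 2) Visit[n() - 1L] else NA_character_
  ) %>%
  ungroup()

res
```

Without the guard, a single-row group would index position 0, which yields a zero-length vector and makes `mutate()` fail.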

huangapple
  • Published on 2023-04-13 19:32:02
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76004919.html