Calculate conditional cumulative sum with lag

Question

I have this data frame:

have <- data.frame(ID = c(1, 1, 1, 2, 3),
                 tstart = c(0, 100, 200, 0, 0),
                 tstop = c(100, 200, 500, 400, 300),
                 tdc = c(0,1,1,0,1))
> have
  ID tstart tstop tdc
1  1      0   100   0
2  1    100   200   1
3  1    200   500   1
4  2      0   400   0
5  3      0   300   1

I would like to add a column with the cumulative sum of time from the interval tstart to tstop, conditional on tdc=1, i.e.,

want <- data.frame(ID = c(1, 1, 1, 2, 3),
                 tstart = c(0, 100, 200, 0, 0),
                 tstop = c(100, 200, 500, 400, 300),
                 tdc = c(0,1,1,0,1),
                 tdc.cum = c(0,100,400,0,300))
> want
  ID tstart tstop tdc tdc.cum
1  1      0   100   0       0
2  1    100   200   1     100
3  1    200   500   1     400
4  2      0   400   0       0
5  3      0   300   1     300

It would also be helpful to see how to lag the cumulative time by 10 units, i.e., subtract 10 units from each ID's total cumulative sum of rows with tdc=1.

want2 <- data.frame(ID = c(1, 1, 1, 2, 3),
                 tstart = c(0, 100, 200, 0, 0),
                 tstop = c(100, 200, 500, 400, 300),
                 tdc = c(0,1,1,0,1),
                 tdc.cum = c(0,100,390,0,290))
> want2
  ID tstart tstop tdc tdc.cum
1  1      0   100   0       0
2  1    100   200   1     100
3  1    200   500   1     390
4  2      0   400   0       0
5  3      0   300   1     290

I have tried to set up this data frame using survival::tmerge() and cumtdc() but I have only been able to get the cumulative sum of tdc (1 or 0) instead of the time interval. Thank you.

Answer 1

Score: 0

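Using dplyr, this will achieve what you need. First, it calculates the cumulative sum of the tstop - tstart differences within each group (ID), counting only rows where tdc == 1. Second, within each group, it subtracts 10 from "tdc.cum" on rows where "tdc.cum" equals the group maximum and tdc != 0, which in this data is the last tdc = 1 row of each ID: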

library(dplyr)

df <- data.frame(ID = c(1, 1, 1, 2, 3),
                 tstart = c(0, 100, 200, 0, 0),
                 tstop = c(100, 200, 500, 400, 300),
                 tdc = c(0,1,1,0,1))

df %>%
  group_by(ID) %>%
  # running total of interval lengths, counting only rows with tdc == 1
  mutate(tdc.cum = cumsum(ifelse(tdc == 1, tstop - tstart, 0)),
         # subtract 10 where the running total equals the group maximum and tdc != 0
         tdc.cum = ifelse(tdc.cum == max(tdc.cum) & tdc != 0,
                          tdc.cum - 10, tdc.cum)) %>%
  ungroup()

# A tibble: 5 × 5
     ID tstart tstop   tdc tdc.cum
  <dbl>  <dbl> <dbl> <dbl>   <dbl>
1     1      0   100     0       0
2     1    100   200     1     100
3     1    200   500     1     390
4     2      0   400     0       0
5     3      0   300     1     290
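
For comparison, here is a minimal base R sketch of the same two steps, kept separate so that both want and want2 from the question can be reproduced. It assumes the rows are already ordered by tstart within each ID. The names exposed, lag_units, is_last_tdc, and tdc.cum.lag are illustrative and do not come from the original post. Unlike the dplyr version above, it subtracts the lag from the last tdc == 1 row of each ID rather than from rows matching the group maximum; the two approaches agree on this example data.

```r
# Same example data as in the question
have <- data.frame(ID = c(1, 1, 1, 2, 3),
                   tstart = c(0, 100, 200, 0, 0),
                   tstop = c(100, 200, 500, 400, 300),
                   tdc = c(0, 1, 1, 0, 1))

# Interval length, counted only on rows where tdc == 1
exposed <- ifelse(have$tdc == 1, have$tstop - have$tstart, 0)

# Step 1: running total of exposed time within each ID (matches want)
have$tdc.cum <- ave(exposed, have$ID, FUN = cumsum)

# Step 2: flag the last tdc == 1 row of each ID.
# max(..., 0) yields 0 for IDs with no tdc == 1 rows, so nothing is flagged there.
is_last_tdc <- ave(have$tdc, have$ID,
                   FUN = function(x) seq_along(x) == max(which(x == 1), 0)) == 1

# Subtract the lag on the flagged rows only (matches want2)
lag_units <- 10  # illustrative name for the 10-unit lag
have$tdc.cum.lag <- have$tdc.cum - lag_units * is_last_tdc

print(have)
```

Keeping the unlagged and lagged totals in separate columns makes it easier to check each step against the expected output.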

Published on May 10, 2023 at 14:09:58
Original link: https://go.coder-hub.com/76215345.html