Calculate conditional cumulative sum with lag
Question
I have this data frame:
have <- data.frame(ID = c(1, 1, 1, 2, 3),
                   tstart = c(0, 100, 200, 0, 0),
                   tstop = c(100, 200, 500, 400, 300),
                   tdc = c(0, 1, 1, 0, 1))
> have
  ID tstart tstop tdc
1  1      0   100   0
2  1    100   200   1
3  1    200   500   1
4  2      0   400   0
5  3      0   300   1
I would like to add a column with the cumulative sum of time from the interval tstart to tstop, conditional on tdc=1, i.e.,
want <- data.frame(ID = c(1, 1, 1, 2, 3),
                   tstart = c(0, 100, 200, 0, 0),
                   tstop = c(100, 200, 500, 400, 300),
                   tdc = c(0, 1, 1, 0, 1),
                   tdc.cum = c(0, 100, 400, 0, 300))
> want
  ID tstart tstop tdc tdc.cum
1  1      0   100   0       0
2  1    100   200   1     100
3  1    200   500   1     400
4  2      0   400   0       0
5  3      0   300   1     300
It would also be helpful to see how to lag the cumulative time by 10 units, i.e., subtract 10 units from each ID's total cumulative sum of rows with tdc=1.
want2 <- data.frame(ID = c(1, 1, 1, 2, 3),
                    tstart = c(0, 100, 200, 0, 0),
                    tstop = c(100, 200, 500, 400, 300),
                    tdc = c(0, 1, 1, 0, 1),
                    tdc.cum = c(0, 100, 390, 0, 290))
> want2
  ID tstart tstop tdc tdc.cum
1  1      0   100   0       0
2  1    100   200   1     100
3  1    200   500   1     390
4  2      0   400   0       0
5  3      0   300   1     290
I have tried to set up this data frame using survival::tmerge() and cumtdc() but I have only been able to get the cumulative sum of tdc (1 or 0) instead of the time interval. Thank you.
Answer 1
Score: 0
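Using dplyr, this will achieve what you need. First, it calculates the cumulative sum of the tstop - tstart differences within each group (ID), counting only rows where tdc == 1. Second, within each group, it subtracts 10 from tdc.cum on rows where tdc != 0 and tdc.cum equals the group's maximum, which in this data is the last tdc == 1 row of each ID: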
library(dplyr)
df <- data.frame(ID = c(1, 1, 1, 2, 3),
                 tstart = c(0, 100, 200, 0, 0),
                 tstop = c(100, 200, 500, 400, 300),
                 tdc = c(0, 1, 1, 0, 1))
df %>%
  group_by(ID) %>%
  mutate(tdc.cum = cumsum(ifelse(tdc == 1, tstop - tstart, 0)),
         tdc.cum = ifelse(tdc.cum == max(tdc.cum) & tdc != 0,
                          tdc.cum - 10, tdc.cum)) %>%
  ungroup()
# A tibble: 5 × 5
     ID tstart tstop   tdc tdc.cum
  <dbl>  <dbl> <dbl> <dbl>   <dbl>
1     1      0   100     0       0
2     1    100   200     1     100
3     1    200   500     1     390
4     2      0   400     0       0
5     3      0   300     1     290
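For comparison, the same two steps can be sketched in base R with ave(), keeping the unlagged result (want) and the lagged one (want2) as separate data frames. This is not the code from the answer above: it is a swapped-in variant that finds the last tdc == 1 row of each ID by row position instead of comparing against the group maximum. The names dur, lag_units, row_id, last_tdc_row and is_last are made up for illustration, and the sketch assumes the rows are already ordered by time within each ID.

```r
# Same input as in the question.
have <- data.frame(ID     = c(1, 1, 1, 2, 3),
                   tstart = c(0, 100, 200, 0, 0),
                   tstop  = c(100, 200, 500, 400, 300),
                   tdc    = c(0, 1, 1, 0, 1))

# Interval length, counted only on rows where tdc == 1.
dur <- ifelse(have$tdc == 1, have$tstop - have$tstart, 0)

# Step 1: running total of those lengths within each ID.
want <- have
want$tdc.cum <- ave(dur, have$ID, FUN = cumsum)

# Step 2: subtract lag_units from the last tdc == 1 row of each ID.
lag_units <- 10                      # hypothetical name for the 10-unit lag
row_id <- seq_len(nrow(have))
# Row index of the last tdc == 1 row per ID (0 if the ID has none).
last_tdc_row <- ave(ifelse(have$tdc == 1, row_id, 0), have$ID, FUN = max)
is_last <- row_id == last_tdc_row

want2 <- want
want2$tdc.cum[is_last] <- want2$tdc.cum[is_last] - lag_units
```

On the question's data this gives tdc.cum values of 0, 100, 400, 0, 300 in want and 0, 100, 390, 0, 290 in want2. Like the dplyr version, it does not guard against an ID whose cumulative time is smaller than the lag, in which case the lagged value would go negative.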