英文:
Add 0 for cumulative sum based on condition in R
问题
以下是您提供的代码的翻译部分:
我有一个示例的data.table,我试图对每个“Type”按ID编码3个不同的累积和。我的数据表也按日期排序。
DT <- data.table(ID = c(A, A, A, A, B, B, B, C, C, C, C),
date = c("2008-01-01", "2009-01-01", "2010-01-01", "2011-01-01", "1978-01-01", "1982-01-01", "1985-01-01", "2001-01-01", "2011-01-01", "2015-01-01", "2019-01-01"),
Type = c(1, 2, 2, 3, 3, 1, 1, 2, 2, 1, 3),
Days = c(18, 333, 26, 57, 48, 10, 17, , 212, 55, 64, 18))
我尝试过:
DT[Type == 1 & order(date), cumdays1 := cumsum(Days), by = ID]
DT[Type == 2 & order(date), cumdays2 := cumsum(Days), by = ID]
DT[Type == 3 & order(date), cumdays3 := cumsum(Days), by = ID]
但是,这在计算不同于我正在计算的Type时会给我NA。
ID date Type Days cumdays1 cumdays2 cumdays3
1: A 2008-01-01 1 18 18 NA NA
2: A 2009-01-01 2 333 NA 333 NA
3: A 2010-01-01 2 26 NA 359 NA
4: A 2011-01-01 3 57 NA NA 57
5: B 1978-01-01 3 48 NA NA 48
6: B 1982-01-01 1 10 10 NA NA
7: B 1985-01-01 1 17 27 NA NA
8: C 2001-01-01 2 212 NA 212 NA
9: C 2011-01-01 2 55 NA 267 NA
10: C 2015-01-01 1 64 64 NA NA
11: C 2019-01-01 3 18 NA NA 18
相反,我希望继续使用Day=0来计算这些列。我知道我可以为每种Type编写单独的Day变量,但是否有更简单的代码一次完成这个任务?
我的期望输出:
ID date Type Days cumdays1 cumdays2 cumdays3
1: A 2008-01-01 1 18 18 0 0
2: A 2009-01-01 2 333 18 333 0
3: A 2010-01-01 2 26 18 359 0
4: A 2011-01-01 3 57 18 359 57
5: B 1978-01-01 3 48 0 0 48
6: B 1982-01-01 1 10 10 0 48
7: B 1985-01-01 1 17 27 0 48
8: C 2001-01-01 2 212 0 212 0
9: C 2011-01-01 2 55 0 267 0
10: C 2015-01-01 1 64 64 267 0
11: C 2019-01-01 3 18 64 267 18
希望这能帮助您理解代码的翻译。如果您有其他问题,请随时提问。
英文:
I have an example data.table and I am trying to code 3 different cumulative sums by ID, one for each "Type". My data table is also sorted by date.
DT<-data.table(ID=c(A,A,A,A,B,B,B,C,C,C,C),date=c("2008-01-01","2009-01-01","2010-01-01","2011-01-01","1978-01-01","1982-01-01","1985-01-01","2001-01-01","2011-01-01","2015-01-01","2019-01-01"),Type=c(1,2,2,3,3,1,1,2,2,1,3),Days=c(18,333,26,57,48,10,17,,212,55,64,18))
I tried:
DT[Type==1&order(date),cumdays1:=cumsum(Days),by=ID]
DT[Type==2&order(date),cumdays2:=cumsum(Days),by=ID]
DT[Type==3&order(date),cumdays3:=cumsum(Days),by=ID]
However, this gives me NA for when the Type is different to the one I was calculating.
ID date Type Days cumdays1 cumdays2 cumdays3
1: A 2008-01-01 1 18 18 NA NA
2: A 2009-01-01 2 333 NA 333 NA
3: A 2010-01-01 2 26 NA 359 NA
4: A 2011-01-01 3 57 NA NA 57
5: B 1978-01-01 3 48 NA NA 48
6: B 1982-01-01 1 10 10 NA NA
7: B 1985-01-01 1 17 27 NA NA
8: C 2001-01-01 2 212 NA 212 NA
9: C 2011-01-01 2 55 NA 267 NA
10: C 2015-01-01 1 64 64 NA NA
11: C 2019-01-01 3 18 NA NA 18
I would like instead to continue calculate cumulative sum using Day=0 for these columns. I know I could code a separate Day variable for each Type, but is there a simpler code to do this at once?
My desired output
ID date Type Days cumdays1 cumdays2 cumdays3
1: A 2008-01-01 1 18 18 0 0
2: A 2009-01-01 2 333 18 333 0
3: A 2010-01-01 2 26 18 359 0
4: A 2011-01-01 3 57 18 359 57
5: B 1978-01-01 3 48 0 0 48
6: B 1982-01-01 1 10 10 0 48
7: B 1985-01-01 1 17 27 0 48
8: C 2001-01-01 2 212 0 212 0
9: C 2011-01-01 2 55 0 267 0
10: C 2015-01-01 1 64 64 267 0
11: C 2019-01-01 3 18 64 267 18
答案1
得分: 2
# 使用基础R:
transform(dt[order(dt$ID,dt$date),],
cumdays = apply(Days * diag(max(Type))[Type,],
2, \(x)ave(x,ID, FUN = cumsum)))
ID date Type Days cumdays.V1 cumdays.V2 cumdays.V3
1: A 2008-01-01 1 18 18 0 0
2: A 2009-01-01 2 333 18 333 0
3: A 2010-01-01 2 26 18 359 0
4: A 2011-01-01 3 57 18 359 57
5: B 1978-01-01 3 48 0 0 48
6: B 1982-01-01 1 10 10 0 48
7: B 1985-01-01 1 17 27 0 48
8: C 2001-01-01 2 212 0 212 0
9: C 2011-01-01 2 55 0 267 0
10: C 2015-01-01 1 64 64 267 0
11: C 2019-01-01 3 18 64 267 18
---
# 使用Tidyverse:
dt %>%
arrange(ID, date) %>%
mutate(name = Type) %>%
pivot_wider(id_cols = c(ID, date, Type),
names_prefix = 'cumdays',
values_from = Days, values_fill = 0) %>%
mutate(across(starts_with('cumdays'), cumsum), .by = ID)
# 一个 tibble: 11 × 6
ID date Type cumdays1 cumdays2 cumdays3
<chr> <chr> <int> <int> <int> <int>
1 A 2008-01-01 1 18 0 0
2 A 2009-01-01 2 18 333 0
3 A 2010-01-01 2 18 359 0
4 A 2011-01-01 3 18 359 57
5 B 1978-01-01 3 0 0 48
6 B 1982-01-01 1 10 0 48
7 B 1985-01-01 1 27 0 48
8 C 2001-01-01 2 0 212 0
9 C 2011-01-01 2 0 267 0
10 C 2015-01-01 1 64 267 0
11 C 2019-01-01 3 64 267 18
英文:
Using Base R:
transform(dt[order(dt$ID,dt$date),],
cumdays = apply(Days * diag(max(Type))[Type,],
2, \(x)ave(x,ID, FUN = cumsum)))
ID date Type Days cumdays.V1 cumdays.V2 cumdays.V3
1: A 2008-01-01 1 18 18 0 0
2: A 2009-01-01 2 333 18 333 0
3: A 2010-01-01 2 26 18 359 0
4: A 2011-01-01 3 57 18 359 57
5: B 1978-01-01 3 48 0 0 48
6: B 1982-01-01 1 10 10 0 48
7: B 1985-01-01 1 17 27 0 48
8: C 2001-01-01 2 212 0 212 0
9: C 2011-01-01 2 55 0 267 0
10: C 2015-01-01 1 64 64 267 0
11: C 2019-01-01 3 18 64 267 18
Tidyverse:
dt %>%
arrange(ID, date) %>%
mutate(name = Type) %>%
pivot_wider(id_cols = c(ID, date, Type),
names_prefix = 'cumdays',
values_from = Days, values_fill = 0) %>%
mutate(across(starts_with('cumdays'), cumsum), .by = ID)
# A tibble: 11 × 6
ID date Type cumdays1 cumdays2 cumdays3
<chr> <chr> <int> <int> <int> <int>
1 A 2008-01-01 1 18 0 0
2 A 2009-01-01 2 18 333 0
3 A 2010-01-01 2 18 359 0
4 A 2011-01-01 3 18 359 57
5 B 1978-01-01 3 0 0 48
6 B 1982-01-01 1 10 0 48
7 B 1985-01-01 1 27 0 48
8 C 2001-01-01 2 0 212 0
9 C 2011-01-01 2 0 267 0
10 C 2015-01-01 1 64 267 0
11 C 2019-01-01 3 64 267 18
答案2
得分: 1
# 一次性计算累积和,然后重新排列
DT[, cumdays := cumsum(Days), by = .(ID, Type)]
DT <- dcast(DT, ID + date + Days ~ Type, value.var = "cumdays")
cols <- 4:6
# 使用nafill函数填充缺失值
DT[, (cols) := lapply(.SD, nafill, type = "locf"), ID, .SDcols = cols]
# 使用setnafill函数以常数填充缺失值
setnafill(DT, type = "const", fill = 0L, cols = cols)
# 重命名列名
setnames(DT, 4:6, paste0("cumdays", 1:3))
# 输出结果
DT
# ID date Days cumdays1 cumdays2 cumdays3
# 1: A 2008-01-01 18 18 0 0
# 2: A 2009-01-01 333 18 333 0
# 3: A 2010-01-01 26 18 359 0
# 4: A 2011-01-01 57 18 359 57
# 5: B 1978-01-01 48 0 0 48
# 6: B 1982-01-01 10 10 0 48
# 7: B 1985-01-01 17 27 0 48
# 8: C 2001-01-01 212 0 212 0
# 9: C 2011-01-01 55 0 267 0
# 10: C 2015-01-01 64 64 267 0
# 11: C 2019-01-01 18 64 267 18
英文:
# you can calculate the cumsums in once, then dcast them
DT[, cumdays := cumsum(Days), by = .(ID, Type)]
DT <- dcast(DT, ID + date + Days ~ Type, value.var = "cumdays")
cols <- 4:6
DT[, (cols) := lapply(.SD, nafill, type = "locf"), ID, .SDcols = cols]
setnafill(DT, type = "const", fill = 0L, cols = cols)
setnames(DT, 4:6, paste0("cumdays", 1:3))
DT
# ID date Days cumdays1 cumdays2 cumdays3
# 1: A 2008-01-01 18 18 0 0
# 2: A 2009-01-01 333 18 333 0
# 3: A 2010-01-01 26 18 359 0
# 4: A 2011-01-01 57 18 359 57
# 5: B 1978-01-01 48 0 0 48
# 6: B 1982-01-01 10 10 0 48
# 7: B 1985-01-01 17 27 0 48
# 8: C 2001-01-01 212 0 212 0
# 9: C 2011-01-01 55 0 267 0
# 10: C 2015-01-01 64 64 267 0
# 11: C 2019-01-01 18 64 267 18
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论