Time since event calculations in R
Question
I have the table below and want to count the number of days since each user's most recent acceptance. For example, user 1 accepted on several occasions, but only one of those acceptances is the most recent. I want to do two things:
1. Add a logical column, mostRecent, that is TRUE for the most recent acceptance and FALSE otherwise.
2. Calculate the number of days from that acceptance to today's date.
  user  status_name invitationDate
  <fct> <fct>       <date>
1 1     Accepted    2021-09-09
2 1     Declined    2021-09-10
3 1     Accepted    2021-09-30
4 4     Accepted    2021-11-10
5 1     Accepted    2021-11-22
6 4     Declined    2021-11-29
I have included the code to recreate the table below.
library(dplyr)
library(tibble)

df <- tribble(
  ~user, ~status_name, ~invitationDate,
  "1", "Declined", "2021-07-13",
  "4", "Declined", "2021-07-31",
  "1", "Accepted", "2021-09-09",
  "1", "Declined", "2021-09-10",
  "1", "Accepted", "2021-09-30",
  "4", "Accepted", "2021-11-10",
  "3", "Declined", "2021-11-12",
  "2", "Declined", "2021-11-18",
  "1", "Accepted", "2021-11-22",
  "4", "Declined", "2021-11-29"
) %>%
  mutate(
    user = as.factor(user),
    status_name = as.factor(status_name),
    invitationDate = as.Date(invitationDate, format = "%Y-%m-%d")
  ) %>%
  group_by(user) %>%
  # Keep each user's rows from their first acceptance onward
  mutate(cumsum = cumsum(status_name == "Accepted")) %>%
  filter(cumsum > 0) %>%
  select(-cumsum)
# Note: df is still grouped by user at this point
Answer 1
Score: 4
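We can build the logical index per user: subset invitationDate to the rows where status_name is "Accepted", take the max, and check which values of invitationDate match it. This relies on df still being grouped by user, as it is after the code in the question.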
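library(dplyr)

df %>%
  mutate(
    # TRUE where the date equals this user's latest accepted date
    mostRecent = invitationDate %in%
      max(invitationDate[status_name == "Accepted"]),
    # Days from that acceptance to today
    Diff = Sys.Date() - invitationDate[mostRecent]
  ) %>%
  ungroup()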
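
The answer's code matches on date alone, so a Declined row that shares the latest accepted date would also be flagged, and it depends on df already being grouped. Below is a minimal sketch of a more defensive variant. It assumes every user has at least one acceptance, as the question's filter guarantees. The function name days_since_last_accept, the today parameter, and the daysSince column are illustrative and not from the original post.

```r
library(dplyr)
library(tibble)

# Small stand-in for the question's data (rows on or after each user's
# first acceptance).
df <- tribble(
  ~user, ~status_name, ~invitationDate,
  "1", "Accepted", "2021-09-09",
  "1", "Declined", "2021-09-10",
  "1", "Accepted", "2021-09-30",
  "4", "Accepted", "2021-11-10",
  "1", "Accepted", "2021-11-22",
  "4", "Declined", "2021-11-29"
) %>%
  mutate(invitationDate = as.Date(invitationDate))

# `today` is a hypothetical parameter so results are reproducible;
# leave it at the default Sys.Date() for real use.
days_since_last_accept <- function(data, today = Sys.Date()) {
  data %>%
    group_by(user) %>%  # group explicitly rather than relying on prior state
    mutate(
      lastAccepted = max(invitationDate[status_name == "Accepted"]),
      # Require both the status and the date to match
      mostRecent = status_name == "Accepted" & invitationDate == lastAccepted,
      # Date subtraction yields a difftime in days; store a plain integer
      daysSince = as.integer(today - lastAccepted)
    ) %>%
    ungroup() %>%
    select(-lastAccepted)
}

result <- days_since_last_accept(df, today = as.Date("2021-12-01"))
print(result)
```

With the fixed reference date 2021-12-01, daysSince is 9 for user 1 and 21 for user 4, and mostRecent is TRUE only on the 2021-11-22 and 2021-11-10 rows.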