英文:
Create a variable/column using count of events in a 13-week period from start date
问题
数据框包括案例号(chr)、入职日期[Y-m-d](POSIXct)和每个登录事件的提出日期[Y-m-d h:m:s](POSIXct)。我正在寻找一种方法来计算入职日期和入职后13周(或91天)之间发生的登录事件(提出日期)的数量。
log_df <- data.frame(
case_number = c("000920", "000920", "000920", "000920", "000921", "000921"),
onboarding_date = as.Date(c("2020-10-27", "2020-10-27", "2020-10-27", "2020-10-27", "2020-11-08", "2020-11-08")),
raised_at = as.POSIXct(c("2020-10-27 02:52:31", "2020-10-27 05:40:51", "2020-10-27 05:46:25", "2020-10-28 09:05:21", "2020-11-08 23:39:15", "2020-11-11 00:19:16"))
)
我认为我需要创建一个间隔对象,但我对此还不太熟悉:
int <- interval(ymd("2001-01-01"), ymd("2002-01-01"))
interval = int_shift(int, duration(days = 91))
接下来我不太确定该怎么做。
现在我是否可以创建一个表,只包括使用onboarding_date + interval排除在我感兴趣的91天窗口之外的案例的raised_at事件?然后再按case_number分组计数raised_at?
在接下来的部分,我陷入了困境,自己感到困惑:
log_df %>%
group_by(case_number) %>%
mutate(interval = onboarding_date - min(onboarding_date) < 91,
count_events = sum(raised_at) >= 1,
event_within_interval = if_else(interval & count_events,
1, 0)) %>%
ungroup()
它不起作用,我似乎无法使它工作。
mutate()
中的错误:
! 在计算
count_events = sum(raised_at) >= 1
时出现问题。i 错误发生在组1中:case_number = "ACU0000920"。由于Summary.POSIXct()
中的错误而引起:! 'sum' 对于 "POSIXt" 对象未定义
我理解它无法计算POSIXct日期,但我不知道如何修复这个错误,使我的代码工作。我确定我漏掉了一些明显的东西。如果您能帮助我看出我漏掉了什么,我非常感谢您给一个刚刚入门的人的明智建议。
英文:
The dataframe consists of case_number (chr), onboarding_date Y-m-d, and raised_at [Y-m-d h:m:s] (POSIXct) for each login event. I'm searching for a way to count how many login events (raised_at) occured between the onboarding date and 13-weeks (or 91 days) after onboarding.
log_df <- data.frame(
case_number = c("000920", "000920", "000920", "000920", "000921", "000921"),
onboarding_date = as.Date(c("2020-10-27", "2020-10-27", "2020-10-27", "2020-10-27", "2020-11-08", "2020-11-08")),
raised_at = as.POSIXct(c("2020-10-27 02:52:31", "2020-10-27 05:40:51", "2020-10-27 05:46:25", "2020-10-28 09:05:21", "2020-11-08 23:39:15", "2020-11-11 00:19:16"))
)
I believe I need to create an interval object but I'm pretty new at this:
int <- interval(ymd("2001-01-01"), ymd("2002-01-01"))
interval = int_shift(int, duration(days = 91))
I'm not sure what to do next.
Can I now create a table that will only include raised_at events using onboarding_date+interval to exclude cases outside of my 91 day window of interest?
And then follow that up with a count of raised_at with group_by(case_number)?
I am getting stuck and confusing myself at the next part:
log_df %>%
group_by(case_number) %>%
mutate(interval = onboarding_date - min(onboarding_date) < 91,
count_events = sum(raised_at) >= 1,
event_within_interval = if_else(interval & count_events,
1, 0)) %>%
ungroup()
It doesn't work and I can't seem to make it work.
Error in mutate()
:
> ! Problem while computing count_events = sum(raised_at) >= 1
. i The
> error occurred in group 1: case_number = "ACU0000920". Caused by error
> in Summary.POSIXct()
: ! 'sum' not defined for "POSIXt" objects
I understand that it cannot count the POSIXct dates but I don't know how to fix this error in my code to make it work. I'm sure I'm missing something obvious. If you are able to help me see what I am missing, I really appreciate your sage advice for someone just starting out.
答案1
得分: 1
您可以这样测量日期间隔。根据您的数据,目前无需分组,因为onboarding_date
在case_number
上似乎是恒定的。不分组速度更快。
log_df <- log_df |
mutate(
interval_days = as.Date(raised_at) - onboarding_date,
in_window = interval_days > 91
)
log_df
# case_number onboarding_date raised_at interval_days in_window
# 1 1 000920 2020-10-27 2020-10-27 02:52:31 0 days FALSE
# 2 2 000920 2020-10-27 2020-10-27 05:40:51 0 days FALSE
# 3 3 000920 2020-10-27 2020-10-27 05:46:25 0 days FALSE
# 4 4 000920 2020-10-27 2020-10-28 09:05:21 1 days FALSE
# 5 5 000921 2020-11-08 2020-11-08 23:39:15 0 days FALSE
# 6 6 000921 2020-11-08 2020-11-11 00:19:16 3 days FALSE
然后,要计算每个组中有多少个条目,只需:
log_df |
filter(in_window) |
group_by(case_number) |
summarise(n = n())
您的示例数据中没有间隔超过91天的情况,但在您的真实数据中,这应该提供您所需的结果。我使用了dplyr
语法,因为您在这里使用了它。然而,在基本R中做类似的事情,您可以使用以下方式:
# 这将产生与上面相同的数据框
log_df <- within(log_df, {
interval_days <- as.Date(raised_at) - as.Date(onboarding_date)
in_window <- interval_days > 91
})
# 这将为每个`case_number`生成一个带名称的计数数组,表示窗口外的计数
with(log_df, tapply(in_window, case_number, sum))
# 或者如果您想要一个数据框
aggregate(in_window ~ case_number, data = log_df, sum)
英文:
You can measure the interval in days like this. There's no need to group at this stage, by the look of your data, as the onboarding_date
appears to be constant by case_number
. It's quicker not to group.
log_df <- log_df |>
mutate(
interval_days = as.Date(raised_at) - onboarding_date,
in_window = interval_days > 91
)
log_df
# case_number onboarding_date raised_at interval_days in_window
# 1 1 000920 2020-10-27 2020-10-27 02:52:31 0 days FALSE
# 2 2 000920 2020-10-27 2020-10-27 05:40:51 0 days FALSE
# 3 3 000920 2020-10-27 2020-10-27 05:46:25 0 days FALSE
# 4 4 000920 2020-10-27 2020-10-28 09:05:21 1 days FALSE
# 5 5 000921 2020-11-08 2020-11-08 23:39:15 0 days FALSE
# 6 6 000921 2020-11-08 2020-11-11 00:19:16 3 days FALSE
Then to count how many are in each group simply:
log_df |>
filter(in_window) |>
group_by(case_number) |>
summarise(n = n())
Your sample data doesn't have any cases where the interval is more than 91 days, but in your real data this should provide what you are asking for. I've used dplyr
syntax as you're using it here. However, to do something similar in base R, you can do:
# This produces the same data.frame as above
log_df <- within(log_df, {
interval_days <- as.Date(raised_at) - as.Date(onboarding_date)
in_window <- interval_days > 91
})
# This produces a named array of counts outside the window for each `case_number`
with(log_df, tapply(in_window, case_number, sum))
# Alternatively if you want a data.frame
aggregate(in_window ~ case_number, data = log_df, sum)
答案2
得分: 0
这应该可以工作!
log_df %>%
mutate(events = map_dbl(onboarding_date, ~ sum(between(log_df$raised_at, .x, .x + days(91)))))
英文:
This should work!
log_df %>%
mutate(events = map_dbl(onboarding_date, ~ sum(between(log_df$raised_at, .x, .x + days(91)))))
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论