英文:
I have 2 kinds of posts (sponsored and non sponsored) for X users. I want to know for each post of a user when the last sponsored post happened (days)
问题
日期 | 用户 | 赞助 | 距上次赞助天数 |
---|---|---|---|
08.07.2022 | YYY | 0 | NA |
24.07.2022 | YYY | 1 | NA |
08.08.2022 | YYY | 1 | 15 |
15.08.2022 | YYY | 0 | 7 |
20.08.2022 | YYY | 0 | 12 |
14.05.2022 | ZZZ | 1 | NA |
15.05.2022 | ZZZ | 0 | 1 |
20.06.2022 | ZZZ | 0 | 36 |
20.06.2022 | ZZZ | 0 | 36 |
22.06.2022 | ZZZ | 1 | 38 |
22.06.2022 | ZZZ | 0 | 0 |
英文:
I have 2 kinds of posts (sponsored and non sponsored) for many users. I know the date when each post happened. Now I need to know for each post of a user when the last sponsored post happened (in full days) see the column "Days since last sponsored". Thank you for your help! I want to solve it in R.
Date | User | Sponsored | Days Since Last Sponsored |
---|---|---|---|
08.07.2022 | YYY | 0 | NA |
24.07.2022 | YYY | 1 | NA |
08.08.2022 | YYY | 1 | 15 |
15.08.2022 | YYY | 0 | 7 |
20.08.2022 | YYY | 0 | 12 |
14.05.2022 | ZZZ | 1 | NA |
15.05.2022 | ZZZ | 0 | 1 |
20.06.2022 | ZZZ | 0 | 36 |
20.06.2022 | ZZZ | 0 | 36 |
22.06.2022 | ZZZ | 1 | 38 |
22.06.2022 | ZZZ | 0 | 0 |
I have tried to group by user and use diff time. However the sponsored / non-sponsored gives me a headache.
答案1
得分: 0
以下是使用“tidyverse”的一个选项:
最后的计算有点问题 - 我检查了我的一个值,似乎是正确的,但如果有其他逻辑,请告诉我。
library(tidyverse)
df <- tibble::tribble(
~Date, ~User, ~Sponsored, ~"Days Since Last Sponsored",
"08.07.2022", "YYY", 0L, NA,
"24.07.2022", "YYY", 1L, NA,
"08.08.2022", "YYY", 1L, 15L,
"15.08.2022", "YYY", 0L, 7L,
"20.08.2022", "YYY", 0L, 12L,
"14.05.2022", "ZZZ", 1L, NA,
"15.05.2022", "ZZZ", 0L, 1L,
"20.06.2022", "ZZZ", 0L, 36L,
"20.06.2022", "ZZZ", 0L, 36L,
"22.06.2022", "ZZZ", 1L, 38L,
"22.06.2022", "ZZZ", 0L, 0L
)
df %>%
mutate(Date = lubridate::dmy(Date),
sponsored_date = ifelse(Sponsored == 1, as.character.Date(Date), NA_character_),
sponsored_date = lag(sponsored_date)) %>%
group_by(User) %>%
fill(sponsored_date, .direction = "down") %>%
mutate(sponsored_date = lubridate::ymd(sponsored_date),
since_last = as.numeric(Date - sponsored_date)) %>%
select(Date, User, Sponsored, "Days Since Last Sponsored", since_last)
英文:
Here's an option with tidyverse
:
The calculation at the end is a bit off - I spot checked one of my values and it seems correct, but let me know if there's some other logic used.
library(tidyverse)
df <- tibble::tribble(
~Date, ~User, ~Sponsored, ~"Days Since Last Sponsored",
"08.07.2022", "YYY", 0L, NA,
"24.07.2022", "YYY", 1L, NA,
"08.08.2022", "YYY", 1L, 15L,
"15.08.2022", "YYY", 0L, 7L,
"20.08.2022", "YYY", 0L, 12L,
"14.05.2022", "ZZZ", 1L, NA,
"15.05.2022", "ZZZ", 0L, 1L,
"20.06.2022", "ZZZ", 0L, 36L,
"20.06.2022", "ZZZ", 0L, 36L,
"22.06.2022", "ZZZ", 1L, 38L,
"22.06.2022", "ZZZ", 0L, 0L
)
df %>%
mutate(Date = lubridate::dmy(Date),
sponsored_date = ifelse(Sponsored == 1, as.character.Date(Date), NA_character_),
sponsored_date = lag(sponsored_date)) %>%
group_by(User) %>%
fill(sponsored_date, .direction = "down") %>%
mutate(sponsored_date = lubridate::ymd(sponsored_date),
since_last = as.numeric(Date - sponsored_date)) %>%
select(Date, User, Sponsored, "Days Since Last Sponsored", since_last)
#> # A tibble: 11 × 5
#> # Groups: User [2]
#> Date User Sponsored `Days Since Last Sponsored` since_last
#> <date> <chr> <int> <int> <dbl>
#> 1 2022-07-08 YYY 0 NA NA
#> 2 2022-07-24 YYY 1 NA NA
#> 3 2022-08-08 YYY 1 15 15
#> 4 2022-08-15 YYY 0 7 7
#> 5 2022-08-20 YYY 0 12 12
#> 6 2022-05-14 ZZZ 1 NA NA
#> 7 2022-05-15 ZZZ 0 1 1
#> 8 2022-06-20 ZZZ 0 36 37
#> 9 2022-06-20 ZZZ 0 36 37
#> 10 2022-06-22 ZZZ 1 38 39
#> 11 2022-06-22 ZZZ 0 0 0
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论