For each post of a user when the last sponsored post happened (days)

huangapple go评论65阅读模式
英文:

I have 2 kinds of posts (sponsored and non sponsored) for X users. I want to know for each post of a user when the last sponsored post happened (days)

问题

日期 用户 赞助 距上次赞助天数
08.07.2022 YYY 0 NA
24.07.2022 YYY 1 NA
08.08.2022 YYY 1 15
15.08.2022 YYY 0 7
20.08.2022 YYY 0 12
14.05.2022 ZZZ 1 NA
15.05.2022 ZZZ 0 1
20.06.2022 ZZZ 0 36
20.06.2022 ZZZ 0 36
22.06.2022 ZZZ 1 38
22.06.2022 ZZZ 0 0
英文:

I have 2 kinds of posts (sponsored and non sponsored) for many users. I know the date when each post happened. Now I need to know for each post of a user when the last sponsored post happened (in full days) see the column "Days since last sponsored". Thank you for your help! I want to solve it in R.

Date User Sponsored Days Since Last Sponsored
08.07.2022 YYY 0 NA
24.07.2022 YYY 1 NA
08.08.2022 YYY 1 15
15.08.2022 YYY 0 7
20.08.2022 YYY 0 12
14.05.2022 ZZZ 1 NA
15.05.2022 ZZZ 0 1
20.06.2022 ZZZ 0 36
20.06.2022 ZZZ 0 36
22.06.2022 ZZZ 1 38
22.06.2022 ZZZ 0 0

I have tried to group by user and use diff time. However the sponsored / non-sponsored gives me a headache.

答案1

得分: 0

以下是使用“tidyverse”的一个选项:

最后的计算有点问题 - 我检查了我的一个值,似乎是正确的,但如果有其他逻辑,请告诉我。

library(tidyverse)

df <- tibble::tribble(
  ~Date, ~User, ~Sponsored, ~"Days Since Last Sponsored",
  "08.07.2022", "YYY",         0L,                         NA,
  "24.07.2022", "YYY",         1L,                         NA,
  "08.08.2022", "YYY",         1L,                        15L,
  "15.08.2022", "YYY",         0L,                         7L,
  "20.08.2022", "YYY",         0L,                        12L,
  "14.05.2022", "ZZZ",         1L,                         NA,
  "15.05.2022", "ZZZ",         0L,                         1L,
  "20.06.2022", "ZZZ",         0L,                        36L,
  "20.06.2022", "ZZZ",         0L,                        36L,
  "22.06.2022", "ZZZ",         1L,                        38L,
  "22.06.2022", "ZZZ",         0L,                         0L
)

df %>% 
  mutate(Date = lubridate::dmy(Date),
         sponsored_date = ifelse(Sponsored == 1, as.character.Date(Date), NA_character_),
         sponsored_date = lag(sponsored_date)) %>% 
  group_by(User) %>% 
  fill(sponsored_date, .direction = "down") %>% 
  mutate(sponsored_date = lubridate::ymd(sponsored_date),
         since_last = as.numeric(Date - sponsored_date)) %>% 
  select(Date, User, Sponsored, "Days Since Last Sponsored", since_last)
英文:

Here's an option with tidyverse:

The calculation at the end is a bit off - I spot checked one of my values and it seems correct, but let me know if there's some other logic used.

library(tidyverse)

df <- tibble::tribble(
  ~Date, ~User, ~Sponsored, ~"Days Since Last Sponsored",
  "08.07.2022", "YYY",         0L,                         NA,
  "24.07.2022", "YYY",         1L,                         NA,
  "08.08.2022", "YYY",         1L,                        15L,
  "15.08.2022", "YYY",         0L,                         7L,
  "20.08.2022", "YYY",         0L,                        12L,
  "14.05.2022", "ZZZ",         1L,                         NA,
  "15.05.2022", "ZZZ",         0L,                         1L,
  "20.06.2022", "ZZZ",         0L,                        36L,
  "20.06.2022", "ZZZ",         0L,                        36L,
  "22.06.2022", "ZZZ",         1L,                        38L,
  "22.06.2022", "ZZZ",         0L,                         0L
)



df %>% 
  mutate(Date = lubridate::dmy(Date),
         sponsored_date = ifelse(Sponsored == 1, as.character.Date(Date), NA_character_),
         sponsored_date = lag(sponsored_date)) %>% 
  group_by(User) %>% 
  fill(sponsored_date, .direction = "down") %>% 
  mutate(sponsored_date = lubridate::ymd(sponsored_date),
         since_last = as.numeric(Date - sponsored_date)) %>% 
  select(Date, User, Sponsored, "Days Since Last Sponsored", since_last)
  
#> # A tibble: 11 × 5
#> # Groups:   User [2]
#>    Date       User  Sponsored `Days Since Last Sponsored` since_last
#>    <date>     <chr>     <int>                       <int>      <dbl>
#>  1 2022-07-08 YYY           0                          NA         NA
#>  2 2022-07-24 YYY           1                          NA         NA
#>  3 2022-08-08 YYY           1                          15         15
#>  4 2022-08-15 YYY           0                           7          7
#>  5 2022-08-20 YYY           0                          12         12
#>  6 2022-05-14 ZZZ           1                          NA         NA
#>  7 2022-05-15 ZZZ           0                           1          1
#>  8 2022-06-20 ZZZ           0                          36         37
#>  9 2022-06-20 ZZZ           0                          36         37
#> 10 2022-06-22 ZZZ           1                          38         39
#> 11 2022-06-22 ZZZ           0                           0          0

huangapple
  • 本文由 发表于 2023年3月21日 02:56:28
  • 转载请务必保留本文链接:https://go.coder-hub.com/75794246.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定