if语句按组ID执行。

huangapple go评论87阅读模式
英文:

if statement per group ID

问题

以下是翻译好的部分:

我想要计算每个客户ID的关系长度,但是关系长度应该对于每个使用该ID的购买是相同的。
这个长度要么是缺陷日期 - 第一次日期(如果缺陷为1),要么是2022年02月11日 - 第一次日期(如果缺陷为0)。
现在它为每个条目单独生成,而不是根据ID。

以下是当前数据集和期望的结果。

  1. ID first_donation_date date_of_donation defected defection_date relationship_length
  2. 1 2014-01-13 2014-01-13 0 NA 2951
  3. 1 2014-01-13 2014-04-14 0 NA 2951
  4. 1 2014-01-13 2014-08-13 0 NA 2951
  5. 1 2014-01-13 2014-09-12 0 NA 2951
  6. 1 2014-01-13 2014-11-12 0 NA 2951
  7. 1 2014-01-13 2015-02-13 0 NA 2951
  8. 1 2014-01-13 2017-02-14 1 2017-02-14 1128
  9. 1 2014-01-13 2018-12-13 1 2018-12-13 1795
  10. 2 2013-12-02 2013-12-02 0 NA 2993
  11. 2 2013-12-02 2014-05-02 0 NA 2993

期望的结果是,ID 1 的所有relationship_length都为1128(第一个defected=1)。对于ID 2,它是2993,因为该客户从未缺陷。

希望这解释了我的问题,期待回答。

英文:

I want to calculate the relationship length per customer ID, however the relationshiplength should be the same for every purchase using the ID.
This length is either defection date - first date (if defection is 1) OR 2202-02-11 - first date (if defection = 0).
Now it generates for every entry seperately, instead of the ID.

  1. ## Relationship length
  2. dataset$relationship_length <- if_else(dataset$defected == 1, as.numeric(dataset$defection_date - dataset$first_donation_date),
  3. as.numeric(as.Date("2022-02-11") - dataset$first_donation_date))

Below one can see the current dataset and the desired outcome.

  1. ID first_donation_date date_of_donation defected defection_date relationship_length
  2. 1 2014-01-13 2014-01-13 0 NA 2951
  3. 1 2014-01-13 2014-04-14 0 NA 2951
  4. 1 2014-01-13 2014-08-13 0 NA 2951
  5. 1 2014-01-13 2014-09-12 0 NA 2951
  6. 1 2014-01-13 2014-11-12 0 NA 2951
  7. 1 2014-01-13 2015-02-13 0 NA 2951
  8. 1 2014-01-13 2017-02-14 1 2017-02-14 1128
  9. 1 2014-01-13 2018-12-13 1 2018-12-13 1795
  10. 2 2013-12-02 2013-12-02 0 NA 2993
  11. 2 2013-12-02 2014-05-02 0 NA 2993

The desired outcome is that all relationship_length for ID 1 are 1128 (the first defected=1). And for ID 2 it is 2993, as the customer never defected.

I hope this explains my question and i look forward to the answers

答案1

得分: 0

以下是您要翻译的代码部分:

  1. library(dplyr)
  2. df %>%
  3. group_by(ID) %>%
  4. mutate(relationship_length = ifelse(any(defected == 1),
  5. (defection_date[defected == 1] - first_donation_date)[defected == 1][1],
  6. as.Date("2022-02-11") - first_donation_date))
  7. # or with newer dplyr versions using .by:
  8. df %>%
  9. mutate(relationship_length = ifelse(any(defected == 1),
  10. (defection_date[defected == 1] - first_donation_date)[defected == 1][1],
  11. as.Date("2022-02-11") - first_donation_date),
  12. .by = ID)

输出部分:

  1. ID first_donation_date date_of_donation defected defection_date relationship_length
  2. <int> <date> <date> <int> <date> <dbl>
  3. 1 1 2014-01-13 2014-01-13 0 NA 1128
  4. 2 1 2014-01-13 2014-04-14 0 NA 1128
  5. 3 1 2014-01-13 2014-08-13 0 NA 1128
  6. 4 1 2014-01-13 2014-09-12 0 NA 1128
  7. 5 1 2014-01-13 2014-11-12 0 NA 1128
  8. 6 1 2014-01-13 2015-02-13 0 NA 1128
  9. 7 1 2014-01-13 2017-02-14 1 2017-02-14 1128
  10. 8 1 2014-01-13 2018-12-13 1 2018-12-13 1128
  11. 9 2 2013-12-02 2013-12-02 0 NA 2993
  12. 10 2 2013-12-02 2014-05-02 0 NA 2993

数据部分:

  1. df <- read.table(text = "ID first_donation_date date_of_donation defected defection_date relationship_length
  2. 1 2014-01-13 2014-01-13 0 NA 2951
  3. 1 2014-01-13 2014-04-14 0 NA 2951
  4. 1 2014-01-13 2014-08-13 0 NA 2951
  5. 1 2014-01-13 2014-09-12 0 NA 2951
  6. 1 2014-01-13 2014-11-12 0 NA 2951
  7. 1 2014-01-13 2015-02-13 0 NA 2951
  8. 1 2014-01-13 2017-02-14 1 2017-02-14 1128
  9. 1 2014-01-13 2018-12-13 1 2018-12-13 1795
  10. 2 2013-12-02 2013-12-02 0 NA 2993
  11. 2 2013-12-02 2014-05-02 0 NA 2993", h = TRUE)
  12. df[c(2,3,5)] <- lapply(df[c(2,3,5)], as.Date)
  13. df <- df[-6]
英文:

There may be more elegant solutions, but here is one dplyr approach to apply duration to all rows in the group based on the first instance of defected == 1:

  1. library(dplyr)
  2. df %&gt;%
  3. group_by(ID) %&gt;%
  4. mutate(relationship_length = ifelse(any(defected == 1),
  5. (defection_date[defected == 1] - first_donation_date)[defected == 1][1],
  6. as.Date(&quot;2022-02-11&quot;) - first_donation_date))
  7. # or with newer `dplyr` versions using `.by`:
  8. df %&gt;%
  9. mutate(relationship_length = ifelse(any(defected == 1),
  10. (defection_date[defected == 1] - first_donation_date)[defected == 1][1],
  11. as.Date(&quot;2022-02-11&quot;) - first_donation_date),
  12. .by = ID)

Output:

  1. ID first_donation_date date_of_donation defected defection_date relationship_length
  2. &lt;int&gt; &lt;date&gt; &lt;date&gt; &lt;int&gt; &lt;date&gt; &lt;dbl&gt;
  3. 1 1 2014-01-13 2014-01-13 0 NA 1128
  4. 2 1 2014-01-13 2014-04-14 0 NA 1128
  5. 3 1 2014-01-13 2014-08-13 0 NA 1128
  6. 4 1 2014-01-13 2014-09-12 0 NA 1128
  7. 5 1 2014-01-13 2014-11-12 0 NA 1128
  8. 6 1 2014-01-13 2015-02-13 0 NA 1128
  9. 7 1 2014-01-13 2017-02-14 1 2017-02-14 1128
  10. 8 1 2014-01-13 2018-12-13 1 2018-12-13 1128
  11. 9 2 2013-12-02 2013-12-02 0 NA 2993
  12. 10 2 2013-12-02 2014-05-02 0 NA 2993

Data

  1. df &lt;- read.table(text = &quot;ID first_donation_date date_of_donation defected defection_date relationship_length
  2. 1 2014-01-13 2014-01-13 0 NA 2951
  3. 1 2014-01-13 2014-04-14 0 NA 2951
  4. 1 2014-01-13 2014-08-13 0 NA 2951
  5. 1 2014-01-13 2014-09-12 0 NA 2951
  6. 1 2014-01-13 2014-11-12 0 NA 2951
  7. 1 2014-01-13 2015-02-13 0 NA 2951
  8. 1 2014-01-13 2017-02-14 1 2017-02-14 1128
  9. 1 2014-01-13 2018-12-13 1 2018-12-13 1795
  10. 2 2013-12-02 2013-12-02 0 NA 2993
  11. 2 2013-12-02 2014-05-02 0 NA 2993&quot;, h = TRUE)
  12. df[c(2,3,5)] &lt;- lapply(df[c(2,3,5)], as.Date)
  13. df &lt;- df[-6]

huangapple
  • 本文由 发表于 2023年4月17日 22:29:05
  • 转载请务必保留本文链接:https://go.coder-hub.com/76036238.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定