2023年4月17日 22:29:05go评论87阅读模式

英文:

if statement per group ID

问题

以下是翻译好的部分：

我想要计算每个客户ID的关系长度，但是关系长度应该对于每个使用该ID的购买是相同的。
这个长度要么是缺陷日期 - 第一次日期（如果缺陷为1），要么是2022年02月11日 - 第一次日期（如果缺陷为0）。
现在它为每个条目单独生成，而不是根据ID。

以下是当前数据集和期望的结果。

ID first_donation_date date_of_donation defected defection_date relationship_length
1  2014-01-13          2014-01-13       0        NA             2951
1  2014-01-13          2014-04-14       0        NA             2951
1  2014-01-13          2014-08-13       0        NA             2951
1  2014-01-13          2014-09-12       0        NA             2951
1  2014-01-13          2014-11-12       0        NA             2951
1  2014-01-13          2015-02-13       0        NA             2951
1  2014-01-13          2017-02-14       1        2017-02-14     1128
1  2014-01-13          2018-12-13       1        2018-12-13     1795
2  2013-12-02          2013-12-02       0        NA             2993
2  2013-12-02          2014-05-02       0        NA             2993

期望的结果是，ID 1 的所有relationship_length都为1128（第一个defected=1）。对于ID 2，它是2993，因为该客户从未缺陷。

希望这解释了我的问题，期待回答。

英文:

I want to calculate the relationship length per customer ID, however the relationshiplength should be the same for every purchase using the ID.
This length is either defection date - first date (if defection is 1) OR 2202-02-11 - first date (if defection = 0).
Now it generates for every entry seperately, instead of the ID.

## Relationship length
dataset$relationship_length &lt;- if_else(dataset$defected == 1, as.numeric(dataset$defection_date - dataset$first_donation_date), 
                                       as.numeric(as.Date(&quot;2022-02-11&quot;) - dataset$first_donation_date))

Below one can see the current dataset and the desired outcome.

ID first_donation_date date_of_donation defected defection_date relationship_length
1  2014-01-13          2014-01-13       0        NA             2951
1  2014-01-13          2014-04-14       0        NA             2951
1  2014-01-13          2014-08-13       0        NA             2951
1  2014-01-13          2014-09-12       0        NA             2951
1  2014-01-13          2014-11-12       0        NA             2951
1  2014-01-13          2015-02-13       0        NA             2951
1  2014-01-13          2017-02-14       1        2017-02-14     1128
1  2014-01-13          2018-12-13       1        2018-12-13     1795
2  2013-12-02          2013-12-02       0        NA             2993
2  2013-12-02          2014-05-02       0        NA             2993

The desired outcome is that all relationship_length for ID 1 are 1128 (the first defected=1). And for ID 2 it is 2993, as the customer never defected.

I hope this explains my question and i look forward to the answers

答案1

得分: 0

以下是您要翻译的代码部分：

library(dplyr)
df %>%
  group_by(ID) %>%
  mutate(relationship_length = ifelse(any(defected == 1), 
                               (defection_date[defected == 1] - first_donation_date)[defected == 1][1], 
                               as.Date("2022-02-11") - first_donation_date))
# or with newer dplyr versions using .by:
df %>%
  mutate(relationship_length = ifelse(any(defected == 1), 
                               (defection_date[defected == 1] - first_donation_date)[defected == 1][1], 
                               as.Date("2022-02-11") - first_donation_date),
         .by = ID)

输出部分：

      ID first_donation_date date_of_donation defected defection_date relationship_length
   <int> <date>              <date>              <int> <date>                       <dbl>
 1     1 2014-01-13          2014-01-13              0 NA                            1128
 2     1 2014-01-13          2014-04-14              0 NA                            1128
 3     1 2014-01-13          2014-08-13              0 NA                            1128
 4     1 2014-01-13          2014-09-12              0 NA                            1128
 5     1 2014-01-13          2014-11-12              0 NA                            1128
 6     1 2014-01-13          2015-02-13              0 NA                            1128
 7     1 2014-01-13          2017-02-14              1 2017-02-14                    1128
 8     1 2014-01-13          2018-12-13              1 2018-12-13                    1128
 9     2 2013-12-02          2013-12-02              0 NA                            2993
10     2 2013-12-02          2014-05-02              0 NA                            2993

数据部分：

df <- read.table(text = "ID first_donation_date date_of_donation defected defection_date relationship_length
1  2014-01-13          2014-01-13       0        NA             2951
1  2014-01-13          2014-04-14       0        NA             2951
1  2014-01-13          2014-08-13       0        NA             2951
1  2014-01-13          2014-09-12       0        NA             2951
1  2014-01-13          2014-11-12       0        NA             2951
1  2014-01-13          2015-02-13       0        NA             2951
1  2014-01-13          2017-02-14       1        2017-02-14     1128
1  2014-01-13          2018-12-13       1        2018-12-13     1795
2  2013-12-02          2013-12-02       0        NA             2993
2  2013-12-02          2014-05-02       0        NA             2993", h = TRUE)
df[c(2,3,5)] <- lapply(df[c(2,3,5)], as.Date)
df <- df[-6]

英文:

There may be more elegant solutions, but here is one dplyr approach to apply duration to all rows in the group based on the first instance of defected == 1:

library(dplyr)
df %&gt;%
  group_by(ID) %&gt;%
  mutate(relationship_length = ifelse(any(defected == 1), 
                               (defection_date[defected == 1] - first_donation_date)[defected == 1][1], 
                               as.Date(&quot;2022-02-11&quot;) - first_donation_date))
# or with newer `dplyr` versions using `.by`:
df %&gt;%
  mutate(relationship_length = ifelse(any(defected == 1), 
                               (defection_date[defected == 1] - first_donation_date)[defected == 1][1], 
                               as.Date(&quot;2022-02-11&quot;) - first_donation_date),
         .by = ID)

Output:

      ID first_donation_date date_of_donation defected defection_date relationship_length
   &lt;int&gt; &lt;date&gt;              &lt;date&gt;              &lt;int&gt; &lt;date&gt;                       &lt;dbl&gt;
 1     1 2014-01-13          2014-01-13              0 NA                            1128
 2     1 2014-01-13          2014-04-14              0 NA                            1128
 3     1 2014-01-13          2014-08-13              0 NA                            1128
 4     1 2014-01-13          2014-09-12              0 NA                            1128
 5     1 2014-01-13          2014-11-12              0 NA                            1128
 6     1 2014-01-13          2015-02-13              0 NA                            1128
 7     1 2014-01-13          2017-02-14              1 2017-02-14                    1128
 8     1 2014-01-13          2018-12-13              1 2018-12-13                    1128
 9     2 2013-12-02          2013-12-02              0 NA                            2993
10     2 2013-12-02          2014-05-02              0 NA                            2993

Data

df &lt;- read.table(text = &quot;ID first_donation_date date_of_donation defected defection_date relationship_length
1  2014-01-13          2014-01-13       0        NA             2951
1  2014-01-13          2014-04-14       0        NA             2951
1  2014-01-13          2014-08-13       0        NA             2951
1  2014-01-13          2014-09-12       0        NA             2951
1  2014-01-13          2014-11-12       0        NA             2951
1  2014-01-13          2015-02-13       0        NA             2951
1  2014-01-13          2017-02-14       1        2017-02-14     1128
1  2014-01-13          2018-12-13       1        2018-12-13     1795
2  2013-12-02          2013-12-02       0        NA             2993
2  2013-12-02          2014-05-02       0        NA             2993&quot;, h = TRUE)
df[c(2,3,5)] &lt;- lapply(df[c(2,3,5)], as.Date)
df &lt;- df[-6]

通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库，让每个人都能够通过互相帮助和分享经验来进步。

if语句按组ID执行。

问题

答案1

尝试对已经被标记化的数据框对象进行子集和/或筛选（基本上是分离）。

如何同时改变多个因素的水平

在另一个数据框中匹配行和列中的数值。

R ggplot2: 用 aes() 替代 aes_string()，并在 varname 可能为 NULL 时使用 .data[[varname]]

如何在Playwright视觉比较中屏蔽多个定位器？

在C++中，可以使用可变模板参数来检索类型的内部类型。

selenium.common.exceptions.StaleElementReferenceException: Message: stale element reference: stale element not found

Creating and opening a URL to log in to Website via Basic Auth with Robot Framework/Selenium (Python)

AG Grid 在上下文菜单中以大文本形式打开

What's the correct way to type hint an empty list as a literal in python?

如何在Highcharts Gantt中更改本地化的星期名称

如何在同一个流中使用多个过滤器和映射函数？

如何使用Map/Set来将代码优化到O(n)？

.NET MAUI Android在GitHub Actions上构建失败，错误代码为1。