Calculate the lagged differences between consecutive dates in vectors

Question

The sample dataset is given below:

    library(dplyr)

    v = data.frame(group = c(1,1,2,3,3), date = as.Date(c('01-01-2000','01-01-2001','01-05-2000','02-07-2000','01-01-2008'), "%d-%m-%Y"))
    v %>% group_by(group) %>% mutate(difference_day = ifelse(n() == 2,
      c(0, diff(date)),
      difftime(date, as.Date('31-12-2021', "%d-%m-%Y"), units = 'days')))

My desired result is:

    group difference_day
        1              0
        1            365
        2           7915
        3              0
        3           2740

In the code above, if a group contains only one row, difference_day should be
difftime(date, as.Date('31-12-2021', "%d-%m-%Y"), units = 'days'); if it contains two rows, it should be c(0, diff(date)).

However, the output of the code was:

    # A tibble: 5 × 3
    # Groups:   group [3]
      group date       difference_day
      <dbl> <date>              <dbl>
    1     1 2000-01-01              0
    2     1 2001-01-01              0
    3     2 2000-05-01          -7914
    4     3 2000-07-02              0
    5     3 2008-01-01              0

which was very strange. Please give me some suggestions. Thank you!

Answer 1

Score: 1

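Since you want to return either the first vector or the second vector as a whole, use if instead of ifelse. (That is, your condition n() == 2 is a single value per group, external to the vectors, not an element-by-element conditional, which is where ifelse or dplyr's if_else would be appropriate.) ifelse returns a result with the same length as its test, so with a length-one test it keeps only the first element of the chosen vector, and mutate then recycles that single value across the group. That is why groups 1 and 3 get 0 in both rows and group 2 gets -7914 (your "no" branch computes date minus 2021-12-31, hence the negative sign).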
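A minimal base R sketch of the difference, using the dates of group 1 from the question as a plain vector. The names dates, ref, with_ifelse and with_if are illustrative only, and length(dates) stands in for n(), which only works inside dplyr verbs.

```r
# Dates of group 1 from the question, and the cut-off date used in the question
dates <- as.Date(c("2000-01-01", "2001-01-01"))
ref <- as.Date("2021-12-31")

# ifelse() returns a result with the same length as its test. The test here is
# a single TRUE, so only the first element of c(0, 366) survives.
with_ifelse <- ifelse(length(dates) == 2, c(0, diff(dates)),
                      difftime(ref, dates, units = "days"))

# if/else evaluates the single condition once and returns the whole branch.
with_if <- if (length(dates) == 2) c(0, diff(dates)) else difftime(ref, dates, units = "days")

print(with_ifelse)
print(with_if)
```

with_ifelse is the single value 0, while with_if is the full vector 0, 366. Inside a grouped mutate(), the single value from ifelse is recycled to every row of the group, which reproduces the zeros in the question's output.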

    v %>%
      group_by(group) %>%
      # one condition per group, so if/else; prepending date[1] makes the first difference 0
      mutate(d = if (n() == 2L) diff(c(date[1], date)) else difftime(as.Date("2021-12-31"), date, units = "days")) %>%
      ungroup()
    # # A tibble: 5 × 3
    #   group date       d
    #   <dbl> <date>     <drtn>
    # 1     1 2000-01-01    0 days
    # 2     1 2001-01-01  366 days
    # 3     2 2000-05-01 7914 days
    # 4     3 2000-07-02    0 days
    # 5     3 2008-01-01 2739 days
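If dplyr is not available, the same per-group if/else can be sketched in base R. This is my own translation, not part of the original answer: the helper name day_gap is hypothetical, split()/unsplit() stand in for group_by(), and the result is returned as plain numbers rather than difftime.

```r
# Sample data from the question
v <- data.frame(
  group = c(1, 1, 2, 3, 3),
  date = as.Date(c("01-01-2000", "01-01-2001", "01-05-2000", "02-07-2000", "01-01-2008"),
                 "%d-%m-%Y")
)
ref <- as.Date("2021-12-31")

# Hypothetical helper: one scalar condition per group, so plain if/else
day_gap <- function(date) {
  if (length(date) == 2L) {
    # prepending date[1] makes the first lagged difference 0
    as.numeric(diff(c(date[1], date)))
  } else {
    as.numeric(difftime(ref, date, units = "days"))
  }
}

# split() groups the dates, lapply() applies the rule, unsplit() restores row order
v$d <- unsplit(lapply(split(v$date, v$group), day_gap), v$group)
print(v)
```

The d column comes out as 0, 366, 7914, 0, 2739, the same numbers as the dplyr output.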

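Some values differ by 1 from your expected output (366 instead of 365, 7914 instead of 7915, 2739 instead of 2740). I'm not sure whether that was a typo or whether you intended something other than a plain date difference; note that 2000 was a leap year, so 2000-01-01 to 2001-01-01 really is 366 days.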
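A quick base R check of the three non-zero gaps (dates hard-coded from the sample data) confirms these are the true calendar-day counts. It verifies the arithmetic only and says nothing about what the expected values were meant to measure.

```r
# Calendar-day gaps for the non-zero rows, using plain Date subtraction
gaps <- c(
  group1 = as.numeric(as.Date("2001-01-01") - as.Date("2000-01-01")),  # 2000 is a leap year
  group2 = as.numeric(as.Date("2021-12-31") - as.Date("2000-05-01")),
  group3 = as.numeric(as.Date("2008-01-01") - as.Date("2000-07-02"))
)
print(gaps)
```

This prints 366, 7914 and 2739, against the expected 365, 7915 and 2740.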

Here both diff and difftime return objects of class "difftime", which print with a " days" suffix; they are still numeric enough that arithmetic works on them. If you'd rather have plain numbers, wrap the result in as.integer(.) or as.numeric(.).
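A small sketch of what that means in practice, using the group 3 dates from the question (the variable name d is illustrative):

```r
# diff() on two Dates returns a "difftime" measured in days
d <- diff(as.Date(c("2000-07-02", "2008-01-01")))
print(class(d))
print(units(d))

# Arithmetic still works, and the result keeps the difftime class
print(d + 1)

# Convert to plain numbers when the "days" label is unwanted
print(as.numeric(d))
print(as.integer(d))
```

Inside the dplyr pipeline the same conversion would simply be applied to the new column, for example mutate(d = as.integer(d)) after the grouped step.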

huangapple
  • Published on 17 April 2023 at 16:08:42
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76032955.html