Calculate the lagged differences between consecutive dates in vectors

Question

The sample dataset is given below:

v = data.frame(group = c(1, 1, 2, 3, 3),
               date = as.Date(c('01-01-2000', '01-01-2001', '01-05-2000', '02-07-2000', '01-01-2008'), "%d-%m-%Y"))

v %>% group_by(group) %>% mutate(difference_day = ifelse(n() == 2,
                                                         c(0, diff(date)),
                                                         difftime(date, as.Date('31-12-2021', "%d-%m-%Y"), units = 'days')))

My desired result is:

group difference_day
1 0
1 365
2 7915
3 0
3 2740

In the above code, if a group contains only one row, then difference_day should be
difftime(date, as.Date('31-12-2021', "%d-%m-%Y"), units = 'days').

However, the output of the code was:

# A tibble: 5 × 3
# Groups:   group [3]
  group date       difference_day
  <dbl> <date>              <dbl>
1     1 2000-01-01              0
2     1 2001-01-01              0
3     2 2000-05-01          -7914
4     3 2000-07-02              0
5     3 2008-01-01              0

This is very strange.
Please give me some suggestions, thank you!

Answer 1

Score: 1

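Since you want to return either the first vector or the second vector as a whole, use if instead of ifelse. Your condition, n() == 2, is a single value per group rather than an element-by-element test, which is where ifelse (or dplyr's if_else) would be more appropriate. Because ifelse returns a result the same length as its test, it keeps only the first element of the chosen branch, and mutate then recycles that single value across the group; that is why both rows of each two-row group show 0.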
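
To see the length behaviour in isolation, here is a minimal base-R sketch. The names dates, n, vectorised, and whole are hypothetical stand-ins for one group's data and are not part of the original code; no dplyr is needed.

```r
# `dates` is a hypothetical stand-in for the date column of one two-row group.
dates <- as.Date(c("2000-01-01", "2001-01-01"))
n <- length(dates)

# Day gaps within the group, as plain numbers with a leading 0.
gaps <- c(0, as.numeric(diff(dates)))

# ifelse() shapes its result like the test. The test has length 1,
# so only the first element of the chosen branch is returned.
vectorised <- ifelse(n == 2, gaps, NA)

# if/else evaluates the scalar test once and returns the whole branch.
whole <- if (n == 2) gaps else NA

print(vectorised)  # a single value
print(whole)       # one value per row
```

Inside mutate, the length-1 result from ifelse is recycled to every row of the group, while the if/else result already has one value per row.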

library(dplyr)  # provides %>%, group_by(), mutate(), n()

v %>%
  group_by(group) %>%
  mutate(d = if (n() == 2L) diff(c(date[1], date)) else difftime(as.Date("2021-12-31"), date, units = "days")) %>%
  ungroup()
# # A tibble: 5 × 3
#   group date       d        
#   <dbl> <date>     <drtn>   
# 1     1 2000-01-01    0 days
# 2     1 2001-01-01  366 days
# 3     2 2000-05-01 7914 days
# 4     3 2000-07-02    0 days
# 5     3 2008-01-01 2739 days

There are some differences of +/- 1 from your expected output; I'm not sure if that was a typo or some intent other than a traditional diff.
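
As a quick check of the calendar arithmetic, the three gaps can be computed directly in base R. This is only a sketch; the names d1, d2, and d3 are hypothetical. Note that 2000 is a leap year, so the first gap spans 29 February 2000.

```r
# Plain date subtraction, converted to a number of days.
d1 <- as.numeric(as.Date("2001-01-01") - as.Date("2000-01-01"))  # crosses 29 Feb 2000
d2 <- as.numeric(as.Date("2021-12-31") - as.Date("2000-05-01"))
d3 <- as.numeric(as.Date("2008-01-01") - as.Date("2000-07-02"))

print(c(d1, d2, d3))  # 366 7914 2739
```

These match the values in the tibble above, so the 365, 7915, and 2740 in the expected output are each off by one from a plain date difference.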

The values returned by both diff and difftime here are of class "difftime", which naturally prints with a "days" suffix; they are still number-like enough that math and similar operations work on them. If you prefer plain numbers, just wrap the expression with as.integer(.) or as.numeric(.).
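
A short base-R sketch of that conversion follows. The name gap and the derived variables are hypothetical, chosen only for illustration.

```r
# A difftime value like the one produced for the single-row group.
gap <- difftime(as.Date("2021-12-31"), as.Date("2000-05-01"), units = "days")

gap_class <- class(gap)      # "difftime"
shifted   <- gap + 1         # arithmetic still works and keeps the class

# Drop the class to get plain numbers.
gap_num <- as.numeric(gap)   # days as a double
gap_int <- as.integer(gap)   # days as an integer

# as.numeric() on a difftime can also convert units on the way out.
gap_weeks <- as.numeric(gap, units = "weeks")

print(gap)
print(gap_num)
```

Converting inside mutate, for example by wrapping the whole if/else expression in as.numeric(.), would give a <dbl> column instead of <drtn>.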
