Calculate the lagged differences between consecutive dates in vectors
Question
The sample dataset and my code are as follows:

```r
library(dplyr)

v = data.frame(group = c(1,1,2,3,3), date = as.Date(c('01-01-2000','01-01-2001','01-05-2000','02-07-2000','01-01-2008'), "%d-%m-%Y"))

v %>% group_by(group) %>% mutate(difference_day = ifelse(n() == 2,
                                                       c(0, diff(date)),
                                                       difftime(date, as.Date('31-12-2021', "%d-%m-%Y"), units = 'days')))
```
My desired result is:
group | difference_day |
---|---|
1 | 0 |
1 | 365 |
2 | 7915 |
3 | 0 |
3 | 2740 |
In the code above, if a group has only one row, then `difference_day` should be `difftime(date, as.Date('31-12-2021', "%d-%m-%Y"), units = 'days')`.
However, the code produced the following output, which is very strange:

```
# A tibble: 5 × 3
# Groups: group [3]
group date difference_day
<dbl> <date> <dbl>
1 1 2000-01-01 0
2 1 2001-01-01 0
3 2 2000-05-01 -7914
4 3 2000-07-02 0
5 3 2008-01-01 0
```
Please give me some suggestions, thank you!
Answer 1
Score: 1
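Since you want to pick either the first vector or the second vector as a whole, use `if` instead of `ifelse` (or dplyr's `if_else`). Your condition `n() == 2` is a single value per group, not an element-by-element test, which is what `ifelse`/`if_else` are meant for. `ifelse()` returns a result with the same length as its test, so here it returns only the first element of the chosen vector (0 for groups 1 and 3), and `mutate()` then recycles that single value down the whole group. The negative number in group 2 comes from the argument order: `difftime(time1, time2)` computes `time1 - time2`, so the reference date should come first.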
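A minimal base R sketch of both behaviours behind the strange output, using two dates from the question rather than the grouped pipeline. The variable names (`res_ifelse`, `res_if`, `neg`, `pos`) are just for illustration.

```r
d <- as.Date(c("2000-01-01", "2001-01-01"))

# ifelse() returns a result with the same length as `test` (here 1),
# so only the first element of the chosen vector survives
res_ifelse <- ifelse(length(d) == 2, c(0, diff(d)), -1)
res_ifelse   # 0

# if/else returns the whole chosen vector
res_if <- if (length(d) == 2) c(0, diff(d)) else -1
res_if       # 0 366

# difftime(time1, time2) computes time1 - time2, so argument order sets the sign
ref <- as.Date("2021-12-31")
neg <- difftime(as.Date("2000-05-01"), ref, units = "days")  # -7914 days
pos <- difftime(ref, as.Date("2000-05-01"), units = "days")  #  7914 days
```

Inside `mutate()`, a length-1 result like `res_ifelse` is recycled to every row of the group, which is why each row showed 0.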
```r
library(dplyr)

v %>%
  group_by(group) %>%
  # Two-row groups: 0 for the first row, then the gap to the previous date.
  # Other groups: days from each date up to 2021-12-31.
  mutate(d = if (n() == 2L) diff(c(date[1], date)) else
               difftime(as.Date("2021-12-31"), date, units = "days")) %>%
  ungroup()
# # A tibble: 5 × 3
# group date d
# <dbl> <date> <drtn>
# 1 1 2000-01-01 0 days
# 2 1 2001-01-01 366 days
# 3 2 2000-05-01 7914 days
# 4 3 2000-07-02 0 days
# 5 3 2008-01-01 2739 days
```
Some values differ by 1 from your expected output; I'm not sure whether that was a typo or some intent beyond a traditional `diff`. Note that 2000 is a leap year, so 2000-01-01 to 2001-01-01 really is 366 days, not 365.
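To check which expected values are off by one, the elapsed day counts can be computed directly with base R `Date` subtraction. This sketch pairs the start and end dates from the example by hand (group 1's two dates, group 2's date with 2021-12-31, group 3's two dates); it is only a sanity check, not part of the pipeline above.

```r
# Elapsed days between each start and end date (end minus start)
start <- as.Date(c("2000-01-01", "2000-05-01", "2000-07-02"))
end   <- as.Date(c("2001-01-01", "2021-12-31", "2008-01-01"))

gap <- as.numeric(end - start, units = "days")
gap  # 366 7914 2739

# 2000 is a leap year: Feb 29 exists, so March 1 is two days after Feb 28
leap_gap <- as.numeric(as.Date("2000-03-01") - as.Date("2000-02-28"))
leap_gap  # 2
```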
Both `diff` and `difftime` return objects of class `"difftime"` here, which print with a trailing "days" (dplyr labels the column `<drtn>`). They are still numeric underneath, so arithmetic still works on them. If you prefer plain numbers, wrap the result with `as.integer(.)` or `as.numeric(.)`.
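A small base R sketch of working with a `difftime` value like the ones in the `d` column, using group 3's dates; `weeks`, `n` and `i` are illustrative names only.

```r
# A difftime in days, as produced by diff() on two Dates
d <- diff(as.Date(c("2000-07-02", "2008-01-01")))
class(d)   # "difftime"
units(d)   # "days"

# Arithmetic still works and keeps the difftime class
weeks <- d / 7

# Strip the class to get plain numbers
n <- as.numeric(d, units = "days")  # 2739 (double)
i <- as.integer(d)                  # 2739 (integer)
```

In the grouped pipeline, the same conversion could be applied with a final `mutate(d = as.integer(d))` after `ungroup()`.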