How to create a sequence of repeating numbers in a column based on a starting number?
Question
Consider the following:
library(tidyverse)
library(lubridate)  # provides as_date(); not attached by tidyverse before 2.0.0
treatment <- c(0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0)
desired   <- c(0, 0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4)
df_treatment <- tibble(treatment, desired)
df <- df_treatment %>%
  mutate(date = seq(as_date("2016-01-01"), as_date("2016-01-15"), by = "day"))
My goal is to get the desired column in the df tibble, and I would like to do this programmatically. I would also like to be flexible about how many times each number repeats; for instance, I may want each number to repeat 4 times instead of 3.
While this may seem like a strange question, I am trying to find the best way to get a "time past/to treatment" column in a larger data set. My idea right now is to create a sequence of numbers starting on the date the treatment starts. Each unique number in the desired column would be a bin, and the number of times it repeats would be the number of observations in that bin.
For some reason, when I try to create something like this, I can't get the numbers to start in the correct place:
df %>%
  mutate(desired_attempt = ifelse(date >= as_date("2016-01-05"), rep(1:4, each = 3), 0))
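A likely reason the attempt starts in the wrong place: ifelse() picks its yes value from the same position as the test, so at row 5 it takes the fifth element of the (recycled) rep(1:4, each = 3), which is 2, rather than the first. Below is a minimal base-R sketch of this, assuming the 15-row example above with treatment starting at row 5. The names started, since and fixed are made up for illustration.

```r
# TRUE from row 5 onward, mirroring date >= as_date("2016-01-05")
started <- seq_len(15) >= 5

# What the attempt does: values are matched by position, not by "rows since start"
attempt <- ifelse(started, rep(1:4, each = 3), 0)
print(attempt)

# One possible fix: count rows since the start, then integer-divide by the bin size
since <- cumsum(started)                          # 0 before the start, then 1, 2, 3, ...
fixed <- ifelse(started, (since - 1) %/% 3 + 1, 0)
print(fixed)
```

Changing the 3 in the integer division changes how many rows fall in each bin.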
Answer 1
Score: 1
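a <- cumsum(treatment)  # with a single treatment row: 0 before it, 1 from it onward
b <- sum(a)             # number of rows from the treatment row onward
replace(treatment, a > 0, rep(seq_len(b), each = 3, length.out = b))
[1] 0 0 0 0 1 1 1 2 2 2 3 3 3 4 4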
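Since the question asks for a flexible repeat count, the same idea can be wrapped in a small function. This is only a sketch, assuming treatment holds a single 1 marking the start; bin_since_treatment and its each argument are hypothetical names, not part of any package.

```r
# Hypothetical helper: bins of `each` rows, counted from the first treated row
bin_since_treatment <- function(treatment, each = 3) {
  started <- cumsum(treatment) > 0      # TRUE from the treatment row onward
  n_after <- sum(started)               # number of rows to fill
  out <- numeric(length(treatment))     # rows before the treatment stay 0
  out[started] <- rep(seq_len(n_after), each = each, length.out = n_after)
  out
}

treatment <- c(0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0)
print(bin_since_treatment(treatment, each = 3))
print(bin_since_treatment(treatment, each = 4))
```

Inside a pipeline this could be used as mutate(desired = bin_since_treatment(treatment, each = 4)).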
Answer 2
Score: 0
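Group by whether the treatment has started, then fill the started group with a repeating sequence. The cutoff used here is "2016-01-03", so the sequence begins on row 3; use "2016-01-05" to reproduce desired exactly.
df %>%
  group_by(treatment_started = date >= "2016-01-03") %>%
  mutate(desired_attempt = if (first(treatment_started)) rep(1:1e3, each = 3, length.out = n()) else 0)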
# A tibble: 15 × 5
# Groups: treatment_started [2]
treatment desired date treatment_started desired_attempt
<dbl> <dbl> <date> <lgl> <dbl>
1 0 0 2016-01-01 FALSE 0
2 0 0 2016-01-02 FALSE 0
3 0 0 2016-01-03 TRUE 1
4 0 0 2016-01-04 TRUE 1
5 1 1 2016-01-05 TRUE 1
6 0 1 2016-01-06 TRUE 2
7 0 1 2016-01-07 TRUE 2
8 0 2 2016-01-08 TRUE 2
9 0 2 2016-01-09 TRUE 3
10 0 2 2016-01-10 TRUE 3
11 0 3 2016-01-11 TRUE 3
12 0 3 2016-01-12 TRUE 4
13 0 3 2016-01-13 TRUE 4
14 0 4 2016-01-14 TRUE 4
15 0 4 2016-01-15 TRUE 5
Or with base R:
df$desired_attempt <- 0
df$desired_attempt[df$date >= "2016-01-03"] <- rep(1:1e3, each = 3, length.out = sum(df$date >= "2016-01-03"))
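For the larger "time past/to treatment" use case, the cutoff could also be derived from the treatment column instead of being typed in, and the bins could be computed from calendar days rather than row positions, which would still behave sensibly if some dates were missing. This is a rough base-R sketch under those assumptions. The names start_date, days_since, bin_width and bin are invented for illustration.

```r
df <- data.frame(
  treatment = c(0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0),
  date = seq(as.Date("2016-01-01"), as.Date("2016-01-15"), by = "day")
)

bin_width <- 3                                     # days per bin (assumed)
start_date <- min(df$date[df$treatment == 1])      # first treated date
days_since <- as.integer(difftime(df$date, start_date, units = "days"))

# 0 before treatment; afterwards bins 1, 2, 3, ... of bin_width days each
df$bin <- ifelse(days_since >= 0, days_since %/% bin_width + 1, 0)
print(df$bin)
```

With one row per day, as in the example, this gives the same result as counting rows.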