How to create a sequence of repeating numbers in a column based on a starting number?

Question

Consider the following:

library(tidyverse)
# as_date() comes from lubridate, which tidyverse 2.0.0 and later attach;
# on older versions, also run library(lubridate).

treatment <- c(0,0,0,0,1,0,0,0,0,0,0,0,0,0,0)
desired <- c(0,0,0,0,1,1,1,2,2,2,3,3,3,4,4)

df_treatment <- tibble(treatment, desired)

df <- df_treatment %>% 
  mutate(date = seq(as_date("2016-01-01"), as_date("2016-01-15"), by = "day"))

My goal is to produce the desired column in the df tibble programmatically. I would also like to control how many times each number repeats. For instance, I may want each number to repeat 4 times instead of 3.

While this may seem like a strange question, I am trying to find the best way to get a "time past/to treatment" column in a larger data set. My idea right now is to create a sequence of numbers starting on the date the treatment starts. Each unique number in the desired column would be a bin, and the number of times each unique number repeats would be the number of observations in that bin.

For some reason, when I try to create something like this, I can't get the numbers to start in the correct place:

df %>% 
  mutate(desired_attempt = ifelse(date >= as_date("2016-01-05"), rep(1:4, each = 3), 0))
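
A possible reason the attempt above starts in the wrong place: `ifelse()` recycles its `yes` argument to the full length of the test and then picks values by position, so the first treated row (row 5) receives the 5th element of the recycled vector rather than the 1st. The base R sketch below reproduces this without dplyr; the variable names `started`, `yes`, and `attempt` are illustrative only.

```r
# Rebuild the dates from the question with base R only.
date <- seq(as.Date("2016-01-01"), as.Date("2016-01-15"), by = "day")
started <- date >= as.Date("2016-01-05")

# `yes` has 12 elements; ifelse() recycles it to 15 and indexes by position.
yes <- rep(1:4, each = 3)
recycled <- rep(yes, length.out = length(date))
attempt <- ifelse(started, yes, 0)

print(recycled)  # the vector ifelse() actually draws from
print(attempt)   # rows 5 to 15 take recycled[5:15], not recycled[1:11]
```

In other words, the sequence is aligned to row 1 of the data, not to the first row where the condition is TRUE.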

Answer 1

Score: 1

# 0 before the treatment row, 1 from the treatment row onward
a <- cumsum(treatment)
# number of rows at or after treatment (treatment contains a single 1)
b <- sum(a)
replace(treatment, a > 0, rep(seq_len(b), each = 3, length.out = b))
[1] 0 0 0 0 1 1 1 2 2 2 3 3 3 4 4
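
To make the repeat count adjustable, as the question asks, the same idea can be wrapped in a function. This is a minimal sketch, not part of the original answer: the name `bin_since_treatment` and its `each` parameter are hypothetical, and it assumes the bins should start at the first nonzero entry of `treatment`.

```r
# Hypothetical helper: number the rows from the first treated row onward
# in runs of `each`, leaving earlier rows at 0.
bin_since_treatment <- function(treatment, each = 3) {
  started <- cumsum(treatment) > 0    # TRUE from the first treated row onward
  n_after <- sum(started)             # rows at or after treatment
  out <- numeric(length(treatment))   # rows before treatment stay 0
  out[started] <- rep(seq_len(n_after), each = each, length.out = n_after)
  out
}

treatment <- c(0,0,0,0,1,0,0,0,0,0,0,0,0,0,0)
bins3 <- bin_since_treatment(treatment)            # runs of 3 rows
bins4 <- bin_since_treatment(treatment, each = 4)  # runs of 4 rows
print(bins3)
print(bins4)
```

Using `cumsum(treatment) > 0` rather than `sum(cumsum(treatment))` keeps the row count correct even if `treatment` contains more than one 1.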

Answer 2

Score: 0

# The cutoff below is the one used in this answer; the question's treatment
# starts on 2016-01-05, so use that date to reproduce `desired` exactly.
df %>% 
  group_by(treatment_started = date >= "2016-01-03") %>% 
  mutate(desired_attempt = if (first(treatment_started)) rep(1:1e3, each = 3, length.out = n()) else 0)
# A tibble: 15 × 5
# Groups:   treatment_started [2]
   treatment desired date       treatment_started desired_attempt
       <dbl>   <dbl> <date>     <lgl>                       <dbl>
 1         0       0 2016-01-01 FALSE                           0
 2         0       0 2016-01-02 FALSE                           0
 3         0       0 2016-01-03 TRUE                            1
 4         0       0 2016-01-04 TRUE                            1
 5         1       1 2016-01-05 TRUE                            1
 6         0       1 2016-01-06 TRUE                            2
 7         0       1 2016-01-07 TRUE                            2
 8         0       2 2016-01-08 TRUE                            2
 9         0       2 2016-01-09 TRUE                            3
10         0       2 2016-01-10 TRUE                            3
11         0       3 2016-01-11 TRUE                            3
12         0       3 2016-01-12 TRUE                            4
13         0       3 2016-01-13 TRUE                            4
14         0       4 2016-01-14 TRUE                            4
15         0       4 2016-01-15 TRUE                            5

Or with base R:

df$desired_attempt <- 0
df$desired_attempt[df$date >= "2016-01-03"] <- rep(1:1e3, each = 3, length.out = sum(df$date >= "2016-01-03"))
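
As a variation, the cutoff can be derived from the `treatment` column instead of being hard-coded, and the bin can be computed with integer division on the number of days since treatment. This is only a sketch under two assumptions that are not in the original answer: there is one row per day, and `each` (a name introduced here) is the bin width in days.

```r
# Data from the question, rebuilt with base R only.
treatment <- c(0,0,0,0,1,0,0,0,0,0,0,0,0,0,0)
date <- seq(as.Date("2016-01-01"), as.Date("2016-01-15"), by = "day")
each <- 3  # assumed bin width in days

# First treated date, taken from the data rather than typed in.
start_date <- min(date[treatment == 1])

# Whole days elapsed since treatment; negative before it starts.
days_since <- as.numeric(date - start_date, units = "days")

# Days 0-2 fall in bin 1, days 3-5 in bin 2, and so on; earlier rows get 0.
desired_attempt <- ifelse(days_since >= 0, days_since %/% each + 1, 0)
print(desired_attempt)
```

Because this bins by calendar days rather than by row count, it matches the `rep()` approach only when the dates are consecutive; with gaps in the dates the two would differ.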

Posted by huangapple on April 4, 2023, 15:11:40
When reposting, please keep the link to this article: https://go.coder-hub.com/75926456.html