How to create a sequence of repeating numbers in a column based on a starting number?

Question

Consider the following:

    library(tidyverse)
    library(lubridate)  # provides as_date(); attached by library(tidyverse) only from tidyverse 2.0.0

    treatment <- c(0,0,0,0,1,0,0,0,0,0,0,0,0,0,0)
    desired <- c(0,0,0,0,1,1,1,2,2,2,3,3,3,4,4)
    df_treatment <- tibble(treatment, desired)
    df <- df_treatment %>%
      mutate(date = seq(as_date("2016-01-01"), as_date("2016-01-15"), by = "day"))

My goal is to generate the desired column in the df tibble programmatically. I would also like to control how many times each number repeats. For instance, I may want each number to repeat 4 times instead of 3.

While this may seem like a strange question, I am trying to find the best way to get a "time past/to treatment" column in a larger data set. My idea right now is to create a sequence of numbers starting on the date the treatment starts. Each unique number in the desired column would be a bin, and the number of times each unique number repeats would be the number of observations in that bin.

For some reason, when I try to create something like this, I can't seem to get the numbers to start in the correct place:

    df %>%
      mutate(desired_attempt = ifelse(date >= as_date("2016-01-05"), rep(1:4, each = 3), 0))

Answer 1

Score: 1

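    a <- cumsum(treatment)  # 0 before the treatment row, 1 from it onward
    b <- sum(a)             # number of rows at or after the treatment row
    # Overwrite the rows at or after the start with 1,1,1,2,2,2,... cut to b values
    replace(treatment, a > 0, rep(seq_len(b), each = 3, length.out = b))
    #> [1] 0 0 0 0 1 1 1 2 2 2 3 3 3 4 4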
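
The question also asks for control over how many times each number repeats. One way to get that is to wrap the same idea in a small function. This is only a sketch: `bin_since_start` and its `each` argument are hypothetical names, not part of the original answer, and it assumes `treatment` contains a single 1 marking the start.

```r
# Hypothetical helper (not from the original answer): bins of `each`
# observations, numbered from the first row where treatment == 1.
bin_since_start <- function(treatment, each = 3) {
  started <- cumsum(treatment) > 0     # TRUE from the first treated row onward
  n_after <- sum(started)              # rows at or after the start
  out <- numeric(length(treatment))    # rows before the start stay 0
  out[started] <- rep(seq_len(n_after), each = each, length.out = n_after)
  out
}

treatment <- c(0,0,0,0,1,0,0,0,0,0,0,0,0,0,0)
bin_since_start(treatment, each = 3)
#> [1] 0 0 0 0 1 1 1 2 2 2 3 3 3 4 4
bin_since_start(treatment, each = 4)
#> [1] 0 0 0 0 1 1 1 1 2 2 2 2 3 3 3
```

Inside a pipeline this could be called as `mutate(desired_attempt = bin_since_start(treatment, each = 4))`.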

Answer 2

Score: 0

    df %>%
      # NOTE: this example starts the bins on 2016-01-03; use the actual
      # treatment date (2016-01-05 in the question) to reproduce `desired`.
      group_by(treatment_started = date >= "2016-01-03") %>%
      mutate(desired_attempt = if (first(treatment_started)) rep(1:1e3, each = 3, length.out = n()) else 0)

    # A tibble: 15 × 5
    # Groups:   treatment_started [2]
       treatment desired date       treatment_started desired_attempt
           <dbl>   <dbl> <date>     <lgl>                       <dbl>
     1         0       0 2016-01-01 FALSE                           0
     2         0       0 2016-01-02 FALSE                           0
     3         0       0 2016-01-03 TRUE                            1
     4         0       0 2016-01-04 TRUE                            1
     5         1       1 2016-01-05 TRUE                            1
     6         0       1 2016-01-06 TRUE                            2
     7         0       1 2016-01-07 TRUE                            2
     8         0       2 2016-01-08 TRUE                            2
     9         0       2 2016-01-09 TRUE                            3
    10         0       2 2016-01-10 TRUE                            3
    11         0       3 2016-01-11 TRUE                            3
    12         0       3 2016-01-12 TRUE                            4
    13         0       3 2016-01-13 TRUE                            4
    14         0       4 2016-01-14 TRUE                            4
    15         0       4 2016-01-15 TRUE                            5

Or with base R:

    # Rows before the start date stay 0; rows from the start date onward get bins of 3
    started <- df$date >= as.Date("2016-01-03")
    df$desired_attempt <- 0
    df$desired_attempt[started] <- rep(1:1e3, each = 3, length.out = sum(started))
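
For context, the attempt in the question misaligns because `ifelse()` takes `yes[i]` for row `i`: the 12-element `rep()` vector is recycled to 15 values and read from position 5 onward, not from its first element. An alternative to generating a long `rep()` vector is to compute the bin directly by integer division of the days elapsed. The sketch below is not from either answer. It assumes one row per consecutive day, and the variable names (`start`, `each`, `days_since`, `bins`) are illustrative.

```r
date  <- seq(as.Date("2016-01-01"), as.Date("2016-01-15"), by = "day")
start <- as.Date("2016-01-05")   # treatment date from the question
each  <- 3                       # observations per bin

# The question's attempt: values are picked by row position, so the
# sequence does not restart at the treatment date.
attempt <- ifelse(date >= start, rep(1:4, each = 3), 0)
attempt
#> [1] 0 0 0 0 2 2 3 3 3 4 4 4 1 1 1

# Integer-division alternative: days 0-2 fall in bin 1, days 3-5 in bin 2, ...
days_since <- as.integer(date - start)   # 0 on the start date, negative before it
bins <- ifelse(days_since >= 0, days_since %/% each + 1, 0)
bins
#> [1] 0 0 0 0 1 1 1 2 2 2 3 3 3 4 4
```

Changing `each` to 4 gives bins of four observations. If the rows are not evenly spaced in time, binning by row position (as in the answers above) is probably the safer choice.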
