April 4, 2023, 15:11:40

How to create a sequence of repeating numbers in a column based on a starting number?

Question

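library(tidyverse)
library(lubridate)  # provides as_date(); already attached by tidyverse >= 2.0.0

treatment <- c(0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0)
desired <- c(0, 0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4)
df_treatment <- tibble(treatment, desired)
# One row per day, 2016-01-01 through 2016-01-15
df <- df_treatment %>%
  mutate(date = seq(as_date("2016-01-01"), as_date("2016-01-15"), by = "day"))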

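My goal is to produce the desired column in the df tibble programmatically. I would also like to control how many times each number repeats; for instance, I may want each number to repeat 4 times instead of 3.

This may seem like a strange question, but I am trying to find the best way to build a "time since/until treatment" column in a larger data set. My current idea is to create a sequence of numbers that starts on the date the treatment begins. Each unique number in the desired column would be a bin, and the number of times it repeats would be the number of observations in that bin.

For some reason, when I try to create something like this, I can't get the numbers to start in the correct place: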

df %>% 
  mutate(desired_attempt = ifelse(date >= as_date("2016-01-05"), rep(1:4, each = 3), 0))
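
A likely reason the attempt above starts in the wrong place is that `ifelse()` picks values by position: `rep(1:4, each = 3)` has 12 elements, it is recycled to the 15 rows, and row 5 receives the 5th element rather than the 1st. The base R sketch below reproduces this on a plain date vector (the names `dates`, `yes` and `attempt` are illustrative and not part of the original post):

```r
# Same 15 daily dates as in the question, as a plain vector.
dates <- seq(as.Date("2016-01-01"), as.Date("2016-01-15"), by = "day")

# The candidate values: 12 elements, shorter than the 15 rows.
yes <- rep(1:4, each = 3)

# ifelse() recycles `yes` to length 15 and then takes the element at the
# SAME position as each TRUE in the test, so row 5 gets yes[5], not yes[1].
attempt <- ifelse(dates >= as.Date("2016-01-05"), yes, 0)
attempt
# [1] 0 0 0 0 2 2 3 3 3 4 4 4 1 1 1
```

The sequence therefore has to be built only for the rows at or after the treatment start, and then assigned into those rows.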

Answer 1

Score: 1

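a <- cumsum(treatment)  # 0 before the treated row, 1 from it onward (single treated row)
b <- sum(a)             # number of rows at or after the treatment start
# Overwrite the rows from the treatment start onward with 1,1,1,2,2,2,...
replace(treatment, a > 0, rep(seq_len(b), each = 3, length.out = b))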
[1] 0 0 0 0 1 1 1 2 2 2 3 3 3 4 4
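
Since the question also asks for a configurable number of repetitions, the same idea can be wrapped in a small function. This is only a sketch: `bin_since_treatment` and its `each` argument are hypothetical names, and it assumes that treatment is marked by the first non-zero value in the vector.

```r
# Hypothetical helper (not from the original answer): number the rows from the
# first treated row onward in blocks of `each`, leaving earlier rows at 0.
bin_since_treatment <- function(treatment, each = 3) {
  started <- cumsum(treatment) > 0    # TRUE from the first treated row on
  n_after <- sum(started)             # rows at or after the treatment start
  out <- numeric(length(treatment))   # pre-treatment rows stay 0
  out[started] <- rep(seq_len(n_after), each = each, length.out = n_after)
  out
}

treatment <- c(0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0)
bin_since_treatment(treatment)
# [1] 0 0 0 0 1 1 1 2 2 2 3 3 3 4 4
bin_since_treatment(treatment, each = 4)
# [1] 0 0 0 0 1 1 1 1 2 2 2 2 3 3 3
```

Using a logical mask instead of `sum(cumsum(treatment))` keeps the row count correct even if the treatment indicator stays at 1 after treatment begins.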

Answer 2

Score: 0

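# Group rows by whether they fall on or after the cutoff date. This example
# uses "2016-01-03"; the question's treatment row is dated 2016-01-05.
df %>%
  group_by(treatment_started = date >= "2016-01-03") %>%
  mutate(desired_attempt = if (first(treatment_started)) rep(1:1e3, each = 3, length.out = n()) else 0)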

# A tibble: 15 × 5
# Groups:   treatment_started [2]
   treatment desired date       treatment_started desired_attempt
       <dbl>   <dbl> <date>     <lgl>                       <dbl>
 1         0       0 2016-01-01 FALSE                           0
 2         0       0 2016-01-02 FALSE                           0
 3         0       0 2016-01-03 TRUE                            1
 4         0       0 2016-01-04 TRUE                            1
 5         1       1 2016-01-05 TRUE                            1
 6         0       1 2016-01-06 TRUE                            2
 7         0       1 2016-01-07 TRUE                            2
 8         0       2 2016-01-08 TRUE                            2
 9         0       2 2016-01-09 TRUE                            3
10         0       2 2016-01-10 TRUE                            3
11         0       3 2016-01-11 TRUE                            3
12         0       3 2016-01-12 TRUE                            4
13         0       3 2016-01-13 TRUE                            4
14         0       4 2016-01-14 TRUE                            4
15         0       4 2016-01-15 TRUE                            5

Or with base R:

df$desired_attempt <- 0
df$desired_attempt[df$date >= "2016-01-03"] <- rep(1:1e3, each = 3, length.out = sum(df$date >= "2016-01-03"))
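
Because the stated goal is a "time since treatment" bin, another option is to derive the bin from the date difference with integer division instead of `rep()`. The sketch below is base R and rests on assumptions not made in the answers above: the start date `2016-01-05` is taken from the question's data, and `dates`, `start`, `each`, `days_since` and `bin` are illustrative names.

```r
# Assumed inputs: one row per day and a known treatment start date.
dates <- seq(as.Date("2016-01-01"), as.Date("2016-01-15"), by = "day")
start <- as.Date("2016-01-05")  # assumed start, matching the question's data
each  <- 3                      # number of days per bin

# Whole days elapsed since the start; negative before treatment.
days_since <- as.integer(dates - start)

# Days 0-2 fall in bin 1, days 3-5 in bin 2, and so on; earlier rows get 0.
bin <- ifelse(days_since >= 0, days_since %/% each + 1, 0)
bin
# [1] 0 0 0 0 1 1 1 2 2 2 3 3 3 4 4
```

This bins by calendar days rather than by row count, so it matches the `rep()` approaches only when there is exactly one row per day with no gaps.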
