Count the number of types in the groups of a data frame using R


Question

I have data like this:

```R
data <- data.frame(
  is.on = c("FALSE", "FALSE", "FALSE", "TRUE", "FALSE", "TRUE",
            "FALSE", "FALSE", "TRUE", "TRUE", "TRUE", "TRUE"),
  dur = c(10, 20, 30, 10, 10, 10, 10, 20, 10, 20, 30, 40),
  dt = c(10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10),
  block = c(2, 2, 2, 3, 4, 5, 6, 6, 7, 7, 7, 7),
  interval_block = c(1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4)
)
```

Now I want to build summary_data based on block.
The number of rows in summary_data is the number of distinct values of interval_block.
Step 1:

```R
# Step 1: Find the maximum number of types for block column within each interval_block
max_types <- sapply(unique(data$interval_block), function(interval) {
  blocks <- unique(data[data$interval_block == interval, "block"])
  length(blocks)
})
max_num_types <- max(max_types)
```

For interval_block=1, there is one type of block (2).
For interval_block=2, there are three types of block (3, 4 and 5).
For interval_block=3, there are two types of block (6 and 7).
For interval_block=4, there is one type of block (7).
So the maximum number of block types within any interval_block is 3, and the code above calculates that number. Based on this number, I want to create the dur_ columns. In this case, there should be dur_1, dur_2 and dur_3.
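As a possible shorthand for Step 1, the per-interval counts can be computed in one call with `tapply`. This is only a sketch: it rebuilds the two relevant columns from the question, and the names `types_per_interval` and `dur_cols` are illustrative and not part of the original post.

```R
# Only the two columns needed for Step 1, copied from the question
data <- data.frame(
  block          = c(2, 2, 2, 3, 4, 5, 6, 6, 7, 7, 7, 7),
  interval_block = c(1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4)
)

# Number of distinct block values inside each interval_block
types_per_interval <- tapply(
  data$block, data$interval_block,
  function(b) length(unique(b))
)

# The widest interval decides how many dur_ columns are needed
max_num_types <- max(types_per_interval)
dur_cols <- paste0("dur_", seq_len(max_num_types))
dur_cols
# "dur_1" "dur_2" "dur_3"
```

Keeping the whole `types_per_interval` vector, rather than only its maximum, also tells you how many dur_ columns each row will actually fill.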

Step 2:
Decide the values of the dur_ columns.
For interval_block=1, there is one type of block.
I want to fill dur_1 and leave dur_2 and dur_3 as 0.
#(block=2 within interval_block=1)=3, i.e. three rows. So, I want to fill dur_1 with 3 times 10=30.

For interval_block=2, there are three types of block.
I want to fill dur_1, dur_2 and dur_3.
#(block=3 within interval_block=2)=1,
#(block=4 within interval_block=2)=1,
#(block=5 within interval_block=2)=1.
So, I want to fill dur_1 with 1 times 10=10, dur_2 with 1 times 10=10 and dur_3 with 1 times 10=10.

For interval_block=3, there are two types of block.
I want to fill dur_1 and dur_2 and leave dur_3 as 0.
#(block=6 within interval_block=3)=2,
#(block=7 within interval_block=3)=1.
So, I want to fill dur_1 with 2 times 10=20, dur_2 with 1 times 10=10 and dur_3 with 0.

For interval_block=4, there is one type of block.
I want to fill dur_1 and leave dur_2 and dur_3 as 0.
#(block=7 within interval_block=4)=3.
So, I want to fill dur_1 with 3 times 10=30, and dur_2 and dur_3 with 0.

I described the rules at length, but basically it is all about counting the rows of each block type within interval_block and multiplying that count by 10.
My expected output should look like this:

```R
summary_data <- data.frame(
  dur_1 = c(30, 10, 20, 30),
  dur_2 = c(0, 10, 10, 0),
  dur_3 = c(0, 10, 0, 0),  # interval_block=3 has only two block types, so dur_3 stays 0
  interval_block = c(1, 2, 3, 4)
)
```

I don't know how to code this in R.

For clarification:
First row: there are 3 rows with block=2 (one type). Since there is one type, we fill only dur_1, with 3 times 10.
Second row: there is 1 row with block=3, 1 with block=4 and 1 with block=5 (three types). Since there are three types, we fill dur_1, dur_2 and dur_3 with 1 times 10, 1 times 10 and 1 times 10 respectively.

Third row:
there are 2 rows with block=6 and 1 with block=7 (two types). Since there are two types, we fill dur_1 and dur_2 with 2 times 10 and 1 times 10 respectively.
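The rule as written can be sketched in base R with a contingency table. This assumes that "type" means a distinct block value, that every row contributes a fixed 10, and that block types are ordered by increasing block value within each interval. The names `counts` and `dur` are illustrative and do not come from the question.

```R
data <- data.frame(
  block          = c(2, 2, 2, 3, 4, 5, 6, 6, 7, 7, 7, 7),
  interval_block = c(1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4)
)

# Rows per (interval_block, block) combination; zero where a block does not occur
counts <- table(data$interval_block, data$block)

# Largest number of distinct blocks seen in any single interval_block
max_num_types <- max(rowSums(counts > 0))

# Per interval: keep the blocks that occur, multiply by 10, pad with zeros
dur <- t(apply(counts, 1, function(n) {
  n <- unname(n[n > 0])
  c(10 * n, rep(0, max_num_types - length(n)))
}))
colnames(dur) <- paste0("dur_", seq_len(max_num_types))

summary_data <- data.frame(dur, interval_block = sort(unique(data$interval_block)))
summary_data
#   dur_1 dur_2 dur_3 interval_block
# 1    30     0     0              1
# 2    10    10    10              2
# 3    20    10     0              3
# 4    30     0     0              4
```

Following the written rules, dur_3 is 0 for interval_block=3, because that interval contains only two block types.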

Answer 1

Score: 1

Taking advantage of {dplyr} and {tidyr}, you could do the following:

```R
library(dplyr)
library(tidyr)

data |>
  group_by(interval_block) |>
  mutate(ID = row_number(),
         # index of each block within its interval_block: 1, 2, ...
         dur = block |> as.factor() |> as.integer(),
         dur = 1 + dur - min(dur),
         dur_names = paste0('dur_', dur),
         # each row contributes 10, so the sum below is 10 * (rows per block)
         dur_values = 10
  ) |>
  group_by(interval_block, dur_names) |>
  summarise(dur_values = sum(dur_values)) |>
  pivot_wider(names_from = dur_names, values_from = dur_values) |>
  mutate(across(everything(), ~ ifelse(is.na(.x), 0, .x))) |>
  select(starts_with('dur'), interval_block)
```

```
# A tibble: 4 x 4
# Groups:   interval_block [4]
  dur_1 dur_2 dur_3 interval_block
  <dbl> <dbl> <dbl>          <dbl>
1    30     0     0              1
2    10    10    10              2
3    20    10     0              3
4    30     0     0              4
```

Edit:
A slightly esoteric alternative with base R:

```R
data |>
  split(data$interval_block) |>
  Map(f = \(x) {
    # row width: the largest number of distinct blocks in any interval_block
    max_blocks <- with(data, max(rowSums(table(interval_block, block) > 0)))
    dur <- table(x$block)
    # zero vector of length max_blocks with its leading entries overwritten
    `[<-`(integer(max_blocks), seq_along(dur), 10 * dur)
  }) |>
  Reduce(f = rbind) |>
  cbind(unique(data$interval_block)) |>
  as.data.frame(row.names = FALSE) |>
  setNames(nm = c(paste0('dur_', 1:3), 'interval_block'))
```

'[<-' for zero-padding taken from here
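To make the zero-padding trick less opaque, here is a small sketch of what the functional form of `[<-` does on its own. The values in `dur` are hypothetical, chosen to mimic an interval with two block types and three dur_ columns.

```R
# Hypothetical per-block values for one interval (two block types)
dur <- c(20, 10)

# Functional form: start from three zeros, overwrite the first length(dur) slots
padded <- `[<-`(numeric(3), seq_along(dur), dur)
padded
# 20 10  0

# The same thing written with ordinary assignment syntax
padded2 <- numeric(3)
padded2[seq_along(dur)] <- dur
```

The functional form is convenient inside a pipeline or an anonymous function because it returns the modified vector directly, without needing a temporary variable.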

Answer 2

Score: 1

Using base R, first compute a within-interval index for each unique block, then aggregate the data and reshape it to the final wide format with some cleanup:

```R
# ADD COLUMN FOR UNIQUE BLOCK GROUP NUM
data <- within(
  data, {
    dur_num <- ave(
      block,
      interval_block,
      FUN = \(x) as.integer(factor(x))
    )
  }
)

# AGGREGATE BY UNIQUE BLOCKS WITHIN INTERVAL BLOCK
agg_df <- aggregate(
  dt ~ dur_num + interval_block,
  data,
  FUN = sum
)

# RESHAPE WIDE
wide_df <- reshape(
  agg_df,
  idvar = "interval_block",
  timevar = "dur_num",
  v.names = "dt",
  direction = "wide",
  sep = "_"
)

# CLEAN UP
wide_df[is.na(wide_df)] <- 0
row.names(wide_df) <- 1:nrow(wide_df)
colnames(wide_df) <- gsub(
  "dt_", "dur_", colnames(wide_df), fixed = TRUE
)

wide_df
```

```
  interval_block dur_1 dur_2 dur_3
1              1    30     0     0
2              2    10    10    10
3              3    20    10     0
4              4    30     0     0
```
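The key step in this answer is the within-group index built by `ave()` and `factor()`. The sketch below isolates it on the question's `block` and `interval_block` vectors so the intermediate result is visible. Nothing here is new beyond the standalone vectors.

```R
block          <- c(2, 2, 2, 3, 4, 5, 6, 6, 7, 7, 7, 7)
interval_block <- c(1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4)

# Within each interval_block, factor() assigns levels to the distinct block
# values in sorted order; as.integer() turns those levels into 1, 2, 3, ...
dur_num <- ave(block, interval_block, FUN = function(x) as.integer(factor(x)))
dur_num
# 1 1 1 1 2 3 1 1 2 1 1 1
```

Each value of `dur_num` is the suffix of the dur_ column that the row feeds into, which is why the later `reshape()` call can use it as `timevar`.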

Online Demo

huangapple
  • Posted on May 21, 2023, 21:23:08
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76300122.html