Count the number of types in the groups of a data frame using R
Question
I have data like this:
data<-data.frame(is.on=c("FALSE","FALSE","FALSE","TRUE","FALSE","TRUE","FALSE","FALSE","TRUE","TRUE","TRUE","TRUE"),
dur=c(10,20,30,10,10,10,10,20,10,20,30,40),
dt=c(10,10,10,10,10,10,10,10,10,10,10,10),
block=c(2,2,2,3,4,5,6,6,7,7,7,7),
interval_block=c(1,1,1,2,2,2,3,3,3,4,4,4))
Now I want to build summary_data based on block. The number of rows of summary_data is the number of distinct values of interval_block.
Step 1:
# Step 1: Find the maximum number of types for block column within each interval_block
max_types <- sapply(unique(data$interval_block), function(interval) {
blocks <- unique(data[data$interval_block == interval, "block"])
length(blocks)
})
max_num_types <- max(max_types)
For interval_block=1, there is one type of block (2).
For interval_block=2, there are three types of block (3, 4 and 5).
For interval_block=3, there are two types of block (6 and 7).
For interval_block=4, there is one type of block (7).
So the maximum number of block types within any interval_block is 3, and the code above calculates that number. Based on this number, I want to create the dur_ columns. In this case, there should be dur_1, dur_2 and dur_3.
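As a cross-check of the counts listed above, here is a minimal sketch that gets the same numbers with tapply() instead of sapply(). The names types_per_interval and dur_cols are made up for illustration; only block and interval_block from the original data are needed.

```r
# Same block / interval_block values as in the question's data frame.
data <- data.frame(
  block          = c(2, 2, 2, 3, 4, 5, 6, 6, 7, 7, 7, 7),
  interval_block = c(1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4)
)

# Number of distinct block values inside each interval_block.
types_per_interval <- tapply(
  data$block, data$interval_block,
  function(b) length(unique(b))
)

# The largest of those counts is the number of dur_ columns required.
max_num_types <- max(types_per_interval)
dur_cols <- paste0("dur_", seq_len(max_num_types))

print(types_per_interval)  # counts for interval_block 1..4: 1, 3, 2, 1
print(dur_cols)            # "dur_1" "dur_2" "dur_3"
```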
Step 2:
Decide the values of the dur_ columns. Below, #(condition) denotes the number of rows matching the condition.
For interval_block=1, there is one type of block.
I want to fill dur_1 and leave dur_2 and dur_3 as 0.
#(block=2 within interval_block=1)=3. So, I want to fill dur_1 with 3 times 10=30.
For interval_block=2, there are three types of block.
I want to fill dur_1, dur_2 and dur_3.
#(block=3 within interval_block=2)=1,
#(block=4 within interval_block=2)=1,
#(block=5 within interval_block=2)=1.
So, I want to fill dur_1 with 1 times 10=10, dur_2 with 1 times 10=10 and dur_3 with 1 times 10=10.
For interval_block=3, there are two types of block.
I want to fill dur_1 and dur_2 and leave dur_3 as 0.
#(block=6 within interval_block=3)=2,
#(block=7 within interval_block=3)=1.
So, I want to fill dur_1 with 2 times 10=20, dur_2 with 1 times 10=10 and dur_3 with 0.
For interval_block=4, there is one type of block.
I want to fill dur_1 and leave dur_2 and dur_3 as 0.
#(block=7 within interval_block=4)=3.
So, I want to fill dur_1 with 3 times 10=30, and dur_2 and dur_3 with 0.
I have described the rules at length, but basically it is all about counting the rows of each block type within an interval_block and multiplying the count by 10.
My expected output should look like this:
summary_data<-data.frame(dur_1=c(30,10,20,30),
dur_2=c(0,10,10,0),
dur_3=c(0,10,0,0),
interval_block=c(1,2,3,4))
I don't know how to code this in R.
For clarification:
First row: there are 3 rows with block=2 (one type). Since there is one type, we fill only dur_1, with 3 times 10.
Second row: there is 1 row with block=3, 1 with block=4 and 1 with block=5 (three types). Since there are three types, we fill dur_1, dur_2 and dur_3 with 1 times 10, 1 times 10 and 1 times 10 respectively.
Third row: there are 2 rows with block=6 and 1 with block=7 (two types). Since there are two types, we fill dur_1 and dur_2 with 2 times 10 and 1 times 10 respectively.
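To make the arithmetic for a single row concrete, here is a small sketch that works out the third row (interval_block=3) by hand. The names blocks_in_interval and row_values are hypothetical, and the value 3 for max_num_types is taken from Step 1.

```r
# Blocks that fall in interval_block = 3 in the question's data.
blocks_in_interval <- c(6, 6, 7)
max_num_types <- 3  # result of Step 1

# Count the rows of each block type, ordered by block value:
# two rows of block 6, one row of block 7.
counts <- as.vector(table(blocks_in_interval))

# Each row is worth 10; columns without a matching block type stay 0.
row_values <- c(counts * 10, rep(0, max_num_types - length(counts)))
names(row_values) <- paste0("dur_", seq_len(max_num_types))

print(row_values)  # dur_1 = 20, dur_2 = 10, dur_3 = 0
```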
Answer 1
Score: 1
Taking advantage of {dplyr} and {tidyr}, you could do the following:
library(dplyr)
library(tidyr)
data |>
  group_by(interval_block) |>
  mutate(ID = row_number(),
         # number the distinct block values 1, 2, 3, ... within each interval_block
         dur = block |> as.factor() |> as.integer(),
         dur = 1 + dur - min(dur),
         dur_names = paste0('dur_', dur),
         # every row contributes 10, whichever dur_ column it ends up in
         dur_values = 10
         ) |>
  group_by(interval_block, dur_names) |>
  summarise(dur_values = sum(dur_values)) |>
  pivot_wider(names_from = dur_names, values_from = dur_values) |>
  # block types that do not occur in an interval_block come out as NA; set them to 0
  mutate(across(everything(), ~ ifelse(is.na(.x), 0, .x))) |>
  select(starts_with('dur'), interval_block)
# A tibble: 4 x 4
# Groups:   interval_block [4]
  dur_1 dur_2 dur_3 interval_block
  <dbl> <dbl> <dbl>          <dbl>
1    30     0     0              1
2    10    10    10              2
3    20    10     0              3
4    30     0     0              4
Edit:
A slightly esoteric alternative in base R:
data |>
  split(data$interval_block) |>
  Map(f = \(x) {
    # largest number of distinct block values found in any interval_block
    max_blocks <- with(data, max(rowSums(table(interval_block, block) > 0)))
    # number of rows per block value in this interval_block
    dur <- table(x$block)
    # zero-padded vector of length max_blocks
    `[<-`(integer(max_blocks), seq_along(dur), 10 * dur)
  }) |>
  Reduce(f = rbind) |>
  cbind(unique(data$interval_block)) |>
  as.data.frame(row.names = FALSE) |>
  setNames(nm = c(paste0('dur_', 1:3), 'interval_block'))
'[<-' is used here for the zero-padding.
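Since the `[<-` call is the least familiar part of this pipeline, here is a short standalone sketch of what it does. It is only an illustration with made-up variable names (values, padded, out): calling `[<-` as a function returns a copy of its first argument with the indexed positions replaced, which pads a shorter vector with zeros.

```r
# Values for interval_block = 3 in the question's data: two rows of block 6
# and one row of block 7, each worth 10.
values <- c(20, 10)

# Functional form of:  out <- numeric(3); out[1:2] <- values
padded <- `[<-`(numeric(3), seq_along(values), values)
print(padded)  # 20 10 0

# The conventional two-step spelling produces the same vector.
out <- numeric(3)
out[seq_along(values)] <- values
print(identical(padded, out))
```

The functional form is convenient inside a pipeline or an anonymous function because it needs no temporary variable, at some cost in readability.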
Answer 2
Score: 1
Using base R, first compute a per-interval index for each unique block, then aggregate the data, reshape it to the final wide format and clean up:
# ADD COLUMN FOR UNIQUE BLOCK GROUP NUM
data <- within(
data, {
dur_num <- ave(
block,
interval_block,
FUN=\(x) as.integer(factor(x))
)
}
)
# AGGREGATE BY UNIQUE BLOCKS WITHIN INTERVAL BLOCK
agg_df <- aggregate(
dt ~ dur_num + interval_block,
data,
FUN = sum
)
# RESHAPE WIDE
wide_df <- reshape(
agg_df,
idvar = "interval_block",
timevar = "dur_num",
v.names = "dt",
direction = "wide",
sep = "_"
)
# CLEAN UP
wide_df[is.na(wide_df)] <- 0
row.names(wide_df) <- 1:nrow(wide_df)
colnames(wide_df) <- gsub(
"dt_", "dur_", colnames(wide_df), fixed=TRUE
)
wide_df
interval_block dur_1 dur_2 dur_3
1 1 30 0 0
2 2 10 10 10
3 3 20 10 0
4 4 30 0 0
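As a possible shortcut for the aggregate-and-reshape steps, the sketch below uses xtabs(), which sums dt for every interval_block / dur_num pair and fills absent pairs with 0 in one call. This is an alternative under the same assumptions as the code above, not part of the original answer; the data frame is rebuilt here with only the columns needed so the block runs on its own.

```r
data <- data.frame(
  dt             = rep(10, 12),
  block          = c(2, 2, 2, 3, 4, 5, 6, 6, 7, 7, 7, 7),
  interval_block = c(1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4)
)

# Per-interval index of each distinct block value (same idea as dur_num above).
data$dur_num <- ave(data$block, data$interval_block,
                    FUN = function(x) as.integer(factor(x)))

# Sum dt for every interval_block / dur_num pair; absent pairs become 0.
tab <- xtabs(dt ~ interval_block + dur_num, data = data)

# Turn the 4 x 3 contingency table into a plain data frame.
summary_data <- as.data.frame.matrix(tab)
names(summary_data) <- paste0("dur_", colnames(tab))
summary_data$interval_block <- as.numeric(rownames(tab))
rownames(summary_data) <- NULL

print(summary_data)
```

Because xtabs() already returns zeros for missing combinations, no separate NA clean-up is needed; the trade-off is that the result starts life as a table rather than a data frame.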