2023-05-21 21:23:08

Count the number of types in the groups of data frame using R

Question

I have data like this:

data <- data.frame(is.on=c("FALSE","FALSE","FALSE","TRUE","FALSE","TRUE","FALSE","FALSE","TRUE","TRUE","TRUE","TRUE"),
                   dur=c(10,20,30,10,10,10,10,20,10,20,30,40),
                   dt=c(10,10,10,10,10,10,10,10,10,10,10,10),
                   block=c(2,2,2,3,4,5,6,6,7,7,7,7),
                   interval_block=c(1,1,1,2,2,2,3,3,3,4,4,4))

Now I want to build summary_data based on block.
summary_data should have one row per distinct value of interval_block.
Step 1:

# Step 1: Find the maximum number of distinct block values within any interval_block
max_types <- sapply(unique(data$interval_block), function(interval) {
  blocks <- unique(data[data$interval_block == interval, "block"])
  length(blocks)
})
max_num_types <- max(max_types)

For interval_block=1, there is one type of block (2).
For interval_block=2, there are three types of block (3, 4 and 5).
For interval_block=3, there are two types of block (6 and 7).
For interval_block=4, there is one type of block (7).
So the maximum number of block types within any interval_block is 3, and the code above computes that number. Based on this number, I want to create the dur_ columns. In this case, there should be dur_1, dur_2 and dur_3.
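
As a side sketch (not part of the original post), the same per-interval counts could also be obtained in a single call with base R's tapply(). The name types_per_interval is made up for this illustration, and only the two relevant columns of data are recreated. On the sample data it should give 1, 3, 2, 1 for the four intervals and 3 as the maximum, matching the list above.

```r
# Only the two columns needed for the count are recreated here.
data <- data.frame(
  block          = c(2, 2, 2, 3, 4, 5, 6, 6, 7, 7, 7, 7),
  interval_block = c(1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4)
)

# Number of distinct block values inside each interval_block.
# types_per_interval is a hypothetical name used only in this sketch.
types_per_interval <- tapply(
  data$block, data$interval_block,
  function(b) length(unique(b))
)

# Number of dur_ columns the summary needs.
max_num_types <- max(types_per_interval)

print(types_per_interval)
print(max_num_types)
```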

Step 2:
Decide the values of the dur_ columns. Below, #(condition) means the number of rows that satisfy the condition.
For interval_block=1, there is one type of block.
I want to fill dur_1 and leave dur_2 and dur_3 as 0.
#(block=2 within interval_block=1)=3, so I want to fill dur_1 with 3 times 10=30.

For interval_block=2, there are three types of block.
I want to fill dur_1, dur_2 and dur_3.
#(block=3 within interval_block=2)=1,
#(block=4 within interval_block=2)=1,
#(block=5 within interval_block=2)=1.
So I want to fill dur_1 with 1 times 10=10, dur_2 with 1 times 10=10 and dur_3 with 1 times 10=10.

For interval_block=3, there are two types of block.
I want to fill dur_1 and dur_2 and leave dur_3 as 0.
#(block=6 within interval_block=3)=2,
#(block=7 within interval_block=3)=1.
So I want to fill dur_1 with 2 times 10=20 and dur_2 with 1 times 10=10, and leave dur_3 as 0.

For interval_block=4, there is one type of block.
I want to fill dur_1 and leave dur_2 and dur_3 as 0.
#(block=7 within interval_block=4)=3.
So I want to fill dur_1 with 3 times 10=30 and leave dur_2 and dur_3 as 0.

I described the rules at length, but basically it is all about counting the rows of each block type within an interval_block and multiplying that count by 10.
My expected output should look like this:

summary_data <- data.frame(dur_1=c(30,10,20,30),
                           dur_2=c(0,10,10,0),
                           dur_3=c(0,10,0,0),
                           interval_block=c(1,2,3,4))

I don't know how to code this in R.

For clarification:
First row: there are 3 rows with block=2 (one type). Since there is one type, we fill only dur_1, with 3 times 10.
Second row: there is 1 row with block=3, 1 with block=4 and 1 with block=5 (three types). Since there are three types, we fill dur_1, dur_2 and dur_3 with 1 times 10, 1 times 10 and 1 times 10 respectively.

Third row:
there are 2 rows with block=6 and 1 with block=7 (two types). Since there are two types, we fill dur_1 and dur_2 with 2 times 10 and 1 times 10 respectively.
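
To make the counting rule concrete, here is a small sketch (an illustration added for readers, not something from the original post) that tabulates the rows per interval_block and block and multiplies by 10. The name dur_by_block is hypothetical. Reading each row of the result from left to right and skipping the zeros gives the values meant for dur_1, dur_2 and so on.

```r
# Only the two columns involved in the rule are recreated here.
data <- data.frame(
  block          = c(2, 2, 2, 3, 4, 5, 6, 6, 7, 7, 7, 7),
  interval_block = c(1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4)
)

# Rows per (interval_block, block) pair; pairs that never occur get 0.
counts <- table(interval_block = data$interval_block, block = data$block)

# Every row contributes 10, so the dur_ values are 10 times the counts.
# dur_by_block is a hypothetical name used only in this sketch.
dur_by_block <- 10 * counts
print(dur_by_block)
```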

Answer 1

Score: 1

Taking advantage of {dplyr} and {tidyr}, you could do the following:

library(dplyr)
library(tidyr)
data |>
  group_by(interval_block) |>
  mutate(ID = row_number(),
         # index of each distinct block within its interval_block (1, 2, ...)
         dur = block |> as.factor() |> as.integer(),
         dur = 1 + dur - min(dur),
         dur_names = paste0('dur_', dur),
         # every row contributes 10, whichever block it belongs to
         dur_values = 10
         ) |>
  group_by(interval_block, dur_names) |>
  summarise(dur_values = sum(dur_values)) |>
  pivot_wider(names_from = dur_names, values_from = dur_values) |>
  mutate(across(everything(), ~ ifelse(is.na(.x), 0, .x))) |>
  select(starts_with('dur'), interval_block)

# A tibble: 4 x 4
# Groups:   interval_block [4]
  dur_1 dur_2 dur_3 interval_block
  <dbl> <dbl> <dbl>          <dbl>
1    30     0     0              1
2    10    10    10              2
3    20    10     0              3
4    30     0     0              4
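
For comparison, a more compact variant of the same idea might look like the sketch below. It is not the answerer's code. It assumes dplyr's count() for the per-block row counts and tidyr 1.1.0 or later, where pivot_wider() accepts a scalar values_fill. The names dur_name and dur_value are made up for the illustration, and only the two relevant columns of data are recreated.

```r
library(dplyr)
library(tidyr)

# Only the two columns needed for the summary are recreated here.
data <- data.frame(
  block          = c(2, 2, 2, 3, 4, 5, 6, 6, 7, 7, 7, 7),
  interval_block = c(1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4)
)

summary_data <- data |>
  # one row per (interval_block, block) pair, with the row count in column n
  count(interval_block, block) |>
  group_by(interval_block) |>
  # dur_name / dur_value are hypothetical helper columns for this sketch
  mutate(dur_name  = paste0("dur_", row_number()),
         dur_value = 10 * n) |>
  ungroup() |>
  select(interval_block, dur_name, dur_value) |>
  # values_fill = 0 pads the missing dur_ columns instead of leaving NA
  pivot_wider(names_from = dur_name, values_from = dur_value, values_fill = 0)

print(summary_data)
```

Using values_fill here replaces the separate is.na() clean-up step, at the cost of requiring a reasonably recent tidyr.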

Edit:
A slightly esoteric alternative in base R:

data |>
  split(data$interval_block) |>
  Map(f = \(x) {
    # width needed: the largest number of distinct blocks in any interval_block
    max_blocks <- with(data, max(tapply(block, interval_block, \(b) length(unique(b)))))
    # rows per block in this interval_block
    dur <- table(x$block)
    # start from a zero vector and overwrite its leading entries
    `[<-`(integer(max_blocks), seq_along(dur), 10 * dur)
  }) |>
  Reduce(f = rbind) |>
  cbind(unique(data$interval_block)) |>
  as.data.frame(row.names = FALSE) |>
  setNames(nm = c(paste0('dur_', 1:3), 'interval_block'))

'[<-' is used here for zero-padding; the idea is taken from a linked post.
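
For readers unfamiliar with calling '[<-' as an ordinary function, the following sketch shows the zero-padding idiom in isolation. The helper name pad_with_zeros is hypothetical and does not appear in the answer. It also shows the equivalent, more conventional spelling with an explicit assignment.

```r
# Hypothetical helper: pad `values` with trailing zeros up to length `width`.
pad_with_zeros <- function(values, width) {
  # `[<-`(x, i, value) returns a copy of x with x[i] replaced by value
  `[<-`(numeric(width), seq_along(values), values)
}

# The same thing written with a regular replacement assignment.
pad_with_zeros_plain <- function(values, width) {
  out <- numeric(width)
  out[seq_along(values)] <- values
  out
}

padded <- pad_with_zeros(c(20, 10), 3)
print(padded)
```

The functional form is handy inside a pipeline or an anonymous function because it avoids a temporary variable; the plain form is usually easier to read.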

Answer 2

Score: 1

Using base R, first compute an index for each distinct block within its interval_block, then aggregate the data, reshape it to wide format, and clean up:

# ADD COLUMN FOR UNIQUE BLOCK GROUP NUM
data <- within(
    data, {
        dur_num <- ave(
            block,
            interval_block,
            FUN=\(x) as.integer(factor(x))
        )
    }
)
# AGGREGATE BY UNIQUE BLOCKS WITHIN INTERVAL BLOCK
agg_df <- aggregate(
    dt ~ dur_num + interval_block,
    data,
    FUN = sum
)
# RESHAPE WIDE
wide_df <- reshape(
    agg_df,
    idvar = "interval_block",
    timevar = "dur_num",
    v.names = "dt",
    direction = "wide",
    sep = "_"
)
# CLEAN UP
wide_df[is.na(wide_df)] <- 0
row.names(wide_df) <- seq_len(nrow(wide_df))
colnames(wide_df) <- gsub(
    "dt_", "dur_", colnames(wide_df), fixed=TRUE
)
wide_df
  interval_block dur_1 dur_2 dur_3
1              1    30     0     0
2              2    10    10    10
3              3    20    10     0
4              4    30     0     0
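
The first step is the least obvious one, so here is a small sketch of what the ave() call produces on the sample data. The variable names by_sorted_value and by_first_seen are made up for this illustration. Because factor() sorts the distinct values, the smallest block in each interval_block gets index 1; the second variant, which is an assumption about what one might want instead, numbers the blocks in order of first appearance.

```r
block          <- c(2, 2, 2, 3, 4, 5, 6, 6, 7, 7, 7, 7)
interval_block <- c(1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4)

# As in the answer: index by sorted block value within each interval_block.
by_sorted_value <- ave(block, interval_block,
                       FUN = function(x) as.integer(factor(x)))

# Hypothetical variant: index by order of first appearance instead.
by_first_seen <- ave(block, interval_block,
                     FUN = function(x) match(x, unique(x)))

print(by_sorted_value)
print(by_first_seen)
```

On this data both variants agree, since the blocks already appear in increasing order inside every interval_block; they would differ only if a larger block value came first.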

