Group rows of a data frame by multiple ranges of the same column


Question


Given this data:

```R
id <- c("1","1", "1","2","2","2","3","3","3","4","4","4","5","5","5","6","6","6")
value <- c("1", "2", "3", "4", "5", "6", "7", "8","9","10","11","12","13","14","15","16","17","18")
value2 <- c("1", "2", "3", "4", "5", "6", "7", "8","9","10","11","12","13","14","15","16","17","18")
value3 <- c("1", "2", "3", "4", "5", "6", "7", "8","9","10","11","12","13","14","15","16","17","18")
df <- data.frame(id, value, value2, value3)
```

I would like to group the rows into two groups by multiple ranges of `id` (group 1: 1-2 and 5-6; group 2: 3-4) and sum `value` within each group, so that the end result is as follows:

```R
newname <- c("newname1", "newname2")
sumvalues <- c("114", "57")
sumvalues2 <- c("114", "57")
sumvalues3 <- c("114", "57")
df2 <- data.frame(newname, sumvalues, sumvalues2, sumvalues3)
```

I have tried the following, which works when each new group (`newname`) covers a single range, but I can't figure out how to combine several ranges into one new group:

```R
library(dplyr)

# note: `data_values` and its `group` column are not the `df` defined above
data_values_range <- data_values %>%                        # Aggregate values in range
  mutate(ranges = cut(group,
                      seq(1, 6, 1))) %>%
  group_by(ranges) %>%
  dplyr::summarize(sumvalues = sum(value)) %>%
  as.data.frame()
data_values_range
```
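One way to keep the `cut()` idea is to cut into contiguous ranges first and then merge the factor levels that belong to the same new group: assigning duplicated names through `levels<-` combines those levels into one. This is a base R sketch on the example data, assuming `id` and `value` have already been converted to numbers; the break points `c(0, 2, 4, 6)` are chosen for illustration:

```R
# Example data in numeric form (the question stores these as character)
id <- rep(1:6, each = 3)
value <- 1:18

# Cut id into the contiguous ranges (0,2], (2,4] and (4,6]
ranges <- cut(id, breaks = c(0, 2, 4, 6))

# Assigning a duplicated level name merges levels:
# (0,2] and (4,6] both become "newname1", (2,4] becomes "newname2"
levels(ranges) <- c("newname1", "newname2", "newname1")

# Sum value within each merged group
sums <- tapply(value, ranges, sum)
sums
```

The same relabelled factor could be used in `group_by()` in place of the original `ranges`.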

If there are more columns besides `id`, I would like the end result to show the sum of each of those columns, grouped by the new groups.

Answer 1

Score: 1


We could use

```R
library(dplyr) # >= 1.1.0 (needed for case_match() and reframe())
df %>%
  type.convert(as.is = TRUE) %>%
  group_by(newname = case_match(id, c(1, 2, 5, 6) ~ 'newname1',
    c(3, 4) ~ 'newname2',
    .default = 'other')) %>%
  select(-id) %>%
  reframe(across(where(is.numeric), ~ sum(.x, na.rm = TRUE),
    .names = "sum{.col}"))
```

-output

```
# A tibble: 2 × 4
  newname  sumvalue sumvalue2 sumvalue3
  <chr>       <int>     <int>     <int>
1 newname1      114       114       114
2 newname2       57        57        57
```
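`case_match()` and `reframe()` were added in dplyr 1.1.0. On an older dplyr 1.0.x, a similar result should be reachable with `case_when()` and `summarise()`; the sketch below is an assumed variant, not part of the answer above:

```R
library(dplyr)

# Example data from the question (numbers stored as character)
df <- data.frame(
  id     = rep(as.character(1:6), each = 3),
  value  = as.character(1:18),
  value2 = as.character(1:18),
  value3 = as.character(1:18),
  stringsAsFactors = FALSE
)

result <- df %>%
  type.convert(as.is = TRUE) %>%               # character -> integer
  mutate(newname = case_when(
    id %in% c(1, 2, 5, 6) ~ "newname1",        # ranges 1-2 and 5-6
    id %in% c(3, 4)       ~ "newname2",        # range 3-4
    TRUE                  ~ "other"
  )) %>%
  select(-id) %>%
  group_by(newname) %>%
  # sum every numeric column, naming the results sumvalue, sumvalue2, ...
  summarise(across(where(is.numeric), ~ sum(.x, na.rm = TRUE),
                   .names = "sum{.col}"))
result
```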




# Answer 2
**Score**: 0


You could create a named list of the groups you want, reshape it into long format, and join it with the original `df` to sum the values for each unique `name`.

```R
library(tidyverse)

groups <- list(newname1 = c(1, 2, 5, 6), newname2 = c(3, 4))

enframe(groups, value = "new_value") %>%
  unnest(new_value) %>%
  inner_join(df, join_by(new_value == id), multiple = "all") %>%
  summarise(value = sum(value), .by = name)

#   name     value
#  <chr>    <int>
#1 newname1   114
#2 newname2    57
```
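The code above sums only `value`, while the question also asks for `value2` and `value3`. A sketch of the same join approach that sums every numeric column with `across()`, assuming dplyr >= 1.1.0 (for `join_by()` and `.by`) and including the `type.convert()` step so it runs on its own:

```R
library(dplyr)
library(tibble)
library(tidyr)

# Example data from the question, converted from character to integer
df <- data.frame(
  id     = rep(as.character(1:6), each = 3),
  value  = as.character(1:18),
  value2 = as.character(1:18),
  value3 = as.character(1:18),
  stringsAsFactors = FALSE
) %>%
  type.convert(as.is = TRUE)

groups <- list(newname1 = c(1, 2, 5, 6), newname2 = c(3, 4))

result <- enframe(groups, value = "new_value") %>%  # columns: name, new_value (list)
  unnest(new_value) %>%                             # one row per (name, id) pair
  inner_join(df, join_by(new_value == id), multiple = "all") %>%
  select(-new_value) %>%                            # drop the join key before summing
  summarise(across(where(is.numeric), sum), .by = name)
result
```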

Data

I am not sure why the numbers are stored as characters in `df`. `type.convert()` converts them to numbers; run it before the code above:

```R
df <- type.convert(df, as.is = TRUE)
```
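To see what `type.convert()` changes, compare the column classes before and after. With `as.is = TRUE`, a column that cannot be parsed as numbers stays character instead of becoming a factor. A small base R sketch on the question's data:

```R
# The question's data: every column is character
df <- data.frame(
  id     = rep(as.character(1:6), each = 3),
  value  = as.character(1:18),
  value2 = as.character(1:18),
  value3 = as.character(1:18),
  stringsAsFactors = FALSE
)
before <- sapply(df, class)

df <- type.convert(df, as.is = TRUE)  # "1", "2", ... become integers
after <- sapply(df, class)

before  # all "character"
after   # all "integer"
```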

huangapple
  • Published on 2023-04-19 21:58:25
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76055405.html