Group rows of a data frame by multiple ranges of the same column

Question

Given this data:

    id <- c("1", "1", "1", "2", "2", "2", "3", "3", "3", "4", "4", "4", "5", "5", "5", "6", "6", "6")
    value <- c("1", "2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12", "13", "14", "15", "16", "17", "18")
    value2 <- c("1", "2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12", "13", "14", "15", "16", "17", "18")
    value3 <- c("1", "2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12", "13", "14", "15", "16", "17", "18")
    df <- data.frame(id, value, value2, value3)

I would like to assign the rows to two groups defined by multiple ranges of id (group 1: ids 1-2 and 5-6; group 2: ids 3-4) and sum the values within each group, so that the end result is as follows:

    newname <- c("newname1", "newname2")
    sumvalues <- c("114", "57")
    sumvalues2 <- c("114", "57")
    sumvalues3 <- c("114", "57")
    df2 <- data.frame(newname, sumvalues, sumvalues2, sumvalues3)
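As a quick sanity check of these target numbers, the sums can be reproduced with base R alone. This is only a sketch: it uses numeric vectors instead of the character columns above, and `grp` and `sums` are names introduced here for illustration.

```r
# ids 1..6 with three rows each; values 1..18 (numeric here for simplicity)
id    <- rep(1:6, each = 3)
value <- 1:18

# group 1 covers ids 1-2 and 5-6; group 2 covers ids 3-4
grp <- ifelse(id %in% c(1, 2, 5, 6), "newname1", "newname2")

# sum the values within each group
sums <- tapply(value, grp, sum)
sums
```

Group 1 collects the values 1 to 6 and 13 to 18 (21 + 93 = 114) and group 2 the values 7 to 12 (57), which matches the expected `df2`.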

I have tried the following, which works when each new group (newname) covers a single range, but I can't figure out how to combine several ranges into one new group:

    data_values_range <- data_values %>%
      # Aggregate values in range
      mutate(ranges = cut(group, seq(1, 6, 1))) %>%
      group_by(ranges) %>%
      dplyr::summarize(sumvalues = sum(value)) %>%
      as.data.frame()
    data_values_range

If there is more than one column besides id, I would like the end result to show the sum of each of those columns, grouped by the new groups.
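One possible way to adapt the `cut()` attempt, offered as a sketch rather than a definitive solution: cut `id` into single ranges first, then merge several ranges into one group by assigning a named list to `levels()`. The breaks `c(0, 2, 4, 6)` and the names `ranges` and `res_cut` are assumptions made for this example, and the data is built as numbers rather than characters.

```r
library(dplyr)

# Example data shaped like the question's df, but numeric from the start
df <- data.frame(
  id     = rep(1:6, each = 3),
  value  = 1:18,
  value2 = 1:18,
  value3 = 1:18
)

# Step 1: cut id into the single ranges (0,2], (2,4], (4,6]
ranges <- cut(df$id, breaks = c(0, 2, 4, 6))

# Step 2: merge ranges into the new groups via a named list of levels
levels(ranges) <- list(newname1 = c("(0,2]", "(4,6]"), newname2 = "(2,4]")

# Step 3: sum every value column within each merged group
res_cut <- df %>%
  mutate(newname = as.character(ranges)) %>%
  group_by(newname) %>%
  summarise(across(starts_with("value"), sum, .names = "sum{.col}")) %>%
  as.data.frame()
res_cut
```

Assigning a named list to `levels()` is base R behaviour for collapsing factor levels, so the grouping step itself does not depend on dplyr.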

Answer 1

Score: 1

We could use

    library(dplyr) # >= 1.1.0
    df %>%
      type.convert(as.is = TRUE) %>%
      group_by(newname = case_match(id,
                                    c(1, 2, 5, 6) ~ 'newname1',
                                    c(3, 4) ~ 'newname2',
                                    .default = 'other')) %>%
      select(-id) %>%
      reframe(across(where(is.numeric), ~ sum(.x, na.rm = TRUE),
                     .names = "sum{.col}"))

Output:

    # A tibble: 2 × 4
      newname  sumvalue sumvalue2 sumvalue3
      <chr>       <int>     <int>     <int>
    1 newname1      114       114       114
    2 newname2       57        57        57
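When the ranges are wide, listing every id by hand gets tedious. A hedged variant of the same idea writes each range as a sequence (`1:2`, `5:6`) inside `case_when()`, which also works with dplyr versions older than 1.1.0. The name `res_ranges` is introduced here for illustration, and the data is rebuilt so the snippet runs on its own.

```r
library(dplyr)

# The question's data: every column stored as character
df <- data.frame(
  id     = as.character(rep(1:6, each = 3)),
  value  = as.character(1:18),
  value2 = as.character(1:18),
  value3 = as.character(1:18)
)

res_ranges <- df %>%
  type.convert(as.is = TRUE) %>%            # character columns become integer
  mutate(newname = case_when(
    id %in% c(1:2, 5:6) ~ "newname1",       # ranges 1-2 and 5-6
    id %in% 3:4         ~ "newname2",       # range 3-4
    TRUE                ~ "other"           # anything outside the ranges
  )) %>%
  group_by(newname) %>%
  summarise(across(starts_with("value"), sum, .names = "sum{.col}"))
res_ranges
```

Selecting columns with `starts_with("value")` instead of `where(is.numeric)` keeps `id` out of the sums without a separate `select(-id)` step; which of the two is preferable depends on how the real columns are named.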
Answer 2

Score: 0

You can create a named list of the groups you want, convert it to long format, and join it with the original df to sum the values for each unique name.

    library(tidyverse)

    # Named list mapping each new group to the ids it covers
    groups <- list(newname1 = c(1, 2, 5, 6), newname2 = c(3, 4))

    enframe(groups, value = "new_value") %>%
      unnest(new_value) %>%   # one row per (name, id) pair
      inner_join(df, join_by(new_value == id), multiple = "all") %>%
      summarise(value = sum(value), .by = name)
    #  name     value
    #  <chr>    <int>
    #1 newname1   114
    #2 newname2    57

data

I am not sure why the numbers are stored as characters in the dataframe df. Using type.convert will change them to numbers.

    df <- type.convert(df, as.is = TRUE)
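The join above sums only `value`, whereas the question also asks for `value2` and `value3`. A minimal sketch of how the lookup-table idea might be extended to every value column follows; `lookup` and `res_join` are names introduced here, and the join direction is flipped (df on the left) so that each row of df matches exactly one lookup row.

```r
library(dplyr)
library(tidyr)
library(tibble)

# The question's data, converted from character to numbers
df <- type.convert(
  data.frame(
    id     = as.character(rep(1:6, each = 3)),
    value  = as.character(1:18),
    value2 = as.character(1:18),
    value3 = as.character(1:18)
  ),
  as.is = TRUE
)

# Named list of groups, turned into a long lookup table (newname, id)
groups <- list(newname1 = c(1, 2, 5, 6), newname2 = c(3, 4))
lookup <- enframe(groups, name = "newname", value = "id") %>%
  unnest(id)

# Attach the group name to each row, then sum all value columns per group
res_join <- df %>%
  inner_join(lookup, by = "id") %>%
  group_by(newname) %>%
  summarise(across(starts_with("value"), sum, .names = "sum{.col}"))
res_join
```

Because the groups live in a plain list, adding a range to a group or adding a new group only means editing `groups`; the pipeline itself stays unchanged.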

huangapple
  • Published on April 19, 2023, 21:58:25
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76055405.html