Group rows of a data frame by multiple ranges of the same column
Question
Given this data:
id <- c("1","1", "1","2","2","2","3","3","3","4","4","4","5","5","5","6","6","6")
value <- c("1", "2", "3", "4", "5", "6", "7", "8","9","10","11","12","13","14","15","16","17","18")
value2 <- c("1", "2", "3", "4", "5", "6", "7", "8","9","10","11","12","13","14","15","16","17","18")
value3 <- c("1", "2", "3", "4", "5", "6", "7", "8","9","10","11","12","13","14","15","16","17","18")
df <- data.frame(id, value, value2, value3)
I would like to split the rows into two groups defined by multiple ranges of id (group 1: ids 1-2 and 5-6; group 2: ids 3-4) and sum the values within each group, so that the end result looks like this:
newname <- c("newname1", "newname2")
sumvalues <- c("114", "57")
sumvalues2 <- c("114", "57")
sumvalues3 <- c("114", "57")
df2 <- data.frame(newname, sumvalues, sumvalues2, sumvalues3)
I have tried the following, which works when each new group (newname) covers a single range, but I can't figure out how to combine several ranges into one new group:
# Aggregate values into ranges
data_values_range <- data_values %>%
  mutate(ranges = cut(group, seq(1, 6, 1))) %>%
  group_by(ranges) %>%
  dplyr::summarize(sumvalues = sum(value)) %>%
  as.data.frame()
data_values_range
If there is more than one column besides id, I would like the end result to show the sum of each of those columns, grouped by the new groups.
Answer 1
Score: 1
We could use `case_match()` to assign each id to its new group:

library(dplyr) # >= 1.1.0
df %>%
  type.convert(as.is = TRUE) %>%
  group_by(newname = case_match(id,
                                c(1, 2, 5, 6) ~ 'newname1',
                                c(3, 4) ~ 'newname2',
                                .default = 'other')) %>%
  select(-id) %>%
  reframe(across(where(is.numeric), ~ sum(.x, na.rm = TRUE),
                 .names = "sum{.col}"))

Output:

# A tibble: 2 × 4
  newname  sumvalue sumvalue2 sumvalue3
  <chr>       <int>     <int>     <int>
1 newname1      114       114       114
2 newname2       57        57        57
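The question describes the groups as ranges (1-2 and 5-6; 3-4) rather than as explicit lists of ids. As a minimal sketch under that reading, the ranges could be written directly with `between()` inside `case_when()`. The compact data construction and the name `result` are illustrative choices, not part of the original post, and dplyr >= 1.1.0 is assumed for `.default` and `.by`.

```r
library(dplyr) # assumes dplyr >= 1.1.0 (for .default and .by)

# Same data as in the question, built more compactly (illustrative)
id <- rep(as.character(1:6), each = 3)
value <- as.character(1:18)
df <- data.frame(id, value, value2 = value, value3 = value) %>%
  type.convert(as.is = TRUE)  # character columns become integers

result <- df %>%
  # Several ranges are combined into one group with `|`
  mutate(newname = case_when(
    between(id, 1L, 2L) | between(id, 5L, 6L) ~ "newname1",
    between(id, 3L, 4L) ~ "newname2",
    .default = "other"
  )) %>%
  # Sum every column except id within each new group
  summarise(across(-id, sum, .names = "sum{.col}"), .by = newname)

result
```

Writing the ranges out this way may be easier to maintain when they are wide, whereas an explicit id list as in the answer above is shorter for a handful of values.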
Answer 2
Score: 0
You may create a named list of the groups you want. Convert it to long format and join it with the original `df` to sum `value` for each unique `name`.

library(tidyverse)
groups <- list(newname1 = c(1, 2, 5, 6), newname2 = c(3, 4))
enframe(groups, value = "new_value") %>%
  unnest(new_value) %>%
  inner_join(df, join_by(new_value == id), multiple = "all") %>%
  summarise(value = sum(value), .by = name)
#  name     value
#  <chr>    <int>
#1 newname1   114
#2 newname2    57

Data

I am not sure why the numbers are stored as characters in the data frame `df`. Using `type.convert()` changes them to numbers; run this before the code above so that the join keys and the sums are numeric.

df <- type.convert(df, as.is = TRUE)
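The answer above sums only `value`, while the question also asks for the other columns. The following is a rough sketch of the same lookup-table idea in base R that sums every column except id. The names `lookup`, `merged`, and `result` are hypothetical, and the data is rebuilt compactly so the block runs on its own.

```r
# Same data as in the question, built more compactly (illustrative)
id <- rep(as.character(1:6), each = 3)
value <- as.character(1:18)
df <- data.frame(id, value, value2 = value, value3 = value)
df <- type.convert(df, as.is = TRUE)  # character columns become integers

groups <- list(newname1 = c(1, 2, 5, 6), newname2 = c(3, 4))

# stack() turns the named list into a long lookup table:
# one row per id, with the group name in the second column
lookup <- stack(groups)
names(lookup) <- c("id", "newname")

# Inner join on id; ids missing from the lookup would be dropped
merged <- merge(df, lookup, by = "id")

# Sum all remaining columns within each new group
result <- aggregate(. ~ newname,
                    data = merged[names(merged) != "id"],
                    FUN = sum)
result
```

Keeping the group definitions in a separate list means new groups or ids can be added without touching the aggregation code.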