Categorize data based on multiple criteria using dplyr
Question
I need to categorize a very long df using a long list of criteria. Here is a simplified version of the criteria as a df:
crit <- data.frame(grp   = c("g1", "g1", "g1", "g2", "g2", "g2"),
                   class = c("A", "B", "C", "A", "B", "C"),
                   min   = c(1, 3, 5, 8, 10, 12),
                   max   = c(3, 5, 8, 10, 12, 14)
)
A second df would receive a column containing "class" based on whether the value is linked to "grp" (part 1 of the procedure) and falls within the specified ranges (min, max) (part 2 of the procedure). Also, if a value is below the lowest or above the highest value in a range, it will be categorized as belonging to the lowest/highest "class." For example:
| grp | val | class |
|---|---|---|
| g1 | 0 | A |
| g1 | 1 | A |
| g2 | 7 | A |
| g2 | 11 | B |
df <- data.frame(grp = c("g1", "g1", "g2", "g2"),
                 val = c(0, 1, 7, 11)
)
Do you have any suggestions on how to do this using dplyr? Any help is very much appreciated.
Answer 1
Score: 0
One option is something like this:
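library(dplyr)

df %>%
  # pair every row of df with every criteria row of the same grp
  left_join(crit, by = "grp", relationship = "many-to-many") %>%
  # keep only the pairs where val falls inside [min, max]
  filter(val >= min & val <= max) %>%
  select(-min, -max)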
Essentially, it performs a kind of cross join within each `grp`, then filters to keep the rows that match the criteria.
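Note that filtering on `val >= min & val <= max` alone drops values that fall outside every range (0 for g1 and 7 for g2 in the example), and a value sitting exactly on a shared boundary (e.g. 3 in g1) matches two classes. Below is a rough sketch of one way to cover both cases, not a definitive solution: clamp each value to its group's overall range before filtering, then keep the first matching class. It assumes dplyr >= 1.1.0, and the names `crit_rng`, `bounds`, `lo`, `hi`, `val_clamped` and `row_id` are made up for illustration. Taking the first class on a tie is also an assumption, since the question doesn't say how ties should be resolved.

```r
library(dplyr)  # assumes dplyr >= 1.1.0 (`relationship`, `.by`, `by` in slice_head)

crit <- data.frame(
  grp   = c("g1", "g1", "g1", "g2", "g2", "g2"),
  class = c("A", "B", "C", "A", "B", "C"),
  min   = c(1, 3, 5, 8, 10, 12),
  max   = c(3, 5, 8, 10, 12, 14)
)
df <- data.frame(
  grp = c("g1", "g1", "g2", "g2"),
  val = c(0, 1, 7, 11)
)

# Rename the bounds so they don't read like the base functions min()/max()
crit_rng <- crit %>% rename(lower = min, upper = max)

# Overall range covered by each grp (hypothetical helper table)
bounds <- crit_rng %>%
  summarise(lo = min(lower), hi = max(upper), .by = grp)

result <- df %>%
  mutate(row_id = row_number()) %>%                 # remember the original rows
  left_join(bounds, by = "grp") %>%
  mutate(val_clamped = pmin(pmax(val, lo), hi)) %>% # pull out-of-range values to the edge
  left_join(crit_rng, by = "grp", relationship = "many-to-many") %>%
  filter(val_clamped >= lower, val_clamped <= upper) %>%
  slice_head(n = 1, by = row_id) %>%                # assumption: first class wins on a tie
  select(grp, val, class)

print(result)
```

For the example `df` this should give the classes A, A, A, B, matching the table in the question.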
Another option is this:
library(dplyr)
library(purrr)  # pmap()
library(tidyr)  # unnest()

# collapse crit to one row per `grp`, with list-columns of that group's classes, mins and maxes
crit_nested <- crit %>%
  mutate(class = list(class), min = list(min), max = list(max), .by = "grp") %>%
  distinct()

df %>%
  left_join(crit_nested, by = "grp") %>%
  # parallel map over rows: ..1 = val, ..2 = classes, ..3 = mins, ..4 = maxes
  mutate(class = pmap(list(val, class, min, max), ~..2[..3 <= ..1 & ..1 <= ..4])) %>%
  select(-min, -max) %>%
  unnest(class)
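If your dplyr is recent enough (1.1.0 or later, which the `relationship` and `.by` arguments above already imply), the range match can probably also be written as a non-equi join with `join_by()` and `between()`, which avoids building the full cross join first. This is only a sketch under the same assumptions as before: values are clamped to the group's overall range so that out-of-range values land in the lowest/highest class, `multiple = "first"` picks the first class on a boundary tie, and `crit_rng`, `bounds`, `lo`, `hi` and `val_clamped` are illustrative names.

```r
library(dplyr)  # join_by() and `multiple` assume dplyr >= 1.1.0

crit <- data.frame(
  grp   = c("g1", "g1", "g1", "g2", "g2", "g2"),
  class = c("A", "B", "C", "A", "B", "C"),
  min   = c(1, 3, 5, 8, 10, 12),
  max   = c(3, 5, 8, 10, 12, 14)
)
df <- data.frame(
  grp = c("g1", "g1", "g2", "g2"),
  val = c(0, 1, 7, 11)
)

# Rename the bounds so they don't read like the base functions min()/max()
crit_rng <- crit %>% rename(lower = min, upper = max)

# Overall range covered by each grp (hypothetical helper table)
bounds <- crit_rng %>%
  summarise(lo = min(lower), hi = max(upper), .by = grp)

result <- df %>%
  left_join(bounds, by = "grp") %>%
  mutate(val_clamped = pmin(pmax(val, lo), hi)) %>%  # clamp to the group's range
  left_join(
    crit_rng,
    # same grp, and lower <= val_clamped <= upper (inclusive on both ends)
    by = join_by(grp, between(val_clamped, lower, upper)),
    multiple = "first"                               # assumption: first class wins on a tie
  ) %>%
  select(grp, val, class)

print(result)
```

Because this is a left join, a row whose `grp` has no criteria at all stays in the result with `class` set to `NA` rather than being dropped.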