Categorize data based on multiple criteria using dplyr

Question

I need to categorize a very long data frame using a long list of criteria. Here is a simplified version of the criteria as a data frame:

crit <- data.frame(grp = c("g1", "g1", "g1", "g2", "g2", "g2"),
                   class = c("A", "B", "C", "A", "B", "C"),
                   min = c(1, 3, 5, 8, 10, 12),
                   max = c(3, 5, 8, 10, 12, 14)
                   )

A second data frame should receive a "class" column. Each value is matched first on "grp" (part 1 of the procedure) and then on the range (min, max) it falls within (part 2 of the procedure). In addition, a value below the lowest or above the highest bound for its "grp" should be assigned the lowest or highest "class", respectively. For example:

df <- data.frame(grp = c("g1", "g1", "g2", "g2"),
                 val = c(0, 1, 7, 11)
                )

Expected result:

grp  val  class
g1   0    A
g1   1    A
g2   7    A
g2   11   B

Do you have any suggestions on how to do this using dplyr? Any help is very much appreciated.

Answer 1

Score: 0

One option is something like this:

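library(dplyr)

df %>%
    # pair every row of df with every criteria row of the same grp
    left_join(crit, by = "grp", relationship = "many-to-many") %>%
    # keep only the pairs whose range contains val (both bounds inclusive)
    filter(val >= min & val <= max) %>%
    select(-min, -max)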

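Essentially, it performs a kind of cross join within each grp, then filters to keep the rows that match the criteria. Note that a val outside every range for its grp matches no row and is dropped, so the "below the lowest / above the highest" rule from the question is not covered by this step.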
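
The join-then-filter idea can also be written as a single non-equi join. This is a sketch, not part of the original answer. It assumes dplyr 1.1.0 or later, where `join_by()` accepts a `between()` helper, and it rebuilds the question's sample data so that it runs on its own. The name `matched` is only illustrative.

```r
library(dplyr)

crit <- data.frame(
  grp   = c("g1", "g1", "g1", "g2", "g2", "g2"),
  class = c("A", "B", "C", "A", "B", "C"),
  min   = c(1, 3, 5, 8, 10, 12),
  max   = c(3, 5, 8, 10, 12, 14)
)
df <- data.frame(grp = c("g1", "g1", "g2", "g2"), val = c(0, 1, 7, 11))

# Match on grp and on val lying inside [min, max] (inclusive bounds by default).
matched <- df %>%
  left_join(crit, by = join_by(grp, between(val, min, max))) %>%
  select(grp, val, class)

matched
```

Because this is a left join, rows of df without a matching range are kept with class set to NA instead of being dropped, which makes the out-of-range values easy to spot.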

Another option is this:

library(dplyr)
library(purrr)  # pmap()
library(tidyr)  # unnest()

# group everything by `grp`, so we have just one row for each `grp`, with list-columns of classes, mins and maxes
crit <- crit %>%
    mutate(class = list(class), min = list(min), max = list(max), .by = "grp") %>%
    distinct()

df %>%
    left_join(crit, by = "grp") %>%
    # parallel map: for each row, keep the classes whose [min, max] range contains val
    mutate(class = pmap(list(val, class, min, max), ~..2[..3 <= ..1 & ..1 <= ..4])) %>%
    select(-min, -max) %>%
    unnest(class)
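
The question also asks that values below the lowest or above the highest bound of a group fall into the lowest or highest class. One way to get there is to clamp each value into its group's overall range before filtering. This is a hedged sketch rather than the answerer's method. The names `bounds`, `lo`, `hi`, `row_id` and `val_clamped` are introduced here for illustration. Keeping the first match for a value that sits exactly on a shared boundary (for example 3 in g1) is an assumption, since the question does not say which class should win.

```r
library(dplyr)

crit <- data.frame(
  grp   = c("g1", "g1", "g1", "g2", "g2", "g2"),
  class = c("A", "B", "C", "A", "B", "C"),
  min   = c(1, 3, 5, 8, 10, 12),
  max   = c(3, 5, 8, 10, 12, 14)
)
df <- data.frame(grp = c("g1", "g1", "g2", "g2"), val = c(0, 1, 7, 11))

# Rename the bounds so they do not shadow base::min() and base::max().
bounds <- crit %>% rename(lo = min, hi = max)

result <- df %>%
  mutate(row_id = row_number()) %>%
  left_join(bounds, by = "grp", relationship = "many-to-many") %>%
  group_by(row_id) %>%
  # Clamp each value into the overall range covered by its grp.
  mutate(val_clamped = pmin(pmax(val, min(lo)), max(hi))) %>%
  filter(val_clamped >= lo & val_clamped <= hi) %>%
  # Assumption: on a shared boundary, keep the first matching class.
  slice_head(n = 1) %>%
  ungroup() %>%
  select(grp, val, class)

result
```

With the sample data this gives the classes A, A, A, B, matching the expected result in the question.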

huangapple
  • Published on 2023-06-26 01:11:25
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76551586.html