June 26, 2023 01:11:25

Categorize data based on multiple criteria using dplyr

Question

I need to categorize a very long df using a long list of criteria. Here is a simplified version of the criteria as a df:

crit <- data.frame(grp = c("g1", "g1", "g1", "g2", "g2", "g2"),
                   class = c("A", "B", "C", "A", "B", "C"),
                   min = c(1, 3, 5, 8, 10, 12),
                   max = c(3, 5, 8, 10, 12, 14)
                   )

A second df should receive a "class" column, assigned by first matching each value to its "grp" (part 1 of the procedure) and then checking which of that group's ranges (min, max) the value falls within (part 2 of the procedure). If a value is below the lowest min or above the highest max of its group's ranges, it should be assigned the lowest or highest "class", respectively. For example, given

df <- data.frame(grp = c("g1", "g1", "g2", "g2"),
                 val = c(0, 1, 7, 11)
                )

the expected result is:

grp  val  class
g1   0    A
g1   1    A
g2   7    A
g2   11   B

Do you have any suggestions on how to do this using dplyr? Any help is very much appreciated.

Answer 1

Score: 0

One option is something like this:

library(dplyr)

# `relationship = "many-to-many"` (dplyr >= 1.1.0) silences the many-to-many join warning
df %>%
    left_join(crit, by = "grp", relationship = "many-to-many") %>%
    filter(val >= min & val <= max) %>%
    select(-min, -max)

Essentially, it performs a kind of cross join (each row of df is paired with every criterion of the same grp), then filters to keep the rows whose val lies within [min, max].
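Note that this filter drops values lying outside every range of their group (0 in g1 and 7 in g2 in the example) instead of assigning them the lowest/highest class as the question asks. A minimal sketch of one way to cover that case, assuming dplyr >= 1.1.0 and that each group's ranges together span [lowest min, highest max]: clamp `val` onto the group's outer bounds first, then join and filter as above. The names `bounds`, `lo`, `hi`, `val_clamped` and `res` are illustrative only.

```r
library(dplyr)  # summarise(.by =) and left_join(relationship =) need dplyr >= 1.1.0

crit <- data.frame(grp = c("g1", "g1", "g1", "g2", "g2", "g2"),
                   class = c("A", "B", "C", "A", "B", "C"),
                   min = c(1, 3, 5, 8, 10, 12),
                   max = c(3, 5, 8, 10, 12, 14))
df <- data.frame(grp = c("g1", "g1", "g2", "g2"),
                 val = c(0, 1, 7, 11))

# Outer bounds of each group: lowest min and highest max (hypothetical helper table)
bounds <- crit %>%
  summarise(lo = min(min), hi = max(max), .by = grp)

res <- df %>%
  left_join(bounds, by = "grp") %>%
  # Clamp out-of-range values onto the group's outer bounds
  mutate(val_clamped = pmin(pmax(val, lo), hi)) %>%
  # Same cross-join-and-filter idea as above, applied to the clamped value
  left_join(crit, by = "grp", relationship = "many-to-many") %>%
  filter(val_clamped >= min & val_clamped <= max) %>%
  select(grp, val, class)

res
```

One behaviour this shares with the original filter: neighbouring ranges share an endpoint (A is [1, 3] and B is [3, 5]), so a value sitting exactly on a boundary, such as 3 in g1, matches two classes and produces two rows.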

Another option is this:

library(dplyr)
library(purrr)  # pmap()
library(tidyr)  # unnest()

# group everything by `grp`, so we have just one row for each `grp`, holding list-columns of classes, mins and maxes
crit <- crit %>%
    mutate(class = list(class), min = list(min), max = list(max), .by = "grp") %>%
    distinct()
df %>%
    left_join(crit, by = "grp") %>%
    # parallel map: ..1 = val, ..2 = classes, ..3 = mins, ..4 = maxes of the row's grp
    mutate(class = pmap(list(val, class, min, max), ~..2[..3 <= ..1 & ..1 <= ..4])) %>%
    select(-min, -max) %>%
    unnest(class)
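As a further alternative (not part of the original answer), dplyr 1.1.0 added inequality and rolling joins through `join_by()`. The sketch below uses `closest(val >= min)`, which matches each value to the criterion of its grp with the largest `min` not exceeding it: values above the highest range therefore land in the highest class, and values below the lowest range get no match and are then filled with the group's lowest class. It assumes the ranges in each group are contiguous (each `max` equals the next `min`), as in the example; the names `lowest` and `res2` are illustrative only.

```r
library(dplyr)  # join_by(), closest() and slice_min(by =) need dplyr >= 1.1.0

crit <- data.frame(grp = c("g1", "g1", "g1", "g2", "g2", "g2"),
                   class = c("A", "B", "C", "A", "B", "C"),
                   min = c(1, 3, 5, 8, 10, 12),
                   max = c(3, 5, 8, 10, 12, 14))
df <- data.frame(grp = c("g1", "g1", "g2", "g2"),
                 val = c(0, 1, 7, 11))

# Lowest class of each group (the row with the smallest min)
lowest <- crit %>%
  slice_min(min, n = 1, by = grp) %>%
  select(grp, lowest = class)

res2 <- df %>%
  # Rolling join: within the same grp, take the criterion with the largest min <= val
  left_join(crit, by = join_by(grp, closest(val >= min))) %>%
  # Values below every range found no match (class is NA); use the group's lowest class
  left_join(lowest, by = "grp") %>%
  mutate(class = coalesce(class, lowest)) %>%
  select(grp, val, class)

res2
```

Because only `min` takes part in the match, a boundary value such as 3 in g1 is assigned to exactly one class (here B, the higher one), so no duplicate rows appear; a value inside a gap between two ranges would silently get the lower class, which is why the contiguity assumption matters.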
