March 7, 2023 21:38:04

Values falling into different ranges in R

Question

I have a grid grd of different ranges like this one:

> grd
   count  treshold
1   1      0.01
2   2      0.02
3   3      0.05
4   4      0.10
5   5      0.20

and a dataframe df like this one:

> df
    param   name
1   0.124   Tim
2   0.011   John
3   0.002   Alex
4   0.023   Jessica
5   0.056   Rose

I would like to use grd$treshold to add another column to the data frame, df$bucket, reporting which range each value in df$param falls into.

For instance, the first value of param, 0.124, is higher than the threshold 0.10, so it falls into count 5. The second one, 0.011, is between 0.01 and 0.02, so it falls into count 2, and so on.

This is the final result:

> df
    param   name      bucket
1   0.124   Tim         5
2   0.011   John        2
3   0.002   Alex        1
4   0.023   Jessica     3
5   0.056   Rose        4

Answer 1

Score: 2

A base solution with findInterval():

df$bucket <- findInterval(df$param, grd$treshold) + 1
df$bucket
# [1] 5 2 1 3 4
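
To see why the + 1 is needed: findInterval() returns, for each value, how many of the sorted break points are less than or equal to it, so a value below the first threshold yields 0. The following is a minimal sketch, with the thresholds typed in by hand rather than read from grd, and with two edge values (0.02 and 0.25) that are not in the original data, added purely for illustration:

```r
# Thresholds copied by hand from grd$treshold in the question.
treshold <- c(0.01, 0.02, 0.05, 0.10, 0.20)

# Values from df$param in the question.
param <- c(0.124, 0.011, 0.002, 0.023, 0.056)

# findInterval() counts how many thresholds are <= each value (0 to 5),
# so adding 1 shifts the result onto the 1-based count column.
bucket <- findInterval(param, treshold) + 1
print(bucket)

# Illustrative edge cases (assumed values, not in the original data):
# a value exactly on a threshold goes to the higher bucket by default,
# and a value past the last threshold gets bucket 6, which is not in grd.
edge <- c(0.02, 0.25)
edge_default <- findInterval(edge, treshold) + 1
print(edge_default)

# With left.open = TRUE the intervals become (lower, upper], so 0.02
# stays in bucket 2.
edge_left_open <- findInterval(edge, treshold, left.open = TRUE) + 1
print(edge_left_open)
```

If values at or above the last threshold can occur in real data, they would need separate handling, since bucket 6 has no matching row in grd.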

You can also use a rolling join with dplyr:

library(dplyr)
df %>%
  left_join(grd, by = join_by(closest(param < treshold))) %>%
  select(-treshold)
#   param    name count
# 1 0.124     Tim     5
# 2 0.011    John     2
# 3 0.002    Alex     1
# 4 0.023 Jessica     3
# 5 0.056    Rose     4
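
The join above leaves the result in a column called count, while the question asked for df$bucket. A small sketch of the same rolling join that renames the column inside select(); it rebuilds the data inline so it runs on its own, and it assumes dplyr 1.1.0 or later, where join_by() and closest() are available:

```r
library(dplyr)  # assumes dplyr >= 1.1.0 for join_by() and closest()

# Data rebuilt inline from the question.
grd <- data.frame(count = 1:5, treshold = c(0.01, 0.02, 0.05, 0.10, 0.20))
df <- data.frame(
  param = c(0.124, 0.011, 0.002, 0.023, 0.056),
  name  = c("Tim", "John", "Alex", "Jessica", "Rose")
)

# For each param, closest() keeps only the nearest treshold that is
# strictly greater than it; select() then renames count to bucket
# and drops treshold.
res <- df %>%
  left_join(grd, by = join_by(closest(param < treshold))) %>%
  select(param, name, bucket = count)
print(res)
```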

Data

grd <- read.table(text = "
count  treshold
1   1      0.01
2   2      0.02
3   3      0.05
4   4      0.10
5   5      0.20")
df <- read.table(text = "
param   name
1   0.124   Tim
2   0.011   John
3   0.002   Alex
4   0.023   Jessica
5   0.056   Rose")

Answer 2

Score: 0

Here is a possible solution using dplyr:

library(dplyr)
df <- df |>
  mutate(
    bucket = case_when(
      param <= 0.01 ~ 1,
      param <= 0.02 ~ 2,
      param <= 0.05 ~ 3,
      param <= 0.10 ~ 4,
      param <= 0.20 ~ 5
    )
  )

As far as I understand your question, the final result you shared is not correct (row 2). If I misunderstood, you can easily adjust the thresholds in case_when().
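
The thresholds in case_when() are typed in by hand. As a rough sketch, the same "param <= threshold" logic can be driven by grd itself using base R's cut(), which is a different technique from the one in this answer; the data frames are rebuilt inline so the block runs on its own:

```r
# Data rebuilt inline from the question.
grd <- data.frame(count = 1:5, treshold = c(0.01, 0.02, 0.05, 0.10, 0.20))
df <- data.frame(
  param = c(0.124, 0.011, 0.002, 0.023, 0.056),
  name  = c("Tim", "John", "Alex", "Jessica", "Rose")
)

# right = TRUE (the default) makes each interval (lower, upper], matching
# the param <= threshold conditions; -Inf catches values below 0.01.
# labels = FALSE returns the interval index as an integer.
idx <- cut(df$param, breaks = c(-Inf, grd$treshold), labels = FALSE)

# Map the interval index to grd$count; values above the last threshold
# fall outside every interval and become NA, as they do in case_when().
df$bucket <- grd$count[idx]
print(df)
```

This way a change to grd does not require editing the bucketing code.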
