Values falling into different ranges in R

Question

I have a grid grd of different ranges like this one:

> grd
   count  treshold
1   1      0.01
2   2      0.02
3   3      0.05
4   4      0.10
5   5      0.20

and a dataframe df like this one:

> df
    param   name
1   0.124   Tim
2   0.011   John
3   0.002   Alex
4   0.023   Jessica
5   0.056   Rose

I would like to use grd$treshold to add another column to the data frame, df$bucket, reporting which range each value in df$param falls into.

For instance, the first value of param, 0.124, is higher than the threshold 0.10, so it falls into count 5. The second one, 0.011, is between 0.01 and 0.02, so it falls into count 2, and so on.

This is the final result:

> df
        param   name      bucket
    1   0.124   Tim         5
    2   0.011   John        2
    3   0.002   Alex        1
    4   0.023   Jessica     3
    5   0.056   Rose        4

Answer 1

Score: 2

A base R solution with findInterval():

df$bucket <- findInterval(df$param, grd$treshold) + 1

df$bucket
# [1] 5 2 1 3 4
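
To see why the `+ 1` is needed: `findInterval()` returns, for each value, how many break points are less than or equal to it, so a value below the first threshold gets 0. Below is a minimal sketch with the question's numbers (the names `breaks`, `x`, `idx` and `bucket` are only for illustration). It also shows what happens when a value sits exactly on a threshold, a case the question does not specify:

```r
breaks <- c(0.01, 0.02, 0.05, 0.10, 0.20)   # same values as grd$treshold
x      <- c(0.124, 0.011, 0.002, 0.023, 0.056)

# Number of breaks <= each value (0 means "below the first threshold")
idx <- findInterval(x, breaks)
idx
# [1] 4 1 0 2 3

# Shift by one so the buckets start at 1, like grd$count
bucket <- idx + 1
bucket
# [1] 5 2 1 3 4

# By default intervals are closed on the left, so 0.02 moves to bucket 3
on_break_default <- findInterval(0.02, breaks) + 1
on_break_default
# [1] 3

# With left.open = TRUE the intervals are (lower, upper], so 0.02 stays in bucket 2
on_break_left_open <- findInterval(0.02, breaks, left.open = TRUE) + 1
on_break_left_open
# [1] 2
```

If a value equal to a threshold should belong to the lower bucket, `left.open = TRUE` is probably the variant to use.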

You can also use a rolling join with dplyr:

library(dplyr)

df %>%
  left_join(grd, by = join_by(closest(param < treshold))) %>%
  select(-treshold)

#   param    name count
# 1 0.124     Tim     5
# 2 0.011    John     2
# 3 0.002    Alex     1
# 4 0.023 Jessica     3
# 5 0.056    Rose     4
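
The join condition `closest(param < treshold)` pairs each `param` with the smallest `treshold` that is strictly greater than it. A rough base R emulation of that lookup (not dplyr's actual implementation) may make this concrete. It assumes `grd` is sorted by `treshold`, and the helper name `lookup_bucket` is hypothetical:

```r
grd <- data.frame(count = 1:5,
                  treshold = c(0.01, 0.02, 0.05, 0.10, 0.20))
param <- c(0.124, 0.011, 0.002, 0.023, 0.056)

# For one value p, return the count of the first threshold strictly above p,
# or NA when no threshold is larger (assumes grd is sorted ascending)
lookup_bucket <- function(p, grd) {
  hits <- which(grd$treshold > p)
  if (length(hits) == 0) NA_integer_ else grd$count[hits[1]]
}

bucket <- vapply(param, lookup_bucket, integer(1), grd = grd)
bucket
# [1] 5 2 1 3 4

# A value at or above the largest threshold finds no match
lookup_bucket(0.25, grd)
# [1] NA
```

With the left join, such an unmatched row is kept with `NA` in `count`, whereas `findInterval(...) + 1` would report 6, so the two approaches differ for values beyond the last threshold.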

Data

grd <- read.table(text = "
count  treshold
1   1      0.01
2   2      0.02
3   3      0.05
4   4      0.10
5   5      0.20")

df <- read.table(text = "
param   name
1   0.124   Tim
2   0.011   John
3   0.002   Alex
4   0.023   Jessica
5   0.056   Rose")

Answer 2

Score: 0

Here is a possible solution using dplyr:

library(dplyr)
df <- df |>
  mutate(
    bucket = case_when(
      param <= 0.01 ~ 1,
      param <= 0.02 ~ 2,
      param <= 0.05 ~ 3,
      param <= 0.10 ~ 4,
      param <= 0.20 ~ 5
    )
  )

As far as I understand your question, the final result you shared is not correct (row 2). If I have misunderstood, you can easily adjust the thresholds in case_when().
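
If the thresholds should come from `grd` instead of being typed into `case_when()`, one option is base R's `cut()`. This is only a sketch under the assumption that the buckets are the right-closed intervals implied by the `<=` comparisons. The data frames are rebuilt here so the snippet runs on its own:

```r
grd <- data.frame(count = 1:5,
                  treshold = c(0.01, 0.02, 0.05, 0.10, 0.20))
df  <- data.frame(param = c(0.124, 0.011, 0.002, 0.023, 0.056),
                  name  = c("Tim", "John", "Alex", "Jessica", "Rose"))

# Intervals (-Inf, 0.01], (0.01, 0.02], ..., (0.10, 0.20];
# right = TRUE mirrors the `<=` comparisons in case_when()
bucket_factor <- cut(df$param,
                     breaks = c(-Inf, grd$treshold),
                     labels = grd$count,
                     right  = TRUE)

# cut() returns a factor, so convert the labels back to integers
df$bucket <- as.integer(as.character(bucket_factor))
df$bucket
# [1] 5 2 1 3 4
```

As with the `case_when()` version, a `param` above 0.20 falls outside every interval and ends up as `NA`.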

huangapple
  • Posted on March 7, 2023, 21:38:04
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75662703.html