Values falling into different ranges in R
Question
I have a grid, grd, of different ranges like this one:
> grd
  count treshold
1     1     0.01
2     2     0.02
3     3     0.05
4     4     0.10
5     5     0.20
and a data frame, df, like this one:
> df
  param    name
1 0.124     Tim
2 0.011    John
3 0.002    Alex
4 0.023 Jessica
5 0.056    Rose
I would like to use grd$treshold to add another column to the data frame, df$bucket, reporting which range each value in df$param falls into.
For instance, the first value of param, 0.124, is higher than the threshold 0.10, so it falls into count 5. The second one, 0.011, is between 0.01 and 0.02, so it falls into count 2, and so on.
This is the final result:
> df
  param    name bucket
1 0.124     Tim      5
2 0.011    John      2
3 0.002    Alex      1
4 0.023 Jessica      3
5 0.056    Rose      4
Answer 1
Score: 2
A base R solution with findInterval():
df$bucket <- findInterval(df$param, grd$treshold) + 1
df$bucket
# [1] 5 2 1 3 4
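To see where the + 1 comes from: findInterval() returns, for each value, how many thresholds are less than or equal to it, so anything below the first threshold gets 0. The sketch below rebuilds the question's numbers inline (the vector names thresholds and param are only for illustration) and also shows how a value lying exactly on a threshold is handled, a case the question does not specify.

```r
# Thresholds and params copied from the question; the names are illustrative
thresholds <- c(0.01, 0.02, 0.05, 0.10, 0.20)
param <- c(0.124, 0.011, 0.002, 0.023, 0.056)

# Number of thresholds <= each value (0 for values below the first threshold)
idx <- findInterval(param, thresholds)

# Shift by one so the lowest range is bucket 1 rather than 0
bucket <- idx + 1

# By default a value exactly on a threshold goes to the upper bucket ...
on_edge_default <- findInterval(0.02, thresholds) + 1
# ... while left.open = TRUE puts it in the lower one
on_edge_left_open <- findInterval(0.02, thresholds, left.open = TRUE) + 1
```

By default, a value at or above the largest threshold (0.20) would get bucket 6 with this approach, which has no matching count in grd.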
You can also use a rolling join with dplyr:
library(dplyr)
df %>%
  left_join(grd, by = join_by(closest(param < treshold))) %>%
  select(-treshold)
#   param    name count
# 1 0.124     Tim     5
# 2 0.011    John     2
# 3 0.002    Alex     1
# 4 0.023 Jessica     3
# 5 0.056    Rose     4
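The rolling join behaves differently from findInterval() at the top end: closest(param < treshold) matches each row to the smallest threshold strictly greater than param, so a value with no larger threshold gets no match. A minimal sketch, assuming dplyr 1.1.0 or later (where join_by() and closest() are available); the data frame df2 and its extra row "Extra" with param 0.25 are hypothetical additions for illustration, and the result column is renamed to bucket.

```r
library(dplyr)  # join_by() and closest() need dplyr >= 1.1.0

grd <- data.frame(count = 1:5,
                  treshold = c(0.01, 0.02, 0.05, 0.10, 0.20))

# Params from the question plus one hypothetical out-of-range row (0.25)
df2 <- data.frame(
  param = c(0.124, 0.011, 0.002, 0.023, 0.056, 0.25),
  name  = c("Tim", "John", "Alex", "Jessica", "Rose", "Extra")
)

res <- df2 %>%
  # For each param, keep only the nearest threshold that is strictly larger
  left_join(grd, by = join_by(closest(param < treshold))) %>%
  # Drop the threshold column and rename count to bucket
  select(param, name, bucket = count)

res
```

The first five rows get buckets 5, 2, 1, 3, 4 as before, while the hypothetical 0.25 row ends up with NA because left_join() keeps unmatched rows.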
Data
grd <- read.table(text = "
  count treshold
1     1     0.01
2     2     0.02
3     3     0.05
4     4     0.10
5     5     0.20")
df <- read.table(text = "
  param    name
1 0.124     Tim
2 0.011    John
3 0.002    Alex
4 0.023 Jessica
5 0.056    Rose")
Answer 2
Score: 0
Here is a possible solution using dplyr:
library(dplyr)
df <- df |>
  mutate(
    bucket = case_when(
      param <= 0.01 ~ 1,
      param <= 0.02 ~ 2,
      param <= 0.05 ~ 3,
      param <= 0.10 ~ 4,
      param <= 0.20 ~ 5
    )
  )
As far as I understood your question, the final result you shared is not correct (row 2). If I misunderstood, you can easily adjust the threshold values in case_when().
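If the thresholds should come from grd rather than being typed in by hand, the same <= comparisons can be expressed with base cut(). This is only a sketch of one alternative, not the answer's own method: it assumes right-closed intervals (param <= upper bound) and adds -Inf as a lower bound so that values below 0.01 land in the first bucket.

```r
grd <- data.frame(count = 1:5,
                  treshold = c(0.01, 0.02, 0.05, 0.10, 0.20))
df <- data.frame(
  param = c(0.124, 0.011, 0.002, 0.023, 0.056),
  name  = c("Tim", "John", "Alex", "Jessica", "Rose")
)

# Intervals are (lower, upper], matching the param <= threshold tests;
# labels = FALSE returns the interval index instead of a factor
interval_index <- cut(df$param, breaks = c(-Inf, grd$treshold), labels = FALSE)

# Look up the count that belongs to each interval
df$bucket <- grd$count[interval_index]
df$bucket
```

For the sample data this gives 5 2 1 3 4. As with the case_when() version, a param above the largest threshold would get NA.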