Problem using lapply in R with a function that uses constants in the global environment
Question
I'm trying to apply a function that takes some inputs from the global environment held in a vector (BASELINE_CLASSIFICATION_THRESHOLDS) to a dataframe using lapply. In essence it transforms numbers to levels (Mild, Moderate, Severe, Extreme):
BASELINE_CLASSIFICATION_THRESHOLDS <- c(0, 3.5, 6.5, 10.0000001)
value_to_classification <- function(x){
  if((x >= BASELINE_CLASSIFICATION_THRESHOLDS[1]) && (x < BASELINE_CLASSIFICATION_THRESHOLDS[2])){
    classification <- "Mild"
  } else if((x >= BASELINE_CLASSIFICATION_THRESHOLDS[2]) && (x < BASELINE_CLASSIFICATION_THRESHOLDS[3])){
    classification <- "Moderate"
  } else if((x >= BASELINE_CLASSIFICATION_THRESHOLDS[3]) && (x < round(BASELINE_CLASSIFICATION_THRESHOLDS[4]))){
    classification <- "Severe"
  } else {
    classification <- "Extreme"
  }
  return(classification)
}
df <- data.frame(x = runif(10, min = 0, max = 10),
                 y = runif(10, min = 0, max = 10),
                 z = runif(10, min = 0, max = 10))
But when I try to lapply value_to_classification over column x, I get warnings and only a single value back:
lapply(df["x"], value_to_classification)
$x
[1] "Mild"
Warning messages:
1: In (x >= BASELINE_CLASSIFICATION_THRESHOLDS[1]) && (x < BASELINE_CLASSIFICATION_THRESHOLDS[2]) :
'length(x) = 10 > 1' in coercion to 'logical(1)'
2: In (x >= BASELINE_CLASSIFICATION_THRESHOLDS[1]) && (x < BASELINE_CLASSIFICATION_THRESHOLDS[2]) :
'length(x) = 10 > 1' in coercion to 'logical(1)'
On the other hand, if I write
lapply(df[["x"]], value_to_classification)
it works. What I eventually want to do is to write something like
df[c("x1", "x2")] <- lapply(df[c("x", "y")], value_to_classification)
Some searching seems to suggest that my syntax is OK, but I'm clearly getting something wrong. What am I doing wrong, and how can I fix this?
Sincerely and with many thanks in advance
Thomas Philips
Answer 1
Score: 1
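The issue is that value_to_classification is not vectorized: if and && each expect a single logical value. lapply(df["x"], ...) iterates over the columns of a one-column data frame, so the function receives all 10 values at once, while lapply(df[["x"]], ...) iterates over the individual numbers. You can check this by running value_to_classification(c(1,2,3)): it returns only one value instead of 3 (and in R 4.3.0 and later it stops with an error, because && no longer accepts an argument of length greater than one).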
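To see what lapply actually hands to the function in each case, here is a minimal sketch. The data frame df_demo and its values are made up for illustration and are not from the question.

```r
# Hypothetical small data frame, used only for this illustration
df_demo <- data.frame(x = c(1, 5, 8), y = c(2, 4, 9))

# df_demo["x"] is a one-column data frame, i.e. a list with one element,
# so lapply calls the function once and passes the whole column.
lengths_single <- lapply(df_demo["x"], length)

# df_demo[["x"]] is the bare numeric vector, so lapply calls the function
# once per element and each call sees a single number.
lengths_double <- lapply(df_demo[["x"]], length)

print(lengths_single)
print(lengths_double)
```

In the first call the function sees a vector of length 3; in the second it sees three vectors of length 1, which is why the scalar if/else logic only behaves with the [[ ]] form.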
One solution is to vectorize the function:
vectorized_value_to_classification <- Vectorize(value_to_classification)
df[c("x1", "x2")] <- lapply(df[c("x", "y")], vectorized_value_to_classification)
df
          x        y         z       x1       x2
1  3.233599 5.612147 2.9525939     Mild Moderate
2  5.453014 3.659298 8.1642952 Moderate Moderate
3  7.104259 6.333049 7.1706136   Severe Moderate
4  4.199447 3.277607 8.9458447 Moderate     Mild
5  9.352140 7.135801 2.6721405   Severe   Severe
6  7.682951 1.358830 4.2102313   Severe     Mild
7  6.551999 9.986188 1.9995422   Severe   Severe
8  7.436272 9.260056 0.1093833   Severe   Severe
9  5.163593 7.689474 0.2999034 Moderate   Severe
10 7.500994 4.599129 8.5266752   Severe Moderate
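Vectorize() wraps the function in mapply, so it still calls the scalar function once per element. As a possible alternative (not part of the original answer), base R's cut() can classify a whole column in one call. The sketch below assumes the same thresholds as in the question; the name value_to_classification_cut and the small data frame df_demo are hypothetical.

```r
BASELINE_CLASSIFICATION_THRESHOLDS <- c(0, 3.5, 6.5, 10.0000001)

# Hypothetical vectorized variant built on cut().
# right = FALSE gives left-closed intervals [a, b), matching the
# ">= lower && < upper" tests in the original function.
value_to_classification_cut <- function(x) {
  thr <- BASELINE_CLASSIFICATION_THRESHOLDS
  as.character(cut(x,
                   breaks = c(thr[1:3], round(thr[4]), Inf),
                   labels = c("Mild", "Moderate", "Severe", "Extreme"),
                   right = FALSE))
}

# Made-up data for illustration
df_demo <- data.frame(x = c(1, 5, 8), y = c(2, 4, 9.5))
df_demo[c("x1", "x2")] <- lapply(df_demo[c("x", "y")], value_to_classification_cut)
print(df_demo)
```

One behavioural difference to be aware of: the original function labels values below 0 as "Extreme" through its final else branch, whereas this sketch returns NA for them, since they fall outside all the breaks.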