Problem using lapply in R with a function that uses constants in the global environment

Question

I'm trying to use lapply to apply a function to the columns of a data frame. The function reads its thresholds from a vector in the global environment (BASELINE_CLASSIFICATION_THRESHOLDS). In essence, it converts numbers to levels (Mild, Moderate, Severe, Extreme):

BASELINE_CLASSIFICATION_THRESHOLDS <- c(0, 3.5, 6.5, 10.0000001)

value_to_classification <- function(x){

  if((x >= BASELINE_CLASSIFICATION_THRESHOLDS[1]) && (x < BASELINE_CLASSIFICATION_THRESHOLDS[2])){
    classification <- "Mild"
  } else if((x >= BASELINE_CLASSIFICATION_THRESHOLDS[2]) && (x < BASELINE_CLASSIFICATION_THRESHOLDS[3])){
    classification <- "Moderate"
  } else if((x >= BASELINE_CLASSIFICATION_THRESHOLDS[3]) && (x < round(BASELINE_CLASSIFICATION_THRESHOLDS[4]))){
    classification <- "Severe"
  } else {
    classification <- "Extreme"
  }
  return(classification)
}

df <- data.frame(x = runif(10, min = 0, max = 10),
                 y = runif(10, min = 0, max = 10),
                 z = runif(10, min = 0, max = 10))

But when I try to lapply value_to_classification over column x, I get warnings and only a single value back:

lapply(df["x"], value_to_classification)
$x
[1] "Mild"

Warning messages:
1: In (x >= BASELINE_CLASSIFICATION_THRESHOLDS[1]) && (x < BASELINE_CLASSIFICATION_THRESHOLDS[2]) :
  'length(x) = 10 > 1' in coercion to 'logical(1)'
2: In (x >= BASELINE_CLASSIFICATION_THRESHOLDS[1]) && (x < BASELINE_CLASSIFICATION_THRESHOLDS[2]) :
  'length(x) = 10 > 1' in coercion to 'logical(1)'

On the other hand, if I write

lapply(df[["x"]], value_to_classification)

it works. What I eventually want to do is to write something like

df[c("x1", "x2")] <- lapply(df[c("x", "y")], value_to_classification)

Some searching seems to suggest that my syntax is OK, but I'm clearly getting something wrong. What am I doing wrong, and how can I fix this?

Sincerely and with many thanks in advance

Thomas Philips

Answer 1

Score: 1

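The issue is that value_to_classification is not vectorized: if and && expect a single TRUE or FALSE, so the function handles only one number at a time. lapply(df["x"], ...) iterates over the columns of a one-column data frame and passes all 10 values to the function in a single call, whereas lapply(df[["x"]], ...) iterates over the elements of the vector and passes them one at a time. You can check this by running value_to_classification(c(1,2,3)): it returns only one value (instead of 3).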
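
To make the difference between the two calls concrete, here is a small sketch of what lapply iterates over in each case. The data frame demo is a made-up example, not the asker's data:

```r
# Hypothetical three-row data frame, used only for illustration.
demo <- data.frame(x = c(1, 5, 8))

# Single brackets keep the data frame, so lapply visits one element:
# the whole column x, as a vector of length 3.
by_column <- lapply(demo["x"], length)

# Double brackets extract the numeric vector, so lapply visits each
# number in turn, and each one has length 1.
by_element <- lapply(demo[["x"]], length)

by_column   # a list with one entry: x = 3
by_element  # a list with three entries, each equal to 1
```

So any function passed to lapply(df[c("x", "y")], ...) has to accept a whole column and return one result per element.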

One solution is to vectorize the function:

vectorized_value_to_classification <- Vectorize(value_to_classification)

df[c("x1", "x2")] <- lapply(df[c("x", "y")], vectorized_value_to_classification)

df
          x        y         z       x1       x2
1  3.233599 5.612147 2.9525939     Mild Moderate
2  5.453014 3.659298 8.1642952 Moderate Moderate
3  7.104259 6.333049 7.1706136   Severe Moderate
4  4.199447 3.277607 8.9458447 Moderate     Mild
5  9.352140 7.135801 2.6721405   Severe   Severe
6  7.682951 1.358830 4.2102313   Severe     Mild
7  6.551999 9.986188 1.9995422   Severe   Severe
8  7.436272 9.260056 0.1093833   Severe   Severe
9  5.163593 7.689474 0.2999034 Moderate   Severe
10 7.500994 4.599129 8.5266752   Severe Moderate
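
As an alternative to Vectorize (this is not part of the original answer), base R's cut() is already vectorized and can express the same intervals directly. The sketch below is only one possible approach: the names thresholds, classify_with_cut and demo are hypothetical, and it assumes the upper bound is meant to be 10, which is what round(BASELINE_CLASSIFICATION_THRESHOLDS[4]) evaluates to in the question.

```r
# Same cut points as in the question, with the upper bound already rounded to 10.
thresholds <- c(0, 3.5, 6.5, 10)

# Hypothetical helper; classifies a whole numeric vector in one call.
classify_with_cut <- function(x) {
  # right = FALSE makes the intervals closed on the left:
  # [0, 3.5), [3.5, 6.5), [6.5, 10).
  bins <- cut(x, breaks = thresholds,
              labels = c("Mild", "Moderate", "Severe"), right = FALSE)
  out <- as.character(bins)
  # cut() returns NA for values outside [0, 10); the original function
  # sends those to its final else branch, i.e. "Extreme".
  out[is.na(out)] <- "Extreme"
  out
}

# Made-up data frame, used only to show the column-wise assignment.
demo <- data.frame(x = c(1, 7), y = c(4, 10))
demo[c("x1", "x2")] <- lapply(demo[c("x", "y")], classify_with_cut)
demo
```

One difference to keep in mind: this sketch also labels missing inputs (NA) as "Extreme", whereas the original function would stop with an error on an NA, so it may need an explicit NA check depending on the data.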

Posted by huangapple on 2023-06-15 06:01:45
When reposting, please keep the link to this article: https://go.coder-hub.com/76477861.html