In R, how can I create a function that can take values from columns using dplyr::mutate, but also still take specific strings as values?

Question

This is probably a poorly worded question, but bear with me.

I'm trying to make a function in R that converts values based on user-supplied units as strings. Here's a simplified version:

conv <- function(val, from, to){
  if(from == "g" & to == "kg"){
    return(val / 1000)
  }else if(from == "kg" & to == "g"){
    return(val * 1000)
  }
}

So far, so good. As long as I specifically provide units, it works fine:

> conv(val = 10, from = "g", to = "kg")
[1] 0.01

However, I would also like to be able to use this to convert values in a data frame where I don't know the units beforehand. Instead, the units would come from columns in the data frame.

Let's say I have the following data frame:

library(dplyr)

df <- tibble(VAL = round(runif(n = 10, min = 5, max = 50), 0),
             FROM = sample(c("g", "kg"), size = 10, replace = TRUE),
             TO = sample(c("g", "kg"), size = 10, replace = TRUE))

Here, the units can change so I can't specify them in the function. But if I just run my function via dplyr::mutate, I get an error:

df_conv <- df |>
  mutate(VAL_CONV = conv(val = VAL, from = FROM, to = TO))

Error in `mutate()`:
ℹ In argument: `VAL_CONV = conv(val = VAL, from = FROM, to = TO)`.
Caused by error in `if (from == "g" & to == "kg") ...`:
! the condition has length > 1

How can I write a function so that it can take values the user types in directly, but also take values provided in columns via mutate?

I'd like to keep the solution in base R, but not totally necessary.

Answer 1

Score: 2

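The error occurs because if() expects a single TRUE or FALSE, while mutate() passes whole columns to your function. A vectorized conditional avoids this; you can try dplyr::case_when: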

library(dplyr)

conv <- function(val, from, to){
  case_when(from == "g" & to == "kg" ~ val / 1000,
            from == "kg" & to == "g" ~ val * 1000,
            .default = val)
}

Or in base R, nested ifelse:

conv <- function(val, from, to){
  ifelse(from == "g" & to == "kg", val / 1000,
         ifelse(from == "kg" & to == "g", val * 1000, val))
}

Both case_when and ifelse give the same results on your test case, but case_when would be much more readable when you have multiple conditions.

In mutate:

df |> mutate(VAL_CONV = conv(val = VAL, from = FROM, to = TO))
#> # A tibble: 10 × 4
#>      VAL FROM  TO     VAL_CONV
#>    <dbl> <chr> <chr>     <dbl>
#>  1    17 kg    kg       17    
#>  2    45 kg    kg       45    
#>  3    27 kg    g     27000    
#>  4    30 g     kg        0.03 
#>  5    34 g     kg        0.034
#>  6    47 g     kg        0.047
#>  7    48 kg    g     48000    
#>  8    44 g     g        44    
#>  9    19 g     g        19    
#> 10    24 kg    g     24000

Take user input:

conv(val = 10, from = "g", to = "kg")
#> [1] 0.01
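
As a possible extension of the readability point above: once there are more than a couple of units, one base R option is to replace the chain of conditions with a lookup table of factors. The sketch below is not part of the original answer; the names unit_in_g and conv_lookup are hypothetical, and it assumes every unit can be expressed as a multiple of one base unit (grams here).

```r
# Hypothetical lookup table: how many grams one unit of each label represents.
unit_in_g <- c(g = 1, kg = 1000)

conv_lookup <- function(val, from, to) {
  # Stop on unit labels that are missing from the table instead of returning NA silently.
  unknown <- setdiff(c(from, to), names(unit_in_g))
  if (length(unknown) > 0) {
    stop("unknown unit(s): ", paste(unknown, collapse = ", "))
  }
  # Indexing a named vector is vectorized, so single strings and whole columns both work.
  unname(val * unit_in_g[from] / unit_in_g[to])
}

# Single values typed in directly
conv_lookup(10, "g", "kg")

# Vectors, as mutate() would supply them from columns
conv_lookup(c(27, 30, 44), c("kg", "g", "g"), c("g", "kg", "g"))
```

Adding a unit then only means adding one entry to the table, and rows where from and to are equal come back unchanged, as in the case_when version.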

Answer 2

Score: 0

I think you have to put a functional either in your function or in the mutate call in your pipe. For the first option, you can change your function as follows:

conv <- function(val, from, to){
  Map(\(val, from, to) {
    if (from == "g" & to == "kg"){
      return(val / 1000)
    } else if (from == "kg" & to == "g"){
      return(val * 1000)
    } else { # handles problems with example when from/to are the same
      return(NA_real_)
    }
  }, 
  val, from, to
  ) 
}

mutate(df, new = conv(VAL, FROM, TO))

The drawback is that Map returns a list, so the new column is a list column rather than a numeric one, although this stays in base R. I'd suggest using purrr::pmap_dbl instead:

conv <- function(val, from, to) {
  purrr::pmap_dbl(
    list(val, from, to), 
    \(val, from, to) {
      if (from == "g" & to == "kg") {
        return(val / 1000)
      } else if (from == "kg" & to == "g") {
        return(val * 1000)
      } else {
        return(NA_real_)
      }
    }
  ) 
}
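
If staying in base R matters, a rough equivalent is mapply, which applies the function element by element like Map but simplifies the result to an atomic vector. This is a sketch under that assumption rather than part of the original answer; conv_mapply is a hypothetical name, and unlike pmap_dbl, mapply does not enforce a double result (for zero-length input it returns an empty list).

```r
conv_mapply <- function(val, from, to) {
  mapply(
    function(val, from, to) {
      # Each call sees one element of each argument, so a scalar `if` is safe here.
      if (from == "g" && to == "kg") {
        val / 1000
      } else if (from == "kg" && to == "g") {
        val * 1000
      } else {
        NA_real_
      }
    },
    val, from, to,
    USE.NAMES = FALSE  # never attach element names to the result
  )
}

# Works for single values and for equal-length vectors (such as columns)
conv_mapply(10, "g", "kg")
conv_mapply(c(27, 30, 44), c("kg", "g", "g"), c("g", "kg", "g"))
```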

Finally, you can leave your function as is and do something like this:

conv2 <- function(val, from, to){
  if(from == "g" & to == "kg"){
    return(val / 1000)
  }else if(from == "kg" & to == "g"){
    return(val * 1000)
  } else {
    return(NA_real_)
  }
}

df <- tibble(VAL = round(runif(n = 10, min = 5, max = 50), 0),
             FROM = sample(c("g", "kg"), size = 10, replace = TRUE),
             TO = sample(c("g", "kg"), size = 10, replace = TRUE)) |> 
  rowwise() |> 
  mutate(new = purrr::pmap_dbl(list(VAL, FROM, TO), conv2)) # purrr:: prefix, since only dplyr is attached
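
Another way to leave the scalar function untouched, this time without purrr or rowwise(), might be base R's Vectorize, which wraps a function so that it is applied element by element via mapply. The sketch below repeats conv2 so that it runs on its own; conv_vec is a hypothetical name and this variant is not from the original answer.

```r
# Scalar function, unchanged from the answer above
conv2 <- function(val, from, to){
  if(from == "g" & to == "kg"){
    return(val / 1000)
  }else if(from == "kg" & to == "g"){
    return(val * 1000)
  } else {
    return(NA_real_)
  }
}

# Wrap it once; the wrapper loops over the elements of val, from and to.
conv_vec <- Vectorize(conv2, USE.NAMES = FALSE)

conv_vec(10, "g", "kg")
conv_vec(c(27, 30, 44), c("kg", "g", "g"), c("g", "kg", "g"))
```

The wrapper can then be called inside mutate() like any vectorized function, for example mutate(df, new = conv_vec(VAL, FROM, TO)).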

huangapple
  • Posted on June 13, 2023, 11:30:29
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76461527.html