In R, how can I create a function that can take values from columns using dplyr::mutate, but also still take specific strings as values?
Question
This is probably a poorly worded question, but bear with me.
I'm trying to make a function in R that converts values based on user-supplied units as strings. Here's a simplified version:
conv <- function(val, from, to){
  if(from == "g" & to == "kg"){
    return(val / 1000)
  }else if(from == "kg" & to == "g"){
    return(val * 1000)
  }
}
So far, so good. As long as I specifically provide units, it works fine:
> conv(val = 10, from = "g", to = "kg")
[1] 0.01
However, I would also like to be able to use this to convert values in a data frame where I don't know the units beforehand. Instead, the units would come from columns in the data frame.
Let's say I have the following data frame:
library(dplyr)
df <- tibble(VAL = round(runif(n = 10, min = 5, max = 50), 0),
             FROM = sample(c("g", "kg"), size = 10, replace = TRUE),
             TO = sample(c("g", "kg"), size = 10, replace = TRUE))
Here, the units can change, so I can't specify them in the function. But if I just run my function via dplyr::mutate, I get an error:
df_conv <- df |>
  mutate(VAL_CONV = conv(val = VAL, from = FROM, to = TO))
Error in `mutate()`:
ℹ In argument: `VAL_CONV = conv(val = VAL, from = FROM, to = TO)`.
Caused by error in `if (from == "g" & to == "kg") ...`:
! the condition has length > 1
How can I write a function so that it can take values the user types in directly, but also take values provided in columns via mutate?
I'd like to keep the solution in base R, but not totally necessary.
Answer 1
Score: 2
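The error occurs because if expects a single TRUE or FALSE, while mutate passes whole columns to your function. You can try dplyr::case_when, which is vectorized: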
library(dplyr)
conv <- function(val, from, to){
  case_when(from == "g" & to == "kg" ~ val / 1000,
            from == "kg" & to == "g" ~ val * 1000,
            .default = val)
}
Or in base R, nested ifelse:
conv <- function(val, from, to){
  ifelse(from == "g" & to == "kg", val / 1000,
         ifelse(from == "kg" & to == "g", val * 1000, val))
}
Both case_when and ifelse give the same results on your test case, but case_when is much more readable when you have multiple conditions.
In mutate:
df |> mutate(VAL_CONV = conv(val = VAL, from = FROM, to = TO))
#> # A tibble: 10 × 4
#>      VAL FROM  TO     VAL_CONV
#>    <dbl> <chr> <chr>     <dbl>
#>  1    17 kg    kg       17
#>  2    45 kg    kg       45
#>  3    27 kg    g     27000
#>  4    30 g     kg        0.03
#>  5    34 g     kg        0.034
#>  6    47 g     kg        0.047
#>  7    48 kg    g     48000
#>  8    44 g     g        44
#>  9    19 g     g        19
#> 10    24 kg    g     24000
With values typed in directly:
conv(val = 10, from = "g", to = "kg")
#> [1] 0.01
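The readability point above can be pushed a step further once more units are involved. The following is only a sketch in base R, not part of the original answer: it assumes a hypothetical lookup table `to_grams` (the `mg` entry is added purely for illustration) and a hypothetical function name `conv_lookup`. Instead of listing one condition per unit pair, it converts through a common base unit. Indexing a named vector by a character vector is vectorized, so it should work both for single strings and for columns inside mutate.

```r
# Hypothetical lookup table: the number of grams in one of each unit.
# The "mg" entry is an assumption added for illustration only.
to_grams <- c(mg = 0.001, g = 1, kg = 1000)

conv_lookup <- function(val, from, to) {
  # Convert to grams, then from grams to the target unit.
  # Units missing from the table index to NA, so those results are NA.
  val * unname(to_grams[from]) / unname(to_grams[to])
}

# Single values typed in directly
conv_lookup(10, "g", "kg")

# Vectors, as mutate() would pass them from columns
conv_lookup(c(2, 5000), c("kg", "mg"), c("g", "g"))
```

Unlike the case_when and ifelse versions, rows where FROM and TO are the same unit need no special case here, since the two factors cancel.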
Answer 2
Score: 0
I think you have to put a functional either inside your function or in the mutate call in your pipe. For the first option, you can change your function as follows:
conv <- function(val, from, to){
  Map(\(val, from, to) {
    if (from == "g" & to == "kg"){
      return(val / 1000)
    } else if (from == "kg" & to == "g"){
      return(val * 1000)
    } else { # handles problems with example when from/to are the same
      return(NA_real_)
    }
  },
  val, from, to
  )
}
mutate(df, new = conv(VAL, FROM, TO))
The problem with this is that it returns a list rather than a numeric vector, although it is a base R solution. I'd suggest using purrr::pmap_dbl instead:
conv <- function(val, from, to) {
  purrr::pmap_dbl(
    list(val, from, to),
    \(val, from, to) {
      if (from == "g" & to == "kg") {
        return(val / 1000)
      } else if (from == "kg" & to == "g") {
        return(val * 1000)
      } else {
        return(NA_real_)
      }
    }
  )
}
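If staying in base R matters, the list output of Map can be avoided with mapply, which simplifies its results to a vector where possible. This is a minimal sketch, not from the original answer; the names `conv_one` and `conv_base` are hypothetical, and the scalar logic simply mirrors the function above.

```r
# Scalar helper: same branching logic as above, one value at a time.
conv_one <- function(val, from, to) {
  if (from == "g" && to == "kg") {
    val / 1000
  } else if (from == "kg" && to == "g") {
    val * 1000
  } else {
    NA_real_  # same unit on both sides, or an unknown pair
  }
}

# mapply() calls conv_one() once per element (recycling shorter
# arguments) and simplifies the results to a numeric vector.
conv_base <- function(val, from, to) {
  mapply(conv_one, val, from, to, USE.NAMES = FALSE)
}

conv_base(10, "g", "kg")
conv_base(c(1, 2), c("g", "kg"), c("kg", "g"))
```

One caveat: on zero-length input mapply returns an empty list rather than an empty numeric vector, so an empty data frame may need separate handling.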
Finally, you can leave your function as is and do something like this:
conv2 <- function(val, from, to){
  if(from == "g" & to == "kg"){
    return(val / 1000)
  }else if(from == "kg" & to == "g"){
    return(val * 1000)
  } else {
    return(NA_real_)
  }
}
df <- tibble(VAL = round(runif(n = 10, min = 5, max = 50), 0),
             FROM = sample(c("g", "kg"), size = 10, replace = TRUE),
             TO = sample(c("g", "kg"), size = 10, replace = TRUE)) |>
  rowwise() |>
  # purrr:: prefix added because only dplyr is attached in this example
  mutate(new = purrr::pmap_dbl(list(VAL, FROM, TO), conv2))