Using parse_expr(), quo_name(), and enquo() to define a character object for plotting country-wise graphs in ggplot


Question

The first line of the function defines a Country_name object using the rlang package. Running that line on its own fails because enquo() expects the bare name of a function argument (a symbol), and it was given a character string ('United States') instead.

enquo() is meant to be called inside a function, on one of that function's arguments, so the line cannot be tested in isolation with a literal value. Removing the quotes does not help either: United States contains a space, so it is not a valid bare symbol in R. The answers below explain what the line does and why passing the country as an ordinary string is the simpler choice.


I have a function from a source that takes a couple of inputs, including a country name, and returns a graph for that country. The first line of the function defines a Country_name object in a way I cannot understand. When I pull that line out of the function and run it separately, it returns an error, although it works fine inside the function. Does anyone know why this happens, and what the purpose of that line is?

function(df, dfline, Country_name){
  Country_name <- rlang::parse_expr(quo_name(enquo(Country_name)))
  df %>%
    filter(Country == Country_name ...
}

Pulling out the first line and running it separately returns an error:

parse_expr(quo_name(enquo('United States')))

### Error in `enquo()`:
### ! `arg` must be a symbol
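
A minimal sketch of why the same line behaves differently in the two settings, assuming only that rlang is installed. capture_name is a hypothetical helper, not part of the original function. Inside a function, enquo() is given the name of an argument and captures whatever expression the caller supplied; at the top level with a string literal there is no argument to capture.

```r
library(rlang)

# Hypothetical helper: mirrors the first two steps of the original line.
capture_name <- function(Country_name) {
  quo_name(enquo(Country_name))
}

# Brazil is captured as text; it is never evaluated, so it need not exist.
captured <- capture_name(Brazil)

# Outside a function, with a string literal, enquo() signals an error.
# tryCatch() stores the message so the script can continue.
top_level <- tryCatch(
  enquo("United States"),
  error = function(e) conditionMessage(e)
)
```

Printing captured and top_level side by side shows the contrast: the first is the country name as a string, the second is the error message quoted in the question.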

Answer 1

Score: 4


Assume this was your dataset:

library(dplyr)  # also re-exports tribble() from tibble

df1 <- tribble(~ Country, ~ Value,
               'Brazil', 1,
               'Brazil', 2,
               'Canada', 3,
               'Canada', 4)

> df1
# A tibble: 4 × 2
  Country Value
  <chr>   <dbl>
1 Brazil      1
2 Brazil      2
3 Canada      3
4 Canada      4

You could write your custom filter function simply as:

fun1 <- function(df, Country_name){
  df %>%
    filter(Country == Country_name)
}

> fun1(df1, 'Brazil')
# A tibble: 2 × 2
  Country Value
  <chr>   <dbl>
1 Brazil      1
2 Brazil      2

But imagine you want to be able to omit the quotes around 'Brazil' and still get the same output. If you made no modification you would get an error:

> fun1(df1, Brazil)
# ...
#! object 'Brazil' not found
# ...

R treats Brazil as a variable and looks it up, here ending in your global environment. It fails to find it and returns an error. If Brazil were a variable, you could get weird results:

Brazil <- 'Canada'

> fun1(df1, Brazil)
# A tibble: 2 × 2
  Country Value
  <chr>   <dbl>
1 Canada      3
2 Canada      4

R sees that Brazil has the value 'Canada', binds that value to Country_name, and uses that value in the filter.

That's not what you wanted. You wanted to get the actual word Brazil, and not the value it represents. That is what the line you were referring to does. I'll explain how it works below.

The first step is telling R "I don't want you to evaluate the argument you received, I just want you to save its text". That is, we want to delay the evaluation of the expression that was passed to Country_name. That can be done in several ways:

  • substitute(Country_name) in base R, as Nir Graham noted;
    > substitute returns the parse tree for the (unevaluated) expression ... -substitute's help page.

  • enquo(Country_name) with rlang, as your function did.
    > enquo() and enquos() defuse function arguments. A defused expression can be examined, modified, and injected into other expressions. -enquo's help page.

  • enexpr(Country_name) with rlang, also as Nir Graham noted;
    > enexpr() and enexprs() are like enquo() and enquos() but return naked expressions instead of quosures. -enexpr's help page

So they all have very similar effects. The biggest difference is that enquo() returns quosures instead of naked expressions. In simple terms, quosures are expressions that also point to the environment where the values of their variables should be found*. We don't need that (but it's also not a problem), as the expression in question won't be evaluated; we just want its text.
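
As a rough illustration of that difference, the sketch below captures the same argument with all three functions. show_captures is a hypothetical helper introduced only for this comparison, and rlang is assumed to be installed.

```r
library(rlang)

# Hypothetical helper that captures the same argument three ways.
show_captures <- function(x) {
  list(
    base = substitute(x),  # bare expression (base R)
    expr = enexpr(x),      # bare expression (rlang)
    quo  = enquo(x)        # expression plus the environment it came from
  )
}

caps <- show_captures(Brazil)

is.symbol(caps$base)     # substitute() gives a plain symbol
is.symbol(caps$expr)     # so does enexpr()
is_quosure(caps$quo)     # enquo() wraps the symbol in a quosure
quo_get_expr(caps$quo)   # the bare symbol stored inside the quosure
quo_get_env(caps$quo)    # the environment the call was made from
```

For the purpose of extracting text, only the expression part matters, which is why all three approaches end up interchangeable here.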

After that we just want to get the text of that defused expression, which can be done with:

  • as.character();
  • deparse1();
  • rlang::quo_name();
  • rlang::expr_name().

And others. Thus, the options are similar to what Nir Graham did:

fun2_base <- function(df, Country_name){
  Country_name <- deparse1(substitute(Country_name))
  df %>%
    filter(Country == Country_name)
}

fun2_rlang <- function(df, Country_name){
  Country_name <- as.character(enexpr(Country_name))
  df %>%
    filter(Country == Country_name)
}

fun2_base(df1, Brazil)
fun2_rlang(df1, Brazil)

All yield:

# A tibble: 2 × 2
  Country Value
  <chr>   <dbl>
1 Brazil      1
2 Brazil      2

Note that we didn't need to remove the Brazil variable, because it's not being evaluated.
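
The text-extraction helpers listed earlier can be compared directly on a defused symbol. This is only a sketch, assuming rlang is installed and R is at least 4.0.0 (for deparse1()); the variable names sym and txt are illustrative.

```r
library(rlang)

# A defused symbol, as substitute() or enexpr() would return it.
sym <- quote(Brazil)

# Each helper turns the symbol into the same string.
txt <- c(
  as_character = as.character(sym),
  deparse1     = deparse1(sym),
  quo_name     = quo_name(quo(Brazil)),  # works on a quosure
  expr_name    = expr_name(sym)
)
txt
```

The helpers agree for a simple symbol like this one; they can differ for longer expressions such as calls, which is outside what the function in the question needs.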

*: To learn more, read about tidy evaluation and the metaprogramming chapters of "Advanced R".

Answer 2

Score: 2


First let's build a minimal reprex:

library(dplyr, warn.conflicts = FALSE)
fun <- function(df, Country_name){
  Country_name <- rlang::parse_expr(quo_name(enquo(Country_name)))
  df %>%
    filter(Country == Country_name)
}
df <- data.frame(x = 1:2, Country = c("Belgium", "Ukraine"))
df
#>   x Country
#> 1 1 Belgium
#> 2 2 Ukraine

fun(df, Ukraine)
#>   x Country
#> 1 2 Ukraine

Then let's use boomer to print intermediate outputs:

fun1 <- boomer::rig(fun)
fun1(df, Ukraine)
#> 👇 fun
#> 💣 rlang::parse_expr(quo_name(enquo(Country_name)))
#> · 💣 quo_name(enquo(Country_name))
#> · · 💣 💥 enquo(Country_name)
#> · · <quosure>
#> · · expr: ^Ukraine
#> · · env:  global
#> · ·
#> · 💥 quo_name(enquo(Country_name))
#> · [1] "Ukraine"
#> ·
#> 💥 rlang::parse_expr(quo_name(enquo(Country_name)))
#> Ukraine
#>
#> 💣 df %>% filter(Country == Country_name)
#> · 💣 filter(., Country == Country_name)
#> · · df :
#> · ·   x Country
#> · · 1 1 Belgium
#> · · 2 2 Ukraine
#> · · Country_name :
#> · · Ukraine
#> · · 💣 💥 Country == Country_name
#> · · [1] FALSE  TRUE
#> · ·
#> · 💥 filter(., Country == Country_name)
#> ·   x Country
#> · 1 2 Ukraine
#> ·
#> 💥 df %>% filter(Country == Country_name)
#>   x Country
#> 1 2 Ukraine
#>
#> 👆 fun
#>   x Country
#> 1 2 Ukraine

We see that:

  • enquo() captures the input into a quosure.
  • quo_name() extracts the expression as a string.
  • parse_expr() builds a symbol from the string.
  • This symbol is used in the equality test, where it is coerced to character (try quote(a) == "a" to check how this works).
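
The coercion in the last bullet can be checked in base R alone. This is a small sketch that relies on the behaviour seen in the reprex output (the R documentation for comparison operators describes symbols being deparsed to strings before comparison); the names sym, countries, single, and vec are illustrative.

```r
# A symbol compared with a string is deparsed to text first.
sym <- quote(Ukraine)
countries <- c("Belgium", "Ukraine")

single <- sym == "Ukraine"   # symbol against one string
vec    <- countries == sym   # symbol against a character vector

single
vec
```

The vector comparison is the same operation that filter() ends up performing in the function from the question, which is why the symbol round trip happens to work.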

If we want to understand the objects better we might use {constructive} in the print argument. Instead of printing the objects it will print the code to reconstruct them.

# remotes::install_github("cynkra/constructive")
fun2 <- boomer::rig(fun, print = constructive::construct)
fun2(df, Ukraine)
#> 👇 fun
#> 💣 rlang::parse_expr(quo_name(enquo(Country_name)))
#> · 💣 quo_name(enquo(Country_name))
#> · · 💣 💥 enquo(Country_name)
#> · · rlang::new_quosure(quote(Ukraine), .GlobalEnv)
#> · ·
#> · 💥 quo_name(enquo(Country_name))
#> · "Ukraine"
#> ·
#> 💥 rlang::parse_expr(quo_name(enquo(Country_name)))
#> quote(Ukraine)
#>
#> 💣 df %>% filter(Country == Country_name)
#> · 💣 filter(., Country == Country_name)
#> · · df :
#> · · data.frame(x = 1:2, Country = c("Belgium", "Ukraine"))
#> · · Country_name :
#> · · quote(Ukraine)
#> · · 💣 💥 Country == Country_name
#> · · c(FALSE, TRUE)
#> · ·
#> · 💥 filter(., Country == Country_name)
#> · data.frame(x = 2L, Country = "Ukraine")
#> ·
#> 💥 df %>% filter(Country == Country_name)
#> data.frame(x = 2L, Country = "Ukraine")
#>
#> 👆 fun
#>   x Country
#> 1 2 Ukraine

boomer::boom(fun(df, Ukraine), print = function(x) print(constructive::construct(x)))
#> 💣 💥 fun(df, Ukraine)
#> data.frame(x = 2L, Country = "Ukraine")
#>   x Country
#> 1 2 Ukraine

Created on 2023-06-02 with reprex v2.0.2

The bottom line is that the code is bloated, weird, and unsafe. You shouldn't pass strings as bare variable names just to save typing the quotes: how would you provide "United Kingdom"?
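
To make the "United Kingdom" problem concrete, here is a sketch of the parsing step in isolation, assuming rlang is installed. The variable names and the fallback string are illustrative. A name with a space cannot be turned back into a single symbol, and the unquoted call fun(df, United Kingdom) would not even get that far because it is a syntax error.

```r
library(rlang)

# A one-word name parses into a symbol without trouble.
ok <- parse_expr("Ukraine")

# A name with a space is not valid R code, so parsing fails.
# tryCatch() substitutes a marker string for the error.
bad <- tryCatch(
  parse_expr("United Kingdom"),
  error = function(e) "could not parse"
)

ok
bad
```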

The right way to do it is simply to provide Country_name as a string and write:

fun <- function(df, Country_name){
  df %>%
    filter(Country == Country_name)
}

Or, to be extra safe, in case df could contain a Country_name column that would collide with the argument:

fun <- function(df, Country_name){
  df %>%
    filter(Country == .env$Country_name)
}

or:

fun <- function(df, Country_name){
  df %>%
    filter(Country == !!Country_name)
}
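
A small sketch of the collision that the .env pronoun guards against, assuming dplyr is installed. The data frame and the function names fun_plain and fun_env are made up for illustration. When the data frame has its own Country_name column, the plain version compares two columns and ignores the argument.

```r
library(dplyr, warn.conflicts = FALSE)

# Hypothetical data with a column named like the function argument.
df_clash <- data.frame(
  Country      = c("Belgium", "Ukraine"),
  Country_name = c("Belgium", "Ukraine")
)

fun_plain <- function(df, Country_name) {
  df %>% filter(Country == Country_name)        # column masks the argument
}

fun_env <- function(df, Country_name) {
  df %>% filter(Country == .env$Country_name)   # always uses the argument
}

n_plain <- nrow(fun_plain(df_clash, "Ukraine"))  # every row matches itself
n_env   <- nrow(fun_env(df_clash, "Ukraine"))    # only the Ukraine row
```

Here n_plain counts both rows, because each row's Country equals its own Country_name, while n_env counts only the row for the requested country.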

Answer 3

Score: 0


It's using 3 function calls to do what is achievable in 2, whether in base R or using rlang.

library(dplyr)
library(rlang)

myfilt_base <- function(x){
  mysym <- deparse1(substitute(x))
  filter(iris, Species == mysym)
}

myfilt_base(versicolor)

myfilt_rlang <- function(x){
  mysym <- as.character(enexpr(x))
  filter(iris, Species == mysym)
}

myfilt_rlang(virginica)

huangapple
  • Posted on 2023-05-29 10:51:38
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76354420.html