R package or function to record filters applied to your tibble

Question

Is there any R function or package that records the operations applied to a tibble/data frame?

For example, if I did the following:

library(dplyr)
data(iris)
my_table <- iris %>% filter(Sepal.Length > 6) %>% filter(Species == 'virginica')

I would want the output to be something of the form:

display_filter_function(my_table)

output:
Step   filter
1       Sepal.Length > 6
2       Species == 'virginica'

I am thinking of something similar to the functionality provided by the recipes package, but without needing to use the step_ family of functions.

Answer 1

Score: 4

I've written a little module for you. It is a standalone resource and has only one dependency beyond base R: namely dplyr itself. The module is long, so I have put it at the bottom of this post. You can find the code itself under the Module section, and its usage is demonstrated under the Usage section.

This approach could theoretically be extended to all dplyr functions, and to other generic functions as well. To keep things manageable, I have implemented it for dplyr::filter() alone.

Background

This module leverages the R concept of generic methods, like print() and format() and mean() and summary(). Suppose you wish to print() a data.frame object. The generic print() function...

print
#> function (x, ...)
#> UseMethod("print")
#> <bytecode: 0x000002429186e2c8>
#> <environment: namespace:base>

...does not do the work itself! Rather, it dispatches to some print.*() method, via the line:

UseMethod("print")

Now the native data.frame class has its own special print() method called print.data.frame().

print.data.frame
#> function (x, ..., digits = NULL, quote = FALSE, right = TRUE, row.names = TRUE, max = NULL)
#> {
#>     n <- length(row.names(x))
#>     ⋮
#>     invisible(x)
#> }
#> <bytecode: 0x000002429186b7e0>
#> <environment: namespace:base>

So when UseMethod() seeks a matching ("print") method, it finds print.data.frame() ready and waiting! It is the print.data.frame() function that actually handles the printing for the data.frame.

More generally, a generic function like fn()...

fn <- function(x, ...) {
  UseMethod("fn")
}

can be implemented for an S3 class like cls, with a function of the form fn.cls():

fn.cls <- function(x, arg_1, arg_2, arg_3, ...) {
  # ...
}

Note

The fn.default() method handles fn() for classes that lack a method of their own. So in the absence of a print.cls() function, UseMethod() would dispatch a cls object to print.default():

print.default
#> function (x, digits = NULL, quote = TRUE, na.print = NULL, print.gap = NULL, right = FALSE, max = NULL, width = NULL, useSource = TRUE, ...)
#> {
#>     args <- pairlist(digits = digits, quote = quote, na.print = na.print, ...
#>     ⋮
#>     .Internal(print.default(x, args, missings))
#> }
#> <bytecode: 0x0000024291917b80>
#> <environment: namespace:base>
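
As a small illustration of the dispatch rules described here, the following sketch defines a toy generic with one class-specific method and a default. The names describe(), describe.celsius(), and the class celsius are made up for this example; they are not part of base R or of the module.

```r
# A toy generic: it does no work itself and only dispatches on class(x).
describe <- function(x, ...) {
  UseMethod("describe")
}

# Method for the (hypothetical) S3 class "celsius".
describe.celsius <- function(x, ...) {
  paste0(unclass(x), " degrees Celsius")
}

# Fallback for every class without a method of its own.
describe.default <- function(x, ...) {
  "an object with no describe() method of its own"
}

temp <- structure(21.5, class = "celsius")

describe(temp)  # dispatches to describe.celsius()
describe(1:3)   # no describe.integer() exists, so describe.default() handles it
```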

Approach

By defining a custom S3 class called hst_obj — "historical object" — I override the "generic" behavior of dplyr::filter()...

dplyr::filter
#> function (.data, ..., .preserve = FALSE)
#> {
#>     UseMethod("filter")
#> }
#> <bytecode: 0x0000024292d10b40>
#> <environment: namespace:dplyr>

...which is designed to dispatch via UseMethod("filter"). To that end, I implement the function filter.hst_obj():

filter.hst_obj
#> function (.data, ..., .preserve = FALSE)
#> {
#>     .update_hst(x = `class<-`(dplyr::filter(.data = un_hst_obj(.data, ...
#> }
#> <bytecode: 0x000002428f842958>

When you call dplyr::filter() on a hst_obj object, then filter.hst_obj() jumps into action! Whenever it filters the object, it also records the filtration criteria in the special attribute obj_hst, which maintains the "object history".
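
The same idea can be sketched in base R alone. This is a simplified stand-in and not the module's code: it intercepts base::subset() rather than dplyr::filter(), and the class name tracked, the attribute name history, and the helper as_tracked() are assumptions invented for the illustration.

```r
# Mark a data frame with the (hypothetical) S3 class "tracked".
as_tracked <- function(x) {
  class(x) <- c("tracked", setdiff(class(x), "tracked"))
  x
}

# Method that intercepts subset() for "tracked" objects.
subset.tracked <- function(x, subset, ...) {
  cond <- substitute(subset)          # the condition, unevaluated
  old_history <- attr(x, "history")   # NULL if nothing has been recorded yet

  # Drop the extra class and filter the plain data frame.
  plain <- x
  class(plain) <- setdiff(class(x), "tracked")
  keep <- eval(cond, envir = plain, enclos = parent.frame())
  out <- plain[keep & !is.na(keep), , drop = FALSE]

  # Append the condition text to the history and restore the class.
  attr(out, "history") <- c(old_history, paste(deparse(cond), collapse = " "))
  as_tracked(out)
}

res <- as_tracked(iris)
res <- subset(res, Sepal.Length > 6)
res <- subset(res, Species == "virginica")

attr(res, "history")  # one entry per subset() call, in order
```

The module follows the same pattern (strip the class, delegate to the real function, restore the class, record the criteria), but stores a tibble with both the code and its text rather than a character vector.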

This history is a tibble...

# A tibble: m × 4
   step order expr       text            
  <int> <int> <list>     <chr>           
1     1     1 <language> Sepal.Length > 6
⋮      ⋮     ⋮      ⋮             ⋮

...which has four columns:

  • step: The filter() step in the workflow.

      iris %>%                              # step
        filter(Sepal.Length > 6) %>%        # } 1
        filter(Species == 'virginica') %>%  # } 2
        ...                                 #   ⋮

  • order: The criterion within the filter() step.

      filter(a < 10, b == 3 | c > 5, ...)
      #      |----|  |------------|
      # order:  1           2        ...

  • expr: The actual code (language) for the criterion (Sepal.Length > 6), useful for programmatic manipulation of R.

  • text: A textual (character) representation of that code ("Sepal.Length > 6"), for visual clarity.
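
To make the four columns concrete, here is a minimal sketch of how the conditions of a single filter()-like call could be captured into such a table. It is not the module's implementation: the function name record_conditions() is hypothetical, and a base data.frame stands in for the tibble.

```r
# Capture the conditions passed through `...` for one (hypothetical) step.
record_conditions <- function(step, ...) {
  # Keep the conditions unevaluated, as language objects.
  exprs <- as.list(substitute(list(...)))[-1]

  data.frame(
    step  = rep(as.integer(step), length(exprs)),
    order = seq_along(exprs),
    expr  = I(exprs),  # list column holding the code itself
    text  = vapply(exprs, function(e) paste(deparse(e), collapse = " "), character(1)),
    stringsAsFactors = FALSE
  )
}

hst <- record_conditions(1L, a < 10, b == 3 | c > 5)

hst$text       # the criteria as character strings
hst$expr[[2]]  # the second criterion as a call that can be evaluated later
```

Note that the conditions are never evaluated here, so a, b, and c do not need to exist.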

Usage

You'll want to load dplyr itself, and then source() the module (mod.R) from (say) your working directory.

# Load the `dplyr` package...
library(dplyr)

# ...along with the `hst_obj` functions from the module:
source("./mod.R")

Warning

The method filter.hst_obj() must be loaded into the same workspace where you call dplyr::filter(). Per the documentation for UseMethod():
> UseMethod...search[es] for methods in two places: in the environment in which the generic function is called, and in the registration data base for the environment in which the generic is defined (typically a namespace). So methods for a generic function need to be available in the environment of the call to the generic, or they must be registered.

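The "or they must be registered" clause points to an alternative for cases where the method cannot live in the calling environment: registering it explicitly with base R's registerS3method(). The sketch below demonstrates this on the base generic format() with a made-up class reading. Applying the same call to the module, as in the final comment, is an assumption that I have not tested.

```r
# Define the method inside local(), so that it is NOT visible by name
# from the environment where format() is later called.
local({
  format_reading <- function(x, ...) {
    paste0("reading: ", unclass(x))
  }

  # Register it for the (hypothetical) class "reading" with the generic
  # format(), which is defined in the base namespace.
  registerS3method("format", "reading", format_reading, envir = asNamespace("base"))
})

r <- structure(42, class = "reading")
format(r)  # found through the registry, not through the calling environment

# By analogy (untested assumption), the module's method might be registered with:
# registerS3method("filter", "hst_obj", filter.hst_obj, envir = asNamespace("dplyr"))
```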

Here is a simple workflow on the iris dataset.

iris %>%
  filter(Sepal.Length > 7, Sepal.Width <= 3) %>%
  filter(Petal.Width > 2)
#> Sepal.Length Sepal.Width Petal.Length Petal.Width   Species
#> 1          7.1         3.0          5.9         2.1 virginica
#> 2          7.6         3.0          6.6         2.1 virginica
#> 3          7.7         2.6          6.9         2.3 virginica
#> 4          7.7         3.0          6.1         2.3 virginica

Now we transform the dataset into a "historical object" called iris_hst, via as_hst_obj().

iris_hst <- as_hst_obj(iris)

Per is_hst_obj(), it is indeed a historical object.

iris_hst %>% is_hst_obj()
#> TRUE

However, its history via get_hst() is still blank.

iris_hst %>% get_hst()
#> # A tibble: 0 × 4
#> # … with 4 variables: step <int>, order <int>, expr <list>, text <chr>

We now perform the same workflow on the historical dataset iris_hst...

iris_hst <- iris_hst %>%
  filter(Sepal.Length > 7, Sepal.Width <= 3) %>%
  filter(Petal.Width > 2)

...which yields a consistent output.

iris_hst
#> Sepal.Length Sepal.Width Petal.Length Petal.Width   Species
#> 1          7.1         3.0          5.9         2.1 virginica
#> 2          7.6         3.0          6.6         2.1 virginica
#> 3          7.7         2.6          6.9         2.3 virginica
#> 4          7.7         3.0          6.1         2.3 virginica

Crucially, we can now access the history via get_hst():

iris_hst %>% get_hst()
#> # A tibble: 3 × 4
#>    step order expr       text            
#>   <int> <int> <list>     <chr>           
#> 1     1     1 <language> Sepal.Length > 7
#> 2     1     2 <language> Sepal.Width <= 3
#> 3     2     1 <language> Petal.Width > 2 
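
Because the expr column stores the criteria as language objects, a recorded history can in principle be replayed on another data frame. The sketch below rebuilds the three criteria by hand with quote(), so that it runs without the module; the helper replay_filters() is hypothetical.

```r
# The three recorded criteria, rebuilt by hand as language objects.
recorded <- list(
  quote(Sepal.Length > 7),
  quote(Sepal.Width <= 3),
  quote(Petal.Width > 2)
)

# Apply each criterion in turn, keeping only the rows where it is TRUE.
replay_filters <- function(data, exprs) {
  for (cond in exprs) {
    keep <- eval(cond, envir = data, enclos = parent.frame())
    data <- data[keep & !is.na(keep), , drop = FALSE]
  }
  data
}

replayed <- replay_filters(iris, recorded)
nrow(replayed)  # 4, the same four rows as in the workflow on iris
```

With the module loaded, get_hst(iris_hst)$expr should be usable in place of recorded, though I have not verified that here.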

We can also "reset" the history via reset_hst(), which clears the tibble of historical data.

iris_hst <- iris_hst %>% reset_hst()

iris_hst %>% get_hst()
#> # A tibble: 0 × 4
#> # … with 4 variables: step <int>, order <int>, expr <list>, text <chr>

Finally, we can revert to an "unhistorical" object via un_hst_obj(), which removes the hst_obj classification and deletes the obj_hst attribute:

iris_unhst <- iris_hst %>% un_hst_obj()

# It is no longer a "historical" object...
iris_unhst %>% is_hst_obj()
#> FALSE

# ...and its history is entirely nonexistent (not merely blank).
iris_unhst %>% get_hst()
#>

Module

Here is the module. I recommend saving it locally, as (say) mod.R in (say) your working directory. I also recommend the box package, which can load such modules painlessly via box::use(./mod).

#########
## API ##
#########

# Test if an object is a "historical object" whose filtrations are recorded.
is_hst_obj <- function(x) {
  inherits(x, .HST_OBJ_CLASS)
}


# Treat an object as "historical".
as_hst_obj <- function(x) {
  if (!is_hst_obj(x)) {
    class(x) <- c(.HST_OBJ_CLASS, class(x))
  }
  
  x
}


# Erase the "historicity" of an object.
un_hst_obj <- function(x, hst = TRUE) {
  if (is_hst_obj(x)) {
    org_class <- class(x)
    class(x) <- org_class[org_class != .HST_OBJ_CLASS]
    
    if (isTRUE(hst)) {
      x <- .set_hst(x, hst = NULL)
    }
  }
  
  x
}


# Get the history from a historical object.
get_hst <- function(x) {
  hst <- attr(x, .OBJ_HST_ATTR)
  
  if (is.null(hst)) {
    if (is_hst_obj(x)) {
      .BLANK_OBJ_HST
      # NULL
    } else {
      invisible(NULL)
    }
  } else {
    hst
  }
}


# Reset the history for a historical object.
reset_hst <- function(x) {
  if (is_hst_obj(x)) {
    x <- .set_hst(x, hst = NULL)
  }
  
  x
}



##############
## Dispatch ##
##############

# Dispatch filtration for historical objects.
filter.hst_obj <- evalq(envir = new.env(), {
  # Define the filtration function: `dplyr::filter()`
  fn_expr <- quote(dplyr::filter)
  #                ^^^^^^^^^^^^^
  #                 UPDATE HERE
  fn <- eval(fn_expr)
  
  # Replicate in our result the signature of that original function.
  arg_syms <- as.list(args(fn))
  arg_syms <- utils::head(arg_syms, n = -1)
  arg_syms <- sapply(names(arg_syms), as.symbol, USE.NAMES = TRUE)
  
  
  # Prepare the elements for the function body...
  obj_sym <- arg_syms[[1]]   # The (1st) argument (.data) for the object...
  cnd_exprs <- arg_syms$...  # ...and dots (...) for filtration condition(s).
  
  # ...including a similar call to the filter with an "ahistorical" object...
  arg_syms[[as.character(obj_sym)]] <- substitute(un_hst_obj(
    obj_sym,
    hst = FALSE
  ))
  fn_call <- as.call(c(list(fn_expr), arg_syms))
  
  sub_list <- list(
    obj = obj_sym,
    cnd = cnd_exprs,
    cll = fn_call
  )
  
  # ...and assemble those elements.
  fn_body <- substitute(env = sub_list, quote({
    .update_hst(
      # Perform the unclassed call and then restore any "historicity"...
      x = `class<-`(cll, class(obj)),
      # ...and then update the history with the filtration criteria.
      exprs = match.call(expand.dots = FALSE)$cnd
    )
  }))
  
  
  # Pair this body with the header from `dplyr::filter()`...
  fn_body <- eval(fn_body)
  body(fn) <- fn_body
  
  # ...and transfer the resulting function to the calling environment.
  environment(fn) <- parent.frame(n = 2)
  
  
  # Return the resulting function.
  fn
})



#############
## Support ##
#############

# Labels for the object class...
.HST_OBJ_CLASS <- "hst_obj"

# ...and its history attribute.
.OBJ_HST_ATTR <- "obj_hst"

# The default history for an object.
.BLANK_OBJ_HST <- dplyr::tibble(
  step = integer(),
  order = integer(),
  expr = list(),
  text = character()
)


# Set the history for a historical object.
.set_hst <- function(x, hst) {
  attr(x, .OBJ_HST_ATTR) <- hst
  x
}

# Update the history with a list of filtration expressions.
.update_hst <- function(x, exprs) {
  # Augment the history of a "historical" object.
  if (is_hst_obj(x)) {
    # Get the current history.
    hst <- get_hst(x)
    
    # # ...and default if the history is missing.
    # if (is.null(hst)) {
    #   hst <- .BLANK_OBJ_HST
    # }
    
    
    # Augment the history: format the new additions...
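    next_cnd <- exprs
    # next_cnd <- sapply(next_cnd, as.expression, simplify = FALSE)
    # Collapse multi-line deparse() output so each criterion yields exactly one string.
    next_txt <- vapply(next_cnd, function(e) paste(deparse(e), collapse = " "), character(1))
    next_ord <- seq_along(next_cnd)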
    
    if (length(exprs) == 0) {
      next_stp <- integer()
    } else if (nrow(hst) == 0) {
      next_stp <- 1
    } else {
      next_stp <- max(hst$step) + 1
    }
    
    next_hst <- dplyr::tibble(
      step = as.integer(next_stp),
      order = as.integer(next_ord),
      expr = as.list(next_cnd),
      text = as.character(next_txt)
    )
    
    # ...and append them to the existing history.
    hst <- dplyr::bind_rows(hst, next_hst)
    
    
    # Update the history.
    x <- .set_hst(x, hst = hst)
  }
  
  # Return the updated object.
  x
}

huangapple
  • Posted on May 10, 2023 at 23:27:34
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76220216.html