英文:
R package or function to record filters applied to your tibble
问题
I can translate the text you provided:
"是否存在任何R函数或包记录应用于tibble/data frame的操作?
例如,如果我执行以下操作:
data(iris)
my_table <- iris %>% filter(Sepal.Length > 6) %>% filter(Species == 'virginica')
我希望输出的格式类似于:
display_filter_function(my_table)
output:
Step filter
1 sepal.length > 6
2 Species == 'virginica'
我认为这与recipes包提供的功能类似,但不需要使用step_系列函数。"
英文:
Does there exist any R function or packages that records the operations applied to a tibble/data frame?
For example, if I did the following
data(iris)
my_table <- iris %>% filter(Sepal.Length>6) %>% filter(Species == 'virginica')
I would want the output to be something of the form
display_filter_function(my_table)
output:
Step filter
1 sepal.length > 6
2 Species == 'virginica'
I am thinking that this would be something similar to the functionality provided by the recipes package, but not needing to use the step_ family of function
答案1
得分: 4
I've translated the text you provided. Here it is:
"I've written a little module for you. It is a standalone resource and has only one dependency beyond base
R: namely dplyr
itself. The module is long, so I have put it at the bottom of this post. You can find the code itself under the Module section, and its usage is demonstrated under the Usage section.
This model could theoretically be extended to all dplyr
functions, and to other generic functions as well. To keep things manageable, I myself have implemented it for dplyr::filter()
alone.
Background:
This module leverages the R concept of generic methods, like print()
and format()
and mean()
and summary()
. Suppose you wish to print()
a data.frame
object. The generic print()
function...
print
#> function (x, ...)
#> UseMethod("print")
#> <bytecode: 0x000002429186e2c8>
#> <environment: namespace:base>
...does not do the work itself! Rather, it dispatches to some print.*()
method, via the line:
UseMethod("print")
Now the native data.frame
class has its own special print()
method called print.data.frame()
.
print.data.frame
#> function (x, ..., digits = NULL, quote = FALSE, right = TRUE, row.names = TRUE, max = NULL)
#> {
#> n <- length(row.names(x))
#> ⋮
#> invisible(x)
#> }
#> <bytecode: 0x000002429186b7e0>
#> <environment: namespace:base>
So when UseMethod()
seeks a matching ("print") method, it finds print.data.frame()
ready and waiting! It is the print.data.frame()
function that actually handles the printing for the data.frame
.
More generally, a generic function like fn()
...
fn <- function(x, ...) {
UseMethod("fn")
}
can be implemented for an S3 class like cls
, with a function of the form fn.cls()
:
fn.cls <- function(x, arg_1, arg_2, arg_3, ...) {
# ...
}
Note:
The fn.default()
method handles fn()
for unimplemented classes. So in the absence of a print.cls()
function, then UseMethod()
would dispatch a cls
object to print.default()
:
print.default
#> function (x, digits = NULL, quote = TRUE, na.print = NULL, print.gap = NULL, right = FALSE, max = NULL, width = NULL, useSource = TRUE, ...)
#> {
#> args <- pairlist(digits = digits, quote = quote, na.print = na.print, ...
#> ⋮
#> .Internal(print.default(x, args, missings))
#> }
#> <bytecode: 0x0000024291917b80>
#> <environment: namespace:base>
Approach:
By defining a custom S3 class called hst_obj
— "historical object" — I override the "generic" behavior of dplyr::filter()
...
dplyr::filter
#> function (.data, ..., .preserve = FALSE)
#> {
#> UseMethod("filter")
#> }
#> <bytecode: 0x0000024292d10b40>
#> <environment: namespace:dplyr>
...which is designed to dispatch via UseMethod("filter")
. To that end, I implement the function filter.hst_obj()
:
filter.hst_obj
#> function (.data, ..., .preserve = FALSE)
#> {
#> .update_hst(x = `class<-`(dplyr::filter(.data = un_hst_obj(.data, ...
#> }
#> <bytecode: 0x000002428f842958>
When you call dplyr::filter()
on a hst_obj
object, then filter.hst_obj()
jumps into action! Whenever it filters the object, it also records the filtration criteria in the special attribute obj_hst
, which maintains the "object history".
This history is a tibble
...
# A tibble: m × 4
step order expr text
<int> <int> <list> <chr>
1 1 1 <language> sepal.length > 6
⋮ ⋮ ⋮ ⋮ ⋮
...which has four columns:
-
step
: Thefilter()
step in the workflow. -
order
: The criterion within thefilter()
step. -
expr
: The actual code (language) for the criterion (sepal.length > 6
), useful for programmatic manipulation of R. -
text
: A textual (character) representation of that code ("sepal.length > 6"), for visual clarity.
Usage:
You'll want to load dplyr
itself, and then source()
the module (mod.R
) from your working directory.
# Load the dplyr package...
library(dplyr)
# ...along with the hst_obj functions from the module:
source("./mod.R")
Warning:
The modular function filter.hst_obj()
must be loaded into the same workspace where you use dplyr::filter()
. Per the documentation:
UseMethod...searches for methods in two places: in the environment in which the generic function is called, and in the registration database for the environment in which the generic is defined (typically a namespace). So methods for a generic function need to be available in the environment of the call to the generic, or they must be registered.
Here is a simple workflow on the iris
dataset:
iris %>%
filter(Sepal.Length > 7, Sepal.Width <= 3) %>%
filter(Petal.Width > 2)
#> Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#> 1 7.1 3.0 5.9 2.1 virginica
#> 2 7.6 3.0 6.6 2.1 virginica
#> 3 7.7 2.6 6.9 2.3 virginica
#> 4 7.7 3.0 6.1 2.3 virginica
Now we transform the dataset into a "historical object" called iris_hst
, via as_hst_obj()
:
iris_hst <- as_hst_obj(iris)
Per is_hst_obj()
, it is indeed a historical object:
iris_hst %>% is_hst_obj()
#> TRUE
However, its history
英文:
I've written a little module for you. It is a standalone resource and has only one dependency beyond base
R: namely dplyr
itself. The module is long, so I have put it at the bottom of this post. You can find the code itself under the Module section, and its usage is demonstrated under the Usage section.
This model could theoretically be extended to all dplyr
functions, and to other generic functions as well. To keep things manageable, I myself have implemented it for dplyr::filter()
alone.
Background
This module leverages the R concept of generic methods, like print()
and format()
and mean()
and summary()
. Suppose you wish to print()
a data.frame
object. The generic print()
function...
print
#> function (x, ...)
#> UseMethod("print")
#> <bytecode: 0x000002429186e2c8>
#> <environment: namespace:base>
...does not do the work itself! Rather, it dispatches to some print.*()
method, via the line:
UseMethod("print")
Now the native data.frame
class has its own special print()
method called print.data.frame()
.
print.data.frame
#> function (x, ..., digits = NULL, quote = FALSE, right = TRUE, row.names = TRUE, max = NULL)
#> {
#> n <- length(row.names(x))
#> ⋮
#> invisible(x)
#> }
#> <bytecode: 0x000002429186b7e0>
#> <environment: namespace:base>
So when UseMethod()
seeks a matching ("print"
) method, it finds print.data.frame()
ready and waiting! It is the print.data.frame()
function that actually handles the printing for the data.frame
.
More generally, a generic function like fn()
...
fn <- function(x, ...) {
UseMethod("fn")
}
can be implemented for a (S3) class like cls
, with a function of the form fn.cls()
:
fn.cls <- function(x, arg_1, arg_2, arg_3, ...) {
# ...
}
Note
The fn.default()
method handles fn()
for unimplemented classes. So in the absence of a print.cls()
function, then UseMethod()
would dispatch a cls
object to print.default()
:
print.default
#> function (x, digits = NULL, quote = TRUE, na.print = NULL, print.gap = NULL, right = FALSE, max = NULL, width = NULL, useSource = TRUE, ...)
#> {
#> args <- pairlist(digits = digits, quote = quote, na.print = na.print, ...
#> ⋮
#> .Internal(print.default(x, args, missings))
#> }
#> <bytecode: 0x0000024291917b80>
#> <environment: namespace:base>
Approach
By defining a custom S3 class called hst_obj
— "historical object" — I override the "generic" behavior of dplyr::filter()
...
dplyr::filter
#> function (.data, ..., .preserve = FALSE)
#> {
#> UseMethod("filter")
#> }
#> <bytecode: 0x0000024292d10b40>
#> <environment: namespace:dplyr>
...which is designed to dispatch via UseMethod("filter")
. To that end, I implement the function filter.hst_obj()
:
filter.hst_obj
#> function (.data, ..., .preserve = FALSE)
#> {
#> .update_hst(x = `class<-`(dplyr::filter(.data = un_hst_obj(.data, ...
#> }
#> <bytecode: 0x000002428f842958>
When you call dplyr::filter()
on a hst_obj
object, then filter.hst_obj()
jumps into action! Whenever it filters the object, it also records the filtration criteria in the special attribute obj_hst
, which maintains the "object history".
This history is a tibble
...
# A tibble: m × 4
step order expr text
<int> <int> <list> <chr>
1 1 1 <language> sepal.length > 6
⋮ ⋮ ⋮ ⋮ ⋮
...which has four columns:
step
: Thefilter()
step in the workflow.
iris %>% # step
filter(Sepal.Length > 6) %>% # } 1
filter(Species == 'virginica') %>% # } 2
... # ⋮
order
: The criterion within thefilter()
step.
filter(a < 10, b == 3 | c > 5, ...)
# |----| |------------|
# order: 1 2 ...
expr
: The actual code (language
) for the criterion (Sepal.Length > 6
), useful for programmatic manipulation of R.text
: A textual (character
) representation of that code ("Sepal.Length > 6"
), for visual clarity.
Usage
You'll want to load dplyr
itself, and then source()
the module (mod.R
) from (say) your working directory.
# Load the `dplyr` package...
library(dplyr)
# ...along with the `hst_obj` functions from the module:
source("./mod.R")
Warning
The modular function filter.hst_obj()
must be loaded into the same workspace where you use dplyr::filter()
. Per the documentation
> UseMethod
...search[es] for methods in two places: in the environment in which the generic function is called, and in the registration data base for the environment in which the generic is defined (typically a namespace). So methods for a generic function need to be available in the environment of the call to the generic, or they must be registered.
Here is a simple workflow on the iris
dataset.
iris %>%
filter(Sepal.Length > 7, Sepal.Width <= 3) %>%
filter(Petal.Width > 2)
#> Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#> 1 7.1 3.0 5.9 2.1 virginica
#> 2 7.6 3.0 6.6 2.1 virginica
#> 3 7.7 2.6 6.9 2.3 virginica
#> 4 7.7 3.0 6.1 2.3 virginica
Now we transform the dataset into a "historical object" called iris_hst
, via as_hst_obj()
.
iris_hst <- as_hst_obj(iris)
Per is_hst_obj()
, it is indeed a historical object.
iris_hst %>% is_hst_obj()
#> TRUE
However, its history via get_hst()
is still blank.
iris_hst %>% get_hst()
#> # A tibble: 0 × 4
#> # … with 4 variables: step <int>, order <int>, expr <list>, text <chr>
We now perform the same workflow on the historical dataset iris_hst
...
iris_hst <- iris_hst %>%
filter(Sepal.Length > 7, Sepal.Width <= 3) %>%
filter(Petal.Width > 2)
...which yields a consistent output.
iris_hst
#> Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#> 1 7.1 3.0 5.9 2.1 virginica
#> 2 7.6 3.0 6.6 2.1 virginica
#> 3 7.7 2.6 6.9 2.3 virginica
#> 4 7.7 3.0 6.1 2.3 virginica
Crucially, we can now access the history via get_hst()
:
iris_hst %>% get_hst()
#> # A tibble: 3 × 4
#> step order expr text
#> <int> <int> <list> <chr>
#> 1 1 1 <language> Sepal.Length > 7
#> 2 1 2 <language> Sepal.Width <= 3
#> 3 2 1 <language> Petal.Width > 2
We can also "reset" the history via reset_hst()
, which clears the tibble
of historical data.
iris_hst <- iris_hst %>% reset_hst()
iris_hst %>% get_hst()
#> # A tibble: 0 × 4
#> # … with 4 variables: step <int>, order <int>, expr <list>, text <chr>
Finally, we can revert to an "unhistorical" object via un_hst_obj()
, which removes the hst_obj
classification and deletes the obj_hst
attribute:
iris_unhst <- iris_hst %>% un_hst_obj()
# It is no longer a "historical" object...
iris_unhst %>% is_hst_obj()
#> FALSE
# ...and the history is nonexistent (not merely blank) entirely.
iris_unhst %>% get_hst()
#>
Module
Here is the module. I recommend saving it locally, as (say) mod.R
in (say) your working directory. I also recommend the box
package, which can load such modules painlessly via box::use
(./mod)
.
#########
## API ##
#########
# Test if an object is "historical object" whose filtrations are recorded.
is_hst_obj <- function(x) {
inherits(x, .HST_OBJ_CLASS)
}
# Treat an object as "historical".
as_hst_obj <- function(x) {
if (!is_hst_obj(x)) {
class(x) <- c(.HST_OBJ_CLASS, class(x))
}
x
}
# Erase the "historicity" of an object.
un_hst_obj <- function(x, hst = TRUE) {
if (is_hst_obj(x)) {
org_class <- class(x)
class(x) <- org_class[org_class != .HST_OBJ_CLASS]
if (isTRUE(hst)) {
x <- .set_hst(x, hst = NULL)
}
}
x
}
# Get the history from a historical object.
get_hst <- function(x) {
hst <- attr(x, .OBJ_HST_ATTR)
if (is.null(hst)) {
if (is_hst_obj(x)) {
.BLANK_OBJ_HST
# NULL
} else {
invisible(NULL)
}
} else {
hst
}
}
# Reset the history for a historical object.
reset_hst <- function(x) {
if (is_hst_obj(x)) {
x <- .set_hst(x, hst = NULL)
}
x
}
##############
## Dispatch ##
##############
# Dispatch filtration for historical objects.
filter.hst_obj <- evalq(envir = new.env(), {
# Define the filtration function: `dplyr::filter()`
fn_expr <- quote(dplyr::filter)
# ^^^^^^^^^^^^^
# UPDATE HERE
fn <- eval(fn_expr)
# Replicate in our result the signature of that original function.
arg_syms <- as.list(args(fn))
arg_syms <- utils::head(arg_syms, n = -1)
arg_syms <- sapply(names(arg_syms), as.symbol, USE.NAMES = TRUE)
# Prepare the elements for the function body...
obj_sym <- arg_syms[[1]] # The (1st) argument (.data) for the object...
cnd_exprs <- arg_syms$... # ...and dots (...) for filtration condition(s).
# ...including a similar call to the filter with an "ahistorical" object...
arg_syms[[as.character(obj_sym)]] <- substitute(un_hst_obj(
obj_sym,
hst = FALSE
))
fn_call <- as.call(c(list(fn_expr), arg_syms))
sub_list <- list(
obj = obj_sym,
cnd = cnd_exprs,
cll = fn_call
)
# ...and assemble those elements.
fn_body <- substitute(env = sub_list, quote({
.update_hst(
# Perform the unclassed call and then restore any "historicity"...
x = `class<-`(cll, class(obj)),
# ...and then update the history with the filtration criteria.
exprs = match.call(expand.dots = FALSE)$cnd
)
}))
# Pair this body with the header from `dplyr::filter()`...
fn_body <- eval(fn_body)
body(fn) <- fn_body
# ...and transfer the resulting function to the calling environment.
environment(fn) <- parent.frame(n = 2)
# Return the resulting function.
fn
})
#############
## Support ##
#############
# Labels for the object class...
.HST_OBJ_CLASS <- "hst_obj"
# ...and its history attribute.
.OBJ_HST_ATTR <- "obj_hst"
# The default history for an object.
.BLANK_OBJ_HST <- dplyr::tibble(
step = integer(),
order = integer(),
expr = list(),
text = character()
)
# Set the history for a historical object.
.set_hst <- function(x, hst) {
attr(x, .OBJ_HST_ATTR) <- hst
x
}
# Update the history with a list of filtration expressions.
.update_hst <- function(x, exprs) {
# Augment the history of a "historical" object.
if (is_hst_obj(x)) {
# Get the current history.
hst <- get_hst(x)
# # ...and default if the history is missing.
# if (is.null(hst)) {
# hst <- .BLANK_OBJ_HST
# }
# Augment the history: format the new additions...
next_cnd <- exprs
# next_cnd <- sapply(next_cnd, as.expression, simplify = FALSE)
next_txt <- sapply(next_cnd, deparse, simplify = TRUE)
next_ord <- seq_along(next_cnd)
if (length(exprs) == 0) {
next_stp <- integer()
} else if (nrow(hst) == 0) {
next_stp <- 1
} else {
next_stp <- max(hst$step) + 1
}
next_hst <- dplyr::tibble(
step = as.integer(next_stp),
order = as.integer(next_ord),
expr = as.list(next_cnd),
text = as.character(next_txt)
)
# ...and append them to the existing history.
hst <- dplyr::bind_rows(hst, next_hst)
# Update the history.
x <- .set_hst(x, hst = hst)
}
# Return the updated object.
x
}
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论