How do I conditionally provide S3 methods for S3 generics from another package?
Question
I'm making a package for data manipulation that uses some other libraries under the hood. Let's say that my data always has a class "custom" and that I have a function custom_select() to select some columns.
I would like my package to have few dependencies but also a similar syntax as functions from dplyr. Because several dplyr functions are generics, I can use the same function names for a different input type. In my situation, I could make a method select.custom() so that the user can either pass a data.frame or a custom object to select() and both would work.
Now from my understanding, this requires putting dplyr in Imports because I need to have access to its select() generic. I'd like to avoid doing this because I want to limit the number of hard dependencies.
The scenario I have in mind is:
- the user already loads dplyr anyway, then they can use select() with the data of class custom and it should work
- the user doesn't have dplyr installed/loaded, and I don't want to force them to have it, so they can use the function custom_select() instead.
Ideally, I'd like to put dplyr in Suggests so that it's not strictly necessary but it adds something if the user has it.
Example
custom.R:
#' @export
#' @importFrom dplyr select
custom_select <- function(data, select) {
  print("Hello, world!")
}
#' @export
select.custom <- custom_select
NAMESPACE:
# Generated by roxygen2: do not edit by hand
export(custom_select)
export(select.custom)
importFrom(dplyr,select)
R CMD check errors if I don't put dplyr in Imports, and putting it in Suggests also doesn't work (same error for both cases):
❯ checking package dependencies ... ERROR
Namespace dependency missing from DESCRIPTION Imports/Depends entries: 'dplyr'
In summary, is there a way to keep dplyr out of hard dependencies while still providing methods for dplyr's generics if it is available?
Edit: I tried @VonC's answer but couldn't make it work. In the example below, dplyr is loaded before my custom package so select.custom() should be available but isn't:
library(dplyr, warn.conflicts = FALSE)
library(custompackage)
foo <- letters
class(foo) <- "custom"
custom_select(foo)
#> [1] "Hello, world!"
select(foo)
#> Error in UseMethod("select"): no applicable method for 'select' applied to an object of class "custom"
Here are the important files:
custom.R
#' @export
custom_select <- function(data, select) {
  print("Hello, world!")
}
if (requireNamespace("dplyr", quietly = TRUE)) {
  select.custom <- function(data, select) {
    custom_select(data, select)
  }
  utils::globalVariables("select.custom")
}
NAMESPACE
# Generated by roxygen2: do not edit by hand
export(custom_select)
DESCRIPTION (no Imports)
[...]
Suggests:
dplyr
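The failure in the edit above is consistent with how S3 dispatch looks for methods: in the environment the generic is called from, and then in the S3 registry of the environment where the generic is defined. A function that merely exists inside another package's namespace, neither exported nor registered, is in neither place. The following is a minimal sketch of that behaviour using plain environments rather than real packages; `generic_env` and `pkg_env` are hypothetical stand-ins for the dplyr namespace and the custom package's namespace.

```r
# Hypothetical stand-ins for two package namespaces (assumption for illustration)
generic_env <- new.env()  # plays the role of the dplyr namespace
pkg_env <- new.env()      # plays the role of the custom package's namespace

# The generic is defined in generic_env, like dplyr::select()
evalq(select <- function(.data, ...) UseMethod("select"), generic_env)

# The method exists only inside pkg_env and is not registered anywhere
pkg_env$select.custom <- function(.data, ...) "Hello, world!"

foo <- structure(letters, class = "custom")

# Dispatch cannot see pkg_env$select.custom, so this call errors
before <- tryCatch(generic_env$select(foo),
                   error = function(e) "dispatch failed")

# Registering the method in the generic's environment makes it visible
registerS3method("select", "custom", pkg_env$select.custom,
                 envir = generic_env)
after <- generic_env$select(foo)
```

Here `before` holds the fallback string from the error handler, while `after` holds the value returned by the registered method.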
Answer 1
Score: 5
You need to put dplyr in Enhances and use .onLoad to conditionally register your method in the dplyr namespace, depending on whether dplyr is installed at load time.
# Scaffold a minimal package in the current working directory
nm <- package <- "TestPackage"
dir.create(file.path(package, "R"), recursive = TRUE)
dir.create(file.path(package, "man"), recursive = TRUE)
dir.create(file.path(package, "tests"), recursive = TRUE)
# DESCRIPTION: dplyr is listed under Enhances, not Imports or Suggests
cat(file = file.path(package, "DESCRIPTION"), "
Package: TestPackage
Version: 0.0-0
License: GPL (>= 2)
Description: A (one paragraph) description of what
  the package does and why it may be useful.
Title: My First Collection of Functions
Author: First Last [aut, cre]
Maintainer: First Last <First.Last@some.domain.net>
Enhances: dplyr
")
# NAMESPACE: export the function only; nothing is imported from dplyr
cat(file = file.path(package, "NAMESPACE"), "
export(selectDotZzz)
")
# R code: register selectDotZzz as the 'select' method for class 'zzz' at load time
cat(file = file.path(package, "R", paste0(nm, ".R")), "
selectDotZzz <- function(.data, ...) 0
.onLoad <- function(libname, pkgname) {
    if(requireNamespace(\"dplyr\", quietly = TRUE))
        registerS3method(\"select\", \"zzz\", selectDotZzz,
                         envir = asNamespace(\"dplyr\"))
}
")
# Minimal documentation so that R CMD check finds an Rd entry
cat(file = file.path(package, "man", paste0(nm, ".Rd")), "
\\name{whatever}
\\alias{selectDotZzz}
\\title{whatever}
\\description{whatever}
")
# Test: if dplyr is installed, dplyr::select() must dispatch to selectDotZzz
cat(file = file.path(package, "tests", paste0(nm, ".R")),
    sprintf("library(%s)", nm))
cat(file = file.path(package, "tests", paste0(nm, ".R")), append = TRUE, "
if(requireNamespace(\"dplyr\", quietly = TRUE))
    stopifnot(identical(dplyr::select(structure(0, class = \"zzz\")), 0))
")
# Build and check the package, then clean up
getRversion()
packageVersion("dplyr")
tools:::Rcmd(c("build", package))
tools:::Rcmd(c("check", Sys.glob(paste0(nm, "_*.tar.gz"))))
unlink(Sys.glob(paste0(nm, "*")), recursive = TRUE)
The relevant output:
> getRversion()
[1] '4.3.1'
> packageVersion("dplyr")
[1] '1.1.2'
> tools:::Rcmd(c("build", package))
* checking for file 'TestPackage/DESCRIPTION' ... OK
* preparing 'TestPackage':
* checking DESCRIPTION meta-information ... OK
* checking for LF line-endings in source and make files and shell scripts
* checking for empty or unneeded directories
* building 'TestPackage_0.0-0.tar.gz'
> tools:::Rcmd(c("check", Sys.glob(paste0(nm, "_*.tar.gz"))))
* using log directory '/Users/mikael/Desktop/R-experiments/codetools/TestPackage.Rcheck'
* using R version 4.3.1 Patched (2023-06-19 r84580)
* using platform: aarch64-apple-darwin22.5.0 (64-bit)
* R was compiled by
Apple clang version 14.0.3 (clang-1403.0.22.14.1)
GNU Fortran (GCC) 12.2.0
* running under: macOS Ventura 13.4
* using session charset: UTF-8
* checking for file 'TestPackage/DESCRIPTION' ... OK
* this is package 'TestPackage' version '0.0-0'
* checking package namespace information ... OK
* checking package dependencies ... OK
* checking if this is a source package ... OK
* checking if there is a namespace ... OK
* checking for executable files ... OK
* checking for hidden files and directories ... OK
* checking for portable file names ... OK
* checking for sufficient/correct file permissions ... OK
* checking whether package 'TestPackage' can be installed ... OK
* checking installed package size ... OK
* checking package directory ... OK
* checking DESCRIPTION meta-information ... OK
* checking top-level files ... OK
* checking for left-over files ... OK
* checking index information ... OK
* checking package subdirectories ... OK
* checking R files for non-ASCII characters ... OK
* checking R files for syntax errors ... OK
* checking whether the package can be loaded ... OK
* checking whether the package can be loaded with stated dependencies ... OK
* checking whether the package can be unloaded cleanly ... OK
* checking whether the namespace can be loaded with stated dependencies ... OK
* checking whether the namespace can be unloaded cleanly ... OK
* checking loading without being on the library search path ... OK
* checking startup messages can be suppressed ... OK
* checking dependencies in R code ... OK
* checking S3 generic/method consistency ... OK
* checking replacement functions ... OK
* checking foreign function calls ... OK
* checking R code for possible problems ... OK
* checking Rd files ... OK
* checking Rd metadata ... OK
* checking Rd cross-references ... OK
* checking for missing documentation entries ... OK
* checking for code/documentation mismatches ... OK
* checking Rd \usage sections ... OK
* checking Rd contents ... OK
* checking for unstated dependencies in examples ... OK
* checking examples ... NONE
* checking for unstated dependencies in 'tests' ... OK
* checking tests ...
Running ‘TestPackage.R’
OK
* checking PDF version of manual ... OK
* DONE
Status: OK
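Applied to the names used in the question, the same pattern might look like the sketch below. It assumes the code lives in an R file of the asker's package (called custompackage in the question's edit), that dplyr is declared under Enhances as in this answer, and that custom_select() is defined as in the question. R calls .onLoad() itself when the package namespace is loaded, so nothing here needs to be exported apart from custom_select().

```r
# Sketch of an R file in the hypothetical package 'custompackage'

# The plain function from the question; usable without dplyr
custom_select <- function(data, select) {
  print("Hello, world!")
}

# Runs when the package namespace is loaded
.onLoad <- function(libname, pkgname) {
  # Register the method only if dplyr is installed; otherwise do nothing
  if (requireNamespace("dplyr", quietly = TRUE)) {
    registerS3method("select", "custom", custom_select,
                     envir = asNamespace("dplyr"))
  }
  invisible(NULL)
}
```

With this in place, a user who has dplyr installed could call dplyr::select() on an object of class "custom", while a user without dplyr would still have custom_select().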
Answer 2
Score: 1
You can do this by defining your method within an if statement that checks if the package is available, and using utils::globalVariables() to avoid R CMD check notes about undefined global functions or variables.
Altering the namespace of another package is generally not allowed according to CRAN policies, and trying to assign a function directly to dplyr::select.custom would likely not be acceptable.
The idea is therefore to assign the method to select.custom in the global environment, which will allow it to be used as a method for the generic select() function if dplyr is loaded. The key point is that you are not modifying the dplyr namespace directly.
You would need to adjust your custom.R file:
#' Custom select function
#'
#' @export
custom_select <- function(data, select) {
  print("Hello, world!")
}
if (requireNamespace("dplyr", quietly = TRUE)) {
  select.custom <- function(data, select) {
    custom_select(data, select)
  }
  utils::globalVariables("select.custom")
}
And for the NAMESPACE file, you can avoid importing anything from dplyr:
# Generated by roxygen2: do not edit by hand
export(custom_select)
In your DESCRIPTION file, you should list dplyr under Suggests:
Suggests:
dplyr
This way, if the user has dplyr loaded, the select.custom method will be available. If dplyr is not loaded, the user can still use the custom_select() function. This approach keeps dplyr out of the hard dependencies while still providing methods for dplyr's generics if it is available.
That will create a function select.custom in your package's namespace, which will be used as a method for select() when the dplyr package is loaded. Note that this does not modify the dplyr namespace itself.
Also, it is a good idea to document that the select.custom functionality is only available if dplyr is installed and loaded, and that dplyr is a suggested, not required, dependency.
This approach should be compliant with CRAN policies, as you are not altering another package's namespace and are conditionally defining methods based on the availability of the dplyr package.
That being said, Mikael Jagan points, in the comments, to "Writing R Extensions / Creating R packages / Package structure / Package Dependencies / Suggested packages":
> WARNING: Be extremely careful if you do things which would be run at installation time depending on whether suggested packages are available or not - this includes top-level code in R code files, .onLoad functions and the definitions of S4 classes and methods.
>
> The problem is that once a namespace of a suggested package is loaded, references to it may be captured in the installed package (most commonly in S4 methods), but the suggested package may not be available when the installed package is used (which especially for binary packages might be on a different machine).
>
> Even worse, the problems might not be confined to your package, for the namespaces of your suggested packages will also be loaded whenever any package which imports yours is installed and so may be captured there.
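Given that caution about code whose effect depends on what is installed, it may be worth noting that R 3.6.0 and later support delayed S3 method registration through a NAMESPACE directive of the form S3method(dplyr::select, custom), which roxygen2 (7.0.0 or later) can generate from an @exportS3Method tag. The sketch below is one possible way to write that for the question's example; it assumes dplyr is listed under Suggests and that the method takes the generic's arguments (.data, ...), neither of which comes from the answers above.

```r
# Plain function from the question; works without dplyr
custom_select <- function(data, select) {
  print("Hello, world!")
}

# roxygen2 should turn this tag into 'S3method(dplyr::select,custom)' in
# NAMESPACE, so the method is registered when the dplyr namespace is loaded
#' @exportS3Method dplyr::select
select.custom <- function(.data, ...) {
  custom_select(.data, ...)
}
```

Nothing is imported from dplyr here, and no .onLoad() hook or requireNamespace() call is needed for the registration itself.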