How do I conditionally provide S3 methods for S3 generics from another package?

Question

I'm making a package for data manipulation that uses some other libraries under the hood. Let's say that my data always has a class "custom" and that I have a function custom_select() to select some columns.

I would like my package to have few dependencies but also a syntax similar to that of dplyr's functions. Because several dplyr functions are generics, I can use the same function names for a different input type. In my situation, I could make a method select.custom() so that the user can pass either a data.frame or a custom object to select() and both would work.

Now from my understanding, this requires putting dplyr in Imports because I need to have access to its select() generic. I'd like to avoid doing this because I want to limit the number of hard dependencies.

The scenario I have in mind is:

  • the user already loads dplyr anyway: then they can use select() with data of class custom and it should work
  • the user doesn't have dplyr installed/loaded, and I don't want to force them to have it, so they can use the function custom_select() instead.

Ideally, I'd like to put dplyr in Suggests so that it's not strictly necessary but it adds something if the user has it.


Example

custom.R:

#' @export
#' @importFrom dplyr select
custom_select <- function(data, select) {
  print("Hello, world!")
}

#' @export
select.custom <- custom_select

NAMESPACE:

# Generated by roxygen2: do not edit by hand

export(custom_select)
export(select.custom)
importFrom(dplyr,select)

R CMD check errors if I don't put dplyr in Imports, and putting it in Suggests doesn't work either (same error in both cases):

❯ checking package dependencies ... ERROR
  Namespace dependency missing from DESCRIPTION Imports/Depends entries: 'dplyr'

In summary, is there a way to keep dplyr out of hard dependencies while still providing methods for dplyr's generics if it is available?


Edit: I tried @VonC's answer but couldn't make it work. In the example below, dplyr is loaded before my custom package, so select.custom() should be available, but it isn't:

library(dplyr, warn.conflicts = FALSE)
library(custompackage)

foo <- letters
class(foo) <- "custom"

custom_select(foo)
#> [1] "Hello, world!"
select(foo)
#> Error in UseMethod("select"): no applicable method for 'select' applied to an object of class "custom"

Here are the important files:

custom.R

#' @export
custom_select <- function(data, select) {
  print("Hello, world!")
}

if (requireNamespace("dplyr", quietly = TRUE)) {
  select.custom <- function(data, select) {
    custom_select(data, select)
  }
  utils::globalVariables("select.custom")
}

NAMESPACE

# Generated by roxygen2: do not edit by hand

export(custom_select)

DESCRIPTION (no Imports)

[...]
Suggests:
  dplyr

Answer 1

Score: 5

You need to put **dplyr** in `Enhances` and use `.onLoad` to conditionally register your method in the **dplyr** namespace, depending on whether **dplyr** is installed at load time.

```r
nm <- package <- "TestPackage"
dir.create(file.path(package, "R"), recursive = TRUE)
dir.create(file.path(package, "man"), recursive = TRUE)
dir.create(file.path(package, "tests"), recursive = TRUE)

cat(file = file.path(package, "DESCRIPTION"), "
Package: TestPackage
Version: 0.0-0
License: GPL (>= 2)
Description: A (one paragraph) description of what
  the package does and why it may be useful.
Title: My First Collection of Functions
Author: First Last [aut, cre]
Maintainer: First Last <First.Last@some.domain.net>
Enhances: dplyr
")

cat(file = file.path(package, "NAMESPACE"), "
export(selectDotZzz)
")

cat(file = file.path(package, "R", paste0(nm, ".R")), "
selectDotZzz <- function(.data, ...) 0
.onLoad <- function(libname, pkgname) {
    if(requireNamespace(\"dplyr\", quietly = TRUE))
        registerS3method(\"select\", \"zzz\", selectDotZzz,
                         envir = asNamespace(\"dplyr\"))
}
")

cat(file = file.path(package, "man", paste0(nm, ".Rd")), "
\\name{whatever}
\\alias{selectDotZzz}
\\title{whatever}
\\description{whatever}
")

cat(file = file.path(package, "tests", paste0(nm, ".R")),
    sprintf("library(%s)", nm))
cat(file = file.path(package, "tests", paste0(nm, ".R")), append = TRUE, "
if(requireNamespace(\"dplyr\", quietly = TRUE))
    stopifnot(identical(dplyr::select(structure(0, class = \"zzz\")), 0))
")

getRversion()
packageVersion("dplyr")
tools:::Rcmd(c("build", package))
tools:::Rcmd(c("check", Sys.glob(paste0(nm, "_*.tar.gz"))))

unlink(Sys.glob(paste0(nm, "*")), recursive = TRUE)
```

The relevant output:

```
> getRversion()
[1] '4.3.1'
> packageVersion("dplyr")
[1] '1.1.2'
> tools:::Rcmd(c("build", package))
* checking for file 'TestPackage/DESCRIPTION' ... OK
* preparing 'TestPackage':
* checking DESCRIPTION meta-information ... OK
* checking for LF line-endings in source and make files and shell scripts
* checking for empty or unneeded directories
* building 'TestPackage_0.0-0.tar.gz'
> tools:::Rcmd(c("check", Sys.glob(paste0(nm, "_*.tar.gz"))))
* using log directory '/Users/mikael/Desktop/R-experiments/codetools/TestPackage.Rcheck'
* using R version 4.3.1 Patched (2023-06-19 r84580)
* using platform: aarch64-apple-darwin22.5.0 (64-bit)
* R was compiled by
    Apple clang version 14.0.3 (clang-1403.0.22.14.1)
    GNU Fortran (GCC) 12.2.0
* running under: macOS Ventura 13.4
* using session charset: UTF-8
* checking for file 'TestPackage/DESCRIPTION' ... OK
* this is package 'TestPackage' version '0.0-0'
* checking package namespace information ... OK
* checking package dependencies ... OK
* checking if this is a source package ... OK
* checking if there is a namespace ... OK
* checking for executable files ... OK
* checking for hidden files and directories ... OK
* checking for portable file names ... OK
* checking for sufficient/correct file permissions ... OK
* checking whether package 'TestPackage' can be installed ... OK
* checking installed package size ... OK
* checking package directory ... OK
* checking DESCRIPTION meta-information ... OK
* checking top-level files ... OK
* checking for left-over files ... OK
* checking index information ... OK
* checking package subdirectories ... OK
* checking R files for non-ASCII characters ... OK
* checking R files for syntax errors ... OK
* checking whether the package can be loaded ... OK
* checking whether the package can be loaded with stated dependencies ... OK
* checking whether the package can be unloaded cleanly ... OK
* checking whether the namespace can be loaded with stated dependencies ... OK
* checking whether the namespace can be unloaded cleanly ... OK
* checking loading without being on the library search path ... OK
* checking startup messages can be suppressed ... OK
* checking dependencies in R code ... OK
* checking S3 generic/method consistency ... OK
* checking replacement functions ... OK
* checking foreign function calls ... OK
* checking R code for possible problems ... OK
* checking Rd files ... OK
* checking Rd metadata ... OK
* checking Rd cross-references ... OK
* checking for missing documentation entries ... OK
* checking for code/documentation mismatches ... OK
* checking Rd \usage sections ... OK
* checking Rd contents ... OK
* checking for unstated dependencies in examples ... OK
* checking examples ... NONE
* checking for unstated dependencies in 'tests' ... OK
* checking tests ...
  Running ‘TestPackage.R’
 OK
* checking PDF version of manual ... OK
* DONE

Status: OK
```
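A minimal sketch of the mechanism this answer relies on, assuming only base R. `toy_ns` is a plain environment standing in for dplyr's namespace (it is not a real namespace), and `register_if_available()` is a hypothetical helper that mirrors the body of the `.onLoad()` hook above. The sketch shows that a method defined out of the caller's sight (like the unexported `select.custom` in the question's edit) is not found by dispatch until `registerS3method()` stores it in the S3 methods table of the environment where the generic is defined.

```r
# Stand-in for the namespace that owns the generic (hypothetical, not dplyr)
toy_ns <- new.env()
evalq(select <- function(.data, ...) UseMethod("select"), toy_ns)

# The method lives where callers cannot see it, like an unexported
# function inside another package's namespace
method_home <- new.env()
evalq(select.custom <- function(.data, ...) "select.custom was called",
      method_home)

x <- structure(letters, class = "custom")

# Before registration, dispatch fails with "no applicable method ..."
before <- tryCatch(toy_ns$select(x), error = conditionMessage)

# registerS3method() puts the method into the S3 methods table of the
# environment enclosing the generic, which UseMethod() also searches
registerS3method("select", "custom", method_home$select.custom,
                 envir = toy_ns)
after <- toy_ns$select(x)
after
#> [1] "select.custom was called"

# Hypothetical helper mirroring the .onLoad() body: register only when the
# package that owns the generic is installed
register_if_available <- function(pkg, generic, class, method) {
  # requireNamespace() returns FALSE instead of signalling an error
  if (!requireNamespace(pkg, quietly = TRUE)) {
    return(invisible(FALSE))
  }
  registerS3method(generic, class, method, envir = asNamespace(pkg))
  invisible(TRUE)
}

# Usage against a namespace that is always installed: stats::median()
# is an S3 generic, so a "custom" method can be registered for it
register_if_available("stats", "median", "custom",
                      function(x, na.rm = FALSE, ...) "median.custom was called")
stats::median(x)
#> [1] "median.custom was called"
```

In a real package the helper call would sit inside `.onLoad()` with `pkg = "dplyr"`, as in `TestPackage` above. Note that `requireNamespace()` loads dplyr's namespace whenever dplyr is installed, so this registers the method even if the user never attaches dplyr.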

Answer 2

Score: 1

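You can do this by defining your method within an if statement that checks whether the package is available, and by using utils::globalVariables() to avoid R CMD check notes about undefined global functions or variables.

Altering the namespace of another package is generally not allowed under CRAN policies, and trying to assign a function directly to dplyr::select.custom would likely not be acceptable.

The idea is therefore to define select.custom in your own package's namespace, which will allow it to be used as a method for the generic select() function if dplyr is loaded. The key point is that you are not modifying the dplyr namespace directly.

You would need to adjust your custom.R file:

#' Custom select function
#'
#' @export
custom_select <- function(data, select) {
  print("Hello, world!")
}

if (requireNamespace("dplyr", quietly = TRUE)) {
  select.custom <- function(data, select) {
    custom_select(data, select)
  }
  utils::globalVariables("select.custom")
}

For the NAMESPACE file, you can avoid importing anything from dplyr:

# Generated by roxygen2: do not edit by hand

export(custom_select)

In your DESCRIPTION file, you should list dplyr under Suggests:

Suggests:
    dplyr

This way, if the user has dplyr loaded, the select.custom method will be available. If dplyr is not loaded, the user can still use the custom_select() function. This approach keeps dplyr out of the hard dependencies while still providing methods for dplyr's generics if it is available.

It is also a good idea to document that the select.custom functionality is only available if dplyr is installed and loaded, and that dplyr is a suggested, not required, dependency.

This approach should be compliant with CRAN policies, as you are not altering another package's namespace and are conditionally defining methods based on the availability of the dplyr package.


That being said, Mikael Jagan points out in the comments the following warning from "Writing R Extensions / Creating R packages / Package structure / Package Dependencies / Suggested packages":

> WARNING: Be extremely careful if you do things which would be run at installation time depending on whether suggested packages are available or not—this includes top-level code in R code files, .onLoad functions and the definitions of S4 classes and methods.
>
> The problem is that once a namespace of a suggested package is loaded, references to it may be captured in the installed package (most commonly in S4 methods), but the suggested package may not be available when the installed package is used (which especially for binary packages might be on a different machine).
>
> Even worse, the problems might not be confined to your package, for the namespaces of your suggested packages will also be loaded whenever any package which imports yours is installed and so may be captured there.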
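The install-time part of this warning can be illustrated with a small simulation, assuming only base R. `simulate_install()` is a hypothetical function, a plain environment stands in for the installed package namespace, and "notARealPkg.xyz" stands in for a suggested package that is missing on the build machine. The top-level if from custom.R is evaluated once, and whatever it produced at that moment is what gets kept.

```r
# Hypothetical simulation: evaluate top-level package code once, as
# R CMD INSTALL does when building a lazy-loaded package, and keep the result
simulate_install <- function(suggested) {
  ns <- new.env()  # stand-in for the installed package namespace
  eval(bquote(
    if (requireNamespace(.(suggested), quietly = TRUE)) {
      select.custom <- function(data, ...) "method"
    }
  ), envir = ns)
  ns
}

# Suggested package missing at "install" time: the method is never created
with_missing <- simulate_install("notARealPkg.xyz")
exists("select.custom", envir = with_missing, inherits = FALSE)
#> [1] FALSE

# Suggested package present at "install" time: the method is created
with_stats <- simulate_install("stats")
exists("select.custom", envir = with_stats, inherits = FALSE)
#> [1] TRUE
```

In both cases the outcome is fixed when the code is evaluated, so installing or removing the suggested package afterwards does not change what the installed package contains until it is reinstalled.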


huangapple
  • Published on 14 June 2023, 23:53:16
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76475424.html