2023-02-07 01:36:46

Misunderstanding the use of 'apply'

Question

I have a function:

myFun <- function(x, y)
{
}

It's intended to process a column of a data frame:

myFun(dataFrame$Column, anotherParameterValue)

dataFrame$Column is a factor with 4 levels. The function recognizes it correctly and works as expected. I attached an image of the environment data from the debugger (breakpoint on the first line inside the function).

It also works if passed by index:

myFun(dataFrame[1], anotherParameterValue)

But, if I code:

apply(dataFrame, 2, myFun, y = anotherParameterValue)

The data passed to the function in 'x' is very different:

I suppose it must be something I'm not understanding in 'apply'...

If you need the code inside my function, tell me, but I think it's not necessary, as the problem already shows in the data received through the parameters.

Answer 1

Score: 1

As explained in the comments, apply is meant for matrices (and arrays). When you hand it a data frame, R silently converts the frame to a matrix first, and a frame that mixes numeric, factor, and character columns becomes a character matrix.

A working example:

set.seed(42)
quux <- data.frame(int1=sample(1000,3), int2=sample(1000,3), num3=runif(3), num4=runif(3)) |>
  transform(fctr5 = factor(int1), chr6=as.character(int2))
quux
#   int1 int2      num3      num4 fctr5 chr6
# 1  561  153 0.7365883 0.7050648   561  153
# 2  997   74 0.1346666 0.4577418   997   74
# 3  321  228 0.6569923 0.7191123   321  228
myfun <- function(z, y = 0) y + mean(z)
myfun(quux$int2, 1000)
# [1] 1151.667
apply(quux, 2, myfun, y = 1000)
# Warning in mean.default(z) :
#   argument is not numeric or logical: returning NA
# Warning in mean.default(z) :
#   argument is not numeric or logical: returning NA
# Warning in mean.default(z) :
#   argument is not numeric or logical: returning NA
# Warning in mean.default(z) :
#   argument is not numeric or logical: returning NA
# Warning in mean.default(z) :
#   argument is not numeric or logical: returning NA
# Warning in mean.default(z) :
#   argument is not numeric or logical: returning NA
#  int1  int2  num3  num4 fctr5  chr6 
#    NA    NA    NA    NA    NA    NA

If we debug myfun and step into what's going on, we'll immediately see a problem:

debug(myfun)
apply(quux, 2, myfun, y = 1000)
# debugging in: FUN(newX[, i], ...)
# debug at #1: y + mean(z)
y
# [1] 1000
z
# [1] "561" "997" "321"

You can continue through each call to myfun, once per column. You'll find that every column arrives as class character, including the ones that were numeric in the frame.
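If you'd rather confirm this without the debugger, here is a minimal sketch of the conversion that apply performs on a data frame. The frame df and its column names are made up for illustration and are not part of the original example.

```r
# Hypothetical small frame mixing numeric, factor, and character columns
df <- data.frame(
  n = c(1.5, 2.5, 3.5),
  f = factor(c("a", "b", "a")),
  s = c("x", "y", "z")
)

# apply() converts a data frame with as.matrix() before looping over it
m <- as.matrix(df)
typeof(m)
# [1] "character"

# Even the originally numeric column is now made of strings
is.numeric(m[, "n"])
# [1] FALSE

# Keeping only the numeric columns gives a numeric matrix instead
num_only <- as.matrix(df[sapply(df, is.numeric)])
typeof(num_only)
# [1] "double"
```

A matrix can hold only one type, so a single non-numeric column is enough to turn everything into character.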

It should be clear that you cannot do arithmetic on strings. Factors are trickier: some operations appear to work on them (mean does not), but you should not rely on that. Depending on the function, it may operate on the factor's integer codes or on the string representation of its levels, and those are very different things.
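To make the difference between codes and levels concrete, here is a small sketch using a factor built from the same three values as fctr5. The variable names f, codes, and labels_num are only for illustration.

```r
# A factor whose labels happen to look like numbers
f <- factor(c("561", "997", "321"))

# Integer codes: the position of each value among the sorted levels
codes <- as.integer(f)
codes
# [1] 2 3 1

# The numbers the labels actually spell out
labels_num <- as.numeric(as.character(f))
labels_num
# [1] 561 997 321

mean(codes)       # mean of the codes, not of the data
# [1] 2
mean(labels_num)  # mean of the values you probably meant
# [1] 626.3333
```

If a factor really does hold numbers, converting through as.character first is the usual way to get them back.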

How do we fix this? Subset the frame so that you're only operating on the number-like columns.

isnum <- sapply(quux, is.numeric)
isnum
#  int1  int2  num3  num4 fctr5  chr6 
#  TRUE  TRUE  TRUE  TRUE FALSE FALSE 
apply(quux[,isnum], 2, myfun, y = 1000)
#     int1     int2     num3     num4 
# 1626.333 1151.667 1000.509 1000.627

FYI, apply itself is not necessary here. We can also use lapply or sapply, depending on what you're planning to do with the return value. For example, if you just need the averages as above, use:

sapply(quux[,isnum], myfun, y = 1000)
#     int1     int2     num3     num4 
# 1626.333 1151.667 1000.509 1000.627
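As a side note, vapply is a stricter variant of sapply that states the expected return type up front, so a non-numeric result fails loudly instead of quietly changing the shape of the output. This is a sketch on a made-up frame df, not something the original question used.

```r
# Hypothetical frame with two numeric columns and one character column
df <- data.frame(a = c(1, 2, 3), b = c(10, 20, 30), s = c("x", "y", "z"))
myfun <- function(z, y = 0) y + mean(z)

# logical(1): each call to is.numeric must return exactly one logical
isnum <- vapply(df, is.numeric, logical(1))

# numeric(1): each call to myfun must return exactly one number
res <- vapply(df[isnum], myfun, numeric(1), y = 1000)
res
#    a    b 
# 1002 1020 
```

Like sapply, vapply walks the columns of the frame as a list, so no matrix conversion takes place.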

But if you want to replace the frame's values (for some reason ... work with me), you might do:

quux[isnum] <- lapply(quux[isnum], myfun, y = 1000)
quux
#       int1     int2     num3     num4 fctr5 chr6
# 1 1626.333 1151.667 1000.509 1000.627   561  153
# 2 1626.333 1151.667 1000.509 1000.627   997   74
# 3 1626.333 1151.667 1000.509 1000.627   321  228

Or, if you want to append the new columns to quux instead:

# (starting with the original quux)
isnum_ch <- names(isnum)[isnum]
isnum_ch <- paste0(isnum_ch, "_new")
isnum_ch
# [1] "int1_new" "int2_new" "num3_new" "num4_new"
cbind(quux, setNames(lapply(quux[isnum], myfun, y = 500), isnum_ch))
#   int1 int2      num3      num4 fctr5 chr6 int1_new int2_new num3_new num4_new
# 1  561  153 0.7365883 0.7050648   561  153 1126.333 651.6667 500.5094 500.6273
# 2  997   74 0.1346666 0.4577418   997   74 1126.333 651.6667 500.5094 500.6273
# 3  321  228 0.6569923 0.7191123   321  228 1126.333 651.6667 500.5094 500.6273
