Misunderstanding the use of 'apply'
Question
I have a function:
myFun <- function (x, y)
{
}
It's intended to process a column of a data frame:
myFun(dataFrame$Column, anotherParameterValue)
dataFrame$Column is a factor with 4 levels. The function recognizes it correctly and works great. I attach an image of the environment data from the debugger (breakpoint on the first line inside the function).
It also works if passed by index:
myFun(dataFrame[1], anotherParameterValue)
But, if I code:
apply(dataFrame, 2, myFun, y = anotherParameterValue)
The data passed to the function in 'x' is very different:
I suppose it must be something I'm not understanding in 'apply'...
If you need the code inside my function, tell me, but I think it's not necessary, as the problem already shows in the data received through the parameters.
Answer 1
Score: 1
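As explained in the comments, apply is for objects of class matrix (or array). Given a data frame, R will happily/silently convert it with as.matrix first, and if any column is non-numeric (a factor or character column, say), every cell of the resulting matrix becomes character.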
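To make that conversion visible, here is a minimal sketch. The data frame df and its columns n and f are made up for this illustration and are not the asker's data. It compares what apply hands to its function with what sapply hands over.

```r
# Hypothetical data frame, used only for this illustration.
df <- data.frame(n = c(1.5, 2.5, 3.5), f = factor(c("a", "b", "a")))

# apply() converts a data frame with as.matrix(); a single non-numeric
# column forces every cell of the matrix to character.
m <- as.matrix(df)
typeof(m)

# So each column reaches the function as a character vector.
col_classes <- apply(df, 2, class)
col_classes

# sapply()/lapply() iterate over the columns themselves,
# so each column keeps its own class.
kept_classes <- sapply(df, class)
kept_classes
```

This is why a function that works on dataFrame$Column can behave differently inside apply: the factor never arrives as a factor.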
A working example:
set.seed(42)
quux <- data.frame(int1=sample(1000,3), int2=sample(1000,3), num3=runif(3), num4=runif(3)) |>
transform(fctr5 = factor(int1), chr6=as.character(int2))
quux
# int1 int2 num3 num4 fctr5 chr6
# 1 561 153 0.7365883 0.7050648 561 153
# 2 997 74 0.1346666 0.4577418 997 74
# 3 321 228 0.6569923 0.7191123 321 228
myfun <- function(z, y = 0) y + mean(z)
myfun(quux$int2, 1000)
# [1] 1151.667
apply(quux, 2, myfun, y = 1000)
# Warning in mean.default(z) :
# argument is not numeric or logical: returning NA
# Warning in mean.default(z) :
# argument is not numeric or logical: returning NA
# Warning in mean.default(z) :
# argument is not numeric or logical: returning NA
# Warning in mean.default(z) :
# argument is not numeric or logical: returning NA
# Warning in mean.default(z) :
# argument is not numeric or logical: returning NA
# Warning in mean.default(z) :
# argument is not numeric or logical: returning NA
# int1 int2 num3 num4 fctr5 chr6
# NA NA NA NA NA NA
If we debug myfun and step into what's going on, we'll immediately see a problem:
debug(myfun)
apply(quux, 2, myfun, y = 1000)
# debugging in: FUN(newX[, i], ...)
# debug at #1: y + mean(z)
y
# [1] 1000
z
# [1] "561" "997" "321"
You can continue (c) through each call to myfun, once per column. You'll find that every column arrives as class character.
It seems "obvious" that one cannot do arithmetic on strings. Some math operations do run on factors (though not mean), but they shouldn't be relied on: depending on the function, the operation may act on the integer encoding of the factor or on the string representation of its levels, which are very different things.
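A small sketch of that difference, using a standalone factor f built from the same three values as the example (f, codes, and values are names chosen only for this illustration):

```r
# A factor whose levels look like numbers.
f <- factor(c("561", "997", "321"))

# Levels are sorted as strings, and each element stores an index into them.
levels(f)

# Integer encoding: positions in levels(f), not the original numbers.
codes <- as.integer(f)
codes

# String representation of the levels, converted back to numbers.
values <- as.numeric(as.character(f))
values
```

Any function that silently falls back to the integer codes would average 2, 3, 1 instead of 561, 997, 321.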
How do we fix this? Subset the frame so that you're only operating on the number-like columns.
isnum <- sapply(quux, is.numeric)
isnum
# int1 int2 num3 num4 fctr5 chr6
# TRUE TRUE TRUE TRUE FALSE FALSE
apply(quux[,isnum], 2, myfun, y = 1000)
# int1 int2 num3 num4
# 1626.333 1151.667 1000.509 1000.627
FYI, apply itself is not necessary; we can also use lapply or sapply here, depending on what you're planning on doing with the return value. For example, if you just need the averages as above, use
sapply(quux[,isnum], myfun, y = 1000)
# int1 int2 num3 num4
# 1626.333 1151.667 1000.509 1000.627
But if you want to replace the frame's values (for some reason ... work with me), one might do:
quux[isnum] <- lapply(quux[isnum], myfun, y = 1000)
quux
# int1 int2 num3 num4 fctr5 chr6
# 1 1626.333 1151.667 1000.509 1000.627 561 153
# 2 1626.333 1151.667 1000.509 1000.627 997 74
# 3 1626.333 1151.667 1000.509 1000.627 321 228
Or if you wanted to append the columns to quux, then
# (starting with the original quux)
isnum_ch <- names(isnum)[isnum]
isnum_ch <- paste0(isnum_ch, "_new")
isnum_ch
# [1] "int1_new" "int2_new" "num3_new" "num4_new"
cbind(quux, setNames(lapply(quux[isnum], myfun, y = 500), isnum_ch))
# int1 int2 num3 num4 fctr5 chr6 int1_new int2_new num3_new num4_new
# 1 561 153 0.7365883 0.7050648 561 153 1126.333 651.6667 500.5094 500.6273
# 2 997 74 0.1346666 0.4577418 997 74 1126.333 651.6667 500.5094 500.6273
# 3 321 228 0.6569923 0.7191123 321 228 1126.333 651.6667 500.5094 500.6273
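The same pattern carries over to the situation in the question, where the function expects a factor. The sketch below is only an assumption about what such a function might look like: count_level, df, and its columns are hypothetical stand-ins, since the body of myFun was not shown.

```r
# Hypothetical stand-in for the asker's function: counts how often
# the level `y` occurs in the factor `x`.
count_level <- function(x, y) {
  stopifnot(is.factor(x))  # would fail inside apply(), which passes character
  sum(x == y)
}

# Hypothetical data frame with two factor columns and one numeric column.
df <- data.frame(
  a = factor(c("lo", "hi", "hi", "mid")),
  b = factor(c("hi", "hi", "hi", "lo")),
  n = c(1, 2, 3, 4)
)

# Select the factor columns, then iterate over them column by column.
isfac <- sapply(df, is.factor)
res <- sapply(df[isfac], count_level, y = "hi")
res
```

Because sapply walks the data frame as a list of columns, each x is still a factor when it reaches the function, just as it is with dataFrame$Column.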