英文:
Is there an sf (or df) based alternative to kernelUD to estimate bivariate normal kernel density?
问题
我正在从sp切换到sf,但有一些分析需要使用adehabitatHR
中的kernelUD
,它要求我的数据是SpatialPoints
。是否有一个不使用sp
的等效方法,也许可以使用sf
或常规数据框(dataframe),并提供类似的带宽选项?
我的原始代码示例中使用Genetta genetta:
library(adehabitatHR)
G_genetta <- as.data.frame(matrix(c(0.5519758, 0.27524548,
0.5725632, 0.12309273,
0.5547747, 0.06100429,
0.6110925, 0.16211416,
0.5951087, 0.09316814,
0.5333567, 0.11673812,
0.5855461, 0.11170616,
0.5221387, 0.11061583,
0.5848452, 0.17213175),
ncol = 2, byrow = TRUE))
mask_xy_grid <- expand.grid(x = seq(0.01, 1, by = 0.01), y = seq(0.01, 1, by = 0.01))
coordinates(mask_xy_grid) <- ~ x + y
gridded(mask_xy_grid) <- TRUE
ade_G_genetta <- kernelUD(SpatialPoints(G_genetta),
h = "href",
grid = mask_xy_grid,
kern = "bivnorm")
plot(ade_G_genetta)
我知道可以使用ks
包,如下所示:
library(ks)
mask_xy <- as.data.frame(expand.grid(x = seq(0.01, 1, by = 0.01), y = seq(0.01, 1, by = 0.01)))
kde_G_genetta <- kde(G_genetta,
eval.points = mask_xy)
# 获取类似的绘图
kde_G_genetta <- SpatialPixelsDataFrame(points = kde_G_genetta$eval.points,
data = data.frame(est = kde_G_genetta$estimate))
plot(kde_G_genetta)
但是这个方法不提供与kernelUD
相同的带宽选项。我特别不理解为什么kernelUD
接受一个单一的标量值作为"h",而kde
在处理多变量问题时却接受矩阵。此外,kde
似乎无法很好地处理有限的采样点,例如Alcelaphus buselaphus,会出现以下错误:
Error in chol.default(S) :
the leading minor of order 2 is not positive definite
A_buselaphus <- as.data.frame(matrix(c(0.5109837, 0.1247711,
0.5109837, 0.1247711,
0.5893287, 0.1613403,
0.5893287, 0.1613403,
0.5893287, 0.1613403),
ncol = 2, byrow = TRUE))
# 使用kernelUD
ade_A_buselaphus <- kernelUD(SpatialPoints(A_buselaphus),
h = "href",
grid = mask_xy_grid,
kern = "bivnorm")
plot(ade_A_buselaphus)
# 使用kde
kde_A_buselaphus <- kde(A_buselaphus,
eval.points = mask_xy)
# 获取类似的绘图
kde_A_buselaphus <- SpatialPixelsDataFrame(points = kde_A_buselaphus$eval.points,
data = data.frame(est = kde_A_buselaphus$estimate))
plot(kde_A_buselaphus)
我可以使用MASS,但遇到与kernelUD
相同的带宽问题,因为它需要"x和y方向的带宽向量":
library(MASS)
mask.xy <- as.data.frame(expand.grid(x = seq(0.01, 1, by = 0.01), y = seq(0.01, 1, by = 0.01)))
MASS_G_genetta <- kde2d(G_genetta$V1,
G_genetta$V2,
n = c(100, 100),
lims = c(range(mask.xy$x), range(mask.xy$y)))
# 获取类似的绘图
MASS_G_genetta <- SpatialPixelsDataFrame(points = mask.xy,
data = data.frame(est = melt(MASS_G_genetta$z)$value))
plot(MASS_G_genetta)
# 同一坐标轴
plot(ade_G_genetta, zlim = c(0,75))
plot(kde_G_genetta, zlim = c(0,75))
plot(MASS_G_genetta, zlim = c(0,75))
英文:
I am switching from sp to sf but have some analysis which uses kernelUD
from adehabitatHR
which requires my data to be SpatialPoints
. Is there an equivalent which does not use sp
, perhaps instead using sf
or a regular dataframe, which offers similar bandwidth options?
A snippet of my original code using Genetta genetta as an example:
library(adehabitatHR)
G_genetta <- as.data.frame(matrix(c(0.5519758, 0.27524548,
0.5725632, 0.12309273,
0.5547747, 0.06100429,
0.6110925, 0.16211416,
0.5951087, 0.09316814,
0.5333567, 0.11673812,
0.5855461, 0.11170616,
0.5221387, 0.11061583,
0.5848452, 0.17213175),
ncol = 2, byrow = TRUE))
mask_xy_grid <- expand.grid(x = seq(0.01, 1, by = 0.01), y = seq(0.01, 1, by = 0.01))
coordinates(mask_xy_grid) <- ~ x + y
gridded(mask_xy_grid) <- TRUE
ade_G_genetta <- kernelUD(SpatialPoints(G_genetta),
h = "href",
grid = mask_xy_grid,
kern = "bivnorm")
plot(ade_G_genetta)
I am aware of the ks
package which I could use as follows:
library(ks)
mask_xy <- as.data.frame(expand.grid(x = seq(0.01, 1, by = 0.01), y = seq(0.01, 1, by = 0.01)))
kde_G_genetta <- kde(G_genetta,
eval.points = mask_xy)
# lazily get comparable plot
kde_G_genetta <- SpatialPixelsDataFrame(points = kde_G_genetta$eval.points,
data = data.frame(est = kde_G_genetta$estimate))
plot(kde_G_genetta)
but this doesn't offer the same bandwidth options as kernelUD
. I particularly don't understand why kernelUD
takes a single scalar value for "h" whereas kde
takes a matrix when they are both applied to a multivariate problem. kde
also doesn't seem to cope well with limited sampling points, e.g. Alcelaphus buselaphus which gives the error:
Error in chol.default(S) :
the leading minor of order 2 is not positive definite
A_buselaphus <- as.data.frame(matrix(c(0.5109837, 0.1247711,
0.5109837, 0.1247711,
0.5893287, 0.1613403,
0.5893287, 0.1613403,
0.5893287, 0.1613403),
ncol = 2, byrow = TRUE))
# using kernelUD
ade_A_buselaphus <- kernelUD(SpatialPoints(A_buselaphus),
h = "href",
grid = mask_xy_grid,
kern = "bivnorm")
plot(ade_A_buselaphus)
# using kde
kde_A_buselaphus <- kde(A_buselaphus,
eval.points = mask_xy)
# lazily get comparable plot
kde_A_buselaphus <- SpatialPixelsDataFrame(points = kde_A_buselaphus$eval.points,
data = data.frame(est = kde_A_buselaphus$estimate))
plot(kde_A_buselaphus)
I could use MASS, but I run into the same bandwidth problem as it takes a "vector of bandwidths for x and y directions":
library(MASS)
mask.xy <- as.data.frame(expand.grid(x = seq(0.01, 1, by = 0.01), y = seq(0.01, 1, by = 0.01)))
MASS_G_genetta <- kde2d(G_genetta$V1,
G_genetta$V2,
n = c(100, 100),
lims = c(range(mask.xy$x), range(mask.xy$y)))
# lazily get comparable plots
MASS_G_genetta <- SpatialPixelsDataFrame(points = mask.xy,
data = data.frame(est = melt(MASS_G_genetta$z)$value))
plot(MASS_G_genetta)
# same axis
plot(ade_G_genetta, zlim = c(0,75))
plot(kde_G_genetta, zlim = c(0,75))
plot(MASS_G_genetta, zlim = c(0,75))
答案1
得分: 0
在kernelUD
的源代码中进行了一些研究,我找到了一个可以接受的,尽管不完整的解决方案。我只查看了与使用自定义方法计算h
的双变量正态核有关的文档和源代码。
为了找到解决方案,我沿着CRAN存储库中的一系列函数进行了追踪:
kernelUD
-> .kernelUDs
-> void kernelhr
-> void epa
这导致我在epa函数中找到了以下注释:
/* 仅保留不远于当前像素的4*fen的点 */
对于双变量正态核,我可以通用地写成如下形式:
(sqrt(0.5*(var(species_xy[[1]]) + var(species_xy[[2]]))))*(nrow(species_xy)^-(1/6))
由于MASS
中的kde2d
需要一个标量向量,我可以为每个维度提供h*4。因此,Genetta genetta的解决方案如下:
library(adehabitatHR)
library(MASS)
# 使用adehabitatHR
G_genetta <- as.data.frame(matrix(c(0.5519758, 0.27524548,
0.5725632, 0.12309273,
0.5547747, 0.06100429,
0.6110925, 0.16211416,
0.5951087, 0.09316814,
0.5333567, 0.11673812,
0.5855461, 0.11170616,
0.5221387, 0.11061583,
0.5848452, 0.17213175),
ncol = 2, byrow = TRUE))
mask_xy_grid <- expand.grid(x = seq(0.01, 1, by = 0.01), y = seq(0.01, 1, by = 0.01))
coordinates(mask_xy_grid) <- ~ x + y
gridded(mask_xy_grid) <- TRUE
ade_G_genetta <- kernelUD(SpatialPoints(G_genetta),
h = "href",
grid = mask_xy_grid,
kern = "bivnorm")
plot(ade_G_genetta)
# 使用MASS
mask.xy <- as.data.frame(expand.grid(x = seq(0.01, 1, by = 0.01), y = seq(0.01, 1, by = 0.01)))
H <- (sqrt(0.5*(var(G_genetta[[1]]) + var(G_genetta[[2]]))))*(nrow(G_genetta)^-(1/6))
MASS_G_genetta <- kde2d(G_genetta$V1,
G_genetta$V2,
n = c(100, 100),
h = c(H*4, H*4),
lims = c(range(mask.xy$x), range(mask.xy$y)))
# 获取可比较的绘图
MASS_G_genetta <- SpatialPixelsDataFrame(points = mask.xy,
data = data.frame(est = melt(MASS_G_genetta$z)$value))
plot(MASS_G_genetta)
# 确认这与kernelUD输出中的相同h:
kde_G_genetta@h[["h"]]
H
kde_G_genetta@h[["h"]] == H
输出中的确切值略有不同,可能是两种方法之间精度差异的原因。两个对象的快速摘要显示它们在功能上是相同的,并支持它们绘图的相似性。
summary(kde_G_genetta@data[["ud"]])
summary(MASS_G_genetta$est)
英文:
Having done some digging into the source code of kernelUD
I have found a tolerable, albeit incomplete, solution. I have only been looking within documentation and source code which relates to a bivariate normal kernel using the ad hoc method for calculating h
.
To find the solution I followed a trail of functions within the CRAN repo:
kernelUD
-> .kernelUDs
-> void kernelhr
-> void epa
which led me to this comment within the epa function:
/* Keep only the points no further than 4*fen of the current pixel */
fen
equates to h
which according to the documentation is by default calculated with:
for a bivariate normal kernel. I can write this generically as follows:
(sqrt(0.5*(var(species_xy[[1]]) + var(species_xy[[2]]))))*(nrow(species_xy)^-(1/6))
Since kde2d
from MASS
takes a scalar vector, I can supply it with h*4 for each dimension. The solution for Genetta genetta would therefore be:
library(adehabitatHR)
library(MASS)
# using adehabitatHR
G_genetta <- as.data.frame(matrix(c(0.5519758, 0.27524548,
0.5725632, 0.12309273,
0.5547747, 0.06100429,
0.6110925, 0.16211416,
0.5951087, 0.09316814,
0.5333567, 0.11673812,
0.5855461, 0.11170616,
0.5221387, 0.11061583,
0.5848452, 0.17213175),
ncol = 2, byrow = TRUE))
mask_xy_grid <- expand.grid(x = seq(0.01, 1, by = 0.01), y = seq(0.01, 1, by = 0.01))
coordinates(mask_xy_grid) <- ~ x + y
gridded(mask_xy_grid) <- TRUE
ade_G_genetta <- kernelUD(SpatialPoints(G_genetta),
h = "href",
grid = mask_xy_grid,
kern = "bivnorm")
plot(ade_G_genetta)
# using MASS
mask.xy <- as.data.frame(expand.grid(x = seq(0.01, 1, by = 0.01), y = seq(0.01, 1, by = 0.01)))
H <- (sqrt(0.5*(var(G_genetta[[1]]) + var(G_genetta[[2]]))))*(nrow(G_genetta)^-(1/6))
MASS_G_genetta <- kde2d(G_genetta$V1,
G_genetta$V2,
n = c(100, 100),
h = c(H*4, H*4),
lims = c(range(mask.xy$x), range(mask.xy$y)))
# lazily get comparable plots
MASS_G_genetta <- SpatialPixelsDataFrame(points = mask.xy,
data = data.frame(est = melt(MASS_G_genetta$z)$value))
plot(MASS_G_genetta)
# to confirm this is the same h as the kernelUD output:
kde_G_genetta@h[["h"]]
H
kde_G_genetta@h[["h"]] == H
The exact values in the output differ minorly, presumably due to differences in precision between the two methods. A quick summary of the two objects shows us they are functionally the same and backs up the similarity in their plots.
summary(kde_G_genetta@data[["ud"]]
summary(MASS_G_genetta$est)
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论