是否有基于sf(或df)的替代方法来估计双变量正态核密度?

huangapple go评论55阅读模式
英文:

Is there an sf (or df) based alternative to kernelUD to estimate bivariate normal kernel density?

问题

我正在从sp切换到sf,但有一些分析需要使用adehabitatHR中的kernelUD,它要求我的数据是SpatialPoints。是否有一个不使用sp的等效方法,也许可以使用sf或常规数据框(dataframe),并提供类似的带宽选项?

我的原始代码示例中使用Genetta genetta:

library(adehabitatHR)

G_genetta <- as.data.frame(matrix(c(0.5519758, 0.27524548,
                                    0.5725632, 0.12309273,
                                    0.5547747, 0.06100429,
                                    0.6110925, 0.16211416,
                                    0.5951087, 0.09316814,
                                    0.5333567, 0.11673812,
                                    0.5855461, 0.11170616,
                                    0.5221387, 0.11061583,
                                    0.5848452, 0.17213175), 
                                  ncol = 2, byrow = TRUE))

mask_xy_grid <- expand.grid(x = seq(0.01, 1, by = 0.01), y = seq(0.01, 1, by = 0.01))
coordinates(mask_xy_grid) <- ~ x + y
gridded(mask_xy_grid) <- TRUE

ade_G_genetta <- kernelUD(SpatialPoints(G_genetta), 
                          h = "href", 
                          grid = mask_xy_grid, 
                          kern = "bivnorm")

plot(ade_G_genetta)

我知道可以使用ks包,如下所示:

library(ks)

mask_xy <- as.data.frame(expand.grid(x = seq(0.01, 1, by = 0.01), y = seq(0.01, 1, by = 0.01)))

kde_G_genetta <- kde(G_genetta,
                     eval.points = mask_xy)

# 获取类似的绘图
kde_G_genetta <- SpatialPixelsDataFrame(points = kde_G_genetta$eval.points, 
                                        data = data.frame(est = kde_G_genetta$estimate))

plot(kde_G_genetta)

但是这个方法不提供与kernelUD相同的带宽选项。我特别不理解为什么kernelUD接受一个单一的标量值作为"h",而kde在处理多变量问题时却接受矩阵。此外,kde似乎无法很好地处理有限的采样点,例如Alcelaphus buselaphus,会出现以下错误:

Error in chol.default(S) : 
  the leading minor of order 2 is not positive definite
A_buselaphus <- as.data.frame(matrix(c(0.5109837, 0.1247711,
                                       0.5109837, 0.1247711,
                                       0.5893287, 0.1613403,
                                       0.5893287, 0.1613403,
                                       0.5893287, 0.1613403), 
                                     ncol = 2, byrow = TRUE))

# 使用kernelUD
ade_A_buselaphus <- kernelUD(SpatialPoints(A_buselaphus), 
                             h = "href", 
                             grid = mask_xy_grid, 
                             kern = "bivnorm")

plot(ade_A_buselaphus)

# 使用kde
kde_A_buselaphus <- kde(A_buselaphus,
                        eval.points = mask_xy)

# 获取类似的绘图
kde_A_buselaphus <- SpatialPixelsDataFrame(points = kde_A_buselaphus$eval.points, 
                                           data = data.frame(est = kde_A_buselaphus$estimate))

plot(kde_A_buselaphus)

我可以使用MASS,但遇到与kernelUD相同的带宽问题,因为它需要"x和y方向的带宽向量":

library(MASS)

mask.xy <- as.data.frame(expand.grid(x = seq(0.01, 1, by = 0.01), y = seq(0.01, 1, by = 0.01)))

MASS_G_genetta <- kde2d(G_genetta$V1,
                        G_genetta$V2,
                        n = c(100, 100),
                        lims = c(range(mask.xy$x), range(mask.xy$y)))

# 获取类似的绘图
MASS_G_genetta <- SpatialPixelsDataFrame(points = mask.xy, 
                                         data = data.frame(est = melt(MASS_G_genetta$z)$value))

plot(MASS_G_genetta)

# 同一坐标轴
plot(ade_G_genetta, zlim = c(0,75))
plot(kde_G_genetta, zlim = c(0,75))
plot(MASS_G_genetta, zlim = c(0,75))
英文:

I am switching from sp to sf but have some analysis which uses kernelUD from adehabitatHR which requires my data to be SpatialPoints. Is there an equivalent which does not use sp, perhaps instead using sf or a regular dataframe, which offers similar bandwidth options?

A snippet of my original code using Genetta genetta as an example:

library(adehabitatHR)

G_genetta &lt;- as.data.frame(matrix(c(0.5519758, 0.27524548,
                                    0.5725632, 0.12309273,
                                    0.5547747, 0.06100429,
                                    0.6110925, 0.16211416,
                                    0.5951087, 0.09316814,
                                    0.5333567, 0.11673812,
                                    0.5855461, 0.11170616,
                                    0.5221387, 0.11061583,
                                    0.5848452, 0.17213175), 
                                  ncol = 2, byrow = TRUE))

mask_xy_grid &lt;- expand.grid(x = seq(0.01, 1, by = 0.01), y = seq(0.01, 1, by = 0.01))
coordinates(mask_xy_grid) &lt;- ~ x + y
gridded(mask_xy_grid) &lt;- TRUE

ade_G_genetta &lt;- kernelUD(SpatialPoints(G_genetta), 
                          h = &quot;href&quot;, 
                          grid = mask_xy_grid, 
                          kern = &quot;bivnorm&quot;)

plot(ade_G_genetta)

I am aware of the ks package which I could use as follows:

library(ks)

mask_xy &lt;- as.data.frame(expand.grid(x = seq(0.01, 1, by = 0.01), y = seq(0.01, 1, by = 0.01)))

kde_G_genetta &lt;- kde(G_genetta,
                     eval.points = mask_xy)

# lazily get comparable plot
kde_G_genetta &lt;- SpatialPixelsDataFrame(points = kde_G_genetta$eval.points, 
                                        data = data.frame(est = kde_G_genetta$estimate))

plot(kde_G_genetta)

but this doesn't offer the same bandwidth options as kernelUD. I particularly don't understand why kernelUD takes a single scalar value for "h" whereas kde takes a matrix when they are both applied to a multivariate problem. kde also doesn't seem to cope well with limited sampling points, e.g. Alcelaphus buselaphus which gives the error:

Error in chol.default(S) : 
  the leading minor of order 2 is not positive definite
A_buselaphus &lt;- as.data.frame(matrix(c(0.5109837, 0.1247711,
                                       0.5109837, 0.1247711,
                                       0.5893287, 0.1613403,
                                       0.5893287, 0.1613403,
                                       0.5893287, 0.1613403), 
                                     ncol = 2, byrow = TRUE))

# using kernelUD
ade_A_buselaphus &lt;- kernelUD(SpatialPoints(A_buselaphus), 
                             h = &quot;href&quot;, 
                             grid = mask_xy_grid, 
                             kern = &quot;bivnorm&quot;)

plot(ade_A_buselaphus)

# using kde
kde_A_buselaphus &lt;- kde(A_buselaphus,
                        eval.points = mask_xy)

# lazily get comparable plot
kde_A_buselaphus &lt;- SpatialPixelsDataFrame(points = kde_A_buselaphus$eval.points, 
                                           data = data.frame(est = kde_A_buselaphus$estimate))

plot(kde_A_buselaphus)

I could use MASS, but I run into the same bandwidth problem as it takes a "vector of bandwidths for x and y directions":

library(MASS)

mask.xy &lt;- as.data.frame(expand.grid(x = seq(0.01, 1, by = 0.01), y = seq(0.01, 1, by = 0.01)))

MASS_G_genetta &lt;- kde2d(G_genetta$V1,
                        G_genetta$V2,
                        n = c(100, 100),
                        lims = c(range(mask.xy$x), range(mask.xy$y)))

# lazily get comparable plots
MASS_G_genetta &lt;- SpatialPixelsDataFrame(points = mask.xy, 
                                         data = data.frame(est = melt(MASS_G_genetta$z)$value))

plot(MASS_G_genetta)

# same axis
plot(ade_G_genetta, zlim = c(0,75))
plot(kde_G_genetta, zlim = c(0,75))
plot(MASS_G_genetta, zlim = c(0,75))

答案1

得分: 0

kernelUD的源代码中进行了一些研究,我找到了一个可以接受的,尽管不完整的解决方案。我只查看了与使用自定义方法计算h的双变量正态核有关的文档和源代码。

为了找到解决方案,我沿着CRAN存储库中的一系列函数进行了追踪:
kernelUD -> .kernelUDs -> void kernelhr -> void epa

这导致我在epa函数中找到了以下注释:
/* 仅保留不远于当前像素的4*fen的点 */

fen等于h,根据文档,默认情况下通过以下方式计算:
是否有基于sf(或df)的替代方法来估计双变量正态核密度?

对于双变量正态核,我可以通用地写成如下形式:
(sqrt(0.5*(var(species_xy[[1]]) + var(species_xy[[2]]))))*(nrow(species_xy)^-(1/6))

由于MASS中的kde2d需要一个标量向量,我可以为每个维度提供h*4。因此,Genetta genetta的解决方案如下:

library(adehabitatHR)
library(MASS)

# 使用adehabitatHR
G_genetta <- as.data.frame(matrix(c(0.5519758, 0.27524548,
                                    0.5725632, 0.12309273,
                                    0.5547747, 0.06100429,
                                    0.6110925, 0.16211416,
                                    0.5951087, 0.09316814,
                                    0.5333567, 0.11673812,
                                    0.5855461, 0.11170616,
                                    0.5221387, 0.11061583,
                                    0.5848452, 0.17213175), 
                                  ncol = 2, byrow = TRUE))

mask_xy_grid <- expand.grid(x = seq(0.01, 1, by = 0.01), y = seq(0.01, 1, by = 0.01))
coordinates(mask_xy_grid) <- ~ x + y
gridded(mask_xy_grid) <- TRUE

ade_G_genetta <- kernelUD(SpatialPoints(G_genetta), 
                          h = "href", 
                          grid = mask_xy_grid, 
                          kern = "bivnorm")

plot(ade_G_genetta)

# 使用MASS
mask.xy <- as.data.frame(expand.grid(x = seq(0.01, 1, by = 0.01), y = seq(0.01, 1, by = 0.01)))

H <- (sqrt(0.5*(var(G_genetta[[1]]) + var(G_genetta[[2]]))))*(nrow(G_genetta)^-(1/6))

MASS_G_genetta <- kde2d(G_genetta$V1,
                        G_genetta$V2,
                        n = c(100, 100),
                        h = c(H*4, H*4),
                        lims = c(range(mask.xy$x), range(mask.xy$y)))

# 获取可比较的绘图
MASS_G_genetta <- SpatialPixelsDataFrame(points = mask.xy, 
                                         data = data.frame(est = melt(MASS_G_genetta$z)$value))

plot(MASS_G_genetta)

# 确认这与kernelUD输出中的相同h:
kde_G_genetta@h[["h"]]
H
kde_G_genetta@h[["h"]] == H

输出中的确切值略有不同,可能是两种方法之间精度差异的原因。两个对象的快速摘要显示它们在功能上是相同的,并支持它们绘图的相似性。

summary(kde_G_genetta@data[["ud"]])
summary(MASS_G_genetta$est)
英文:

Having done some digging into the source code of kernelUD I have found a tolerable, albeit incomplete, solution. I have only been looking within documentation and source code which relates to a bivariate normal kernel using the ad hoc method for calculating h.

To find the solution I followed a trail of functions within the CRAN repo:
kernelUD -> .kernelUDs -> void kernelhr -> void epa

which led me to this comment within the epa function:
/* Keep only the points no further than 4*fen of the current pixel */

fen equates to h which according to the documentation is by default calculated with:
是否有基于sf(或df)的替代方法来估计双变量正态核密度?

for a bivariate normal kernel. I can write this generically as follows:
(sqrt(0.5*(var(species_xy[[1]]) + var(species_xy[[2]]))))*(nrow(species_xy)^-(1/6))

Since kde2d from MASS takes a scalar vector, I can supply it with h*4 for each dimension. The solution for Genetta genetta would therefore be:

library(adehabitatHR)
library(MASS)
# using adehabitatHR
G_genetta &lt;- as.data.frame(matrix(c(0.5519758, 0.27524548,
0.5725632, 0.12309273,
0.5547747, 0.06100429,
0.6110925, 0.16211416,
0.5951087, 0.09316814,
0.5333567, 0.11673812,
0.5855461, 0.11170616,
0.5221387, 0.11061583,
0.5848452, 0.17213175), 
ncol = 2, byrow = TRUE))
mask_xy_grid &lt;- expand.grid(x = seq(0.01, 1, by = 0.01), y = seq(0.01, 1, by = 0.01))
coordinates(mask_xy_grid) &lt;- ~ x + y
gridded(mask_xy_grid) &lt;- TRUE
ade_G_genetta &lt;- kernelUD(SpatialPoints(G_genetta), 
h = &quot;href&quot;, 
grid = mask_xy_grid, 
kern = &quot;bivnorm&quot;)
plot(ade_G_genetta)
# using MASS
mask.xy &lt;- as.data.frame(expand.grid(x = seq(0.01, 1, by = 0.01), y = seq(0.01, 1, by = 0.01)))
H &lt;- (sqrt(0.5*(var(G_genetta[[1]]) + var(G_genetta[[2]]))))*(nrow(G_genetta)^-(1/6))
MASS_G_genetta &lt;- kde2d(G_genetta$V1,
G_genetta$V2,
n = c(100, 100),
h = c(H*4, H*4),
lims = c(range(mask.xy$x), range(mask.xy$y)))
# lazily get comparable plots
MASS_G_genetta &lt;- SpatialPixelsDataFrame(points = mask.xy, 
data = data.frame(est = melt(MASS_G_genetta$z)$value))
plot(MASS_G_genetta)
# to confirm this is the same h as the kernelUD output:
kde_G_genetta@h[[&quot;h&quot;]]
H
kde_G_genetta@h[[&quot;h&quot;]] == H

The exact values in the output differ minorly, presumably due to differences in precision between the two methods. A quick summary of the two objects shows us they are functionally the same and backs up the similarity in their plots.

summary(kde_G_genetta@data[[&quot;ud&quot;]]
summary(MASS_G_genetta$est)

huangapple
  • 本文由 发表于 2023年7月13日 23:15:48
  • 转载请务必保留本文链接:https://go.coder-hub.com/76680983.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定