2023年7月13日 23:15:48go评论93阅读模式

英文:

Is there an sf (or df) based alternative to kernelUD to estimate bivariate normal kernel density?

问题

我正在从sp切换到sf，但有一些分析需要使用adehabitatHR中的kernelUD，它要求我的数据是SpatialPoints。是否有一个不使用sp的等效方法，也许可以使用sf或常规数据框（dataframe），并提供类似的带宽选项？

我的原始代码示例中使用Genetta genetta：

library(adehabitatHR)
G_genetta <- as.data.frame(matrix(c(0.5519758, 0.27524548,
                                    0.5725632, 0.12309273,
                                    0.5547747, 0.06100429,
                                    0.6110925, 0.16211416,
                                    0.5951087, 0.09316814,
                                    0.5333567, 0.11673812,
                                    0.5855461, 0.11170616,
                                    0.5221387, 0.11061583,
                                    0.5848452, 0.17213175), 
                                  ncol = 2, byrow = TRUE))
mask_xy_grid <- expand.grid(x = seq(0.01, 1, by = 0.01), y = seq(0.01, 1, by = 0.01))
coordinates(mask_xy_grid) <- ~ x + y
gridded(mask_xy_grid) <- TRUE
ade_G_genetta <- kernelUD(SpatialPoints(G_genetta), 
                          h = "href", 
                          grid = mask_xy_grid, 
                          kern = "bivnorm")
plot(ade_G_genetta)

我知道可以使用ks包，如下所示：

library(ks)
mask_xy <- as.data.frame(expand.grid(x = seq(0.01, 1, by = 0.01), y = seq(0.01, 1, by = 0.01)))
kde_G_genetta <- kde(G_genetta,
                     eval.points = mask_xy)
# 获取类似的绘图
kde_G_genetta <- SpatialPixelsDataFrame(points = kde_G_genetta$eval.points, 
                                        data = data.frame(est = kde_G_genetta$estimate))
plot(kde_G_genetta)

但是这个方法不提供与kernelUD相同的带宽选项。我特别不理解为什么kernelUD接受一个单一的标量值作为"h"，而kde在处理多变量问题时却接受矩阵。此外，kde似乎无法很好地处理有限的采样点，例如Alcelaphus buselaphus，会出现以下错误：

Error in chol.default(S) : 
  the leading minor of order 2 is not positive definite

A_buselaphus <- as.data.frame(matrix(c(0.5109837, 0.1247711,
                                       0.5109837, 0.1247711,
                                       0.5893287, 0.1613403,
                                       0.5893287, 0.1613403,
                                       0.5893287, 0.1613403), 
                                     ncol = 2, byrow = TRUE))
# 使用kernelUD
ade_A_buselaphus <- kernelUD(SpatialPoints(A_buselaphus), 
                             h = "href", 
                             grid = mask_xy_grid, 
                             kern = "bivnorm")
plot(ade_A_buselaphus)
# 使用kde
kde_A_buselaphus <- kde(A_buselaphus,
                        eval.points = mask_xy)
# 获取类似的绘图
kde_A_buselaphus <- SpatialPixelsDataFrame(points = kde_A_buselaphus$eval.points, 
                                           data = data.frame(est = kde_A_buselaphus$estimate))
plot(kde_A_buselaphus)

我可以使用MASS，但遇到与kernelUD相同的带宽问题，因为它需要"x和y方向的带宽向量"：

library(MASS)
mask.xy <- as.data.frame(expand.grid(x = seq(0.01, 1, by = 0.01), y = seq(0.01, 1, by = 0.01)))
MASS_G_genetta <- kde2d(G_genetta$V1,
                        G_genetta$V2,
                        n = c(100, 100),
                        lims = c(range(mask.xy$x), range(mask.xy$y)))
# 获取类似的绘图
MASS_G_genetta <- SpatialPixelsDataFrame(points = mask.xy, 
                                         data = data.frame(est = melt(MASS_G_genetta$z)$value))
plot(MASS_G_genetta)
# 同一坐标轴
plot(ade_G_genetta, zlim = c(0,75))
plot(kde_G_genetta, zlim = c(0,75))
plot(MASS_G_genetta, zlim = c(0,75))

英文:

I am switching from sp to sf but have some analysis which uses kernelUD from adehabitatHR which requires my data to be SpatialPoints. Is there an equivalent which does not use sp, perhaps instead using sf or a regular dataframe, which offers similar bandwidth options?

A snippet of my original code using Genetta genetta as an example:

library(adehabitatHR)
G_genetta &lt;- as.data.frame(matrix(c(0.5519758, 0.27524548,
                                    0.5725632, 0.12309273,
                                    0.5547747, 0.06100429,
                                    0.6110925, 0.16211416,
                                    0.5951087, 0.09316814,
                                    0.5333567, 0.11673812,
                                    0.5855461, 0.11170616,
                                    0.5221387, 0.11061583,
                                    0.5848452, 0.17213175), 
                                  ncol = 2, byrow = TRUE))
mask_xy_grid &lt;- expand.grid(x = seq(0.01, 1, by = 0.01), y = seq(0.01, 1, by = 0.01))
coordinates(mask_xy_grid) &lt;- ~ x + y
gridded(mask_xy_grid) &lt;- TRUE
ade_G_genetta &lt;- kernelUD(SpatialPoints(G_genetta), 
                          h = &quot;href&quot;, 
                          grid = mask_xy_grid, 
                          kern = &quot;bivnorm&quot;)
plot(ade_G_genetta)

I am aware of the ks package which I could use as follows:

library(ks)
mask_xy &lt;- as.data.frame(expand.grid(x = seq(0.01, 1, by = 0.01), y = seq(0.01, 1, by = 0.01)))
kde_G_genetta &lt;- kde(G_genetta,
                     eval.points = mask_xy)
# lazily get comparable plot
kde_G_genetta &lt;- SpatialPixelsDataFrame(points = kde_G_genetta$eval.points, 
                                        data = data.frame(est = kde_G_genetta$estimate))
plot(kde_G_genetta)

but this doesn't offer the same bandwidth options as kernelUD. I particularly don't understand why kernelUD takes a single scalar value for "h" whereas kde takes a matrix when they are both applied to a multivariate problem. kde also doesn't seem to cope well with limited sampling points, e.g. Alcelaphus buselaphus which gives the error:

Error in chol.default(S) : 
  the leading minor of order 2 is not positive definite

A_buselaphus &lt;- as.data.frame(matrix(c(0.5109837, 0.1247711,
                                       0.5109837, 0.1247711,
                                       0.5893287, 0.1613403,
                                       0.5893287, 0.1613403,
                                       0.5893287, 0.1613403), 
                                     ncol = 2, byrow = TRUE))
# using kernelUD
ade_A_buselaphus &lt;- kernelUD(SpatialPoints(A_buselaphus), 
                             h = &quot;href&quot;, 
                             grid = mask_xy_grid, 
                             kern = &quot;bivnorm&quot;)
plot(ade_A_buselaphus)
# using kde
kde_A_buselaphus &lt;- kde(A_buselaphus,
                        eval.points = mask_xy)
# lazily get comparable plot
kde_A_buselaphus &lt;- SpatialPixelsDataFrame(points = kde_A_buselaphus$eval.points, 
                                           data = data.frame(est = kde_A_buselaphus$estimate))
plot(kde_A_buselaphus)

I could use MASS, but I run into the same bandwidth problem as it takes a "vector of bandwidths for x and y directions":

library(MASS)
mask.xy &lt;- as.data.frame(expand.grid(x = seq(0.01, 1, by = 0.01), y = seq(0.01, 1, by = 0.01)))
MASS_G_genetta &lt;- kde2d(G_genetta$V1,
                        G_genetta$V2,
                        n = c(100, 100),
                        lims = c(range(mask.xy$x), range(mask.xy$y)))
# lazily get comparable plots
MASS_G_genetta &lt;- SpatialPixelsDataFrame(points = mask.xy, 
                                         data = data.frame(est = melt(MASS_G_genetta$z)$value))
plot(MASS_G_genetta)
# same axis
plot(ade_G_genetta, zlim = c(0,75))
plot(kde_G_genetta, zlim = c(0,75))
plot(MASS_G_genetta, zlim = c(0,75))

答案1

得分: 0

在kernelUD的源代码中进行了一些研究，我找到了一个可以接受的，尽管不完整的解决方案。我只查看了与使用自定义方法计算h的双变量正态核有关的文档和源代码。

为了找到解决方案，我沿着CRAN存储库中的一系列函数进行了追踪：
kernelUD -> .kernelUDs -> void kernelhr -> void epa

这导致我在epa函数中找到了以下注释：
/* 仅保留不远于当前像素的4*fen的点 */

fen等于h，根据文档，默认情况下通过以下方式计算：

对于双变量正态核，我可以通用地写成如下形式：
(sqrt(0.5*(var(species_xy[[1]]) + var(species_xy[[2]]))))*(nrow(species_xy)^-(1/6))

由于MASS中的kde2d需要一个标量向量，我可以为每个维度提供h*4。因此，Genetta genetta的解决方案如下：

library(adehabitatHR)
library(MASS)
# 使用adehabitatHR
G_genetta <- as.data.frame(matrix(c(0.5519758, 0.27524548,
                                    0.5725632, 0.12309273,
                                    0.5547747, 0.06100429,
                                    0.6110925, 0.16211416,
                                    0.5951087, 0.09316814,
                                    0.5333567, 0.11673812,
                                    0.5855461, 0.11170616,
                                    0.5221387, 0.11061583,
                                    0.5848452, 0.17213175), 
                                  ncol = 2, byrow = TRUE))
mask_xy_grid <- expand.grid(x = seq(0.01, 1, by = 0.01), y = seq(0.01, 1, by = 0.01))
coordinates(mask_xy_grid) <- ~ x + y
gridded(mask_xy_grid) <- TRUE
ade_G_genetta <- kernelUD(SpatialPoints(G_genetta), 
                          h = "href", 
                          grid = mask_xy_grid, 
                          kern = "bivnorm")
plot(ade_G_genetta)
# 使用MASS
mask.xy <- as.data.frame(expand.grid(x = seq(0.01, 1, by = 0.01), y = seq(0.01, 1, by = 0.01)))
H <- (sqrt(0.5*(var(G_genetta[[1]]) + var(G_genetta[[2]]))))*(nrow(G_genetta)^-(1/6))
MASS_G_genetta <- kde2d(G_genetta$V1,
                        G_genetta$V2,
                        n = c(100, 100),
                        h = c(H*4, H*4),
                        lims = c(range(mask.xy$x), range(mask.xy$y)))
# 获取可比较的绘图
MASS_G_genetta <- SpatialPixelsDataFrame(points = mask.xy, 
                                         data = data.frame(est = melt(MASS_G_genetta$z)$value))
plot(MASS_G_genetta)
# 确认这与kernelUD输出中的相同h：
kde_G_genetta@h[["h"]]
H
kde_G_genetta@h[["h"]] == H

输出中的确切值略有不同，可能是两种方法之间精度差异的原因。两个对象的快速摘要显示它们在功能上是相同的，并支持它们绘图的相似性。

summary(kde_G_genetta@data[["ud"]])
summary(MASS_G_genetta$est)

英文:

Having done some digging into the source code of kernelUD I have found a tolerable, albeit incomplete, solution. I have only been looking within documentation and source code which relates to a bivariate normal kernel using the ad hoc method for calculating h.

To find the solution I followed a trail of functions within the CRAN repo:
kernelUD -> .kernelUDs -> void kernelhr -> void epa

which led me to this comment within the epa function:
/* Keep only the points no further than 4*fen of the current pixel */

fen equates to h which according to the documentation is by default calculated with:

for a bivariate normal kernel. I can write this generically as follows:
(sqrt(0.5*(var(species_xy[[1]]) + var(species_xy[[2]]))))*(nrow(species_xy)^-(1/6))

Since kde2d from MASS takes a scalar vector, I can supply it with h*4 for each dimension. The solution for Genetta genetta would therefore be:

library(adehabitatHR)
library(MASS)
# using adehabitatHR
G_genetta &lt;- as.data.frame(matrix(c(0.5519758, 0.27524548,
0.5725632, 0.12309273,
0.5547747, 0.06100429,
0.6110925, 0.16211416,
0.5951087, 0.09316814,
0.5333567, 0.11673812,
0.5855461, 0.11170616,
0.5221387, 0.11061583,
0.5848452, 0.17213175), 
ncol = 2, byrow = TRUE))
mask_xy_grid &lt;- expand.grid(x = seq(0.01, 1, by = 0.01), y = seq(0.01, 1, by = 0.01))
coordinates(mask_xy_grid) &lt;- ~ x + y
gridded(mask_xy_grid) &lt;- TRUE
ade_G_genetta &lt;- kernelUD(SpatialPoints(G_genetta), 
h = &quot;href&quot;, 
grid = mask_xy_grid, 
kern = &quot;bivnorm&quot;)
plot(ade_G_genetta)
# using MASS
mask.xy &lt;- as.data.frame(expand.grid(x = seq(0.01, 1, by = 0.01), y = seq(0.01, 1, by = 0.01)))
H &lt;- (sqrt(0.5*(var(G_genetta[[1]]) + var(G_genetta[[2]]))))*(nrow(G_genetta)^-(1/6))
MASS_G_genetta &lt;- kde2d(G_genetta$V1,
G_genetta$V2,
n = c(100, 100),
h = c(H*4, H*4),
lims = c(range(mask.xy$x), range(mask.xy$y)))
# lazily get comparable plots
MASS_G_genetta &lt;- SpatialPixelsDataFrame(points = mask.xy, 
data = data.frame(est = melt(MASS_G_genetta$z)$value))
plot(MASS_G_genetta)
# to confirm this is the same h as the kernelUD output:
kde_G_genetta@h[[&quot;h&quot;]]
H
kde_G_genetta@h[[&quot;h&quot;]] == H

The exact values in the output differ minorly, presumably due to differences in precision between the two methods. A quick summary of the two objects shows us they are functionally the same and backs up the similarity in their plots.

summary(kde_G_genetta@data[[&quot;ud&quot;]]
summary(MASS_G_genetta$est)

通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库，让每个人都能够通过互相帮助和分享经验来进步。

是否有基于sf（或df）的替代方法来估计双变量正态核密度？

问题

答案1

SpatRaster粗糙度指数方程（地形，v =“roughness”）是如何工作的？

R: 模拟鱼塘的人口

如何根据列中特定的字符串重新塑造或转置数据框在R中？

声学复杂度指数时间序列输出

如何在Playwright视觉比较中屏蔽多个定位器？

在C++中，可以使用可变模板参数来检索类型的内部类型。

selenium.common.exceptions.StaleElementReferenceException: Message: stale element reference: stale element not found

Creating and opening a URL to log in to Website via Basic Auth with Robot Framework/Selenium (Python)

AG Grid 在上下文菜单中以大文本形式打开

What's the correct way to type hint an empty list as a literal in python?

如何在Highcharts Gantt中更改本地化的星期名称

如何在同一个流中使用多个过滤器和映射函数？

如何使用Map/Set来将代码优化到O(n)？

.NET MAUI Android在GitHub Actions上构建失败，错误代码为1。