英文:
R Spatstat: Identify nearest neighbours for further use
问题
我有一个包含488个GPS点(经度和纬度)的数据框。对于这488个点,我想找到它们的两个最近邻居。
到目前为止,我已经创建了一个点模式对象,并计算了距离最近的两个点(如下所示)。但是,我想进一步,能够通过它们在原始数据集中的ID来识别这些最近的点。
目前,我的脚本工作方式如下:
# 1. 将x和y坐标存储在两个向量中
lon <- data$longitude
lat <- data$latitude
# 2. 创建两个具有包含所有点的三角形维度的向量xrange和yrange
xrange <- range(lon, na.rm=T)
yrange <- range(lat, na.rm=T)
# 3. 创建ppp
lf <- ppp(lon, lat, xrange, yrange)
plot(lf)
nndist(lf, k = 1:2)
给我返回的结果(前5行示例)是:
dist.1 dist.2
[1,] 1.426925e-03 0.0017007414
[2,] 1.017287e-03 0.0015574895
[3,] 6.502012e-04 0.0010172867
[4,] 6.502012e-04 0.0007202307
[5,] 7.202307e-04 0.0010472445
但是我想要能够将这些结果链接回原始数据集中的"hhid",类似于以下方式:
hhid dist.1 dist.1.hhid dist.2 dist.1.hhid
1 1.426925e-03 7 0.0017007414 3
2 1.017287e-03 6 0.0015574895 4
3 6.502012e-04 10 0.0010172867 5
4 6.502012e-04 2 0.0007202307 8
5 7.202307e-04 1 0.0010472445 13
原始数据集的前20行如下:
structure(list(hhid = c(2004L, 2006L, 2009L, 2012L, 2013L, 2020L,
2022L, 2023L, 2028L, 2029L, 2035L, 2036L, 2043L, 2046L, 2047L,
2059L, 2062L, 2063L, 2065L, 2066L), longitude = c(-1.478302479,
-1.477469802, -1.476488709, -1.476146936, -1.47547996, -1.475799441,
-1.475903392, -1.476232767, -1.476053953, -1.477196693, -1.476906657,
-1.478778243, -1.480723381, -1.433436394, -1.433033824, -1.428791046,
-1.431989908, -1.432058454, -1.43134892, -1.430848002), latitude = c(12.10552216,
12.10700512, 12.10673618, 12.10618305, 12.10645485, 12.10846806,
12.1080761, 12.10830975, 12.11114883, 12.11076546, 12.11197853,
12.11345387, 12.10725021, 12.1183548, 12.11699867, 12.11466122,
12.1154108, 12.11545277, 12.11554337, 12.11567497)), row.names = c(NA,
20L), class = "data.frame")
希望这可以帮助你将结果链接回原始数据集。
英文:
I have a dataframe with 488 GPS points (long and lat). For each 488 points I would like to find their 2 closest neighbours.
So far I have created a point pattern object and computed the distance from the nearest two points (below). However, I would like to go a step further and be able to identify these nearest points by their ID from the original dataset.
Currently, my script works like:
# 1. store x and y coords in two vectors
lon <- data$longitude
lat <- data$latitude
# 2. create two vectors xrange and yrange with dimensions of triangle that contain all points
xrange <- range(lon, na.rm=T)
yrange <- range(lat, na.rm=T)
# 3. create ppp
lf <- ppp(lon, lat, xrange, yrange)
plot(lf)
nndist(lf, k = 1:2)
Giving me (example of top 5 results):
dist.1 dist.2
[1,] 1.426925e-03 0.0017007414
[2,] 1.017287e-03 0.0015574895
[3,] 6.502012e-04 0.0010172867
[4,] 6.502012e-04 0.0007202307
[5,] 7.202307e-04 0.0010472445
But I would like to be able to link this back to the "hhid" from the original dataset to something like this:
hhid dist.1 dist.1.hhid dist.2 dist.1.hhid
1 1.426925e-03 7 0.0017007414 3
2 1.017287e-03 6 0.0015574895 4
3 6.502012e-04 10 0.0010172867 5
4 6.502012e-04 2 0.0007202307 8
5 7.202307e-04 1 0.0010472445 13
First 20 rows of original dataset :
structure(list(hhid = c(2004L, 2006L, 2009L, 2012L, 2013L, 2020L,
2022L, 2023L, 2028L, 2029L, 2035L, 2036L, 2043L, 2046L, 2047L,
2059L, 2062L, 2063L, 2065L, 2066L), longitude = c(-1.478302479,
-1.477469802, -1.476488709, -1.476146936, -1.47547996, -1.475799441,
-1.475903392, -1.476232767, -1.476053953, -1.477196693, -1.476906657,
-1.478778243, -1.480723381, -1.433436394, -1.433033824, -1.428791046,
-1.431989908, -1.432058454, -1.43134892, -1.430848002), latitude = c(12.10552216,
12.10700512, 12.10673618, 12.10618305, 12.10645485, 12.10846806,
12.1080761, 12.10830975, 12.11114883, 12.11076546, 12.11197853,
12.11345387, 12.10725021, 12.1183548, 12.11699867, 12.11466122,
12.1154108, 12.11545277, 12.11554337, 12.11567497)), row.names = c(NA,
20L), class = "data.frame")
答案1
得分: 2
自从您使用了spatstat
命令nndist
来计算最近邻距离,您可以直接使用相应的命令nnwhich
来获取最近邻的标识符(序列号)。
然而,纬度和经度坐标不能视为x和y坐标。即使在地球的一个小区域内,一单位的纬度与一单位的经度并不相等。
未来版本的spatstat
将处理纬度/经度坐标。目前,您可能最好使用sp
和rgeos
包,如jpsmith在另一个答案中所描述的。
英文:
Since you used the spatstat
command nndist
to compute the nearest-neighbour distances, you could just use the corresponding command nnwhich
to obtain the identifiers (serial numbers) of the nearest neighbours.
However, latitude and longitude coordinates cannot be treated as x and y coordinates. Even in a small region of the globe, one unit of latitude is not the same distance as one unit of longitude.
Future versions of spatstat
will handle latitude/longitude coordinates. For now, you are probably better advised to use the sp
and rgeos
packages as described in the separate answer by jpsmith.
答案2
得分: 2
如果您计划在进一步分析中使用spatstat
,您首先需要投影到平面坐标,然后可以使用nnwhich()
,如@adrian-baddely所述。请注意,没有通用最佳坐标参考系统(CRS)可供使用,因此需要您进行选择。有一个名为crsuggest
的R软件包可以帮助您找到相关的CRS。在这里我使用覆盖布基纳法索的UTM区域30N(srid=32630
):
# 代码部分,不要翻译
在spatstat
中的“序列号”将是给定点在列表中的顺序的索引。下面是这些索引以及hhid
:
# 代码部分,不要翻译
英文:
If you plan to use spatstat
for further analysis you need to project to
planar coordinates first and then you can use nnwhich()
as described by
@adrian-baddely. Note that there is no universaly best coordinate reference
system (CRS) to use, so it requires a choice from you. There is an R package
crsuggest
that can help you find relevant ones. Here I use UTM zone 30N
which covers all of Burkina Faso as far as I can tell (srid=32630
):
library(sf)
library(spatstat)
hhid <- c(2004L, 2006L, 2009L, 2012L, 2013L, 2020L,
2022L, 2023L, 2028L, 2029L, 2035L, 2036L, 2043L, 2046L, 2047L,
2059L, 2062L, 2063L, 2065L, 2066L)
longitude <- c(-1.478302479,
-1.477469802, -1.476488709, -1.476146936, -1.47547996, -1.475799441,
-1.475903392, -1.476232767, -1.476053953, -1.477196693, -1.476906657,
-1.478778243, -1.480723381, -1.433436394, -1.433033824, -1.428791046,
-1.431989908, -1.432058454, -1.43134892, -1.430848002)
latitude <- c(12.10552216,
12.10700512, 12.10673618, 12.10618305, 12.10645485, 12.10846806,
12.1080761, 12.10830975, 12.11114883, 12.11076546, 12.11197853,
12.11345387, 12.10725021, 12.1183548, 12.11699867, 12.11466122,
12.1154108, 12.11545277, 12.11554337, 12.11567497)
data <- data.frame(hhid, longitude, latitude)
dat_gps <- st_as_sf(data, crs = 4326, coords = c("longitude", "latitude"))
dat_proj <- st_transform(dat_gps, crs = 32630)
dat_ppp <- as.ppp(dat_proj)
ids <- marks(dat_ppp)
nd <- nndist(dat_ppp, k = 1:2)
nw <- nnwhich(dat_ppp, k = 1:2)
The “serial number” in spatstat
will just be the index of the given point
in the order the points are listed. Below are these as well as hhid
:
data.frame(round(nd,2), nw, hhid = ids, dist.1.hhid = ids[nw[,1]], dist.2.hhid = ids[nw[,2]])
#> dist.1 dist.2 which.1 which.2 hhid dist.1.hhid dist.2.hhid
#> 1 187.42 238.78 2 3 2004 2006 2009
#> 2 110.86 170.31 3 4 2006 2009 2012
#> 3 71.61 110.86 4 2 2009 2012 2006
#> 4 71.61 78.58 3 5 2012 2009 2013
#> 5 78.58 114.13 4 3 2013 2012 2009
#> 6 44.81 50.31 7 8 2020 2022 2023
#> 7 44.20 44.81 8 6 2022 2023 2020
#> 8 44.20 50.31 7 6 2023 2022 2020
#> 9 130.53 131.42 11 10 2028 2035 2029
#> 10 131.42 137.85 9 11 2029 2028 2035
#> 11 130.53 137.85 9 10 2035 2028 2029
#> 12 261.03 343.62 11 10 2036 2035 2029
#> 13 325.55 355.20 1 2 2043 2004 2006
#> 14 156.28 354.33 15 18 2046 2047 2063
#> 15 156.28 201.28 14 18 2047 2046 2063
#> 16 250.42 295.03 20 19 2059 2066 2065
#> 17 8.79 71.30 18 19 2062 2063 2065
#> 18 8.79 77.88 17 19 2063 2062 2065
#> 19 56.44 71.30 20 17 2065 2066 2062
#> 20 56.44 127.69 19 17 2066 2065 2062
答案3
得分: 0
这似乎是对此处提出的问题的扩展。在该问题的已接受答案的基础上,扩展到您特定的情况,考虑最接近的两个邻居,您可以执行以下操作:
library(sp)
library(rgeos)
# 问题中的数据结构赋值为 "df"
spatialDF <- df
coordinates(spatialDF) <- ~longitude + latitude
dists <- gDistance(spatialDF, byid = TRUE)
min.2dists <- apply(dists, 1, function(x) order(x, decreasing = FALSE)[2:3])
# 最近的邻居
df$hhid1 <- df[min.2dists[1,], "hhid"]
df$dist1 <- apply(dists, 1, function(x) sort(x, decreasing = FALSE)[2])
# 第二近的邻居
df$hhid2 <- df[min.2dists[2,], "hhid"]
df$dist2 <- apply(dists, 1, function(x) sort(x, decreasing = FALSE)[3])
输出:
# hhid longitude latitude hhid1 dist1 hhid2 dist2
# 1 2004 -1.478302 12.10552 2006 1.700741e-03 2009 0.0021825687
# 2 2006 -1.477470 12.10701 2009 1.017287e-03 2012 0.0015574895
# 3 2009 -1.476489 12.10674 2012 6.502012e-04 2006 0.0010172867
# 4 2012 -1.476147 12.10618 2009 6.502012e-04 2013 0.0007202307
# 5 2013 -1.475480 12.10645 2012 7.202307e-04 2009 0.0010472445
# ...
英文:
This seems to be good an extension of the question posed here. Building off of that question's accepted answer to extend to your specific situation examining the closest two neighbors, you could do:
library(sp)
library(rgeos)
# dput structure in question assigned as "df"
spatialDF <- df
coordinates(spatialDF) <- ~longitude + latitude
dists <- gDistance(spatialDF, byid = TRUE)
min.2dists <- apply(dists, 1, function(x) order(x, decreasing = FALSE)[2:3])
# closest
df$hhid1 <- df[min.2dists[1,],"hhid"]
df$dist1 <- apply(dists, 1, function(x) sort(x, decreasing = FALSE)[2])
# second closest
df$hhid2 <- df[min.2dists[2,],"hhid"]
df$dist2 <- apply(dists, 1, function(x) sort(x, decreasing = FALSE)[3])
Output:
# hhid longitude latitude hhid1 dist1 hhid2 dist2
# 1 2004 -1.478302 12.10552 2006 1.700741e-03 2009 0.0021825687
# 2 2006 -1.477470 12.10701 2009 1.017287e-03 2012 0.0015574895
# 3 2009 -1.476489 12.10674 2012 6.502012e-04 2006 0.0010172867
# 4 2012 -1.476147 12.10618 2009 6.502012e-04 2013 0.0007202307
# 5 2013 -1.475480 12.10645 2012 7.202307e-04 2009 0.0010472445
# ...
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论