英文:
How to sample from a spatial grid based on weights?
问题
现在,我想将sim_data
中的每个观测随机分配到spatial_data
中的一个单元格,其中success
的值为1
的观测更有可能分配到具有较高sim_1
值的单元格。
谢谢!
英文:
I am running some simulations, and want to generate spatially-autocorrelated data. May be going about this completely the wrong way...
First, I have generated a dataset of observations, with the variable success
varying by age
and country
. As this is quite large, this is what it looks like:
library(tidyverse)
library(sf)
library(sfdep)
#read in the dataset of simulated data
sim_data <- read_rds("sim_data.rds")
#summarise what the data look like
sim_data %>%
head(20)
#> countries age prevalence success age_z
#> 1 Malawi 3.946353 0.16114073 1 0.94643938
#> 2 Malawi 1.538666 0.10983733 0 -1.46124749
#> 3 Malawi 3.627969 0.07027045 0 0.62805528
#> 4 Malawi 3.820259 0.15889337 0 0.82034525
#> 5 Malawi 2.830967 0.13286989 0 -0.16894678
#> 6 Malawi 3.876449 0.07469908 0 0.87653512
#> 7 Malawi 4.738689 0.17526246 0 1.73877511
#> 8 Malawi 2.021715 0.11844667 0 -0.97819858
#> 9 Malawi 2.849171 0.05639000 1 -0.15074259
#> 10 Malawi 4.760058 0.17564332 0 1.76014421
#> 11 Malawi 4.912906 0.16997609 0 1.91299183
#> 12 Malawi 1.469949 0.03180826 0 -1.52996444
#> 13 Malawi 2.899988 0.14249146 1 -0.09992556
#> 14 Malawi 3.241331 0.14018377 1 0.24141710
#> 15 Malawi 4.616126 0.08788227 0 1.61621167
#> 16 Malawi 1.554841 0.11851702 0 -1.44507321
#> 17 Malawi 4.955567 0.17073644 0 1.95565303
#> 18 Malawi 4.786673 0.09092192 0 1.78675905
#> 19 Malawi 1.329750 0.11450525 0 -1.67016365
#> 20 Malawi 3.056847 0.13689573 0 0.05693326
<sup>Created on 2023-08-04 with reprex v2.0.2</sup>
Now I create a spatial grid, and generate some spatially-autocorrelated data (with lots of help from: https://josiahparry.com/posts/csr.html)
#make a spatial grid
grid <- st_make_grid(cellsize = c(1, 1), n = 10, offset = c(0, 0)) |>
as_tibble() |>
st_as_sf() |>
mutate(
id = row_number(),
nb = st_contiguity(geometry),
wt = st_weights(nb)
)
#generate some spatially correlated data
nb <- grid[["nb"]]
wt <- grid[["wt"]]
x <- geostan::sim_sar(w = wt_as_matrix(nb, wt), rho = 0.78)
#function to simulate spatial weights by permutation
#example taken from here: https://josiahparry.com/posts/csr.html
permute_nb <- function(nb) {
# first get the cardinality
cards <- st_cardinalties(nb)
# instantiate empty list to fill
nb_perm <- vector(mode = "list", length = length(nb))
# instantiate an index
index <- seq_along(nb)
# iterate through and full nb_perm
for (i in index) {
# remove i from the index, then sample and assign
nb_perm[[i]] <- sample(index[-i], cards[i])
}
structure(nb_perm, class = "nb")
}
#we only simulate 1 weighted grid here for the example
nsim = 1
sim_spatial <- replicate(
nsim, {
nb_perm <- permute_nb(nb)
st_lag(x, nb_perm, st_weights(nb_perm))
}
)
#tidy up the grid
sim_spatial <- sim_spatial %>%
as.data.frame() %>%
rename_all(funs(paste0("sim_", str_remove(.,"V")))) %>%
mutate(id = row_number()) %>%
left_join(grid)
sim_spatial %>%
ggplot() +
geom_sf(aes(fill=sim_1, geometry=geometry)) +
scale_fill_viridis_c()
<!-- -->
<sup>Created on 2023-08-04 with reprex v2.0.2</sup>
Now, I would like to allocate each observation in sim_data
randomly to a cell in spatial_data
, with observations with a 1
value for success
more likely to be allocated to a cell with a higher sim_1
value.
Thanks!
答案1
得分: 1
以下是您要翻译的内容:
"It's difficult to know how you want the column sim_1
to be converted to weights (since some of them are negative), but for now let's just normalize them into the 0-1 range.
From the comments, it seems you have more rows in sim_data
than sim_spatial
, so you can have multiple points assigned within each grid square.
For better illustration, we will create a longer sim_data
frame with 1,000 rows. The only relevant column will be success
, so we will do:
set.seed(1)
sim_data <- data.frame(success = rbinom(1000, 1, 0.2))
Now we can group this data frame by success
, and sample grid squares according to the normalized weights, which will be inverted for success == 0
. We can then left join the sim_spatial
polygons, and take a random sample of a point within the relevant grid square.
sim_data <- sim_data %>%
group_by(success) %>%
mutate(id = sample(sim_spatial$id, n(), TRUE,
prob = if(mean(success) == 1) {
with(sim_spatial, (sim_1 - min(sim_1))/diff(range(sim_1)))
} else {
1 - with(sim_spatial, (sim_1 - min(sim_1))/diff(range(sim_1)))
})) %>%
left_join(sim_spatial, by = "id") %>%
rowwise() %>%
mutate(geometry = st_sample(geometry, 1))
Now if we plot the points over the grid, we will see that the success favor the highest values of sim_1
sim_spatial %>%
ggplot() +
geom_sf(aes(fill=sim_1, geometry = geometry)) +
geom_sf(aes(color = factor(success), geometry = geometry), data = sim_data) +
scale_fill_viridis_c() +
scale_color_manual("Success", values = c("black", "red"))
Furthermore, since we have left joined, we can see how the sim_1
value of the assigned geographical area affects the probability of success in our sim_data
ggplot(sim_data, aes(sim_1, success)) +
geom_smooth(method = glm, method.args = list(family = binomial)) +
scale_y_continuous("Probability of success", limits = c(0, 1))
英文:
It's difficult to know how you want the column sim_1
to be converted to weights (since some of them are negative), but for now let's just normalize them into the 0-1 range.
From the comments, it seems you have more rows in sim_data
than sim_spatial
, so you can have multiple points assigned within each grid square.
For better illustration, we will create a longer sim_data
frame with 1,000 rows. The only relevant column will be success
, so we will do:
set.seed(1)
sim_data <- data.frame(success = rbinom(1000, 1, 0.2))
Now we can group this data frame by success
, and sample grid squares according to the normalized weights, which will be inverted for success == 0
. We can then left join the sim_spatial
polygons, and take a random sample of a point within the relevant grid square.
sim_data <- sim_data %>%
group_by(success) %>%
mutate(id = sample(sim_spatial$id, n(), TRUE,
prob = if(mean(success) == 1) {
with(sim_spatial, (sim_1 - min(sim_1))/diff(range(sim_1)))
} else {
1 - with(sim_spatial, (sim_1 - min(sim_1))/diff(range(sim_1)))
})) %>%
left_join(sim_spatial, by = "id") %>%
rowwise() %>%
mutate(geometry = st_sample(geometry, 1))
Now if we plot the points over the grid, we will see that the success favour the highest values of sim_1
sim_spatial %>%
ggplot() +
geom_sf(aes(fill=sim_1, geometry = geometry)) +
geom_sf(aes(color = factor(success), geometry = geometry), data = sim_data) +
scale_fill_viridis_c() +
scale_color_manual("Success", values = c("black", "red"))
Furthermore, since we have left joined, we can see how the sim_1
value of the assigned geographical area affects the probability of success in our sim_data
ggplot(sim_data, aes(sim_1, success)) +
geom_smooth(method = glm, method.args = list(family = binomial)) +
scale_y_continuous("Probability of success", limits = c(0, 1))
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论