如何根据权重从空间网格中进行抽样?

huangapple go评论119阅读模式
英文:

How to sample from a spatial grid based on weights?

问题

现在,我想将sim_data中的每个观测随机分配到spatial_data中的一个单元格,其中success的值为1的观测更有可能分配到具有较高sim_1值的单元格。

谢谢!

英文:

I am running some simulations, and want to generate spatially-autocorrelated data. May be going about this completely the wrong way...

First, I have generated a dataset of observations, with the variable success varying by age and country. As this is quite large, this is what it looks like:

library(tidyverse)
library(sf)
library(sfdep)

#read in the dataset of simulated data
sim_data <- read_rds("sim_data.rds")

#summarise what the data look like
sim_data %>% 
  head(20)
#>    countries      age prevalence success       age_z
#> 1     Malawi 3.946353 0.16114073       1  0.94643938
#> 2     Malawi 1.538666 0.10983733       0 -1.46124749
#> 3     Malawi 3.627969 0.07027045       0  0.62805528
#> 4     Malawi 3.820259 0.15889337       0  0.82034525
#> 5     Malawi 2.830967 0.13286989       0 -0.16894678
#> 6     Malawi 3.876449 0.07469908       0  0.87653512
#> 7     Malawi 4.738689 0.17526246       0  1.73877511
#> 8     Malawi 2.021715 0.11844667       0 -0.97819858
#> 9     Malawi 2.849171 0.05639000       1 -0.15074259
#> 10    Malawi 4.760058 0.17564332       0  1.76014421
#> 11    Malawi 4.912906 0.16997609       0  1.91299183
#> 12    Malawi 1.469949 0.03180826       0 -1.52996444
#> 13    Malawi 2.899988 0.14249146       1 -0.09992556
#> 14    Malawi 3.241331 0.14018377       1  0.24141710
#> 15    Malawi 4.616126 0.08788227       0  1.61621167
#> 16    Malawi 1.554841 0.11851702       0 -1.44507321
#> 17    Malawi 4.955567 0.17073644       0  1.95565303
#> 18    Malawi 4.786673 0.09092192       0  1.78675905
#> 19    Malawi 1.329750 0.11450525       0 -1.67016365
#> 20    Malawi 3.056847 0.13689573       0  0.05693326

<sup>Created on 2023-08-04 with reprex v2.0.2</sup>

Now I create a spatial grid, and generate some spatially-autocorrelated data (with lots of help from: https://josiahparry.com/posts/csr.html)


#make a spatial grid
grid &lt;- st_make_grid(cellsize = c(1, 1), n = 10, offset = c(0, 0)) |&gt; 
  as_tibble() |&gt; 
  st_as_sf() |&gt; 
  mutate(
    id = row_number(),
    nb = st_contiguity(geometry),
    wt = st_weights(nb)
    )

#generate some spatially correlated data
nb &lt;- grid[[&quot;nb&quot;]]
wt &lt;- grid[[&quot;wt&quot;]]

x &lt;-  geostan::sim_sar(w = wt_as_matrix(nb, wt), rho = 0.78)

#function to simulate spatial weights by permutation
#example taken from here: https://josiahparry.com/posts/csr.html
permute_nb &lt;- function(nb) {
  # first get the cardinality
  cards &lt;- st_cardinalties(nb)
  # instantiate empty list to fill
  nb_perm &lt;- vector(mode = &quot;list&quot;, length = length(nb))
  
  # instantiate an index
  index &lt;- seq_along(nb)
  
  # iterate through and full nb_perm
  for (i in index) {
    # remove i from the index, then sample and assign
    nb_perm[[i]] &lt;- sample(index[-i], cards[i])
  }
  
  structure(nb_perm, class = &quot;nb&quot;)
}

#we only simulate 1 weighted grid here for the example
nsim = 1 

sim_spatial &lt;- replicate(
  nsim, {
    nb_perm &lt;- permute_nb(nb)
    st_lag(x, nb_perm, st_weights(nb_perm))
  }
)

#tidy up the grid
sim_spatial &lt;- sim_spatial %&gt;%
  as.data.frame() %&gt;%
  rename_all(funs(paste0(&quot;sim_&quot;, str_remove(.,&quot;V&quot;)))) %&gt;%
  mutate(id = row_number()) %&gt;%
  left_join(grid)

sim_spatial %&gt;%
  ggplot() +
  geom_sf(aes(fill=sim_1, geometry=geometry)) +
  scale_fill_viridis_c()

如何根据权重从空间网格中进行抽样?<!-- -->

<sup>Created on 2023-08-04 with reprex v2.0.2</sup>

Now, I would like to allocate each observation in sim_data randomly to a cell in spatial_data, with observations with a 1 value for success more likely to be allocated to a cell with a higher sim_1 value.

Thanks!

答案1

得分: 1

以下是您要翻译的内容:

"It's difficult to know how you want the column sim_1 to be converted to weights (since some of them are negative), but for now let's just normalize them into the 0-1 range.

From the comments, it seems you have more rows in sim_data than sim_spatial, so you can have multiple points assigned within each grid square.

For better illustration, we will create a longer sim_data frame with 1,000 rows. The only relevant column will be success, so we will do:

set.seed(1)
sim_data <- data.frame(success = rbinom(1000, 1, 0.2))

Now we can group this data frame by success, and sample grid squares according to the normalized weights, which will be inverted for success == 0. We can then left join the sim_spatial polygons, and take a random sample of a point within the relevant grid square.

sim_data <- sim_data %>%
  group_by(success) %>%
  mutate(id = sample(sim_spatial$id, n(), TRUE,
                     prob = if(mean(success) == 1) {
                  with(sim_spatial, (sim_1 - min(sim_1))/diff(range(sim_1)))
                     } else {
                  1 - with(sim_spatial, (sim_1 - min(sim_1))/diff(range(sim_1)))
                  })) %>%
  left_join(sim_spatial, by = "id") %>%
  rowwise() %>%
  mutate(geometry = st_sample(geometry, 1))

Now if we plot the points over the grid, we will see that the success favor the highest values of sim_1

sim_spatial %>%
  ggplot() +
  geom_sf(aes(fill=sim_1, geometry = geometry)) +
  geom_sf(aes(color = factor(success), geometry = geometry), data = sim_data) +
  scale_fill_viridis_c() +
  scale_color_manual("Success", values = c("black", "red"))

如何根据权重从空间网格中进行抽样?

Furthermore, since we have left joined, we can see how the sim_1 value of the assigned geographical area affects the probability of success in our sim_data

ggplot(sim_data, aes(sim_1, success)) +
  geom_smooth(method = glm, method.args = list(family = binomial)) +
  scale_y_continuous("Probability of success", limits = c(0, 1))

如何根据权重从空间网格中进行抽样?

英文:

It's difficult to know how you want the column sim_1 to be converted to weights (since some of them are negative), but for now let's just normalize them into the 0-1 range.

From the comments, it seems you have more rows in sim_data than sim_spatial, so you can have multiple points assigned within each grid square.

For better illustration, we will create a longer sim_data frame with 1,000 rows. The only relevant column will be success, so we will do:

set.seed(1)
sim_data &lt;- data.frame(success = rbinom(1000, 1, 0.2))

Now we can group this data frame by success, and sample grid squares according to the normalized weights, which will be inverted for success == 0. We can then left join the sim_spatial polygons, and take a random sample of a point within the relevant grid square.

sim_data &lt;- sim_data %&gt;%
  group_by(success) %&gt;%
  mutate(id = sample(sim_spatial$id, n(), TRUE,
                     prob = if(mean(success) == 1) {
                  with(sim_spatial, (sim_1 - min(sim_1))/diff(range(sim_1)))
                     } else {
                  1 - with(sim_spatial, (sim_1 - min(sim_1))/diff(range(sim_1)))
                  })) %&gt;%
  left_join(sim_spatial, by = &quot;id&quot;) %&gt;%
  rowwise() %&gt;%
  mutate(geometry = st_sample(geometry, 1))

Now if we plot the points over the grid, we will see that the success favour the highest values of sim_1

sim_spatial %&gt;%
  ggplot() +
  geom_sf(aes(fill=sim_1, geometry = geometry)) +
  geom_sf(aes(color = factor(success), geometry = geometry), data = sim_data) +
  scale_fill_viridis_c() +
  scale_color_manual(&quot;Success&quot;, values = c(&quot;black&quot;, &quot;red&quot;))

如何根据权重从空间网格中进行抽样?

Furthermore, since we have left joined, we can see how the sim_1 value of the assigned geographical area affects the probability of success in our sim_data

ggplot(sim_data, aes(sim_1, success)) +
  geom_smooth(method = glm, method.args = list(family = binomial)) +
  scale_y_continuous(&quot;Probability of success&quot;, limits = c(0, 1))

如何根据权重从空间网格中进行抽样?

huangapple
  • 本文由 发表于 2023年8月4日 21:13:52
  • 转载请务必保留本文链接:https://go.coder-hub.com/76836275.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定