如何根据权重从空间网格中进行抽样?

huangapple go评论156阅读模式
英文:

How to sample from a spatial grid based on weights?

问题

现在,我想将sim_data中的每个观测随机分配到spatial_data中的一个单元格,其中success的值为1的观测更有可能分配到具有较高sim_1值的单元格。

谢谢!

英文:

I am running some simulations, and want to generate spatially-autocorrelated data. May be going about this completely the wrong way...

First, I have generated a dataset of observations, with the variable success varying by age and country. As this is quite large, this is what it looks like:

  1. library(tidyverse)
  2. library(sf)
  3. library(sfdep)
  4. #read in the dataset of simulated data
  5. sim_data <- read_rds("sim_data.rds")
  6. #summarise what the data look like
  7. sim_data %>%
  8. head(20)
  9. #> countries age prevalence success age_z
  10. #> 1 Malawi 3.946353 0.16114073 1 0.94643938
  11. #> 2 Malawi 1.538666 0.10983733 0 -1.46124749
  12. #> 3 Malawi 3.627969 0.07027045 0 0.62805528
  13. #> 4 Malawi 3.820259 0.15889337 0 0.82034525
  14. #> 5 Malawi 2.830967 0.13286989 0 -0.16894678
  15. #> 6 Malawi 3.876449 0.07469908 0 0.87653512
  16. #> 7 Malawi 4.738689 0.17526246 0 1.73877511
  17. #> 8 Malawi 2.021715 0.11844667 0 -0.97819858
  18. #> 9 Malawi 2.849171 0.05639000 1 -0.15074259
  19. #> 10 Malawi 4.760058 0.17564332 0 1.76014421
  20. #> 11 Malawi 4.912906 0.16997609 0 1.91299183
  21. #> 12 Malawi 1.469949 0.03180826 0 -1.52996444
  22. #> 13 Malawi 2.899988 0.14249146 1 -0.09992556
  23. #> 14 Malawi 3.241331 0.14018377 1 0.24141710
  24. #> 15 Malawi 4.616126 0.08788227 0 1.61621167
  25. #> 16 Malawi 1.554841 0.11851702 0 -1.44507321
  26. #> 17 Malawi 4.955567 0.17073644 0 1.95565303
  27. #> 18 Malawi 4.786673 0.09092192 0 1.78675905
  28. #> 19 Malawi 1.329750 0.11450525 0 -1.67016365
  29. #> 20 Malawi 3.056847 0.13689573 0 0.05693326

<sup>Created on 2023-08-04 with reprex v2.0.2</sup>

Now I create a spatial grid, and generate some spatially-autocorrelated data (with lots of help from: https://josiahparry.com/posts/csr.html)

  1. #make a spatial grid
  2. grid &lt;- st_make_grid(cellsize = c(1, 1), n = 10, offset = c(0, 0)) |&gt;
  3. as_tibble() |&gt;
  4. st_as_sf() |&gt;
  5. mutate(
  6. id = row_number(),
  7. nb = st_contiguity(geometry),
  8. wt = st_weights(nb)
  9. )
  10. #generate some spatially correlated data
  11. nb &lt;- grid[[&quot;nb&quot;]]
  12. wt &lt;- grid[[&quot;wt&quot;]]
  13. x &lt;- geostan::sim_sar(w = wt_as_matrix(nb, wt), rho = 0.78)
  14. #function to simulate spatial weights by permutation
  15. #example taken from here: https://josiahparry.com/posts/csr.html
  16. permute_nb &lt;- function(nb) {
  17. # first get the cardinality
  18. cards &lt;- st_cardinalties(nb)
  19. # instantiate empty list to fill
  20. nb_perm &lt;- vector(mode = &quot;list&quot;, length = length(nb))
  21. # instantiate an index
  22. index &lt;- seq_along(nb)
  23. # iterate through and full nb_perm
  24. for (i in index) {
  25. # remove i from the index, then sample and assign
  26. nb_perm[[i]] &lt;- sample(index[-i], cards[i])
  27. }
  28. structure(nb_perm, class = &quot;nb&quot;)
  29. }
  30. #we only simulate 1 weighted grid here for the example
  31. nsim = 1
  32. sim_spatial &lt;- replicate(
  33. nsim, {
  34. nb_perm &lt;- permute_nb(nb)
  35. st_lag(x, nb_perm, st_weights(nb_perm))
  36. }
  37. )
  38. #tidy up the grid
  39. sim_spatial &lt;- sim_spatial %&gt;%
  40. as.data.frame() %&gt;%
  41. rename_all(funs(paste0(&quot;sim_&quot;, str_remove(.,&quot;V&quot;)))) %&gt;%
  42. mutate(id = row_number()) %&gt;%
  43. left_join(grid)
  44. sim_spatial %&gt;%
  45. ggplot() +
  46. geom_sf(aes(fill=sim_1, geometry=geometry)) +
  47. scale_fill_viridis_c()

如何根据权重从空间网格中进行抽样?<!-- -->

<sup>Created on 2023-08-04 with reprex v2.0.2</sup>

Now, I would like to allocate each observation in sim_data randomly to a cell in spatial_data, with observations with a 1 value for success more likely to be allocated to a cell with a higher sim_1 value.

Thanks!

答案1

得分: 1

以下是您要翻译的内容:

"It's difficult to know how you want the column sim_1 to be converted to weights (since some of them are negative), but for now let's just normalize them into the 0-1 range.

From the comments, it seems you have more rows in sim_data than sim_spatial, so you can have multiple points assigned within each grid square.

For better illustration, we will create a longer sim_data frame with 1,000 rows. The only relevant column will be success, so we will do:

  1. set.seed(1)
  2. sim_data <- data.frame(success = rbinom(1000, 1, 0.2))

Now we can group this data frame by success, and sample grid squares according to the normalized weights, which will be inverted for success == 0. We can then left join the sim_spatial polygons, and take a random sample of a point within the relevant grid square.

  1. sim_data <- sim_data %>%
  2. group_by(success) %>%
  3. mutate(id = sample(sim_spatial$id, n(), TRUE,
  4. prob = if(mean(success) == 1) {
  5. with(sim_spatial, (sim_1 - min(sim_1))/diff(range(sim_1)))
  6. } else {
  7. 1 - with(sim_spatial, (sim_1 - min(sim_1))/diff(range(sim_1)))
  8. })) %>%
  9. left_join(sim_spatial, by = "id") %>%
  10. rowwise() %>%
  11. mutate(geometry = st_sample(geometry, 1))

Now if we plot the points over the grid, we will see that the success favor the highest values of sim_1

  1. sim_spatial %>%
  2. ggplot() +
  3. geom_sf(aes(fill=sim_1, geometry = geometry)) +
  4. geom_sf(aes(color = factor(success), geometry = geometry), data = sim_data) +
  5. scale_fill_viridis_c() +
  6. scale_color_manual("Success", values = c("black", "red"))

如何根据权重从空间网格中进行抽样?

Furthermore, since we have left joined, we can see how the sim_1 value of the assigned geographical area affects the probability of success in our sim_data

  1. ggplot(sim_data, aes(sim_1, success)) +
  2. geom_smooth(method = glm, method.args = list(family = binomial)) +
  3. scale_y_continuous("Probability of success", limits = c(0, 1))

如何根据权重从空间网格中进行抽样?

英文:

It's difficult to know how you want the column sim_1 to be converted to weights (since some of them are negative), but for now let's just normalize them into the 0-1 range.

From the comments, it seems you have more rows in sim_data than sim_spatial, so you can have multiple points assigned within each grid square.

For better illustration, we will create a longer sim_data frame with 1,000 rows. The only relevant column will be success, so we will do:

  1. set.seed(1)
  2. sim_data &lt;- data.frame(success = rbinom(1000, 1, 0.2))

Now we can group this data frame by success, and sample grid squares according to the normalized weights, which will be inverted for success == 0. We can then left join the sim_spatial polygons, and take a random sample of a point within the relevant grid square.

  1. sim_data &lt;- sim_data %&gt;%
  2. group_by(success) %&gt;%
  3. mutate(id = sample(sim_spatial$id, n(), TRUE,
  4. prob = if(mean(success) == 1) {
  5. with(sim_spatial, (sim_1 - min(sim_1))/diff(range(sim_1)))
  6. } else {
  7. 1 - with(sim_spatial, (sim_1 - min(sim_1))/diff(range(sim_1)))
  8. })) %&gt;%
  9. left_join(sim_spatial, by = &quot;id&quot;) %&gt;%
  10. rowwise() %&gt;%
  11. mutate(geometry = st_sample(geometry, 1))

Now if we plot the points over the grid, we will see that the success favour the highest values of sim_1

  1. sim_spatial %&gt;%
  2. ggplot() +
  3. geom_sf(aes(fill=sim_1, geometry = geometry)) +
  4. geom_sf(aes(color = factor(success), geometry = geometry), data = sim_data) +
  5. scale_fill_viridis_c() +
  6. scale_color_manual(&quot;Success&quot;, values = c(&quot;black&quot;, &quot;red&quot;))

如何根据权重从空间网格中进行抽样?

Furthermore, since we have left joined, we can see how the sim_1 value of the assigned geographical area affects the probability of success in our sim_data

  1. ggplot(sim_data, aes(sim_1, success)) +
  2. geom_smooth(method = glm, method.args = list(family = binomial)) +
  3. scale_y_continuous(&quot;Probability of success&quot;, limits = c(0, 1))

如何根据权重从空间网格中进行抽样?

huangapple
  • 本文由 发表于 2023年8月4日 21:13:52
  • 转载请务必保留本文链接:https://go.coder-hub.com/76836275.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定