2023年8月4日 21:13:52go评论156阅读模式

英文:

How to sample from a spatial grid based on weights?

问题

现在，我想将sim_data中的每个观测随机分配到spatial_data中的一个单元格，其中success的值为1的观测更有可能分配到具有较高sim_1值的单元格。

谢谢！

英文:

I am running some simulations, and want to generate spatially-autocorrelated data. May be going about this completely the wrong way...

First, I have generated a dataset of observations, with the variable success varying by age and country. As this is quite large, this is what it looks like:

library(tidyverse)
library(sf)
library(sfdep)
#read in the dataset of simulated data
sim_data &lt;- read_rds(&quot;sim_data.rds&quot;)
#summarise what the data look like
sim_data %&gt;% 
  head(20)
#&gt;    countries      age prevalence success       age_z
#&gt; 1     Malawi 3.946353 0.16114073       1  0.94643938
#&gt; 2     Malawi 1.538666 0.10983733       0 -1.46124749
#&gt; 3     Malawi 3.627969 0.07027045       0  0.62805528
#&gt; 4     Malawi 3.820259 0.15889337       0  0.82034525
#&gt; 5     Malawi 2.830967 0.13286989       0 -0.16894678
#&gt; 6     Malawi 3.876449 0.07469908       0  0.87653512
#&gt; 7     Malawi 4.738689 0.17526246       0  1.73877511
#&gt; 8     Malawi 2.021715 0.11844667       0 -0.97819858
#&gt; 9     Malawi 2.849171 0.05639000       1 -0.15074259
#&gt; 10    Malawi 4.760058 0.17564332       0  1.76014421
#&gt; 11    Malawi 4.912906 0.16997609       0  1.91299183
#&gt; 12    Malawi 1.469949 0.03180826       0 -1.52996444
#&gt; 13    Malawi 2.899988 0.14249146       1 -0.09992556
#&gt; 14    Malawi 3.241331 0.14018377       1  0.24141710
#&gt; 15    Malawi 4.616126 0.08788227       0  1.61621167
#&gt; 16    Malawi 1.554841 0.11851702       0 -1.44507321
#&gt; 17    Malawi 4.955567 0.17073644       0  1.95565303
#&gt; 18    Malawi 4.786673 0.09092192       0  1.78675905
#&gt; 19    Malawi 1.329750 0.11450525       0 -1.67016365
#&gt; 20    Malawi 3.056847 0.13689573       0  0.05693326

<sup>Created on 2023-08-04 with reprex v2.0.2</sup>

Now I create a spatial grid, and generate some spatially-autocorrelated data (with lots of help from: https://josiahparry.com/posts/csr.html)


#make a spatial grid
grid &lt;- st_make_grid(cellsize = c(1, 1), n = 10, offset = c(0, 0)) |&gt; 
  as_tibble() |&gt; 
  st_as_sf() |&gt; 
  mutate(
    id = row_number(),
    nb = st_contiguity(geometry),
    wt = st_weights(nb)
    )
#generate some spatially correlated data
nb &lt;- grid[[&quot;nb&quot;]]
wt &lt;- grid[[&quot;wt&quot;]]
x &lt;-  geostan::sim_sar(w = wt_as_matrix(nb, wt), rho = 0.78)
#function to simulate spatial weights by permutation
#example taken from here: https://josiahparry.com/posts/csr.html
permute_nb &lt;- function(nb) {
  # first get the cardinality
  cards &lt;- st_cardinalties(nb)
  # instantiate empty list to fill
  nb_perm &lt;- vector(mode = &quot;list&quot;, length = length(nb))
  
  # instantiate an index
  index &lt;- seq_along(nb)
  
  # iterate through and full nb_perm
  for (i in index) {
    # remove i from the index, then sample and assign
    nb_perm[[i]] &lt;- sample(index[-i], cards[i])
  }
  
  structure(nb_perm, class = &quot;nb&quot;)
}
#we only simulate 1 weighted grid here for the example
nsim = 1 
sim_spatial &lt;- replicate(
  nsim, {
    nb_perm &lt;- permute_nb(nb)
    st_lag(x, nb_perm, st_weights(nb_perm))
  }
)
#tidy up the grid
sim_spatial &lt;- sim_spatial %&gt;%
  as.data.frame() %&gt;%
  rename_all(funs(paste0(&quot;sim_&quot;, str_remove(.,&quot;V&quot;)))) %&gt;%
  mutate(id = row_number()) %&gt;%
  left_join(grid)
sim_spatial %&gt;%
  ggplot() +
  geom_sf(aes(fill=sim_1, geometry=geometry)) +
  scale_fill_viridis_c()

如何根据权重从空间网格中进行抽样？

<sup>Created on 2023-08-04 with reprex v2.0.2</sup>

Now, I would like to allocate each observation in sim_data randomly to a cell in spatial_data, with observations with a 1 value for success more likely to be allocated to a cell with a higher sim_1 value.

Thanks!

答案1

得分: 1

以下是您要翻译的内容：

"It's difficult to know how you want the column sim_1 to be converted to weights (since some of them are negative), but for now let's just normalize them into the 0-1 range.

From the comments, it seems you have more rows in sim_data than sim_spatial, so you can have multiple points assigned within each grid square.

For better illustration, we will create a longer sim_data frame with 1,000 rows. The only relevant column will be success, so we will do:

set.seed(1)
sim_data <- data.frame(success = rbinom(1000, 1, 0.2))

Now we can group this data frame by success, and sample grid squares according to the normalized weights, which will be inverted for success == 0. We can then left join the sim_spatial polygons, and take a random sample of a point within the relevant grid square.

sim_data <- sim_data %>%
  group_by(success) %>%
  mutate(id = sample(sim_spatial$id, n(), TRUE,
                     prob = if(mean(success) == 1) {
                  with(sim_spatial, (sim_1 - min(sim_1))/diff(range(sim_1)))
                     } else {
                  1 - with(sim_spatial, (sim_1 - min(sim_1))/diff(range(sim_1)))
                  })) %>%
  left_join(sim_spatial, by = "id") %>%
  rowwise() %>%
  mutate(geometry = st_sample(geometry, 1))

Now if we plot the points over the grid, we will see that the success favor the highest values of sim_1

sim_spatial %>%
  ggplot() +
  geom_sf(aes(fill=sim_1, geometry = geometry)) +
  geom_sf(aes(color = factor(success), geometry = geometry), data = sim_data) +
  scale_fill_viridis_c() +
  scale_color_manual("Success", values = c("black", "red"))

Furthermore, since we have left joined, we can see how the sim_1 value of the assigned geographical area affects the probability of success in our sim_data

ggplot(sim_data, aes(sim_1, success)) +
  geom_smooth(method = glm, method.args = list(family = binomial)) +
  scale_y_continuous("Probability of success", limits = c(0, 1))

英文:

It's difficult to know how you want the column sim_1 to be converted to weights (since some of them are negative), but for now let's just normalize them into the 0-1 range.

From the comments, it seems you have more rows in sim_data than sim_spatial, so you can have multiple points assigned within each grid square.

For better illustration, we will create a longer sim_data frame with 1,000 rows. The only relevant column will be success, so we will do:

set.seed(1)
sim_data &lt;- data.frame(success = rbinom(1000, 1, 0.2))

sim_data &lt;- sim_data %&gt;%
  group_by(success) %&gt;%
  mutate(id = sample(sim_spatial$id, n(), TRUE,
                     prob = if(mean(success) == 1) {
                  with(sim_spatial, (sim_1 - min(sim_1))/diff(range(sim_1)))
                     } else {
                  1 - with(sim_spatial, (sim_1 - min(sim_1))/diff(range(sim_1)))
                  })) %&gt;%
  left_join(sim_spatial, by = &quot;id&quot;) %&gt;%
  rowwise() %&gt;%
  mutate(geometry = st_sample(geometry, 1))

Now if we plot the points over the grid, we will see that the success favour the highest values of sim_1

sim_spatial %&gt;%
  ggplot() +
  geom_sf(aes(fill=sim_1, geometry = geometry)) +
  geom_sf(aes(color = factor(success), geometry = geometry), data = sim_data) +
  scale_fill_viridis_c() +
  scale_color_manual(&quot;Success&quot;, values = c(&quot;black&quot;, &quot;red&quot;))

Furthermore, since we have left joined, we can see how the sim_1 value of the assigned geographical area affects the probability of success in our sim_data

ggplot(sim_data, aes(sim_1, success)) +
  geom_smooth(method = glm, method.args = list(family = binomial)) +
  scale_y_continuous(&quot;Probability of success&quot;, limits = c(0, 1))

通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库，让每个人都能够通过互相帮助和分享经验来进步。

如何根据权重从空间网格中进行抽样？

问题

答案1

I have two vectors of indices (rows, columns) for a data.frame with corresponding values, whats the fastest way to build the new data.frame?

在R中计算单词子集的出现次数？

在 R 中更改 text_tokens 函数的输出

修改热图的距离和链接方式

如何在Playwright视觉比较中屏蔽多个定位器？

在C++中，可以使用可变模板参数来检索类型的内部类型。

selenium.common.exceptions.StaleElementReferenceException: Message: stale element reference: stale element not found

Creating and opening a URL to log in to Website via Basic Auth with Robot Framework/Selenium (Python)

AG Grid 在上下文菜单中以大文本形式打开

What's the correct way to type hint an empty list as a literal in python?

如何在Highcharts Gantt中更改本地化的星期名称

如何在同一个流中使用多个过滤器和映射函数？

如何使用Map/Set来将代码优化到O(n)？

.NET MAUI Android在GitHub Actions上构建失败，错误代码为1。