Nullmodel with presence absence data in vegan – R

huangapple go评论53阅读模式
英文:

Nullmodel with presence absence data in vegan - R

问题

我正在尝试在涉及带有出现-缺失数据的地块层次结构的beta多样性分区任务上运行一个nullmodel分析。我读到推荐使用“tswap”方法,但只有在我使用其他方法时,比如“r00”时才能运行。您能帮助我找出问题出在哪里吗?

library(parallel)
library(vegan)

# 设置数据 ----
# 群落
com <- matrix(rbinom(120, 1, 0.5),nrow=12)
# 地块层次结构
plt_hir <- data.frame(plot=c(1:nrow(com)),
                      treatment = c(rep("reg1_t1",nrow(com)/4),rep("reg1_t2",nrow(com)/4),rep("reg2_t1",nrow(com)/4),rep("reg2_t2",nrow(com)/4)),
                      region=c(rep("reg1",nrow(com)/2),rep("reg2",nrow(com)/2)),
                      country=rep("country",nrow(com)))
# 转化为数值
plt_hir <- data.frame(apply(plt_hir, 2, function(x) as.numeric(factor(x)))

# 多部分 ----

# 矩阵
y = com
# 层次矩阵
x = plt_hir
# 零模型计算的次数
nsimul = 100
# 要计算的零模型
method = "tswap"
# 获取并设置并行计算的核心数(不要使用所有核心,否则会减慢计算机)
cors <- detectCores() - 1

# 零模型 ----

# 定义零模型
nm <- nullmodel(com, method)

## 套接字类型的集群,适用于所有平台
cl <- makeCluster(cors)
clusterEvalQ(cl, library(vegan))
clusterExport(cl, c("simulate", "x", "nm"))
set.seed(3)
smlist <- parLapply(cl, 1:nsimul, function(i)  simulate(nm, nsim = 1, thin = 1))
stopCluster(cl)
smlist <- smbind(smlist, MARGIN=3)

我了解错误是由于未通过检查所致,但我不明白为什么?该方法应该适应出现-缺失矩阵,不是吗?感谢您的帮助!

英文:

I am trying to run a nullmodel analysis on a beta diversity partitioning task involving a hirarchy in the plots with presence absence data. I read that it is recomended to use the "tswap" method, however it runs only when I use other methods such as "r00". Could you help me finding out what the problem is here?

library(parallel)
library(vegan)

# Setting up data ----
# community
com &lt;- matrix(rbinom(120, 1, 0.5),nrow=12)
# plot hirachy 
plt_hir &lt;- data.frame(plot=c(1:nrow(com)),
                      treatment = c(rep(&quot;reg1_t1&quot;,nrow(com)/4),rep(&quot;reg1_t2&quot;,nrow(com)/4),rep(&quot;reg2_t1&quot;,nrow(com)/4),rep(&quot;reg2_t2&quot;,nrow(com)/4)),
                      region=c(rep(&quot;reg1&quot;,nrow(com)/2),rep(&quot;reg2&quot;,nrow(com)/2)),
                      country=rep(&quot;country&quot;,nrow(com)))
# to numeric
plt_hir &lt;- data.frame(apply(plt_hir, 2, function(x) as.numeric(factor(x))))


# Multipart ----

# matrix
y = com
# level matrix
x = plt_hir
# number of null model computations
nsimul = 100
# null model to be computed
method = &quot;tswap&quot;
# get and set number of cores for parallel computation (do not use all of them otherwise computer will be slowed)
cors &lt;- detectCores()-1


# NULL MODEL ----

# define null model
nm &lt;- nullmodel(com, method)

## socket type cluster, works on all platforms
cl &lt;- makeCluster(cors)
clusterEvalQ(cl, library(vegan))
clusterExport(cl, c(&quot;simulate&quot;, &quot;x&quot;, &quot;nm&quot;))
set.seed(3)
smlist &lt;- parLapply(cl, 1:nsimul, function(i)  simulate(nm, nsim = 1, thin = 1))
stopCluster(cl)
smlist &lt;- smbind(smlist, MARGIN=3)

I understand that the error is due to failed sanity checks but I do not get why? The method should be adapted to presenence absence matrices isn't it?
Thanks for your help!

答案1

得分: 0

如果您以strict = TRUE(默认)的参数调用smbind,函数将检查数据是否严格正确,这意味着列表的每个组合元素具有相同的起始索引、结束索引和相等的thin值。显然,您会失败,因为每个序列的长度为1,起始值不同。在smbind()中设置strict = FALSE将执行相同的检查,只会发出警告,但会合并元素。

不应该像这样调用试验交换(method = "tswap"):这是一种顺序方法,产生每个空模型矩阵都是从前一个导出的链。因此,您不应该根据模拟的数量来分割分析,而应该根据核心的数量,然后在每个核心中运行一系列模型。之后,每个核心的输出将具有相同的起始、结束和thin索引,smbind(..., strict = TRUE)也将正常工作。

注意1:检查如何为每个序列(核心)设置随机数种子。如果不小心,这些序列可能都从相同的种子开始,你将得到每个核心的链的副本。

注意2:您绝对应该设置thin值大于1,并在顺序方法中使用burnin,特别是在试验交换中。每一步只会略微更改输入数据,而在试验交换中,这种更改可能是什么都没有改变(请阅读其文档)。因此,您需要稀疏化(仅保存,例如,千分之一的步骤),并丢弃前几步,因为它们都从相同的输入矩阵开始,链的序列在很长时间内几乎是相同的。您应该删除这么多步骤,以使链中的初始状态是独立的。如果使用顺序方法,我建议您使用coda包的MCMC分析工具。

英文:

If you call smbind with argument strict = TRUE (default), function checks that data are strictly correct which means that every combined element of the list has same start index, end index and equal thin. Obviously you fail, as each sequence is of length 1 and starts at different values. Setting strict = FALSE in smbind() will perform the same checks, and only warn loudly but combine the elements.

You should not call trial swap (method = &quot;tswap&quot;) like that: it is a sequential method producing chains where each null model matrix is derived from a previous one. Therefore you should not split your analysis by the number of simulations, but by the number of cores, and run a sequence of models within each core. After that the output from each core will have the same start, end and thin indices and smbind(..., strict = TRUE) will also work.

Caveat 1: check how to set random number seed for each sequence (core). If you are not careful, the sequences may all start from the same seed and you get just copies of the chains from each core.

Caveat 2: You should absolutely set thin > 1 and also use burnin in sequential methods, in particular in trial swap. Each step will change the input data only a bit – and with trial swap that bit may be that nothing changes (read its documentation) – and therefore you need to thin (only save, say, one of thousand steps), and discard the first steps, because they all start from the same input matrix and sequences are almost identical for a long time. You should remove so many steps that the initial states in chains are independent. I suggest you apply the MCMC analysis tools of the coda package if you use a sequential method.

huangapple
  • 本文由 发表于 2023年3月15日 18:50:18
  • 转载请务必保留本文链接:https://go.coder-hub.com/75743678.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定