What is the difference between 'rexp(1000, 1)' and 'replicate(1000, rexp(1, 1))' in R?
Question
I am trying to generate 1000 numbers from an exponential distribution with rate parameter 1. After setting the seed to 1, I tried both rexp(1000, 1) and replicate(1000, rexp(1, 1)), but the medians of the two resulting vectors are different.
I expected the two expressions to produce the same vector, because both sample from the same exponential distribution under the same seed.
What is the difference between rexp(1000, 1) and replicate(1000, rexp(1, 1))? Which should I use in practice?
Here is the code that I tried:
> options(digits = 2)
> set.seed(1)
>
> a <- rexp(1000, 1)
> b <- replicate(1000, rexp(1, 1))
>
> median(a)
[1] 0.73
> median(b)
[1] 0.68
Answer 1
Score: 5
The problem is that the random number generator's state advances every time it is used, so by the time you generate b the state is no longer the one set by set.seed(1). If you want b to be the same as a, you have to reset the seed before you create b:
set.seed(1)
a <- rexp(1000, 1)
set.seed(1)
b <- replicate(1000, rexp(1, 1))
median(a)
#> [1] 0.7346113
median(b)
#> [1] 0.7346113
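To make the point about the advancing generator state more concrete, here is a small sketch. It assumes base R's default generator settings; the variable names (first, second, full, and the two flags) are made up for illustration and are not part of the original answer. It checks the vectors element by element rather than only comparing medians, and it also shows that, without a reset, the second batch of draws simply continues the stream where the first batch stopped.

```r
# Draw 1000 values in one vectorised call.
set.seed(1)
a <- rexp(1000, 1)

# Reset the seed, then draw 1000 values one at a time.
set.seed(1)
b <- replicate(1000, rexp(1, 1))

# Same starting state, same stream: compare the vectors element by element.
same_after_reset <- identical(a, b)

# Without a reset, the second batch continues from where the first stopped.
set.seed(1)
first  <- rexp(1000, 1)
second <- replicate(1000, rexp(1, 1))

# A single draw of 2000 values from the same seed, for comparison.
set.seed(1)
full <- rexp(2000, 1)
continues_stream <- identical(c(first, second), full)

print(same_after_reset)
print(continues_stream)
```

If both flags are TRUE, the unequal medians in the question come only from b being drawn from a later stretch of the same random stream, not from any difference between rexp and replicate.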
As for which you should use, it is definitely rexp(1000, 1), because it makes a single call to the underlying C code rather than 1000 calls. The two expressions generate the same results, as shown above, but a simple benchmark shows that the single rexp call is about 50 times faster.
microbenchmark::microbenchmark(a = rexp(1000, 1),
                               b = replicate(1000, rexp(1, 1)))
#> Unit: microseconds
#>  expr      min        lq       mean   median       uq       max neval cld
#>     a   32.501   33.5005   34.54794   34.101   34.701    42.301   100  a
#>     b 1503.402 1539.0010 2043.20113 1569.451 1646.901 10051.202   100   b
Created on 2023-02-27 with reprex v2.0.2
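If the microbenchmark package is not installed, a rough comparison can be sketched with base R's system.time. This is only an approximation of the benchmark above, not a replacement for it: the repetition count n_reps is an arbitrary value chosen for illustration, and the timings will vary by machine.

```r
n_reps <- 200  # arbitrary repetition count, chosen only for illustration

# Total elapsed time for n_reps vectorised draws of 1000 values.
time_vectorised <- system.time(
  for (i in seq_len(n_reps)) rexp(1000, 1)
)[["elapsed"]]

# Total elapsed time for n_reps draws built one value at a time.
time_replicate <- system.time(
  for (i in seq_len(n_reps)) replicate(1000, rexp(1, 1))
)[["elapsed"]]

print(c(vectorised = time_vectorised, replicate = time_replicate))
```

system.time has much coarser resolution than microbenchmark, which is why each expression is repeated in a loop and only the totals are compared.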