R: Verifying The Results Of Coin Flips

Question

I am working with the R programming language.

Suppose I have the following problem:

  • There is a coin with memory: if it lands heads, the probability that the next flip is heads is 0.6, and if it lands tails, the probability that the next flip is tails is also 0.6
  • There are 100 students in a class
  • Each student flips this coin a random number of times
  • The last flip of student_n does not influence the first flip of student_n+1 (i.e. when the next student starts flipping, the first flip is heads or tails with probability 0.5 each, and every later flip by that student depends on the student's previous flip)

I am trying to write R code to represent this problem.

First I defined the variables:

    library(dplyr)
    library(stringr)

    # generate data
    set.seed(123)
    ids <- 1:100
    student_id <- sample(ids, 100000, replace = TRUE)
    coin_result <- character(1000)
    coin_result[1] <- sample(c("H", "T"), 1)

Next, I tried to write the flipping process:

    for (i in 2:length(coin_result)) {
      if (student_id[i] != student_id[i-1]) {
        coin_result[i] <- sample(c("H", "T"), 1)
      } else if (coin_result[i-1] == "H") {
        coin_result[i] <- sample(c("H", "T"), 1, prob = c(0.6, 0.4))
      } else {
        coin_result[i] <- sample(c("H", "T"), 1, prob = c(0.4, 0.6))
      }
    }

    # tidy up
    my_data <- data.frame(student_id, coin_result)
    my_data <- my_data[order(my_data$student_id),]

Finally, I tried to verify the results:

    my_data %>%
      group_by(student_id) %>%
      summarize(Sequence = str_c(coin_result, lead(coin_result)), .groups = 'drop') %>%
      filter(!is.na(Sequence)) %>%
      count(Sequence)

Even though the code ran, I don't think my code is correct - when I look at the results:

    # A tibble: 4 x 2
      Sequence     n
      <chr>    <int>
    1 HH       23810
    2 HT       25043
    3 TH       25042
    4 TT       26005

I think that if my code were correct, HH should be significantly more frequent than HT, and TT significantly more frequent than TH.
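
As a point of reference, the pair shares this model implies can be worked out from its transition matrix. The sketch below assumes each student's first flip is fair, as stated in the problem; `P`, `start`, and `pair_prob` are illustrative names and are not part of the code above.

```r
# Transition matrix of the two-state chain: rows = current flip, columns = next flip
P <- matrix(c(0.6, 0.4,
              0.4, 0.6),
            nrow = 2, byrow = TRUE,
            dimnames = list(c("H", "T"), c("H", "T")))

# Assumption: a fair first flip, so P(H) = P(T) = 0.5. Because the chain is
# symmetric, the marginal stays 0.5 / 0.5 at every later position as well.
start <- c(H = 0.5, T = 0.5)

# Joint probability of each ordered pair (current, next)
pair_prob <- start * P   # element-wise: row i is scaled by start[i]
pair_prob
```

Under that assumption, HH and TT should each make up about 30% of the consecutive pairs and HT and TH about 20% each, so roughly 60% of pairs repeat the previous outcome.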

Can someone please tell me if I have done this correctly and how to correct it?

Thanks!

Answer 1

Score: 1

I think you need to sort the student_id vector before the loop, so that the comparison student_id[i] != student_id[i-1] is meaningful. Otherwise each student's flips are scattered through the vector, and the loop almost never sees two consecutive flips from the same student.
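
One way to see why the order matters is to measure how often two adjacent entries of student_id belong to the same student. The sketch below reuses the question's seed and sampling call; `same_as_previous`, `unsorted_id`, and `sorted_id` are hypothetical names introduced only for this illustration.

```r
set.seed(123)
ids <- 1:100

# Unsorted: every flip is assigned to a random student, as in the question
unsorted_id <- sample(ids, 100000, replace = TRUE)
# Sorted: each student's flips are contiguous
sorted_id <- sort(unsorted_id)

# Share of positions i >= 2 whose student matches the one at position i - 1
same_as_previous <- function(x) mean(x[-1] == x[-length(x)])

same_as_previous(unsorted_id)  # about 1/100 in expectation
same_as_previous(sorted_id)    # close to 1; only student boundaries differ
```

With unsorted ids, roughly 99 out of 100 flips fall into the first branch of the loop and are drawn as fair flips, which is consistent with the near-uniform counts in the question.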

With that change the result makes sense: HH and TT together account for 60.4% of the consecutive-flip pairs, close to the 0.6 probability of repeating the previous outcome.

    library(tidyverse)

    set.seed(123)
    ids <- 1:100

    # only the following line was changed, all other lines are same as your code
    student_id <- sort(sample(ids, 100000, replace = TRUE))

    coin_result <- character(1000)
    coin_result[1] <- sample(c("H", "T"), 1)

    for (i in 2:length(coin_result)) {
      if (student_id[i] != student_id[i-1]) {
        coin_result[i] <- sample(c("H", "T"), 1)
      } else if (coin_result[i-1] == "H") {
        coin_result[i] <- sample(c("H", "T"), 1, prob = c(0.6, 0.4))
      } else {
        coin_result[i] <- sample(c("H", "T"), 1, prob = c(0.4, 0.6))
      }
    }

    # tidy up
    my_data <- data.frame(student_id, coin_result)
    my_data <- my_data[order(my_data$student_id),]

    my_data %>%
      group_by(student_id) %>%
      summarize(Sequence = str_c(coin_result, lead(coin_result)), .groups = 'drop') %>%
      filter(!is.na(Sequence)) %>%
      count(Sequence)

    # A tibble: 4 × 2
      Sequence     n
      <chr>    <int>
    1 HH       29763
    2 HT       19782
    3 TH       19775
    4 TT       30580
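
One further detail may be worth checking: coin_result is allocated with character(1000) while student_id has 100000 entries, so the loop simulates only 1000 flips, and data.frame() then recycles them to fill 100000 rows. If that mismatch is unintended, a variant along the following lines simulates one flip per entry. This is a sketch in base R rather than the original pipeline; `same_student`, `pairs`, `pair_counts`, and `repeat_share` are names introduced here for illustration.

```r
set.seed(123)
ids <- 1:100
student_id <- sort(sample(ids, 100000, replace = TRUE))

# Allocate one result per flip, so nothing has to be recycled later
coin_result <- character(length(student_id))
coin_result[1] <- sample(c("H", "T"), 1)

for (i in 2:length(coin_result)) {
  if (student_id[i] != student_id[i - 1]) {
    # New student: the first flip is fair
    coin_result[i] <- sample(c("H", "T"), 1)
  } else if (coin_result[i - 1] == "H") {
    coin_result[i] <- sample(c("H", "T"), 1, prob = c(0.6, 0.4))
  } else {
    coin_result[i] <- sample(c("H", "T"), 1, prob = c(0.4, 0.6))
  }
}

# Pair each flip with the next one, keeping only pairs within the same student
n <- length(coin_result)
same_student <- student_id[-1] == student_id[-n]
pairs <- paste0(coin_result[-n], coin_result[-1])[same_student]

pair_counts <- table(pairs)
pair_counts

# Share of pairs that repeat the previous outcome (HH or TT)
repeat_share <- sum(pair_counts[c("HH", "TT")]) / sum(pair_counts)
repeat_share
```

The exact counts will differ from the output above because more random draws are consumed, but the share of HH and TT pairs should again land near 0.6.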

huangapple
  • Published on May 7, 2023, 11:14:12
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76192042.html