May 7, 2023 11:14:12

R: Verifying The Results Of Coin Flips

Question

I am working with the R programming language.

Suppose I have the following problem:

There is a coin such that if it lands heads, the probability of the next flip being heads is 0.6 (and if it lands tails, the probability of the next flip being tails is also 0.6)
There are 100 students in a class
Each student flips this coin a random number of times
The last flip of student_n does not influence the first flip of student_n+1 (i.e. when the next student flips the coin, the first flip has 0.5 probability of heads or tails, but the next flip for this student depends on the previous flip)

I am trying to write R code to represent this problem.

First I defined the variables:

library(dplyr)
library(stringr)

# generate data
set.seed(123)
ids <- 1:100
student_id <- sample(ids, 100000, replace = TRUE)
coin_result <- character(1000)
coin_result[1] <- sample(c("H", "T"), 1)

Next, I tried to write the flipping process:

for (i in 2:length(coin_result)) {
  if (student_id[i] != student_id[i-1]) {
    coin_result[i] <- sample(c("H", "T"), 1)
  } else if (coin_result[i-1] == "H") {
    coin_result[i] <- sample(c("H", "T"), 1, prob = c(0.6, 0.4))
  } else {
    coin_result[i] <- sample(c("H", "T"), 1, prob = c(0.4, 0.6))
  }
}

# tidy up
my_data <- data.frame(student_id, coin_result)
my_data <- my_data[order(my_data$student_id),]

Finally, I tried to verify the results:

my_data %>%
  group_by(student_id) %>%
  summarize(Sequence = str_c(coin_result, lead(coin_result)), .groups = 'drop') %>%
  filter(!is.na(Sequence)) %>%
  count(Sequence)

Even though the code ran, I don't think my code is correct - when I look at the results:

# A tibble: 4 x 2
  Sequence     n
  <chr>    <int>
1 HH       23810
2 HT       25043
3 TH       25042
4 TT       26005

I think that if my code were correct, HH should have been significantly greater than HT, and TT should have been significantly greater than TH.

Can someone please tell me if I have done this correctly and how to correct it?

Thanks!

Answer 1

Score: 1

I think you need to sort the student_id vector before the loop, so that your comparison of student_id[i] != student_id[i-1] would be valid. Otherwise, it's not catching consecutive flips from the same student.
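
To see why the unsorted vector defeats the `student_id[i] != student_id[i-1]` check, here is a small sketch that measures how often two adjacent positions belong to the same student. The helper name `same_as_previous` is made up for this illustration and is not part of the original code.

```r
set.seed(123)
ids <- 1:100
student_id <- sample(ids, 100000, replace = TRUE)

# Hypothetical helper: share of positions i where flip i and flip i-1
# carry the same student id.
same_as_previous <- function(x) mean(x[-1] == x[-length(x)])

same_as_previous(student_id)        # unsorted: roughly 1 in 100
same_as_previous(sort(student_id))  # sorted: close to 1
```

With unsorted ids, the "new student" branch fires on almost every iteration, so nearly every flip is a fair 50/50 draw. That would explain why the four pair counts in the question all sit near 25%.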

The result seems to make sense: HH and TT together account for 60.4% of all consecutive flip pairs.
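
As a rough sanity check on that figure, assume each student's first flip is fair and every later flip repeats the previous outcome with probability 0.6. The chain is then symmetric, so each flip is heads with probability 0.5 overall, and the expected shares of within-student pairs work out as:

```latex
\begin{aligned}
P(HH) &= P(H)\,P(H \mid H) = 0.5 \times 0.6 = 0.3, \\
P(HT) &= P(H)\,P(T \mid H) = 0.5 \times 0.4 = 0.2, \\
P(TH) &= P(T)\,P(H \mid T) = 0.5 \times 0.4 = 0.2, \\
P(TT) &= P(T)\,P(T \mid T) = 0.5 \times 0.6 = 0.3, \\
P(HH) + P(TT) &= 0.6 .
\end{aligned}
```

The observed share, (29763 + 30580) / 99900 ≈ 0.604, is close to this expected 0.6.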

library(tidyverse)

set.seed(123)
ids <- 1:100
# only the following line was changed, all other lines are same as your code
student_id <- sort(sample(ids, 100000, replace = TRUE))
coin_result <- character(1000)
coin_result[1] <- sample(c("H", "T"), 1)

for (i in 2:length(coin_result)) {
  if (student_id[i] != student_id[i-1]) {
    coin_result[i] <- sample(c("H", "T"), 1)
  } else if (coin_result[i-1] == "H") {
    coin_result[i] <- sample(c("H", "T"), 1, prob = c(0.6, 0.4))
  } else {
    coin_result[i] <- sample(c("H", "T"), 1, prob = c(0.4, 0.6))
  }
}

# tidy up
my_data <- data.frame(student_id, coin_result)
my_data <- my_data[order(my_data$student_id),]

my_data %>%
  group_by(student_id) %>%
  summarize(Sequence = str_c(coin_result, lead(coin_result)), .groups = 'drop') %>%
  filter(!is.na(Sequence)) %>%
  count(Sequence)

# A tibble: 4 × 2
  Sequence     n
  <chr>    <int>
1 HH       29763
2 HT       19782
3 TH       19775
4 TT       30580
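
One detail that may be worth double-checking in the code as posted: `coin_result` is created with `character(1000)` while `student_id` has 100000 entries, so the loop fills only 1000 flips and `data.frame()` recycles them to the full length. The sketch below is one possible way to sidestep both that and the sorting issue by simulating each student separately in base R. The helper names `simulate_student` and `count_pairs`, and the range of 500 to 1500 flips per student, are assumptions made for illustration and do not come from the original post.

```r
# Hypothetical helper: one student's flips as a two-state chain where each
# flip repeats the previous outcome with probability p_stay.
simulate_student <- function(n_flips, p_stay = 0.6) {
  flips <- character(n_flips)
  flips[1] <- sample(c("H", "T"), 1)  # first flip is fair
  for (i in seq_len(n_flips)[-1]) {
    other <- if (flips[i - 1] == "H") "T" else "H"
    flips[i] <- sample(c(flips[i - 1], other), 1, prob = c(p_stay, 1 - p_stay))
  }
  flips
}

# Hypothetical helper: count consecutive pairs within one student's sequence.
count_pairs <- function(flips) {
  pairs <- paste0(head(flips, -1), tail(flips, -1))
  table(factor(pairs, levels = c("HH", "HT", "TH", "TT")))
}

set.seed(123)
# Assumed: each of the 100 students flips between 500 and 1500 times.
n_per_student <- sample(500:1500, 100, replace = TRUE)

# Sum the per-student pair counts, so no pair ever spans two students.
pair_counts <- Reduce(`+`, lapply(n_per_student, function(n) count_pairs(simulate_student(n))))
pair_share <- pair_counts / sum(pair_counts)
print(round(pair_share, 3))
```

Because pairs are counted inside each student's own sequence, the last flip of one student never pairs with the first flip of the next, and the HH and TT shares should each land near 0.3.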
