R: Verifying The Results Of Coin Flips
Question
I am working with the R programming language.
Suppose I have the following problem:
- There is a coin where, if it lands heads, the probability of the next flip being heads is 0.6 (and if it lands tails, the probability of the next flip being tails is also 0.6)
- There are 100 students in a class
- Each student flips this coin a random number of times
- The last flip of student_n does not influence the first flip of student_n+1 (i.e. when the next student flips the coin, the first flip has 0.5 probability of heads or tails, but the next flip for this student depends on the previous flip)
I am trying to write R code to represent this problem.
First I defined the variables:
library(dplyr)
library(stringr)
# generate data
set.seed(123)
ids <- 1:100
student_id <- sample(ids, 100000, replace = TRUE)
coin_result <- character(1000)
coin_result[1] <- sample(c("H", "T"), 1)
Next, I tried to write the flipping process:
for (i in 2:length(coin_result)) {
  if (student_id[i] != student_id[i-1]) {
    coin_result[i] <- sample(c("H", "T"), 1)
  } else if (coin_result[i-1] == "H") {
    coin_result[i] <- sample(c("H", "T"), 1, prob = c(0.6, 0.4))
  } else {
    coin_result[i] <- sample(c("H", "T"), 1, prob = c(0.4, 0.6))
  }
}
#tidy up
my_data <- data.frame(student_id, coin_result)
my_data <- my_data[order(my_data$student_id),]
Finally, I tried to verify the results:
my_data %>%
  group_by(student_id) %>%
  summarize(Sequence = str_c(coin_result, lead(coin_result)), .groups = 'drop') %>%
  filter(!is.na(Sequence)) %>%
  count(Sequence)
Even though the code ran, I don't think my code is correct - when I look at the results:
# A tibble: 4 x 2
Sequence n
<chr> <int>
1 HH 23810
2 HT 25043
3 TH 25042
4 TT 26005
I think that if my code were correct, HH should be significantly greater than HT, and TT should be significantly greater than TH.
Can someone please tell me whether I have done this correctly and, if not, how to correct it?
Thanks!
Answer 1
Score: 1
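I think you need to sort the student_id vector before the loop, so that your comparison student_id[i] != student_id[i-1] is valid. Otherwise, it's not catching consecutive flips from the same student: with 100 ids drawn at random, adjacent entries almost always belong to different students, so nearly every flip is drawn 50/50.
After sorting, the result seems to make sense: HH and TT together account for 60.4% of the consecutive-flip pairs (60,343 of 99,900).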
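To see why the sort matters, one can check how often two adjacent entries of student_id belong to the same student, before and after sorting. This is only a quick sketch in base R; same_unsorted and same_sorted are names made up for illustration, not part of the original code.

```r
set.seed(123)
ids <- 1:100
student_id <- sample(ids, 100000, replace = TRUE)
n <- length(student_id)

# Share of adjacent positions whose ids match, in the original random order
same_unsorted <- mean(student_id[-1] == student_id[-n])

# Same share after sorting, so each student's flips sit next to each other
sorted_id <- sort(student_id)
same_sorted <- mean(sorted_id[-1] == sorted_id[-n])

print(c(unsorted = same_unsorted, sorted = same_sorted))
```

In the unsorted vector roughly 1 in 100 adjacent pairs share an id, so the 0.6/0.4 branches of the loop are almost never reached. After sorting, nearly all adjacent pairs do.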
library(tidyverse)
set.seed(123)
ids <- 1:100
# only the following line was changed, all other lines are same as your code
student_id <- sort(sample(ids, 100000, replace = TRUE))
coin_result <- character(1000)
coin_result[1] <- sample(c("H", "T"), 1)
for (i in 2:length(coin_result)) {
  if (student_id[i] != student_id[i-1]) {
    coin_result[i] <- sample(c("H", "T"), 1)
  } else if (coin_result[i-1] == "H") {
    coin_result[i] <- sample(c("H", "T"), 1, prob = c(0.6, 0.4))
  } else {
    coin_result[i] <- sample(c("H", "T"), 1, prob = c(0.4, 0.6))
  }
}
#tidy up
my_data <- data.frame(student_id, coin_result)
my_data <- my_data[order(my_data$student_id),]
my_data %>%
  group_by(student_id) %>%
  summarize(Sequence = str_c(coin_result, lead(coin_result)), .groups = 'drop') %>%
  filter(!is.na(Sequence)) %>%
  count(Sequence)
# A tibble: 4 × 2
Sequence n
<chr> <int>
1 HH 29763
2 HT 19782
3 TH 19775
4 TT 30580
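One more thing that may be worth checking: coin_result is created with character(1000) while student_id has 100,000 entries, so as far as I can tell the loop fills only 1,000 flips and data.frame() recycles them across the remaining rows. Below is a hedged sketch of a variant that simulates one result per flip and estimates the repeat probabilities directly in base R. The names n_flips, same_student, pairs, p_hh and p_tt are my own, and this is an assumption about what was intended rather than a reproduction of the output above.

```r
set.seed(123)
n_flips <- 100000
student_id <- sort(sample(1:100, n_flips, replace = TRUE))

# Assumption: one result per flip, so coin_result matches student_id in length
coin_result <- character(n_flips)
coin_result[1] <- sample(c("H", "T"), 1)

for (i in 2:n_flips) {
  if (student_id[i] != student_id[i - 1]) {
    # First flip of a new student: fair coin
    coin_result[i] <- sample(c("H", "T"), 1)
  } else if (coin_result[i - 1] == "H") {
    # Previous flip was heads: heads again with probability 0.6
    coin_result[i] <- sample(c("H", "T"), 1, prob = c(0.6, 0.4))
  } else {
    # Previous flip was tails: tails again with probability 0.6
    coin_result[i] <- sample(c("H", "T"), 1, prob = c(0.4, 0.6))
  }
}

# Pair each flip with the next one, keeping only pairs within one student
same_student <- student_id[-1] == student_id[-n_flips]
pairs <- paste0(coin_result[-n_flips], coin_result[-1])[same_student]
pair_counts <- table(pairs)

# Estimated probability that a flip repeats the previous one
p_hh <- as.numeric(pair_counts["HH"]) /
  as.numeric(pair_counts["HH"] + pair_counts["HT"])
p_tt <- as.numeric(pair_counts["TT"]) /
  as.numeric(pair_counts["TT"] + pair_counts["TH"])

print(pair_counts)
print(c(p_hh = p_hh, p_tt = p_tt))
```

If the simulation behaves as intended, both p_hh and p_tt should land close to 0.6. Comparing each against 0.6 is a more direct check than eyeballing the four raw counts.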