2023年7月17日 13:19:34go评论89阅读模式

英文:

R: Simulating the Population of a Fish Pond

问题

我正在使用R编程语言工作。

我最近遇到了以下数学难题：

假设有一个有100条鱼的池塘
每天，池塘的人口增加5%，概率为5%
池塘的人口减少5%，概率为5%
池塘的人口保持不变的概率为90%

我编写了一些R代码来表示池塘在1000天内的大小：

set.seed(123)
n_days <- 1000
pond_population <- rep(0, n_days)
pond_population[1] <- 100
for (i in 2:n_days) {
    prob <- runif(1)
    if (prob <= 0.05) {
        pond_population[i] <- pond_population[i-1] + round(pond_population[i-1] * 0.05)
    } else if (prob > 0.05 && prob <= 0.10) {
        pond_population[i] <- pond_population[i-1] - round(pond_population[i-1] * 0.05)
    } else {
        pond_population[i] <- pond_population[i-1]
    }
}
plot(pond_population, type = "l", main = "Pond Population Over Time", xlab = "Day", ylab = "Population")

**我的问题：**我对以下对这个问题的修改很感兴趣 - 假设每天你从这个池塘中捕捉10条不同的鱼，给这些鱼打标签，然后放回池塘。自然地，有可能某些日子你会捕捉到以前捕捉过的鱼 - 也有可能以前捕捉过的鱼会死亡。在第1000天结束后，你将知道当前鱼塘人口的百分之多少？

我有兴趣学习如何编写一个模拟过程来回答这个问题 - 可以添加到我已经编写的代码中，该代码每天都会跟踪鱼塘的人口规模，以及你已经看到的池塘中的个别鱼（例如，想象每条鱼都被分配一个唯一的ID）。

我不确定如何开始解决这个问题 - 我认为我可能必须使用“堆栈”或“队列”来解决这个问题，但我不熟悉这些概念以及它们在这里如何使用。

有人可以教我如何做吗？

谢谢！

英文:

I am working with the R programming language.

I came across the following math puzzle recently:

Suppose there is a pond with 100 fish
Each day, there is a 5% chance that population of the pond increases by 5% of its current population
A 5% chance that the population of the pond decreases by 5% of its current population
a 90% chance that the population of the pond stays the same

I wrote some R code to represent the size of the pond over 1000 days:

set.seed(123)
n_days &lt;- 1000
pond_population &lt;- rep(0, n_days)
pond_population[1] &lt;- 100
for (i in 2:n_days) {
    prob &lt;- runif(1)
    if (prob &lt;= 0.05) {
        pond_population[i] &lt;- pond_population[i-1] + round(pond_population[i-1] * 0.05)
    } else if (prob &gt; 0.05 &amp;&amp; prob &lt;= 0.10) {
        pond_population[i] &lt;- pond_population[i-1] - round(pond_population[i-1] * 0.05)
    } else {
        pond_population[i] &lt;- pond_population[i-1]
    }
}
 
plot(pond_population, type = &quot;l&quot;, main = &quot;Pond Population Over Time&quot;, xlab = &quot;Day&quot;, ylab = &quot;Population&quot;)

My Question: I am curious about the following modification to this problem - suppose each day you catch 10 distinct fish from this pond, tag these fish, and then put them back into the pond. Naturally, it is possible that some days you will catch fish that you previously caught in the past - and it is also possible that some of the fish you caught in the past will die. After you have finished fishing on the 1000th day - what percent of the current fish pond population will be known to you?

I am interested in learning how to write a simulation procedure to answer this question - something that can be added to code I already wrote that keeps track of the size of fish pond population each day as well as the individual fish within the pond that you have already seen (e.g. imagine if each fish is assigned a unique ID).

I am not sure how to begin this problem - I think I might have to use "stacks" or "queues" for this problem, but I am not familiar with these concepts and how they would be used here.

Can someone please show me how to do this?

Thanks!

答案1

得分: 4

以下是翻译好的代码部分：

这是一种方法
```R
library(tidyverse)
set.seed(123)
fish_pond_sim <- function(pop=100, days=1000, fished_per_day=10) {
    fish <- tibble(
            id = 1:pop,
            seen = FALSE,
            dead = FALSE
            )
    results <- tibble(
        population = pop,
        day = 1,
        seen = 0,
        dead = 0,
        seen_and_alive = 0
    )
    living <- pop
  for (i in 2:days) {
    prob <- runif(1)
    five_percent <- round(length(living) * 0.05)
    if (prob <= 0.05) {
      five_pct_sample <- sample(living, five_percent, replace = FALSE)
      fish[fish$id %in% five_pct_sample,]$dead <- TRUE
    } else if (prob > 0.05 && prob <= 0.10) {
      fish <- fish %>% add_row(
        id = max(fish$id):(max(fish$id) + five_percent), 
        seen = FALSE,
        dead = FALSE
      )
    }
    fished_sample <- sample(living, fished_per_day, replace = FALSE)
    fish[fish$id %in% fished_sample,]$seen <- TRUE
    living <- fish[!fish$dead,]$id
    results <- results %>% add_row(
      population = length(living),
      day = i,
      seen = sum(fish$seen),
      dead = sum(fish$dead),
      seen_and_alive = sum(fish$seen & !fish$dead)
    )
  }
    return(results)
}
result <- fish_pond_sim(1000, 1000) 
result %>%
    ggplot(aes(x = day)) +
    geom_line(aes(y = population, color = "Population")) +
    geom_line(aes(y = seen_and_alive, color = "Seen and Alive")) + 
    theme_bw()

要获取已知的百分比，您可以像这样操作：

result %>%
    slice_tail() %>%
    pull(seen_and_alive_percentage) # 84.00424


希望这有所帮助。如果您需要任何进一步的信息，请告诉我。
<details>
<summary>英文:</summary>
Here&#39;s one approach

library(tidyverse)

set.seed(123)

fish_pond_sim <- function(pop=100, days=1000, fished_per_day=10) {
fish <- tibble(
id = 1:pop,
seen = FALSE,
dead = FALSE
)

results &lt;- tibble(
population = pop,
day = 1,
seen = 0,
dead = 0,
seen_and_alive = 0
)
living &lt;- pop

for (i in 2:days) {
prob <- runif(1)
five_percent <- round(length(living) * 0.05)
if (prob <= 0.05) {
five_pct_sample <- sample(living, five_percent, replace = FALSE)
fish[fish$id %in% five_pct_sample,]$dead <- TRUE
} else if (prob > 0.05 && prob <= 0.10) {
fish <- fish %>% add_row(
id = max(fish$id):(max(fish$id) + five_percent),
seen = FALSE,
dead = FALSE
)
}

fished_sample &lt;- sample(living, fished_per_day, replace = FALSE)
fish[fish$id %in% fished_sample,]$seen &lt;- TRUE
living &lt;- fish[!fish$dead,]$id
results &lt;- results %&gt;% add_row(
population = length(living),
day = i,
seen = sum(fish$seen),
dead = sum(fish$dead),
seen_and_alive = sum(fish$seen &amp; !fish$dead)
)

}
return(results)
}

result <- fish_pond_sim(1000, 1000)

result %>%
ggplot(aes(x = day)) +
geom_line(aes(y = population, color = "Population")) +
geom_line(aes(y = seen_and_alive, color = "Seen and Alive")) +
theme_bw()

[![plot of fish][1]][1]
To get the percentage which will be known to you, you can do something like this:

result %>%
slice_tail() %>%
pull(seen_and_alive_percentage) # 84.00424


[1]: https://i.stack.imgur.com/L6vJB.png
</details>
# 答案2
**得分**: 3
这是你提供的代码的翻译部分：
"alternate and new ways to solve this problem" 翻译为 "解决这个问题的替代和新方法"。
"Final data shows 93% of fishes are known/marked in the end!" 翻译为 "最终数据显示93%的鱼最终被识别/标记了！"。
接下来是R代码的翻译：
``` r
library(tidyverse) ; 
#&gt; 警告: package &#39;ggplot2&#39; 是在 R 版本 4.2.3 下构建的
#&gt; 警告: package &#39;tibble&#39; 是在 R 版本 4.2.3 下构建的
#&gt; 警告: package &#39;tidyr&#39; 是在 R 版本 4.2.3 下构建的
#&gt; 警告: package &#39;readr&#39; 是在 R 版本 4.2.3 下构建的
#&gt; 警告: package &#39;purrr&#39; 是在 R 版本 4.2.3 下构建的
#&gt; 警告: package &#39;dplyr&#39; 是在 R 版本 4.2.3 下构建的
#&gt; 警告: package &#39;stringr&#39; 是在 R 版本 4.2.3 下构建的
#&gt; 警告: package &#39;forcats&#39; 是在 R 版本 4.2.3 下构建的
library(patchwork) # 用于组合绘图
# 初始化
fishes &lt;- rep(0, 1000) # 0代表池塘中的所有鱼; 1代表被标记的鱼
prob_change &lt;- .05 # 人口增加的概率 = 人口减少的概率
frac_change &lt;- .05 # 每天的人口比例变化
fish_caught &lt;- 10 # 用于标记的鱼的数量
num_of_days &lt;- 1000 # 模拟运行的天数
# 单次模拟运行
next_day &lt;- function(fishpop, dummy)
{
# 人口大小的变化差异，用于增加或减少
pop_diff &lt;- round(frac_change * length(fishpop)) # 当前人口的5%
pop_red &lt;- length(fishpop) - pop_diff
# 随机变量
p = runif(1)
# 新的人口
fishpop_list &lt;- 
case_when( # 使用dplyr::包中的矢量化if_else语句
p &lt;= prob_change ~ list(sample(fishpop, size = pop_red)), # 从当前人口中抽样（少5%）
p &gt;= (1 - prob_change) ~ list(c(fishpop, rep(0, pop_diff))), # 添加到当前人口
.default = list(fishpop)
)
fishpop &lt;- fishpop_list[[1]] # 从列表中提取人口
# 钓鱼并标记为1
fishpop[sample(length(fishpop), size = fish_caught)] &lt;- 1
return(fishpop)
}
# 运行多天的模拟
# 创建迭代器列表：第一个元素是初始鱼的状态；其他都是虚拟的
iterator &lt;- as.list(0:num_of_days)
iterator[[1]] &lt;- fishes
# 数据框以存储每天的统计信息和模拟结果
daily_stats &lt;- 
tibble(day = 0:num_of_days,
iterator = iterator) %&gt;% 
# 为每一天运行模拟，将结果鱼的状态存储在“fishes”中，并在下一次迭代中使用结果
mutate(fishes = accumulate(iterator, next_day)) %&gt;% 
# 汇总数据
mutate(total = map_int(fishes, length),
marked_count = map_int(fishes, ~length(.x[.x == 1])),
fraction_marked = marked_count / total)
# 绘图
total_count &lt;- 
ggplot(daily_stats, aes(day, total)) + 
geom_point(alpha = 0.2) + geom_line(alpha = 0.2)
marked_count &lt;- 
ggplot(daily_stats, aes(day, marked_count)) + 
geom_point(alpha = 0.2) + geom_line(alpha = 0.2)
# 显示图表：一个在另一个下方
total_count / marked_count

最后，这部分代码显示了鱼群中被标记的鱼的比例：

# 显示标记的鱼群比例
daily_stats %&gt;% select(-2, -3) %&gt;% tail
#&gt; # A tibble: 6 &#215; 4
#&gt;     day total marked_count fraction_marked
#&gt;   &lt;int&gt; &lt;int&gt;        &lt;int&gt;           &lt;dbl&gt;
#&gt; 1   995   438          422           0.963
#&gt; 2   996   438          422           0.963
#&gt; 3   997   438          423           0.966
#&gt; 4   998   438          425           0.970
#&gt; 5   999   460          428           0.930
#&gt; 6  1000   460          428           0.930

希望这些翻译对你有帮助。

英文:

It wasn't clear what you meant by you want "alternate and new ways to solve this problem". Here's my attempt using vectorized functions in R's dplyr:: and purrr:: libraries for succinct code.

Final data shows 93% of fishes are known/marked in the end!

library(tidyverse) ; 
#&gt; Warning: package &#39;ggplot2&#39; was built under R version 4.2.3
#&gt; Warning: package &#39;tibble&#39; was built under R version 4.2.3
#&gt; Warning: package &#39;tidyr&#39; was built under R version 4.2.3
#&gt; Warning: package &#39;readr&#39; was built under R version 4.2.3
#&gt; Warning: package &#39;purrr&#39; was built under R version 4.2.3
#&gt; Warning: package &#39;dplyr&#39; was built under R version 4.2.3
#&gt; Warning: package &#39;stringr&#39; was built under R version 4.2.3
#&gt; Warning: package &#39;forcats&#39; was built under R version 4.2.3
library(patchwork) # for composing plots
# initialization
fishes &lt;- rep(0, 1000) # 0 represents all the fish in the pond ; 1 are the marked fishes
prob_change &lt;- .05 # probability of population increase = prob of population decrease
frac_change &lt;- .05 # population fraction change with every day
fish_caught &lt;- 10 # number of fish caught for marking
num_of_days &lt;- 1000 # number of days for simulation to run
# Single simulation run
next_day &lt;- function(fishpop, dummy)
{
  # difference in population size to increase or decrease
  pop_diff &lt;- round(frac_change * length(fishpop)) # 5% of current population
  
  pop_red &lt;- length(fishpop) - pop_diff
  
  # random variable 
  p = runif(1)
  
  # new population
  fishpop_list &lt;- 
    case_when( # a vectorized if_else statement from dplyr:: package
      p &lt;= prob_change ~ list(sample(fishpop, size = pop_red)), # subsample from current population (5% less)
      p &gt;= (1 - prob_change) ~ list(c(fishpop, rep(0, pop_diff))), # add to current population
      .default = list(fishpop)
  )
  
  fishpop &lt;- fishpop_list[[1]] # extract population from the list
  
  # fishing and marking with 1
  fishpop[sample(length(fishpop), size = fish_caught)] &lt;- 1
  
  return(fishpop)
}
# run simulation over many days
# create iterator list : first element is the initial fish state ; others are dummies
iterator &lt;- as.list(0:num_of_days)
iterator[[1]] &lt;- fishes
# data frame to store daily stats and the simulations
daily_stats &lt;- 
  tibble(day = 0:num_of_days,
         iterator = iterator) %&gt;% 
  
  # run simulation for each day, store resulting fish state in &quot;fishes&quot; and use result for next iteration
  mutate(fishes = accumulate(iterator, next_day)) %&gt;% 
  
  # summarize data
  mutate(total = map_int(fishes, length),
         marked_count = map_int(fishes, ~length(.x[.x == 1])),
         fraction_marked = marked_count / total)
# plotting
total_count &lt;- 
  ggplot(daily_stats, aes(day, total)) + 
  geom_point(alpha = 0.2) + geom_line(alpha = 0.2)
marked_count &lt;- 
  ggplot(daily_stats, aes(day, marked_count)) + 
      geom_point(alpha = 0.2) + geom_line(alpha = 0.2)
# show plots : one below the other
total_count / marked_count

R: 模拟鱼塘的人口


# Show fraction of fish population marked
daily_stats %&gt;% select(-2, -3) %&gt;% tail
#&gt; # A tibble: 6 &#215; 4
#&gt;     day total marked_count fraction_marked
#&gt;   &lt;int&gt; &lt;int&gt;        &lt;int&gt;           &lt;dbl&gt;
#&gt; 1   995   438          422           0.963
#&gt; 2   996   438          422           0.963
#&gt; 3   997   438          423           0.966
#&gt; 4   998   438          425           0.970
#&gt; 5   999   460          428           0.930
#&gt; 6  1000   460          428           0.930

<sup>Created on 2023-08-01 with reprex v2.0.2</sup>

通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库，让每个人都能够通过互相帮助和分享经验来进步。

R: 模拟鱼塘的人口

问题

答案1

如何在加载JavaScript文件之前显示旋转器？

‘rexp(1000, 1)’ 和 ‘replicate(1000, rexp(1,1))’ 在R中有什么区别？

自定义Tmap调色板基于结果

根据最新行条件筛选表格

如何在Playwright视觉比较中屏蔽多个定位器？

在C++中，可以使用可变模板参数来检索类型的内部类型。

selenium.common.exceptions.StaleElementReferenceException: Message: stale element reference: stale element not found

Creating and opening a URL to log in to Website via Basic Auth with Robot Framework/Selenium (Python)

AG Grid 在上下文菜单中以大文本形式打开

What's the correct way to type hint an empty list as a literal in python?

如何在Highcharts Gantt中更改本地化的星期名称

如何在同一个流中使用多个过滤器和映射函数？

如何使用Map/Set来将代码优化到O(n)？

.NET MAUI Android在GitHub Actions上构建失败，错误代码为1。