Calculate the percentage of occurrence of a factor out of the total number of occurrences of a group factor

Question

I have a sample dataset:

Species <- c("Bass", "Bass", "Bass", "Bass", "Bass", "Bass", "Bass", "Bass", "Bass")
FishID <- c("a1", "a1", "a1", "a2", "a2", "a3", "a3", "a3", "a3")
Prey <- c("Amphipoden", "Mysis", "Polychaeten", "Amphipoden", "Mysis", "Amphipoden", "Mysis", "Polychaeten", "Mollusca")

df <- data.frame(Species, FishID, Prey)

With Bass as the predator, there are 3 individual Bass, identified by FishID: a1, a2 and a3. I would like to calculate, for each prey species, the percentage of individual Bass (FishID) in which it occurs.

So in this case:
Amphipods occur 3 times, once in each of the three individuals, so they are found in 100% of the Bass stomachs; the same holds for Mysis. Polychaetes, however, are found in only two of the three stomachs, so that would be 66.6%. Molluscs are found in only one, so 33.3%.

As an end result, I am looking for something like this:

Species <- c("Bass", "Bass", "Bass", "Bass")
Prey <- c("Amphipoden", "Mysis", "Polychaeten", "Mollusca")
Percentage <- c(100, 100, 66.6, 33.3)
df2 <- data.frame(Species, Prey, Percentage)

I tried this:

df %>%
  group_by(Species, Prey) %>%
  summarise(n = n()) %>%
  mutate(percent = n / sum(n) * 100)

But it isn't giving me what I want.

Any help is welcome.

Thank you in advance!

Answer 1

Score: 1

You only need to change one small thing: instead of dividing by sum(n), divide by length(unique(FishID)) to get the number of individual fish. Also note that the last element of FishID has to be a3, not A3.

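library(dplyr)

# FishID is redefined as a standalone vector; mutate() below reads it from the
# global environment because summarise() drops the FishID column
FishID <- c("a1", "a1", "a1", "a2", "a2", "a3", "a3", "a3", "a3")

df %>%
    # rows per prey species (.by requires dplyr >= 1.1.0)
    summarise(n = n(), .by = Prey) %>%
    # divide by the number of individual fish
    mutate(percent = n / length(unique(FishID)) * 100)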

         Prey n   percent
1  Amphipoden 3 100.00000
2       Mysis 3 100.00000
3 Polychaeten 2  66.66667
4    Mollusca 1  33.33333
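
One caveat with counting rows: n() counts every row, so if the same prey were recorded more than once for one fish, the percentage could exceed 100. Below is a minimal sketch of a variant that counts distinct fish instead and keeps the denominator per Species. It assumes the same sample data as in the question; the names total_fish, n_fish, result and base_check are my own and not part of the answer above.

```R
library(dplyr)

# Sample data from the question
df <- data.frame(
  Species = rep("Bass", 9),
  FishID  = c("a1", "a1", "a1", "a2", "a2", "a3", "a3", "a3", "a3"),
  Prey    = c("Amphipoden", "Mysis", "Polychaeten", "Amphipoden", "Mysis",
              "Amphipoden", "Mysis", "Polychaeten", "Mollusca")
)

result <- df %>%
  # total number of individual fish per predator species
  group_by(Species) %>%
  mutate(total_fish = n_distinct(FishID)) %>%
  # number of distinct fish in which each prey was found
  group_by(Species, Prey) %>%
  summarise(
    n_fish  = n_distinct(FishID),
    percent = n_fish / first(total_fish) * 100,
    .groups = "drop"
  )

result

# Cross-check in base R (single species only): share of fish in which
# each prey is present at least once
base_check <- colMeans(table(df$FishID, df$Prey) > 0) * 100
base_check
```

For the sample data both approaches give 100 for Amphipoden and Mysis, about 66.7 for Polychaeten and about 33.3 for Mollusca. The base R line treats all rows as one predator species, so it is only a sanity check here.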

# Answer 2
**Score**: 1

```R
library(tidyverse)
df |>
  # only count Prey once per Species/FishID (which I presume always go together)
  distinct(Species, FishID, Prey) |>
  mutate(count = 1) |>
  # complete with missing combinations for each FishID
  complete(nesting(Species, FishID), Prey, fill = list(count = 0)) |>
  # proportion of fish containing each prey (multiply by 100 for a percentage)
  summarize(percent = sum(count) / n(), .by = c(Species, Prey))

#> # A tibble: 4 × 3
#>   Species Prey        percent
#>   <chr>   <chr>         <dbl>
#> 1 Bass    Amphipoden    1
#> 2 Bass    Mollusca      0.333
#> 3 Bass    Mysis         1
#> 4 Bass    Polychaeten   0.667
```

huangapple
  • Published on July 17, 2023, 23:48:48
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76706159.html