Create conditional factor based on counts within a separate level

Question

I have what I think should be a relatively simple problem. I have a large data set of thousands of observations taken from different areas within distinct sites. The general structure is something like:

df <- data.frame(Site = as.factor(rep(c("Site.A","Site.B","Site.C"), 5)),
                   Response = as.numeric(runif(15, 0, 10)),
                   Habitat = as.factor(c("G","G","F","G","F",
                                         "F","F","F","G","S",
                                         "S", "S", "S","S","S")))

I would like to add a column identifying the dominant habitat in each site based on the counts of the habitats in each site (i.e. whichever habitat makes up the majority of observations within each site).

This should be relatively easy with something along the lines of:

dat %>%
  group_by(Site) %>%
  mutate(Dominant_Habitat = if_else(
    (count(Habitat == "G") >= 3, "G",
    (count(Habitat == "F") >= 3, "F", "S"))

but for the life of me I can't find a way to make it work. Thanks.




# Answer 1
**Score**: 1

You can count and slice the most frequent `Habitat` (possibly with ties) in each `Site`, and then join the result back to the initial dataset.

```r
library(dplyr)

df %>%
  count(Site, Habitat) %>%       # rows per Site/Habitat pair, stored in column n
  group_by(Site) %>%
  slice_max(n) %>%               # keep the most frequent habitat(s); ties are kept
  summarise(Dominant_Habitat = paste(Habitat, collapse = '/')) %>%  # ties become e.g. "G/S"
  left_join(df, ., by = "Site")  # attach the label to every original row
```

The result:

```
     Site  Response Habitat Dominant_Habitat
1  Site.A 2.6751221       G              G/S
2  Site.B 7.0941244       G              F/S
3  Site.C 3.3727804       F              F/S
4  Site.A 2.4453809       G              G/S
5  Site.B 2.0155192       F              F/S
6  Site.C 6.8103549       F              F/S
7  Site.A 9.5722247       F              G/S
8  Site.B 8.7405261       F              F/S
9  Site.C 1.0035530       G              F/S
10 Site.A 4.5928348       S              G/S
11 Site.B 5.6210020       S              F/S
12 Site.C 8.2221709       S              F/S
13 Site.A 0.3368293       S              G/S
14 Site.B 0.4153831       S              F/S
15 Site.C 6.0440495       S              F/S
```
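
If a single label per site is preferred over a tie string such as `G/S`, one option is a grouped `mutate()` with a small helper, which also avoids the join. This is a sketch of a variant, not the method used above: `dominant()` is a hypothetical helper, and breaking ties by taking the first habitat alphabetically is an assumption about what you want. As a side note, the attempt in the question fails partly because `dplyr::count()` expects a data frame; inside `mutate()`, a count of matches would be written as `sum(Habitat == "G")`.

```r
library(dplyr)

set.seed(1)  # only so that Response is reproducible in this sketch

# Same layout as the df in the question
df <- data.frame(
  Site     = factor(rep(c("Site.A", "Site.B", "Site.C"), 5)),
  Response = runif(15, 0, 10),
  Habitat  = factor(c("G", "G", "F", "G", "F",
                      "F", "F", "F", "G", "S",
                      "S", "S", "S", "S", "S"))
)

# Hypothetical helper: most frequent value in x.
# table() sorts the observed values, and which.max() returns the first
# maximum, so ties go to the alphabetically first habitat (an assumption).
dominant <- function(x) {
  counts <- table(as.character(x))
  names(counts)[which.max(counts)]
}

res <- df %>%
  group_by(Site) %>%
  mutate(Dominant_Habitat = dominant(Habitat)) %>%  # one label, recycled within each site
  ungroup()
```

With the habitats above, `Site.A` is tied between `G` and `S` and gets `G`, while `Site.B` and `Site.C` are tied between `F` and `S` and get `F`. A different tie rule only requires changing `dominant()`.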
huangapple
  • Posted on February 8, 2023, 15:24:24
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75382510.html