Create conditional factor based on counts within a separate level
Question
I have what I think should be a relatively simple problem. I have a large data set of thousands of observations taken from different areas within distinct sites. The general structure is something like:
df <- data.frame(Site = as.factor(rep(c("Site.A","Site.B","Site.C"), 5)),
Response = as.numeric(runif(15, 0, 10)),
Habitat = as.factor(c("G","G","F","G","F",
"F","F","F","G","S",
"S", "S", "S","S","S")))
I would like to add a column identifying the dominant habitat in each site, based on the counts of the habitats in that site (i.e. whichever habitat makes up the majority of observations within each site).
This should be relatively easy with something along the lines of:
dat %>%
group_by(Site) %>%
mutate(Dominant_Habitat = if_else(
(count(Habitat == "G") >= 3, "G",
(count(Habitat == "F") >= 3, "F", "S"))
but for the life of me I can't find a way to make it work. Thanks.
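# Answer 1
**Score**: 1
You can `count` the habitats and `slice` out the most frequent `Habitat` in each `Site` (there may be ties, which are pasted together with `/`), and then join the result back to the initial dataset.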
```r
library(dplyr)

df %>%
  count(Site, Habitat) %>%                 # observations per Site/Habitat pair
  group_by(Site) %>%
  slice_max(n) %>%                         # keep the top count(s) per Site, ties included
  summarise(Dominant_Habitat = paste(Habitat, collapse = '/')) %>%
  left_join(df, ., by = "Site")            # attach the label to every original row
```

The result is as follows (the `Response` values will differ between runs, since `runif()` is not seeded):

```
     Site  Response Habitat Dominant_Habitat
1  Site.A 2.6751221       G              G/S
2  Site.B 7.0941244       G              F/S
3  Site.C 3.3727804       F              F/S
4  Site.A 2.4453809       G              G/S
5  Site.B 2.0155192       F              F/S
6  Site.C 6.8103549       F              F/S
7  Site.A 9.5722247       F              G/S
8  Site.B 8.7405261       F              F/S
9  Site.C 1.0035530       G              F/S
10 Site.A 4.5928348       S              G/S
11 Site.B 5.6210020       S              F/S
12 Site.C 8.2221709       S              F/S
13 Site.A 0.3368293       S              G/S
14 Site.B 0.4153831       S              F/S
15 Site.C 6.0440495       S              F/S
```
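
If a single dominant habitat per site is needed rather than a tie label such as `G/S`, one option is to pass `with_ties = FALSE` to `slice_max()`. The sketch below is a variation on the count-slice-join approach, not part of the original answer; `set.seed(1)` and the names `dominant` and `result` are assumptions added for illustration. Which of the tied habitats is kept depends on row order, so this only suits cases where an arbitrary tie-break is acceptable.

```r
library(dplyr)

# Same structure as the example data in the question; the seed is an
# addition so that the Response values are reproducible.
set.seed(1)
df <- data.frame(
  Site = as.factor(rep(c("Site.A", "Site.B", "Site.C"), 5)),
  Response = runif(15, 0, 10),
  Habitat = as.factor(c("G", "G", "F", "G", "F",
                        "F", "F", "F", "G", "S",
                        "S", "S", "S", "S", "S"))
)

# One row per Site holding a single most frequent Habitat.
dominant <- df %>%
  count(Site, Habitat) %>%
  group_by(Site) %>%
  slice_max(n, n = 1, with_ties = FALSE) %>%  # exactly one row per Site
  ungroup() %>%
  select(Site, Dominant_Habitat = Habitat)

# Attach the per-site label to every original observation.
result <- left_join(df, dominant, by = "Site")
```

Here `result` has the same 15 rows as `df`, with exactly one `Dominant_Habitat` value per site.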
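
A shape closer to the attempt in the question keeps everything inside a grouped `mutate()`, with no join. This is a minimal sketch, assuming that ties may be resolved in favour of the first factor level of `Habitat`; it relies on base R's `table()` and `which.max()`, and the seed and the name `result` are assumptions added for illustration.

```r
library(dplyr)

# Same structure as the example data in the question; the seed is an
# addition so that the Response values are reproducible.
set.seed(1)
df <- data.frame(
  Site = as.factor(rep(c("Site.A", "Site.B", "Site.C"), 5)),
  Response = runif(15, 0, 10),
  Habitat = as.factor(c("G", "G", "F", "G", "F",
                        "F", "F", "F", "G", "S",
                        "S", "S", "S", "S", "S"))
)

result <- df %>%
  group_by(Site) %>%
  # table() counts each Habitat level within the current Site;
  # which.max() returns the first level that reaches the highest count.
  mutate(Dominant_Habitat = names(which.max(table(Habitat)))) %>%
  ungroup()
```

Because `which.max()` returns the first maximum, a site where `G` and `S` are tied is labelled `G` here, whereas the `paste(..., collapse = '/')` approach would report `G/S`.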