Summarise unique values from column by group from another column
Question
I have a data set with some metrics about the species that occur across places where I am doing an assessment. A species can occur in the same place more than once with the same score because there are different coordinate locations in each place.
Here is a sample of the data set:
PLACE | species | score |
---|---|---|
348 | Cercopithecus mitis mitis | 4.950851 |
597 | Acinonyx jubatus | 6.332438 |
597 | Acinonyx jubatus | 6.332438 |
597 | Acomys johannis | 4.655138 |
597 | Acomys cineraceus | 3.646404 |
873 | Aepyceros melampus | 5.668386 |
873 | Aepyceros melampus | 5.668386 |
873 | Aepyceros melampus | 5.668386 |
873 | Alcelaphus buselaphus | 5.075547 |
I want to summarise the values from column `score` (mean, median, deviance, and variance) by place from unique species. I tried using `dplyr` as follows:
library(dplyr)
dataSpp %>% group_by(PLACE) %>% summarise_each(funs(n_distinct(.)))
But that did not work. What should I do instead?
Answer 1
Score: 0
``` r
library(tidyverse)

# Sample data; repeated rows are the same species recorded at
# different coordinates within one place.
df <- tibble::tribble(
  ~PLACE, ~species, ~score,
  348L, "Cercopithecus mitis mitis", 4.950851,
  597L, "Acinonyx jubatus", 6.332438,
  597L, "Acinonyx jubatus", 6.332438,
  597L, "Acomys johannis cineraceus", 4.655138,
  597L, "Aepyceros melampus", 5.646404,
  873L, "Aepyceros melampus", 5.668386,
  873L, "Aepyceros melampus", 5.668386,
  873L, "Alcelaphus buselaphus", 5.075547
)

df |>
  distinct() |>                # drop fully duplicated rows
  summarize(
    mean = mean(score),
    median = median(score),
    sd = sd(score),            # NA when a place has only one species
    .by = c(PLACE)             # per-operation grouping (dplyr >= 1.1.0)
  )
#> # A tibble: 3 × 4
#>   PLACE  mean median     sd
#>   <int> <dbl>  <dbl>  <dbl>
#> 1   348  4.95   4.95 NA
#> 2   597  5.54   5.65  0.843
#> 3   873  5.37   5.37  0.419
```

<sup>Created on 2023-03-21 with reprex v2.0.2</sup>
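
The question also asks for the variance, and `distinct()` with no arguments deduplicates on all three columns at once. A minimal sketch of a variant, assuming each species should count once per place even if its rows carried different scores (that assumption is mine, not stated in the question): it deduplicates on `PLACE` and `species` explicitly and adds `var()`. The `n_species` column and the `result` name are illustrative additions, and the data is the same sample used in the answer above.

``` r
library(dplyr)

# Same sample data as in the answer above.
df <- tibble::tribble(
  ~PLACE, ~species, ~score,
  348L, "Cercopithecus mitis mitis", 4.950851,
  597L, "Acinonyx jubatus", 6.332438,
  597L, "Acinonyx jubatus", 6.332438,
  597L, "Acomys johannis cineraceus", 4.655138,
  597L, "Aepyceros melampus", 5.646404,
  873L, "Aepyceros melampus", 5.668386,
  873L, "Aepyceros melampus", 5.668386,
  873L, "Alcelaphus buselaphus", 5.075547
)

result <- df |>
  # Keep the first row for each PLACE/species pair, retaining its score.
  distinct(PLACE, species, .keep_all = TRUE) |>
  summarise(
    n_species = n(),          # number of unique species in the place
    mean = mean(score),
    median = median(score),
    sd = sd(score),           # NA for a place with a single species
    var = var(score),         # likewise NA for a single species
    .by = PLACE               # requires dplyr >= 1.1.0
  )

result
```

On dplyr versions older than 1.1.0, `group_by(PLACE)` before `summarise()` should give the same grouping. If a species can legitimately have several different scores within one place, averaging per species first may be a better fit than keeping the first row.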