Summarise unique values from column by group from another column

Question


I have a data set with some metrics about the species that occur across the places where I am doing an assessment. A species can occur in the same place more than once with the same score, because each place contains several coordinate locations.

Here is a sample of the data set:

PLACE  species                    score
348    Cercopithecus mitis mitis  4.950851
597    Acinonyx jubatus           6.332438
597    Acinonyx jubatus           6.332438
597    Acomys johannis            4.655138
597    Acomys cineraceus          3.646404
873    Aepyceros melampus         5.668386
873    Aepyceros melampus         5.668386
873    Aepyceros melampus         5.668386
873    Alcelaphus buselaphus      5.075547

I want to summarise the score column (mean, median, deviance, and variance) for each place, counting each unique species only once. I tried using dplyr as follows:

library(dplyr)
dataSpp %>% group_by(PLACE) %>% summarise_each(funs(n_distinct(.)))

But that did not work. What should I do instead?

Answer 1

Score: 0

library(tidyverse)

df <- tibble::tribble(
  ~PLACE,                     ~species,   ~score,
  348L,  "Cercopithecus mitis mitis", 4.950851,
  597L,           "Acinonyx jubatus", 6.332438,
  597L,           "Acinonyx jubatus", 6.332438,
  597L, "Acomys johannis cineraceus", 4.655138,
  597L,         "Aepyceros melampus", 5.646404,
  873L,         "Aepyceros melampus", 5.668386,
  873L,         "Aepyceros melampus", 5.668386,
  873L,      "Alcelaphus buselaphus", 5.075547
)

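df %>%
  distinct() %>%  # drop fully duplicated rows so each species counts once per place
  summarize(
    mean = mean(score),
    median = median(score),
    sd = sd(score),  # NA when a place has only one species
    .by = c(PLACE)   # per-call grouping; requires dplyr >= 1.1.0
  )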
#> # A tibble: 3 × 4
#>   PLACE  mean median     sd
#>   <int> <dbl>  <dbl>  <dbl>
#> 1   348  4.95   4.95 NA    
#> 2   597  5.54   5.65  0.843
#> 3   873  5.37   5.37  0.419

Created on 2023-03-21 with reprex v2.0.2
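
The question also asks for the variance, and `distinct()` with no arguments only drops rows that are identical in every column. The following is a minimal sketch of one way to cover both points, not a definitive implementation. It assumes the same sample data as the answer above, deduplicates explicitly on `PLACE` and `species`, and adds `var()`. It uses `group_by()` rather than `.by`, so it should also run on dplyr versions older than 1.1.0. The names `summary_by_place` and `n_species` are made up for illustration.

``` r
library(dplyr)

# Sample data copied from the answer above.
df <- tibble::tribble(
  ~PLACE,                     ~species,   ~score,
  348L,  "Cercopithecus mitis mitis", 4.950851,
  597L,           "Acinonyx jubatus", 6.332438,
  597L,           "Acinonyx jubatus", 6.332438,
  597L, "Acomys johannis cineraceus", 4.655138,
  597L,         "Aepyceros melampus", 5.646404,
  873L,         "Aepyceros melampus", 5.668386,
  873L,         "Aepyceros melampus", 5.668386,
  873L,      "Alcelaphus buselaphus", 5.075547
)

summary_by_place <- df %>%
  # Keep the first row for each PLACE/species pair, even if other columns differ.
  distinct(PLACE, species, .keep_all = TRUE) %>%
  group_by(PLACE) %>%
  summarise(
    n_species = n(),         # number of unique species in the place
    mean      = mean(score),
    median    = median(score),
    sd        = sd(score),   # NA when only one species is present
    var       = var(score)   # NA when only one species is present
  )

summary_by_place
```

If the same species could appear with different scores within a place, `.keep_all = TRUE` keeps only the first score it encounters, so it may be safer to average per species before summarising per place.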



huangapple
  • Posted on 2023-03-21 03:01:06
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75794285.html