How to summarize multiple columns in R while removing NAs

Question

I am trying to summarize a dataset so that it lists the mean, median, SD, and percentiles (the 90th and 5th in the layouts below) for all numeric columns, with NAs removed. I have the code below so far, but cannot get the result into the right shape.

    library(dplyr)

    mtcars %>%
      summarize(across(where(is.numeric), list(mean = mean, sd = sd), na.rm = TRUE))

I am looking for something like the below:

                            MPG  CYL  DISP  HP  DRAT  etc
    Mean                    #    #    #     #   #
    Median                  #
    90% Percentile (q90)    #
    SD                      #
    5% Percentile (q5)      #
    N                       #

and also something like the below:

            Mean  Median  Q90  SD  Q5  N
    MPG     #     #       #    #   #   #
    CYL     #
    DISP    #
    HP      #
    DRAT    #
    etc     #

Is it possible to shape the data this way? Thanks for helping a novice in R.

Answer 1

Score: 1

Here is one way of doing this:

    library(dplyr)
    library(tidyr)

    mtcars %>%
      summarize(across(where(is.numeric), list(mean = mean, sd = sd), na.rm = TRUE)) %>%
      pivot_longer(everything(), names_to = "var", values_to = "val") %>%  # one row per "mpg_mean"-style name
      separate(var, c("var", "stat"), sep = "_") %>%                        # split into variable and statistic
      pivot_wider(names_from = "stat", values_from = "val")                 # one column per statistic

Output:

    # A tibble: 11 x 3
       var     mean     sd
       <chr>  <dbl>  <dbl>
     1 mpg   20.1    6.03
     2 cyl    6.19   1.79
     3 disp 231.   124.
     4 hp   147.    68.6
     5 drat   3.60   0.535
     6 wt     3.22   0.978
     7 qsec  17.8    1.79
     8 vs     0.438  0.504
     9 am     0.406  0.499
    10 gear   3.69   0.738
    11 carb   2.81   1.62

Alternatively, change names_from = "stat" to names_from = "var" in the pivot_wider() call to get one row per statistic and one column per variable.

Output:

    # A tibble: 2 x 12
      stat    mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb
      <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
    1 mean  20.1   6.19  231. 147.  3.60  3.22  17.8  0.438 0.406 3.69  2.81
    2 sd     6.03  1.79  124.  68.6 0.535 0.978  1.79 0.504 0.499 0.738 1.62
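
The code in this answer covers only the mean and SD. A minimal sketch of one way to extend it to the other statistics the question asks for (median, 90th and 5th percentiles, and the non-missing count) might look like the following. The `stats` list, the `__` separator, and the object names `long`, `by_var`, and `by_stat` are illustrative choices, not part of the original answer. The sketch passes `na.rm = TRUE` inside each lambda rather than through `...`, and it uses a double underscore so that column names which themselves contain `_` would still split cleanly.

```r
library(dplyr)
library(tidyr)

# Summary functions written as lambdas so that na.rm = TRUE is explicit
# (hypothetical list name; the statistics follow the question's layout)
stats <- list(
  mean   = ~ mean(.x, na.rm = TRUE),
  median = ~ median(.x, na.rm = TRUE),
  q90    = ~ quantile(.x, 0.90, na.rm = TRUE, names = FALSE),
  sd     = ~ sd(.x, na.rm = TRUE),
  q5     = ~ quantile(.x, 0.05, na.rm = TRUE, names = FALSE),
  n      = ~ sum(!is.na(.x))
)

# One row per (variable, statistic) pair; summary columns are named
# like "mpg__mean", then split on the "__" separator
long <- mtcars %>%
  summarize(across(where(is.numeric), stats, .names = "{.col}__{.fn}")) %>%
  pivot_longer(everything(),
               names_to = c("var", "stat"),
               names_sep = "__",
               values_to = "val")

# Layout 1: one row per variable, one column per statistic
by_var <- pivot_wider(long, names_from = stat, values_from = val)

# Layout 2: one row per statistic, one column per variable
by_stat <- pivot_wider(long, names_from = var, values_from = val)

print(by_var)
print(by_stat)
```

Both layouts come from the same long table, so switching between them only changes the `names_from` argument, as in the answer above.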

huangapple
  • Posted on February 24, 2023 at 13:13:31
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75552830.html