How to summarize multiple columns in R while removing NAs

Question

I am trying to summarize a dataset so that it lists the mean, median, SD, and percentiles (the 90th and 5th in the layouts below), along with the count of non-missing values, for every numeric column, with NAs removed. I have the code below so far, but cannot get the result into the right structure.

  mtcars %>% summarize(across(where(is.numeric), list(mean = mean, sd = sd), na.rm = TRUE))

I am looking for something like the below:

>                        MPG  CYL  DISP  HP  DRAT  etc
> Mean                   #    #    #     #   #
> Median                 #
> 90% Percentile (q90)   #
> SD                     #
> 5% Percentile (q5)     #
> N                      #

and also like the below:

>        Mean  Median  Q90  SD  Q5  N
> MPG    #     #       #    #   #   #
> CYL    #
> DISP   #
> HP     #
> DRAT   #
> etc    #

Is it possible to shape the data this way? Thanks for helping a novice in R.

Answer 1

Score: 1

Here is one way of doing this:

library(dplyr)
library(tidyr)

mtcars %>%
  summarize(across(where(is.numeric), list(mean = mean, sd = sd), na.rm = TRUE)) %>%
  pivot_longer(everything(), names_to = "var", values_to = "val") %>%
  separate(var, c("var", "stat"), sep = "_") %>%
  pivot_wider(names_from = "stat", values_from = "val")

output:

# A tibble: 11 x 3
   var      mean      sd
   <chr>   <dbl>   <dbl>
 1 mpg    20.1     6.03 
 2 cyl     6.19    1.79 
 3 disp  231.    124.   
 4 hp    147.     68.6  
 5 drat    3.60    0.535
 6 wt      3.22    0.978
 7 qsec   17.8     1.79 
 8 vs      0.438   0.504
 9 am      0.406   0.499
10 gear    3.69    0.738
11 carb    2.81    1.62 
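
The `separate()` plus `pivot_wider()` steps above can probably be collapsed into a single `pivot_longer()` call by using the special `".value"` name. The sketch below is not part of the original answer. It assumes a dplyr/tidyr version recent enough to support `across()` and `names_pattern`, and it splits each name at its last underscore so that column names that themselves contain `_` would not break the split. The names `wide` and `by_var` are illustrative only.

```r
library(dplyr)
library(tidyr)

# One row, with columns named <column>_<statistic> (e.g. mpg_mean, mpg_sd).
# na.rm is passed inside each function rather than through across()'s `...`.
wide <- mtcars %>%
  summarize(across(
    where(is.numeric),
    list(
      mean = function(x) mean(x, na.rm = TRUE),
      sd   = function(x) sd(x, na.rm = TRUE)
    )
  ))

# Split each name at its LAST underscore: the first capture group becomes
# the `var` column, the second (".value") becomes the output column names.
by_var <- wide %>%
  pivot_longer(
    everything(),
    names_to = c("var", ".value"),
    names_pattern = "^(.*)_([^_]+)$"
  )

print(by_var)
```

This should give the same one-row-per-variable shape as the pipeline above, with `var`, `mean`, and `sd` columns.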

Or change names_from = "stat" to names_from = "var":

output:

# A tibble: 2 x 12
  stat    mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb
  <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 mean  20.1   6.19  231. 147.  3.60  3.22  17.8  0.438 0.406 3.69   2.81
2 sd     6.03  1.79  124.  68.6 0.535 0.978  1.79 0.504 0.499 0.738  1.62
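
The answer covers only the mean and SD. A minimal sketch that extends the same idea to every statistic the question asks for (median, 90th and 5th percentiles, and the non-missing count N) might look like the following. The names `stats`, `long`, `by_var`, and `by_stat` are illustrative, and the `"__"` separator is an assumption chosen so that an underscore in a column name would not confuse the split. Each function handles `na.rm` itself, since passing extra arguments through `across()`'s `...` is, as far as I know, deprecated from dplyr 1.1.0 onward.

```r
library(dplyr)
library(tidyr)

# Named list of summary functions; each one drops NAs on its own.
stats <- list(
  Mean   = function(x) mean(x, na.rm = TRUE),
  Median = function(x) median(x, na.rm = TRUE),
  Q90    = function(x) unname(quantile(x, 0.90, na.rm = TRUE)),
  SD     = function(x) sd(x, na.rm = TRUE),
  Q5     = function(x) unname(quantile(x, 0.05, na.rm = TRUE)),
  N      = function(x) sum(!is.na(x))
)

# Long form: one row per (variable, statistic) pair.
long <- mtcars %>%
  summarize(across(where(is.numeric), stats, .names = "{.col}__{.fn}")) %>%
  pivot_longer(
    everything(),
    names_to = c("var", "stat"),
    names_sep = "__",
    values_to = "val"
  )

# Layout 2 from the question: one row per variable, one column per statistic.
by_var <- long %>% pivot_wider(names_from = stat, values_from = val)

# Layout 1 from the question: one row per statistic, one column per variable.
by_stat <- long %>% pivot_wider(names_from = var, values_from = val)

print(by_var)
print(by_stat)
```

Building the long table once and pivoting it twice keeps both layouts in sync. Note that N is converted to a double when it is stacked with the other statistics in `long`.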

huangapple
  • Posted on February 24, 2023 at 13:13:31
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75552830.html