How to summarize multiple columns in R while removing NAs
Question
I am trying to summarize a dataset so that it lists the mean, median, SD, and percentiles (the 90th and 5th, as in the layouts below) for all numeric columns, with NAs removed. I have the code below so far, but cannot seem to get the result into the appropriate structure.
mtcars %>% summarize(across(where(is.numeric), list(mean = mean, sd = sd), na.rm = TRUE))
I am looking for something like the below:
>                        MPG  CYL  DISP  HP  DRAT  etc
> Mean                   #    #    #     #   #
> Median                 #
> 90% Percentile (q90)   #
> SD                     #
> 5% Percentile (q5)     #
> N                      #
and also like the below:
>        Mean  Median  Q90  SD  Q5  N
> MPG    #     #       #    #   #   #
> CYL    #
> DISP   #
> HP     #
> DRAT   #
> etc    #
Is it possible to shape the data this way? Thanks for helping a novice in R.
Answer 1
Score: 1
Here is one way of doing this:
library(dplyr)
library(tidyr)
mtcars %>%
  summarize(across(where(is.numeric), list(mean = mean, sd = sd), na.rm = TRUE)) %>%
  pivot_longer(everything(), names_to = "var", values_to = "val") %>%
  separate(var, c("var", "stat"), sep = "_") %>%  # "mpg_mean" -> var = "mpg", stat = "mean"
  pivot_wider(names_from = "stat", values_from = "val")
output:
# A tibble: 11 x 3
   var    mean    sd
   <chr> <dbl> <dbl>
 1 mpg    20.1  6.03
 2 cyl    6.19  1.79
 3 disp   231.  124.
 4 hp     147.  68.6
 5 drat   3.60 0.535
 6 wt     3.22 0.978
 7 qsec   17.8  1.79
 8 vs    0.438 0.504
 9 am    0.406 0.499
10 gear   3.69 0.738
11 carb   2.81  1.62
Or change names_from = "stat" to names_from = "var":
output:
# A tibble: 2 x 12
  stat    mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb
  <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 mean   20.1  6.19  231.  147.  3.60  3.22  17.8 0.438 0.406  3.69  2.81
2 sd     6.03  1.79  124.  68.6 0.535 0.978  1.79 0.504 0.499 0.738  1.62
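The same pattern can probably be extended to the full set of statistics the question asks for (mean, median, 90th percentile, SD, 5th percentile, and N). The sketch below is one possible way, not the answer author's code. It assumes dplyr 1.0 or later and tidyr 1.0 or later. The names stats, long, by_var and by_stat are made up for illustration, and the "__" separator is an assumption chosen so that column names containing a single underscore would still split correctly. na.rm = TRUE is set inside each lambda because recent dplyr releases (1.1.0 onward, as far as I know) deprecate passing extra arguments through across().

```r
library(dplyr)
library(tidyr)

# Hypothetical helper list: one named lambda per statistic.
# na.rm is handled inside each lambda rather than via across(...).
stats <- list(
  Mean   = ~ mean(.x, na.rm = TRUE),
  Median = ~ median(.x, na.rm = TRUE),
  Q90    = ~ unname(quantile(.x, 0.90, na.rm = TRUE)),
  SD     = ~ sd(.x, na.rm = TRUE),
  Q5     = ~ unname(quantile(.x, 0.05, na.rm = TRUE)),
  N      = ~ sum(!is.na(.x))  # count of non-missing values
)

# Long format: one row per (variable, statistic) pair.
# Summary columns are named like "mpg__Mean", then split on "__".
long <- mtcars %>%
  summarize(across(where(is.numeric), stats, .names = "{.col}__{.fn}")) %>%
  pivot_longer(
    everything(),
    names_to  = c("var", "stat"),
    names_sep = "__",
    values_to = "val"
  )

# Layout 2 from the question: one row per variable, one column per statistic.
by_var <- long %>% pivot_wider(names_from = stat, values_from = val)

# Layout 1 from the question: one row per statistic, one column per variable.
by_stat <- long %>% pivot_wider(names_from = var, values_from = val)
```

Printing by_var or by_stat shows the two layouts. Building the long table once and widening it in either direction avoids repeating the summary step.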