Summarising the percentage of the total for each level of a factor variable with purrr::map

Question

I'm trying to apply a function with map to each of the variables in a list. Inside that function, I want to group by some categories and show what percentage of each group's total falls into each level of a factor variable.

For instance, I have this list:

mtcars_list <- c("am", "gear", "carb")

I have the variable I want to group by, "cyl", and the variable I want to summarise. In this case I will convert the variable "vs" of the mtcars dataset to a factor:

mtcars$vs <- factor(mtcars$vs, levels = c('0', '1'))

I then call purrr::map as follows, which gives me an error when I use count, prop.table or similar inside summarise:

purrr::map(mtcars_list, ~ mtcars %>%
  group_by(cyl, .data[[.x]]) %>%
  summarise(count(vs), .groups = "drop")*100)

When I run this it says:

no applicable method for 'count' applied to an object of class "c('double', 'numeric')"

The result would be something like this:

First category

   0        1
A 17.7%     83.3%
B  5.0%     95.5%

Second category

    0        1
A   2.0     98.0
B   4.0     96.0

Thank you!!!

Answer 1

Score: 1

Please check whether this is the expected output:

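library(dplyr)  # select(), mutate(), group_by(), slice_tail(), %>%
library(tidyr)  # pivot_wider()

df1 <- purrr::map(mtcars_list, ~ mtcars %>%
                    select(cyl, vs, !!sym(.x)) %>%
                    # n: rows per cyl / variable / vs combination
                    mutate(n = n(), .by = c(cyl, .data[[.x]], vs)) %>%
                    # n2: rows per cyl / variable combination
                    mutate(n2 = n(), .by = c(cyl, .data[[.x]])) %>%
                    group_by(cyl, .data[[.x]], vs) %>%
                    slice_tail(n = 1) %>%
                    mutate(perc = (n / n2) * 100) %>%
                    pivot_wider(id_cols = c(cyl, .data[[.x]]), names_from = vs, values_from = perc)
)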

df1

[[1]]
# A tibble: 6 × 4
# Groups:   cyl, am [6]
    cyl    am   `1`   `0`
  <dbl> <dbl> <dbl> <dbl>
1     4     0 100    NA  
2     4     1  87.5  12.5
3     6     0 100    NA  
4     6     1  NA   100  
5     8     0  NA   100  
6     8     1  NA   100  

[[2]]
# A tibble: 8 × 4
# Groups:   cyl, gear [8]
    cyl  gear   `1`   `0`
  <dbl> <dbl> <dbl> <dbl>
1     4     3   100    NA
2     4     4   100    NA
3     4     5    50    50
4     6     3   100    NA
5     6     4    50    50
6     6     5    NA   100
7     8     3    NA   100
8     8     5    NA   100

[[3]]
# A tibble: 9 × 4
# Groups:   cyl, carb [9]
    cyl  carb   `1`   `0`
  <dbl> <dbl> <dbl> <dbl>
1     4     1 100    NA  
2     4     2  83.3  16.7
3     6     1 100    NA  
4     6     4  50    50  
5     6     6  NA   100  
6     8     2  NA   100  
7     8     3  NA   100  
8     8     4  NA   100  
9     8     8  NA   100  
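
For reference, the error in the question arises because count() is a verb for data frames, so calling it on a single column inside summarise() has no applicable method. A shorter variant of the same idea, offered only as a sketch, calls count() on the data frame itself and then computes percentages within each cyl / variable group. The names cars, vs_percent and result are illustrative and not from the original post, and filling absent combinations with 0 rather than NA is an assumption about the desired output.

```r
library(dplyr)
library(tidyr)
library(purrr)

mtcars_list <- c("am", "gear", "carb")

# Work on a copy so the built-in dataset is left untouched.
cars <- mtcars
cars$vs <- factor(cars$vs, levels = c("0", "1"))

# Hypothetical helper: percentage of each vs level within every cyl / var group.
vs_percent <- function(var) {
  cars %>%
    # count() on the data frame: one row per cyl / var / vs combination, count in n
    count(cyl, .data[[var]], vs) %>%
    group_by(cyl, .data[[var]]) %>%
    mutate(perc = n / sum(n) * 100) %>%
    ungroup() %>%
    select(-n) %>%
    # one column per vs level; combinations that never occur are filled with 0
    pivot_wider(names_from = vs, values_from = perc,
                values_fill = 0, names_sort = TRUE)
}

# set_names() labels each list element with the variable it summarises.
result <- map(set_names(mtcars_list), vs_percent)
result$am
```

Each element of result is an ungrouped tibble, and the rows should carry the same percentages as the output above (for example 12.5 and 87.5 for cyl 4 with am 1), with 0 in place of NA.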



Answer 2

Score: 1

Your dplyr code doesn't work. For the output format you want, the base R functions table and prop.table get you there faster:

purrr::map(
  mtcars_list, \(x)
  (table(mtcars$cyl, mtcars[[x]]) |> prop.table(margin = 1)) * 100
)
# [[1]]
#            0        1
#   4 27.27273 72.72727
#   6 57.14286 42.85714
#   8 85.71429 14.28571
# 
# [[2]]
#             3         4         5
#   4  9.090909 72.727273 18.181818
#   6 28.571429 57.142857 14.285714
#   8 85.714286  0.000000 14.285714
# 
# [[3]]
#             1         2         3         4         6         8
#   4 45.454545 54.545455  0.000000  0.000000  0.000000  0.000000
#   6 28.571429  0.000000  0.000000 57.142857 14.285714  0.000000
#   8  0.000000 28.571429 21.428571 42.857143  0.000000  7.142857
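
The tables above give the share of each am, gear or carb value within each cyl. If the percentages are instead meant to be of vs within each combination of cyl and the mapped variable, as in Answer 1, a base R sketch along the same lines might look like the following. The names cars, vs_percent_base and result_base, and the "cyl:value" row labels, are assumptions made for illustration.

```r
mtcars_list <- c("am", "gear", "carb")

# Work on a copy so the built-in dataset is left untouched.
cars <- mtcars
cars$vs <- factor(cars$vs, levels = c("0", "1"))

# Hypothetical helper: row percentages of vs for every observed cyl / var pair.
vs_percent_base <- function(var) {
  # One label per observed combination, e.g. "4:1" for cyl 4 and value 1;
  # drop = TRUE removes combinations that never occur, so no row sums to zero.
  grp <- interaction(cars$cyl, cars[[var]],
                     drop = TRUE, sep = ":", lex.order = TRUE)
  # margin = 1 makes every row sum to 100
  prop.table(table(grp, vs = cars$vs), margin = 1) * 100
}

result_base <- purrr::map(purrr::set_names(mtcars_list), vs_percent_base)
result_base$am
```

Every row of each table sums to 100, and result_base$am["4:1", ] should agree with the 12.5 / 87.5 split reported in Answer 1.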

huangapple
  • Posted on July 13, 2023, 23:26:12
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76681063.html