July 13, 2023, 23:26:12

Summarising the percentage of the total of a factor variable with purrr::map

Question

I'm trying to apply a function with map to each variable in a list. Inside that function, I want to group_by some categories and show each level of a factor variable as a percentage of the group total.

For instance, I have this list:

mtcars_list <- c("am", "gear", "carb")

The variable I want to group by is "cyl", and the variable I want to summarise is "vs". In this case I first convert the "vs" column of the mtcars dataset to a factor:

mtcars$vs <- factor(mtcars$vs, levels = c('0', '1'))

I then call purrr::map, which gives me an error when I use count, prop.table or similar inside summarise:

purrr::map(mtcars_list, ~ mtcars %>%
  group_by(cyl, .data[[.x]]) %>%
  summarise(count(vs), .groups = "drop")*100)

When I run this it says:

no applicable method for 'count' applied to an object of class "c('double', 'numeric')"

The result would look something like this:

First category

   0        1
A 17.7%     83.3%
B  5.0%     95.5%

Second category

    0        1
A   2.0     98.0
B   4.0     96.0

Thank you!!!

Answer 1

Score: 1

Please check whether this is the expected output:

library(dplyr)
library(tidyr)

df1 <- purrr::map(mtcars_list, ~ mtcars %>%
                    select(cyl, vs, !!sym(.x)) %>%
                    # n: number of rows per cyl x .x x vs combination
                    mutate(n = n(), .by = c(cyl, .data[[.x]], vs)) %>%
                    # n2: number of rows per cyl x .x combination
                    mutate(n2 = n(), .by = c(cyl, .data[[.x]])) %>%
                    group_by(cyl, .data[[.x]], vs) %>%
                    # keep a single row per combination
                    slice_tail(n = 1) %>%
                    mutate(perc = (n / n2) * 100) %>%
                    pivot_wider(id_cols = c(cyl, .data[[.x]]), names_from = vs, values_from = perc)
)
df1
[[1]]
# A tibble: 6 × 4
# Groups:   cyl, am [6]
    cyl    am   `1`   `0`
  <dbl> <dbl> <dbl> <dbl>
1     4     0 100    NA  
2     4     1  87.5  12.5
3     6     0 100    NA  
4     6     1  NA   100  
5     8     0  NA   100  
6     8     1  NA   100  

[[2]]
# A tibble: 8 × 4
# Groups:   cyl, gear [8]
    cyl  gear   `1`   `0`
  <dbl> <dbl> <dbl> <dbl>
1     4     3   100    NA
2     4     4   100    NA
3     4     5    50    50
4     6     3   100    NA
5     6     4    50    50
6     6     5    NA   100
7     8     3    NA   100
8     8     5    NA   100

[[3]]
# A tibble: 9 × 4
# Groups:   cyl, carb [9]
    cyl  carb   `1`   `0`
  <dbl> <dbl> <dbl> <dbl>
1     4     1 100    NA  
2     4     2  83.3  16.7
3     6     1 100    NA  
4     6     4  50    50  
5     6     6  NA   100  
6     8     2  NA   100  
7     8     3  NA   100  
8     8     4  NA   100  
9     8     8  NA   100
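
A shorter route to the same kind of table might be to let count() do the tallying on the data frame itself and then compute the share within each cyl-by-variable group. This is only a sketch, assuming dplyr, tidyr and purrr are installed and using the built-in mtcars with vs left as 0/1; the name pct_by_group is made up for illustration. It fills missing combinations with 0 instead of NA and sorts the vs columns, which are choices of this sketch and not part of the answer above.

```r
library(dplyr)
library(tidyr)

mtcars_list <- c("am", "gear", "carb")

# One result per variable in mtcars_list, named after that variable
pct_by_group <- purrr::map(purrr::set_names(mtcars_list), function(v) {
  mtcars %>%
    count(cyl, .data[[v]], vs) %>%            # rows per cyl x v x vs
    group_by(cyl, .data[[v]]) %>%
    mutate(perc = n / sum(n) * 100) %>%       # share of vs within cyl x v
    ungroup() %>%
    pivot_wider(
      id_cols = c(cyl, all_of(v)),
      names_from = vs,
      values_from = perc,
      values_fill = 0,                        # assumption: absent combinations count as 0 %
      names_sort = TRUE                       # columns ordered `0`, `1`
    )
})

pct_by_group$am
```

Because count() is called on the data frame and not on a column inside summarise(), it avoids the "no applicable method for 'count'" error from the question.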

Answer 2

Score: 1

Your dplyr code doesn't work because count() expects a data frame, not a single column such as vs inside summarise(). For the output format you want, the base R functions table and prop.table get you there faster:

purrr::map(
  mtcars_list, \(x) 
  (table(mtcars$cyl, mtcars[[x]]) |> prop.table(margin = 1)) * 100
)
# [[1]]
#            0        1
#   4 27.27273 72.72727
#   6 57.14286 42.85714
#   8 85.71429 14.28571
# 
# [[2]]
#             3         4         5
#   4  9.090909 72.727273 18.181818
#   6 28.571429 57.142857 14.285714
#   8 85.714286  0.000000 14.285714
# 
# [[3]]
#             1         2         3         4         6         8
#   4 45.454545 54.545455  0.000000  0.000000  0.000000  0.000000
#   6 28.571429  0.000000  0.000000 57.142857 14.285714  0.000000
#   8  0.000000 28.571429 21.428571 42.857143  0.000000  7.142857
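
The tables above give the share of each am, gear or carb level within cyl. If the percentages of vs within each cyl-by-variable combination are wanted, as the question seems to describe, a three-way table with prop.table over the first two margins might do it. This is a hedged sketch on the built-in mtcars; the name vs_pct and the rounding to one decimal are assumptions for illustration. Combinations with no cars (for example cyl 8 with gear 4) come out as NaN because they divide 0 by 0.

```r
mtcars_list <- c("am", "gear", "carb")

# One three-way percentage table per variable, named after that variable
vs_pct <- purrr::map(purrr::set_names(mtcars_list), function(x) {
  counts <- table(cyl = mtcars$cyl, group = mtcars[[x]], vs = mtcars$vs)
  # margin = c(1, 2): percentages of vs add up to 100 within each cyl x group cell
  round(prop.table(counts, margin = c(1, 2)) * 100, 1)
})

# ftable() flattens the three-way table into a compact two-dimensional layout
ftable(vs_pct$am)
```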
