Create a table that returns counts of each value for multiple variables
Question
This is my data frame:
# Load libraries
library(data.table)
library(expss)
library(sjlabelled) # to call function as_label()
# Create dataframe
a <- data.table("b1" = c(1, 2, 2, 2),  
                "b2" = c(1, 2, 1, 1),
                "b3" = c(1, 1, 1, 1))
# Set value label
val_lab(a) = num_lab("
            1 Yes
            2 No    
")
a = as_label(a)
It looks like this:
> a
    b1  b2  b3
1: Yes Yes Yes
2:  No  No Yes
3:  No Yes Yes
4:  No Yes Yes
I want to create a dataset that returns the number of occurrences of each value in every column. It should look like this:
  Category  b1  b2  b3
1:  Yes     1   3   4
2:  No      3   1   0
This would work much like the tabout command in Stata. It would also be great to return column percentages, like this:
  Category  b1   b2   b3
1:  Yes    25   75   100
2:  No     75   25   0
3:  sum    100  100  100
Answer 1
Score: 3
One possibility is to use the janitor package after doing some transformations with tidyverse packages:
library(janitor)
library(dplyr)
library(tidyr)
counts <- a %>% 
  pivot_longer(everything(), values_to = "Category") %>% 
  mutate(Category = c("Yes", "No")[Category]) %>% 
  tabyl(Category, name)
Output
 Category b1 b2 b3
       No  3  1  0
      Yes  1  3  4
For the percentages, you can use janitor's adorn_* functions:
counts %>% 
  adorn_percentages(denominator = "col") %>% 
  adorn_totals("row") %>% 
  adorn_pct_formatting()
See ?adorn_pct_formatting for additional formatting options.
Output
 Category     b1     b2     b3
       No  75.0%  25.0%   0.0%
      Yes  25.0%  75.0% 100.0%
    Total 100.0% 100.0% 100.0%
Answer 2
Score: 2
You can use pivots and simple aggregations with tidyverse, although I would also agree with @LMc that janitor is a great package for tabular summaries.
library(tidyverse)
a |> 
  pivot_longer(everything()) |> 
  group_by(name, value) |> 
  summarise(n = n()) |> 
  mutate(p = n / sum(n)) |> 
  pivot_wider(id_cols = value, names_from = name, values_from = n, values_fill = 0)
# A tibble: 2 × 4
  value    b1    b2    b3
  <chr> <int> <int> <int>
1 No        3     1     0
2 Yes       1     3     4
Or with values_from = p instead:
# A tibble: 2 × 4
  value    b1    b2    b3
  <chr> <dbl> <dbl> <dbl>
1 No     0.75  0.25     0
2 Yes    0.25  0.75     1
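For comparison, the same counts and column percentages can also be sketched in base R without reshaping. This is not taken from either answer. It assumes every column is a factor with the same levels (as is the case after as_label()), and it rebuilds the example data as a plain data.frame so the snippet runs on its own.

```r
# Assumption: all columns are factors sharing the levels "Yes" and "No",
# mirroring the labelled data from the question.
lv <- c("Yes", "No")
a <- data.frame(
  b1 = factor(c("Yes", "No", "No", "No"), levels = lv),
  b2 = factor(c("Yes", "No", "Yes", "Yes"), levels = lv),
  b3 = factor(c("Yes", "Yes", "Yes", "Yes"), levels = lv)
)

# One table() per column; because the levels match, sapply() simplifies
# the results to a matrix with one row per level and one column per variable.
counts <- sapply(a, table)

# Column percentages, followed by a "sum" row holding the column totals.
pct <- prop.table(counts, margin = 2) * 100
pct <- rbind(pct, sum = colSums(pct))

print(counts)
print(pct)
```

Because the levels are fixed up front, a level that never occurs in a column (here "No" in b3) still gets a row with a count of 0 instead of being dropped.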