Create a table that returns counts of each value for multiple variables
Question
This is my data frame:
# Load libraries
library(data.table)
library(expss)
library(sjlabelled) # to call function as_label()
# Create dataframe
a <- data.table("b1" = c(1, 2, 2, 2),
                "b2" = c(1, 2, 1, 1),
                "b3" = c(1, 1, 1, 1))
# Set value label
val_lab(a) = num_lab("
1 Yes
2 No
")
a = as_label(a)
and it looks like this:
> a
b1 b2 b3
1: Yes Yes Yes
2: No No Yes
3: No Yes Yes
4: No Yes Yes
I want to create a dataset that returns the total number of occurrences of each value. It should look like the following:
Category b1 b2 b3
1: Yes 1 3 4
2: No 3 1 0
This might work in a similar way to the tabout command in Stata. It would also be great to return percentages, like this:
Category b1 b2 b3
1: Yes 25 75 100
2: No 75 25 0
3: sum 100 100 100
Answer 1
Score: 3
One possibility is to use the janitor package after doing some transformations with tidyverse packages:
library(janitor)
library(dplyr)
library(tidyr)
counts <- a %>%
  pivot_longer(everything(), values_to = "Category") %>%
  mutate(Category = c("Yes", "No")[Category]) %>%
  tabyl(Category, name)
Output
Category b1 b2 b3
No 3 1 0
Yes 1 3 4
As for the percentages, you can use the janitor adorn functions:
counts %>%
  adorn_percentages(denominator = "col") %>%
  adorn_totals("row") %>%
  adorn_pct_formatting()
Note: see ?adorn_pct_formatting for additional formatting options.
Output
Category b1 b2 b3
No 75.0% 25.0% 0.0%
Yes 25.0% 75.0% 100.0%
Total 100.0% 100.0% 100.0%
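As one illustration of those formatting options, the sketch below drops the decimals and the percent sign so the result is closer to the table requested in the question. It rebuilds the data directly as factors (an assumption made to keep the example self-contained, instead of going through expss and sjlabelled) and assumes a janitor version in which adorn_pct_formatting() accepts the digits and affix_sign arguments.

```r
library(janitor)
library(dplyr)
library(tidyr)

# Stand-in for the labelled data from the question. Building the columns
# directly as factors is an assumption made for this sketch.
lv <- c("Yes", "No")
a <- data.frame(
  b1 = factor(c("Yes", "No", "No", "No"), levels = lv),
  b2 = factor(c("Yes", "No", "Yes", "Yes"), levels = lv),
  b3 = factor(c("Yes", "Yes", "Yes", "Yes"), levels = lv)
)

# Cross-tabulate category by variable. The values are already labelled
# factors here, so no recoding step is needed before tabyl().
counts <- a %>%
  pivot_longer(everything(), values_to = "Category") %>%
  tabyl(Category, name)

# Column percentages with a total row, formatted as whole numbers
# without the trailing "%".
pct <- counts %>%
  adorn_percentages(denominator = "col") %>%
  adorn_totals("row") %>%
  adorn_pct_formatting(digits = 0, affix_sign = FALSE)

print(pct)
```

The formatted columns are character vectors, so keep the unformatted result of adorn_percentages() if the numbers are needed for further computation.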
Answer 2
Score: 2
You can use pivots and simple aggregations with tidyverse, although I would also agree with @LMc that janitor is a great package for tabular summaries.
library(tidyverse)
a |>
  pivot_longer(everything()) |>
  group_by(name, value) |>
  summarise(n = n()) |>
  mutate(p = n / sum(n)) |>
  pivot_wider(id_cols = value, names_from = name, values_from = n, values_fill = 0)
# A tibble: 2 × 4
value b1 b2 b3
<chr> <int> <int> <int>
1 No 3 1 0
2 Yes 1 3 4
Or with values_from = p instead:
# A tibble: 2 × 4
value b1 b2 b3
<chr> <dbl> <dbl> <dbl>
1 No 0.75 0.25 0
2 Yes 0.25 0.75 1
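For completeness, the proportion variant can be sketched as a full pipeline. This is a minimal version that assumes a plain data frame of "Yes"/"No" strings as a stand-in for the labelled data, and it sets .groups explicitly only to silence the grouping message from summarise(). The native pipe requires R 4.1 or later.

```r
library(dplyr)
library(tidyr)

# Stand-in for the labelled data from the question (an assumption made
# to keep the sketch self-contained).
a <- data.frame(
  b1 = c("Yes", "No", "No", "No"),
  b2 = c("Yes", "No", "Yes", "Yes"),
  b3 = c("Yes", "Yes", "Yes", "Yes")
)

props <- a |>
  pivot_longer(everything()) |>
  group_by(name, value) |>
  # Keep the data grouped by `name` so sum(n) below is taken per variable.
  summarise(n = n(), .groups = "drop_last") |>
  mutate(p = n / sum(n)) |>
  ungroup() |>
  # Spread the proportions; combinations that never occur are filled with 0.
  pivot_wider(id_cols = value, names_from = name, values_from = p, values_fill = 0)

print(props)
```

Multiplying p by 100 inside mutate() would give percentages instead of proportions.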