2023-04-07 01:19:59

Create a table that returns counts of each value for multiple variables

Question

This is my data frame:

# Load libraries
library(data.table)
library(expss)
library(sjlabelled) # to call function as_label()

# Create dataframe
a <- data.table("b1" = c(1, 2, 2, 2),
                "b2" = c(1, 2, 1, 1),
                "b3" = c(1, 1, 1, 1))

# Set value label
val_lab(a) = num_lab("
            1 Yes
            2 No
")
a = as_label(a)

and it looks like this:

> a
    b1  b2  b3
1: Yes Yes Yes
2:  No  No Yes
3:  No Yes Yes
4:  No Yes Yes

I want to create a dataset that returns the total number of occurrences of each value. It should look like this:

  Category  b1  b2  b3
1:  Yes     1   3   4
2:  No      3   1   0

This might work similarly to the tabout command in Stata. It would also be great to return percentages, like this:

  Category  b1   b2   b3
1:  Yes    25   75   100
2:  No     75   25   0
3:  sum    100  100  100

Answer 1

Score: 3

One possibility is to use the janitor package after doing some transformations with tidyverse packages:

library(janitor)
library(dplyr)
library(tidyr)

counts <- a %>%
  pivot_longer(everything(), values_to = "Category") %>%
  mutate(Category = c("Yes", "No")[Category]) %>%
  tabyl(Category, name)

Output

 Category b1 b2 b3
       No  3  1  0
      Yes  1  3  4

As for the percentages you can use the janitor adorn functions:

counts %>%
  adorn_percentages(denominator = "col") %>%
  adorn_totals("row") %>%
  adorn_pct_formatting()

Note: see ?adorn_pct_formatting for additional formatting options.

Output

 Category     b1     b2     b3
       No  75.0%  25.0%   0.0%
      Yes  25.0%  75.0% 100.0%
    Total 100.0% 100.0% 100.0%
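
As a cross-check that needs no extra packages, the same counts and column percentages can be sketched in base R. This is only an illustrative alternative, not part of the janitor approach above. It assumes the data are rebuilt as plain character columns (so it does not depend on expss or sjlabelled), and the names lv, pct and pct_with_total are made up for the example.

```r
# Assumption: the question's data, rebuilt as plain "Yes"/"No" strings.
a <- data.frame(
  b1 = c("Yes", "No", "No", "No"),
  b2 = c("Yes", "No", "Yes", "Yes"),
  b3 = c("Yes", "Yes", "Yes", "Yes")
)

# Fix the level set so a value that never occurs in a column (e.g. "No"
# in b3) is still counted as 0 instead of being dropped.
lv <- c("Yes", "No")

# One table() per column; sapply() binds the results into a matrix with
# one row per category and one column per variable.
counts <- sapply(a, function(x) table(factor(x, levels = lv)))

# Column-wise percentages, plus a "Total" row holding the column sums.
pct <- 100 * prop.table(counts, margin = 2)
pct_with_total <- rbind(pct, Total = colSums(pct))

print(counts)
print(pct_with_total)
```

Fixing the factor levels up front is what keeps the zero count for "No" in b3. Without it, table() would return a single entry for that column and sapply() could no longer simplify the result to a matrix.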

Answer 2

Score: 2

You can use pivots and simple aggregations with tidyverse, although I would also agree with @LMc that janitor is a great package for tabular summaries.

library(tidyverse)

a |>
  pivot_longer(everything()) |>
  group_by(name, value) |>
  summarise(n = n()) |>
  mutate(p = n / sum(n)) |>
  pivot_wider(id_cols = value, names_from = name, values_from = n, values_fill = 0)

# A tibble: 2 × 4
  value    b1    b2    b3
  <chr> <int> <int> <int>
1 No        3     1     0
2 Yes       1     3     4

Or with values_from = p instead:

# A tibble: 2 × 4
  value    b1    b2    b3
  <chr> <dbl> <dbl> <dbl>
1 No     0.75  0.25     0
2 Yes    0.25  0.75     1
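
For completeness, the proportions variant might be written out as follows. This is a sketch rather than the answerer's exact code. It assumes the data are rebuilt as plain factors (so it runs without expss or sjlabelled), it makes the grouping explicit with .groups and ungroup(), and the name shares is chosen only for the example. The native pipe needs R 4.1 or later.

```r
library(dplyr)
library(tidyr)

# Assumption: the question's data, rebuilt as factors with shared levels.
lv <- c("Yes", "No")
a <- data.frame(
  b1 = factor(c("Yes", "No", "No", "No"), levels = lv),
  b2 = factor(c("Yes", "No", "Yes", "Yes"), levels = lv),
  b3 = factor(c("Yes", "Yes", "Yes", "Yes"), levels = lv)
)

shares <- a |>
  pivot_longer(everything()) |>
  group_by(name, value) |>
  # Keep the result grouped by `name` so sum(n) is taken per variable.
  summarise(n = n(), .groups = "drop_last") |>
  mutate(p = n / sum(n)) |>
  ungroup() |>
  # Spread the proportions; combinations that never occur become 0.
  pivot_wider(id_cols = value, names_from = name,
              values_from = p, values_fill = 0)

print(shares)
```

Multiplying p by 100 before pivot_wider() would give percentages instead of proportions.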
