2023-02-24 15:37:20

R - How to put aggregated row results as columns

Question

Suppose there is a data frame named df:

age category
12  A
15  B
12  A
13  C
14  B
14  D

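I want to use aggregate to find the number of occurrences of each category {A, B, C, D} for each age. The counts of A, B, C and D should each become a column, so the output data frame should look like this: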

age A B C D
12  2 0 0 0
13  0 0 1 0
14  0 1 0 1
15  0 1 0 0

Attempt:

agdf <- aggregate(df, by=list(df$age, df$category), FUN=length)

But doing this only gives me:

age category x
12  A        2
15  B        1
14  B        1
13  C        1
14  D        1

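Another problem is that the original df has other columns, which have been omitted here for simplicity. With this aggregate approach using FUN=length, every one of those other columns becomes the same count value as x. How can I keep their values?

For example: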

age category x  y  z
12  A        2  2  2
15  B        1  1  1
14  B        1  1  1
13  C        1  1  1
14  D        1  1  1

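But I want y and z to keep their original values and need only one count column, x. How can I reshape the data frame into the desired structure?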

Solution:

You can use the dcast function from the reshape2 package to get the result you want. First make sure the reshape2 package is loaded, then proceed as follows:

library(reshape2)
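# Count the rows for each age/category pair; pairs that never occur are filled with 0
result <- dcast(df, age ~ category, value.var = "category", fun.aggregate = length, fill = 0)
# To keep the other columns, merge the counts back onto the original rows
# (this assumes df also contains the other columns, e.g. y and z)
df <- merge(df, result, by = "age", all = TRUE)

Here result has one row per age, with the counts of "A", "B", "C" and "D" as columns. The merge then attaches those counts to every original row, so the other columns (such as y and z) keep their values.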
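For the second part of the question (a single count column x while y and z keep their values), one possible base R approach is to aggregate only one helper column and then join the counts back onto the original rows. This is a minimal sketch, not the asker's code: the y and z values below are made-up stand-ins for the omitted columns, and the names counts and merged are chosen here for illustration.

```r
# Example data; y and z are hypothetical stand-ins for the omitted columns.
df <- data.frame(
  age      = c(12, 15, 12, 13, 14, 14),
  category = c("A", "B", "A", "C", "B", "D"),
  y        = c(10, 20, 30, 40, 50, 60),
  z        = c("p", "q", "r", "s", "t", "u")
)

# Aggregate a single helper column, so only one count column (x) is produced
# instead of one count per column of df.
counts <- aggregate(
  data.frame(x = df$age),
  by  = list(age = df$age, category = df$category),
  FUN = length
)

# Join the counts back onto the original rows; y and z are left as they were.
merged <- merge(df, counts, by = c("age", "category"))
```

Passing the whole data frame to aggregate applies FUN to every remaining column, which is why y and z turned into counts in the original attempt. Restricting the first argument to one column avoids that.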

Answer 1

Score: 1

xtabs(~., df1)
    category
age  A B C D
  12 2 0 0 0
  13 0 0 1 0
  14 0 1 0 1
  15 0 1 0 0

table(df1)
    category
age  A B C D
  12 2 0 0 0
  13 0 0 1 0
  14 0 1 0 1
  15 0 1 0 0

reshape2::dcast(df1, age~category)
  age A B C D
1  12 2 0 0 0
2  13 0 0 1 0
3  14 0 1 0 1
4  15 0 1 0 0

# pivot_wider() comes from the tidyr package
pivot_wider(df1, id_cols = age, names_from = category,
              values_from = category, values_fn = length, values_fill = 0)
# A tibble: 4 × 5
    age     A     B     C     D
  <int> <int> <int> <int> <int>
1    12     2     0     0     0
2    15     0     1     0     0
3    13     0     0     1     0
4    14     0     1     0     1
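Note that xtabs and table return contingency tables rather than data frames. If a plain data frame with age as an ordinary column is needed, as in the question's desired output, something along these lines may work. It is a sketch under the assumption that df1 holds just the age and category columns from the question; the name wide is introduced here for illustration.

```r
# Assumed input: the two columns shown in the question.
df1 <- data.frame(
  age      = c(12, 15, 12, 13, 14, 14),
  category = c("A", "B", "A", "C", "B", "D")
)

# Cross-tabulate age against category.
tab <- xtabs(~ age + category, data = df1)

# as.data.frame.matrix() keeps the wide layout (one column per category);
# a plain as.data.frame() on a table would return the long format instead.
wide <- as.data.frame.matrix(tab)

# Move the ages from the row names into a regular column.
wide <- cbind(age = as.numeric(rownames(wide)), wide)
rownames(wide) <- NULL
```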
