R - How to put aggregated row results as columns
Question
Suppose there is a data frame named df:
age category
12 A
15 B
12 A
13 C
14 B
14 D
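I want to use aggregate to find the number of occurrences of each category {A, B, C, D} for each age. The counts of A, B, C and D should each become a column, so the output data frame should look like this: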
age A B C D
12 2 0 0 0
13 0 0 1 0
14 0 1 0 1
15 0 1 0 0
My attempt:
agdf <- aggregate(df, by=list(df$age, df$category), FUN=length)
But this only gives me:
age category x
12 A 2
15 B 1
14 B 1
13 C 1
14 D 1
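Another problem is that the original df has other columns, which have been omitted for simplicity. With this aggregate approach using FUN=length, all of those other columns are turned into the same count value as x. How can I keep their values?
For example: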
age category x y z
12 A 2 2 2
15 B 1 1 1
14 B 1 1 1
13 C 1 1 1
14 D 1 1 1
But I want y and z to keep their original values and only need one count column x. How can I massage the data frame into the desired structure?
Solution:
You can use the dcast function from the reshape2 package to get the result you want. First make sure the reshape2 package is loaded, then proceed as follows:
library(reshape2)
# Count the rows for each age/category pair with dcast
result <- dcast(df, age ~ category, value.var = "category", fun.aggregate = length, fill = 0)
# To keep the other columns, the counts can be merged back onto df
# (assuming df also contains other columns such as y and z)
df <- merge(df, result, by = "age", all = TRUE)
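dcast produces the count table, with the counts of "A", "B", "C" and "D" as columns. The merge then attaches those per-age counts to every original row, so the other columns (for example y and z) are kept.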
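For the second half of the question (keeping y and z while adding a single count column x), here is a minimal base R sketch. It assumes ave() is acceptable in place of aggregate(); the y and z values are invented purely for illustration, since the question does not show them.

```r
# Example data; the y and z values are made up for illustration only.
df <- data.frame(
  age      = c(12, 15, 12, 13, 14, 14),
  category = c("A", "B", "A", "C", "B", "D"),
  y        = c(1.5, 2.0, 3.5, 4.0, 5.5, 6.0),
  z        = c("p", "q", "r", "s", "t", "u")
)

# ave() returns one value per input row instead of collapsing groups,
# so y and z are left untouched. Each row receives the number of rows
# that share its (age, category) pair.
df$x <- ave(seq_len(nrow(df)), df$age, df$category, FUN = length)

# The two (12, A) rows get x = 2; every other row gets x = 1.
print(df)
```

Unlike aggregate() with FUN=length, this approach does not reduce the number of rows, so duplicated (age, category) pairs would still appear once per original row.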
Answer 1
Score: 1
xtabs(~., df1)
category
age A B C D
12 2 0 0 0
13 0 0 1 0
14 0 1 0 1
15 0 1 0 0
table(df1)
category
age A B C D
12 2 0 0 0
13 0 0 1 0
14 0 1 0 1
15 0 1 0 0
reshape2::dcast(df1, age~category)
age A B C D
1 12 2 0 0 0
2 13 0 0 1 0
3 14 0 1 0 1
4 15 0 1 0 0
tidyr::pivot_wider(df1, id_cols = age, names_from = category,
values_from = category, values_fn = length, values_fill = 0)
# A tibble: 4 × 5
age A B C D
<int> <int> <int> <int> <int>
1 12 2 0 0 0
2 15 0 1 0 0
3 13 0 0 1 0
4 14 0 1 0 1
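The calls above assume a data frame named df1 that the answer does not define. Here is a minimal sketch, assuming df1 holds just the age and category columns from the question. It shows how the base R result, which is a contingency table rather than a data frame, might be turned into the requested layout:

```r
# Assumed definition of df1, copied from the data shown in the question.
df1 <- data.frame(
  age      = c(12L, 15L, 12L, 13L, 14L, 14L),
  category = c("A", "B", "A", "C", "B", "D")
)

# table() on a two-column data frame returns a two-way contingency table.
tab <- table(df1)

# as.data.frame.matrix() keeps the wide layout; plain as.data.frame()
# would return the long form (age, category, Freq) instead.
wide <- as.data.frame.matrix(tab)

# The ages end up as row names, so move them into a regular column.
wide <- cbind(age = as.integer(rownames(wide)), wide)
rownames(wide) <- NULL

print(wide)
```

This relies only on base R, which may be preferable when reshape2 or tidyr are not installed.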