R - How to put aggregated row results as columns

Question

Suppose there is a data frame df:

    age category
    12  A
    15  B
    12  A
    13  C
    14  B
    14  D

I want to use aggregate to count the occurrences of each category {A, B, C, D} for each age. The counts of A, B, C and D should each become a column, so the output data frame should look like this:

    age A B C D
    12  2 0 0 0
    13  0 0 1 0
    14  0 1 0 1
    15  0 1 0 0

My attempt:

    agdf <- aggregate(df, by=list(df$age, df$category), FUN=length)

But this only gives me:

    age category x
    12  A        2
    15  B        1
    14  B        1
    13  C        1
    14  D        1
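For reference, the attempt can be reproduced with the small base R sketch below. It rebuilds the example data, names the grouping list so that the result columns come out as age, category and x, and then spreads the long counts into one column per category with xtabs. The object names long and wide are illustrative and not part of the original post.

```r
# Example data from the question.
df <- data.frame(
  age      = c(12L, 15L, 12L, 13L, 14L, 14L),
  category = c("A", "B", "A", "C", "B", "D")
)

# Count rows per (age, category) pair. Naming the grouping list and the
# counted vector yields the columns age, category and x.
long <- aggregate(
  list(x = df$age),
  by  = list(age = df$age, category = df$category),
  FUN = length
)
print(long)  # long format: one row per observed pair

# Spread the long counts into one column per category; pairs that never
# occur in the data are filled with 0.
wide <- xtabs(x ~ age + category, data = long)
print(wide)
```

The aggregate call only ever returns one row per observed pair, which is why the category counts do not appear as columns until a second reshaping step is applied.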

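Another problem is that the original df has other columns, which were omitted here for simplicity. With this aggregate approach using FUN=length, all of those other columns are turned into the same count value as x. How can I keep their values?

E.g.

    age category x y z
    12  A        2 2 2
    15  B        1 1 1
    14  B        1 1 1
    13  C        1 1 1
    14  D        1 1 1

But I want y and z to keep their original values; I only need one count column x.

How can I massage the data into the desired structure?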

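Solution:

You can use the dcast function from the reshape2 package to get the result you want. First, make sure the reshape2 package is loaded. Then proceed as follows:

    library(reshape2)
    # Count the rows for each age/category pair and spread the categories into columns
    result <- dcast(df, age ~ category, value.var = "category", fun.aggregate = length, fill = 0)
    # To keep the other columns, merge the counts back onto the original data
    # (assuming df includes the other columns y and z)
    df <- merge(df, result, by = "age", all = TRUE)

Here result holds the counts of "A", "B", "C" and "D" as columns, with one row per age. After the merge, df still has one row per original row: the other columns (such as y and z) keep their values, and each row additionally carries the counts for its age.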
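The merge above attaches the per-age category counts to every row. If instead a single count column x is wanted next to the untouched y and z, as the question asks, one possible base R approach is ave. The sketch below is an illustration, not part of the original post, and the y and z values in it are invented.

```r
# Example data; the y and z values are invented for illustration only.
df <- data.frame(
  age      = c(12L, 15L, 12L, 13L, 14L, 14L),
  category = c("A", "B", "A", "C", "B", "D"),
  y        = c(1.5, 2.0, 3.5, 4.0, 5.5, 6.0),
  z        = c("u", "v", "w", "x", "y", "z")
)

# For each row, count how many rows share its age and category.
# ave() returns a vector as long as its input, so no rows are collapsed
# and the y and z columns are left untouched.
df$x <- ave(seq_len(nrow(df)), df$age, df$category, FUN = length)
print(df)
```

Because nothing is aggregated away, the two rows with age 12 and category A both remain in the result, each with x equal to 2.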


Answer 1

Score: 1

    # df1 is the example data frame with the columns age and category
    xtabs(~., df1)
        category
    age  A B C D
      12 2 0 0 0
      13 0 0 1 0
      14 0 1 0 1
      15 0 1 0 0

    table(df1)
        category
    age  A B C D
      12 2 0 0 0
      13 0 0 1 0
      14 0 1 0 1
      15 0 1 0 0

    reshape2::dcast(df1, age~category)
      age A B C D
    1  12 2 0 0 0
    2  13 0 0 1 0
    3  14 0 1 0 1
    4  15 0 1 0 0

    # pivot_wider() comes from the tidyr package
    tidyr::pivot_wider(df1, id_cols = age, names_from = category,
        values_from = category, values_fn = length, values_fill = 0)
    # A tibble: 4 × 5
        age     A     B     C     D
      <int> <int> <int> <int> <int>
    1    12     2     0     0     0
    2    15     0     1     0     0
    3    13     0     0     1     0
    4    14     0     1     0     1
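Note that xtabs and table return a contingency table rather than a data frame. If a plain data frame with age as an ordinary column is needed, one possible conversion in base R is sketched below. df1 is rebuilt here from the question's data, and the names counts and wide are only illustrative.

```r
# Example data, rebuilt from the question.
df1 <- data.frame(
  age      = c(12L, 15L, 12L, 13L, 14L, 14L),
  category = c("A", "B", "A", "C", "B", "D")
)

# 4 x 4 contingency table: rows are ages, columns are categories.
counts <- table(df1)

# Turn the table into a data frame with one column per category,
# then move the ages from the row names into a regular column.
wide <- as.data.frame.matrix(counts)
wide <- cbind(age = as.integer(rownames(wide)), wide)
rownames(wide) <- NULL
print(wide)
```

Calling as.data.frame directly on the table would instead give the long format (one row per age and category pair), which is why as.data.frame.matrix is used in this sketch.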


huangapple
  • Posted on February 24, 2023 at 15:37:20
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75553737.html