Getting the sum of selected rows in a data frame in R

Question

I'm creating a population age pyramid for my metro area from US Census data. Here's some sample data:

  Sex = rep(c("male", "female"), each = 23)
  Age_group = as.factor(rep(c(0, 5, 10, 15, 18, 20, 21, 22, 25, 30, 35, 40, 45, 50, 55, 60, 62, 65, 67, 70, 75, 80, 85), length = 46))
  pop = c(-24684, -24946, -26465, -16103, -11431, -6233, -6071, -14838, -27420, -26246, -24612, -22753, -23036, -24870, -27676, -11400, -14267, -9493, -12132, -17456, -10913, -7241, -5836, 23322, 23225, 25521, 15128, 11388, 5858, 5300, 15385, 27610, 26368, 25329, 23045, 24025, 25847, 28077, 12419, 16241, 10000, 14411, 20807, 14309, 10216, 11125)
  pop_pyramid <- data.frame(Sex, Age_group, pop)

My issue: While most of the data is in 5-year age groups, the 15, 18, 20, 21, 22, 60, 62, 65, and 67 age groups for each sex are not. I'd like to combine the population numbers for the 15 and 18 groups into the 15 age group, the 20, 21, and 22 age groups into the 20 age group, the 60 and 62 age groups into the 60 age group, and the 65 and 67 age groups into the 65 age group.

Simply adding the rows I want fails as Age_group and Sex are factors; rowSums also fails for the same reason. I thought of aggregation but only need those select rows aggregated, not the entire data frame. Is there a way to do this without resorting to doing it by hand?

Thanks.

Answer 1

Score: 1

I would suggest subtracting each age's remainder modulo 5 from Age_group, which rounds the age down to the start of its 5-year group. You can then continue with the calculations:

  # Convert the factor to numbers, then round each age down to a multiple of 5
  pop_pyramid$Age_group2 <- as.numeric(as.character(pop_pyramid$Age_group)) -
    as.numeric(as.character(pop_pyramid$Age_group)) %% 5
  pop_pyramid
        Sex Age_group    pop Age_group2
  1    male         0 -24684          0
  2    male         5 -24946          5
  3    male        10 -26465         10
  4    male        15 -16103         15
  5    male        18 -11431         15
  6    male        20  -6233         20
  7    male        21  -6071         20
  8    male        22 -14838         20
  9    male        25 -27420         25
  10   male        30 -26246         30
  11   male        35 -24612         35
  12   male        40 -22753         40
  13   male        45 -23036         45
  14   male        50 -24870         50
  15   male        55 -27676         55
  16   male        60 -11400         60
  17   male        62 -14267         60
  18   male        65  -9493         65
  19   male        67 -12132         65
  20   male        70 -17456         70
  21   male        75 -10913         75
  22   male        80  -7241         80
  23   male        85  -5836         85
  24 female         0  23322          0
  25 female         5  23225          5
  26 female        10  25521         10
  27 female        15  15128         15
  28 female        18  11388         15
  29 female        20   5858         20
  30 female        21   5300         20
  31 female        22  15385         20
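
The summing step itself is not shown in the answer. One way to do it, assuming base R's aggregate() is acceptable, is to sum pop within each combination of Sex and the rounded age. This is a minimal sketch rather than part of the original answer; the name pop_summed is illustrative, and the sample data is recreated from the question so that the block runs on its own.

```r
# Sample data from the question
Sex <- rep(c("male", "female"), each = 23)
Age_group <- as.factor(rep(c(0, 5, 10, 15, 18, 20, 21, 22, 25, 30, 35, 40, 45,
                             50, 55, 60, 62, 65, 67, 70, 75, 80, 85),
                           length.out = 46))
pop <- c(-24684, -24946, -26465, -16103, -11431, -6233, -6071, -14838, -27420,
         -26246, -24612, -22753, -23036, -24870, -27676, -11400, -14267, -9493,
         -12132, -17456, -10913, -7241, -5836, 23322, 23225, 25521, 15128,
         11388, 5858, 5300, 15385, 27610, 26368, 25329, 23045, 24025, 25847,
         28077, 12419, 16241, 10000, 14411, 20807, 14309, 10216, 11125)
pop_pyramid <- data.frame(Sex, Age_group, pop)

# Round each age down to the start of its 5-year group (as in the answer)
age <- as.numeric(as.character(pop_pyramid$Age_group))
pop_pyramid$Age_group2 <- age - age %% 5

# Sum pop within each Sex / Age_group2 combination
# (pop_summed is an illustrative name, not from the original thread)
pop_summed <- aggregate(pop ~ Sex + Age_group2, data = pop_pyramid, FUN = sum)

# Inspect only the groups that were merged
pop_summed[pop_summed$Age_group2 %in% c(15, 20, 60, 65), ]
```

Rows that were already in 5-year groups pass through unchanged, since each forms a group of one. For the male 15 group, for instance, the sum is -16103 + -11431 = -27534.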

Answer 2

Score: 0

You can use cut to create a table of counts for each age group, but you will want to convert the Age_group factor to numeric values first:

  table(cut(as.numeric(as.character(pop_pyramid$Age_group)),
            breaks = seq(0, 100, 5),
            labels = paste0(seq(0, 95, 5), "-", seq(5, 100, 5)),
            include.lowest = TRUE))

Output:

      0-5   5-10  10-15  15-20  20-25  25-30  30-35  35-40  40-45  45-50
        4      2      2      4      6      2      2      2      2      2
    50-55  55-60  60-65  65-70  70-75  75-80  80-85  85-90  90-95 95-100
        2      2      4      4      2      2      2      0      0      0
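
The counts above reflect cut()'s default right = TRUE, which makes each bin closed on the right: age 5 is counted in 0-5 and age 15 in 10-15, so 15 and 18 land in different bins. If the goal is bins that start at their lower bound, right = FALSE is one way to get them. The following small sketch compares the two settings on a hypothetical vector named ages; it is not part of the original answer.

```r
# A few ages chosen to sit on and between the bin boundaries (illustrative)
ages <- c(0, 5, 15, 18, 20, 22)
breaks <- seq(0, 100, 5)
labels <- paste0(seq(0, 95, 5), "-", seq(5, 100, 5))

# Default right = TRUE: each bin is (lower, upper], so 15 belongs to "10-15"
right_closed <- cut(ages, breaks = breaks, labels = labels,
                    include.lowest = TRUE)

# right = FALSE: each bin is [lower, upper), so 15 and 18 share "15-20"
left_closed <- cut(ages, breaks = breaks, labels = labels,
                   include.lowest = TRUE, right = FALSE)

data.frame(ages, right_closed, left_closed)
```

With right = FALSE, a label such as 15-20 covers ages from 15 up to, but not including, 20.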

If you would rather just recategorize the rows, you could tweak this to store the bins in a new column:

  pop_pyramid$new_agecat <- cut(as.numeric(as.character(pop_pyramid$Age_group)),
                                breaks = seq(0, 100, 5),
                                labels = paste0(seq(0, 95, 5), "-", seq(5, 100, 5)),
                                include.lowest = TRUE)

Output:

  #    Sex Age_group    pop new_agecat
  # 1 male         0 -24684        0-5
  # 2 male         5 -24946        0-5
  # 3 male        10 -26465       5-10
  # 4 male        15 -16103      10-15
  # 5 male        18 -11431      15-20
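
To get from a binned column to the population totals the question asks for, the pop values still have to be summed per bin and sex. The sketch below does that with base R's xtabs(). It is an assumption-laden variant rather than the answer's own code: it passes right = FALSE to cut so that each bin starts at its lower bound (putting 15 and 18 in the same bin), and the name pop_by_bin is illustrative.

```r
# Sample data from the question
Sex <- rep(c("male", "female"), each = 23)
Age_group <- as.factor(rep(c(0, 5, 10, 15, 18, 20, 21, 22, 25, 30, 35, 40, 45,
                             50, 55, 60, 62, 65, 67, 70, 75, 80, 85),
                           length.out = 46))
pop <- c(-24684, -24946, -26465, -16103, -11431, -6233, -6071, -14838, -27420,
         -26246, -24612, -22753, -23036, -24870, -27676, -11400, -14267, -9493,
         -12132, -17456, -10913, -7241, -5836, 23322, 23225, 25521, 15128,
         11388, 5858, 5300, 15385, 27610, 26368, 25329, 23045, 24025, 25847,
         28077, 12419, 16241, 10000, 14411, 20807, 14309, 10216, 11125)
pop_pyramid <- data.frame(Sex, Age_group, pop)

# Bin the ages into left-closed 5-year intervals: [0, 5), [5, 10), ...
# (right = FALSE is a deliberate change from the answer's call)
pop_pyramid$new_agecat <- cut(as.numeric(as.character(pop_pyramid$Age_group)),
                              breaks = seq(0, 100, 5),
                              labels = paste0(seq(0, 95, 5), "-", seq(5, 100, 5)),
                              include.lowest = TRUE, right = FALSE)

# Sum pop for every bin / sex pair; droplevels() removes the empty
# 90-95 and 95-100 bins so they do not appear as all-zero rows
pop_by_bin <- xtabs(pop ~ new_agecat + Sex, data = droplevels(pop_pyramid))
pop_by_bin
```

The result is a table with one row per age bin and one column per sex, so a single total can be looked up by name, for example pop_by_bin["15-20", "male"].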

huangapple
  • Posted on June 30, 2023 at 02:49:50
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76583870.html