Getting the sum of selected rows in a data frame in R
Question
I'm creating a population age pyramid for my metro area from US Census data. Here's some sample data:
Sex = rep(c("male", "female"), each = 23)
Age_group = as.factor(rep(c(0, 5, 10, 15, 18, 20, 21, 22, 25, 30, 35, 40, 45, 50, 55, 60, 62, 65, 67, 70, 75, 80, 85), length = 46))
pop = c(-24684, -24946, -26465, -16103, -11431, -6233, -6071, -14838, -27420, -26246, -24612, -22753, -23036, -24870, -27676, -11400, -14267, -9493, -12132, -17456, -10913, -7241, -5836, 23322, 23225, 25521, 15128, 11388, 5858, 5300, 15385, 27610, 26368, 25329, 23045, 24025, 25847, 28077, 12419, 16241, 10000, 14411, 20807, 14309, 10216, 11125)
pop_pyramid <- data.frame(Sex, Age_group, pop)
My issue: While most of the data is in 5-year age groups, the 15, 18, 20, 21, 22, 60, 62, 65, and 67 age groups for each sex are not. I'd like to combine the population number for the 15 and 18 groups into the 15 age group, the 20, 21, and 22 age groups into the 20 age group, the 60 and 62 age groups into the 60 age group, and the 65 and 67 age groups into the 65 age group.
Simply adding the rows I want fails because Age_group and Sex are factors; rowSums fails for the same reason. I thought of aggregation, but I only need those selected rows aggregated, not the entire data frame. Is there a way to do this without doing it by hand?
Thanks.
Answer 1
Score: 1
I would suggest subtracting the remainder modulo 5 from Age_group, which rounds each age down to the nearest multiple of 5; then you can continue with the calculations:
# Convert the factor to the ages it represents, then round down to a multiple of 5
ages <- as.numeric(as.character(pop_pyramid$Age_group))
pop_pyramid$Age_group2 <- ages - ages %% 5
pop_pyramid
      Sex Age_group    pop Age_group2
1    male         0 -24684          0
2    male         5 -24946          5
3    male        10 -26465         10
4    male        15 -16103         15
5    male        18 -11431         15
6    male        20  -6233         20
7    male        21  -6071         20
8    male        22 -14838         20
9    male        25 -27420         25
10   male        30 -26246         30
11   male        35 -24612         35
12   male        40 -22753         40
13   male        45 -23036         45
14   male        50 -24870         50
15   male        55 -27676         55
16   male        60 -11400         60
17   male        62 -14267         60
18   male        65  -9493         65
19   male        67 -12132         65
20   male        70 -17456         70
21   male        75 -10913         75
22   male        80  -7241         80
23   male        85  -5836         85
24 female         0  23322          0
25 female         5  23225          5
26 female        10  25521         10
27 female        15  15128         15
28 female        18  11388         15
29 female        20   5858         20
30 female        21   5300         20
31 female        22  15385         20
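The answer stops at "continue with the calculations", so here is one possible way to finish the job. This is only a sketch: it assumes base R's aggregate() is an acceptable way to do the summing, and the names ages and pop_5yr are made up for illustration.

```r
# Rebuild the sample data from the question so the example runs on its own.
Sex <- rep(c("male", "female"), each = 23)
Age_group <- as.factor(rep(c(0, 5, 10, 15, 18, 20, 21, 22, 25, 30, 35, 40, 45,
                             50, 55, 60, 62, 65, 67, 70, 75, 80, 85),
                           length.out = 46))
pop <- c(-24684, -24946, -26465, -16103, -11431, -6233, -6071, -14838, -27420,
         -26246, -24612, -22753, -23036, -24870, -27676, -11400, -14267, -9493,
         -12132, -17456, -10913, -7241, -5836, 23322, 23225, 25521, 15128,
         11388, 5858, 5300, 15385, 27610, 26368, 25329, 23045, 24025, 25847,
         28077, 12419, 16241, 10000, 14411, 20807, 14309, 10216, 11125)
pop_pyramid <- data.frame(Sex, Age_group, pop)

# Round each age down to the nearest multiple of 5 (18 -> 15, 62 -> 60, ...).
ages <- as.numeric(as.character(pop_pyramid$Age_group))  # hypothetical helper name
pop_pyramid$Age_group2 <- ages - ages %% 5

# Sum pop within every Sex / Age_group2 combination.
# Groups that were already 5-year groups contain one row and are unchanged.
pop_5yr <- aggregate(pop ~ Sex + Age_group2, data = pop_pyramid, FUN = sum)

# Inspect only the groups that were actually merged.
pop_5yr[pop_5yr$Age_group2 %in% c(15, 20, 60, 65), ]
```

Because every other age group contains a single row per sex, aggregating the whole data frame leaves those rows untouched, so there is no need to pick out the "select rows" by hand. For example, the male 15 group becomes -16103 + -11431 = -27534.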
Answer 2
Score: 0
You can use cut to create a table of counts for each age group, but you will want to convert the factor to numeric values first:
table(cut(as.numeric(as.character(pop_pyramid$Age_group)),
          breaks = seq(0, 100, 5),
          labels = paste0(seq(0, 95, 5), "-", seq(5, 100, 5)),
          include.lowest = TRUE))
Output:
   0-5   5-10  10-15  15-20  20-25  25-30  30-35  35-40  40-45  45-50  50-55  55-60  60-65  65-70  70-75  75-80  80-85  85-90  90-95 95-100
     4      2      2      4      6      2      2      2      2      2      2      2      4      4      2      2      2      0      0      0
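As an aside on why the conversion goes through as.character: calling as.numeric directly on a factor returns the internal level codes, not the ages. A tiny illustration, using a made-up five-element factor rather than the full data:

```r
# A small factor of ages; the levels are sorted as "0", "5", "10", "15", "18".
Age_group <- factor(c(0, 5, 10, 15, 18))

# Direct conversion gives the level codes, not the ages.
codes <- as.numeric(Age_group)                # 1 2 3 4 5

# Going through character first recovers the actual ages.
ages <- as.numeric(as.character(Age_group))   # 0 5 10 15 18
```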
If you would rather just recategorize the rows, you could store the result of cut in a new column instead:
pop_pyramid$new_agecat <- cut(as.numeric(as.character(pop_pyramid$Age_group)),
                              breaks = seq(0, 100, 5),
                              labels = paste0(seq(0, 95, 5), "-", seq(5, 100, 5)),
                              include.lowest = TRUE)
Output:
# Sex Age_group pop new_agecat
# 1 male 0 -24684 0-5
# 2 male 5 -24946 0-5
# 3 male 10 -26465 5-10
# 4 male 15 -16103 10-15
# 5 male 18 -11431 15-20
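One thing to watch in the output above: cut closes intervals on the right by default (right = TRUE), so an age sitting exactly on a break falls into the lower bin. That is why age 5 is labelled 0-5 and age 15 is labelled 10-15. If the goal is the grouping the question asks for (15 and 18 together, 20 through 22 together), a variant with right = FALSE may be closer to what is wanted. The sketch below is an assumption about the intent, not the answer author's code; the "0-4" style labels and the name pop_by_cat are invented for illustration.

```r
# Rebuild the sample data from the question so the example runs on its own.
Sex <- rep(c("male", "female"), each = 23)
Age_group <- as.factor(rep(c(0, 5, 10, 15, 18, 20, 21, 22, 25, 30, 35, 40, 45,
                             50, 55, 60, 62, 65, 67, 70, 75, 80, 85),
                           length.out = 46))
pop <- c(-24684, -24946, -26465, -16103, -11431, -6233, -6071, -14838, -27420,
         -26246, -24612, -22753, -23036, -24870, -27676, -11400, -14267, -9493,
         -12132, -17456, -10913, -7241, -5836, 23322, 23225, 25521, 15128,
         11388, 5858, 5300, 15385, 27610, 26368, 25329, 23045, 24025, 25847,
         28077, 12419, 16241, 10000, 14411, 20807, 14309, 10216, 11125)
pop_pyramid <- data.frame(Sex, Age_group, pop)

# right = FALSE makes the bins [0, 5), [5, 10), ..., [85, 90),
# so an age on a break starts a new bin instead of closing the previous one.
pop_pyramid$new_agecat <- cut(as.numeric(as.character(pop_pyramid$Age_group)),
                              breaks = seq(0, 90, 5),
                              labels = paste0(seq(0, 85, 5), "-", seq(4, 89, 5)),
                              right = FALSE)

# Sum the population per sex within each 5-year bin.
pop_by_cat <- aggregate(pop ~ Sex + new_agecat, data = pop_pyramid, FUN = sum)
pop_by_cat
```

With these bins, ages 15 and 18 both land in 15-19 and ages 20, 21, and 22 all land in 20-24, so summing by Sex and new_agecat merges exactly the rows the question lists.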