Getting the sum of selected rows in a data frame in R

Question

I'm creating a population age pyramid for my metro area from US Census data. Here's some sample data:

# Sample data: 23 age groups per sex; male counts are negative for the pyramid
Sex <- rep(c("male", "female"), each = 23)
Age_group <- as.factor(rep(c(0, 5, 10, 15, 18, 20, 21, 22, 25, 30, 35, 40, 45, 50, 55, 60, 62, 65, 67, 70, 75, 80, 85), times = 2))
pop <- c(-24684, -24946, -26465, -16103, -11431, -6233, -6071, -14838, -27420, -26246, -24612, -22753, -23036, -24870, -27676, -11400, -14267, -9493, -12132, -17456, -10913, -7241, -5836, 23322, 23225, 25521, 15128, 11388, 5858, 5300, 15385, 27610, 26368, 25329, 23045, 24025, 25847, 28077, 12419, 16241, 10000, 14411, 20807, 14309, 10216, 11125)

pop_pyramid <- data.frame(Sex, Age_group, pop)

My issue: while most of the data is in 5-year age groups, the 15, 18, 20, 21, 22, 60, 62, 65, and 67 age groups for each sex are not. I'd like to combine the population counts for the 15 and 18 groups into the 15 age group, the 20, 21, and 22 age groups into the 20 age group, the 60 and 62 age groups into the 60 age group, and the 65 and 67 age groups into the 65 age group.

Simply adding the rows I want fails because Age_group and Sex are factors, and rowSums fails for the same reason. I thought of using aggregation, but I only need those selected rows aggregated, not the entire data frame. Is there a way to do this without resorting to doing it by hand?

Thanks.
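
The factor columns are less of an obstacle than they look. A minimal sketch, using a hypothetical toy factor age_f built the same way as Age_group: rows can be selected by comparing a factor with its labels, and the ages can be recovered as numbers, provided the conversion goes through as.character() (calling as.numeric() directly on a factor returns its internal level codes, not the ages).

```r
# Hypothetical toy data, built the same way as Age_group in the question
age_f <- as.factor(c(62, 5, 18, 0))
pop   <- c(100, 200, 300, 400)

# Levels are the sorted unique values, stored as character labels
print(levels(age_f))   # "0" "5" "18" "62"

# as.numeric() on a factor returns the internal level codes, not the ages
codes <- as.numeric(age_f)                # 4 2 3 1
# Going through the labels recovers the actual ages
ages  <- as.numeric(as.character(age_f))  # 62 5 18 0

# Selecting rows needs no conversion at all: %in% compares against the labels
selected_sum <- sum(pop[age_f %in% c("5", "18")])  # 200 + 300 = 500
print(selected_sum)
```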

Answer 1

Score: 1

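I would suggest subtracting each age's remainder modulo 5 from Age_group, which rounds every age down to the start of its 5-year group (18 becomes 15, 22 becomes 20, 67 becomes 65). Then you can continue with the calculations: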

# Convert the factor via its labels, then round down to a multiple of 5
age <- as.numeric(as.character(pop_pyramid$Age_group))
pop_pyramid$Age_group2 <- age - age %% 5
pop_pyramid

      Sex Age_group    pop Age_group2
1    male         0 -24684          0
2    male         5 -24946          5
3    male        10 -26465         10
4    male        15 -16103         15
5    male        18 -11431         15
6    male        20  -6233         20
7    male        21  -6071         20
8    male        22 -14838         20
9    male        25 -27420         25
10   male        30 -26246         30
11   male        35 -24612         35
12   male        40 -22753         40
13   male        45 -23036         45
14   male        50 -24870         50
15   male        55 -27676         55
16   male        60 -11400         60
17   male        62 -14267         60
18   male        65  -9493         65
19   male        67 -12132         65
20   male        70 -17456         70
21   male        75 -10913         75
22   male        80  -7241         80
23   male        85  -5836         85
24 female         0  23322          0
25 female         5  23225          5
26 female        10  25521         10
27 female        15  15128         15
28 female        18  11388         15
29 female        20   5858         20
30 female        21   5300         20
31 female        22  15385         20
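
The answer stops before the summing step. A minimal sketch of one way to continue, assuming base R's aggregate() formula interface; the name pyramid5 is hypothetical. Aggregating the whole data frame is harmless here: each row that is already a 5-year group forms a group of one and keeps its value, so only the irregular rows are actually combined.

```r
# Sample data from the question
Sex <- rep(c("male", "female"), each = 23)
Age_group <- as.factor(rep(c(0, 5, 10, 15, 18, 20, 21, 22, 25, 30, 35, 40, 45,
                             50, 55, 60, 62, 65, 67, 70, 75, 80, 85), times = 2))
pop <- c(-24684, -24946, -26465, -16103, -11431, -6233, -6071, -14838, -27420,
         -26246, -24612, -22753, -23036, -24870, -27676, -11400, -14267, -9493,
         -12132, -17456, -10913, -7241, -5836,
         23322, 23225, 25521, 15128, 11388, 5858, 5300, 15385, 27610, 26368,
         25329, 23045, 24025, 25847, 28077, 12419, 16241, 10000, 14411, 20807,
         14309, 10216, 11125)
pop_pyramid <- data.frame(Sex, Age_group, pop)

# Round each age down to the start of its 5-year group (18 -> 15, 22 -> 20)
age <- as.numeric(as.character(pop_pyramid$Age_group))
pop_pyramid$Age_group2 <- age - age %% 5

# Sum pop within every Sex x 5-year group; one-row groups keep their value
pyramid5 <- aggregate(pop ~ Sex + Age_group2, data = pop_pyramid, FUN = sum)

# Show only the groups that were merged
pyramid5[pyramid5$Age_group2 %in% c(15, 20, 60, 65), ]
```

For example, the male 20 group becomes -6233 + -6071 + -14838 = -27142 and the female 15 group becomes 15128 + 11388 = 26516; the result has 36 rows (18 five-year groups for each sex).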

Answer 2

Score: 0

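You can use cut to create a table of counts for each age group (the number of rows in each bin, not the summed population), but you will want to convert the factor to numeric values first, going through as.character so that you get the age labels rather than the internal level codes: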

# right = FALSE gives bins [0,5), [5,10), ..., so 15 and 18 both land in "15-20"
table(cut(as.numeric(as.character(pop_pyramid$Age_group)),
    breaks = seq(0, 100, 5),
    labels = paste0(seq(0, 95, 5), "-", seq(5, 100, 5)),
    right = FALSE,
    include.lowest = TRUE))

Output:

   0-5   5-10  10-15  15-20  20-25  25-30  30-35  35-40  40-45  45-50  50-55  55-60  60-65  65-70  70-75  75-80  80-85  85-90  90-95 95-100 
     2      2      2      4      6      2      2      2      2      2      2      2      4      4      2      2      2      2      0      0 

If you would rather just recategorize the rows, you could tweak it to:

pop_pyramid$new_agecat <- cut(as.numeric(as.character(pop_pyramid$Age_group)),
                              breaks = seq(0, 100, 5),
                              labels = paste0(seq(0, 95, 5), "-", seq(5, 100, 5)),
                              right = FALSE,
                              include.lowest = TRUE)

Output:

#       Sex Age_group    pop new_agecat
# 1    male         0 -24684        0-5
# 2    male         5 -24946       5-10
# 3    male        10 -26465      10-15
# 4    male        15 -16103      15-20
# 5    male        18 -11431      15-20
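
Two details matter for the pyramid: table() counts rows per bin rather than adding up pop, and cut()'s default right = TRUE closes each interval on the right, so an age of 15 lands in "10-15" while 18 lands in "15-20". A minimal sketch that bins with right = FALSE and then sums the population per sex and bin with base R's xtabs(), which adds up the formula's left-hand side within each cell; the name pyramid_tab is hypothetical.

```r
# Sample data from the question
Sex <- rep(c("male", "female"), each = 23)
Age_group <- as.factor(rep(c(0, 5, 10, 15, 18, 20, 21, 22, 25, 30, 35, 40, 45,
                             50, 55, 60, 62, 65, 67, 70, 75, 80, 85), times = 2))
pop <- c(-24684, -24946, -26465, -16103, -11431, -6233, -6071, -14838, -27420,
         -26246, -24612, -22753, -23036, -24870, -27676, -11400, -14267, -9493,
         -12132, -17456, -10913, -7241, -5836,
         23322, 23225, 25521, 15128, 11388, 5858, 5300, 15385, 27610, 26368,
         25329, 23045, 24025, 25847, 28077, 12419, 16241, 10000, 14411, 20807,
         14309, 10216, 11125)
pop_pyramid <- data.frame(Sex, Age_group, pop)

# right = FALSE gives half-open bins [0,5), [5,10), ..., so 15 and 18 share "15-20"
pop_pyramid$new_agecat <- cut(as.numeric(as.character(pop_pyramid$Age_group)),
                              breaks = seq(0, 100, 5),
                              labels = paste0(seq(0, 95, 5), "-", seq(5, 100, 5)),
                              right = FALSE,
                              include.lowest = TRUE)

# xtabs() with a left-hand side sums pop within each Sex x age-bin cell;
# empty bins such as "90-95" are kept and hold 0
pyramid_tab <- xtabs(pop ~ Sex + new_agecat, data = pop_pyramid)

# Show only the bins that combine more than one original group
pyramid_tab[, c("15-20", "20-25", "60-65", "65-70")]
```

The cell totals match the hand-computed sums, e.g. male "20-25" is -6233 + -6071 + -14838 = -27142.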

huangapple
  • Posted on June 30, 2023 at 02:49:50
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76583870.html