How to best sum by area
Question
Below is the sample data.
library(tidyverse)
area <- c("003","003","003","003","003","003","003","003","017","017","017","017","017","017","017","017")
year <- c("2022","2022","2022","2022","2022","2022","2022","2022","2022","2022","2022","2022","2022","2022","2022","2022")
period <- c("01","01","01","01","02","02","02","02","01","01","01","01","02","02","02","02")
naics <- c("231","331","341","421","231","331","341","421","231","331","341","421","231","331","341","421")
m1 <- c(100,105,110,152,102,107,112,155,42,45,52,61,39,47,55,100)
m2 <- c(101,106,111,153,103,108,111,156,40,44,53,62,40,48,56,98)
m3 <- c(102,107,112,155,104,109,112,157,43,46,55,63,41,49,57,95)
first <- data.frame(area, year, period, naics, m1, m2, m3)
first <- first %>%
  group_by(area, year, period, naics) %>%
  mutate(avgemp = mean(m1:m3))
The goal is to create a new row for each combination of area, year, and period (quarter) that holds the total across all naics codes. This would be an area total of sorts. The new naics would be 000000 (the naics code for total, all industries). Do I have to pivot longer for this?
The desired result is below:
area year period naics  m1  m2  m3  avgemp
003  2022 01     000000 467 471 476 471
003  2022 02     000000 476 478 482 479
017  2022 01     000000 200 199 207 202
and so on...
Answer 1
Score: 1
You can get the desired result by grouping the data by area, year, and period, summing each column, and then setting naics = "000000" to mark the row as the total across industries:
first %>%
  group_by(area, year, period) %>%
  summarize(m1 = sum(m1),
            m2 = sum(m2),
            m3 = sum(m3),
            avgemp = sum(avgemp)) %>%
  mutate(naics = "000000")
Output:
# Groups:   area, year [2]
  area  year  period    m1    m2    m3 avgemp naics
  <chr> <chr> <chr>  <dbl> <dbl> <dbl>  <dbl> <chr>
1 003   2022  01       467   471   476   472. 000000
2 003   2022  02       476   478   482   479  000000
3 017   2022  01       200   199   207   204. 000000
4 017   2022  02       241   242   242   242. 000000
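Some avgemp values in this output (472. and 204.) are slightly higher than the 471 and 202 the question asks for. A likely cause, given the question's code, is that `mean(m1:m3)` inside `mutate()` applies the sequence operator `:` to the values rather than selecting the three columns. The sketch below illustrates this on a single row of the sample data; `rowMeans(across(...))` is one possible replacement, not something the original posts use.

```r
library(dplyr)

# One row of the sample data: area 003, period 01, naics 421
row <- data.frame(m1 = 152, m2 = 153, m3 = 155)

# Inside mutate(), m1:m3 is evaluated on the values, i.e. 152:155,
# so the mean is taken over c(152, 153, 154, 155)
seq_mean <- row %>%
  mutate(avgemp = mean(m1:m3)) %>%
  pull(avgemp)

# rowMeans() over the three columns gives the per-row average of m1, m2, m3
row_mean <- row %>%
  mutate(avgemp = rowMeans(across(m1:m3))) %>%
  pull(avgemp)

seq_mean  # 153.5
row_mean  # 153.3333
```

With per-row averages computed this way, summing avgemp within each group equals the average of the summed m1, m2, and m3.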
Answer 2
Score: 0
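first %>%
  group_by(area, year, period) %>%
  summarize(across(m1:m3, sum), .groups = "drop") %>%
  rowwise() %>%
  # c_across() collects m1, m2, m3 for the row; a bare m1:m3 would be the sequence from m1 to m3
  mutate(avgemp = mean(c_across(m1:m3)), naics = "000000") %>%
  ungroup()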
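Since the question asks for new rows rather than a separate summary table, one option is to append the totals to the industry-level rows with `bind_rows()`. The following is a minimal sketch on a subset of the sample data; the names `detail`, `totals`, and `combined` are hypothetical and chosen only for illustration.

```r
library(dplyr)

# Subset of the sample data: area 003, period 01 only
detail <- data.frame(
  area = "003", year = "2022", period = "01",
  naics = c("231", "331", "341", "421"),
  m1 = c(100, 105, 110, 152),
  m2 = c(101, 106, 111, 153),
  m3 = c(102, 107, 112, 155)
)

# One total row per area/year/period, tagged with the all-industries code
totals <- detail %>%
  group_by(area, year, period) %>%
  summarize(across(m1:m3, sum), .groups = "drop") %>%
  mutate(naics = "000000")

# Stack detail and total rows, then compute avgemp once for every row
combined <- bind_rows(detail, totals) %>%
  mutate(avgemp = rowMeans(across(m1:m3))) %>%
  arrange(area, year, period, naics)

combined
```

Computing avgemp after stacking keeps a single definition of the average for both the detail rows and the total rows.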
Answer 3
Score: 0
Using data.table:
library(data.table)
# Sum the m* columns per area/year/period, then add the row mean and the total code
setDT(first)[, lapply(.SD, sum), by = .(area, year, period),
             .SDcols = patterns("^m\\d+")
  ][, c("avgemp", "naics") := .(rowMeans(.SD, na.rm = TRUE), strrep("0", 6)),
    .SDcols = patterns("^m\\d+")][]
Output:
   area year period  m1  m2  m3   avgemp  naics
1:  003 2022     01 467 471 476 471.3333 000000
2:  003 2022     02 476 478 482 478.6667 000000
3:  017 2022     01 200 199 207 202.0000 000000
4:  017 2022     02 241 242 242 241.6667 000000
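The chained data.table call is dense, so it may help to see it as separate steps. This is a sketch on a small hypothetical subset (`dt`, with only two naics codes for area 017), using the six-zero naics code from the question; it is meant to illustrate the same technique, not to reproduce the answer's exact output.

```r
library(data.table)

# Hypothetical subset: area 017, two naics codes, two periods
dt <- data.table(
  area = "017", year = "2022",
  period = c("01", "01", "02", "02"),
  naics = c("231", "331", "231", "331"),
  m1 = c(42, 45, 39, 47),
  m2 = c(40, 44, 40, 48),
  m3 = c(43, 46, 41, 49)
)

# Step 1: sum every column matching m<digits> within each area/year/period
totals <- dt[, lapply(.SD, sum), by = .(area, year, period),
             .SDcols = patterns("^m\\d+")]

# Step 2: add the row-wise mean of the summed columns by reference
totals[, avgemp := rowMeans(.SD), .SDcols = patterns("^m\\d+")]

# Step 3: tag each row with the all-industries code
totals[, naics := "000000"]

totals
```

Splitting the chain makes it easier to inspect the intermediate `totals` table before the `:=` assignments modify it in place.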