How to best sum by area


Question


Below is the sample data.

library(tidyverse)

area <- c("003","003","003","003","003","003","003","003","017","017","017","017","017","017","017","017")
year <- c("2022","2022","2022","2022","2022","2022","2022","2022","2022","2022","2022","2022","2022","2022","2022","2022")
period <- c("01","01","01","01","02","02","02","02","01","01","01","01","02","02","02","02")
naics <- c("231","331","341","421","231","331","341","421","231","331","341","421","231","331","341","421")
m1 <- c(100,105,110,152,102,107,112,155,42,45,52,61,39,47,55,100)
m2 <- c(101,106,111,153,103,108,111,156,40,44,53,62,40,48,56,98)
m3 <- c(102,107,112,155,104,109,112,157,43,46,55,63,41,49,57,95)

first <- data.frame(area, year, period, naics, m1, m2, m3)

# Note: mean(m1:m3) averages the unit-step sequence running from m1 to m3,
# not the three columns m1, m2, m3.
first <- first %>%
  group_by(area, year, period, naics) %>%
  mutate(avgemp = mean(m1:m3))

The goal is to create a new row for each combination of area, year, and period (quarter). Each new row would be an area total of sorts, with naics set to 000000 (the NAICS code for the total across all industries). Do I have to use pivot_longer for this?

The desired result is below:

  area      year      period     naics     m1     m2     m3     avgemp
  003       2022       01       000000    467    471    476      471
  003       2022       02       000000    476    478    482      479
  017       2022       01       000000    200    199    207      202
  and so on.... 

Answer 1

Score: 1


You can get the desired result by grouping the data by area, year, and period, summing each column, and then setting naics = "000000" on the resulting rows to represent the total across industries:

first %>%
  group_by(area, year, period) %>%
  summarize(m1 = sum(m1),
            m2 = sum(m2),
            m3 = sum(m3),
            avgemp = sum(avgemp)) %>%
  mutate(naics = "000000")

Output:

# Groups:   area, year [2]
  area  year  period    m1    m2    m3 avgemp naics 
  <chr> <chr> <chr>  <dbl> <dbl> <dbl>  <dbl> <chr> 
1 003   2022  01       467   471   476   472. 000000
2 003   2022  02       476   478   482   479  000000
3 017   2022  01       200   199   207   204. 000000
4 017   2022  02       241   242   242   242. 000000
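
Note that the avgemp values in this output (472 and 204 in rows 1 and 3) do not match the desired result in the question (471 and 202). The difference appears to come from the question's `mean(m1:m3)`, which averages a numeric sequence running from m1 to m3 rather than the three columns, so summing that avgemp carries the error forward. A minimal sketch, assuming the intent is the mean of the three summed month columns, computes avgemp from the totals instead. The name `totals` is only illustrative, and the sample data is rebuilt with `rep()` so the block runs on its own.

```r
library(dplyr)

# Same sample data as in the question, built compactly.
first <- data.frame(
  area   = rep(c("003", "017"), each = 8),
  year   = "2022",
  period = rep(rep(c("01", "02"), each = 4), times = 2),
  naics  = rep(c("231", "331", "341", "421"), times = 4),
  m1 = c(100,105,110,152,102,107,112,155,42,45,52,61,39,47,55,100),
  m2 = c(101,106,111,153,103,108,111,156,40,44,53,62,40,48,56,98),
  m3 = c(102,107,112,155,104,109,112,157,43,46,55,63,41,49,57,95)
)

totals <- first %>%
  group_by(area, year, period) %>%
  # Sum each month column within area/year/period.
  summarize(across(m1:m3, sum), .groups = "drop") %>%
  # Average the three summed columns directly instead of summing avgemp.
  mutate(avgemp = (m1 + m2 + m3) / 3, naics = "000000")

print(totals)
# avgemp comes out as 471.33, 478.67, 202.00, 241.67
```

Because the sum of row means equals the mean of the sums, `sum(avgemp)` would give the same numbers once avgemp itself is a true row mean; deriving it from the totals just removes the dependency on the earlier step.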

Answer 2

Score: 0

first %>%
  group_by(area, year, period) %>%
  summarize(across(m1:m3, sum), .groups = "drop") %>%
  rowwise() %>%
  # c_across() collects m1, m2, m3 for the row; a bare m1:m3 would be a numeric sequence
  mutate(avgemp = mean(c_across(m1:m3)), naics = "000000") %>%
  ungroup()
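
One thing to watch with row-wise averages in dplyr: inside `mutate()`, an expression such as `m1:m3` is ordinary R and builds a sequence from the value of m1 to the value of m3, whereas `c_across(m1:m3)` treats it as a column selection. The sketch below contrasts the two on a single summed row from the sample data (area 017, period 01). The names `one_row`, `seq_mean`, and `col_mean` are hypothetical and used only for this illustration.

```r
library(dplyr)

# One summed row taken from the sample data (area 017, period 01).
one_row <- tibble(m1 = 200, m2 = 199, m3 = 207)

result <- one_row %>%
  rowwise() %>%
  mutate(
    seq_mean = mean(m1:m3),           # mean of 200, 201, ..., 207 -> 203.5
    col_mean = mean(c_across(m1:m3))  # mean of 200, 199, 207     -> 202
  ) %>%
  ungroup()

print(result)
```

The two values only agree when m2 happens to sit exactly halfway between m1 and m3, which is why the problem is easy to miss on the first few rows of the sample data.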

Answer 3

Score: 0


Using data.table (with six zeros in naics, to match the 000000 code requested in the question):

library(data.table)

setDT(first)[, lapply(.SD, sum), by = .(area, year, period),
  .SDcols = patterns("^m\\d+")][,
  c("avgemp", "naics") := .(rowMeans(.SD, na.rm = TRUE), strrep("0", 6)),
  .SDcols = patterns("^m\\d+")][]

Output:

   area year period  m1  m2  m3   avgemp  naics
1:  003 2022     01 467 471 476 471.3333 000000
2:  003 2022     02 476 478 482 478.6667 000000
3:  017 2022     01 200 199 207 202.0000 000000
4:  017 2022     02 241 242 242 241.6667 000000
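
The answers return the totals as a separate table. If the total rows should sit alongside the industry rows, as the question's "create a new row" wording suggests, one possible approach is to stack the two tables. This is only a sketch on a four-row subset of the sample data (area 003, period 01); `detail`, `totals`, `combined`, and `m_cols` are hypothetical names, and avgemp is assumed to be the mean of the three month columns.

```r
library(data.table)

m_cols <- c("m1", "m2", "m3")

# Subset of the sample data: area 003, period 01.
detail <- data.table(
  area = "003", year = "2022", period = "01",
  naics = c("231", "331", "341", "421"),
  m1 = c(100, 105, 110, 152),
  m2 = c(101, 106, 111, 153),
  m3 = c(102, 107, 112, 155)
)
detail[, avgemp := rowMeans(.SD), .SDcols = m_cols]

# One total row per area/year/period, tagged with the all-industries code.
totals <- detail[, lapply(.SD, sum), by = .(area, year, period), .SDcols = m_cols]
totals[, c("avgemp", "naics") := .(rowMeans(.SD), "000000"), .SDcols = m_cols]

# Stack detail and total rows, matching columns by name, then sort so the
# "000000" row leads each area/year/period block.
combined <- rbindlist(list(detail, totals), use.names = TRUE)
setorder(combined, area, year, period, naics)

print(combined)
```

`use.names = TRUE` matters here because the two tables hold the same columns in a different order.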

  • Published on February 14, 2023, 08:21:09
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75442382.html