February 14, 2023, 08:21:09

How to best sum by area

Question

Below is the sample data.

  library(tidyverse)
  area <- c("003","003","003","003","003","003","003","003","017","017","017","017","017","017","017","017")
  year <- c("2022","2022","2022","2022","2022","2022","2022","2022","2022","2022","2022","2022","2022","2022","2022","2022")
  period <- c("01","01","01","01","02","02","02","02","01","01","01","01","02","02","02","02")
  naics <- c("231","331","341","421","231","331","341","421","231","331","341","421","231","331","341","421")
  m1 <- c(100,105,110,152,102,107,112,155,42,45,52,61,39,47,55,100)
  m2 <- c(101,106,111,153,103,108,111,156,40,44,53,62,40,48,56,98)
  m3 <- c(102,107,112,155,104,109,112,157,43,46,55,63,41,49,57,95)
  first <- data.frame(area, year, period, naics, m1, m2, m3)
  # Note: inside mutate(), m1:m3 is evaluated as the numeric sequence from m1 to m3, not as the three columns
  first <- first %>% group_by(area, year, period, naics) %>% mutate(avgemp = mean(m1:m3))

The goal is to create a new row for each combination of year, period (quarter), and area. This would be an area total of sorts. The naics value for the new rows would be 000000 (the naics code for the total across all industries). Do I have to use pivot_longer for this?

The desired result is below.

  area      year      period     naics     m1     m2     m3     avgemp
  003       2022       01       000000    467    471    476      471
  003       2022       02       000000    476    478    482      479
  017       2022       01       000000    200    199    207      202
  and so on....

Answer 1

Score: 1

You can get the desired result by grouping the data by area, year, and period, summing each column, and then setting naics = "000000" on the summary rows to represent the total across industries:

first %>%
  group_by(area, year, period) %>%
  summarize(m1 = sum(m1),
            m2 = sum(m2),
            m3 = sum(m3),
            avgemp = sum(avgemp)) %>%
  mutate(naics = "000000")

Output:

# Groups:   area, year [2]
  area  year  period    m1    m2    m3 avgemp naics 
  <chr> <chr> <chr>  <dbl> <dbl> <dbl>  <dbl> <chr> 
1 003   2022  01       467   471   476   472. 000000
2 003   2022  02       476   478   482   479  000000
3 017   2022  01       200   199   207   204. 000000
4 017   2022  02       241   242   242   242. 000000

The avgemp values in rows 1 and 3 (471.5 and 203.5, displayed as 472. and 204.) differ from the desired 471 and 202 because they sum the question's avgemp column, which was computed with mean(m1:m3). There, m1:m3 is evaluated as a numeric sequence from m1 to m3 rather than as the three columns.
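The summary above returns only the total rows. If the totals should sit alongside the original industry rows, as "a new row" in the question suggests, one possible approach is to stack them with bind_rows(); no pivot_longer is needed. The sketch below is an illustration under assumptions: the names `totals` and `combined` are made up for this example, and avgemp is assumed to mean the plain average of m1, m2, and m3, which matches the desired result in the question.

```r
library(dplyr)

# Sample data from the question, rebuilt compactly (avgemp is computed at the end)
first <- data.frame(
  area   = rep(c("003", "017"), each = 8),
  year   = "2022",
  period = rep(rep(c("01", "02"), each = 4), times = 2),
  naics  = rep(c("231", "331", "341", "421"), times = 4),
  m1 = c(100, 105, 110, 152, 102, 107, 112, 155, 42, 45, 52, 61, 39, 47, 55, 100),
  m2 = c(101, 106, 111, 153, 103, 108, 111, 156, 40, 44, 53, 62, 40, 48, 56, 98),
  m3 = c(102, 107, 112, 155, 104, 109, 112, 157, 43, 46, 55, 63, 41, 49, 57, 95)
)

# One total row per area/year/period, labelled with the all-industries code
totals <- first %>%
  group_by(area, year, period) %>%
  summarize(across(m1:m3, sum), .groups = "drop") %>%
  mutate(naics = "000000")

# Stack the totals with the detail rows, then compute the three-month average
# (assumption: avgemp is the simple mean of m1, m2, and m3)
combined <- bind_rows(first, totals) %>%
  mutate(avgemp = (m1 + m2 + m3) / 3) %>%
  arrange(area, year, period, naics)

print(combined)
```

Because "000000" sorts before the other codes, arrange() places each total row at the top of its area/year/period block.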

Answer 2

Score: 0

first %>%
  group_by(area, year, period) %>%
  summarize(across(m1:m3, sum), .groups = "drop") %>%
  rowwise() %>%
  # c_across() selects the three columns; a bare m1:m3 would build the sequence from m1 to m3
  mutate(avgemp = mean(c_across(m1:m3)), naics = "000000")
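Inside mutate(), an expression such as m1:m3 is evaluated as ordinary R, so it builds the integer sequence from the value of m1 to the value of m3 instead of selecting three columns. The following sketch compares the two readings on a single row, using the area 003, period 01 totals (467, 471, 476); the column names `seq_mean` and `row_mean` are illustrative only.

```r
library(dplyr)

# A single summed row: the area 003 / period 01 totals
totals <- data.frame(m1 = 467, m2 = 471, m3 = 476)

res <- totals %>%
  rowwise() %>%
  mutate(
    seq_mean = mean(m1:m3),            # mean of the sequence 467, 468, ..., 476
    row_mean = mean(c_across(m1:m3))   # mean of the three column values
  ) %>%
  ungroup()

print(res)
```

Here `seq_mean` is the midpoint of 467 and 476, while `row_mean` is (467 + 471 + 476) / 3, the figure that rounds to the desired 471.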

Answer 3

Score: 0

Using data.table:

library(data.table)
# Sum the m columns by group, then add the row mean and the total code.
# Note: strrep("0", 5) yields "00000"; use 6 to match the question's "000000".
setDT(first)[, lapply(.SD, sum), by = .(area, year, period),
  .SDcols = patterns("^m\\d+")][, c("avgemp", "naics") := .(rowMeans(.SD,
    na.rm = TRUE), strrep("0", 5)), .SDcols = patterns("^m\\d+")][]

Output:

   area year period  m1  m2  m3   avgemp naics
1:  003 2022     01 467 471 476 471.3333 00000
2:  003 2022     02 476 478 482 478.6667 00000
3:  017 2022     01 200 199 207 202.0000 00000
4:  017 2022     02 241 242 242 241.6667 00000
