pivot_longer 以创建多个变量。

huangapple go评论64阅读模式
英文:

pivot_longer to create more than one variable

问题

pivot_test <- structure(list(year = 1991:1996, rec = c(1635970, 1269210, 927557, 
1312280, 1387800, 1772490), teggStk = c(419648000, 411314000, 
445784000, 434684000, 364483000, 686490000), tadultStk = c(84009.2, 
58228.6, 65226.2, 44371.9, 65555.1, 79610.6), tadult = c(84009.2, 
58228.6, 65226.2, 44371.9, 65555.1, 79610.6), teggs = c(419648000, 
411314000, 445784000, 434684000, 364483000, 686490000), recStk = c(1635970, 
1269210, 927557, 1312280, 1387800, 1772490)), row.names = c(NA, 
6L), class = "data.frame)

annual21atest <- pivot_test %>%
  pivot_longer(-c("year"),
               names_to=c("stage","stock"),
               names_pattern = "([A-Za-z]+)([A-Za-z]+)",
               values_to="quantity") %>%
  data.frame()

最终数据框应该包括4列:

year stage   stock      quantity
1991 rec     recStk     1635970
1992 rec     recStk     1269210
1993 rec     recStk     927557
1994 rec     recStk     1312280
1995 rec     recStk     1387800
1996 rec     recStk     1772490
1991 tegg    teggStk    419648000
1992 tegg    teggStk    411314000
1993 tegg    teggStk    445784000
1994 tegg    teggStk    434684000
1995 tegg    teggStk    364483000
1996 tegg    teggStk    686490000
1991 tadult  tadultStk  84009.2
1992 tadult  tadultStk  58228.6
1993 tadult  tadultStk  65226.2
1994 tadult  tadultStk  44371.9
1995 tadult  tadultStk  65555.1
1996 tadult  tadultStk  79610.6
1991 tadult  tadultStk  84009.2
1992 tadult  tadultStk  58228.6
1993 tadult  tadultStk  65226.2
1994 tadult  tadultStk  44371.9
1995 tadult  tadultStk  65555.1
1996 tadult  tadultStk  79610.6
1991 teggs   teggsStk   419648000
1992 teggs   teggsStk   411314000
1993 teggs   teggsStk   445784000
1994 teggs   teggsStk   434684000
1995 teggs   teggsStk   364483000
1996 teggs   teggsStk   686490000
1991 rec     recStk     1635970
1992 rec     recStk     1269210
1993 rec     recStk     927557
1994 rec     recStk     1312280
1995 rec     recStk     1387800
1996 rec     recStk     1772490
英文:

I am trying to use pivot_longer() to create two variables instead of one. My two variable
names should be "stage" and "stock" with their corresponding values. The "stage" column should
include rec, tadult and teggs. The "stock" column should have recStk, teggStk and tadultStk.
I have tried names_pattern below with a dummy example but I obviously don't know how to use patterns.
Could someone help with this? Thanks beforehand.

pivot_test &lt;- structure(list(year = 1991:1996, rec = c(1635970, 1269210, 927557, 
1312280, 1387800, 1772490), teggStk = c(419648000, 411314000, 
445784000, 434684000, 364483000, 686490000), tadultStk = c(84009.2, 
58228.6, 65226.2, 44371.9, 65555.1, 79610.6), tadult = c(84009.2, 
58228.6, 65226.2, 44371.9, 65555.1, 79610.6), teggs = c(419648000, 
411314000, 445784000, 434684000, 364483000, 686490000), recStk = c(1635970, 
1269210, 927557, 1312280, 1387800, 1772490)), row.names = c(NA, 
6L), class = &quot;data.frame&quot;)

  year     rec   teggStk tadultStk  tadult     teggs  recStk
1 1991 1635970 419648000   84009.2 84009.2 419648000 1635970
2 1992 1269210 411314000   58228.6 58228.6 411314000 1269210
3 1993  927557 445784000   65226.2 65226.2 445784000  927557
4 1994 1312280 434684000   44371.9 44371.9 434684000 1312280
5 1995 1387800 364483000   65555.1 65555.1 364483000 1387800
6 1996 1772490 686490000   79610.6 79610.6 686490000 1772490

annual21atest &lt;- pivot_test %&gt;% pivot_longer(-c(&quot;year&quot;),
                                names_to=c(&quot;stage&quot;,&quot;stock&quot;),
                                names_pattern = &quot;([A-Za-z]+)([A-Za-z]+)&quot;,
                                values_to=&quot;quantity&quot;) %&gt;% data.frame()
                                
  The final dataframe should have 4 columns:
  year stage   stock      quantity
  1991 rec     recStk     00000
  1992 tadult  tadultStk  00000
  1993 teggs   teggsStk   0000
  ............................... 

答案1

得分: 1

以下是代码的翻译部分:

这里有一个可能的解决方案。问题有点复杂,因为您有舞台和库存的重复值,`pivot_longer` 不知道如何自动折叠它们。相反,我们首先删除重复值,然后进行一次调用:

pivot_test %>%
  select(year, ends_with("Stk")) %>%
  pivot_longer(-year, names_to = "stock", values_to = "quantity") %>%
  mutate(stage=str_remove(stock, "Stk")) %>%
  select(year, stage, stock, quantity)

这将返回:

# A tibble: 18 × 4
    year stage  stock       quantity
   <int> <chr>  <chr>          <dbl>
 1  1991 tegg   teggStk   419648000 
 2  1991 tadult tadultStk     84009.
 3  1991 rec    recStk      1635970 
 4  1992 tegg   teggStk   411314000 
 5  1992 tadult tadultStk     58229.
 6  1992 rec    recStk      1269210 
 7  1993 tegg   teggStk   445784000 
 8  1993 tadult tadultStk     65226.
 9  1993 rec    recStk       927557 
10  1994 tegg   teggStk   434684000 
11  1994 tadult tadultStk     44372.
12  1994 rec    recStk      1312280 
13  1995 tegg   teggStk   364483000 
14  1995 tadult tadultStk     65555.
15  1995 rec    recStk      1387800 
16  1996 tegg   teggStk   686490000 
17  1996 tadult tadultStk     79611.
18  1996 rec    recStk      1772490 
英文:

Here's one possible solution. The problem gets a little funky because you've got the duplicated values for stage and stock which pivot_longer doesn't know how to automatically collapse. Instead, we drop the duplicated values to start with and then it's a single call:

pivot_test %&gt;%
  select(year, ends_with(&quot;Stk&quot;)) %&gt;%
  pivot_longer(-year, names_to = &quot;stock&quot;, values_to = &quot;quantity&quot;) %&gt;%
  mutate(stage=str_remove(stock, &quot;Stk&quot;)) %&gt;%
  select(year, stage, stock, quantity)

which returns

# A tibble: 18 &#215; 4
    year stage  stock       quantity
   &lt;int&gt; &lt;chr&gt;  &lt;chr&gt;          &lt;dbl&gt;
 1  1991 tegg   teggStk   419648000 
 2  1991 tadult tadultStk     84009.
 3  1991 rec    recStk      1635970 
 4  1992 tegg   teggStk   411314000 
 5  1992 tadult tadultStk     58229.
 6  1992 rec    recStk      1269210 
 7  1993 tegg   teggStk   445784000 
 8  1993 tadult tadultStk     65226.
 9  1993 rec    recStk       927557 
10  1994 tegg   teggStk   434684000 
11  1994 tadult tadultStk     44372.
12  1994 rec    recStk      1312280 
13  1995 tegg   teggStk   364483000 
14  1995 tadult tadultStk     65555.
15  1995 rec    recStk      1387800 
16  1996 tegg   teggStk   686490000 
17  1996 tadult tadultStk     79611.
18  1996 rec    recStk      1772490 

答案2

得分: 1

主要挑战是获取正确的命名模式:

library(tidyr)
library(dplyr)
  
pivot_test %>%
    pivot_longer(cols = -year, 
                 names_to = c("stage", "stock"), 
                 names_pattern = "(.*)(Stk)", 
                 values_to = "quantity") %>%
  na.omit() %>%
  mutate(stock = paste0(stage, stock))
    year stage  stock       quantity
   <int> <chr>  <chr>          <dbl>
 1  1991 tegg   teggStk   419648000 
 2  1991 tadult tadultStk     84009.
 3  1991 rec    recStk      1635970 
 4  1992 tegg   teggStk   411314000 
 5  1992 tadult tadultStk     58229.
 6  1992 rec    recStk      1269210 
 7  1993 tegg   teggStk   445784000 
 8  1993 tadult tadultStk     65226.
 9  1993 rec    recStk       927557 
10  1994 tegg   teggStk   434684000 
11  1994 tadult tadultStk     44372.
12  1994 rec    recStk      1312280 
13  1995 tegg   teggStk   364483000 
14  1995 tadult tadultStk     65555.
15  1995 rec    recStk      1387800 
16  1996 tegg   teggStk   686490000 
17  1996 tadult tadultStk     79611.
18  1996 rec    recStk      1772490 
英文:

The main challenge is to get the correct names pattern:

library(tidyr)
library(dplyr)
  
pivot_test %&gt;% 
    pivot_longer(cols = -year, 
                 names_to = c(&quot;stage&quot;, &quot;stock&quot;), 
                 names_pattern = &quot;(.*)(Stk)&quot;, 
                 values_to = &quot;quantity&quot;) %&gt;% 
  na.omit() %&gt;% 
  mutate(stock = paste0(stage, stock))
    year stage  stock       quantity
   &lt;int&gt; &lt;chr&gt;  &lt;chr&gt;          &lt;dbl&gt;
 1  1991 tegg   teggStk   419648000 
 2  1991 tadult tadultStk     84009.
 3  1991 rec    recStk      1635970 
 4  1992 tegg   teggStk   411314000 
 5  1992 tadult tadultStk     58229.
 6  1992 rec    recStk      1269210 
 7  1993 tegg   teggStk   445784000 
 8  1993 tadult tadultStk     65226.
 9  1993 rec    recStk       927557 
10  1994 tegg   teggStk   434684000 
11  1994 tadult tadultStk     44372.
12  1994 rec    recStk      1312280 
13  1995 tegg   teggStk   364483000 
14  1995 tadult tadultStk     65555.
15  1995 rec    recStk      1387800 
16  1996 tegg   teggStk   686490000 
17  1996 tadult tadultStk     79611.
18  1996 rec    recStk      1772490 

答案3

得分: 0

为什么不将对 'stock' 的 `tidyr::pivot_longer()` 调用嵌套到对 'stage' 的另一个 `tidyr::pivot_longer()` 调用中,反之亦然?我觉得这样看起来很整洁:

pivot_test <- structure(list(
  year = 1991:1996,
  rec = c(1635970, 1269210, 927557, 1312280, 1387800, 1772490),
  teggStk = c(419648000, 411314000, 445784000, 434684000, 364483000, 686490000),
  tadultStk = c(84009.2, 58228.6, 65226.2, 44371.9, 65555.1, 79610.6),
  tadult = c(84009.2, 58228.6, 65226.2, 44371.9, 65555.1, 79610.6),
  teggs = c(419648000, 411314000, 445784000, 434684000, 364483000, 686490000),
  recStk = c(1635970, 1269210, 927557, 1312280, 1387800, 1772490)),
  row.names = c(NA, 6L), class = 'data.frame')

annual21atest <- 
  tidyr::pivot_longer(pivot_test,
                      -c('year', 'recStk', 'teggStk', 'tadultStk'),
                      names_to= 'stage',
                      values_to = 'quantity') |
  tidyr::pivot_longer(-c('year', 'stage'),
                      names_to= 'stock',
                      values_to = 'quantity') |
  as.data.frame()

结果就是你想要的:

   year  stage     stock    quantity
1  1991    rec   teggStk 419648000.0
2  1991    rec tadultStk     84009.2
3  1991    rec    recStk   1635970.0
4  1991    rec  quantity   1635970.0
5  1991 tadult   teggStk 419648000.0
6  1991 tadult tadultStk     84009.2
7  1991 tadult    recStk   1635970.0
8  1991 tadult  quantity     84009.2
9  1991  teggs   teggStk 419648000.0
10 1991  teggs tadultStk     84009.2

你也可以为了简洁性将函数封装成这样:

twostage.pivot_longer <- function(x, aggregated_names, aggregated_values,
                                  skipped_columns){
  # 第一次聚合
  y <- tidyr::pivot_longer(x,
                           -c(skipped_columns[[1]]),
                      names_to= aggregated_names[1],
                      values_to = aggregated_values)
  # 第二次聚合
  y <- tidyr::pivot_longer(y,
                           -c(skipped_columns[[2]], aggregated_names[1]),
                           names_to= aggregated_names[2],
                           values_to = aggregated_values)
  return(y)
}

然后像这样使用它:

annual21atest_fun <- twostage.pivot_longer(x = pivot_test,
                                       aggregated_names = c('stage', 'stock'),
                                       aggregated_values = 'quantity',
                                       skipped_columns = list(
                                         c('year', 'recStk', 'teggStk', 'tadultStk'),
                                         'year'
                                         ))

<details>
<summary>英文:</summary>

Why don&#39;t you just pipe a call to `tidyr::pivot_longer()` for &#39;stock&#39; into another call to `tidyr::pivot_longer()` for &#39;stage&#39; or vice versa? It looks neat enough to me:

pivot_test &lt;- structure(list(
  year = 1991:1996,
  rec = c(1635970, 1269210, 927557, 1312280, 1387800, 1772490),
  teggStk = c(419648000, 411314000, 445784000, 434684000, 364483000, 686490000),
  tadultStk = c(84009.2, 58228.6, 65226.2, 44371.9, 65555.1, 79610.6),
  tadult = c(84009.2, 58228.6, 65226.2, 44371.9, 65555.1, 79610.6),
  teggs = c(419648000, 411314000, 445784000, 434684000, 364483000, 686490000),
  recStk = c(1635970, 1269210, 927557, 1312280, 1387800, 1772490)),
  row.names = c(NA, 6L), class = &#39;data.frame&#39;)

annual21atest &lt;- 
  tidyr::pivot_longer(pivot_test,
                      -c(&#39;year&#39;, &#39;recStk&#39;, &#39;teggStk&#39;, &#39;tadultStk&#39;),
                      names_to= &#39;stage&#39;,
                      values_to = &#39;quantity&#39;)|&gt;
  tidyr::pivot_longer(-c(&#39;year&#39;, &#39;stage&#39;),
                      names_to= &#39;stock&#39;,
                      values_to = &#39;quantity&#39;)|&gt;
  as.data.frame()

The result is what you asked for:

   year  stage     stock    quantity
1  1991    rec   teggStk 419648000.0
2  1991    rec tadultStk     84009.2
3  1991    rec    recStk   1635970.0
4  1991    rec  quantity   1635970.0
5  1991 tadult   teggStk 419648000.0
6  1991 tadult tadultStk     84009.2
7  1991 tadult    recStk   1635970.0
8  1991 tadult  quantity     84009.2
9  1991  teggs   teggStk 419648000.0
10 1991  teggs tadultStk     84009.2

And you can also wrap the function like this for brevity:

twostage.pivot_longer &lt;- function(x, aggregated_names, aggregated_values,
                                  skipped_columns){
  # First aggregation
  y &lt;- tidyr::pivot_longer(x,
                           -c(skipped_columns[[1]]),
                      names_to= aggregated_names[1],
                      values_to = aggregated_values)
  # Second aggregation
  y &lt;- tidyr::pivot_longer(y,
                           -c(skipped_columns[[2]], aggregated_names[1]),
                           names_to= aggregated_names[2],
                           values_to = aggregated_values)
  return(y)
}

and then use it like here:

annual21atest_fun &lt;- twostage.pivot_longer(x = pivot_test,
                                       aggregated_names = c(&#39;stage&#39;, &#39;stock&#39;),
                                       aggregated_values = &#39;quantity&#39;,
                                       skipped_columns = list(
                                         c(&#39;year&#39;, &#39;recStk&#39;, &#39;teggStk&#39;, &#39;tadultStk&#39;),
                                         &#39;year&#39;
                                         ))

</details>



huangapple
  • 本文由 发表于 2023年3月10日 01:21:55
  • 转载请务必保留本文链接:https://go.coder-hub.com/75688012.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定