英文:
pivot_longer to create more than one variable
问题
pivot_test <- structure(list(year = 1991:1996, rec = c(1635970, 1269210, 927557,
1312280, 1387800, 1772490), teggStk = c(419648000, 411314000,
445784000, 434684000, 364483000, 686490000), tadultStk = c(84009.2,
58228.6, 65226.2, 44371.9, 65555.1, 79610.6), tadult = c(84009.2,
58228.6, 65226.2, 44371.9, 65555.1, 79610.6), teggs = c(419648000,
411314000, 445784000, 434684000, 364483000, 686490000), recStk = c(1635970,
1269210, 927557, 1312280, 1387800, 1772490)), row.names = c(NA,
6L), class = "data.frame)
annual21atest <- pivot_test %>%
pivot_longer(-c("year"),
names_to=c("stage","stock"),
names_pattern = "([A-Za-z]+)([A-Za-z]+)",
values_to="quantity") %>%
data.frame()
最终数据框应该包括4列:
year stage stock quantity
1991 rec recStk 1635970
1992 rec recStk 1269210
1993 rec recStk 927557
1994 rec recStk 1312280
1995 rec recStk 1387800
1996 rec recStk 1772490
1991 tegg teggStk 419648000
1992 tegg teggStk 411314000
1993 tegg teggStk 445784000
1994 tegg teggStk 434684000
1995 tegg teggStk 364483000
1996 tegg teggStk 686490000
1991 tadult tadultStk 84009.2
1992 tadult tadultStk 58228.6
1993 tadult tadultStk 65226.2
1994 tadult tadultStk 44371.9
1995 tadult tadultStk 65555.1
1996 tadult tadultStk 79610.6
1991 tadult tadultStk 84009.2
1992 tadult tadultStk 58228.6
1993 tadult tadultStk 65226.2
1994 tadult tadultStk 44371.9
1995 tadult tadultStk 65555.1
1996 tadult tadultStk 79610.6
1991 teggs teggsStk 419648000
1992 teggs teggsStk 411314000
1993 teggs teggsStk 445784000
1994 teggs teggsStk 434684000
1995 teggs teggsStk 364483000
1996 teggs teggsStk 686490000
1991 rec recStk 1635970
1992 rec recStk 1269210
1993 rec recStk 927557
1994 rec recStk 1312280
1995 rec recStk 1387800
1996 rec recStk 1772490
英文:
I am trying to use pivot_longer() to create two variables instead of one. My two variable
names should be "stage" and "stock" with their corresponding values. The "stage" column should
include rec, tadult and teggs. The "stock" column should have recStk, teggStk and tadultStk.
I have tried names_pattern below with a dummy example but I obviously don't know how to use patterns.
Could someone help with this? Thanks beforehand.
pivot_test <- structure(list(year = 1991:1996, rec = c(1635970, 1269210, 927557,
1312280, 1387800, 1772490), teggStk = c(419648000, 411314000,
445784000, 434684000, 364483000, 686490000), tadultStk = c(84009.2,
58228.6, 65226.2, 44371.9, 65555.1, 79610.6), tadult = c(84009.2,
58228.6, 65226.2, 44371.9, 65555.1, 79610.6), teggs = c(419648000,
411314000, 445784000, 434684000, 364483000, 686490000), recStk = c(1635970,
1269210, 927557, 1312280, 1387800, 1772490)), row.names = c(NA,
6L), class = "data.frame")
year rec teggStk tadultStk tadult teggs recStk
1 1991 1635970 419648000 84009.2 84009.2 419648000 1635970
2 1992 1269210 411314000 58228.6 58228.6 411314000 1269210
3 1993 927557 445784000 65226.2 65226.2 445784000 927557
4 1994 1312280 434684000 44371.9 44371.9 434684000 1312280
5 1995 1387800 364483000 65555.1 65555.1 364483000 1387800
6 1996 1772490 686490000 79610.6 79610.6 686490000 1772490
annual21atest <- pivot_test %>% pivot_longer(-c("year"),
names_to=c("stage","stock"),
names_pattern = "([A-Za-z]+)([A-Za-z]+)",
values_to="quantity") %>% data.frame()
The final dataframe should have 4 columns:
year stage stock quantity
1991 rec recStk 00000
1992 tadult tadultStk 00000
1993 teggs teggsStk 0000
...............................
答案1
得分: 1
以下是代码的翻译部分:
这里有一个可能的解决方案。问题有点复杂,因为您有舞台和库存的重复值,`pivot_longer` 不知道如何自动折叠它们。相反,我们首先删除重复值,然后进行一次调用:
pivot_test %>%
select(year, ends_with("Stk")) %>%
pivot_longer(-year, names_to = "stock", values_to = "quantity") %>%
mutate(stage=str_remove(stock, "Stk")) %>%
select(year, stage, stock, quantity)
这将返回:
# A tibble: 18 × 4
year stage stock quantity
<int> <chr> <chr> <dbl>
1 1991 tegg teggStk 419648000
2 1991 tadult tadultStk 84009.
3 1991 rec recStk 1635970
4 1992 tegg teggStk 411314000
5 1992 tadult tadultStk 58229.
6 1992 rec recStk 1269210
7 1993 tegg teggStk 445784000
8 1993 tadult tadultStk 65226.
9 1993 rec recStk 927557
10 1994 tegg teggStk 434684000
11 1994 tadult tadultStk 44372.
12 1994 rec recStk 1312280
13 1995 tegg teggStk 364483000
14 1995 tadult tadultStk 65555.
15 1995 rec recStk 1387800
16 1996 tegg teggStk 686490000
17 1996 tadult tadultStk 79611.
18 1996 rec recStk 1772490
英文:
Here's one possible solution. The problem gets a little funky because you've got the duplicated values for stage and stock which pivot_longer
doesn't know how to automatically collapse. Instead, we drop the duplicated values to start with and then it's a single call:
pivot_test %>%
select(year, ends_with("Stk")) %>%
pivot_longer(-year, names_to = "stock", values_to = "quantity") %>%
mutate(stage=str_remove(stock, "Stk")) %>%
select(year, stage, stock, quantity)
which returns
# A tibble: 18 × 4
year stage stock quantity
<int> <chr> <chr> <dbl>
1 1991 tegg teggStk 419648000
2 1991 tadult tadultStk 84009.
3 1991 rec recStk 1635970
4 1992 tegg teggStk 411314000
5 1992 tadult tadultStk 58229.
6 1992 rec recStk 1269210
7 1993 tegg teggStk 445784000
8 1993 tadult tadultStk 65226.
9 1993 rec recStk 927557
10 1994 tegg teggStk 434684000
11 1994 tadult tadultStk 44372.
12 1994 rec recStk 1312280
13 1995 tegg teggStk 364483000
14 1995 tadult tadultStk 65555.
15 1995 rec recStk 1387800
16 1996 tegg teggStk 686490000
17 1996 tadult tadultStk 79611.
18 1996 rec recStk 1772490
答案2
得分: 1
主要挑战是获取正确的命名模式:
library(tidyr)
library(dplyr)
pivot_test %>%
pivot_longer(cols = -year,
names_to = c("stage", "stock"),
names_pattern = "(.*)(Stk)",
values_to = "quantity") %>%
na.omit() %>%
mutate(stock = paste0(stage, stock))
year stage stock quantity
<int> <chr> <chr> <dbl>
1 1991 tegg teggStk 419648000
2 1991 tadult tadultStk 84009.
3 1991 rec recStk 1635970
4 1992 tegg teggStk 411314000
5 1992 tadult tadultStk 58229.
6 1992 rec recStk 1269210
7 1993 tegg teggStk 445784000
8 1993 tadult tadultStk 65226.
9 1993 rec recStk 927557
10 1994 tegg teggStk 434684000
11 1994 tadult tadultStk 44372.
12 1994 rec recStk 1312280
13 1995 tegg teggStk 364483000
14 1995 tadult tadultStk 65555.
15 1995 rec recStk 1387800
16 1996 tegg teggStk 686490000
17 1996 tadult tadultStk 79611.
18 1996 rec recStk 1772490
英文:
The main challenge is to get the correct names pattern:
library(tidyr)
library(dplyr)
pivot_test %>%
pivot_longer(cols = -year,
names_to = c("stage", "stock"),
names_pattern = "(.*)(Stk)",
values_to = "quantity") %>%
na.omit() %>%
mutate(stock = paste0(stage, stock))
year stage stock quantity
<int> <chr> <chr> <dbl>
1 1991 tegg teggStk 419648000
2 1991 tadult tadultStk 84009.
3 1991 rec recStk 1635970
4 1992 tegg teggStk 411314000
5 1992 tadult tadultStk 58229.
6 1992 rec recStk 1269210
7 1993 tegg teggStk 445784000
8 1993 tadult tadultStk 65226.
9 1993 rec recStk 927557
10 1994 tegg teggStk 434684000
11 1994 tadult tadultStk 44372.
12 1994 rec recStk 1312280
13 1995 tegg teggStk 364483000
14 1995 tadult tadultStk 65555.
15 1995 rec recStk 1387800
16 1996 tegg teggStk 686490000
17 1996 tadult tadultStk 79611.
18 1996 rec recStk 1772490
答案3
得分: 0
为什么不将对 'stock' 的 `tidyr::pivot_longer()` 调用嵌套到对 'stage' 的另一个 `tidyr::pivot_longer()` 调用中,反之亦然?我觉得这样看起来很整洁:
pivot_test <- structure(list(
year = 1991:1996,
rec = c(1635970, 1269210, 927557, 1312280, 1387800, 1772490),
teggStk = c(419648000, 411314000, 445784000, 434684000, 364483000, 686490000),
tadultStk = c(84009.2, 58228.6, 65226.2, 44371.9, 65555.1, 79610.6),
tadult = c(84009.2, 58228.6, 65226.2, 44371.9, 65555.1, 79610.6),
teggs = c(419648000, 411314000, 445784000, 434684000, 364483000, 686490000),
recStk = c(1635970, 1269210, 927557, 1312280, 1387800, 1772490)),
row.names = c(NA, 6L), class = 'data.frame')
annual21atest <-
tidyr::pivot_longer(pivot_test,
-c('year', 'recStk', 'teggStk', 'tadultStk'),
names_to= 'stage',
values_to = 'quantity') |
tidyr::pivot_longer(-c('year', 'stage'),
names_to= 'stock',
values_to = 'quantity') |
as.data.frame()
结果就是你想要的:
year stage stock quantity
1 1991 rec teggStk 419648000.0
2 1991 rec tadultStk 84009.2
3 1991 rec recStk 1635970.0
4 1991 rec quantity 1635970.0
5 1991 tadult teggStk 419648000.0
6 1991 tadult tadultStk 84009.2
7 1991 tadult recStk 1635970.0
8 1991 tadult quantity 84009.2
9 1991 teggs teggStk 419648000.0
10 1991 teggs tadultStk 84009.2
你也可以为了简洁性将函数封装成这样:
twostage.pivot_longer <- function(x, aggregated_names, aggregated_values,
skipped_columns){
# 第一次聚合
y <- tidyr::pivot_longer(x,
-c(skipped_columns[[1]]),
names_to= aggregated_names[1],
values_to = aggregated_values)
# 第二次聚合
y <- tidyr::pivot_longer(y,
-c(skipped_columns[[2]], aggregated_names[1]),
names_to= aggregated_names[2],
values_to = aggregated_values)
return(y)
}
然后像这样使用它:
annual21atest_fun <- twostage.pivot_longer(x = pivot_test,
aggregated_names = c('stage', 'stock'),
aggregated_values = 'quantity',
skipped_columns = list(
c('year', 'recStk', 'teggStk', 'tadultStk'),
'year'
))
<details>
<summary>英文:</summary>
Why don't you just pipe a call to `tidyr::pivot_longer()` for 'stock' into another call to `tidyr::pivot_longer()` for 'stage' or vice versa? It looks neat enough to me:
pivot_test <- structure(list(
year = 1991:1996,
rec = c(1635970, 1269210, 927557, 1312280, 1387800, 1772490),
teggStk = c(419648000, 411314000, 445784000, 434684000, 364483000, 686490000),
tadultStk = c(84009.2, 58228.6, 65226.2, 44371.9, 65555.1, 79610.6),
tadult = c(84009.2, 58228.6, 65226.2, 44371.9, 65555.1, 79610.6),
teggs = c(419648000, 411314000, 445784000, 434684000, 364483000, 686490000),
recStk = c(1635970, 1269210, 927557, 1312280, 1387800, 1772490)),
row.names = c(NA, 6L), class = 'data.frame')
annual21atest <-
tidyr::pivot_longer(pivot_test,
-c('year', 'recStk', 'teggStk', 'tadultStk'),
names_to= 'stage',
values_to = 'quantity')|>
tidyr::pivot_longer(-c('year', 'stage'),
names_to= 'stock',
values_to = 'quantity')|>
as.data.frame()
The result is what you asked for:
year stage stock quantity
1 1991 rec teggStk 419648000.0
2 1991 rec tadultStk 84009.2
3 1991 rec recStk 1635970.0
4 1991 rec quantity 1635970.0
5 1991 tadult teggStk 419648000.0
6 1991 tadult tadultStk 84009.2
7 1991 tadult recStk 1635970.0
8 1991 tadult quantity 84009.2
9 1991 teggs teggStk 419648000.0
10 1991 teggs tadultStk 84009.2
And you can also wrap the function like this for brevity:
twostage.pivot_longer <- function(x, aggregated_names, aggregated_values,
skipped_columns){
# First aggregation
y <- tidyr::pivot_longer(x,
-c(skipped_columns[[1]]),
names_to= aggregated_names[1],
values_to = aggregated_values)
# Second aggregation
y <- tidyr::pivot_longer(y,
-c(skipped_columns[[2]], aggregated_names[1]),
names_to= aggregated_names[2],
values_to = aggregated_values)
return(y)
}
and then use it like here:
annual21atest_fun <- twostage.pivot_longer(x = pivot_test,
aggregated_names = c('stage', 'stock'),
aggregated_values = 'quantity',
skipped_columns = list(
c('year', 'recStk', 'teggStk', 'tadultStk'),
'year'
))
</details>
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论