Extract variable name into a column and Create Long format data
Question
I am analyzing a dataset obtained from a survey of fish consumption. The data look like this:
Harbor <- c("Harbor1","Harbor1","Harbor2","Harbor2")
dfish01_correctname_s1 <- c("Anchovy", "Croaker","Anchovy","Mackerel")
dfish01_qty_s1 <- c(3,NA,24,2)
dfish02_correctname_s1 <- c("Croaker", "Herring","Shrimp","False trevally")
dfish02_qty_s1 <- c(22, 12, 3, NA)
dfish03_correctname_s1 <- c("Anchovy", "Croaker","Anchovy","Mackerel")
dfish03_qty_s1 <- c(10, 9, 5, 7)
fishData <- data.frame(Harbor, dfish01_correctname_s1,
dfish01_qty_s1, dfish02_correctname_s1, dfish02_qty_s1,
dfish03_correctname_s1, dfish03_qty_s1)
In the above data, the variables are named as follows:
Harbor contains the names of the places where the surveys were conducted.
dfish01, dfish02, and dfish03 denote the fish options chosen by the survey respondents.
correctname is the name of the fish species.
qty is the quantity of fish reported by the respondents.
s1 indicates that the data come from the first round of the survey (there are two more rounds).
I want to convert this data to long format and extract the fish option number into a separate column named FishOptionNum. Any help is appreciated. The expected output was shown as an image in the original post.
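To make the naming scheme described above concrete, the column names can be split into their parts with a regular expression. This is only an illustrative sketch using base R's utils::strcapture(); the names col_names, parts, Variable, and Round are assumptions introduced here and do not appear in the original post.

```r
# Column names copied from the question's data
col_names <- c("dfish01_correctname_s1", "dfish01_qty_s1",
               "dfish02_correctname_s1", "dfish02_qty_s1",
               "dfish03_correctname_s1", "dfish03_qty_s1")

# Split each name into option number, variable, and survey round.
# The prototype data frame gives each captured group a name and a type
# (Variable and Round are hypothetical labels chosen for this sketch).
parts <- utils::strcapture(
  pattern = "^dfish([0-9]+)_(.+)_s([0-9]+)$",
  x = col_names,
  proto = data.frame(FishOptionNum = integer(),
                     Variable = character(),
                     Round = integer())
)
parts
```

Each row of parts corresponds to one column of fishData, which is the same decomposition a reshaping function has to perform on the column names.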
Answer 1
Score: 3
With tidyr::pivot_longer(), using names_pattern and names_transform to extract the fish option number from the column names:
library(tidyr)
library(dplyr)
fishData %>%
  pivot_longer(
    dfish01_correctname_s1:dfish03_qty_s1,
    names_to = c("FishOptionNum", ".value"),
    names_pattern = "dfish(\\d+)_(.+)_",
    names_transform = list(FishOptionNum = as.numeric)
  ) %>%
  select(Harbor, fishname = correctname, quantity = qty, FishOptionNum)
# A tibble: 12 × 4
#    Harbor  fishname       quantity FishOptionNum
#    <chr>   <chr>             <dbl>         <dbl>
#  1 Harbor1 Anchovy               3             1
#  2 Harbor1 Croaker              22             2
#  3 Harbor1 Anchovy              10             3
#  4 Harbor1 Croaker              NA             1
#  5 Harbor1 Herring              12             2
#  6 Harbor1 Croaker               9             3
#  7 Harbor2 Anchovy              24             1
#  8 Harbor2 Shrimp                3             2
#  9 Harbor2 Anchovy               5             3
# 10 Harbor2 Mackerel              2             1
# 11 Harbor2 False trevally       NA             2
# 12 Harbor2 Mackerel              7             3
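Since the question mentions two more survey rounds, the same idea could plausibly be extended to capture the round from the s1 suffix as well. The following is a minimal sketch under that assumption, not part of the original answer: the column name Round, the object name long, and the anchored pattern are introduced here for illustration, and only the first-round data from the question is used.

```r
library(tidyr)

# Data copied from the question
fishData <- data.frame(
  Harbor = c("Harbor1", "Harbor1", "Harbor2", "Harbor2"),
  dfish01_correctname_s1 = c("Anchovy", "Croaker", "Anchovy", "Mackerel"),
  dfish01_qty_s1 = c(3, NA, 24, 2),
  dfish02_correctname_s1 = c("Croaker", "Herring", "Shrimp", "False trevally"),
  dfish02_qty_s1 = c(22, 12, 3, NA),
  dfish03_correctname_s1 = c("Anchovy", "Croaker", "Anchovy", "Mackerel"),
  dfish03_qty_s1 = c(10, 9, 5, 7)
)

# Three capture groups: option number, variable name (.value), survey round.
# "Round" is a hypothetical column name chosen for this sketch.
long <- pivot_longer(
  fishData,
  cols = starts_with("dfish"),
  names_to = c("FishOptionNum", ".value", "Round"),
  names_pattern = "^dfish(\\d+)_(.+)_s(\\d+)$",
  names_transform = list(FishOptionNum = as.integer, Round = as.integer)
)
long
```

With columns for further rounds (for example names ending in _s2, assuming they follow the same scheme), the pattern should place them in additional rows with the corresponding Round value.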
Answer 2
Score: 1
With `data.table`:
library(data.table)
setDT(fishData)  # convert the data frame to a data.table by reference
melt(
  fishData,
  measure.vars = patterns(fishname = 'dfish[0-9]{2}_correctname', quantity = '^dfish[0-9]{2}_q'),
  variable.name = "FishOptionNum"
)[order(Harbor)]
#      Harbor FishOptionNum       fishname quantity
#      <char>        <fctr>         <char>    <num>
#  1: Harbor1             1        Anchovy        3
#  2: Harbor1             1        Croaker       NA
#  3: Harbor1             2        Croaker       22
#  4: Harbor1             2        Herring       12
#  5: Harbor1             3        Anchovy       10
#  6: Harbor1             3        Croaker        9
#  7: Harbor2             1        Anchovy       24
#  8: Harbor2             1       Mackerel        2
#  9: Harbor2             2         Shrimp        3
# 10: Harbor2             2 False trevally       NA
# 11: Harbor2             3        Anchovy        5
# 12: Harbor2             3       Mackerel        7
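Note that FishOptionNum comes back as a factor in the output above, whereas the tidyr answer returns a number. If an integer column is preferred, one possible follow-up is to convert it after melting. This is a sketch rather than part of the original answer; the object names fishDT and long are assumptions introduced here.

```r
library(data.table)

# Data copied from the question, built directly as a data.table
fishDT <- data.table(
  Harbor = c("Harbor1", "Harbor1", "Harbor2", "Harbor2"),
  dfish01_correctname_s1 = c("Anchovy", "Croaker", "Anchovy", "Mackerel"),
  dfish01_qty_s1 = c(3, NA, 24, 2),
  dfish02_correctname_s1 = c("Croaker", "Herring", "Shrimp", "False trevally"),
  dfish02_qty_s1 = c(22, 12, 3, NA),
  dfish03_correctname_s1 = c("Anchovy", "Croaker", "Anchovy", "Mackerel"),
  dfish03_qty_s1 = c(10, 9, 5, 7)
)

long <- melt(
  fishDT,
  measure.vars = patterns(fishname = "dfish[0-9]{2}_correctname",
                          quantity = "^dfish[0-9]{2}_q"),
  variable.name = "FishOptionNum"
)

# The factor levels index the matched columns in order (1, 2, 3),
# so as.integer() on the factor yields that position.
long[, FishOptionNum := as.integer(FishOptionNum)]
long[order(Harbor)]
```

The value is the position of the matched column within each pattern, not a number parsed from the column name. It coincides with the option number here only because the columns are numbered consecutively from 01.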