提取变量名到一列并创建长格式数据

huangapple go评论57阅读模式
英文:

Extract variable name into a column and Create Long format data

问题

我正在分析从鱼类消费调查中获得的数据。数据如下所示:

Harbor <- c("Harbor1","Harbor1","Harbor2","Harbor2")
dfish01_correctname_s1 <- c("Anchovy", "Croaker","Anchovy","Mackerel")
dfish01_qty_s1 <- c(3,NA,24,2)
dfish02_correctname_s1 <- c("Croaker", "Herring","Shrimp","False trevally")
dfish02_qty_s1 <- c(22, 12, 3, NA)
dfish03_correctname_s1 <- c("Anchovy", "Croaker","Anchovy","Mackerel")
dfish03_qty_s1 <- c(10, 9, 5, 7)

fishData <- data.frame(Harbor, dfish01_correctname_s1, 
dfish01_qty_s1, dfish02_correctname_s1, dfish02_qty_s1, 
dfish03_correctname_s1, dfish03_qty_s1)

在上述数据中,变量命名如下:

  1. Harbor 包含进行调查的地点的名称
  2. dfish01dfish02dfish03 表示调查受访者选择的鱼类选项。correctname 是鱼类物种的名称。
  3. qty 是受访者报告的鱼类数量。
  4. s1 表示这些数据来自调查的第一轮(还有两轮)。

我想将这些数据转换为长格式,并将鱼类选项编号提取到一个名为 FishOptionNum 的单独列中。感谢任何帮助。期望的输出如下图所示:

英文:

I am analyzing a dataset obtained from a survey of fish consumption. The data look like this:

Harbor &lt;- c(&quot;Harbor1&quot;,&quot;Harbor1&quot;,&quot;Harbor2&quot;,&quot;Harbor2&quot;)
dfish01_correctname_s1 &lt;- c(&quot;Anchovy&quot;, &quot;Croaker&quot;,&quot;Anchovy&quot;,&quot;Mackerel&quot;)
dfish01_qty_s1 &lt;- c(3,NA,24,2)
dfish02_correctname_s1 &lt;- c(&quot;Croaker&quot;, &quot;Herring&quot;,&quot;Shrimp&quot;,&quot;False trevally&quot;)
dfish02_qty_s1 &lt;- c(22, 12, 3, NA)
dfish03_correctname_s1 &lt;- c(&quot;Anchovy&quot;, &quot;Croaker&quot;,&quot;Anchovy&quot;,&quot;Mackerel&quot;)
dfish03_qty_s1 &lt;- c(10, 9, 5, 7)

fishData &lt;- data.frame(Harbor, dfish01_correctname_s1, 
dfish01_qty_s1, dfish02_correctname_s1, dfish02_qty_s1, 
dfish03_correctname_s1, dfish03_qty_s1)

提取变量名到一列并创建长格式数据

In the above data, the variables are named as follows:

  1. Harbor contains the names of the places where surveys were conducted
  2. dfish01, dfish02, and dfish03 denote the fish options chosen by the survey respondents. correctname is the name of the fish species.
  3. qty is the quantity of fish reported by the respondents.
  4. s1 indicates that this data is from the first round of the survey (There are two more rounds).

I want to put this data into the long format, and I want to extract the fish option number into a separate column named FishOptionNum. Any help is appreciated. Expected output is shown below:

提取变量名到一列并创建长格式数据

答案1

得分: 3

使用 tidyr::pivot_longer(),使用 names_patternnames_transform 从列名中提取鱼类选项编号:

library(tidyr)
library(dplyr)

fishData %>%
  pivot_longer(
    dfish01_correctname_s1:dfish03_qty_s1,
    names_to = c("FishOptionNum", ".value"),
    names_pattern = "dfish(\\d+)_(.+)_",
    names_transform = list(FishOptionNum = as.numeric)
  ) %>%
  select(Harbor, fishname = correctname, quantity = qty, FishOptionNum)
  

# A tibble: 12 × 4
   Harbor  fishname       quantity FishOptionNum
   <chr>   <chr>             <dbl>         <dbl>
 1 Harbor1 Anchovy               3             1
 2 Harbor1 Croaker              22             2
 3 Harbor1 Anchovy              10             3
 4 Harbor1 Croaker              NA             1
 5 Harbor1 Herring              12             2
 6 Harbor1 Croaker               9             3
 7 Harbor2 Anchovy              24             1
 8 Harbor2 Shrimp                3             2
 9 Harbor2 Anchovy               5             3
10 Harbor2 Macekrel              2             1
11 Harbor2 False trevally       NA             2
12 Harbor2 Mackerel              7             3
英文:

With tidyr::pivot_longer(), using names_pattern and names_transform to extract fish option number from column names:

library(tidyr)
library(dplyr)

fishData %&gt;%
  pivot_longer(
    dfish01_correctname_s1:dfish03_qty_s1,
    names_to = c(&quot;FishOptionNum&quot;, &quot;.value&quot;),
    names_pattern = &quot;dfish(\\d+)_(.+)_&quot;,
    names_transform = list(FishOptionNum = as.numeric)
  ) %&gt;%
  select(Harbor, fishname = correctname, quantity = qty, FishOptionNum)
  

# A tibble: 12 &#215; 4
   Harbor  fishname       quantity FishOptionNum
   &lt;chr&gt;   &lt;chr&gt;             &lt;dbl&gt;         &lt;dbl&gt;
 1 Harbor1 Anchovy               3             1
 2 Harbor1 Croaker              22             2
 3 Harbor1 Anchovy              10             3
 4 Harbor1 Croaker              NA             1
 5 Harbor1 Herring              12             2
 6 Harbor1 Croaker               9             3
 7 Harbor2 Anchovy              24             1
 8 Harbor2 Shrimp                3             2
 9 Harbor2 Anchovy               5             3
10 Harbor2 Macekrel              2             1
11 Harbor2 False trevally       NA             2
12 Harbor2 Mackerel              7             3

答案2

得分: 1

使用data.table

setDT(fishData)
melt(
  fishData, 
  measure.vars = patterns(fishname = 'dfish[0-9]{2}_correctname', quantity = '^dfish[0-9]{2}_q'), 
  variable.name = "FishOptionNum"
)[order(Harbor)]

Harbor FishOptionNum fishname quantity

1: Harbor1 1 Anchovy 3

2: Harbor1 1 Croaker NA

3: Harbor1 2 Croaker 22

4: Harbor1 2 Herring 12

5: Harbor1 3 Anchovy 10

6: Harbor1 3 Croaker 9

7: Harbor2 1 Anchovy 24

8: Harbor2 1 Macekrel 2

9: Harbor2 2 Shrimp 3

10: Harbor2 2 False trevally NA

11: Harbor2 3 Anchovy 5

12: Harbor2 3 Mackerel 7


<details>
<summary>英文:</summary>

With `data.table`:

    setDT(fishData)
    melt(
      fishData, 
      measure.vars = patterns(fishname = &#39;dfish[0-9]{2}_correctname&#39;, quantity = &#39;^dfish[0-9]{2}_q&#39;), 
      variable.name = &quot;FishOptionNum&quot;
    )[order(Harbor)]


    #      Harbor FishOptionNum       fishname quantity
    #      &lt;char&gt;        &lt;fctr&gt;         &lt;char&gt;    &lt;num&gt;
    #  1: Harbor1             1        Anchovy        3
    #  2: Harbor1             1        Croaker       NA
    #  3: Harbor1             2        Croaker       22
    #  4: Harbor1             2        Herring       12
    #  5: Harbor1             3        Anchovy       10
    #  6: Harbor1             3        Croaker        9
    #  7: Harbor2             1        Anchovy       24
    #  8: Harbor2             1       Macekrel        2
    #  9: Harbor2             2         Shrimp        3
    # 10: Harbor2             2 False trevally       NA
    # 11: Harbor2             3        Anchovy        5
    # 12: Harbor2             3       Mackerel        7



</details>



huangapple
  • 本文由 发表于 2023年2月23日 21:55:51
  • 转载请务必保留本文链接:https://go.coder-hub.com/75545760.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定