Reshaping a dataframe from wide to long format in R using multiple sets of variables

Question

I have a dataset in wide format with participants' information across multiple waves of a survey: their country and gender, plus, for each wave, whether they participated, the year of the interview, and their age at interview.

Here is a sample of the information of three participants:

# Dataset
df <- data.frame(
  id = c(1, 2, 3),
  country = c("UK", "Spain", "Sweden"),
  gender = c(1, 1, 2),
  interview_w1 = c(1, 2, 2),
  interview_w2 = c(2, 2, 2),
  interview_w3 = c(1, 1, 1),
  int_year_w1 = c(2007, 2008, 2007),
  int_year_w2 = c(2010, 2009, 2010),
  int_year_w3 = c(2012, 2012, 2013),
  age_int_w1 = c(60, 40, 50),
  age_int_w2 = c(63, 41, 53),
  age_int_w3 = c(65, 44, 56)
)

I want to convert this dataset to long format using the pivot_longer() function in R. However, I am having difficulty achieving the desired result. Specifically, I want to pivot the columns starting with 'interview_', 'int_year_' and 'age_int_'.

Here is a table showing the desired result:

     id country gender wave  interview  year   age
  <dbl> <chr>    <dbl> <chr>     <dbl> <dbl> <dbl>
1     1 UK           1 w1            1  2007    60
2     2 Spain        1 w1            2  2008    40
3     3 Sweden       2 w1            2  2007    50
4     1 UK           1 w2            2  2010    63
5     2 Spain        1 w2            2  2009    41
6     3 Sweden       2 w2            2  2010    53
7     1 UK           1 w3            1  2012    65
8     2 Spain        1 w3            1  2012    44
9     3 Sweden       2 w3            1  2013    56

Can someone please provide guidance on how to do that?

I tried using the names_to and names_pattern arguments in pivot_longer() without success as I don't fully understand how they work.

Answer 1

Score: 2

You can do this as follows:

> library(tidyr)
> library(dplyr)  # arrange() comes from dplyr
> pivot_longer(df, cols=-c(id,country,gender),
               names_to=c(".value", "wave"),
               names_pattern="(.*)_(w.)") %>%
    arrange(wave)
# A tibble: 9 × 7
     id country gender wave  interview int_year age_int
  <dbl> <chr>    <dbl> <chr>     <dbl>    <dbl>   <dbl>
1     1 UK           1 w1            1     2007      60
2     2 Spain        1 w1            2     2008      40
3     3 Sweden       2 w1            2     2007      50
4     1 UK           1 w2            2     2010      63
5     2 Spain        1 w2            2     2009      41
6     3 Sweden       2 w2            2     2010      53
7     1 UK           1 w3            1     2012      65
8     2 Spain        1 w3            1     2012      44
9     3 Sweden       2 w3            1     2013      56
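
Since the question mentions not fully understanding names_to and names_pattern, here is a rough sketch of what the call above is doing. Each parenthesised group in names_pattern captures one piece of a column name, and names_to says where each piece goes: the special ".value" entry means "use this piece as the name of an output column", while "wave" becomes an ordinary column holding the second piece. The rename() step and the object name `long` are additions for illustration only; they are not part of the original answer and are included just to get the `year` and `age` column names shown in the question's desired table.

```r
library(tidyr)
library(dplyr)

# Sample data copied from the question
df <- data.frame(
  id = c(1, 2, 3),
  country = c("UK", "Spain", "Sweden"),
  gender = c(1, 1, 2),
  interview_w1 = c(1, 2, 2),
  interview_w2 = c(2, 2, 2),
  interview_w3 = c(1, 1, 1),
  int_year_w1 = c(2007, 2008, 2007),
  int_year_w2 = c(2010, 2009, 2010),
  int_year_w3 = c(2012, 2012, 2013),
  age_int_w1 = c(60, 40, 50),
  age_int_w2 = c(63, 41, 53),
  age_int_w3 = c(65, 44, 56)
)

# Base-R look at how the pattern splits a single column name:
# element 2 is the first capture group (the variable),
# element 3 is the second capture group (the wave).
parts <- regmatches("int_year_w2", regexec("(.*)_(w.)", "int_year_w2"))[[1]]

long <- df %>%
  pivot_longer(
    cols = -c(id, country, gender),   # pivot everything except the id columns
    names_to = c(".value", "wave"),   # group 1 -> output column name, group 2 -> wave
    names_pattern = "(.*)_(w.)"
  ) %>%
  # Illustrative addition: match the column names in the desired table
  rename(year = int_year, age = age_int) %>%
  arrange(wave, id)
```

One caveat on the pattern: `(w.)` matches `w` followed by exactly one character, so it would need adjusting (for example to `(w\\d+)`) if the data ever had ten or more waves.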

huangapple
  • Posted on May 10, 2023 at 12:04:00
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76214813.html