Reshaping a dataframe from wide to long format in R using multiple sets of variables
Question
I have a wide-format dataset with information on participants across several waves of a survey: their country and gender and, for each wave, whether they participated, the interview year, and their age at interview.
Here is a sample with three participants:
# Dataset
df <- data.frame(
  id = c(1, 2, 3),
  country = c("UK", "Spain", "Sweden"),
  gender = c(1, 1, 2),
  interview_w1 = c(1, 2, 2),
  interview_w2 = c(2, 2, 2),
  interview_w3 = c(1, 1, 1),
  int_year_w1 = c(2007, 2008, 2007),
  int_year_w2 = c(2010, 2009, 2010),
  int_year_w3 = c(2012, 2012, 2013),
  age_int_w1 = c(60, 40, 50),
  age_int_w2 = c(63, 41, 53),
  age_int_w3 = c(65, 44, 56)
)
I want to convert this dataset to long format using the pivot_longer() function from the tidyr package. However, I am having difficulty achieving the desired result. Specifically, I want to pivot the columns starting with 'interview_', 'int_year_', and 'age_int_'.
Here is a table showing the desired result:
     id country gender wave  interview  year   age
  <dbl> <chr>    <dbl> <chr>     <dbl> <dbl> <dbl>
1     1 UK           1 w1            1  2007    60
2     2 Spain        1 w1            2  2008    40
3     3 Sweden       2 w1            2  2007    50
4     1 UK           1 w2            2  2010    63
5     2 Spain        1 w2            2  2009    41
6     3 Sweden       2 w2            2  2010    53
7     1 UK           1 w3            1  2012    65
8     2 Spain        1 w3            1  2012    44
9     3 Sweden       2 w3            1  2013    56
Can someone please provide guidance on how to do this?
I tried using the names_to and names_pattern arguments of pivot_longer() without success, as I don't fully understand how they work.
Answer 1
Score: 2
You can do this as follows:
> library(tidyr)
> library(dplyr)  # arrange() comes from dplyr, not tidyr
> pivot_longer(df, cols = -c(id, country, gender),
+              names_to = c(".value", "wave"),
+              names_pattern = "(.*)_(w.)") %>%
+   arrange(wave)
# A tibble: 9 × 7
     id country gender wave  interview int_year age_int
  <dbl> <chr>    <dbl> <chr>     <dbl>    <dbl>   <dbl>
1     1 UK           1 w1            1     2007      60
2     2 Spain        1 w1            2     2008      40
3     3 Sweden       2 w1            2     2007      50
4     1 UK           1 w2            2     2010      63
5     2 Spain        1 w2            2     2009      41
6     3 Sweden       2 w2            2     2010      53
7     1 UK           1 w3            1     2012      65
8     2 Spain        1 w3            1     2012      44
9     3 Sweden       2 w3            1     2013      56
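The output above names the last two columns int_year and age_int, whereas the desired table in the question uses year and age. Below is a minimal sketch of one way to close that gap. It assumes tidyr (1.0.0 or later, the release that introduced pivot_longer()) and dplyr are installed. It first uses base R's regexec() to show what the names_pattern regex captures from a few column names, then adds a dplyr rename() after the pivot. Because the first element of names_to is the special value ".value", the first captured group becomes the name of an output column instead of a value in a single "name" column, so interview, int_year and age_int end up side by side while the second group fills the new wave column. The object names nms, parts and long are illustrative only.

```r
library(tidyr)
library(dplyr)

# Same example data as in the question
df <- data.frame(
  id = c(1, 2, 3),
  country = c("UK", "Spain", "Sweden"),
  gender = c(1, 1, 2),
  interview_w1 = c(1, 2, 2),
  interview_w2 = c(2, 2, 2),
  interview_w3 = c(1, 1, 1),
  int_year_w1 = c(2007, 2008, 2007),
  int_year_w2 = c(2010, 2009, 2010),
  int_year_w3 = c(2012, 2012, 2013),
  age_int_w1 = c(60, 40, 50),
  age_int_w2 = c(63, 41, 53),
  age_int_w3 = c(65, 44, 56)
)

# What names_pattern captures: the greedy (.*) takes everything before the
# last "_w", and (w.) takes the wave label. regexec() applies the same regex.
nms <- c("interview_w1", "int_year_w2", "age_int_w3")
parts <- regmatches(nms, regexec("(.*)_(w.)", nms))
# Each element holds the full match followed by the two groups,
# e.g. parts[[2]] is c("int_year_w2", "int_year", "w2")

# ".value" turns group 1 into output column names; group 2 fills "wave".
long <- df %>%
  pivot_longer(cols = -c(id, country, gender),
               names_to = c(".value", "wave"),
               names_pattern = "(.*)_(w.)") %>%
  arrange(wave) %>%
  rename(year = int_year, age = age_int)  # match the names in the desired table

print(long)
```

Renaming after the pivot leaves the regex unchanged; renaming the wide columns beforehand (for example to year_w1 and age_w1) would work equally well with the same names_pattern.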