How do I reshape into long format in R when I have multiple 'varying' variables?
Question
I am working with a dataset in wide format that I would like to transform to a long format for statistical analyses (linear models). However, I am stuck because I have multiple variables that 'change' (for lack of a better word) or are 'varying' variables.
Let me try to explain using some mock-data:
pptns = id of participants
educ = education level
I had two conditions (exp and CTRL)
In those two conditions, I measured an area using 3 different tools (1, 2 and 3) and at two different timepoints (before and after), so for each participant there was a total of 12 different area calculations
I've also collected other data, e.g. heart rate (HR) in both conditions and at three different timepoints (before, during, after)
Now my question is, how do I reshape a wide data frame that has condition, tool, timepoint_of_area, and timepoint_of_HR, as variables that 'vary' across participants?
I am also struggling to imagine what this long dataset would look like, but I guess it would be like this:
I've tried to reshape one variable at a time, so that I can reshape the wide dataset into a long one step-by-step. E.g. first reshape Condition like this:
and then for example tool.
However, I don't know how to code that in R. This was my attempt:
long_1 <- reshape(wide, direction = 'long',
                  varying = c('area_exp_1_before', 'area_exp_2_before', 'area_exp_3_before',
                              'area_exp_1_after', 'area_exp_2_after', 'area_exp_3_after',
                              'area_CTRL_1_before', 'area_CTRL_2_before', 'area_3_Brush_before',
                              'area_CTRL_1_after', 'area_CTRL_2_after', 'area_3_Brush_after'),
                  timevar = 'Condition',
                  times = c('exp', 'CTRL'),
                  v.names = c('1_before', '2_before', '3_before', '1_after', '2_after', '3_after'),
                  idvar = "pptns",
                  ids = 1:nrow(all_data))
but this returns an error message.
I know that there are at least three Stack Overflow topics that are similar, that recommend the use of e.g. patterns or names.to. However, when I try to use these functions, I get lost in how to transform some variables, while leaving others.
I'm sure I'm doing something wrong using reshape but I don't know what :`)
Any help would be tremendously appreciated. Thank you a lot in advance!
Answer 1
Score: 0
<details>
<summary>English:</summary>
Since no data was provided, I created some example data. It is probably not exactly the same as your dataset, but I hope it will help you understand.
It is easier for me to create the example data in long format and then turn it into wide format than to create the wide format first (maybe it is also easier to collect data in long format?).
# Create example data
library(tidyr)
set.seed(123) # so that HR and age get the same values on every full run
# n = number of participants
n <- 2
l <- list(
  pptn = 1:n,
  condition = c("exp", "CTRL"),
  time_condition = c("before", "after"),
  tool = c(1, 2, 3)
)
# Expand this list with extra measurement vars if you want
measurement_vars <- names(l)[-1] # -1 to exclude pptn
# Create a data.frame with every possible combination
df <- expand.grid(l, stringsAsFactors = FALSE)
names(df) <- names(l)
# HR differs per combination (thus for every record in df)
df$HR <- sample(40:120, nrow(df)) # random data
# Create wide format; the new column names look like exp_before_1
# all_of() makes it explicit that measurement_vars is a character vector of column names
df_wide <- pivot_wider(df, names_from = all_of(measurement_vars), values_from = "HR")
# Add an age for every pptn (thus for every record of df_wide)
df_wide$age <- sample(18:65, nrow(df_wide)) # random data
# Add height and weight in the same way if you want
Now that we have the df_wide example data, you can change it (back) to long format.
df_long <- pivot_longer(df_wide,
                        cols = -c("pptn", "age"),
                        names_sep = "_",
                        names_to = measurement_vars,
                        values_to = "HR")
`measurement_vars` is kept flexible, so the list can grow. However, if you add another participant-level column to df_wide in the same way as age (height, for instance), you also need to add it to the columns excluded in the cols argument of pivot_longer.
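As a minimal sketch of that last point, the example below excludes a height column from the pivot alongside pptn and age. The tiny data frame and its values are made up for illustration, and only two HR columns are included to keep it short.

```r
library(tidyr)

# Hypothetical wide data: HR columns named <condition>_<time_condition>_<tool>,
# plus the participant-level columns age and height (all values are made up)
df_wide <- data.frame(
  pptn = 1:2,
  exp_before_1 = c(60, 62),
  exp_after_1 = c(75, 71),
  age = c(30, 41),
  height = c(170, 182)
)

df_long <- pivot_longer(
  df_wide,
  cols = -c("pptn", "age", "height"), # columns that must stay as they are
  names_sep = "_",
  names_to = c("condition", "time_condition", "tool"),
  values_to = "HR"
)
```

Each participant ends up with one row per HR column, and pptn, age and height are simply repeated on those rows.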
I hope this helps you figure things out.
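For column names like those in the question (area_<condition>_<tool>_<time>, plus heart-rate columns that I am assuming are named HR_<condition>_<time>), one possible approach is to pivot each family of columns separately, because they split into a different number of parts. This is only a sketch: the `wide` data frame below, its values, the HR column names, and the result names `area_long` and `hr_long` are assumptions for illustration, not your real data.

```r
library(tidyr)

# Hypothetical wide data following the question's naming scheme (made-up values)
wide <- data.frame(
  pptns = 1:2,
  educ = c("low", "high"),
  area_exp_1_before = c(10, 11),
  area_exp_1_after = c(12, 13),
  area_CTRL_1_before = c(14, 15),
  area_CTRL_1_after = c(16, 17),
  HR_exp_before = c(60, 61),
  HR_exp_during = c(70, 71),
  HR_exp_after = c(65, 66)
)

# Area columns: three name parts remain after dropping the "area_" prefix
area_long <- pivot_longer(
  wide[, !startsWith(names(wide), "HR_")], # leave the HR columns out
  cols = starts_with("area_"),
  names_prefix = "area_",
  names_sep = "_",
  names_to = c("condition", "tool", "time"),
  values_to = "area"
)

# HR columns: two name parts remain after dropping the "HR_" prefix
hr_long <- pivot_longer(
  wide[, !startsWith(names(wide), "area_")], # leave the area columns out
  cols = starts_with("HR_"),
  names_prefix = "HR_",
  names_sep = "_",
  names_to = c("condition", "time"),
  values_to = "HR"
)
```

Because area and HR were measured at different time points (before/after versus before/during/after), it may be simpler to keep the two long tables separate for modelling, or to join them on pptns, condition and time only where that makes sense for your analysis.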
</details>