2023-05-18 00:19:08


How do I reshape into long format in R when I have multiple 'varying' variables?

Question

I am working with a dataset in wide format that I would like to transform to a long format for statistical analyses (linear models). However, I am stuck because I have multiple variables that 'change' (for lack of a better word) or are 'varying' variables.
Let me try to explain using some mock data:

[Image: the mock data in wide format (not preserved)]

pptns = id of participants
educ = education level

I had two conditions (exp and CTRL)

In those two conditions, I measured an area using 3 different tools (1, 2 and 3) at two different timepoints (before and after),
so for each participant there was a total of 12 different area calculations.

I've also collected other data, e.g. heart rate (HR) in both conditions and at three different timepoints (before, during, after)

Now my question is, how do I reshape a wide data frame that has condition, tool, timepoint_of_area, and timepoint_of_HR, as variables that 'vary' across participants?

I am also struggling to imagine what this long dataset would look like, but I guess it would be like this:

[Image: sketch of the expected long-format dataset (not preserved)]

I've tried to reshape one variable at a time, so that I can reshape the wide dataset into a long one step-by-step. E.g. first reshape Condition like this:

[Image: the data after reshaping by Condition only (not preserved)]

and then for example tool.

However, I don't know how to code that in R. This was my attempt:

    long_1 <- reshape(wide, direction = 'long',
                      varying = c('area_exp_1_before', 'area_exp_2_before', 'area_exp_3_before',
                                  'area_exp_1_after', 'area_exp_2_after', 'area_exp_3_after',
                                  'area_CTRL_1_before', 'area_CTRL_2_before', 'area_3_Brush_before',
                                  'area_CTRL_1_after', 'area_CTRL_2_after', 'area_3_Brush_after'),
                      timevar = 'Condition',
                      times = c('exp', 'CTRL'),
                      v.names = c('1_before', '2_before', '3_before', '1_after', '2_after', '3_after'),
                      idvar = "pptns",
                      ids = 1:nrow(all_data))

but this returns an error message.

I know there are at least three similar Stack Overflow topics that recommend using, e.g., patterns or names_to. However, when I try these functions, I get lost in how to transform some variables while leaving others untouched.

I'm sure I'm doing something wrong using reshape but I don't know what :`)

Any help would be tremendously appreciated. Thank you a lot in advance!

Answer 1

Score: 0

<details>
<summary>English:</summary>
Since no data was provided, I created some example data. It is probably not exactly the same as your dataset, but I hope it helps you understand the approach.
I find it easier to create example data in long format and then turn it into wide format than to create the wide format directly (maybe it is also easier to collect data in long format?).
    # Create example data
    library(tidyr)
    set.seed(123)  # so the df delivers the same HR and age values on every full run
    # n = number of participants
    n <- 2
    l <- list(
      pptn = 1:n,
      condition = c("exp", "CTRL"),
      time_condition = c("before", "after"),
      tool = c(1, 2, 3)
    )
    # Expand this list with extra measurement vars if you want
    measurement_vars <- names(l)[-1]  # -1 to exclude pptn
    # Create a data.frame with every possible combination
    df <- expand.grid(l, stringsAsFactors = FALSE)
    names(df) <- names(l)
    # HR differs per combination (thus for every record in df)
    df$HR <- sample(40:120, nrow(df))  # random data
    # Create wide format; the new columns are named like exp_before_1
    df_wide <- pivot_wider(df, names_from = all_of(measurement_vars), values_from = "HR")
    # Add an age for every pptn (thus for every record of df_wide)
    df_wide$age <- sample(18:65, nrow(df_wide))  # random data
    # Add height and weight in the same way if you want
And now that we have our df_wide example data you can change it (back) to long.
    df_long <- pivot_longer(df_wide, cols = -c("pptn", "age"), names_sep = "_", names_to = measurement_vars, values_to = "HR")
`measurement_vars` is kept flexible, but if you add, for instance, height to df_wide in the same way as age, you also need to add it to the columns excluded in the `cols` argument of `pivot_longer`.
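As a minimal sketch of that last point, the id-like columns can be collected in one vector and excluded together. The small `df_wide` built here, the `height` column, and the `id_cols` name are assumptions made for illustration, not part of the original data.

```r
library(tidyr)

# Hypothetical wide data: one row per participant, with id-like columns
# (age, height) and HR columns named <condition>_<time_condition>_<tool>.
df_wide <- data.frame(
  pptn = 1:2,
  age = c(30, 41),
  height = c(172, 185),
  exp_before_1 = c(60, 72),
  exp_after_1 = c(65, 80),
  CTRL_before_1 = c(58, 70),
  CTRL_after_1 = c(59, 71)
)

# Columns that describe the participant and must NOT be pivoted
id_cols <- c("pptn", "age", "height")

df_long <- pivot_longer(
  df_wide,
  cols = -all_of(id_cols),  # pivot everything except the id columns
  names_to = c("condition", "time_condition", "tool"),
  names_sep = "_",          # split each column name at the underscores
  values_to = "HR"
)
```

Each participant's age and height are repeated on every long row, so they stay available as covariates in a model.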
I hope this helps you figure things out.
</details>
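Applied to the column naming in the question, the same idea could be sketched as follows. This is only a sketch under assumptions: the area columns follow the `area_<condition>_<tool>_<time>` pattern from the question, while the `HR_<condition>_<time>` names, the `educ` values, and all numbers are made up, because the real HR column names are not shown. Since area has two timepoints and HR has three, the sketch keeps them as two separate long tables rather than forcing them into one.

```r
library(tidyr)

# Hypothetical wide data following the question's naming pattern
wide <- data.frame(
  pptns = 1:2,
  educ = c("low", "high"),
  area_exp_1_before = c(1.1, 2.1),
  area_exp_1_after = c(1.2, 2.2),
  area_CTRL_1_before = c(1.3, 2.3),
  area_CTRL_1_after = c(1.4, 2.4),
  HR_exp_before = c(60, 70),
  HR_exp_during = c(61, 71),
  HR_exp_after = c(62, 72),
  HR_CTRL_before = c(63, 73),
  HR_CTRL_during = c(64, 74),
  HR_CTRL_after = c(65, 75)
)

# Keep the id columns plus the area columns, then pivot only the area columns
area_wide <- wide[, c("pptns", "educ", grep("^area_", names(wide), value = TRUE))]
area_long <- pivot_longer(
  area_wide,
  cols = starts_with("area_"),
  names_prefix = "area_",  # drop the prefix before splitting
  names_to = c("condition", "tool", "area_time"),
  names_sep = "_",
  values_to = "area"
)

# Same approach for HR, which has no tool and three timepoints
hr_wide <- wide[, c("pptns", "educ", grep("^HR_", names(wide), value = TRUE))]
hr_long <- pivot_longer(
  hr_wide,
  cols = starts_with("HR_"),
  names_prefix = "HR_",
  names_to = c("condition", "hr_time"),
  names_sep = "_",
  values_to = "HR"
)
```

Columns that are not listed in `cols` (here `pptns` and `educ`) are left as they are and repeated on every long row, which is how some variables get transformed while others are kept.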
