August 5, 2023 05:49:11


R: re-arrange dataframe columns based on factor

Question

Suppose I have a dataframe of values:

exp <- as.factor(c(rep('UT',3),rep('NC',3),rep('PC',3)))
fact <- as.factor(rep(c('A','B','C'),3))
set.seed(10)
avg <- rnorm(9,10,1)
sd <- rnorm(9,2,0.5)
df <- data.frame(exp,fact,avg,sd)

So df has three experimental treatments, with factors A-C, each with an avg and sd:

  exp fact       avg       sd
1  UT    A 10.018746 1.871761
2  UT    B  9.815747 2.550890
3  UT    C  8.628669 2.377891
4  NC    A  9.400832 1.880883
5  NC    B 10.294545 2.493722
6  NC    C 10.389794 2.370695
7  PC    A  8.791924 2.044674
8  PC    B  9.636324 1.522528
9  PC    C  8.373327 1.902425

Is there an efficient way to re-arrange the dataframe like this:

[1] "exp"   "A.avg" "A.sd"  "B.avg" "B.sd"  "C.avg" "C.sd"

In this case, we'd have just 3 rows, one for each treatment.

I suspect the solution lies with dplyr or the tidyverse...

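You can use the dplyr and tidyr packages to reshape the data frame into the desired format. Here is example code:

library(dplyr)
library(tidyr)
df_new <- df %>%
  # one row per exp; names_glue builds names such as "A.avg" and "A.sd"
  pivot_wider(names_from = fact, values_from = c(avg, sd),
              names_glue = "{fact}.{.value}") %>%
  # put each factor's avg and sd side by side: A.avg, A.sd, B.avg, ...
  select(exp, all_of(paste0(rep(levels(df$fact), each = 2), c(".avg", ".sd")))) %>%
  as.data.frame()
df_new

This produces the following output:

  exp     A.avg     A.sd     B.avg     B.sd     C.avg     C.sd
1  UT 10.018746 1.871761  9.815747 2.550890  8.628669 2.377891
2  NC  9.400832 1.880883 10.294545 2.493722 10.389794 2.370695
3  PC  8.791924 2.044674  9.636324 1.522528  8.373327 1.902425

This gives the desired layout. There is one row per treatment, and each factor's mean and standard deviation sit in adjacent, correspondingly named columns.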


Answer 1

Score: 1

You can use tidyr::pivot_wider:

library(dplyr)
library(tidyr)
df %>%
  pivot_wider(names_from = fact,
              values_from = c(avg, sd),
              names_glue = "{fact}.{.value}")
#------
# A tibble: 3 x 7
  exp   A.avg B.avg C.avg  A.sd  B.sd  C.sd
  <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 UT    10.0   9.82  8.63  1.87  2.55  2.38
2 NC     9.40 10.3  10.4   1.88  2.49  2.37
3 PC     8.79  9.64  8.37  2.04  1.52  1.90
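The output above groups all avg columns before all sd columns. If you need the exact order from the question (A.avg, A.sd, B.avg, ...), pivot_wider's names_vary argument may handle that without a separate reordering step. This sketch assumes tidyr 1.2.0 or later, where names_vary is available; check with packageVersion("tidyr"). It rebuilds the question's data so it runs on its own.

```r
library(tidyr)

# Rebuild the example data from the question
exp <- as.factor(c(rep('UT', 3), rep('NC', 3), rep('PC', 3)))
fact <- as.factor(rep(c('A', 'B', 'C'), 3))
set.seed(10)
avg <- rnorm(9, 10, 1)
sd <- rnorm(9, 2, 0.5)
df <- data.frame(exp, fact, avg, sd)

# names_vary = "slowest" groups the value columns by level of fact,
# and names_glue turns each (fact, value) pair into names like "A.avg"
# (assumes tidyr >= 1.2.0)
wide <- pivot_wider(df,
                    names_from = fact,
                    values_from = c(avg, sd),
                    names_vary = "slowest",
                    names_glue = "{fact}.{.value}")
print(names(wide))
```

Only the column order changes; the values are the same as in the output above.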


Answer 2

Score: 1

Alternatively:

df %>% pivot_longer(c(avg, sd)) %>%
  pivot_wider(id_cols = exp, names_from = c(fact, name), values_from = value, names_sep = '.')


# A tibble: 3 × 7
  exp   A.avg  A.sd B.avg  B.sd C.avg  C.sd
  <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 UT    10.0   1.87  9.82  2.55  8.63  2.38
2 NC     9.40  1.88 10.3   2.49 10.4   2.37
3 PC     8.79  2.04  9.64  1.52  8.37  1.90

Created on 2023-08-04 with reprex v2.0.2
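To see why this two-step approach works, it can help to inspect the intermediate long table. In this minimal sketch on the question's data, pivot_longer() uses its defaults names_to = "name" and values_to = "value" to stack avg and sd into one column. pivot_wider() then combines fact and name into the new column names.

```r
library(tidyr)

# Rebuild the example data from the question
exp <- as.factor(c(rep('UT', 3), rep('NC', 3), rep('PC', 3)))
fact <- as.factor(rep(c('A', 'B', 'C'), 3))
set.seed(10)
avg <- rnorm(9, 10, 1)
sd <- rnorm(9, 2, 0.5)
df <- data.frame(exp, fact, avg, sd)

# Step 1: stack avg and sd into one "value" column; the new "name"
# column records which statistic each row holds (18 rows in total)
long <- pivot_longer(df, c(avg, sd))
print(long)

# Step 2: widen again with one row per exp, naming columns "<fact>.<name>"
wide <- pivot_wider(long,
                    id_cols = exp,
                    names_from = c(fact, name),
                    values_from = value,
                    names_sep = ".")
print(names(wide))
```

The long table lists each original row's avg and sd back to back. That is why the widened columns come out interleaved (A.avg, A.sd, ...) here, unlike in Answer 1.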

Answer 3

Score: 1

An approach using reshape:

reshape(df, timevar="fact", idvar="exp", direction="wide")
  exp     avg.A     sd.A     avg.B     sd.B     avg.C     sd.C
1  UT 10.018746 1.871761  9.815747 2.550890  8.628669 2.377891
4  NC  9.400832 1.880883 10.294545 2.493722 10.389794 2.370695
7  PC  8.791924 2.044674  9.636324 1.522528  8.373327 1.902425
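reshape() names the widened columns value.time (avg.A, sd.A, ...), which is the reverse of the A.avg style asked for. It also keeps the original row names 1, 4, 7. Below is a minimal base-R sketch of one way to tidy that up, assuming the value columns are exactly avg and sd as in the question.

```r
# Rebuild the example data from the question (base R only)
exp <- as.factor(c(rep('UT', 3), rep('NC', 3), rep('PC', 3)))
fact <- as.factor(rep(c('A', 'B', 'C'), 3))
set.seed(10)
avg <- rnorm(9, 10, 1)
sd <- rnorm(9, 2, 0.5)
df <- data.frame(exp, fact, avg, sd)

# Widen: one row per exp, columns named "<value>.<fact>" (avg.A, sd.A, ...)
wide <- reshape(df, timevar = "fact", idvar = "exp", direction = "wide")

# Swap "avg.A" into "A.avg" (and "sd.A" into "A.sd"); "exp" does not match
# the pattern, so it is left unchanged
names(wide) <- sub("^(avg|sd)\\.(.+)$", "\\2.\\1", names(wide))

# Replace the inherited row names (1, 4, 7) with 1, 2, 3
rownames(wide) <- NULL
print(wide)
```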

