R: re-arrange dataframe columns based on factor

Question

Suppose I have a dataframe of values:

    exp <- as.factor(c(rep('UT',3),rep('NC',3),rep('PC',3)))
    fact <- as.factor(rep(c('A','B','C'),3))
    set.seed(10)
    avg <- rnorm(9,10,1)
    sd <- rnorm(9,2,0.5)
    df <- data.frame(exp,fact,avg,sd)

So df has three experimental treatments, each measured at factor levels A-C, with an avg and an sd for every combination.

      exp fact       avg       sd
    1  UT    A 10.018746 1.871761
    2  UT    B  9.815747 2.550890
    3  UT    C  8.628669 2.377891
    4  NC    A  9.400832 1.880883
    5  NC    B 10.294545 2.493722
    6  NC    C 10.389794 2.370695
    7  PC    A  8.791924 2.044674
    8  PC    B  9.636324 1.522528
    9  PC    C  8.373327 1.902425

Is there an efficient way to re-arrange the dataframe so that its columns are:

    [1] "exp" "A.avg" "A.sd" "B.avg" "B.sd" "C.avg" "C.sd"

In this case, we'd have just 3 rows, one for each treatment.

I suspect the solution lies with dplyr or tidyverse...


You can use the dplyr and tidyr packages to reshape the data frame into the desired format. For example:

    library(dplyr)
    library(tidyr)

    df_new <- df %>%
      pivot_wider(names_from = fact,
                  values_from = c(avg, sd),
                  names_glue = "{fact}.{.value}") %>%  # names become A.avg, A.sd, ...
      as.data.frame()
    # pivot_wider() puts all avg columns before the sd columns;
    # sorting the names pairs each level's avg with its sd
    df_new <- df_new[, c("exp", sort(setdiff(names(df_new), "exp")))]
    df_new

This produces the following output:

      exp     A.avg     A.sd     B.avg     B.sd     C.avg     C.sd
    1  UT 10.018746 1.871761  9.815747 2.550890  8.628669 2.377891
    2  NC  9.400832 1.880883 10.294545 2.493722 10.389794 2.370695
    3  PC  8.791924 2.044674  9.636324 1.522528  8.373327 1.902425

The result has one row per treatment, with the mean and standard deviation of each factor level in adjacent, suitably named columns.

Answer 1

Score: 1

You can use tidyr::pivot_wider

    library(dplyr)
    library(tidyr)
    df %>%
      pivot_wider(names_from = fact,
                  values_from = c(avg, sd),
                  names_glue = "{fact}.{.value}")
    #------
    # A tibble: 3 x 7
      exp   A.avg B.avg C.avg  A.sd  B.sd  C.sd
      <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
    1 UT    10.0   9.82  8.63  1.87  2.55  2.38
    2 NC     9.40 10.3  10.4   1.88  2.49  2.37
    3 PC     8.79  9.64  8.37  2.04  1.52  1.90
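
The output above groups the columns by statistic (all avg columns, then all sd columns), whereas the question asks for them grouped by factor level. Here is a minimal sketch of one way to get the requested order directly. It assumes tidyr 1.2.0 or later, where pivot_wider() accepts a names_vary argument. Neither that argument nor the variable name `wide` comes from the original answer.

```r
library(tidyr)

# Rebuild the example data from the question.
exp  <- as.factor(c(rep("UT", 3), rep("NC", 3), rep("PC", 3)))
fact <- as.factor(rep(c("A", "B", "C"), 3))
set.seed(10)
avg <- rnorm(9, 10, 1)
sd  <- rnorm(9, 2, 0.5)
df  <- data.frame(exp, fact, avg, sd)

# Assumption: tidyr >= 1.2.0, which provides names_vary.
# "slowest" keeps the value columns (avg, sd) together for each level of fact.
wide <- pivot_wider(
  df,
  names_from  = fact,
  values_from = c(avg, sd),
  names_glue  = "{fact}.{.value}",
  names_vary  = "slowest"
)

names(wide)
```

With older tidyr versions, the columns can be reordered after pivoting instead, for example by sorting the non-id column names.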

Answer 2

Score: 1

Alternatively, reshape to long format with pivot_longer first, then widen again:

    library(dplyr)
    library(tidyr)
    df %>% pivot_longer(c(avg, sd)) %>%
      pivot_wider(id_cols = exp, names_from = c(fact, name), values_from = value, names_sep = '.')

    # A tibble: 3 × 7
      exp   A.avg  A.sd B.avg  B.sd C.avg  C.sd
      <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
    1 UT    10.0   1.87  9.82  2.55  8.63  2.38
    2 NC     9.40  1.88 10.3   2.49 10.4   2.37
    3 PC     8.79  2.04  9.64  1.52  8.37  1.90

Created on 2023-08-04 with reprex v2.0.2

Answer 3

Score: 1

An approach using reshape

    reshape(df, timevar="fact", idvar="exp", direction="wide")
      exp     avg.A     sd.A     avg.B     sd.B     avg.C     sd.C
    1  UT 10.018746 1.871761  9.815747 2.550890  8.628669 2.377891
    4  NC  9.400832 1.880883 10.294545 2.493722 10.389794 2.370695
    7  PC  8.791924 2.044674  9.636324 1.522528  8.373327 1.902425
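
Note that reshape() names the columns avg.A, sd.A, ... rather than A.avg, A.sd, ... as the question requests. Below is a minimal base R sketch of one way to swap the two parts of each name with a regular expression. The variable name `wide` and the pattern are illustrative assumptions, not part of the original answer.

```r
# Rebuild the example data from the question.
exp  <- as.factor(c(rep("UT", 3), rep("NC", 3), rep("PC", 3)))
fact <- as.factor(rep(c("A", "B", "C"), 3))
set.seed(10)
avg <- rnorm(9, 10, 1)
sd  <- rnorm(9, 2, 0.5)
df  <- data.frame(exp, fact, avg, sd)

wide <- reshape(df, timevar = "fact", idvar = "exp", direction = "wide")

# Turn "<stat>.<level>" into "<level>.<stat>"; "exp" does not match the
# pattern, so sub() leaves it unchanged.
names(wide) <- sub("^(avg|sd)\\.(.+)$", "\\2.\\1", names(wide))

# Optional: reshape() keeps the original row names (1, 4, 7); reset them.
rownames(wide) <- NULL

names(wide)
```

The pattern hard-codes the two statistic names from this example, so it would need adjusting for other value columns.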

huangapple
  • Posted on 2023-08-05 05:49:11
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76839257.html