R: re-arrange dataframe columns based on factor

Question

Suppose I have a data frame of values:

exp <- as.factor(c(rep('UT',3),rep('NC',3),rep('PC',3)))
fact <- as.factor(rep(c('A','B','C'),3))
set.seed(10)
avg <- rnorm(9,10,1)
sd <- rnorm(9,2,0.5)
df <- data.frame(exp,fact,avg,sd)

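So df has three experimental treatments (UT, NC, PC), each measured at the three levels A-C of fact, with a mean (avg) and a standard deviation (sd) for every combination.

  exp fact       avg       sd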
1  UT    A 10.018746 1.871761
2  UT    B  9.815747 2.550890
3  UT    C  8.628669 2.377891
4  NC    A  9.400832 1.880883
5  NC    B 10.294545 2.493722
6  NC    C 10.389794 2.370695
7  PC    A  8.791924 2.044674
8  PC    B  9.636324 1.522528
9  PC    C  8.373327 1.902425

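Is there an efficient way to re-arrange the data frame so that its columns are:

[1] "exp"   "A.avg" "A.sd"  "B.avg" "B.sd"  "C.avg" "C.sd"

In this case, we'd have just 3 rows, one for each treatment.

I suspect the solution lies with dplyr or the tidyverse...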


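You can use the dplyr and tidyr packages to reshape the data frame into the desired layout. Here is an example:

library(dplyr)
library(tidyr)

df_new <- df %>%
  pivot_wider(names_from  = fact,
              values_from = c(avg, sd),
              names_glue  = "{fact}.{.value}",  # build names such as "A.avg"
              names_vary  = "slowest")          # keep avg and sd of each level adjacent (needs tidyr >= 1.2.0)

as.data.frame(df_new)  # print as a plain data frame to show all digits

This produces the following output:

  exp     A.avg     A.sd     B.avg     B.sd     C.avg     C.sd
1  UT 10.018746 1.871761  9.815747 2.550890  8.628669 2.377891
2  NC  9.400832 1.880883 10.294545 2.493722 10.389794 2.370695
3  PC  8.791924 2.044674  9.636324 1.522528  8.373327 1.902425

Each treatment now occupies one row, with the mean and standard deviation of every factor level in adjacent, suitably named columns.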


Answer 1

Score: 1

You can use tidyr::pivot_wider. The names_glue argument builds each new column name from the fact level and the name of the value column:

library(dplyr)
library(tidyr)

df %>%
  pivot_wider(names_from = fact,
              values_from = c(avg, sd),
              names_glue = "{fact}.{.value}")

#------
# A tibble: 3 x 7
  exp   A.avg B.avg C.avg  A.sd  B.sd  C.sd
  <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 UT    10.0   9.82  8.63  1.87  2.55  2.38
2 NC     9.40 10.3  10.4   1.88  2.49  2.37
3 PC     8.79  9.64  8.37  2.04  1.52  1.90
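The output above groups all the avg columns before the sd columns, whereas the question asks for A.avg, A.sd, B.avg, and so on. One possible way to get that order afterwards is to sort the value columns by name. This is only a sketch: the names wide and value_cols are made up for illustration, and alphabetical sorting works here only because the desired order happens to be alphabetical.

```r
library(tidyr)

# Same example data as in the question
exp  <- as.factor(c(rep("UT", 3), rep("NC", 3), rep("PC", 3)))
fact <- as.factor(rep(c("A", "B", "C"), 3))
set.seed(10)
avg <- rnorm(9, 10, 1)
sd  <- rnorm(9, 2, 0.5)
df  <- data.frame(exp, fact, avg, sd)

# Reshape exactly as in the answer
wide <- pivot_wider(df,
                    names_from  = fact,
                    values_from = c(avg, sd),
                    names_glue  = "{fact}.{.value}")

# Keep the id column first, then sort the remaining column names alphabetically
value_cols <- sort(setdiff(names(wide), "exp"))
wide <- wide[, c("exp", value_cols)]

names(wide)
```

If the levels or value names did not sort into the order you want, you would have to spell the column order out by hand instead.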

Answer 2

Score: 1

Alternatively, pivot to long format first and then back to wide:

df %>% pivot_longer(c(avg, sd)) %>%
  pivot_wider(id_cols = exp, names_from = c(fact, name), values_from = value, names_sep = '.')

# A tibble: 3 × 7
  exp   A.avg  A.sd B.avg  B.sd C.avg  C.sd
  <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 UT    10.0   1.87  9.82  2.55  8.63  2.38
2 NC     9.40  1.88 10.3   2.49 10.4   2.37
3 PC     8.79  2.04  9.64  1.52  8.37  1.90

Created on 2023-08-04 with reprex v2.0.2
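To see why this version interleaves avg and sd, it may help to look at the intermediate long table. The sketch below only runs the first half of the pipeline; the name long is hypothetical, and the columns name and value are the pivot_longer defaults.

```r
library(tidyr)

# Same example data as in the question
exp  <- as.factor(c(rep("UT", 3), rep("NC", 3), rep("PC", 3)))
fact <- as.factor(rep(c("A", "B", "C"), 3))
set.seed(10)
avg <- rnorm(9, 10, 1)
sd  <- rnorm(9, 2, 0.5)
df  <- data.frame(exp, fact, avg, sd)

# Stack avg and sd into two columns: 'name' ("avg" or "sd") and 'value'
long <- pivot_longer(df, c(avg, sd))

# Each original row becomes two rows, avg first and then sd
head(long, 4)
```

Since every fact level appears with avg directly followed by sd, the later pivot_wider call with names_from = c(fact, name) meets the combinations in the order A.avg, A.sd, B.avg, and so on.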

Answer 3

Score: 1

An approach using base R's reshape

reshape(df, timevar="fact", idvar="exp", direction="wide")
  exp     avg.A     sd.A     avg.B     sd.B     avg.C     sd.C
1  UT 10.018746 1.871761  9.815747 2.550890  8.628669 2.377891
4  NC  9.400832 1.880883 10.294545 2.493722 10.389794 2.370695
7  PC  8.791924 2.044674  9.636324 1.522528  8.373327 1.902425
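reshape names the columns avg.A, sd.A, and so on, which is the reverse of the A.avg form asked for in the question. A minimal sketch of one way to swap the two parts, assuming the value columns are only ever called avg and sd (the name wide is made up for illustration):

```r
# Same example data as in the question
exp  <- as.factor(c(rep("UT", 3), rep("NC", 3), rep("PC", 3)))
fact <- as.factor(rep(c("A", "B", "C"), 3))
set.seed(10)
avg <- rnorm(9, 10, 1)
sd  <- rnorm(9, 2, 0.5)
df  <- data.frame(exp, fact, avg, sd)

wide <- reshape(df, timevar = "fact", idvar = "exp", direction = "wide")

# Turn "avg.A" into "A.avg" and "sd.A" into "A.sd"; "exp" does not match and is left alone
names(wide) <- sub("^(avg|sd)\\.(.+)$", "\\2.\\1", names(wide))

# Optional: replace the leftover row names 1, 4, 7 with 1, 2, 3
rownames(wide) <- NULL

names(wide)
```

The column order already alternates avg and sd per level, so only the names need changing.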

huangapple
  • Posted on 2023-08-05 05:49:11
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76839257.html