重排 R 数据框架(根据特定条件更改为宽格式,重命名和重新排列列)

huangapple go评论89阅读模式
英文:

Rearranging a R dataframe (changing to wide format based on certain conditions, renaming and reshuffling columns)

问题

我有一个数据框,看起来像这样:

  1. example <- data.frame(
  2. date = c("6/1/22", "6/2/22", "6/3/22",
  3. "6/1/22", "6/2/22", "6/3/22",
  4. "6/1/22", "6/2/22", "6/3/22",
  5. "6/1/22", "6/2/22", "6/3/22"),
  6. sub = c(1101, 1101, 1101,
  7. 1102, 1102, 1102,
  8. 2101, 2101, 2101,
  9. 2102, 2102, 2102),
  10. express_p = c("eg1", "eg2", "eg3", "eg4",
  11. "eg5", "eg6", "eg7", "eg8",
  12. "eg9", "eg10", "eg11", "eg12"),
  13. p_express = c("a", "b", "c", "d",
  14. "e", "f", "g", "h",
  15. "i", "j", "k", "l")
  16. )

我想将其转换为宽格式,并重新排列列的顺序。结果应如下所示:

  1. example_clean <- data.frame(
  2. date = c("6/1/22", "6/2/22", "6/3/22", "6/1/22", "6/2/22", "6/3/22"),
  3. subA = c(1101, 1101, 1101, 1102, 1102, 1102),
  4. subB = c(2101, 2101, 2101, 2102, 2102, 2102),
  5. express_p_A = c("eg1", "eg2", "eg3", "eg7", "eg8", "eg9"),
  6. p_express_B = c("d", "e", "f", "j", "k", "l"),
  7. express_p_B = c("eg4", "eg5", "eg6", "eg10", "eg11", "eg12"),
  8. p_express_A = c("a", "b", "c", "g", "h", "i")
  9. )

基本上,我将具有相同最后3位数字的sub中的所有数字配对在同一行上。然后,列的顺序也应重新排列(并重命名),以便一个主题的express_p位于其对应伙伴的p_express的右边(例如,1101的express_p位于2101的p_express的左边)。编辑:它还按日期进行分类。

有人知道如何优雅地做到这一点吗?

谢谢!

英文:

I have a dataframe that looks something like this:

  1. example &lt;- data.frame(
  2. date = c(&quot;6/1/22&quot;, &quot;6/2/22&quot;, &quot;6/3/22&quot;,
  3. &quot;6/1/22&quot;, &quot;6/2/22&quot;, &quot;6/3/22&quot;,
  4. &quot;6/1/22&quot;, &quot;6/2/22&quot;, &quot;6/3/22&quot;,
  5. &quot;6/1/22&quot;, &quot;6/2/22&quot;, &quot;6/3/22&quot;),
  6. sub = c(1101, 1101, 1101,
  7. 1102, 1102, 1102,
  8. 2101, 2101, 2101,
  9. 2102, 2102, 2102),
  10. express_p = c(&quot;eg1&quot;, &quot;eg2&quot;, &quot;eg3&quot;, &quot;eg4&quot;,
  11. &quot;eg5&quot;, &quot;eg6&quot;, &quot;eg7&quot;, &quot;eg8&quot;,
  12. &quot;eg9&quot;, &quot;eg10&quot;, &quot;eg11&quot;, &quot;eg12&quot;),
  13. p_express = c(&quot;a&quot;, &quot;b&quot;, &quot;c&quot;, &quot;d&quot;,
  14. &quot;e&quot;, &quot;f&quot;, &quot;g&quot;, &quot;h&quot;,
  15. &quot;i&quot;, &quot;j&quot;, &quot;k&quot;, &quot;l&quot;)
  16. )

I want to make it into a wider format and also reshuffle the column order. This is how the end result should look like:

  1. example_clean &lt;- data.frame(
  2. date = c(&quot;6/1/22&quot;, &quot;6/2/22&quot;, &quot;6/3/22&quot;, &quot;6/1/22&quot;, &quot;6/2/22&quot;, &quot;6/3/22&quot;),
  3. subA = c(1101, 1101, 1101, 1102, 1102, 1102),
  4. subB = c(2101, 2101, 2101, 2102, 2102, 2102),
  5. express_p_A = c(&quot;eg1&quot;, &quot;eg2&quot;, &quot;eg3&quot;, &quot;eg7&quot;, &quot;eg8&quot;, &quot;eg9&quot;),
  6. p_express_B = c(&quot;d&quot;, &quot;e&quot;, &quot;f&quot;, &quot;j&quot;, &quot;k&quot;, &quot;l&quot;),
  7. express_p_B = c(&quot;eg4&quot;, &quot;eg5&quot;, &quot;eg6&quot;, &quot;eg10&quot;, &quot;eg11&quot;, &quot;eg12&quot;),
  8. p_express_A = c(&quot;a&quot;, &quot;b&quot;, &quot;c&quot;, &quot;g&quot;, &quot;h&quot;, &quot;i&quot;)
  9. )

Essentially, I am pairing up all the numbers in sub that have the same 3 last digits to be in the saw row. Then, the order of the columns should also be reshuffled (and renamed) such that express_p for one subject is right beside p_express of its corresponding partner (e.g. 1101's express_p is to the left of 2101's p_express). Edit: It is also categorized by date.

Does anyone know an elegant way to do this?

Thank you!

答案1

得分: 3

你需要在进行数据透视之前定义两列:一个用于添加到新列(A 和 B)的前缀,另一个用于识别行的分组:

  1. library(tidyr)
  2. library(dplyr)
  3. example %>%
  4. group_by(gp = sub('.', '', sub)) %>%
  5. mutate(name = LETTERS[1:n()]) %>%
  6. pivot_wider(values_from = sub:p_express)
  7. gp sub_A sub_B express_p_A express_p_B p_express_A p_express_B
  8. <chr> <dbl> <dbl> <chr> <chr> <chr> <chr>
  9. 1 101 1101 2101 eg1 eg3 a c
  10. 2 102 1102 2102 eg2 eg4 b d
英文:

You have to define two columns before pivoting: one for the prefix you're adding to the new columns (A and B) and one for identifying the groups of rows:

  1. library(tidyr)
  2. library(dplyr)
  3. example %&gt;%
  4. group_by(gp = sub(&#39;.&#39;, &#39;&#39;, sub)) %&gt;%
  5. mutate(name = LETTERS[1:n()]) %&gt;%
  6. pivot_wider(values_from = sub:p_express)
  7. gp sub_A sub_B express_p_A express_p_B p_express_A p_express_B
  8. &lt;chr&gt; &lt;dbl&gt; &lt;dbl&gt; &lt;chr&gt; &lt;chr&gt; &lt;chr&gt; &lt;chr&gt;
  9. 1 101 1101 2101 eg1 eg3 a c
  10. 2 102 1102 2102 eg2 eg4 b d

huangapple
  • 本文由 发表于 2023年6月1日 15:33:24
  • 转载请务必保留本文链接:https://go.coder-hub.com/76379620.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定