2023年6月1日 15:33:24go评论89阅读模式

英文:

Rearranging a R dataframe (changing to wide format based on certain conditions, renaming and reshuffling columns)

问题

我有一个数据框，看起来像这样：

example <- data.frame(
  date = c("6/1/22", "6/2/22", "6/3/22",
           "6/1/22", "6/2/22", "6/3/22",
           "6/1/22", "6/2/22", "6/3/22",
           "6/1/22", "6/2/22", "6/3/22"),
  sub = c(1101, 1101, 1101, 
          1102, 1102, 1102,
          2101, 2101, 2101,
          2102, 2102, 2102),
  express_p = c("eg1", "eg2", "eg3", "eg4",
                "eg5", "eg6", "eg7", "eg8",
                "eg9", "eg10", "eg11", "eg12"),
  p_express = c("a", "b", "c", "d",
                "e", "f", "g", "h",
                "i", "j", "k", "l")
)

我想将其转换为宽格式，并重新排列列的顺序。结果应如下所示：

example_clean <- data.frame(
  date = c("6/1/22", "6/2/22", "6/3/22", "6/1/22", "6/2/22", "6/3/22"),
  subA = c(1101, 1101, 1101, 1102, 1102, 1102),
  subB = c(2101, 2101, 2101, 2102, 2102, 2102),
  express_p_A = c("eg1", "eg2", "eg3", "eg7", "eg8", "eg9"),
  p_express_B = c("d", "e", "f", "j", "k", "l"),
  express_p_B = c("eg4", "eg5", "eg6", "eg10", "eg11", "eg12"),
  p_express_A = c("a", "b", "c", "g", "h", "i")
)

基本上，我将具有相同最后3位数字的sub中的所有数字配对在同一行上。然后，列的顺序也应重新排列（并重命名），以便一个主题的express_p位于其对应伙伴的p_express的右边（例如，1101的express_p位于2101的p_express的左边）。编辑：它还按日期进行分类。

有人知道如何优雅地做到这一点吗？

谢谢！

英文:

I have a dataframe that looks something like this:

example &lt;- data.frame(
  date = c(&quot;6/1/22&quot;, &quot;6/2/22&quot;, &quot;6/3/22&quot;,
          &quot;6/1/22&quot;, &quot;6/2/22&quot;, &quot;6/3/22&quot;,
          &quot;6/1/22&quot;, &quot;6/2/22&quot;, &quot;6/3/22&quot;,
          &quot;6/1/22&quot;, &quot;6/2/22&quot;, &quot;6/3/22&quot;),
  sub = c(1101, 1101, 1101, 
          1102, 1102, 1102,
          2101, 2101, 2101,
          2102, 2102, 2102),
  express_p = c(&quot;eg1&quot;, &quot;eg2&quot;, &quot;eg3&quot;, &quot;eg4&quot;,
                &quot;eg5&quot;, &quot;eg6&quot;, &quot;eg7&quot;, &quot;eg8&quot;,
                &quot;eg9&quot;, &quot;eg10&quot;, &quot;eg11&quot;, &quot;eg12&quot;),
  p_express = c(&quot;a&quot;, &quot;b&quot;, &quot;c&quot;, &quot;d&quot;,
                &quot;e&quot;, &quot;f&quot;, &quot;g&quot;, &quot;h&quot;,
                &quot;i&quot;, &quot;j&quot;, &quot;k&quot;, &quot;l&quot;)
)

I want to make it into a wider format and also reshuffle the column order. This is how the end result should look like:

example_clean &lt;- data.frame(
  date = c(&quot;6/1/22&quot;, &quot;6/2/22&quot;, &quot;6/3/22&quot;, &quot;6/1/22&quot;, &quot;6/2/22&quot;, &quot;6/3/22&quot;),
  subA = c(1101, 1101, 1101, 1102, 1102, 1102),
  subB = c(2101, 2101, 2101, 2102, 2102, 2102),
  express_p_A = c(&quot;eg1&quot;, &quot;eg2&quot;, &quot;eg3&quot;, &quot;eg7&quot;, &quot;eg8&quot;, &quot;eg9&quot;),
  p_express_B = c(&quot;d&quot;, &quot;e&quot;, &quot;f&quot;, &quot;j&quot;, &quot;k&quot;, &quot;l&quot;),
  express_p_B = c(&quot;eg4&quot;, &quot;eg5&quot;, &quot;eg6&quot;, &quot;eg10&quot;, &quot;eg11&quot;, &quot;eg12&quot;),
  p_express_A = c(&quot;a&quot;, &quot;b&quot;, &quot;c&quot;, &quot;g&quot;, &quot;h&quot;, &quot;i&quot;)
)

Essentially, I am pairing up all the numbers in sub that have the same 3 last digits to be in the saw row. Then, the order of the columns should also be reshuffled (and renamed) such that express_p for one subject is right beside p_express of its corresponding partner (e.g. 1101's express_p is to the left of 2101's p_express). Edit: It is also categorized by date.

Does anyone know an elegant way to do this?

Thank you!

答案1

得分: 3

你需要在进行数据透视之前定义两列：一个用于添加到新列（A 和 B）的前缀，另一个用于识别行的分组：

library(tidyr)
library(dplyr)
example %>%
  group_by(gp = sub('.', '', sub)) %>%
  mutate(name = LETTERS[1:n()]) %>%
  pivot_wider(values_from = sub:p_express)
  gp    sub_A sub_B express_p_A express_p_B p_express_A p_express_B
  <chr> <dbl> <dbl> <chr>       <chr>       <chr>       <chr>
1 101   1101  2101 eg1         eg3         a           c
2 102   1102  2102 eg2         eg4         b           d

英文:

You have to define two columns before pivoting: one for the prefix you're adding to the new columns (A and B) and one for identifying the groups of rows:

library(tidyr)
library(dplyr)
example %&gt;% 
  group_by(gp = sub(&#39;.&#39;, &#39;&#39;, sub)) %&gt;% 
  mutate(name = LETTERS[1:n()]) %&gt;% 
  pivot_wider(values_from = sub:p_express)
  gp    sub_A sub_B express_p_A express_p_B p_express_A p_express_B
  &lt;chr&gt; &lt;dbl&gt; &lt;dbl&gt; &lt;chr&gt;       &lt;chr&gt;       &lt;chr&gt;       &lt;chr&gt;      
1 101    1101  2101 eg1         eg3         a           c          
2 102    1102  2102 eg2         eg4         b           d

通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库，让每个人都能够通过互相帮助和分享经验来进步。

重排 R 数据框架（根据特定条件更改为宽格式，重命名和重新排列列）

问题

答案1

如何在使用rmarkdown生成PDF输出时交叉引用列表中的kable元素？

无法遮蔽空间对象，即使将shapefile的CRS转换为与栅格的CRS匹配。

如何可能将QCA_mm类对象强制转换为表格？

我如下翻译：如何在R中计算变量在一段时间内更改之间的平均天数？

如何在Playwright视觉比较中屏蔽多个定位器？

在C++中，可以使用可变模板参数来检索类型的内部类型。

selenium.common.exceptions.StaleElementReferenceException: Message: stale element reference: stale element not found

Creating and opening a URL to log in to Website via Basic Auth with Robot Framework/Selenium (Python)

AG Grid 在上下文菜单中以大文本形式打开

What's the correct way to type hint an empty list as a literal in python?

如何在Highcharts Gantt中更改本地化的星期名称

如何在同一个流中使用多个过滤器和映射函数？

如何使用Map/Set来将代码优化到O(n)？

.NET MAUI Android在GitHub Actions上构建失败，错误代码为1。