Rearranging an R data frame (changing to wide format based on certain conditions, renaming and reshuffling columns)
Question
I have a dataframe that looks something like this:
example <- data.frame(
  date = c("6/1/22", "6/2/22", "6/3/22",
           "6/1/22", "6/2/22", "6/3/22",
           "6/1/22", "6/2/22", "6/3/22",
           "6/1/22", "6/2/22", "6/3/22"),
  sub = c(1101, 1101, 1101,
          1102, 1102, 1102,
          2101, 2101, 2101,
          2102, 2102, 2102),
  express_p = c("eg1", "eg2", "eg3", "eg4",
                "eg5", "eg6", "eg7", "eg8",
                "eg9", "eg10", "eg11", "eg12"),
  p_express = c("a", "b", "c", "d",
                "e", "f", "g", "h",
                "i", "j", "k", "l")
)
I want to reshape it into a wider format and also reorder the columns. This is what the end result should look like:
example_clean <- data.frame(
  date = c("6/1/22", "6/2/22", "6/3/22", "6/1/22", "6/2/22", "6/3/22"),
  subA = c(1101, 1101, 1101, 1102, 1102, 1102),
  subB = c(2101, 2101, 2101, 2102, 2102, 2102),
  express_p_A = c("eg1", "eg2", "eg3", "eg7", "eg8", "eg9"),
  p_express_B = c("d", "e", "f", "j", "k", "l"),
  express_p_B = c("eg4", "eg5", "eg6", "eg10", "eg11", "eg12"),
  p_express_A = c("a", "b", "c", "g", "h", "i")
)
Essentially, I am pairing up all the values in sub that share the same last three digits so that they end up in the same row. The columns should then be reordered (and renamed) so that express_p for one subject sits right beside p_express of its corresponding partner (e.g. 1101's express_p is to the left of 2101's p_express). Edit: The rows are also grouped by date.
Does anyone know an elegant way to do this?
Thank you!
Answer 1
Score: 3
You have to define two columns before pivoting: one holding the suffix that gets appended to the new column names (A and B), and one identifying which rows belong together:
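library(tidyr)
library(dplyr)

example %>%
  # gp: sub with its first character removed, e.g. 1101 -> "101"
  group_by(gp = sub('.', '', sub)) %>%
  # name: label the rows within each group A, B, ... (two rows give A and B)
  mutate(name = LETTERS[1:n()]) %>%
  # names_from defaults to the `name` column, so new columns become sub_A, sub_B, ...
  pivot_wider(values_from = sub:p_express)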
  gp    sub_A sub_B express_p_A express_p_B p_express_A p_express_B
  <chr> <dbl> <dbl> <chr>       <chr>       <chr>       <chr>
1 101    1101  2101 eg1         eg3         a           c
2 102    1102  2102 eg2         eg4         b           d
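The printed result above has one row per pair and no date column, so it may predate the question's edit about dates (that is an assumption on my part). Below is a minimal sketch of the same idea applied within each date. It rebuilds `example` in a compact form, uses `sub %% 1000` as the pairing key instead of the regex, labels the smaller `sub` in each pair as A, and then uses `select()` to rename and order the columns as requested. The name `paired` and the final `arrange()` are my own additions, not part of the original answer.

```r
library(dplyr)
library(tidyr)

# Compact reconstruction of the question's `example` data frame
example <- data.frame(
  date = rep(c("6/1/22", "6/2/22", "6/3/22"), times = 4),
  sub = rep(c(1101, 1102, 2101, 2102), each = 3),
  express_p = paste0("eg", 1:12),
  p_express = letters[1:12]
)

paired <- example %>%
  # pairing key: the last three digits of sub (1101 and 2101 both give 101)
  mutate(gp = sub %% 1000) %>%
  # one group per date and pair, so each group holds exactly two rows
  group_by(date, gp) %>%
  # the smaller sub in each pair becomes "A", the larger "B"
  mutate(name = LETTERS[row_number(sub)]) %>%
  ungroup() %>%
  pivot_wider(names_from = name,
              values_from = c(sub, express_p, p_express)) %>%
  # rename sub_A/sub_B and place each express_p beside the partner's p_express;
  # gp is dropped because it is not selected
  select(date, subA = sub_A, subB = sub_B,
         express_p_A, p_express_B, express_p_B, p_express_A) %>%
  # dates are plain strings here; sorting them as text only suits this small sample
  arrange(subA, date)

print(paired)
```

The values follow the stated pairing rule (1101 with 2101, 1102 with 2102), so they will not match the hand-typed `example_clean` in every cell.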