R | How to arrange the character vectors in a data frame column in a custom order?
Question
I have a dataframe that looks like this:
| Fruit | X | Y | Z | 
|---|---|---|---|
| apple, banana, orange, papaya | a | f | k | 
| banana, orange, grape | b | g | l | 
| orange, banana | c | h | m | 
| grape | d | i | n | 
| banana, grape, orange, apple, papaya | e | j | o | 
I want the items within each row to appear in a custom order, like this:
- Apple
- Orange
- Papaya
- Banana
- Grape

So the column would look like:
| Fruit | X | Y | Z | 
|---|---|---|---|
| apple, orange, papaya, banana | a | f | k | 
| orange, banana, grape | b | g | l | 
| orange, banana | c | h | m | 
| grape | d | i | n | 
| apple, orange, papaya, banana, grape | e | j | o | 
How can I do this? I've tried suggestions from other posts, but they are all about arranging the rows of a data frame, which isn't what I need.
P.S.: Is there any way to do this inside a pipe?
Answer 1
Score: 4
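One option is to split each string, reorder the pieces with match(), and paste them back together:
library(dplyr)
library(stringr)
library(purrr)
df1 <- df1 %>%
  mutate(Fruit = map_chr(
    # split each string on a comma followed by optional whitespace
    strsplit(Fruit, ",\\s*"),
    # reorder the pieces by their position in the custom order, then rejoin
    ~ toString(.x[order(match(.x,
      c("apple", "orange", "papaya", "banana", "grape")))])
  ))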
Output:
df1
                                  Fruit X Y Z
1        apple, orange, papaya, banana a f k
2                orange, banana, grape b g l
3                       orange, banana c h m
4                                grape d i n
5 apple, orange, papaya, banana, grape e j o
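To see why this works, it can help to run the match()/order() step by hand on a single row. The following is a minimal base-R sketch; the names fruit_order, x, pos and reordered are illustrative and not part of the answer above.

```r
# Desired order of appearance (taken from the question).
fruit_order <- c("apple", "orange", "papaya", "banana", "grape")

# One row's worth of fruit, already split into a character vector.
x <- c("banana", "grape", "orange", "apple", "papaya")

# match() returns each element's position in fruit_order.
pos <- match(x, fruit_order)
print(pos)  # 4 5 2 1 3

# order() turns those positions into an index that sorts x accordingly.
reordered <- x[order(pos)]

# toString() joins the pieces with ", " again.
print(toString(reordered))  # "apple, orange, papaya, banana, grape"
```

Because the order comes from match() rather than from alphabetical sorting, changing fruit_order is enough to change the result.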
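Or, using separate_longer_delim():
library(tidyr)
# regex() and str_c() come from stringr, which is loaded above
df1 <- df1 %>%
  # remember the original row so the pieces can be regrouped later
  mutate(rn = row_number()) %>%
  # one row per fruit
  separate_longer_delim(Fruit, delim = regex(",\\s*")) %>%
  # sort within each original row by the custom order
  arrange(rn, factor(Fruit,
    levels = c("apple", "orange", "papaya", "banana", "grape"))) %>%
  # collapse back to one string per original row
  reframe(Fruit = str_c(Fruit, collapse = ", "),
    .by = c("rn", "X", "Y", "Z")) %>%
  select(-rn) %>%
  relocate(Fruit, .before = 1)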
Output:
df1
                                  Fruit X Y Z
1        apple, orange, papaya, banana a f k
2                orange, banana, grape b g l
3                       orange, banana c h m
4                                grape d i n
5 apple, orange, papaya, banana, grape e j o
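The arrange() step relies on the fact that a factor sorts by the order of its levels rather than alphabetically. A small base-R sketch of that idea, using illustrative names (fruit_order, x, f, sorted) that do not appear in the answer:

```r
# Custom order from the question, used as the factor levels.
fruit_order <- c("apple", "orange", "papaya", "banana", "grape")

x <- c("banana", "orange", "grape")

# Each value is stored as the integer index of its level.
f <- factor(x, levels = fruit_order)
print(as.integer(f))  # 4 2 5

# Sorting a factor sorts by those integer codes, i.e. by level order.
sorted <- as.character(sort(f))
print(sorted)  # "orange" "banana" "grape"
```

In the pipeline above the factor is only used as a sort key inside arrange(), so the Fruit column itself stays a character column.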
If the column is already a list column, strsplit is not needed. Instead:
df1 <- df1 %>%
   mutate(Fruit = map(Fruit, 
  ~ .x[order(match(.x, c("apple", "orange", "papaya", "banana", "grape")))]))
Or with unnest():
df1 <- df1 %>% 
  mutate(rn = row_number()) %>% 
  unnest(Fruit) %>% 
  arrange(rn, factor(Fruit, 
   levels = c("apple", "orange", "papaya", "banana", "grape"))) %>% 
  reframe(Fruit = list(Fruit),
    .by = c("rn", "X", "Y", "Z")) %>% 
  select(-rn) %>%
  relocate(Fruit, .before = 1)
Output:
df1
# A tibble: 5 × 4
  Fruit     X     Y     Z    
  <list>    <chr> <chr> <chr>
1 <chr [4]> a     f     k    
2 <chr [3]> b     g     l    
3 <chr [2]> c     h     m    
4 <chr [1]> d     i     n    
5 <chr [5]> e     j     o    
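For readers who want to try the list-column case without the tidyverse, here is a rough base-R equivalent. The data frame df_list and its contents are made up for illustration; only the match()/order() step mirrors the answer.

```r
fruit_order <- c("apple", "orange", "papaya", "banana", "grape")

# Hypothetical example data with a list column of character vectors.
df_list <- data.frame(X = c("a", "b"))
df_list$Fruit <- list(c("banana", "apple"), c("grape", "orange", "banana"))

# Reorder every element of the list column by the custom order.
df_list$Fruit <- lapply(
  df_list$Fruit,
  function(v) v[order(match(v, fruit_order))]
)

print(df_list$Fruit[[1]])  # "apple" "banana"
print(df_list$Fruit[[2]])  # "orange" "banana" "grape"
```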
Data:
df1 <- structure(list(Fruit = c("apple, banana, orange, papaya", "banana, orange, grape", 
"orange, banana", "grape", "banana, grape, orange, apple, papaya"
), X = c("a", "b", "c", "d", "e"), Y = c("f", "g", "h", "i", 
"j"), Z = c("k", "l", "m", "n", "o")), class = "data.frame", row.names = c(NA, 
-5L))
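If the same reordering is needed in several places, the split/match/paste logic could be wrapped in a small helper. This is only a sketch: reorder_items() and its arguments are hypothetical names, not functions from dplyr, purrr or the answer above. It also shows that items missing from the order vector are kept and placed last, because order() sorts NA positions to the end by default.

```r
# Hypothetical helper: reorder the comma-separated items in each string of x
# according to item_order. Items not found in item_order are placed last.
reorder_items <- function(x, item_order, sep = ",\\s*") {
  vapply(
    strsplit(x, sep),
    function(items) toString(items[order(match(items, item_order))]),
    character(1)
  )
}

fruit_order <- c("apple", "orange", "papaya", "banana", "grape")

result <- reorder_items(
  c("apple, banana, orange, papaya", "banana, kiwi, apple", "grape"),
  fruit_order
)
print(result)
# "apple, orange, papaya, banana" "apple, banana, kiwi" "grape"
```

Since the helper is vectorised over x, it should also work inside a pipe, for example as mutate(Fruit = reorder_items(Fruit, fruit_order)).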
Answer 2
Score: 3
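Here is one more tidyverse solution. The main idea is to use separate_rows() and then convert Fruit to a factor whose levels are in the desired order:
library(dplyr)
library(tidyr)
# df is the data frame from the question
df %>%
  # tag each original row so the pieces can be regrouped
  group_by(group = row_number()) %>%
  separate_rows(Fruit) %>%
  mutate(Fruit = factor(Fruit, levels = c("apple", "orange", "papaya", "banana", "grape"))) %>%
  # sort by factor level within each original row
  arrange(Fruit, .by_group = TRUE) %>%
  summarise(Fruit = toString(Fruit)) %>%
  # add back the remaining columns (X, Y, Z)
  bind_cols(df[2:4]) %>%
  select(-group)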
  Fruit                                X     Y     Z    
  <chr>                                <chr> <chr> <chr>
1 apple, orange, papaya, banana        a     f     k    
2 orange, banana, grape                b     g     l    
3 orange, banana                       c     h     m    
4 grape                                d     i     n    
5 apple, orange, papaya, banana, grape e     j     o    
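Two details of this approach may matter on other data. The sketch below illustrates both in base R, with made-up inputs ("passion fruit", "kiwi") that are not in the question. It assumes the documented default separator of separate_rows(), the regex "[^[:alnum:].]+".

```r
fruit_order <- c("apple", "orange", "papaya", "banana", "grape")

# 1. The default separator of separate_rows() matches any run of characters
#    that are not alphanumeric or a dot, so it also splits on spaces
#    inside an item. An explicit comma pattern avoids that.
default_sep <- "[^[:alnum:].]+"
split_default <- strsplit("passion fruit, apple", default_sep)[[1]]
split_comma <- strsplit("passion fruit, apple", ",\\s*")[[1]]
print(split_default)  # "passion" "fruit" "apple"
print(split_comma)    # "passion fruit" "apple"

# 2. Converting to a factor turns items that are missing from the levels
#    into NA, so they show up as "NA" after toString().
items <- c("banana", "kiwi", "apple")
via_factor <- toString(sort(factor(items, levels = fruit_order), na.last = TRUE))
print(via_factor)  # "apple, banana, NA"

#    Ordering by match() keeps the original strings; unknown items go last.
via_match <- toString(items[order(match(items, fruit_order))])
print(via_match)  # "apple, banana, kiwi"
```

For the data in the question neither issue arises, since every item is a single word that is listed in the levels.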