R | How to arrange the items within each value of a data frame column in a custom order?

Question

I have a dataframe that looks like this:

```
Fruit                                X Y Z
apple, banana, orange, papaya        a f k
banana, orange, grape                b g l
orange, banana                       c h m
grape                                d i n
banana, grape, orange, apple, papaya e j o
```

I want the fruits in each row to appear in a custom order:

  1. Apple
  2. Orange
  3. Papaya
  4. Banana
  5. Grape

So the column would look like:

```
Fruit                                X Y Z
apple, orange, papaya, banana        a f k
orange, banana, grape                b g l
orange, banana                       c h m
grape                                d i n
apple, orange, papaya, banana, grape e j o
```

How can I do this? I've tried suggestions from other posts, but they are all about arranging the rows of a data frame, which isn't what I need.

P.S.: Is there a way to do this inside a pipe?

Answer 1

Score: 4

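We could do it inside the pipe: split each string on commas, reorder the pieces by their position in the custom order with `match()`, and paste them back together with `toString()`: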

```r
library(dplyr)
library(stringr)
library(purrr)

df1 <- df1 %>%
  # split each string on a comma plus optional spaces, reorder the pieces
  # by their position in the custom order vector, then paste with ", "
  mutate(Fruit = map_chr(strsplit(Fruit, ",\\s*"),
                         ~ toString(.x[order(match(.x,
                             c("apple", "orange", "papaya", "banana", "grape")))])))
```

-output

```
df1
                                 Fruit X Y Z
1        apple, orange, papaya, banana a f k
2                orange, banana, grape b g l
3                       orange, banana c h m
4                                grape d i n
5 apple, orange, papaya, banana, grape e j o
```
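On the P.S. about pipes: the code above already runs inside a `%>%` pipe. If you would rather avoid the purrr dependency, here is a minimal base-R sketch of the same `strsplit()` / `match()` / `order()` idea. It assumes R 4.1 or later for the native pipe `|>` and the `\(x)` lambda shorthand; the names `fruit_order` and `sort_fruits()` are hypothetical, introduced only for illustration.

```r
# Desired order of appearance (hypothetical name)
fruit_order <- c("apple", "orange", "papaya", "banana", "grape")

# Hypothetical helper: reorder the comma-separated items of each string
# by their position in `ord`, then paste them back with ", "
sort_fruits <- function(s, ord) {
  strsplit(s, ",\\s*") |>
    vapply(\(x) toString(x[order(match(x, ord))]), character(1))
}

df1 <- data.frame(
  Fruit = c("apple, banana, orange, papaya", "banana, orange, grape",
            "orange, banana", "grape", "banana, grape, orange, apple, papaya"),
  X = c("a", "b", "c", "d", "e"),
  Y = c("f", "g", "h", "i", "j"),
  Z = c("k", "l", "m", "n", "o")
)

# transform() replaces the existing Fruit column in place, inside a native pipe
df1 <- df1 |> transform(Fruit = sort_fruits(Fruit, fruit_order))
print(df1)
```

`vapply()` is used instead of `sapply()` so the result is guaranteed to be a character vector of the same length as the input.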

Or using `separate_longer_delim()` (needs tidyr 1.3.0 or later; `reframe()` and `.by` need dplyr 1.1.0 or later):

```r
library(tidyr)
# dplyr and stringr (for regex() and str_c()) are loaded above

df1 <- df1 %>%
  # remember which original row each fruit came from
  mutate(rn = row_number()) %>%
  # one row per fruit, splitting on a comma plus optional spaces
  separate_longer_delim(Fruit, delim = regex(",\\s*")) %>%
  # sort within each original row by the custom factor-level order
  arrange(rn, factor(Fruit,
                     levels = c("apple", "orange", "papaya", "banana", "grape"))) %>%
  # collapse back to one comma-separated string per original row
  reframe(Fruit = str_c(Fruit, collapse = ", "),
          .by = c("rn", "X", "Y", "Z")) %>%
  select(-rn) %>%
  relocate(Fruit, .before = 1)
```

-output

```
df1
                                 Fruit X Y Z
1        apple, orange, papaya, banana a f k
2                orange, banana, grape b g l
3                       orange, banana c h m
4                                grape d i n
5 apple, orange, papaya, banana, grape e j o
```
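To see what `separate_longer_delim()`, `arrange()` and `reframe()` do step by step, the same long-format idea can be sketched in base R: one row per fruit, sort by original row number and then by factor level, and collapse back. This is an illustrative sketch, not the answer's code; the names `fruit_order`, `pieces`, `long` and `rn` are assumptions, and it assumes every row contains at least one fruit (an empty string would drop its row).

```r
# Desired order of appearance (hypothetical name)
fruit_order <- c("apple", "orange", "papaya", "banana", "grape")

df1 <- data.frame(
  Fruit = c("apple, banana, orange, papaya", "banana, orange, grape",
            "orange, banana", "grape", "banana, grape, orange, apple, papaya"),
  X = c("a", "b", "c", "d", "e"),
  Y = c("f", "g", "h", "i", "j"),
  Z = c("k", "l", "m", "n", "o"),
  stringsAsFactors = FALSE
)

# 1. Long format: one row per fruit, tagged with its original row number `rn`
pieces <- strsplit(df1$Fruit, ",\\s*")
long <- data.frame(rn = rep(seq_along(pieces), lengths(pieces)),
                   Fruit = unlist(pieces),
                   stringsAsFactors = FALSE)

# 2. Sort by row number first, then by position among the factor levels
long <- long[order(long$rn, factor(long$Fruit, levels = fruit_order)), ]

# 3. Collapse back to one comma-separated string per original row
#    (split() keeps the sorted order inside each group)
collapsed <- vapply(split(long$Fruit, long$rn), toString, character(1))
df1$Fruit <- unname(collapsed)
print(df1)
```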

If `Fruit` is a list column (one character vector per row), we don't need `strsplit()`; instead:

```r
df1 <- df1 %>%
  # reorder each character vector in the list column directly
  mutate(Fruit = map(Fruit,
                     ~ .x[order(match(.x, c("apple", "orange", "papaya", "banana", "grape")))]))
```
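For a list column, the same `order(match(...))` step can be sketched in base R with `lapply()`. The toy data below is invented for illustration, including a `"kiwi"` entry that is deliberately missing from the order vector: `match()` returns `NA` for it, and `order()` places `NA` last by default (`na.last = TRUE`), so unlisted items are kept and moved to the end rather than dropped.

```r
# Desired order of appearance (hypothetical name)
fruit_order <- c("apple", "orange", "papaya", "banana", "grape")

# Hypothetical toy data: `Fruit` is a list column holding one character
# vector per row; "kiwi" is not in fruit_order
df_list <- data.frame(X = c("a", "b"))
df_list$Fruit <- list(c("kiwi", "banana", "apple"), c("grape", "orange"))

# match() gives each item's rank (NA if unlisted); order() puts NA last
df_list$Fruit <- lapply(df_list$Fruit, function(x) x[order(match(x, fruit_order))])
str(df_list$Fruit)
```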

Or with unnest

```r
df1 <- df1 %>%
  mutate(rn = row_number()) %>%
  # one row per list element
  unnest(Fruit) %>%
  arrange(rn, factor(Fruit,
                     levels = c("apple", "orange", "papaya", "banana", "grape"))) %>%
  # gather the sorted fruits back into a list column, one vector per row
  reframe(Fruit = list(Fruit),
          .by = c("rn", "X", "Y", "Z")) %>%
  select(-rn) %>%
  relocate(Fruit, .before = 1)
```

-output

```
df1
# A tibble: 5 × 4
  Fruit     X     Y     Z
  <list>    <chr> <chr> <chr>
1 <chr [4]> a     f     k
2 <chr [3]> b     g     l
3 <chr [2]> c     h     m
4 <chr [1]> d     i     n
5 <chr [5]> e     j     o
```

data

```r
df1 <- structure(list(Fruit = c("apple, banana, orange, papaya", "banana, orange, grape",
"orange, banana", "grape", "banana, grape, orange, apple, papaya"
), X = c("a", "b", "c", "d", "e"), Y = c("f", "g", "h", "i",
"j"), Z = c("k", "l", "m", "n", "o")), class = "data.frame", row.names = c(NA,
-5L))
```

Answer 2

Score: 3

Here is one more (a tidyverse solution):

The main idea is to use `separate_rows()` to put each fruit in its own row, then turn `Fruit` into a factor whose levels define the custom order:

```r
library(dplyr)
library(tidyr)

# `df` is the question's data frame (called `df1` in the first answer)
df %>%
  # tag each original row so the pieces can be regrouped
  group_by(group = row_number()) %>%
  # one row per fruit (the default `sep` splits on any non-alphanumeric run)
  separate_rows(Fruit) %>%
  # the factor levels define the custom order
  mutate(Fruit = factor(Fruit, levels = c("apple", "orange", "papaya", "banana", "grape"))) %>%
  arrange(Fruit, .by_group = TRUE) %>%
  # paste the sorted fruits back into one string per original row
  summarise(Fruit = toString(Fruit)) %>%
  # re-attach X, Y, Z from the original data frame
  bind_cols(df[2:4]) %>%
  select(-group)
```

```
  Fruit                                X     Y     Z
  <chr>                                <chr> <chr> <chr>
1 apple, orange, papaya, banana        a     f     k
2 orange, banana, grape                b     g     l
3 orange, banana                       c     h     m
4 grape                                d     i     n
5 apple, orange, papaya, banana, grape e     j     o
```
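Two caveats with this approach may matter on real data. `separate_rows()` with its default `sep` splits on any run of non-alphanumeric characters, so a multi-word name such as `"passion fruit"` would be split in two; and a fruit missing from `levels` becomes `NA` after `factor()`, so `toString()` would print `"NA"` instead of its name. The base-R sketch below keeps the factor-level idea but splits only on commas and, as an assumption about the desired behaviour, keeps unknown fruits at the end. The names `fruit_order` and `reorder_fruits()` are hypothetical.

```r
# Desired order of appearance (hypothetical name)
fruit_order <- c("apple", "orange", "papaya", "banana", "grape")

# Hypothetical helper: split each string on commas only, then sort by factor level.
# Indexing the original character vector (instead of pasting the factor) keeps the
# names of fruits that are not among the levels; order() places their NA codes last.
reorder_fruits <- function(s, ord) {
  vapply(strsplit(s, ",\\s*"), function(x) {
    f <- factor(x, levels = ord)
    toString(x[order(f)])
  }, character(1))
}

res <- reorder_fruits(
  c("banana, grape, orange, apple, papaya", "passion fruit, grape, apple"),
  fruit_order
)
print(res)
```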

huangapple
  • Posted on 19 February 2023 at 05:12:48
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75496426.html