Rename duplicated column values grouped by another column

Question

I have a dataset in which the same id appears in different groups:

    df <- read.table(text='id group
    1 A
    2 A
    2 A
    1 B
    1 B
    2 B
    2 C
    2 C
    1 C
    2 D
    1 D
    1 D', header=TRUE)

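I want to rename the values in column id that are repeated across the groups in column group, so that an id keeps its original value in the first group and gets a suffix (_2, _3, ...) in each later group. The expected output is: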

    id   group
    1    A
    2    A
    2    A
    1_2  B
    1_2  B
    2_2  B
    2_3  C
    2_3  C
    1_3  C
    2_4  D
    1_4  D
    1_4  D

How do I do that?

Answer 1

Score: 1

Here is a data.table approach that uses rleid() to generate a run-length id for the consecutive runs of group within each id. We can then paste() that number onto the existing id wherever it is greater than 1.

    library(data.table)
    setDT(df)  # convert df to a data.table by reference

    # Number the consecutive runs of `group` within each id
    df[, id_num := rleid(group), by = id]
    # Keep id unchanged in its first run, otherwise append the run number
    df[, id := fifelse(id_num == 1, as.character(id), paste(id, id_num, sep = "_"))]
    df[, id_num := NULL]  # drop the helper column
    print(df)
    #         id  group
    #     <char> <char>
    #  1:      1      A
    #  2:      2      A
    #  3:      2      A
    #  4:    1_2      B
    #  5:    1_2      B
    #  6:    2_2      B
    #  7:    2_3      C
    #  8:    2_3      C
    #  9:    1_3      C
    # 10:    2_4      D
    # 11:    1_4      D
    # 12:    1_4      D
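
To see what rleid(group) computes within each id, the same numbering can be sketched in base R. This is only an illustration under stated assumptions: run_id is a hypothetical helper written for this example (it is not part of data.table), and the data are re-read so the block stands on its own.

```r
# Re-create the example data from the question
df <- read.table(text = "id group
1 A
2 A
2 A
1 B
1 B
2 B
2 C
2 C
1 C
2 D
1 D
1 D", header = TRUE)

# Hypothetical helper: number consecutive runs of equal values (1, 2, 3, ...)
run_id <- function(x) cumsum(c(TRUE, x[-1] != x[-length(x)]))

# Apply the helper to `group` separately for each id; ave() keeps row order
id_num <- ave(seq_along(df$group), df$id,
              FUN = function(i) run_id(df$group[i]))

# Keep the id unchanged in its first run, otherwise append the run number
df$id <- ifelse(id_num == 1, as.character(df$id),
                paste(df$id, id_num, sep = "_"))
print(df)
```

Because the numbering counts consecutive runs, both this sketch and rleid() rely on the rows of each id being ordered so that every group forms one contiguous block; a group that reappears later for the same id would get a new number.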

Answer 2

Score: 1

Thanks to @SamR for the data.table answer. I was able to convert that code to a tidyverse version using ChatGPT:

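    library(dplyr)
    df %>%
      group_by(id) %>%  # number the runs of `group` within each id, as the data.table code does
      mutate(id_num = data.table::rleid(group)) %>%
      ungroup() %>%     # id is a grouping column, so ungroup before modifying it
      mutate(id = ifelse(id_num == 1, as.character(id), paste(id, id_num, sep = "_"))) %>%
      select(-id_num)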
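
If pulling in data.table only for rleid() is undesirable, a dplyr-only variant might look like the sketch below. It assumes dplyr 1.1.0 or later, where consecutive_id() and the .by argument of mutate() are available; the object name out is chosen here just for illustration.

```r
library(dplyr)

# Re-create the example data from the question
df <- read.table(text = "id group
1 A
2 A
2 A
1 B
1 B
2 B
2 C
2 C
1 C
2 D
1 D
1 D", header = TRUE)

out <- df %>%
  # Number the consecutive runs of `group` within each id (per-call grouping)
  mutate(id_num = consecutive_id(group), .by = id) %>%
  # Keep the id unchanged in its first run, otherwise append the run number
  mutate(id = if_else(id_num == 1L, as.character(id),
                      paste(id, id_num, sep = "_"))) %>%
  select(-id_num)

print(out)
```

Using .by keeps the grouping local to that one mutate() call, so the id column can be overwritten in the next step without an explicit ungroup().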

huangapple
  • Published on July 5, 2023, 00:44:24
  • If you repost this article, please keep the link to it: https://go.coder-hub.com/76614547.html