Rename duplicated column values grouped by another column
Question
I have a dataset in which the same id appears in different groups:
df <- read.table(text='id group
1 A
2 A
2 A
1 B
1 B
2 B
2 C
2 C
1 C
2 D
1 D
1 D', header=TRUE)
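I want to rename the duplicated values in column id according to the group they fall in: an id keeps its value in the first group where it appears and gets a numeric suffix in each later group (1_2, 1_3, and so on). The expected output is: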
id group
1 A
2 A
2 A
1_2 B
1_2 B
2_2 B
2_3 C
2_3 C
1_3 C
2_4 D
1_4 D
1_4 D
How do I do that?
Answer 1
Score: 1
Here is a data.table approach using rleid() to number, within each id, the consecutive runs of group values. We can then paste() that number onto the existing id wherever it is greater than 1.
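library(data.table)
setDT(df)  # convert df to a data.table by reference

# Within each id, number the consecutive runs of group values,
# then append that number to the id unless it is the first run
df[, id_num := rleid(group), by = id][
  ,
  id := fifelse(
    id_num == 1,
    as.character(id),
    paste(id, id_num, sep = "_")
  )
][, id_num := NULL]  # drop the helper column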
print(df)
# id group
# <char> <char>
# 1: 1 A
# 2: 2 A
# 3: 2 A
# 4: 1_2 B
# 5: 1_2 B
# 6: 2_2 B
# 7: 2_3 C
# 8: 2_3 C
# 9: 1_3 C
# 10: 2_4 D
# 11: 1_4 D
# 12: 1_4 D
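For readers who cannot use data.table, the same idea can be sketched in base R. This is a swapped-in alternative, not part of the answer above: the helper run_id() is a hypothetical stand-in for rleid(), and ave() plays the role of the by-id grouping.

```r
# Data from the question
df <- read.table(text = "id group
1 A
2 A
2 A
1 B
1 B
2 B
2 C
2 C
1 C
2 D
1 D
1 D", header = TRUE)

# Hypothetical helper: a counter that increases by 1 each time the
# value differs from the previous element (a stand-in for rleid())
run_id <- function(x) cumsum(c(TRUE, x[-1] != x[-length(x)]))

# Apply the counter to group separately within each id;
# ave() returns the results in the original row order
id_num <- ave(
  seq_len(nrow(df)), df$id,
  FUN = function(i) run_id(df$group[i])
)

# Keep the id for its first run, otherwise append the run number
df$id <- ifelse(id_num == 1, as.character(df$id), paste(df$id, id_num, sep = "_"))
print(df)
```

As with rleid(), this counts runs of consecutive equal values, so it assumes the rows of each group sit together for a given id, as they do in the question's data.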
Answer 2
Score: 1
Thanks to @SamR's data.table answer. I was able to convert his/her code to a tidyverse version using ChatGPT:
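library(dplyr)

# group_by(id) mirrors the by-id grouping in the data.table version
df %>%
  group_by(id) %>%
  mutate(id_num = data.table::rleid(group)) %>%
  ungroup() %>%
  mutate(id = ifelse(id_num == 1, as.character(id), paste(id, id_num, sep = "_"))) %>%
  select(-id_num)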
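One point worth checking when adapting this: the data.table answer computes rleid(group) separately for each id, which in dplyr corresponds to a group_by(id) before the mutate(). On the question's data it makes no difference, because every id occurs in every group, but it does once an id skips a group. The sketch below uses a small made-up data frame (df2 is hypothetical, not from the question) to show the two counters diverging.

```r
library(dplyr)

# Hypothetical data: id 3 first appears in group B, not A
df2 <- data.frame(
  id = c(1L, 1L, 3L, 1L, 3L),
  group = c("A", "B", "B", "C", "C")
)

# Counter over the whole column: numbers the groups themselves
ungrouped <- df2 %>%
  mutate(id_num = data.table::rleid(group))

# Counter restarted for each id: numbers the groups each id has been in
grouped <- df2 %>%
  group_by(id) %>%
  mutate(id_num = data.table::rleid(group)) %>%
  ungroup()

ungrouped$id_num  # 1 2 2 3 3
grouped$id_num    # 1 2 1 3 2
```

With the ungrouped counter, id 3 would be renamed to 3_2 in its very first group; with the per-id counter it stays 3 there and becomes 3_2 in group C.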