How to add column with altered ID-names for duplicates within same day
Question
I have a data frame with Days, ID, date collected, and a count value (number of hatched eggs) for several samples each day. The ID comes from the replicate (mother) from which the sample (a number of eggs) was taken, so the "Collected" date is needed to tell samples apart, for instance in a plot.
I want to add a new column called SampleID that gives each unique sample its own ID.
Example data:
d1 <- as.Date('2021-06-07')
d2 <- as.Date('2021-06-08')
d3 <- as.Date('2021-06-09')
df <- data.frame(Days = c(1,1,2,2,2,2,3,3,3,3,3),
                 ID = c(2,5,2,2,5,9,2,2,5,5,9),
                 Collected = c(d1,d1,d2,d1,d1,d2,d1,d2,d1,d3,d2))
I would like the output to look like this:
Days | ID | Collected | SampleID | Count |
---|---|---|---|---|
1 | 2 | 2021-06-07 | 2-1 | 3 |
1 | 5 | 2021-06-07 | 5-1 | 5 |
2 | 2 | 2021-06-08 | 2-1 | 4 |
2 | 2 | 2021-06-07 | 2-2 | 1 |
2 | 5 | 2021-06-07 | 5-1 | 7 |
2 | 9 | 2021-06-08 | 9-1 | 2 |
3 | 2 | 2021-06-07 | 2-1 | 8 |
3 | 2 | 2021-06-08 | 2-2 | 5 |
3 | 5 | 2021-06-07 | 5-1 | 7 |
3 | 5 | 2021-06-09 | 5-2 | 2 |
3 | 9 | 2021-06-08 | 9-1 | 2 |
and I have been trying something like:
df <- df %>%
  group_by(Days) %>%
  mutate(ReplicateID = case_when(ID == ID & Collected != Collected ~ paste(as.character(ID)+"-1")))
This doesn't work, and even if it did, it could not add -2 or -3 to IDs repeated more than once within the same day. So I am kind of lost and would appreciate some help!
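A small base R check hints at why the attempt cannot work. It uses made-up vectors rather than the full data frame, and the name `res` is illustrative only. A column compared with itself is equal in every row, and `+` does not join strings in R.

```r
ID <- c(2, 2, 5)
Collected <- as.Date(c("2021-06-08", "2021-06-07", "2021-06-07"))

# Comparing a vector with itself is element-wise, so every row matches itself.
same_id <- ID == ID                  # TRUE in every position
diff_date <- Collected != Collected  # FALSE in every position

# The case_when() condition is therefore never met.
any(same_id & diff_date)

# `+` is arithmetic only; on character input it raises an error.
res <- tryCatch(as.character(ID) + "-1", error = function(e) "error")
res
```

Numbering repeats needs a per-group counter (such as `row_number()` or `seq_along()`) rather than a row-wise comparison.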
Answer 1
Score: 1
Maybe something like this?
library(dplyr)
d1 <- as.Date('2021-06-07')
d2 <- as.Date('2021-06-08')
d3 <- as.Date('2021-06-09')
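df <- data.frame(Days = c(1,1,2,2,2,2,3,3,3,3,3),
                 ID = c(2,5,2,2,5,9,2,2,5,5,9),
                 Collected = c(d1,d1,d2,d1,d1,d2,d1,d2,d1,d3,d2))
# Sort so that, within each day and ID, the earliest collection date gets -1.
# The native pipe |> needs R >= 4.1; %>% from dplyr works the same way here.
df |>
  arrange(Days, ID, Collected) |>
  group_by(Days, ID) |>
  mutate(SampleID = paste(ID, row_number(), sep = '-'))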
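The `arrange()` step above reorders the rows, so on day 2 the sample collected on 2021-06-07 becomes 2-1, whereas the requested table numbers the rows in the order they already appear. A minimal sketch of that variant, assuming the existing row order is the intended one (the name `out` is illustrative):

```r
library(dplyr)

d1 <- as.Date("2021-06-07")
d2 <- as.Date("2021-06-08")
d3 <- as.Date("2021-06-09")
df <- data.frame(
  Days = c(1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 3),
  ID = c(2, 5, 2, 2, 5, 9, 2, 2, 5, 5, 9),
  Collected = c(d1, d1, d2, d1, d1, d2, d1, d2, d1, d3, d2)
)

# Number repeated IDs within each day in the order the rows already appear
# (no arrange() step), then drop the grouping so later verbs act on all rows.
out <- df %>%
  group_by(Days, ID) %>%
  mutate(SampleID = paste(ID, row_number(), sep = "-")) %>%
  ungroup()

out$SampleID
```

Calling `ungroup()` at the end is optional, but it avoids surprises when the result is passed to further `mutate()` or `summarise()` calls.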
Answer 2
Score: 1
A base R way using ave with paste and seq_along.
df$SampleID <- ave(df$ID, df$ID, df$Days, FUN=\(x) paste(x, seq_along(x), sep="_"))
df
#   Days ID  Collected SampleID
#1     1  2 1970-01-01      2_1
#2     1  5 1970-01-01      5_1
#3     2  2 1970-01-01      2_1
#4     2  2 1970-01-01      2_2
#5     2  5 1970-01-01      5_1
#6     2  9 1970-01-01      9_1
#7     3  2 1970-01-01      2_1
#8     3  2 1970-01-01      2_2
#9     3  5 1970-01-01      5_1
#10    3  5 1970-01-01      5_2
#11    3  9 1970-01-01      9_1
This adds a SampleID column to df: within each combination of ID and Days, the rows are numbered in their existing order, and that number is joined to the ID with an underscore.
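Numbering within each day means the same sample can get different labels on different days. If a sample should keep one label across days, which goes beyond the output requested in the question, one option is to rank the collection dates within each ID. A base R sketch under that assumption (the helper name `date_rank` is illustrative):

```r
d1 <- as.Date("2021-06-07")
d2 <- as.Date("2021-06-08")
d3 <- as.Date("2021-06-09")
df <- data.frame(
  Days = c(1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 3),
  ID = c(2, 5, 2, 2, 5, 9, 2, 2, 5, 5, 9),
  Collected = c(d1, d1, d2, d1, d1, d2, d1, d2, d1, d3, d2)
)

# Rank each collection date within its ID (1 = earliest), so the same
# (ID, Collected) pair receives the same number on every day.
date_rank <- ave(
  as.numeric(df$Collected), df$ID,
  FUN = function(x) match(x, sort(unique(x)))
)
df$SampleID <- paste(df$ID, date_rank, sep = "-")

df
```

With these labels, the sample from ID 2 collected on 2021-06-07 is 2-1 on every day, which may be more convenient when a plot groups or colours by SampleID.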