March 20, 2023 22:42:30

How to add column with altered ID-names for duplicates within same day

Question

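In short: the goal is to add a new column named "SampleID" to the data frame so that each unique sample can be told apart. A dplyr approach along these lines does that:

library(dplyr)
df <- df %>%
  group_by(Days, ID) %>%
  mutate(SampleID = paste(ID, row_number(), sep = "-")) %>%
  ungroup()

This groups by "Days" and "ID" and uses row_number() to number the rows within each group, so an ID that appears more than once on the same day gets -1, -2, and so on. "Collected" must not be part of the grouping: every Days/ID/Collected combination occurs only once, so each group would hold a single row and every suffix would be 1.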

I have a data frame with Days, ID, date collected, and a count value (number of hatched eggs) for several samples each day. The ID comes from the replicate (mother) from which the sample (a number of eggs) was taken, so the information in the "Date collected" column is needed to distinguish separate samples, for instance in a plot.

I want to add a new column called SampleID in which I give each unique sample its own ID.

Example data:

d1 <- as.Date('2021-06-07')
d2 <- as.Date('2021-06-08')
d3 <- as.Date('2021-06-09')

df <- data.frame(Days = c(1,1,2,2,2,2,3,3,3,3,3),
                 ID = c(2,5,2,2,5,9,2,2,5,5,9),
                 Collected = c(d1,d1,d2,d1,d1,d2,d1,d2,d1,d3,d2))

I would like an output to look like:

Days	ID	Collected	SampleID	Count
1	2	2021-06-07	2-1	3
1	5	2021-06-07	5-1	5
2	2	2021-06-08	2-1	4
2	2	2021-06-07	2-2	1
2	5	2021-06-07	5-1	7
2	9	2021-06-08	9-1	2
3	2	2021-06-07	2-1	8
3	2	2021-06-08	2-2	5
3	5	2021-06-07	5-1	7
3	5	2021-06-09	5-2	2
3	9	2021-06-08	9-1	2

and I have been trying something like:

df <- df %>%
  group_by(Days) %>%
  mutate(ReplicateID = case_when(ID == ID & Collected != Collected ~ paste(as.character(ID)+"-1")))

This doesn't work, and even if it did, it would not be able to add -2 or -3 to IDs repeated more than once within the same day. So I am kind of lost and would appreciate some help!

Answer 1

Score: 1

Maybe something like this?

library(dplyr)
d1 <- as.Date('2021-06-07')
d2 <- as.Date('2021-06-08')
d3 <- as.Date('2021-06-09')
df <- data.frame(Days = c(1,1,2,2,2,2,3,3,3,3,3),
                 ID = c(2,5,2,2,5,9,2,2,5,5,9),
                 Collected = c(d1,d1,d2,d1,d1,d2,d1,d2,d1,d3,d2))
df |>
  arrange(Days, ID, Collected) |>
  group_by(Days, ID) |>
  mutate(SampleID = paste(ID, row_number(), sep = '-'))
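
One thing to be aware of: numbering rows within each day means the same sample can get different labels on different days if the set of samples changes. If the intent is that one ID/Collected pair keeps a single label everywhere (an assumption, the question does not say so), a possible variant is to rank the collection dates within each ID instead. The sketch below uses dplyr's dense_rank(); the name `stable` is only illustrative.

```r
library(dplyr)

d1 <- as.Date("2021-06-07")
d2 <- as.Date("2021-06-08")
d3 <- as.Date("2021-06-09")
df <- data.frame(Days = c(1,1,2,2,2,2,3,3,3,3,3),
                 ID = c(2,5,2,2,5,9,2,2,5,5,9),
                 Collected = c(d1,d1,d2,d1,d1,d2,d1,d2,d1,d3,d2))

# Rank the distinct collection dates within each ID, ignoring Days,
# so a given ID/Collected pair always maps to the same suffix.
stable <- df |>
  group_by(ID) |>
  mutate(SampleID = paste(ID, dense_rank(Collected), sep = "-")) |>
  ungroup()

stable
```

Here the rows keep their original order, and the sample from ID 2 collected on 2021-06-07 is labelled 2-1 on every day it appears.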

Answer 2

Score: 1

A base way using ave with paste and seq_along.

df$SampleID <- ave(df$ID, df$ID, df$Days, FUN=\(x) paste(x, seq_along(x), sep="_"))
df
#   Days ID  Collected SampleID
#1     1  2 2021-06-07      2_1
#2     1  5 2021-06-07      5_1
#3     2  2 2021-06-08      2_1
#4     2  2 2021-06-07      2_2
#5     2  5 2021-06-07      5_1
#6     2  9 2021-06-08      9_1
#7     3  2 2021-06-07      2_1
#8     3  2 2021-06-08      2_2
#9     3  5 2021-06-07      5_1
#10    3  5 2021-06-09      5_2
#11    3  9 2021-06-08      9_1
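
The ave call numbers rows in whatever order they already have and joins with an underscore. If the hyphenated labels from the question are wanted, and the numbering should follow the collection date as in the dplyr answer, one possible base R sketch is to sort with order() first and then apply ave. The name `df_sorted` is illustrative, not part of the original answer.

```r
d1 <- as.Date("2021-06-07")
d2 <- as.Date("2021-06-08")
d3 <- as.Date("2021-06-09")
df <- data.frame(Days = c(1,1,2,2,2,2,3,3,3,3,3),
                 ID = c(2,5,2,2,5,9,2,2,5,5,9),
                 Collected = c(d1,d1,d2,d1,d1,d2,d1,d2,d1,d3,d2))

# Sort so that, within each day and ID, earlier collection dates come first.
df_sorted <- df[order(df$Days, df$ID, df$Collected), ]

# Number the rows within each ID/Days group and join with a hyphen.
df_sorted$SampleID <- ave(
  df_sorted$ID, df_sorted$ID, df_sorted$Days,
  FUN = function(x) paste(x, seq_along(x), sep = "-")
)

df_sorted
```

Sorting a copy leaves the original df untouched; assign the result back to df if the sorted order is acceptable.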
