March 9, 2023 13:25:34

How to count unique genes at each timepoint across a time-course analysis?

Question

I have time-course data with a list of genes (rows) at each timepoint (columns).

I need a quick way to count the unique (novel) genes at each subsequent timepoint.

For example, comparing timepoint 3 to timepoint 2, which genes are novel in timepoint 3? Then, for timepoint 4, which genes are novel compared to timepoints 2 and 3? And so on.

I have 14 timepoints and multiple datasets, so I need an efficient way to calculate how many genes are novel at each timepoint.

This is a tiny sample of the data:

    X1           X2           X3           X4
1   LOC115711925 LOC115694843 LOC115696797 LOC115721738
2   LOC115697141 LOC115695410 LOC115705991 LOC115698757
3   LOC115695663 LOC115695505 LOC115720646 LOC115704937
4   LOC115697811 LOC115695663 LOC115709480 LOC115724472
5   LOC115710226 LOC115695751 LOC115707388 LOC115702544
6   LOC115699430 LOC115695753 LOC115711243 LOC115705803
7   LOC115719329 LOC115695880 LOC115701282 LOC115711243
8   LOC115709251 LOC115695882 LOC115695751 LOC115698778
9   LOC115716776 LOC115695990 LOC115698262 LOC115707330
10  LOC115707556 LOC115696236 LOC115715294 LOC115718803
11  LOC115717016 LOC115696976 LOC115720841 LOC115720837
12  LOC115703186 LOC115696984 LOC115698132 LOC115719149
13  LOC115715930 LOC115696989 LOC115702328 LOC115712227
14  LOC115719149 LOC115697003 LOC115720788 LOC115724518
15  LOC115694843 LOC115697717 LOC115712291 LOC115701008
16  LOC115702383 LOC115697737 LOC115717255 LOC115700185
17  LOC115718171 LOC115697757 LOC115720540 LOC115699220
18  LOC115716727 LOC115697813 LOC115709300 LOC115707967
19  LOC115721947 LOC115697989 LOC115710741 LOC115705222
20  LOC115707802 LOC115698069 LOC115699007 LOC115716814
21  LOC115707848 LOC115698103 LOC115718118 LOC115712507

I have tried to do this manually in Excel, but the process is time-consuming and prone to human error.
Very thankful for any help.

Answer 1

Score: 2

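In base R you could do the following (the counts shown match the modified data frame `df` listed in Answer 2, not the unmodified sample from the question):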

aggregate(.~ind, subset(stack(df), !duplicated(values)), length)
  ind values
1  X1     21
2  X2     19
3  X3     18
4  X4     17
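
To make the one-liner easier to follow, here is a minimal sketch that unpacks it into steps. The data frame `toy` and its gene names are made up for illustration and are not part of the original data.

```r
# Hypothetical toy data: three timepoints, three genes each
toy <- data.frame(
  T1 = c("a", "b", "c"),
  T2 = c("b", "d", "e"),
  T3 = c("a", "e", "f"),
  stringsAsFactors = FALSE
)

# Step 1: stack() reshapes to long format with columns `values` (gene)
# and `ind` (source column), stacking T1 first, then T2, then T3
long <- stack(toy)

# Step 2: duplicated() flags every occurrence after the first, so the
# rows kept here are each gene at the timepoint where it first appears
first_seen <- long[!duplicated(long$values), ]

# Step 3: count the remaining rows per timepoint
novel <- aggregate(values ~ ind, first_seen, length)
print(novel)
```

In this toy case T1 contributes 3 genes, T2 contributes 2 (`d`, `e`) and T3 contributes 1 (`f`). One caveat: `aggregate()` omits a timepoint that has no novel genes at all, whereas `table(first_seen$ind)` would report it with a count of 0.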

If you do not want to take X1 into account, exclude it when stacking:

aggregate(.~ind, subset(stack(df, -1), !duplicated(values)), length)
  ind values
1  X2     21
2  X3     18
3  X4     18
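
Since the question mentions 14 timepoints and several datasets, the same idea can be wrapped in a small helper and applied to a list of data frames. This is only a sketch: the function name `count_novel`, its `skip` argument, and the example datasets are hypothetical, and it assumes that unused cells in shorter columns are padded with `NA` or empty strings.

```r
# Hypothetical helper: count first appearances of genes per timepoint column.
# `skip` is the number of leading columns to leave out of the comparison.
count_novel <- function(df, skip = 0) {
  if (skip > 0) df <- df[-seq_len(skip)]
  long <- stack(df)
  # Assumption: NA or "" marks padding in columns of unequal length
  long <- long[!is.na(long$values) & long$values != "", ]
  first_seen <- long[!duplicated(long$values), ]
  # table() on the factor keeps timepoints that have zero novel genes
  table(first_seen$ind)
}

# Made-up datasets for illustration
datasets <- list(
  setA = data.frame(T1 = c("a", "b", "c"),
                    T2 = c("b", "d", NA),
                    T3 = c("a", "d", ""),
                    stringsAsFactors = FALSE),
  setB = data.frame(T1 = c("x", "y"),
                    T2 = c("y", "z"),
                    T3 = c("z", "w"),
                    stringsAsFactors = FALSE)
)

result <- lapply(datasets, count_novel)
print(result)
```

Calling `count_novel(datasets$setA, skip = 1)` ignores the first timepoint, in the same spirit as `stack(df, -1)` above.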

Answer 2

Score: 0

I added some duplicates to the data frame. With the original data frame, the output would have been:

X2 X3 X4 
19 20 20

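We could use purrr's `map2` and `map_int` combined with `setdiff`. Note that this compares each timepoint only with the one immediately before it, not with all earlier timepoints: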

library(purrr)
library(dplyr)

map2(df[-1], df[-ncol(df)], setdiff) %>%
  map_int(., length)

Output:

X2 X3 X4 
19 18 18
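
The question asks for genes that are novel relative to all earlier timepoints, so a cumulative variant may be closer to what is needed. Below is a minimal base R sketch of that idea, using `Reduce()` with `accumulate = TRUE` to build the running set of genes seen so far. The `toy` data frame and the variable names are made up for illustration.

```r
# Hypothetical toy data: three timepoints, three genes each
toy <- data.frame(
  T1 = c("a", "b", "c"),
  T2 = c("b", "d", "e"),
  T3 = c("a", "e", "f"),
  stringsAsFactors = FALSE
)

# seen[[i]] holds the union of all genes in columns 1..i
seen <- Reduce(union, toy, accumulate = TRUE)

# Cumulative: compare column i + 1 against everything seen in columns 1..i
novel_cumulative <- mapply(
  function(cur, prev) length(setdiff(cur, prev)),
  toy[-1], seen[-length(seen)]
)

# Adjacent only, for contrast: compare column i + 1 against column i
novel_adjacent <- mapply(
  function(cur, prev) length(setdiff(cur, prev)),
  toy[-1], toy[-ncol(toy)]
)

print(novel_cumulative)
print(novel_adjacent)
```

Here gene `a` reappears in T3 after being absent from T2, so the adjacent comparison counts it as novel in T3 while the cumulative comparison does not. On the modified `df`, the cumulative variant should give 19, 18 and 17 for X2, X3 and X4, in line with the first result in Answer 1.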

Modified data:

df <- structure(list(X1 = c("LOC115711925", "LOC115697141", "LOC115695663", 
"LOC115697811", "LOC115710226", "LOC115699430", "LOC115719329", 
"LOC115709251", "LOC115716776", "LOC115707556", "LOC115717016", 
"LOC115703186", "LOC115715930", "LOC115719149", "LOC115694843", 
"LOC115702383", "LOC115718171", "LOC115716727", "LOC115721947", 
"LOC115707802", "LOC115707848"), X2 = c("LOC115711925", "LOC115695410", 
"LOC115695505", "LOC115695663", "LOC115695751", "LOC115695753", 
"LOC115695880", "LOC115695882", "LOC115695990", "LOC115696236", 
"LOC115696976", "LOC115696984", "LOC115696989", "LOC115697003", 
"LOC115697717", "LOC115697737", "LOC115697757", "LOC115697813", 
"LOC115697989", "LOC115698069", "LOC115698103"), X3 = c("LOC115696797", 
"LOC115705991", "LOC115720646", "LOC115709480", "LOC115707388", 
"LOC115711243", "LOC115711925", "LOC115695751", "LOC115698262", 
"LOC115711925", "LOC115720841", "LOC115698132", "LOC115702328", 
"LOC115720788", "LOC115712291", "LOC115717255", "LOC115720540", 
"LOC115709300", "LOC115710741", "LOC115699007", "LOC115718118"
), X4 = c("LOC115721738", "LOC115698757", "LOC115704937", "LOC115724472", 
"LOC115702544", "LOC115705803", "LOC115711243", "LOC115698778", 
"LOC115707330", "LOC115718803", "LOC115711925", "LOC115719149", 
"LOC115712227", "LOC115711925", "LOC115701008", "LOC115700185", 
"LOC115699220", "LOC115707967", "LOC115705222", "LOC115716814", 
"LOC115712507")), class = "data.frame", row.names = c("1", "2", 
"3", "4", "5", "6", "7", "8", "9", "10", "11", "12", "13", "14", 
"15", "16", "17", "18", "19", "20", "21"))
