2023-02-27 16:51:52

Find rows in common between columns belonging to specific groups using R

Question

I have a data frame with 8 columns and 6 rows. All values are numeric, and they include both positive and negative numbers.

I have another file that gives the group name (A, B, C or D) for each of the 8 columns. The first 2 columns are group A, the following 2 columns are group B, the next 2 columns are group C, and the last 2 columns are group D.

I want R code that calculates, for each column, how many of the rows with values above 0 (x > 0) in that column also have a value above 0 in at least one column of a given group. The output should have one row for each of the four groups. Something like this:

Data file:

    S1  S2  S3  S4  S5  S6  S7  S8
R1  2   1   2  -1  -3   5   4  -3 
R2  4  -6   1   2   1   2   1   5
R3  3   2  -3  -9  -5  -1   4   9
R4  4  -4  -4  -6   4  -7   6   6
R5  6  -5   2   2  -7  -6   7  -6
R6  4   4  -3   3  -2   3  -4   2

Group file:

S1      S2      S3      S4      S5      S6      S7      S8
GroupA  GroupA  GroupB  GroupB  GroupC  GroupC  GroupD  GroupD
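
As a small sketch of how a one-row group file like this could be turned into a lookup in R: the inline `text =` string below stands in for the real file (its path is not given in the question), and the names `group_tbl`, `groups` and `cols_by_group` are only illustrative.

```r
# Stand-in for the group file; in practice this would be read from disk.
group_tbl <- read.table(
  text = "S1 S2 S3 S4 S5 S6 S7 S8
GroupA GroupA GroupB GroupB GroupC GroupC GroupD GroupD",
  header = TRUE,
  stringsAsFactors = FALSE
)

# Named character vector: names are the sample columns, values the groups.
groups <- unlist(group_tbl[1, ])

# List of column names belonging to each group.
cols_by_group <- split(names(groups), groups)
cols_by_group$GroupB
#> [1] "S3" "S4"
```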

Expected output file:

          S1  S2  S3  S4  S5  S6  S7  S8
Group A    6   3   3   3   2   3   5   4
Group B    4   2   3   3   1   3   3   2
Group C    4   1   2   2   2   3   3   3
Group D    6   3   3   3   2   3   5   5

Explanation for the values obtained in the expected output file:

Example 1: S1 and GroupB

The value obtained is 4. S1 has values greater than 0 in all 6 rows, and of those rows, R1, R2, R5 and R6 are also greater than 0 in at least one of the samples of group B (S3 and S4).

Example 2: S3 and GroupD

The value obtained is 3. S3 has values greater than 0 in R1, R2 and R5, and each of those rows is also greater than 0 in at least one of the samples of group D (S7 and S8).
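
The two examples can be checked by hand with a few lines of R. This is only a sketch of the counting rule described above; the matrix `m` is the data file typed in row by row, and the variable names are illustrative.

```r
# Data file from the question, entered row by row.
m <- matrix(
  c(2,  1,  2, -1, -3,  5,  4, -3,
    4, -6,  1,  2,  1,  2,  1,  5,
    3,  2, -3, -9, -5, -1,  4,  9,
    4, -4, -4, -6,  4, -7,  6,  6,
    6, -5,  2,  2, -7, -6,  7, -6,
    4,  4, -3,  3, -2,  3, -4,  2),
  nrow = 6, byrow = TRUE,
  dimnames = list(paste0("R", 1:6), paste0("S", 1:8))
)

# Example 1: rows above 0 in S1 and in at least one column of group B (S3, S4).
rows_s1_b <- m[, "S1"] > 0 & (m[, "S3"] > 0 | m[, "S4"] > 0)
n_s1_b <- sum(rows_s1_b)
names(which(rows_s1_b))
#> [1] "R1" "R2" "R5" "R6"

# Example 2: rows above 0 in S3 and in at least one column of group D (S7, S8).
rows_s3_d <- m[, "S3"] > 0 & (m[, "S7"] > 0 | m[, "S8"] > 0)
n_s3_d <- sum(rows_s3_d)
names(which(rows_s3_d))
#> [1] "R1" "R2" "R5"
```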

Answer 1

Score: 1

I hope I understood correctly what you want to achieve.

df <- read.table(
  text = "    S1  S2  S3  S4  S5  S6  S7  S8
R1  2   1   2  -1  -3   5   4  -3 
R2  4  -6   1   2   1   2   1   5
R3  3   2  -3  -9  -5  -1   4   9
R4  4  -4  -4  -6   4  -7   6   6
R5  6  -5   2   2  -7  -6   7  -6
R6  4   4  -3   3  -2   3  -4   2"
)
groupings_new <- matrix(c(rep("Group A", 2), rep("Group B", 2), rep("Group C", 2), rep("Group D", 2)), byrow = TRUE, nrow = 1)
colnames(groupings_new) <- paste0("S", 1:8)
# get TRUE/FALSE if a value is greater than 0
info_df <- df > 0
res <- lapply(seq_len(ncol(info_df)), function(i) {
  grouping_info <- groupings_new[, , drop = TRUE]
  
  # check if a value in the matrix and in the column of interest are both greater
  # than 0
  compare_df <- info_df & matrix(rep(info_df[, i], ncol(info_df)), nrow = nrow(info_df))
  
  # split by groups
  res <- lapply(unique(grouping_info), function(one_group) {
    group_index <- grouping_info == one_group
    # check which rows are of interest (values greater than 0) and how many are
    # there
    sum(rowSums(compare_df[, group_index, drop = FALSE]) > 0)
  })
  res_clean <- data.frame(unlist(res))
  colnames(res_clean) <- colnames(info_df[, i, drop = FALSE])
  rownames(res_clean) <- unique(grouping_info)
  res_clean
})
do.call(cbind, res)
#>         S1 S2 S3 S4 S5 S6 S7 S8
#> Group A  6  3  3  3  2  3  5  4
#> Group B  4  2  3  3  1  3  3  2
#> Group C  4  2  2  2  2  3  3  3
#> Group D  6  3  3  3  2  3  5  4

Created on 2023-02-27 by the reprex package (v1.0.0)

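My output is not exactly the same as your expected output, because I think the expected output contains a few mistakes. The two cells that differ are S2 in Group C (2 here, 1 in yours) and S8 in Group D (4 here, 5 in yours).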
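
For comparison, the same counts can also be obtained without the nested `lapply()` calls, using a single matrix product. This is a sketch of an alternative formulation, not part of the code above; the names `m`, `groups`, `pos`, `group_any` and `counts` are illustrative, and the data are typed in directly so the block runs on its own.

```r
# Data file from the question, entered row by row.
m <- matrix(
  c(2,  1,  2, -1, -3,  5,  4, -3,
    4, -6,  1,  2,  1,  2,  1,  5,
    3,  2, -3, -9, -5, -1,  4,  9,
    4, -4, -4, -6,  4, -7,  6,  6,
    6, -5,  2,  2, -7, -6,  7, -6,
    4,  4, -3,  3, -2,  3, -4,  2),
  nrow = 6, byrow = TRUE,
  dimnames = list(paste0("R", 1:6), paste0("S", 1:8))
)
groups <- rep(c("Group A", "Group B", "Group C", "Group D"), each = 2)

# 1 where a value is above 0, otherwise 0 (rows x columns).
pos <- (m > 0) * 1L

# 1 where a row is above 0 in at least one column of the group (rows x groups).
group_any <- sapply(unique(groups), function(g) {
  as.integer(rowSums(pos[, groups == g, drop = FALSE]) > 0)
})

# Entry [g, s]: number of rows that are above 0 in column s and also above 0
# in at least one column of group g.
counts <- t(group_any) %*% pos
counts
#>         S1 S2 S3 S4 S5 S6 S7 S8
#> Group A  6  3  3  3  2  3  5  4
#> Group B  4  2  3  3  1  3  3  2
#> Group C  4  2  2  2  2  3  3  3
#> Group D  6  3  3  3  2  3  5  4
```

The product works because each entry sums, over the rows, the product of two 0/1 indicators, which is 1 only when both conditions hold for that row.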
