Find rows in common between columns belonging to specific groups using R

Question

I have a data frame with 8 columns and 6 rows. All values are numeric, and they include both positive and negative numbers.

I have another file with the same 8 columns and an associated group name (A, B, C or D) for each column. The first 2 columns are group A, the following 2 columns are group B, the next 2 columns are group C, and the last 2 columns are group D.

I want R code that calculates, for each column, how many of the rows with values above 0 (x > 0) in that column also have a value above 0 in at least one column of each group. The output should be organized by the four groups. Something like this:

Data file:

    S1  S2  S3  S4  S5  S6  S7  S8
R1  2   1   2  -1  -3   5   4  -3 
R2  4  -6   1   2   1   2   1   5
R3  3   2  -3  -9  -5  -1   4   9
R4  4  -4  -4  -6   4  -7   6   6
R5  6  -5   2   2  -7  -6   7  -6
R6  4   4  -3   3  -2   3  -4   2

Group file:

S1      S2      S3      S4      S5      S6      S7      S8
GroupA  GroupA  GroupB  GroupB  GroupC  GroupC  GroupD  GroupD

Expected output file:

         S1  S2  S3  S4  S5  S6  S7  S8
Group A   6   3   3   3   2   3   5   4
Group B   4   2   3   3   1   3   3   2
Group C   4   1   2   2   2   3   3   3
Group D   6   3   3   3   2   3   5   5

Explanation for the values obtained in the expected output file:

Example 1: S1 and GroupB

The value obtained is 4. S1 has values greater than 0 in all 6 rows, and of those rows, R1, R2, R5 and R6 are also greater than 0 in at least one of the samples of group B (S3 and S4).

Example 2: S3 and GroupD

The value obtained is 3. S3 has values greater than 0 in R1, R2 and R5, and all three of those rows are also greater than 0 in at least one of the samples of group D (S7 and S8).
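
The two examples can be reproduced directly in R. This is only a minimal sketch of the counting rule described above; the names `df`, `pos`, `ex1` and `ex2` are chosen here for illustration and are assumptions, not part of the original files.

```r
# Data file from above, read into a data frame (the name `df` is an assumption)
df <- read.table(
  text = "    S1  S2  S3  S4  S5  S6  S7  S8
R1  2   1   2  -1  -3   5   4  -3
R2  4  -6   1   2   1   2   1   5
R3  3   2  -3  -9  -5  -1   4   9
R4  4  -4  -4  -6   4  -7   6   6
R5  6  -5   2   2  -7  -6   7  -6
R6  4   4  -3   3  -2   3  -4   2"
)

# Logical matrix: TRUE where a value is greater than 0
pos <- df > 0

# Example 1: rows positive in S1 AND positive in at least one of S3, S4 (group B)
ex1 <- sum(pos[, "S1"] & (pos[, "S3"] | pos[, "S4"]))

# Example 2: rows positive in S3 AND positive in at least one of S7, S8 (group D)
ex2 <- sum(pos[, "S3"] & (pos[, "S7"] | pos[, "S8"]))

ex1  # 4
ex2  # 3
```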

Answer 1

Score: 1

I hope I understood correctly what you want to achieve.

df <- read.table(
  text = "    S1  S2  S3  S4  S5  S6  S7  S8
R1  2   1   2  -1  -3   5   4  -3 
R2  4  -6   1   2   1   2   1   5
R3  3   2  -3  -9  -5  -1   4   9
R4  4  -4  -4  -6   4  -7   6   6
R5  6  -5   2   2  -7  -6   7  -6
R6  4   4  -3   3  -2   3  -4   2"
)

groupings_new <- matrix(
  c(rep("Group A", 2), rep("Group B", 2), rep("Group C", 2), rep("Group D", 2)),
  byrow = TRUE, nrow = 1
)
colnames(groupings_new) <- paste0("S", 1:8)

# get TRUE/FALSE if a value is greater than 0
info_df <- df > 0

res <- lapply(seq_len(ncol(info_df)), function(i) {
  grouping_info <- groupings_new[, , drop = TRUE]
  
  # check if a value in the matrix and in the column of interest are both greater
  # than 0
  compare_df <- info_df & matrix(rep(info_df[, i], ncol(info_df)), nrow = nrow(info_df))
  
  # split by groups
  res <- lapply(unique(grouping_info), function(one_group) {
    group_index <- grouping_info == one_group
    # check which rows are of interest (values greater than 0) and how many are
    # there
    sum(rowSums(compare_df[, group_index, drop = FALSE]) > 0)
  })
  res_clean <- data.frame(unlist(res))
  colnames(res_clean) <- colnames(info_df[, i, drop = FALSE])
  rownames(res_clean) <- unique(grouping_info)
  res_clean
})

do.call(cbind, res)
#>         S1 S2 S3 S4 S5 S6 S7 S8
#> Group A  6  3  3  3  2  3  5  4
#> Group B  4  2  3  3  1  3  3  2
#> Group C  4  2  2  2  2  3  3  3
#> Group D  6  3  3  3  2  3  5  4

Created on 2023-02-27 by the reprex package (v1.0.0)

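My output is not exactly the same as your expected output, because I think the expected table contains a few mistakes. For S2 and Group C I get 2 rather than 1 (S2 is positive in R1, R3 and R6, and both R1 and R6 are positive in S6), and for S8 and Group D I get 4 rather than 5 (S8 has only 4 positive values, so a count of 5 is not possible).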
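
If a shorter version is preferred, the same counts can probably be obtained with a matrix product instead of nested `lapply()` calls. This is a sketch of an alternative, not the code used above; the names `groups`, `pos`, `any_pos` and `result` are assumptions introduced here, and `groups` is assumed to hold one label per column in column order.

```r
df <- read.table(
  text = "    S1  S2  S3  S4  S5  S6  S7  S8
R1  2   1   2  -1  -3   5   4  -3
R2  4  -6   1   2   1   2   1   5
R3  3   2  -3  -9  -5  -1   4   9
R4  4  -4  -4  -6   4  -7   6   6
R5  6  -5   2   2  -7  -6   7  -6
R6  4   4  -3   3  -2   3  -4   2"
)

# Assumed: one group label per column, in the same order as the columns of df
groups <- rep(paste("Group", c("A", "B", "C", "D")), each = 2)

# 6 x 8 numeric matrix: 1 where the value is greater than 0, else 0
pos <- (as.matrix(df) > 0) * 1

# 6 x 4 numeric matrix: 1 where a row is positive in at least one column
# of the group, else 0
any_pos <- sapply(split(seq_along(groups), groups), function(idx) {
  as.numeric(rowSums(pos[, idx, drop = FALSE]) > 0)
})

# 4 x 8 matrix: entry [g, s] counts the rows that are positive in sample s
# and in at least one column of group g
result <- t(any_pos) %*% pos
result
```

Each entry of the product sums, over the rows, the product of two 0/1 indicators, which is the same "both conditions hold" count as in the loop version.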
