Find rows in common between columns belonging to specific groups using R

Question

I have a data frame with 8 columns and 6 rows. All values are numeric, both positive and negative.

I have another file that assigns a group name (A, B, C or D) to each of the 8 columns. The first 2 columns are group A, the following 2 columns are group B, the next 2 columns are group C, and the last 2 columns are group D.

I want R code that calculates, for each column, how many of the rows with values above 0 (x > 0) in that column also have values above 0 in at least one column of each group. The output should be grouped by the four groups, something like this:

Data file:

       S1 S2 S3 S4 S5 S6 S7 S8
    R1  2  1  2 -1 -3  5  4 -3
    R2  4 -6  1  2  1  2  1  5
    R3  3  2 -3 -9 -5 -1  4  9
    R4  4 -4 -4 -6  4 -7  6  6
    R5  6 -5  2  2 -7 -6  7 -6
    R6  4  4 -3  3 -2  3 -4  2

Group file:

    S1     S2     S3     S4     S5     S6     S7     S8
    GroupA GroupA GroupB GroupB GroupC GroupC GroupD GroupD
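A minimal sketch of how a group file with this layout could be turned into a lookup from group name to column names. The inline `text =` string stands in for the real file, and the names `group_df`, `groups` and `cols_by_group` are illustrative assumptions, not part of the question.

```r
# Read the one-row group file; the inline text is a stand-in for the real file.
group_df <- read.table(
  text = "S1 S2 S3 S4 S5 S6 S7 S8
GroupA GroupA GroupB GroupB GroupC GroupC GroupD GroupD",
  header = TRUE, stringsAsFactors = FALSE
)

# Named character vector: names are the samples, values are the group labels.
groups <- unlist(group_df[1, ])

# List with one element per group, holding that group's column names.
cols_by_group <- split(names(groups), groups)
print(cols_by_group)
```

With the real file, replacing `text = ...` with the file path should be enough, provided the file is whitespace-separated like the example.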

Expected output file

            S1 S2 S3 S4 S5 S6 S7 S8
    Group A  6  3  3  3  2  3  5  4
    Group B  4  2  3  3  1  3  3  2
    Group C  4  1  2  2  2  3  3  3
    Group D  6  3  3  3  2  3  5  5

Explanation for the values obtained in the expected output file:

Example 1: S1 and GroupB

The value obtained is 4. S1 has values greater than 0 in all 6 rows, and of those rows, R1, R2, R5 and R6 are also greater than 0 in at least one of the samples of group B (S3 and S4).

Example 2: S3 and GroupD

The value obtained is 3. S3 has values greater than 0 in R1, R2 and R5, and all three rows are also greater than 0 in at least one of the samples of group D (S7 and S8).
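The two examples can be checked directly in R. This is only a sketch: the matrix `x` is rebuilt by hand from the data file above, and the variable names are illustrative.

```r
# Data from the question as a numeric matrix (rows R1..R6, columns S1..S8).
x <- matrix(
  c( 2,  1,  2, -1, -3,  5,  4, -3,
     4, -6,  1,  2,  1,  2,  1,  5,
     3,  2, -3, -9, -5, -1,  4,  9,
     4, -4, -4, -6,  4, -7,  6,  6,
     6, -5,  2,  2, -7, -6,  7, -6,
     4,  4, -3,  3, -2,  3, -4,  2),
  nrow = 6, byrow = TRUE,
  dimnames = list(paste0("R", 1:6), paste0("S", 1:8))
)

# Example 1: rows where S1 > 0 and at least one of S3, S4 (GroupB) is > 0.
s1_groupB <- sum(x[, "S1"] > 0 & (x[, "S3"] > 0 | x[, "S4"] > 0))

# Example 2: rows where S3 > 0 and at least one of S7, S8 (GroupD) is > 0.
s3_groupD <- sum(x[, "S3"] > 0 & (x[, "S7"] > 0 | x[, "S8"] > 0))

print(c(S1_GroupB = s1_groupB, S3_GroupD = s3_groupD))
#> S1_GroupB S3_GroupD 
#>         4         3 
```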

Answer 1

Score: 1

I hope I understood correctly what you want to achieve.

    df <- read.table(
      text = "   S1 S2 S3 S4 S5 S6 S7 S8
    R1  2  1  2 -1 -3  5  4 -3
    R2  4 -6  1  2  1  2  1  5
    R3  3  2 -3 -9 -5 -1  4  9
    R4  4 -4 -4 -6  4 -7  6  6
    R5  6 -5  2  2 -7 -6  7 -6
    R6  4  4 -3  3 -2  3 -4  2"
    )

    groupings_new <- matrix(
      c(rep("Group A", 2), rep("Group B", 2), rep("Group C", 2), rep("Group D", 2)),
      byrow = TRUE, nrow = 1
    )
    colnames(groupings_new) <- paste0("S", 1:8)

    # get TRUE/FALSE if a value is greater than 0
    info_df <- df > 0

    res <- lapply(seq_len(ncol(info_df)), function(i) {
      grouping_info <- groupings_new[, , drop = TRUE]

      # check if a value in the matrix and in the column of interest are both
      # greater than 0
      compare_df <- info_df & matrix(rep(info_df[, i], ncol(info_df)), nrow = nrow(info_df))

      # split by groups
      res <- lapply(unique(grouping_info), function(one_group) {
        group_index <- grouping_info == one_group

        # check which rows are of interest (values greater than 0) and how many
        # there are
        sum(rowSums(compare_df[, group_index, drop = FALSE]) > 0)
      })

      res_clean <- data.frame(unlist(res))
      colnames(res_clean) <- colnames(info_df[, i, drop = FALSE])
      rownames(res_clean) <- unique(grouping_info)
      res_clean
    })

    do.call(cbind, res)
    #>         S1 S2 S3 S4 S5 S6 S7 S8
    #> Group A  6  3  3  3  2  3  5  4
    #> Group B  4  2  3  3  1  3  3  2
    #> Group C  4  2  2  2  2  3  3  3
    #> Group D  6  3  3  3  2  3  5  4

<sup>Created on 2023-02-27 by the reprex package (v1.0.0)</sup>
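For comparison, the same counts can also be obtained with a matrix product instead of nested `lapply()` calls. This is an alternative sketch, not the code from the answer: it first marks, per group, which rows are positive in at least one of the group's columns, then multiplies that indicator matrix with the per-column indicator matrix. The names `x`, `groups`, `pos`, `group_pos` and `counts` are illustrative.

```r
# Data from the question as a numeric matrix (rows R1..R6, columns S1..S8).
x <- matrix(
  c( 2,  1,  2, -1, -3,  5,  4, -3,
     4, -6,  1,  2,  1,  2,  1,  5,
     3,  2, -3, -9, -5, -1,  4,  9,
     4, -4, -4, -6,  4, -7,  6,  6,
     6, -5,  2,  2, -7, -6,  7, -6,
     4,  4, -3,  3, -2,  3, -4,  2),
  nrow = 6, byrow = TRUE,
  dimnames = list(paste0("R", 1:6), paste0("S", 1:8))
)

# Group label for each column, in column order.
groups <- rep(c("Group A", "Group B", "Group C", "Group D"), each = 2)

# TRUE where a value is greater than 0 (6 x 8).
pos <- x > 0

# TRUE where a row is positive in at least one column of the group (6 x 4).
group_pos <- sapply(unique(groups), function(g) {
  rowSums(pos[, groups == g, drop = FALSE]) > 0
})

# counts[g, j] = number of rows positive in column j and in group g (4 x 8).
counts <- t(group_pos * 1) %*% (pos * 1)
print(counts)
```

Multiplying by 1 converts the logical matrices to numeric explicitly, so the product counts the rows where both indicators are TRUE.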

My output differs from your expected output in two cells, where I think the expected values are mistaken: S2 / Group C (I get 2, you expect 1) and S8 / Group D (I get 4, you expect 5).
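To see exactly where the two tables disagree, they can be compared cell by cell. The sketch below hard-codes both tables from this page (the expected output from the question and the output printed in the answer); the helper names are illustrative.

```r
sample_names <- paste0("S", 1:8)
group_names  <- paste("Group", c("A", "B", "C", "D"))

# Expected output as given in the question.
expected <- matrix(
  c(6, 3, 3, 3, 2, 3, 5, 4,
    4, 2, 3, 3, 1, 3, 3, 2,
    4, 1, 2, 2, 2, 3, 3, 3,
    6, 3, 3, 3, 2, 3, 5, 5),
  nrow = 4, byrow = TRUE, dimnames = list(group_names, sample_names)
)

# Output printed by the answer's code.
computed <- matrix(
  c(6, 3, 3, 3, 2, 3, 5, 4,
    4, 2, 3, 3, 1, 3, 3, 2,
    4, 2, 2, 2, 2, 3, 3, 3,
    6, 3, 3, 3, 2, 3, 5, 4),
  nrow = 4, byrow = TRUE, dimnames = list(group_names, sample_names)
)

# Row/column positions of the cells that differ.
diff_idx <- which(expected != computed, arr.ind = TRUE)

mismatches <- data.frame(
  group    = group_names[diff_idx[, "row"]],
  sample   = sample_names[diff_idx[, "col"]],
  expected = expected[diff_idx],
  computed = computed[diff_idx]
)
print(mismatches)
```

For S2 and Group C, S2 is positive in R1, R3 and R6, and of those rows R1 and R6 are positive in S6, which gives 2. For S8 and Group D, S8 itself is positive in only four rows (R2, R3, R4, R6), so the count cannot exceed 4.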

Posted by huangapple on 2023-02-27 16:51:52
Source link: https://go.coder-hub.com/75578389.html