Find rows in common between columns belonging to specific groups using R
Question
I have a data frame with 8 columns and 6 rows. All values are numeric, some positive and some negative.
I have another file that gives the group name (A, B, C or D) for each of the 8 columns. The first 2 columns are group A, the following 2 columns are group B, the next 2 columns are group C, and the last 2 columns are group D.
I want R code that calculates, for each column, how many of the rows with a value above 0 (x > 0) in that column also have a value above 0 in at least one column of each group. The output should have one row per group. Something like this:
Data file:
S1 S2 S3 S4 S5 S6 S7 S8
R1 2 1 2 -1 -3 5 4 -3
R2 4 -6 1 2 1 2 1 5
R3 3 2 -3 -9 -5 -1 4 9
R4 4 -4 -4 -6 4 -7 6 6
R5 6 -5 2 2 -7 -6 7 -6
R6 4 4 -3 3 -2 3 -4 2
Group file:
S1 S2 S3 S4 S5 S6 S7 S8
GroupA GroupA GroupB GroupB GroupC GroupC GroupD GroupD
Expected output file
S1 S2 S3 S4 S5 S6 S7 S8
Group A 6 3 3 3 2 3 5 4
Group B 4 2 3 3 1 3 3 2
Group C 4 1 2 2 2 3 3 3
Group D 6 3 3 3 2 3 5 5
Explanation for the values obtained in the expected output file:
Example 1: S1 and GroupB
The value is 4 because S1 has values greater than 0 in all 6 rows, and of those rows R1, R2, R5 and R6 are also greater than 0 in at least one of the samples of group B (S3 and S4).
Example 2: S3 and GroupD
The value is 3 because S3 has values greater than 0 in R1, R2 and R5, and all three rows are also greater than 0 in at least one of the samples of group D (S7 and S8).
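Both examples can be checked by hand in R. This is only a sketch: the matrix `m` and the helper `count_shared()` are names made up here for illustration, with the values copied from the data file above.

```r
# Data from the question, entered row by row (R1..R6) for columns S1..S8
m <- matrix(
  c(2,  1,  2, -1, -3,  5,  4, -3,
    4, -6,  1,  2,  1,  2,  1,  5,
    3,  2, -3, -9, -5, -1,  4,  9,
    4, -4, -4, -6,  4, -7,  6,  6,
    6, -5,  2,  2, -7, -6,  7, -6,
    4,  4, -3,  3, -2,  3, -4,  2),
  nrow = 6, byrow = TRUE,
  dimnames = list(paste0("R", 1:6), paste0("S", 1:8))
)

# Hypothetical helper: count the rows that are > 0 in `col` and also > 0
# in at least one of the columns listed in `group_cols`
count_shared <- function(col, group_cols) {
  in_col   <- m[, col] > 0
  in_group <- rowSums(m[, group_cols, drop = FALSE] > 0) > 0
  sum(in_col & in_group)
}

ex1 <- count_shared("S1", c("S3", "S4"))  # Example 1: S1 and GroupB
ex2 <- count_shared("S3", c("S7", "S8"))  # Example 2: S3 and GroupD
print(c(ex1, ex2))
```

The two counts should come out as 4 and 3, matching the values in the explanation.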
Answer 1
Score: 1
I hope I understood correctly what you want to achieve.
df <- read.table(
text = " S1 S2 S3 S4 S5 S6 S7 S8
R1 2 1 2 -1 -3 5 4 -3
R2 4 -6 1 2 1 2 1 5
R3 3 2 -3 -9 -5 -1 4 9
R4 4 -4 -4 -6 4 -7 6 6
R5 6 -5 2 2 -7 -6 7 -6
R6 4 4 -3 3 -2 3 -4 2"
)
groupings_new <- matrix(c(rep("Group A", 2), rep("Group B", 2), rep("Group C", 2), rep("Group D", 2)), byrow = TRUE, nrow = 1)
colnames(groupings_new) <- paste0("S", 1:8)
# get TRUE/FALSE if a value is greater than 0
info_df <- df > 0
res <- lapply(seq_len(ncol(info_df)), function(i) {
  grouping_info <- groupings_new[, , drop = TRUE]
  # check if a value in the matrix and in the column of interest are both greater
  # than 0
  compare_df <- info_df & matrix(rep(info_df[, i], ncol(info_df)), nrow = nrow(info_df))
  # split by groups
  res <- lapply(unique(grouping_info), function(one_group) {
    group_index <- grouping_info == one_group
    # check which rows are of interest (values greater than 0) and how many are
    # there
    sum(rowSums(compare_df[, group_index, drop = FALSE]) > 0)
  })
  res_clean <- data.frame(unlist(res))
  colnames(res_clean) <- colnames(info_df[, i, drop = FALSE])
  rownames(res_clean) <- unique(grouping_info)
  res_clean
})
do.call(cbind, res)
#> S1 S2 S3 S4 S5 S6 S7 S8
#> Group A 6 3 3 3 2 3 5 4
#> Group B 4 2 3 3 1 3 3 2
#> Group C 4 2 2 2 2 3 3 3
#> Group D 6 3 3 3 2 3 5 4
Created on 2023-02-27 by the reprex package (v1.0.0)
My output differs from your expected output in two cells (S2 / Group C is 2 rather than 1, and S8 / Group D is 4 rather than 5). I think the expected output contains a few mistakes.
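As a cross-check, the same table can also be computed with a single matrix product. This is a sketch of an alternative, not the code above: `m`, `groups`, `pos`, `any_pos` and `counts` are names introduced here for illustration, and the data is re-entered as a plain matrix.

```r
# Data from the question, entered row by row (R1..R6) for columns S1..S8
m <- matrix(
  c(2,  1,  2, -1, -3,  5,  4, -3,
    4, -6,  1,  2,  1,  2,  1,  5,
    3,  2, -3, -9, -5, -1,  4,  9,
    4, -4, -4, -6,  4, -7,  6,  6,
    6, -5,  2,  2, -7, -6,  7, -6,
    4,  4, -3,  3, -2,  3, -4,  2),
  nrow = 6, byrow = TRUE,
  dimnames = list(paste0("R", 1:6), paste0("S", 1:8))
)

# Group label for each column, in column order
groups <- rep(c("Group A", "Group B", "Group C", "Group D"), each = 2)

# TRUE where a value is greater than 0
pos <- m > 0

# any_pos[r, g] is TRUE when row r is > 0 in at least one column of group g
any_pos <- sapply(unique(groups), function(g) {
  rowSums(pos[, groups == g, drop = FALSE]) > 0
})

# counts[g, s] = number of rows that are > 0 in column s and also > 0
# somewhere in group g; "* 1" turns the logical matrices into 0/1 numbers
counts <- crossprod(any_pos * 1, pos * 1)
print(counts)
```

With the data above, `counts["Group C", "S2"]` is 2 and `counts["Group D", "S8"]` is 4, the two cells where the expected output in the question differs.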