how to filter dataframe based on rownames using strsplit
Question
I have a dataframe:
dput(gene_exp[1:5, 1:5])
structure(list(en_Adipose_Subcutaneous.db = c(0.0531016390078734,
-0.00413407782001034, -0.035434632568444, 0.00968736935965742,
0.0523714252287003), en_Adipose_Visceral_Omentum.db = c(0, 0,
0, 0, 0), en_Adrenal_Gland.db = c(0, 0, 0, 0, 0), en_Artery_Aorta.db = c(0,
0, 0, 0, 0), en_Artery_Coronary.db = c(0, 0, 0, 0, 0)), row.names = c("rs1041770_ENSG00000283633.1",
"rs12628452_ENSG00000283633.1", "rs915675_ENSG00000283633.1",
"rs11089130_ENSG00000283633.1", "rs36061596_ENSG00000283633.1"
), class = "data.frame")
I want to filter this dataframe for gene1 only. I wrote this code:
gene <- gene_exp %>% filter(unlist(strsplit(rownames(gene_exp), "_")) %in% "ENSG00000283633.1")
Error in filter():
ℹ In argument: `unlist(strsplit(rownames(gene_exp), "_")) %in%
"ENSG00000283633.1"`.
Caused by error:
! `..1` must be of size 5956 or 1, not size 11902.
Run `rlang::last_trace()` to see where the error occurred.
Is there any other way to solve this? Thank you.
Answer 1
Score: 2
I don't recommend using rownames if you are using the tidyverse, since one of the tidy principles is that data should live in columns (and so not in other attributes like rownames). I would add the gene name to the data as a proper column, then filter on that. So, for example:
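library(tidyverse)

# your_data stands for the data frame to filter (gene_exp in the question)
your_data %>%
  rownames_to_column() %>%   # row names become a column named "rowname"
  separate(rowname, into = c('rs', 'gene_name'), sep = '_') %>%
  filter(gene_name == 'ENSG00000283633.1')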
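As for why the original attempt fails: unlist(strsplit(...)) returns one element per piece rather than one per row, so the condition is longer than the data frame and filter() rejects it. If you would rather keep the row names, a minimal base R sketch is to take one piece per row and subset on that. The toy data below is made up for illustration (the third row name and its gene ID are hypothetical), and the sketch assumes the gene ID is always the last piece after splitting on "_".

```r
# Toy data frame mimicking the structure in the question (values are made up).
gene_exp <- data.frame(
  en_Adipose_Subcutaneous.db = c(0.05, -0.004, 0.01),
  en_Adrenal_Gland.db = c(0, 0, 0),
  row.names = c("rs1041770_ENSG00000283633.1",
                "rs12628452_ENSG00000283633.1",
                "rs999_ENSG00000000001.1")  # hypothetical second gene
)

# unlist() flattens every piece into a single vector, so its length
# no longer matches the number of rows.
pieces <- unlist(strsplit(rownames(gene_exp), "_", fixed = TRUE))
length(pieces)  # 6 pieces for 3 rows
nrow(gene_exp)  # 3

# Keep exactly one value per row: the last piece of each split.
gene_id <- vapply(
  strsplit(rownames(gene_exp), "_", fixed = TRUE),
  function(x) x[length(x)],
  character(1)
)

# Base R subsetting; the original row names are kept.
gene <- gene_exp[gene_id == "ENSG00000283633.1", , drop = FALSE]
print(gene)
```

The drop = FALSE argument is only a safeguard so the result stays a data frame even if a single column is selected.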