How to filter a dataframe based on rownames using strsplit

Question

I have a dataframe:

dput(gene_exp[1:5, 1:5])
structure(list(en_Adipose_Subcutaneous.db = c(0.0531016390078734, 
-0.00413407782001034, -0.035434632568444, 0.00968736935965742, 
0.0523714252287003), en_Adipose_Visceral_Omentum.db = c(0, 0, 
0, 0, 0), en_Adrenal_Gland.db = c(0, 0, 0, 0, 0), en_Artery_Aorta.db = c(0, 
0, 0, 0, 0), en_Artery_Coronary.db = c(0, 0, 0, 0, 0)), row.names = c("rs1041770_ENSG00000283633.1", 
"rs12628452_ENSG00000283633.1", "rs915675_ENSG00000283633.1", 
"rs11089130_ENSG00000283633.1", "rs36061596_ENSG00000283633.1"
), class = "data.frame")

I want to filter this dataframe to keep only the rows for one gene (ENSG00000283633.1). I wrote this code:

gene <- gene_exp %>% filter(unlist(strsplit(rownames(gene_exp), "_")) %in% "ENSG00000283633.1")
Error in filter():
ℹ In argument: `unlist(strsplit(rownames(gene_exp), "_")) %in%
"ENSG00000283633.1"`.
Caused by error:
! `..1` must be of size 5956 or 1, not size 11902.
Run `rlang::last_trace()` to see where the error occurred.

Is there any other way to solve this? Thank you.


Answer 1

Score: 2

I don't recommend relying on rownames if you are using the tidyverse, since one of the tidy principles is that data should live in columns rather than in other attributes such as rownames. I would add the gene name to the data as a proper column, then filter on that. For example:

library(tidyverse)

your_data %>%
  # move the row names into a regular column called "rowname"
  rownames_to_column() %>%
  # split e.g. "rs1041770_ENSG00000283633.1" into SNP ID and gene ID
  separate(rowname, into = c('rs', 'gene_name'), sep = '_') %>%
  filter(gene_name == 'ENSG00000283633.1')
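As for why the original attempt fails: strsplit() returns a list with one element per row name, and unlist() flattens all the pieces into a single vector that is longer than the number of rows, which is what the size error in the question reports. If you would rather keep the strsplit approach, a minimal base R sketch is below. It rebuilds only the first column of the five sample rows from the question, and the names pieces and gene_id are illustrative, not from the original post.

```r
# Sample rows from the question (first column only, values copied from the dput output)
gene_exp <- data.frame(
  en_Adipose_Subcutaneous.db = c(0.0531016390078734, -0.00413407782001034,
                                 -0.035434632568444, 0.00968736935965742,
                                 0.0523714252287003),
  row.names = c("rs1041770_ENSG00000283633.1", "rs12628452_ENSG00000283633.1",
                "rs915675_ENSG00000283633.1", "rs11089130_ENSG00000283633.1",
                "rs36061596_ENSG00000283633.1")
)

# One list element per row name, each holding the pieces around "_"
pieces <- strsplit(rownames(gene_exp), "_", fixed = TRUE)

# unlist() flattens every piece, so the result no longer lines up with the rows
length(unlist(pieces))  # 10 pieces
nrow(gene_exp)          # 5 rows

# Take exactly one value per row: the second piece (the gene ID)
gene_id <- vapply(pieces, function(p) p[2], character(1))

# Logical vector of length nrow(gene_exp); drop = FALSE keeps a data frame
gene <- gene_exp[gene_id %in% "ENSG00000283633.1", , drop = FALSE]
```

Because gene_id has one element per row, the same comparison could also be passed to filter() without triggering the size error. Base subsetting is used here only so the sketch runs without extra packages.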

huangapple
  • Posted on May 30, 2023, 04:04:48
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76360062.html