R: Remove rows in a data frame for which all columns contain the same content or nothing

Question

I have a data frame:

# create a data frame
V1 = c("gene_1", "gene_1", "", "")
V2 = c("gene_2", "gene_2", "", "")
V3 = c("gene_3", "gene_3", "gene_4", "")
V4 = c("gene_4", "gene_4", "", "")
V5 = c("gene_5", "gene_5", "gene_8", "")
V6 = c("gene_6", "gene_6", "gene_6", "gene_7")
df = as.data.frame(rbind(V1, V2, V3, V4, V5, V6))

The data frame df looks like this:

       V1     V2     V3     V4
V1 gene_1 gene_1
V2 gene_2 gene_2
V3 gene_3 gene_3 gene_4
V4 gene_4 gene_4
V5 gene_5 gene_5 gene_8
V6 gene_6 gene_6 gene_6 gene_7

Now I want to remove every row whose non-empty cells all contain the same gene label, resulting in:

       V1     V2     V3     V4
V3 gene_3 gene_3 gene_4
V5 gene_5 gene_5 gene_8
V6 gene_6 gene_6 gene_6 gene_7

I found several similar questions on Stack Overflow, including here, but none of these solutions work for my exact issue. I feel like this should be easy, but I can't seem to find how to go about it.

Answer 1

Score: 0

I found a solution, based on another post that I found here:

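library(dplyr)  # provides %>%, filter() and if_any()
df[df == ''] <- NA  # treat empty strings as missing values
df %>% filter(if_any(V2:V4, ~ .x != V1))  # keep rows where any of V2:V4 differs from V1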

Gives:

       V1     V2     V3     V4
V3 gene_3 gene_3 gene_4   <NA>
V5 gene_5 gene_5 gene_8   <NA>
V6 gene_6 gene_6 gene_6 gene_7
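
The filter above compares every column with V1, so it relies on the first column always being filled in. Comparisons against NA evaluate to NA, and filter() drops rows whose condition is NA, which is why the empty strings are converted first. A minimal base R sketch of an alternative, not part of the original answer, counts the distinct non-empty labels in each row instead. The names n_labels and result are illustrative, and the data frame is rebuilt from the question so the block runs on its own.

```r
# Rebuild the example data frame from the question.
df <- as.data.frame(rbind(
  V1 = c("gene_1", "gene_1", "", ""),
  V2 = c("gene_2", "gene_2", "", ""),
  V3 = c("gene_3", "gene_3", "gene_4", ""),
  V4 = c("gene_4", "gene_4", "", ""),
  V5 = c("gene_5", "gene_5", "gene_8", ""),
  V6 = c("gene_6", "gene_6", "gene_6", "gene_7")
))

# Count the distinct labels in each row, ignoring empty strings and NA.
n_labels <- apply(df, 1, function(row) {
  length(unique(row[!is.na(row) & row != ""]))
})

# Keep only the rows that contain more than one distinct label.
result <- df[n_labels > 1, ]
print(result)
```

With this approach a row that is entirely empty has zero distinct labels and is removed as well, and the empty cells stay as empty strings rather than becoming NA.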

huangapple
  • Published on February 8, 2023 at 09:22:10
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75380536.html