February 8, 2023, 09:22:10

R: remove rows in data frame for which all columns contain same content or nothing

Question

I have a data frame:

# create a data frame
V1 = c("gene_1", "gene_1", "", "")
V2 = c("gene_2", "gene_2", "", "")
V3 = c("gene_3", "gene_3", "gene_4", "")
V4 = c("gene_4", "gene_4", "", "")
V5 = c("gene_5", "gene_5", "gene_8", "")
V6 = c("gene_6", "gene_6", "gene_6", "gene_7")
df = as.data.frame(rbind(V1, V2, V3, V4, V5, V6))

The data frame df looks like this:

       V1     V2     V3     V4
V1 gene_1 gene_1
V2 gene_2 gene_2
V3 gene_3 gene_3 gene_4
V4 gene_4 gene_4
V5 gene_5 gene_5 gene_8
V6 gene_6 gene_6 gene_6 gene_7

Now I want to remove every row that contains labels of only one gene (empty cells aside), resulting in:

       V1     V2     V3     V4
V3 gene_3 gene_3 gene_4
V5 gene_5 gene_5 gene_8
V6 gene_6 gene_6 gene_6 gene_7

I found several similar questions on Stack Overflow, including here, but none of those solutions works for my exact problem. I feel like this should be easy, but I can't figure out how to go about it.

Answer 1

Score: 0

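I found a solution, based on another post that I found here:

library(dplyr)  # provides %>%, filter() and if_any()
df[df == ''] <- NA  # treat empty strings as missing values
df %>% filter(if_any(V2:V4, ~ .x != V1))  # keep rows where any column differs from V1

This gives:

       V1     V2     V3     V4
V3 gene_3 gene_3 gene_4   <NA>
V5 gene_5 gene_5 gene_8   <NA>
V6 gene_6 gene_6 gene_6 gene_7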
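
The dplyr answer above compares every other column against V1, so it relies on V1 being filled in for each row. As a possible alternative, here is a minimal base R sketch that instead counts the distinct non-empty labels in each row and keeps only rows with more than one. The helper name has_multiple_labels is made up for this illustration and is not from the original post; the data frame is rebuilt from the question so the snippet runs on its own.

```r
# Rebuild the example data frame from the question (row names V1..V6,
# column names V1..V4 are assigned automatically by as.data.frame()).
df <- as.data.frame(rbind(
  V1 = c("gene_1", "gene_1", "", ""),
  V2 = c("gene_2", "gene_2", "", ""),
  V3 = c("gene_3", "gene_3", "gene_4", ""),
  V4 = c("gene_4", "gene_4", "", ""),
  V5 = c("gene_5", "gene_5", "gene_8", ""),
  V6 = c("gene_6", "gene_6", "gene_6", "gene_7")
))

# Hypothetical helper: TRUE when a row holds more than one distinct
# label, ignoring empty strings and NA values.
has_multiple_labels <- function(row) {
  labels <- row[!is.na(row) & row != ""]
  length(unique(labels)) > 1
}

# apply() passes each row to the helper as a character vector.
keep <- apply(df, 1, has_multiple_labels)

# drop = FALSE keeps the result a data frame even if one row remains.
result <- df[keep, , drop = FALSE]
print(result)
```

Because the check does not single out any one column, it should behave the same whether the blanks are stored as empty strings or as NA, and a row whose first column is blank is still evaluated on its remaining labels.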

