April 4, 2023, 14:13:45

Generate an R function to filter data from a new object

Question

I'm trying to develop a new object (for bioinformatics), and I want to extract information from some of its elements with a function similar to dplyr's filter(). The object contains a data frame that I want to filter. The problem is that some columns of the data frame are always present and others are not, so I want a function that can filter on the columns that are not always present.

Something like:

my_function <- function(object, parametersXYZ) {
  data <- object@gene_table
  # build the filter condition from parametersXYZ
  result <- filter(data, parametersXYZ)
  return(result)
}

Here, parametersXYZ corresponds to the column, the logical operator (==, !=, %in%), and the value to filter on, so the function would be used like this:

my_function(myobject, gene == "rtxA")

so that the function keeps all the rtxA rows in the gene column. I looked for examples but didn't find any, so I tried reading the source of dplyr's filter(), but I'm not sure how to implement this.

This is an example of the data frame:

myobject[["gene_table"]] %>% head()
        cluster         qseqid    bp nseqs     sample gene   VF
1 cluster_00001 IOHEFJOD_02210 15627    20 Sample-001 rtxA rtxA
2 cluster_00001 CJMKIBHP_00364 15621    20 Sample-002 rtxA rtxA
3 cluster_00001 JEJKLKDJ_00421 15621    20 Sample-003 rtxA rtxA
4 cluster_00001 MOOCIOKH_00638 15621    20 Sample-004 rtxA rtxA
5 cluster_00001 HJJCNJPA_01986 15621    20 Sample-005 rtxA rtxA
6 cluster_00001 MDIJOING_00449 15621    20 Sample-006 rtxA rtxA

The first four columns are always present, but the rest may have different names, and there may be any number of them. Those columns are my problem.

Any manual, suggestion, or idea is welcome. Thanks so much!

Answer 1

Score: 2

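Here is one way to have the filter expression evaluated inside the data frame, using the {{ }} (embrace) operator:

library(dplyr)

my_function <- function(object, parametersXYZ) {
  # {{ }} forwards the caller's unevaluated expression to filter(),
  # which evaluates it against the columns of gene_table
  filter(object@gene_table, {{ parametersXYZ }})
}

Testing on iris, with object@gene_table replaced by object in the function body (iris is a plain data frame and has no gene_table slot):

> my_function(iris, Species == "setosa") %>% head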
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1          5.1         3.5          1.4         0.2  setosa
2          4.9         3.0          1.4         0.2  setosa
3          4.7         3.2          1.3         0.2  setosa
4          4.6         3.1          1.5         0.2  setosa
5          5.0         3.6          1.4         0.2  setosa
6          5.4         3.9          1.7         0.4  setosa
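
To see the same approach working on an S4 object rather than on iris, here is a minimal sketch. The class name GeneSet, the function name filter_genes, and the toy rows are assumptions made up for illustration; only the gene_table slot name comes from the question. Because filter() evaluates the condition against whatever columns the table has, the optional columns need no special handling.

```r
library(dplyr)

# Hypothetical S4 class; only the gene_table slot name comes from the question
setClass("GeneSet", slots = c(gene_table = "data.frame"))

filter_genes <- function(object, condition) {
  # {{ }} passes the caller's expression to filter(), which evaluates it
  # against whatever columns gene_table happens to have
  filter(object@gene_table, {{ condition }})
}

# Toy data: the values are made up for illustration
toy <- new("GeneSet", gene_table = data.frame(
  cluster = c("cluster_00001", "cluster_00001", "cluster_00002"),
  qseqid  = c("toy_0001", "toy_0002", "toy_0003"),
  bp      = c(15627, 15621, 900),
  nseqs   = c(20, 20, 5),
  sample  = c("Sample-001", "Sample-002", "Sample-001"),
  gene    = c("rtxA", "rtxA", "hlyA")
))

# Keeps the two rows whose gene column is "rtxA"
only_rtxA <- filter_genes(toy, gene == "rtxA")

# Conditions can be combined with & and use != or %in%
other_genes <- filter_genes(toy, sample %in% c("Sample-001") & gene != "rtxA")
```

If the condition names a column that this particular table does not have, filter() stops with an error saying the object was not found, which may be acceptable behavior for optional columns.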

Answer 2

Score: 1

In base R you could do:

my_function <- function(object, parametersXYZ) {
  # capture the filter condition without evaluating it
  par <- substitute(parametersXYZ)
  dat <- object@gene_table
  # evaluate the condition with the columns of dat in scope and keep the matching rows
  subset(dat, eval(par, dat))
}
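
Since the optional columns differ from object to object, it may help to report a clear error when the condition refers to a column that is missing. The sketch below is one possible variation on the base R approach, not part of the original answer. The class name GeneSet, the function name filter_genes_base, and the toy rows are hypothetical; only the gene_table slot name comes from the question.

```r
# Hypothetical S4 class; only the gene_table slot name comes from the question
setClass("GeneSet", slots = c(gene_table = "data.frame"))

filter_genes_base <- function(object, condition) {
  expr <- substitute(condition)  # capture the condition unevaluated
  dat  <- object@gene_table
  env  <- parent.frame()         # where the caller's own variables live

  # Names used in the condition that are neither columns nor caller variables
  unknown <- Filter(
    function(v) !(v %in% names(dat)) && !exists(v, envir = env),
    all.vars(expr)
  )
  if (length(unknown) > 0) {
    stop("Unknown column(s): ", paste(unknown, collapse = ", "),
         ". Available: ", paste(names(dat), collapse = ", "))
  }

  keep <- eval(expr, dat, env)   # columns are looked up first, then env
  dat[!is.na(keep) & keep, , drop = FALSE]
}

# Toy data: the values are made up; this table has no VF column
toy <- new("GeneSet", gene_table = data.frame(
  cluster = c("cluster_00001", "cluster_00001", "cluster_00002"),
  qseqid  = c("toy_0001", "toy_0002", "toy_0003"),
  bp      = c(15627, 15621, 900),
  nseqs   = c(20, 20, 5),
  gene    = c("rtxA", "rtxA", "hlyA")
))

wanted <- c("rtxA")
hits <- filter_genes_base(toy, gene %in% wanted)  # keeps the two rtxA rows

# Filtering on a column this table lacks raises the custom error
msg <- tryCatch(
  filter_genes_base(toy, VF == "rtxA"),
  error = function(e) conditionMessage(e)
)
```

Passing parent.frame() as the third argument to eval() lets the condition use variables defined by the caller, such as wanted above, while the columns of the table still take precedence.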
