Generate an R function to filter data from a new object

Question


I'm developing a new object class for bioinformatics, and I want to extract information from some of its elements with a function similar to dplyr's `filter()`. The object holds a data frame that I want to filter. Some columns of this data frame are always present but others are not, and I want the function to be able to filter on those optional columns as well.

Something like:

my_function <- function(object, parametersXYZ){
  data <- object@gene_table
  # generate the filter
  result <- filter(data, parametersXYZ)
  return(result)
}

Here, `parametersXYZ` stands for the column, the logical operator (`==`, `!=`, `%in%`) and the value to filter on, so the function would be used like this:

my_function(myobject, gene == "rtxA")

This call should return all rows with rtxA in the `gene` column. I looked for examples but couldn't find any, so I tried reading the source of dplyr's `filter()`, but I'm not sure how to implement this.
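Roughly speaking, inside `my_function` the name `parametersXYZ` only refers to the argument; the expression the caller wrote (`gene == "rtxA"`) has to be captured explicitly before it can be evaluated against the table's columns. A minimal base-R sketch of that mechanism: `substitute()` returns the unevaluated expression, and `eval()` then evaluates it with a data frame's columns visible as variables. The helper `show_expr()` and the toy data frame are illustrative only.

```R
# Capture the expression the caller typed, without evaluating it
show_expr <- function(condition) substitute(condition)

e <- show_expr(gene == "rtxA")
print(class(e))    # "call": an unevaluated expression
print(deparse(e))  # the expression as text

# Toy data frame, made up for illustration
toy <- data.frame(gene = c("rtxA", "otherGene"))

# Evaluate the captured expression with toy's columns visible as variables
keep <- eval(e, toy)
print(keep)
print(toy[keep, , drop = FALSE])
```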

Here is an example of the data frame:

myobject[["gene_table"]] %>% head()

        cluster         qseqid    bp nseqs     sample gene   VF
1 cluster_00001 IOHEFJOD_02210 15627    20 Sample-001 rtxA rtxA
2 cluster_00001 CJMKIBHP_00364 15621    20 Sample-002 rtxA rtxA
3 cluster_00001 JEJKLKDJ_00421 15621    20 Sample-003 rtxA rtxA
4 cluster_00001 MOOCIOKH_00638 15621    20 Sample-004 rtxA rtxA
5 cluster_00001 HJJCNJPA_01986 15621    20 Sample-005 rtxA rtxA
6 cluster_00001 MDIJOING_00449 15621    20 Sample-006 rtxA rtxA

The first 4 columns are always present, but the remaining ones can have different names, and there can be any number of them. Those optional columns are the ones I'm struggling with.
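To try out the answers, a small stand-in for the object can help. The sketch below assumes an S4 class, since the question reads the table with `object@gene_table` (the `myobject[["gene_table"]]` form would need an extra `[[` method, which is not covered here). The class name `GeneObject` and the helper `optional_columns()` are hypothetical, and the two rows are copied from the `head()` output above. The optional columns are found with `setdiff()` instead of being hard-coded.

```R
library(methods)

# Hypothetical S4 class: the only assumption is a data.frame slot named gene_table
setClass("GeneObject", representation(gene_table = "data.frame"))

# Two rows copied from the head() output in the question
gene_table <- data.frame(
  cluster = c("cluster_00001", "cluster_00001"),
  qseqid  = c("IOHEFJOD_02210", "CJMKIBHP_00364"),
  bp      = c(15627, 15621),
  nseqs   = c(20, 20),
  sample  = c("Sample-001", "Sample-002"),
  gene    = c("rtxA", "rtxA"),
  VF      = c("rtxA", "rtxA")
)
myobject <- new("GeneObject", gene_table = gene_table)

# Hypothetical helper: every column that is not one of the fixed first four
optional_columns <- function(object) {
  fixed <- c("cluster", "qseqid", "bp", "nseqs")
  setdiff(names(object@gene_table), fixed)
}

print(optional_columns(myobject))
```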

Any manual, suggestion or idea is welcome.
Thanks so much!

# Answer 1

**Score**: 2

Here is one way to get the expression evaluated:

library(dplyr)

my_function <- function(object, parametersXYZ){
  # {{ }} ("embracing") passes the caller's unevaluated expression on to filter()
  filter(object@gene_table, {{parametersXYZ}})
}

Testing it on `iris` (with `filter(object, {{parametersXYZ}})` in the body, since `iris` has no `gene_table` slot):

> my_function(iris, Species == "setosa") %>% head
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1          5.1         3.5          1.4         0.2  setosa
2          4.9         3.0          1.4         0.2  setosa
3          4.7         3.2          1.3         0.2  setosa
4          4.6         3.1          1.5         0.2  setosa
5          5.0         3.6          1.4         0.2  setosa
6          5.4         3.9          1.7         0.4  setosa
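Two related sketches, assuming dplyr (and rlang, which dplyr depends on) is installed. rlang describes `{{ }}` as combining `enquo()` and `!!` in one step, so `filter_quo()` below spells out the same thing explicitly. `filter_dots()` forwards `...` instead, which lets the caller pass any number of conditions on whatever columns the table happens to have. Both function names, the `GeneObject` class and the toy rows are hypothetical.

```R
library(methods)
library(dplyr)

# Hypothetical S4 class standing in for the asker's object
setClass("GeneObject", representation(gene_table = "data.frame"))

# Same idea as {{ }}, written out with rlang's enquo() and !!
filter_quo <- function(object, condition) {
  cond <- rlang::enquo(condition)    # capture expression plus caller environment
  filter(object@gene_table, !!cond)  # inject it into filter()
}

# Forward any number of conditions straight to filter()
filter_dots <- function(object, ...) {
  filter(object@gene_table, ...)
}

# Toy rows, made up for illustration
obj <- new("GeneObject", gene_table = data.frame(
  cluster = c("cluster_00001", "cluster_00002", "cluster_00003"),
  gene    = c("rtxA", "otherGene", "rtxA"),
  bp      = c(15627, 900, 1200)
))

wanted <- "rtxA"
res1 <- filter_quo(obj, gene == wanted)                 # column vs. caller variable
res2 <- filter_dots(obj, gene == "rtxA", bp > 10000)    # two conditions, combined with AND
print(res1)
print(res2)
```

Forwarding `...` keeps the wrapper thin; `{{ }}` is the better fit when the function takes exactly one condition argument with a fixed name.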



# Answer 2
**Score**: 1

In base R, you could do:

```R
my_function <- function(object, parametersXYZ){
  par <- substitute(parametersXYZ)  # capture the caller's expression unevaluated
  dat <- object@gene_table
  subset(dat, eval(par, dat))       # evaluate it with dat's columns in scope
}
```
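A possible refinement of the base-R version, sketched with a hypothetical `GeneObject` class, a hypothetical `filter_genes()` function and made-up rows: passing `parent.frame()` as the third argument of `eval()` makes names that are not columns resolve in the caller's environment, so a condition such as `gene %in% wanted` should also work when `wanted` is a local variable of another function. Rows where the condition is `NA` are dropped, as `subset()` does.

```R
library(methods)

# Hypothetical S4 class standing in for the asker's object
setClass("GeneObject", representation(gene_table = "data.frame"))

filter_genes <- function(object, condition) {
  cond <- substitute(condition)            # the expression the caller wrote
  dat  <- object@gene_table
  # Columns of dat are looked up first; other names in the caller's environment
  keep <- eval(cond, dat, parent.frame())
  dat[!is.na(keep) & keep, , drop = FALSE]  # drop NA rows, like subset()
}

# Toy rows, made up for illustration
obj <- new("GeneObject", gene_table = data.frame(
  cluster = c("cluster_00001", "cluster_00002", "cluster_00003"),
  gene    = c("rtxA", "otherGene", NA),
  stringsAsFactors = FALSE
))

# Called from inside another function that holds the value to match
lookup <- function(object) {
  wanted <- "rtxA"
  filter_genes(object, gene %in% wanted)
}

print(lookup(obj))
print(filter_genes(obj, gene != "rtxA"))
```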

huangapple
  • Posted on April 4, 2023 at 14:13:45
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75926029.html