How to keep everything before "." in R using sub

Question


I have a dataframe in R:

    structure(list(chr = c(1, 1, 1, 1, 1), gene_id = c("ENSG00000223972.5", 
    "ENSG00000227232.5", "ENSG00000278267.1", "ENSG00000243485.5", 
    "ENSG00000237613.2"), gene_name = c("DDX11L1", "WASH7P", "MIR6859-1", 
    "MIR1302-2HG", "FAM138A"), start = c(11869, 14410, 17369, 29571, 
    34554), end = c(14403, 29553, 17436, 31109, 36081), gene_type = c("transcribed_unprocessed_pseudogene", 
    "unprocessed_pseudogene", "miRNA", "lincRNA", "lincRNA")), row.names = c(NA, 
    -5L), class = c("tbl_df", "tbl", "data.frame"))

I want to edit the gene_id values to keep only the part before the ".", for example:

    ENSG00000223972.5 to ENSG00000223972

I did this:

    gene_annot_parsed1 <- sub(".*^.", "", gene_annot_parsed$gene_id)

But it gives this output:

    dput(gene_annot_parsed[1:2])
    c("NSG00000223972.5", "NSG00000227232.5")

I just want to modify the gene_id column so that it keeps only the part before the ".", leaving the other columns unchanged.
In my case it removes the "E" and drops the other columns.
Does anyone know how to solve this?
Thank you.

Answer 1

Score: 1

    # Capture everything before the dot, match the dot and the rest, keep only the captured group
    gene_annot_parsed1 <- stringr::str_replace_all(gene_annot_parsed$gene_id, '(.*)\\..*', '\\1')
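
Since the question asks about `sub` specifically, here is a minimal base R sketch of the same idea. It assumes each ID contains at most one ".", and it uses a small stand-in data frame rather than the full tibble from the question. In the attempted pattern the `.` is unescaped, so it matches any character rather than a literal dot, which is consistent with only the leading "E" being removed. Assigning the result back to the column, instead of to a new object, is what keeps the other columns in place.

```r
# Small stand-in for the data frame in the question (only two columns kept for brevity).
gene_annot_parsed <- data.frame(
  gene_id   = c("ENSG00000223972.5", "ENSG00000227232.5", "ENSG00000278267.1"),
  gene_name = c("DDX11L1", "WASH7P", "MIR6859-1"),
  stringsAsFactors = FALSE
)

# "\\." matches a literal dot; ".*$" matches everything after it up to the end.
# sub() replaces that match with "", so only the part before the dot remains.
# Assigning back to gene_annot_parsed$gene_id leaves the other columns untouched.
gene_annot_parsed$gene_id <- sub("\\..*$", "", gene_annot_parsed$gene_id)

print(gene_annot_parsed)
```

IDs without a "." are left as they are, because `sub` returns the input unchanged when the pattern does not match.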

huangapple
  • Posted on May 18, 2023
  • When reposting, please keep the link to this post: https://go.coder-hub.com/76275205.html