How to keep everything before "." in R using sub
Question
I have a dataframe in R:
structure(list(chr = c(1, 1, 1, 1, 1), gene_id = c("ENSG00000223972.5",
"ENSG00000227232.5", "ENSG00000278267.1", "ENSG00000243485.5",
"ENSG00000237613.2"), gene_name = c("DDX11L1", "WASH7P", "MIR6859-1",
"MIR1302-2HG", "FAM138A"), start = c(11869, 14410, 17369, 29571,
34554), end = c(14403, 29553, 17436, 31109, 36081), gene_type = c("transcribed_unprocessed_pseudogene",
"unprocessed_pseudogene", "miRNA", "lincRNA", "lincRNA")), row.names = c(NA,
-5L), class = c("tbl_df", "tbl", "data.frame"))
I want to edit the gene ID so that only the part before the "." is kept, for example:
ENSG00000223972.5 to ENSG00000223972
I did this:
gene_annot_parsed1 <- sub(".*^.","",gene_annot_parsed$gene_id)
But it gives this output:
dput(gene_annot_parsed1[1:2])
c("NSG00000223972.5", "NSG00000227232.5")
I just want to modify the gene_id column so that it keeps only the part before the ".", and leave the other columns unchanged.
In my case, it removes the leading "E" and drops the other columns.
Does anyone know how to solve this?
Thank you.
Answer 1
Score: 1
# Capture everything before the "." and drop the dot and whatever follows it
gene_annot_parsed1 <- stringr::str_replace_all(gene_annot_parsed$gene_id, '(.*)\\..*', '\\1')
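Since the question asks about sub specifically, here is a minimal base R sketch of the same idea. In the original attempt, the pattern ".*^." matches only the first character of each string (".*" matches nothing, "^" anchors at the start, and "." consumes the "E"), which is why the "E" disappears. The other columns seem to vanish only because the result is stored in a new vector instead of being assigned back to the column. The small data frame below is a cut-down stand-in for gene_annot_parsed built from the values in the question, not the asker's full data.

```r
# Cut-down stand-in for gene_annot_parsed (values copied from the question).
gene_annot_parsed <- data.frame(
  chr = c(1, 1),
  gene_id = c("ENSG00000223972.5", "ENSG00000227232.5"),
  gene_name = c("DDX11L1", "WASH7P"),
  stringsAsFactors = FALSE
)

# "\\." matches a literal dot; ".*$" matches everything after it.
# Replacing that match with "" keeps only the part before the dot.
# Assigning back to the column leaves the other columns untouched.
gene_annot_parsed$gene_id <- sub("\\..*$", "", gene_annot_parsed$gene_id)

print(gene_annot_parsed$gene_id)
# [1] "ENSG00000223972" "ENSG00000227232"
```

If an ID could ever contain more than one dot, this version cuts at the first dot, whereas a greedy '(.*)\\.' capture cuts at the last one; for the version suffixes shown in the question the two behave the same.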