Remove specific data within a string in R
Question
I'm new to R. I have this data frame and I'm trying to delete all the information from this column except the gene symbols, which always come second within the string.
Best regards!
I tried gsub(), but it only deleted the specific element I matched. I'm wondering if I can use it to keep only the gene symbol (which always comes second in the string) and delete everything else.
Answer 1
Score: 1
If your data is consistently in the format shown in the image (where the gene ID is always the third "word" of the string, because the "//" separator counts as a word of its own), then the word() function from the stringr package can extract the data you want.
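library(stringr)
# Example data: ten rows in the "idnumbers // geneID // Other stuff" format
dat <- data.frame(gene_assignment = rep('idnumbers // geneID // Other stuff', 10))
# word() splits on single spaces by default, so "//" is word 2 and the gene ID is word 3
dat$geneID <- word(dat$gene_assignment, 3)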
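To see why the index is 3 rather than 2, it can help to look at the space-separated pieces directly. The sketch below uses base R's strsplit() on the same placeholder string; it only illustrates how the pieces are counted when splitting on a single space and is not meant to show how word() is implemented.

```r
# Placeholder string in the same format as the example data
x <- "idnumbers // geneID // Other stuff"

# Split on single spaces, the default separator that word() uses
pieces <- strsplit(x, " ", fixed = TRUE)[[1]]
print(pieces)

# The "//" separator occupies position 2, so the gene ID sits at position 3
gene <- pieces[3]
print(gene)
```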
Note that this makes the following assumptions:
- Your data is always in the format: some ID numbers, followed by " // ", followed by the gene ID, followed by a space, and then anything else.
- Neither the ID numbers at the front nor the gene ID ever contains a space.
These assumptions are necessary because word() uses spaces to determine where each word starts and ends.
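If the no-spaces assumption might not hold, one alternative is to treat " // " itself as the delimiter. Since the question asks about gsub(), the sketch below shows two base R options: strsplit() on the literal separator, and sub() (the single-replacement sibling of gsub()) with a capture group. The input values are hypothetical, and both options assume the gene symbol is the second " // "-separated field.

```r
# Hypothetical example values; the first has a space inside the ID field
x <- c("id 123 // geneA // Other stuff",
       "id456 // geneB // More stuff")

# Option 1: split each string on the literal " // " and take the second field
fields <- strsplit(x, " // ", fixed = TRUE)
gene_split <- vapply(fields, function(f) f[2], character(1))

# Option 2: sub() with a capture group, keeping only the text between
# the first and second " // " (strings that do not match are returned unchanged)
gene_sub <- sub("^.*? // (.*?) // .*$", "\\1", x, perl = TRUE)

print(gene_split)
print(gene_sub)
```

With stringr, the same idea can probably be written more compactly as word(x, 2, sep = fixed(" // ")), since word() accepts a custom separator.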