Remove specific data within a string in R

Question

I'm new to R. I have this data frame, and I'm trying to delete all the information in this column except the gene symbols, which always come second within the string.
[image of the data frame]
Best regards!

I tried the gsub() function, but it only deleted the specific element. I'm wondering whether I can use it to keep only the gene symbol (which always comes second in the string) and delete everything else.

Answer 1

Score: 1

If your data is consistently in the format shown in the image (where the gene ID is always the third "word" of the string), then the word() function from the stringr package can extract the data you want.

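library(stringr)

# Example data: ten rows that all follow the "ids // gene // other" layout
dat <- data.frame(gene_assignment = rep('idnumbers // geneID // Other stuff', 10))

# word() splits on spaces, so "//" is word 2 and the gene ID is word 3
dat$geneID <- word(dat$gene_assignment, 3)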

Note that this makes the following assumptions:

  1. Your data always has the format: some ID numbers, followed by " // ", followed by the gene ID, followed by a space, and then anything else.
  2. Neither the leading ID numbers nor the gene ID ever contains a space.

These assumptions are necessary because word() uses spaces to determine when each word starts and ends.
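
If the second assumption might not hold, one possible workaround is to split on the " // " delimiter itself rather than on single spaces. The sketch below is only an illustration under that assumption: the vector x and its two rows are made up for the example, and it presumes every row contains at least two " // " delimiters. It uses the sep argument of word() and, as a base R alternative closer to the gsub() attempt in the question, sub() with a capture group.

```r
library(stringr)

# Hypothetical rows (not from the original data): the second one has a space
# inside the first field, which breaks the "third word" assumption.
x <- c("idnumbers // geneID // Other stuff",
       "id numbers // geneID // Other stuff")

# Splitting on single spaces: the gene ID is word 3 only when no earlier
# field contains a space.
by_space <- word(x, 3)
# by_space is c("geneID", "//")

# Splitting on the " // " delimiter instead: the gene ID is field 2.
by_delim <- word(x, 2, sep = fixed(" // "))
# by_delim is c("geneID", "geneID")

# Base R alternative: capture the text between the first and second " // ".
# Strings that do not match the pattern are returned unchanged.
by_regex <- sub("^.*? // (.*?) // .*$", "\\1", x, perl = TRUE)
# by_regex is c("geneID", "geneID")
```

With the delimiter-based versions, spaces inside the ID numbers or the gene ID no longer matter, as long as " // " itself never appears inside a field.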

Posted by huangapple on 2023-02-06 15:44:03
Source: https://go.coder-hub.com/75358543.html