February 6, 2023, 15:44:03

Remove specific data within a string in R

Question

I'm new to R. I have this data frame, and I'm trying to delete all the information from this column except the gene symbols, which always come second within the string.
[Image: the data frame]
Best regards!

I tried the gsub function, but it deleted only the specific element. I'm wondering if I can use it to keep only the gene symbol (which always comes second in the string) and delete everything else.

Answer 1

Score: 1

If your data is consistently in the format shown in the image (where the gene ID is always the third "word" of the string), then the word() function from the stringr package can extract the data you want.

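library(stringr)
# Example data: ten copies of a string in the assumed "ids // geneID // other" layout
dat = data.frame(gene_assignment = rep(c('idnumbers // geneID // Other stuff'), 10))
# word() splits on spaces by default, so the gene ID is the third "word"
dat$geneID = word(dat$gene_assignment, 3)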

Note that this makes the following assumptions:

Your data is always in the format where there are some ID numbers, followed by " // ", followed by the gene ID, followed by a space, and then anything else.
Neither the ID numbers at the front nor the gene ID ever contains a space.

These assumptions are necessary because word() uses spaces to determine when each word starts and ends.
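
If the no-spaces assumption does not hold, one possible alternative is to split on the literal separator " // " instead of on single spaces. The following is only a sketch in base R: the sample strings and the column names gene_symbol and gene_symbol_sub are made up for illustration, since the real column contents are only shown in the image. It also includes a sub() variant, because the question asked about gsub.

```r
# Hypothetical sample values; the real column contents are not shown in the post.
dat <- data.frame(
  gene_assignment = c(
    "NM_0001 // GENE1 // some description",
    "id with spaces // GENE2 // another description"
  ),
  stringsAsFactors = FALSE
)

# Split each string on the literal separator " // " and keep the second field.
# Strings with fewer than two fields yield NA.
fields <- strsplit(dat$gene_assignment, " // ", fixed = TRUE)
dat$gene_symbol <- vapply(fields, function(f) f[2], character(1))

# The same idea with sub(): capture the text between the first two separators.
# Strings that do not match the pattern are returned unchanged.
dat$gene_symbol_sub <- sub("^.*? // (.*?) // .*$", "\\1",
                           dat$gene_assignment, perl = TRUE)

print(dat[, c("gene_symbol", "gene_symbol_sub")])
```

With stringr loaded, word() also takes a sep argument, so word(dat$gene_assignment, 2, sep = fixed(" // ")) should give a similar result, under the same assumption that " // " is always the field separator.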
