How to replace specific strings within strings using a dictionary from a data frame?
Question
I have a dictionary of bipartite taxonomic names (each composed of two words) in the form of a data frame (ddf), with columns for the old (ddf$old) and new (ddf$new) names. I would like to find and replace the old names with the new names in a list of strings (listn). However, the taxonomic names are only part of each string. Some old names may occur more than once and some may never appear in the list. The old names are preceded by a string of varying length and are sometimes followed by an additional string.
#sample list, the real has >300000 entries
listn <- c("AB001440.1.1538 Pseudomonas coronafaciens pv. atropurpurea", "HG530070.1.1349 Trueperella pyogenes",
"ET631036.837.2346 Jonquetella anthropi", "AB001448.1.1538 Pseudomonas savastanoi pv. phaseolicola",
"HG530249.1.1462 Paucibacter toxinivorans", "HG530235.1.1493 Paucibacter toxinivorans",
"AB001781.1.1507 Chlamydia psittaci", "AB001785.1.1507 Chlamydia felis",
"AB001804.1.1507 Chlamydia psittaci", "AB001794.1.1507 Chlamydia psittaci")
#sample dictionary, the real one has >400 entries
ddf <- data.frame(old = c("Lactobacillus casei", "Trueperella pyogenes", "Pseudomonas savastanoi"),
new = c("Newbacillus casei", "Newperella pyogenes", "Newudomonas savastanoi"))
I tried:
listn2 <- stringr::str_replace_all(string = listn,
pattern = ddf$old,
replacement = ddf$new)
list2 <- gsub(ddf$old, ddf$new, listn)
Neither method worked. I expect each two-word old name to be replaced by the corresponding two-word new name from the data frame. The old and new names are of different lengths.
I also tried ChatGPT, but the script it provided did not solve the problem:
# Sample input data
selected_strings <- c("John likes apples", "Mary eats bananas", "David enjoys grapes")
dictionary <- data.frame(old_name = c("John", "Mary", "David"),
new_name = c("Peter", "Alice", "Michael"),
stringsAsFactors = FALSE)
# Function to replace bipartite names within selected strings
replace_names <- function(strings, dictionary) {
# Split the strings into individual words
words <- unlist(strsplit(strings, " "))
# Replace the names using the dictionary
replaced_words <- words
for (i in seq_len(nrow(dictionary))) {
replaced_words[words == dictionary$old_name[i]] <- dictionary$new_name[i]
}
# Reconstruct the modified strings
modified_strings <- sapply(strsplit(strings, " "), function(x) paste(replaced_words[x], collapse = " "))
return(modified_strings)
}
# Call the function to replace names within selected strings
modified_strings <- replace_names(selected_strings, dictionary)
# Print the modified strings
print(modified_strings)
Answer 1
Score: 1
You can use a named replacement:
library(tidyverse)
str_replace_all(listn, deframe(ddf))
[1] "AB001440.1.1538 Pseudomonas coronafaciens pv. atropurpurea"
[2] "HG530070.1.1349 Newperella pyogenes"
[3] "ET631036.837.2346 Jonquetella anthropi"
[4] "AB001448.1.1538 Newudomonas savastanoi pv. phaseolicola"
[5] "HG530249.1.1462 Paucibacter toxinivorans"
[6] "HG530235.1.1493 Paucibacter toxinivorans"
[7] "AB001781.1.1507 Chlamydia psittaci"
[8] "AB001785.1.1507 Chlamydia felis"
[9] "AB001804.1.1507 Chlamydia psittaci"
[10] "AB001794.1.1507 Chlamydia psittaci"
This is equivalent to:
str_replace_all(listn, setNames(ddf$new, ddf$old))
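For context on why the attempts in the question fail: gsub() uses only the first element of a vector pattern (and warns about it), while str_replace_all() with an unnamed vector pattern pairs strings and patterns element by element instead of trying every pattern on every string. The named vector above avoids both problems, but its names are still interpreted as regular expressions. If some names contain characters such as "." or "(", a literal match may be safer. Below is a minimal base R sketch of that idea, assuming the same ddf$old and ddf$new columns as in the question; replace_from_dictionary is a hypothetical helper name, not part of any package, and the data are a shortened stand-in for the real list.

```r
# Shortened stand-ins for the question's data (same structure, fewer rows).
listn <- c("HG530070.1.1349 Trueperella pyogenes",
           "AB001448.1.1538 Pseudomonas savastanoi pv. phaseolicola",
           "AB001781.1.1507 Chlamydia psittaci")
ddf <- data.frame(old = c("Lactobacillus casei", "Trueperella pyogenes",
                          "Pseudomonas savastanoi"),
                  new = c("Newbacillus casei", "Newperella pyogenes",
                          "Newudomonas savastanoi"),
                  stringsAsFactors = FALSE)

# Hypothetical helper: apply each dictionary row in turn to the whole vector.
replace_from_dictionary <- function(strings, old, new) {
  stopifnot(length(old) == length(new))
  for (i in seq_along(old)) {
    # fixed = TRUE matches old[i] literally, not as a regular expression
    strings <- gsub(old[i], new[i], strings, fixed = TRUE)
  }
  strings
}

listn2 <- replace_from_dictionary(listn, ddf$old, ddf$new)
print(listn2)
```

Because the rows are applied one after another, a later old name could in principle match text produced by an earlier replacement, so it is worth checking that no new name contains another old name. The loop makes one pass over the vector per dictionary row.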