R: save a regex match to a new variable while removing the regex match from the existing variable using `str_extract()`


Question


I want to save a regex match to a new variable and remove that match from the existing variable, ideally with a single function call.

In the following example, I want to extract "Speaker. " from `sentence` into `new_variable` and, at the same time, remove "Speaker. " from `sentence`.

I tried to do this with `str_extract()`. It matches the desired word, but the word is not removed from `sentence`. (This may not be what `str_extract()` was designed to do, but then what is the difference between `str_extract()` and `str_match()`?)

library(stringr)

sentence <- "Speaker. I am not sure why this does not work."

for (line in sentence) {
  new_variable <- str_extract(line, "^[[:alpha:]]+\\. ")
}

I know I can use `str_replace()` to remove the regex match from `sentence`, but I would prefer to do this with one function if possible.

library(stringr)

sentence <- "Speaker. I am not sure why this does not work."
# Leading word followed by a period and a space
pattern <- "^[[:alpha:]]+\\. "

for (line in sentence) {
  # Save the matched prefix, then strip it from the sentence
  new_variable <- str_extract(line, pattern)
  sentence <- str_replace(sentence, pattern, "")
}

Answer 1

Score: 1


You can use `tidyr::separate_wider_regex()`, which splits one column into several by matching a sequence of named regular expressions:

library(tidyr)

df <- data.frame(sentence = "Speaker. I am not sure why this does not work.")

separate_wider_regex(df, sentence, c(new_variable = "^[[:alpha:]]+\\. ", sentence = ".*"))
# A tibble: 1 × 2
  new_variable sentence                             
  <chr>        <chr>                                
1 "Speaker. "  I am not sure why this does not work.
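
On the `str_extract()` versus `str_match()` part of the question: `str_extract()` returns only the full match as a character vector, while `str_match()` returns a character matrix whose first column is the full match and whose remaining columns are the capture groups. A minimal sketch of how a single `str_match()` call could split a plain character vector follows. The pattern here is adapted from the question by adding two capture groups, and the name `remainder` is introduced only for illustration.

```r
library(stringr)

sentence <- "Speaker. I am not sure why this does not work."

# Group 1 captures the leading "Word. " prefix, group 2 captures the rest.
# (Pattern adapted from the question; the capture groups are an addition.)
pattern <- "^([[:alpha:]]+\\. )(.*)$"

# One row per input string: column 1 is the full match,
# columns 2 and 3 are the two capture groups.
parts <- str_match(sentence, pattern)

new_variable <- parts[, 2]

# Strings that do not match yield a row of NA, so fall back to the
# original text in that case instead of overwriting it with NA.
remainder <- ifelse(is.na(parts[, 1]), sentence, parts[, 3])
```

Neither function modifies its input. Removing the prefix from `sentence` still requires assigning the result back, for example `sentence <- remainder`.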

huangapple
  • Posted on March 21, 2023 at 01:56:44
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75793730.html