R: save a regex match to a new variable while removing the regex match from the existing variable using `str_extract()`

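Question

I want to save a regex match to a new variable and remove that match from the existing variable, using a single function.

In the following example, I want to save "Speaker. " to `new_variable` and, in the same step, remove it from `sentence`.

I tried to do this with `str_extract()`. It matches the desired word, but the word is not removed from `sentence`. (This may not be what `str_extract()` was designed to do, but then what is the difference between `str_extract()` and `str_match()`?)

library(stringr)

sentence <- "Speaker. I am not sure why this does not work."

for (line in sentence) {
  new_variable <- str_extract(line, "^[[:alpha:]]+\\. ")
}

I know I can use `str_replace()` to remove the regex match from `sentence`, but I would prefer to do this with one function if possible.

library(stringr)

sentence <- "Speaker. I am not sure why this does not work."
pattern <- "^[[:alpha:]]+\\. "

for (line in sentence) {
  # Save the leading "Word. " prefix
  new_variable <- str_extract(line, pattern)
  # Remove the same prefix from the sentence
  sentence <- str_replace(sentence, pattern, "")
}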

Answer 1

Score: 1

You can use `tidyr::separate_wider_regex()`. Each named pattern becomes its own column, so the prefix goes to `new_variable` and the remainder stays in `sentence`:

library(tidyr)

df <- data.frame(sentence = "Speaker. I am not sure why this does not work.")

separate_wider_regex(df, sentence, c(new_variable = "^[[:alpha:]]+\\. ", sentence = ".*"))

# A tibble: 1 × 2
  new_variable sentence
  <chr>        <chr>
1 "Speaker. "  I am not sure why this does not work.
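
On the `str_extract()` versus `str_match()` part of the question: `str_extract()` returns only the matched text, while `str_match()` returns a character matrix with the full match in the first column and one further column per capture group. The sketch below is not from the answer above; it assumes the question's pattern can be wrapped in one capture group and followed by a second group, `(.*)`, for the remainder, so that a single call yields both pieces.

```r
library(stringr)

sentence <- "Speaker. I am not sure why this does not work."

# Group 1 captures the leading "Word. " prefix, group 2 captures the rest.
# The two-group pattern is an assumption built from the question's regex.
parts <- str_match(sentence, "^([[:alpha:]]+\\. )(.*)$")

# Column 1 holds the full match; columns 2 and 3 hold the capture groups.
new_variable <- parts[, 2]
sentence     <- parts[, 3]

print(new_variable)
print(sentence)
```

Note that `str_match()` returns `NA` in every column for a string that does not match, so with this pattern a sentence without a leading "Word. " prefix would become `NA` rather than being left unchanged.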

huangapple
  • Posted on March 21, 2023, 01:56:44
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75793730.html