Insert string in vector before every string containing pattern

Question

I have the following vector:

test <- c("here is some text", "here is some other text", "here is my formula", "2+2", "here is my second formula", "4+4", "here is even more text", "here is my final formula", "6+6")

What I'm hoping to do is find every string containing "formula" and insert an arbitrary string such as "CCC" in front of it, so that I'd have something like the following:

test <- c("here is some text", "here is some other text", "CCC", "here is my formula", "2+2", "CCC", "here is my second formula", "4+4", "here is even more text", "CCC", "here is my final formula", "6+6")

Answer 1

Score: 5

These use only base R:

1) Use append as shown:

test2 <- test
# Positions just before each match, last match first, so that earlier
# insertions do not shift the positions still to be processed
ix <- rev(grep("formula", test)) - 1
for(i in ix) test2 <- append(test2, "CCC", i)
test2
##  [1] "here is some text"         "here is some other text"  
##  [3] "CCC"                       "here is my formula"       
##  [5] "2+2"                       "CCC"                      
##  [7] "here is my second formula" "4+4"                      
##  [9] "here is even more text"    "CCC"                      
## [11] "here is my final formula"  "6+6"
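
The loop above works because the match positions are visited from last to first. Here is a minimal sketch that wraps the same idea in a function. The name `insert_before` and its argument names are hypothetical, and `fixed = TRUE` is an added assumption that the pattern should be matched literally rather than as a regular expression:

```r
# Hypothetical helper (not part of the original answer): insert `value`
# before every element of `x` that contains `pattern`.
insert_before <- function(x, pattern, value) {
  hits <- grep(pattern, x, fixed = TRUE)  # literal match (assumption)
  # Walk from the last match to the first so that each insertion leaves
  # the positions of the matches still to be processed unchanged.
  for (i in rev(hits)) {
    x <- append(x, value, after = i - 1)
  }
  x
}

demo <- c("here is some text", "here is my formula", "2+2",
          "here is my final formula", "6+6")
res <- insert_before(demo, "formula", "CCC")
```

With `after = 0`, `append` places the value at the very start, so a match in the first position is handled without a special case.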

2) Here are three different one-liners.

The first creates a matrix whose first row contains "CCC" and NA elements and whose second row is test. It then unravels the matrix column by column and removes the NAs.

The second iterates over test, returning the element unchanged if it does not contain "formula" and the vector c("CCC", element) otherwise. This produces a list, which is then unlisted.

The third prefixes any element containing "formula" with "CCC\n" and then splits the result at the newlines.

# 2a
c(na.omit(c(rbind(ifelse(grepl("formula", test), "CCC", NA), test))))

# 2b
unlist(lapply(test, function(x) if (grepl("formula", x)) c("CCC", x) else x))

# 2c
scan(text = sub("(.*formula)", "CCC\n\\1", test), what="", quiet=TRUE, sep="\n")
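
To see why the matrix approach interleaves the marker correctly, it may help to look at the intermediate values on a short vector. The names `x`, `m`, `flat`, and `result` are illustrative only, and the last step uses plain `is.na` indexing as a stand-in for `na.omit`:

```r
x <- c("a", "my formula", "b")

# Row 1 holds "CCC" above each match and NA elsewhere; row 2 is x itself.
m <- rbind(ifelse(grepl("formula", x), "CCC", NA), x)

# c() flattens a matrix column by column, so each marker lands
# immediately before the element it belongs to.
flat <- c(m)

# Drop the NA placeholders.
result <- flat[!is.na(flat)]
```

One consequence worth keeping in mind: because the placeholders are NA, any NA already present in the input is dropped along with them.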

Answer 2

Score: 2

Update: Improved code:

library(tidyverse)

test %>%
  tibble(value = .) %>%
  mutate(index = row_number(), new_val = ifelse(str_detect(value, "formula"), "CCC", NA_character_)) %>%
  pivot_longer(c(new_val, value), values_drop_na = TRUE) %>%
  pull(value)

Here is another approach. As I already mentioned, I find it much easier to think in terms of a data frame or a tibble in this situation:

library(tidyverse)

test %>%
  as_tibble() %>%
  mutate(index = row_number()) %>%
  mutate(new_val = ifelse(str_detect(test, "formula"), "CCC", NA_character_)) %>%
  pivot_longer(c(new_val, value)) %>%
  drop_na() %>%
  pull(value)

 [1] "here is some text"         "here is some other text"  
 [3] "CCC"                       "here is my formula"       
 [5] "2+2"                       "CCC"                      
 [7] "here is my second formula" "4+4"                      
 [9] "here is even more text"    "CCC"                      
[11] "here is my final formula"  "6+6"

Answer 3

Score: 2

Here is one option using the tidyverse:

library(dplyr)
library(stringr)
library(tidyr)
tibble(test) %>%
  uncount(str_detect(test, 'formula') + 1) %>%
  mutate(test = replace(test, duplicated(test, fromLast = TRUE), "CCC"))

Output:

# A tibble: 12 × 1
   test                     
   <chr>                    
 1 here is some text        
 2 here is some other text  
 3 CCC                      
 4 here is my formula       
 5 2+2                      
 6 CCC                      
 7 here is my second formula
 8 4+4                      
 9 here is even more text   
10 CCC                      
11 here is my final formula 
12 6+6            
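
The pipeline above duplicates each matching row with uncount() and then overwrites the first copy of each pair with "CCC". Because duplicated() compares values, an input in which the same string occurs more than once would have its earlier occurrence overwritten as well. Below is a base R sketch of the same duplicate-then-overwrite idea that tracks positions rather than values. It is offered as an alternative under that assumption, not as the answer's method, and `hit`, `out`, and `first_copy` are illustrative names:

```r
# Example input with a repeated non-matching string.
test <- c("here is some text", "here is my formula", "2+2",
          "here is some text", "here is my final formula", "6+6")

hit <- grepl("formula", test, fixed = TRUE)

# Repeat every matching element twice and every other element once.
out <- rep(test, times = hit + 1L)

# cumsum() gives the position of the last copy of each element in `out`;
# for a match, the first copy sits one position earlier.
first_copy <- cumsum(hit + 1L)[hit] - 1L
out[first_copy] <- "CCC"
```

Here the repeated "here is some text" is left untouched because only the computed positions are overwritten.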

Answer 4

Score: 1

Here is a base R way that uses grepl/cumsum to create a grouping vector in which each group starts at a "formula" string. by then prepends "CCC" to each group, and the final if removes the extra leading "CCC" when the vector does not start with a match.

test <- c("here is some text", "here is some other text",
          "here is my formula", "2+2", "here is my second formula",
          "4+4", "here is even more text", 
          "here is my final formula", "6+6")

j <- cumsum(grepl("formula", test))
test <- unname(unlist(by(test, j, FUN = \(x) c("CCC", x))))
if(j[1L] == 0) test <- test[-1L]
test
#>  [1] "here is some text"         "here is some other text"  
#>  [3] "CCC"                       "here is my formula"       
#>  [5] "2+2"                       "CCC"                      
#>  [7] "here is my second formula" "4+4"                      
#>  [9] "here is even more text"    "CCC"                      
#> [11] "here is my final formula"  "6+6"

Created on 2023-03-30 with reprex v2.0.2
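
To make the grouping step concrete, the sketch below shows the grouping vector and rebuilds the result with split() and lapply() in place of by(). It prepends "CCC" only to groups that start with a match, so no correction of the first group is needed. This is a variation for illustration, not the answer's code, and `groups` and `out` are illustrative names:

```r
test <- c("here is some text", "here is some other text",
          "here is my formula", "2+2", "here is my second formula",
          "4+4", "here is even more text",
          "here is my final formula", "6+6")

# The counter increases at every match, so each match opens a new group.
j <- cumsum(grepl("formula", test))
j
#> [1] 0 0 1 1 2 2 2 3 3

# One character vector per group, in order of the group number.
groups <- split(test, j)

# Prepend the marker only where the group begins with a match.
out <- unlist(
  lapply(groups, function(g) if (grepl("formula", g[1])) c("CCC", g) else g),
  use.names = FALSE
)
```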
