Insert string in vector before every string containing pattern
Question
I have the following vector:
test <- c("here is some text", "here is some other text", "here is my formula", "2+2", "here is my second formula", "4+4", "here is even more text", "here is my final formula", "6+6")
What I'm hoping to do is take every string containing "formula" and insert an arbitrary string like "CCC" in front of it, so that I'd end up with something like the following:
test <- c("here is some text", "here is some other text", "CCC", "here is my formula", "2+2", "CCC", "here is my second formula", "4+4", "here is even more text", "CCC", "here is my final formula", "6+6")
Answer 1
Score: 5
These use only base R:
1) Use append as shown:
test2 <- test
ix <- rev(grep("formula", test)) - 1
for(i in ix) test2 <- append(test2, "CCC", i)
test2
## [1] "here is some text" "here is some other text"
## [3] "CCC" "here is my formula"
## [5] "2+2" "CCC"
## [7] "here is my second formula" "4+4"
## [9] "here is even more text" "CCC"
## [11] "here is my final formula" "6+6"
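As a sketch, the loop in (1) can be wrapped in a small function so it can be reused with other patterns or markers. The name insert_before and its arguments are made up for this illustration and are not part of the original answer.

```r
# Hypothetical helper (not from the original answer): insert `marker`
# before every element of `x` that matches `pattern`.
insert_before <- function(x, pattern, marker = "CCC") {
  # Positions to insert after, visited from last to first so that an
  # insertion never shifts a position that still has to be handled.
  ix <- rev(grep(pattern, x)) - 1
  for (i in ix) x <- append(x, marker, after = i)
  x
}

short <- c("here is some text", "here is my formula", "2+2")
result <- insert_before(short, "formula")
result
# "here is some text" "CCC" "here is my formula" "2+2"
```

Going through the matches in reverse order is what keeps the indices valid: inserting at the last match first leaves all earlier positions unchanged.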
2) Here are three different one-liners.
2a builds a matrix whose first row contains "CCC" or NA and whose second row is test. It then unravels the matrix column by column and removes the NAs.
2b iterates over test, returning the element itself if it does not contain "formula", or the vector c("CCC", element) if it does. This produces a list, which is then unlisted.
2c prefixes every element containing "formula" with "CCC\n" and then splits the result on newlines.
# 2a
c(na.omit(c(rbind(ifelse(grepl("formula", test), "CCC", NA), test))))
# 2b
unlist(lapply(test, function(x) if (grepl("formula", x)) c("CCC", x) else x))
# 2c
scan(text = sub("(.*formula)", "CCC\n\\1", test), what = "", quiet = TRUE, sep = "\n")
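To see why 2a works, it may help to look at the intermediate steps on a shortened vector. This is only an illustrative breakdown; the names short, m and flat are introduced here and do not appear in the answer.

```r
short <- c("some text", "my formula", "2+2")

# Row 1: "CCC" where the element matches, NA otherwise.
# Row 2: the original elements.
m <- rbind(ifelse(grepl("formula", short), "CCC", NA), short)

# c() flattens the matrix column by column, so each marker (or NA)
# lands directly before its element.
flat <- c(m)
# flat: NA "some text" "CCC" "my formula" NA "2+2"

# na.omit() drops the NA placeholders; the outer c() strips the
# attributes that na.omit() attaches to its result.
result <- c(na.omit(flat))
result
# "some text" "CCC" "my formula" "2+2"
```

One caveat: because NA serves as the placeholder, any NA already present in the input would be dropped as well.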
Answer 2
Score: 2
Update: Improved code:
library(tidyverse)
test %>%
  tibble(value = .) %>%
  mutate(index = row_number(),
         new_val = ifelse(str_detect(value, "formula"), "CCC", NA_character_)) %>%
  pivot_longer(c(new_val, value), values_drop_na = TRUE) %>%
  pull(value)
Here is another approach. As I mentioned before, I find it much easier to think in terms of a data frame or a tibble in this situation:
library(tidyverse)
test %>%
  as_tibble() %>%
  mutate(index = row_number()) %>%
  mutate(new_val = ifelse(str_detect(value, "formula"), "CCC", NA_character_)) %>%
  pivot_longer(c(new_val, value)) %>%
  drop_na() %>%
  pull(value)
Output:
[1] "here is some text" "here is some other text"
[3] "CCC" "here is my formula"
[5] "2+2" "CCC"
[7] "here is my second formula" "4+4"
[9] "here is even more text" "CCC"
[11] "here is my final formula" "6+6"
Answer 3
Score: 2
Here is one option using the tidyverse:
library(dplyr)
library(stringr)
library(tidyr)
tibble(test) %>%
  uncount(str_detect(test, 'formula') + 1) %>%
  mutate(test = replace(test, duplicated(test, fromLast = TRUE), "CCC"))
Output:
# A tibble: 12 × 1
   test
   <chr>
 1 here is some text
 2 here is some other text
 3 CCC
 4 here is my formula
 5 2+2
 6 CCC
 7 here is my second formula
 8 4+4
 9 here is even more text
10 CCC
11 here is my final formula
12 6+6
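One thing to watch, assuming the input may contain repeated strings: duplicated(test, fromLast = TRUE) compares values, so a non-matching string that occurs more than once would also be replaced by "CCC". The following base R sketch uses the same duplicate-then-overwrite idea but keys it on positions instead of values. It is a swapped-in variant, not the answer's code, and the names short, idx and out are illustrative.

```r
# "2+2" appears twice on purpose to show the repeated-value case.
short <- c("2+2", "my formula", "2+2")

# Repeat the index of every matching element twice, all others once.
idx <- rep(seq_along(short), times = grepl("formula", short) + 1L)
# idx: 1 2 2 3

out <- short[idx]

# Only an index that occurs again later is flagged, i.e. the first
# copy of each matching element; that copy becomes the marker.
out[duplicated(idx, fromLast = TRUE)] <- "CCC"
out
# "2+2" "CCC" "my formula" "2+2"
```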
Answer 4
Score: 1
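Here is a base R approach that uses grepl and cumsum to create a grouping vector in which each group starts at a string containing "formula". The by call then prepends "CCC" to each group, and the final if drops the leading "CCC" when the vector does not begin with a "formula" string.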
test <- c("here is some text", "here is some other text",
"here is my formula", "2+2", "here is my second formula",
"4+4", "here is even more text",
"here is my final formula", "6+6")
j <- cumsum(grepl("formula", test))
test <- unname(unlist(by(test, j, FUN = \(x) c("CCC", x))))
if(j[1L] == 0) test <- test[-1L]
test
#> [1] "here is some text" "here is some other text"
#> [3] "CCC" "here is my formula"
#> [5] "2+2" "CCC"
#> [7] "here is my second formula" "4+4"
#> [9] "here is even more text" "CCC"
#> [11] "here is my final formula" "6+6"
Created on 2023-03-30 with reprex v2.0.2
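To make the grouping step more concrete, here is a small sketch on a shortened vector. It uses split and lapply in place of by purely for illustration (the answer itself uses by); the names short, groups and out are assumptions introduced here.

```r
short <- c("some text", "my formula", "2+2", "my second formula", "4+4")

# The counter increases by one at every "formula" string, so each
# group starts at a match. Elements before the first match get group 0.
j <- cumsum(grepl("formula", short))
# j: 0 1 1 2 2

# Split into groups and prepend the marker to each one.
groups <- split(short, j)
out <- unlist(lapply(groups, function(x) c("CCC", x)), use.names = FALSE)

# Group 0 does not start with a match, so its marker is removed again.
if (j[1L] == 0) out <- out[-1L]
out
# "some text" "CCC" "my formula" "2+2" "CCC" "my second formula" "4+4"
```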