Split a string without considering special characters
Question
I need a way to split a string every n letters.
For example, with s = "QW%ERT%ZU%I%O%P" and n = 3, I want to obtain "QW%E" "RT%Z" "U%I%O" "%P".
As you can see, the special character "%" is not counted when dividing the string.
I tried
strsplit(s, "(?<=.{10})(?=.*\\%)", perl = TRUE)[[1]]
but I could not find a way to obtain what I want.
Answer 1
Score: 5
What about regmatches (instead of strsplit), like below?
> n <- 3
> regmatches(s, gregexpr(sprintf("(\\W?\\w){1,%i}", n), s))
[[1]]
[1] "QW%E" "RT%Z" "U%I%O" "%P"
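The pattern above allows at most one non-word character before each word character. As a minimal sketch, assuming runs of several special characters should also be skipped, the same idea can be wrapped in a small function. The name split_every_n and the \W* variant of the pattern are illustrative and not part of the original answer.

```r
# Hypothetical helper (not from the original answer): split `s` into chunks
# holding at most `n` word characters each. Non-word characters are attached
# to the word character that follows them.
split_every_n <- function(s, n) {
  # \W* permits any number of non-word characters before each word character
  pattern <- sprintf("(\\W*\\w){1,%d}", n)
  regmatches(s, gregexpr(pattern, s))[[1]]
}

split_every_n("QW%ERT%ZU%I%O%P", 3)
# [1] "QW%E"  "RT%Z"  "U%I%O" "%P"

split_every_n("A%%BC%D", 2)
# [1] "A%%B" "C%D"
```

One caveat with this sketch: because every repetition must end in a word character, special characters at the very end of the string are not part of any match and are dropped.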
Or tapply + strsplit:
v <- unlist(strsplit(s, ""))
l <- which(grepl("\\w", v))
tapply(
  v,
  cumsum(seq_along(v) %in% (1 + by(l, ceiling(seq_along(l) / n), max))),
  paste0,
  collapse = ""
)
which gives
0 1 2 3
"QW%E" "RT%Z" "U%I%O" "%P"
Answer 2
Score: 1
Much less succinct than the above, but a base R solution all the same:
# Function that counts only the characters matched by a regex
# and splits a string scalar into separate elements of a vector
split_string_to_vec <- function(s, n, consider_elements_pattern = "[[:alpha:]]"){
  # Ensure s is a character scalar:
  stopifnot(is.character(s) && length(s) == 1)
  # Ensure n is a numeric scalar:
  stopifnot(is.numeric(n) && length(n) == 1)
  # Split the string into separate characters:
  # str_vec => character vector
  str_vec <- unlist(strsplit(s, ""))
  # Assign an index to each character:
  # idx => named integer vector
  idx <- setNames(seq_len(length(str_vec)), str_vec)
  # Resolve which values are to be counted (by default, letters only):
  # considered_vals => named integer vector
  considered_vals <- idx[grepl(consider_elements_pattern, names(idx))]
  # Split the counted positions into groups of n:
  # grpd_strings => list of named integer vectors
  grpd_strings <- split(
    considered_vals,
    ceiling(seq_along(considered_vals) / n)
  )
  # For each group, collect every character it spans, in order:
  # res_vec => character vector
  res_vec <- vapply(
    seq_along(grpd_strings),
    function(i){
      # Get the current list element:
      curr <- grpd_strings[[i]]
      # If it's the first element:
      if(i == 1){
        # There is no previous group, so span only this one:
        # ir => named integer vector
        ir <- sort(c(curr, idx[min(curr):max(curr)]))
      # Otherwise:
      }else{
        # Resolve the previous element:
        prev <- grpd_strings[[(i-1)]]
        # Span from just after the previous group to the end of this one:
        # ir => named integer vector
        ir <- sort(c(curr, idx[(max(prev)+1):max(curr)]))
      }
      # Drop duplicated positions and flatten into one string:
      # character scalar
      paste0(
        names(
          subset(
            ir,
            !(duplicated(ir))
          )
        ),
        collapse = ""
      )
    },
    character(1)
  )
  # Explicitly define the returned object:
  # character vector
  return(res_vec)
}
# Input data:
# s => string scalar
s <- "QW%ERT%ZU%I%O%P"
# n => integer scalar
n <- 3
# Apply the function: string scalar => stdout (console)
split_string_to_vec(s, n, consider_elements_pattern = "[[:alpha:]]")