Split a string without considering special characters

Question

I need a way to split a string every n letters.

For example, with s = "QW%ERT%ZU%I%O%P" and n = 3, I want to obtain "QW%E" "RT%Z" "U%I%O" "%P".

As you can see, the special character "%" is not counted when splitting.

I tried

strsplit(s, "(?<=.{10})(?=.*\\%)", perl = TRUE)[[1]]

but I could not find a way to get the result I want.

Answer 1

Score: 5

What about regmatches (instead of strsplit), as below?

> n <- 3
> regmatches(s, gregexpr(sprintf("(\\W?\\w){1,%i}", n), s))
[[1]]
[1] "QW%E"  "RT%Z"  "U%I%O" "%P"

Or tapply + strsplit:

# split the string into single characters
v <- unlist(strsplit(s, ""))
# positions of the word characters (the ones that are counted)
l <- which(grepl("\\w", v))
# start a new group right after every n-th word character, then paste each group
tapply(
    v,
    cumsum(seq_along(v) %in% (1 + by(l, ceiling(seq_along(l) / n), max))),
    paste0,
    collapse = ""
)

which gives

      0       1       2       3
 "QW%E"  "RT%Z" "U%I%O"    "%P"
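
One caveat with the regex above: `\\W?` allows at most one non-word character before each counted character, so a run such as "%%" is not matched in full. Below is a minimal sketch of a wrapper that tolerates such runs. The name split_every_n is hypothetical, and using `\\W*` with perl = TRUE is my assumption rather than part of the answer. Note that `\\w` also counts digits and "_", and that special characters after the last counted character are not returned.

```r
# Hypothetical helper (not from the answer): split `s` after every `n` word
# characters, keeping any run of non-word characters attached to the word
# character that follows it.
split_every_n <- function(s, n) {
  stopifnot(is.character(s), length(s) == 1, n >= 1)
  # (\W*\w) = one counted character plus any specials in front of it,
  # repeated greedily up to n times
  pattern <- sprintf("(\\W*\\w){1,%d}", as.integer(n))
  regmatches(s, gregexpr(pattern, s, perl = TRUE))[[1]]
}

split_every_n("QW%ERT%ZU%I%O%P", 3)
# [1] "QW%E"  "RT%Z"  "U%I%O" "%P"
split_every_n("A%%BC", 2)
# [1] "A%%B" "C"
```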

Answer 2

Score: 1

Much less succinct than the above, but a base R solution all the same:

# Function to only consider certain characters defined by a regex
# and split a string scalar into separate elements in a vector
split_string_to_vec <- function(s, n, consider_elements_pattern = "[[:alpha:]]"){
  # Ensure s is a character scalar:
  stopifnot(is.character(s) && length(s) == 1)
  # Ensure n is a numeric scalar:
  stopifnot(is.numeric(n) && length(n) == 1)
  # Split the string into separate elements:
  # str_vec => character vector
  str_vec <- unlist(strsplit(s, ""))
  # Assign an index to the string vector:
  # idx => named integer vector
  idx <- setNames(seq_len(length(str_vec)), str_vec)
  # Resolve which values are to be considered (alphabetic characters by default):
  # considered_vals => named integer vector
  considered_vals <- idx[grepl(consider_elements_pattern, names(idx))]
  # Split the considered indices into groups of n:
  # grpd_strings => list of named integer vectors
  grpd_strings <- split(
    considered_vals,
    ceiling(seq_along(considered_vals) / n)
  )
  # For each group, resolve the group with the
  # appropriate characters in order: res_vec => character vector
  res_vec <- vapply(
    seq_along(grpd_strings),
    function(i){
      # Get the current list element:
      curr <- grpd_strings[[i]]
      # If it's the first element:
      if(i == 1){
        # Ignore the previous element, only focus on this
        # one: ir => named integer vector
        ir <- sort(c(curr, idx[min(curr):max(curr)]))
      # Otherwise:
      }else{
        # Resolve the previous element:
        prev <- grpd_strings[[(i-1)]]
        # ir => named integer vector
        ir <- sort(c(curr, idx[(max(prev)+1):max(curr)]))
      }
      # Flatten the result into a unique (by idx) string:
      # character scalar
      paste0(
        names(
          subset(
            ir,
            !(duplicated(ir))
          )
        ),
        collapse = ""
      )
    },
    character(1)
  )
  # Explicitly define the returned object:
  # character vector
  return(res_vec)
}
# Input Data:
# s => string scalar
s <- "QW%ERT%ZU%I%O%P"
# n => integer scalar
n <- 3
# Apply the function: string scalar => stdout (console)
split_string_to_vec(s, n, consider_elements_pattern = "[[:alpha:]]")
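
The same grouping idea can probably be written more compactly by counting the considered characters with cumsum. The following is only a sketch of an alternative, not the function above: split_by_counted and its argument names are hypothetical. It assumes each non-counted character belongs to the chunk of the next counted character, and that trailing non-counted characters join the last chunk.

```r
# Hypothetical alternative (an assumption, not the answer's code):
# assign every character a chunk number, then paste each chunk together.
split_by_counted <- function(s, n, pattern = "[[:alpha:]]") {
  stopifnot(is.character(s), length(s) == 1, n >= 1)
  chars <- strsplit(s, "")[[1]]
  # TRUE for characters that count towards n
  counted <- grepl(pattern, chars)
  # running number of counted characters seen so far
  k <- cumsum(counted)
  # a non-counted character takes the rank of the next counted character
  pos <- ifelse(counted, k, k + 1)
  # chunk number, capped so trailing specials stay in the last chunk
  grp <- pmin(ceiling(pos / n), max(1, ceiling(sum(counted) / n)))
  unname(vapply(split(chars, grp), paste, character(1), collapse = ""))
}

split_by_counted("QW%ERT%ZU%I%O%P", 3)
# [1] "QW%E"  "RT%Z"  "U%I%O" "%P"
```

Unlike the regex approach, this keeps special characters that come after the last counted character, for example split_by_counted("AB%", 2) returns "AB%".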

huangapple
  • Posted on June 19, 2023, 18:03:40
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76505572.html