June 19, 2023, 18:03:40

Split a string without considering special characters

Question

I need a way to split a string every n letters.

For example, let s="QW%ERT%ZU%I%O%P" and n=3, I want to obtain "QW%E" "RT%Z" "U%I%O" "%P".

As you can see, the special character "%" is not considered in the division.

I tried with

strsplit(s, "(?<=.{10})(?=.*\\%)", perl = TRUE)[[1]]

but I cannot find a way to obtain what I want.

Answer 1

Score: 5

What about regmatches (instead of strsplit) like below?

> n <- 3
> regmatches(s, gregexpr(sprintf("(\\W?\\w){1,%i}", n), s))
[[1]]
[1] "QW%E"  "RT%Z"  "U%I%O" "%P"
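
A note on that pattern: \\W? allows at most one non-word character before each counted character, and \\w also counts digits and underscores. The sketch below is a possible variation, not part of the original answer. It wraps the same idea in a hypothetical helper, split_every_n(), uses \\W* so that runs of special characters stay attached to the following word character, and appends any trailing special characters to the last chunk. It assumes PCRE (perl = TRUE) and a string that contains at least one word character.

```r
# Hypothetical helper: split `s` after every `n` word characters,
# letting any number of non-word characters ride along.
split_every_n <- function(s, n) {
  stopifnot(is.character(s), length(s) == 1, is.numeric(n), n >= 1)
  pattern <- paste0(
    "(?:\\W*\\w){1,", as.integer(n), "}", # up to n word chars, each with leading specials
    "(?:\\W+$)?"                          # specials at the very end join the last chunk
  )
  regmatches(s, gregexpr(pattern, s, perl = TRUE))[[1]]
}

split_every_n("QW%ERT%ZU%I%O%P", 3)
# [1] "QW%E"  "RT%Z"  "U%I%O" "%P"
split_every_n("A%%BC%D", 2)
# [1] "A%%B" "C%D"
split_every_n("AB%", 2)
# [1] "AB%"
```

If the string contains no word characters at all, gregexpr finds no match and the result is character(0).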

Or tapply + strsplit

v <- unlist(strsplit(s, ""))
l <- which(grepl("\\w", v))
tapply(
    v,
    cumsum(seq_along(v) %in% (1 + by(l, ceiling(seq_along(l) / n), max))),
    paste0,
    collapse = ""
)

which gives

      0       1       2       3
 "QW%E"  "RT%Z" "U%I%O"    "%P"
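
The tapply call is dense, so here is the same computation unrolled into named steps. This is an illustrative sketch: the intermediate names (grp, ends, starts_new, id, chunks) are made up for this walk-through, and tapply is swapped in for by when finding each chunk's last position.

```r
s <- "QW%ERT%ZU%I%O%P"
n <- 3

v <- unlist(strsplit(s, ""))            # one element per character
l <- which(grepl("\\w", v))             # positions of the word characters
grp <- ceiling(seq_along(l) / n)        # chunk number of each word character
ends <- as.vector(tapply(l, grp, max))  # last position of each chunk: 4 8 13 15

# A new chunk starts right after the last word character of the previous one
starts_new <- seq_along(v) %in% (ends + 1)
id <- cumsum(starts_new)                # 0 0 0 0 1 1 1 1 2 2 2 2 2 3 3

chunks <- as.vector(tapply(v, id, paste0, collapse = ""))
chunks
# [1] "QW%E"  "RT%Z"  "U%I%O" "%P"
```

The names 0 to 3 in the original output are simply the values of the cumulative sum used as the grouping index.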

Answer 2

Score: 1

Much less succinct than the above, but a base R solution all the same:

# Function to only consider certain characters defined by a regex
# and split a string scalar into separate elements in a vector
split_string_to_vec <- function(s, n, consider_elements_pattern = "[[:alpha:]]"){
  # Ensure s is a character scalar:
  stopifnot(is.character(s) && length(s) == 1)
  # Ensure n is a numeric scalar:
  stopifnot(is.numeric(n) && length(n) == 1)
  # Split the string into separate elements:
  # str_vec => character vector
  str_vec <- unlist(strsplit(s, ""))
  # Assign an index to the string vector:
  # idx => named integer vector
  idx <- setNames(seq_len(length(str_vec)), str_vec)
  # Resolve which values are to be considered (alphabetic characters
  # by default): considered_vals => named integer vector
  considered_vals <- idx[grepl(consider_elements_pattern, names(idx))]
  # Split the considered indices into groups of n:
  # grpd_strings => list of named integer vectors
  grpd_strings <- split(
    considered_vals,
    ceiling(seq_along(considered_vals) / n)
  )
  # For each string group, resolve the group with the
  # appropriate characters in order: res_vec => character vector
  res_vec <- vapply(
    seq_along(grpd_strings),
    function(i){
      # Get current list element:
      curr <- grpd_strings[[i]]
      # If it's the first element:
      if(i == 1){
        # Ignore previous element, only focus on this
        # one: ir => named integer vector
        ir <- sort(c(curr, idx[min(curr):max(curr)]))
      # Otherwise:
      }else{
        # Resolve the previous element:
        prev <- grpd_strings[[(i-1)]]
        # ir => named integer vector
        ir <- sort(c(curr, idx[(max(prev)+1):max(curr)]))
      }
      # Flatten result into a unique (by idx) string:
      # character scalar => env
      paste0(
        names(
          subset(
            ir,
            !(duplicated(ir))
          )
        ),
        collapse = ""
      )
    },
    character(1)
  )
  # Explicitly define the returned object:
  # character vector => env
  return(res_vec)
}
# Input Data:
# s => string scalar
s <- "QW%ERT%ZU%I%O%P"
# n => integer scalar
n <- 3
# Apply the function: string scalar => stdout(console)
split_string_to_vec(s, n, consider_elements_pattern = "[[:alpha:]]")
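
For comparison, the grouping logic can be condensed considerably. The sketch below is not the answerer's code. It is a hypothetical helper, split_by_counted_chars(), that takes the same three arguments and assigns each character to a chunk based on how many counted characters come strictly before it. One difference in behavior, by design of this sketch: leading special characters are kept in the first chunk and trailing ones form a final chunk of their own, rather than being dropped.

```r
# Hypothetical condensed variant, same arguments as split_string_to_vec()
split_by_counted_chars <- function(s, n, consider_elements_pattern = "[[:alpha:]]") {
  stopifnot(is.character(s), length(s) == 1,
            is.numeric(n), length(n) == 1, n >= 1)
  v <- unlist(strsplit(s, ""))
  # TRUE for characters that count towards n
  counted <- grepl(consider_elements_pattern, v)
  # Number of counted characters strictly before each position
  before <- cumsum(counted) - counted
  # Every n counted characters open a new chunk
  id <- before %/% n
  unname(vapply(split(v, id), paste0, character(1), collapse = ""))
}

split_by_counted_chars("QW%ERT%ZU%I%O%P", 3)
# [1] "QW%E"  "RT%Z"  "U%I%O" "%P"
split_by_counted_chars("%AB%", 2)
# [1] "%AB" "%"
```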
