Split a string without considering special characters

Question

I need a way to split a string every n letters.

For example, with s = "QW%ERT%ZU%I%O%P" and n = 3, I want to obtain "QW%E" "RT%Z" "U%I%O" "%P".

As you can see, the special character "%" does not count toward the n characters of a chunk, but it is still kept in the output.

I tried with

```r
strsplit(s, "(?<=.{10})(?=.*\\%)", perl = TRUE)[[1]]
```

but I cannot find a way to obtain what I want.

Answer 1

Score: 5

What about regmatches (instead of strsplit), like below?

```r
> n <- 3
> regmatches(s, gregexpr(sprintf("(\\W?\\w){1,%i}", n), s))
[[1]]
[1] "QW%E" "RT%Z" "U%I%O" "%P"
```
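
The one-liner above relies on \W? matching at most one special character before each word character, so a run such as "%%", or a special character at the very end of the string, is silently dropped. A minimal sketch of a variant, assuming you want such characters kept: the helper name split_every_n and the modified pattern are our own, not the answerer's, and perl = TRUE is used so the greedy and optional-group behaviour follows the usual PCRE rules.

```r
# split_every_n() is a hypothetical helper name, not part of base R.
# Only word characters (\w: letters, digits, underscore) count towards n.
# \W* attaches any run of special characters to the word character after it,
# and (\W+$)? keeps a trailing run at the end of the string on the last chunk.
split_every_n <- function(s, n) {
  stopifnot(is.character(s), length(s) == 1, is.numeric(n), n >= 1)
  pattern <- sprintf("(\\W*\\w){1,%d}(\\W+$)?", as.integer(n))
  regmatches(s, gregexpr(pattern, s, perl = TRUE))[[1]]
}

split_every_n("QW%ERT%ZU%I%O%P", 3)  # returns c("QW%E", "RT%Z", "U%I%O", "%P")
split_every_n("%%AB%", 1)            # returns c("%%A", "B%")
```

For the input in the question this gives the same result as the original pattern; it only differs when special characters are doubled or trail the last letter.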

Or tapply + strsplit:

```r
v <- unlist(strsplit(s, ""))
l <- which(grepl("\\w", v))
tapply(
  v,
  cumsum(seq_along(v) %in% (1 + by(l, ceiling(seq_along(l) / n), max))),
  paste0,
  collapse = ""
)
```

which gives

```
0 1 2 3
"QW%E" "RT%Z" "U%I%O" "%P"
```
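
The tapply() version derives the group boundaries from the last word character of each group via by(). A shorter sketch of the same grouping, assuming the same rule (a special character belongs to the chunk of the next counted character), gives every position the rank of the counted character it belongs to with cumsum() and then divides by n. The function name split_every_n_vec and its pattern argument are hypothetical.

```r
# split_every_n_vec() is a hypothetical helper name, not part of base R.
split_every_n_vec <- function(s, n, pattern = "\\w") {
  chars <- strsplit(s, "")[[1]]
  is_counted <- grepl(pattern, chars)
  # Rank of the counted character each position belongs to: a counted
  # character gets its own rank, a special character the rank of the next one.
  rank <- cumsum(is_counted) + !is_counted
  # Ranks 1..n form chunk 1, ranks n+1..2n form chunk 2, and so on.
  group <- ceiling(rank / n)
  unname(vapply(split(chars, group), paste0, character(1), collapse = ""))
}

split_every_n_vec("QW%ERT%ZU%I%O%P", 3)  # returns c("QW%E", "RT%Z", "U%I%O", "%P")
```

Under this rule, special characters after the last counted character form a chunk of their own when the last group is already full.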

Answer 2

Score: 1

Much less succinct than the above, but a base R solution all the same:

```r
# Function to only consider certain characters (defined by a regex)
# and split a string scalar into separate elements in a vector
split_string_to_vec <- function(s, n, consider_elements_pattern = "[[:alpha:]]"){
  # Ensure s is a character scalar:
  stopifnot(is.character(s) && length(s) == 1)
  # Ensure n is a numeric scalar:
  stopifnot(is.numeric(n) && length(n) == 1)
  # Split the string into separate elements:
  # str_vec => character vector
  str_vec <- unlist(strsplit(s, ""))
  # Assign an index to the string vector:
  # idx => named integer vector
  idx <- setNames(seq_len(length(str_vec)), str_vec)
  # Resolve which values are to be considered (those matching the pattern):
  # considered_vals => named integer vector
  considered_vals <- idx[grepl(consider_elements_pattern, names(idx))]
  # Split the considered positions into groups of n:
  # grpd_strings => list of named integer vectors
  grpd_strings <- split(
    considered_vals,
    ceiling(seq_along(considered_vals) / n)
  )
  # For each group, resolve the chunk with the
  # appropriate characters in order: res_vec => character vector
  res_vec <- vapply(
    seq_along(grpd_strings),
    function(i){
      # Get current list element:
      curr <- grpd_strings[[i]]
      # If it's the first element:
      if(i == 1){
        # Start at the first character of the string so that leading
        # unconsidered characters are kept: ir => named integer vector
        ir <- sort(c(curr, idx[1:max(curr)]))
      # Otherwise:
      }else{
        # Resolve the previous element:
        prev <- grpd_strings[[(i-1)]]
        # Start right after the previous group's last considered character:
        # ir => named integer vector
        ir <- sort(c(curr, idx[(max(prev)+1):max(curr)]))
      }
      # Flatten result into a unique (by idx) string:
      # character scalar => one chunk
      paste0(
        names(
          subset(
            ir,
            !(duplicated(ir))
          )
        ),
        collapse = ""
      )
    },
    character(1)
  )
  # Explicitly define the returned object:
  # character vector => one element per chunk
  return(res_vec)
}
# Input Data:
# s => string scalar
s <- "QW%ERT%ZU%I%O%P"
# n => integer scalar
n <- 3
# Apply the function: string scalar => stdout (console)
split_string_to_vec(s, n, consider_elements_pattern = "[[:alpha:]]")
```
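
Each chunk produced by the function above is simply the substring that runs from just after the previous group's last counted character up to the current group's last counted character. A shorter sketch of the same positional idea, assuming that reading, cuts the string with substring() directly. The name split_by_positions is hypothetical; as in the function above, special characters after the last counted character are not included in any chunk.

```r
# split_by_positions() is a hypothetical helper name, not part of base R.
split_by_positions <- function(s, n, pattern = "[[:alpha:]]") {
  chars <- strsplit(s, "")[[1]]
  # Positions of the characters that count towards n.
  pos <- which(grepl(pattern, chars))
  if (length(pos) == 0) return(character(0))
  k <- seq_along(pos)
  # A chunk ends at every n-th counted character, and at the last one.
  ends <- pos[k %% n == 0 | k == length(pos)]
  # Each chunk starts right after the previous chunk's end.
  starts <- c(1L, head(ends, -1) + 1L)
  substring(s, starts, ends)
}

split_by_positions("QW%ERT%ZU%I%O%P", 3)  # returns c("QW%E", "RT%Z", "U%I%O", "%P")
```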

huangapple
  • Posted on June 19, 2023 18:03:40
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76505572.html