Using strsplit() to split up a numeric string and replace part with characters

Question

I have a column of strings that I want to split up so that the last two numbers are replaced with characters. For example, the string "1-1-2-2" would become "1-1-B-B". Below is a snippet of the data and my attempt so far, which hopefully makes this clearer.

    > df
    num
    1-1-26-2
    1-2-2-4
    1-2-4-5
    1-3-25-1

I have tried splitting the num column with strsplit(df$num, '-'), but I am unsure how to replace the last two numbers with characters using the replacement data frame below.

    > replacement_df
    character num
    A 1
    B 2
    D 4
    E 5
    Y 25
    Z 26

Answer 1

Score: 2

Something like this?

```r
# Replace the last n dash-separated numbers of a single string with letters.
replace_nums <- function(x, n = 2) {
  x_split <- unlist(strsplit(x, "-"))
  x_tail <- tail(x_split, n)
  paste(c(
    head(x_split, -n),
    LETTERS[as.integer(x_tail)]
  ), collapse = "-")
}

x <- c("1-1-2-2")
replace_nums(x)
# [1] "1-1-B-B"
```

Or for a vectorised version:

```r
# Apply the same replacement to every element of a character vector.
# vapply() returns a character vector, so the new column is a plain
# character column rather than a list column.
replace_nums_df <- function(x, n = 2) {
  vapply(strsplit(x, "-"), \(parts) {
    paste(c(
      head(parts, -n),
      LETTERS[as.integer(tail(parts, n))]
    ), collapse = "-")
  }, character(1))
}

df$replaced <- replace_nums_df(df$num)
df
#        num replaced
# 1 1-1-26-2  1-1-Z-B
# 2  1-2-2-4  1-2-B-D
# 3  1-2-4-5  1-2-D-E
# 4 1-3-25-1  1-3-Y-A
```
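One thing the functions above do not guard against is a number outside 1 to 26: `LETTERS[27]` is `NA`, which `paste()` turns into the literal text "NA", and `LETTERS[0]` is empty, which silently drops the piece. The following is only a sketch of one way to handle that, assuming it is acceptable to return `NA` for such strings; the name `replace_nums_checked` is made up for this illustration.

```r
# Hypothetical helper: like the vectorised version, but returns NA when any
# of the last n pieces is not an integer between 1 and 26.
replace_nums_checked <- function(x, n = 2) {
  vapply(strsplit(x, "-"), function(parts) {
    # Non-numeric pieces become NA here; the warning is suppressed on purpose.
    idx <- suppressWarnings(as.integer(tail(parts, n)))
    if (anyNA(idx) || any(idx < 1L | idx > 26L)) {
      return(NA_character_)
    }
    paste(c(head(parts, -n), LETTERS[idx]), collapse = "-")
  }, character(1))
}

replace_nums_checked(c("1-1-26-2", "1-1-27-2"))
# [1] "1-1-Z-B" NA
```

Returning `NA` keeps the result the same length as the input, so it can still be assigned to a data frame column and the problem rows filtered afterwards.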

Answer 2

Score: 2

1. stringr solution

Supply a custom function to str_replace_all() to transform the match of the last 2 numbers.

    library(dplyr)
    library(stringr)

    df %>%
      mutate(num_new = str_replace_all(num, "\\d+-\\d+$", \(x) {
        # x is the matched "a-b" tail; map each number to its letter
        str_c(LETTERS[as.integer(str_split_1(x, '-'))], collapse = '-')
      }))

2. tidyr solution

separate_wider_regex() + unite()

    library(dplyr)
    library(tidyr)

    df %>%
      separate_wider_regex(
        num,
        patterns = c(col1 = ".+", "-", col2 = "\\d+", "-", col3 = "\\d+"),
        cols_remove = FALSE
      ) %>%
      mutate(across(col2:col3, ~ LETTERS[as.integer(.x)])) %>%
      unite(num_new, col1:col3, sep = '-')

Output

    # # A tibble: 4 × 2
    #   num      num_new
    #   <chr>    <chr>
    # 1 1-1-26-2 1-1-Z-B
    # 2 1-2-2-4  1-2-B-D
    # 3 1-2-4-5  1-2-D-E
    # 4 1-3-25-1 1-3-Y-A

Both solutions also handle the general case, where the strings in the column do not all contain the same number of values:

    df <- data.frame(num = c("1-2-3", "1-2-3-4", "1-2-3-4-5"))

    #         num   num_new
    # 1     1-2-3     1-B-C
    # 2   1-2-3-4   1-2-C-D
    # 3 1-2-3-4-5 1-2-3-D-E
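For readers who prefer to avoid extra packages, the same "match the trailing pair, then transform it" idea from the stringr solution can be sketched in base R with `regexpr()` and the `regmatches<-` replacement function. This is an illustrative sketch rather than part of the answer; the function name `replace_last_two` is an assumption.

```r
# Hypothetical base R counterpart of the stringr approach.
replace_last_two <- function(x) {
  # Locate the trailing "number-number" in each string (-1 if there is none).
  m <- regexpr("[0-9]+-[0-9]+$", x)
  tails <- regmatches(x, m)
  # Convert each matched tail to letters and write it back in place.
  regmatches(x, m) <- vapply(strsplit(tails, "-"), function(parts) {
    paste(LETTERS[as.integer(parts)], collapse = "-")
  }, character(1))
  x
}

replace_last_two(c("1-2-3", "1-2-3-4", "1-2-3-4-5"))
# [1] "1-B-C"     "1-2-C-D"   "1-2-3-D-E"
```

Strings without a trailing pair, such as a lone "7", do not match the pattern and are returned unchanged.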

Answer 3

Score: 2

Alternatively, please try the code below, where I assume that replacement_df is equivalent to the built-in LETTERS constant.

Here I used the separate() and unite() functions.

    library(tidyverse)

    # find the largest number of pieces in any string
    len <- max(lengths(strsplit(df$num, '-')))
    # create the column names
    nam <- paste0('l', seq_len(len))
    # select the last 2 names
    nam2 <- nam[(len - 1):len]

    # fill = 'left' pads shorter strings with NA on the left, so the last
    # two columns always hold the last two numbers
    df %>%
      separate(num, into = nam, sep = '-', remove = FALSE, fill = 'left') %>%
      mutate(across(all_of(nam2), ~ LETTERS[as.numeric(.x)])) %>%
      unite(num_new, all_of(nam), sep = '-', na.rm = TRUE)

Created on 2023-07-10 with reprex v2.0.2

            num   num_new
    1     1-2-3     1-B-C
    2   1-2-3-4   1-2-C-D
    3 1-2-3-4-5 1-2-3-D-E
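The answers on this page all rely on LETTERS, which happens to agree with every row of the replacement_df shown in the question. If the mapping were arbitrary, a lookup with `match()` against that table would be one option. The sketch below rebuilds both data frames from the question; the function name `replace_from_table` and its `lookup` argument are assumptions made for illustration.

```r
# Data as shown in the question.
df <- data.frame(
  num = c("1-1-26-2", "1-2-2-4", "1-2-4-5", "1-3-25-1"),
  stringsAsFactors = FALSE
)
replacement_df <- data.frame(
  character = c("A", "B", "D", "E", "Y", "Z"),
  num = c(1, 2, 4, 5, 25, 26),
  stringsAsFactors = FALSE
)

# Hypothetical helper: replace the last n pieces using a lookup table
# with columns `num` (key) and `character` (replacement).
replace_from_table <- function(x, lookup, n = 2) {
  vapply(strsplit(x, "-"), function(parts) {
    last <- as.numeric(tail(parts, n))
    # match() gives the row of each key in the table, or NA if it is absent.
    mapped <- lookup$character[match(last, lookup$num)]
    paste(c(head(parts, -n), mapped), collapse = "-")
  }, character(1))
}

df$num_new <- replace_from_table(df$num, replacement_df)
df
#        num num_new
# 1 1-1-26-2 1-1-Z-B
# 2  1-2-2-4 1-2-B-D
# 3  1-2-4-5 1-2-D-E
# 4 1-3-25-1 1-3-Y-A
```

A number that is missing from the table maps to `NA`, which `paste()` writes as the text "NA", so it may be worth checking for unmatched keys before relying on the result.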

huangapple
  • Published on 2023-07-10 22:01:43
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76654519.html