Using strsplit() to split up a numeric string and replace part of it with characters


Question

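I have a string that I want to split up, replacing the last two numbers with characters. For example, the string "1-1-2-2" should become "1-1-B-B". Below is a snippet of what I'm trying to do and my attempt so far; hopefully it makes the problem clearer.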

```
> df
num
1-1-26-2
1-2-2-4
1-2-4-5
1-3-25-1
```

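I have tried splitting the `num` column with `strsplit(num, '-')`, but I'm not sure how to replace the last two numbers with the characters from `replacement_df` below.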

```
> replacement_df
character    num
A            1
B            2
D            4
E            5
Y            25
Z            26
```
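The tables above are console printouts. A minimal sketch that rebuilds them as R data frames (assuming `num` in `replacement_df` is numeric) and shows what `strsplit()` actually returns, which is the structure every answer below works with:

```r
# Rebuild the example data from the question (column types are assumed)
df <- data.frame(
  num = c("1-1-26-2", "1-2-2-4", "1-2-4-5", "1-3-25-1"),
  stringsAsFactors = FALSE
)

replacement_df <- data.frame(
  character = c("A", "B", "D", "E", "Y", "Z"),
  num       = c(1, 2, 4, 5, 25, 26),
  stringsAsFactors = FALSE
)

# strsplit() returns a list: one character vector per input string
parts <- strsplit(df$num, "-")
parts[[1]]     # c("1", "1", "26", "2")

# The last two pieces of each string, still stored as character
last_two <- lapply(parts, tail, 2)
last_two[[4]]  # c("25", "1")
```

Because the result is a list, the pieces have to be converted and pasted back per element (with `lapply()`, `Map()`, `vapply()` and so on), which is what the answers do.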

Answer 1

Score: 2

Something like this?

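```r
# Works on a single string: split on "-", keep all but the last n pieces,
# and turn the last n pieces into letters (assumes 1 = A, 2 = B, ...)
replace_nums <- function(x, n = 2) {
    x_split <- unlist(strsplit(x, "-"))

    x_tail <- tail(x_split, n)

    paste(c(
        head(x_split, -n),
        LETTERS[as.integer(x_tail)]
    ), collapse = "-")
}

x <- c("1-1-2-2")
replace_nums(x)
# [1] "1-1-B-B"
```

Or, for a vectorised version that works on a whole column (the `\(x)` lambda syntax needs R 4.1 or later):

```r
replace_nums_df <- function(x, n = 2) {
    x_split <- strsplit(x, "-")

    x_tail <- lapply(x_split, \(s) tail(s, n))

    # Map() returns a list; unlist() turns it into a plain character
    # vector so that df$replaced is not a list column
    unlist(Map(\(split_str, tail_str) {
        paste(c(
            head(split_str, -n),
            LETTERS[as.integer(tail_str)]
        ), collapse = "-")
    }, x_split, x_tail))
}

df$replaced <- replace_nums_df(df$num)
df
#        num replaced
# 1 1-1-26-2  1-1-Z-B
# 2  1-2-2-4  1-2-B-D
# 3  1-2-4-5  1-2-D-E
# 4 1-3-25-1  1-3-Y-A
```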
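Answer 1 indexes `LETTERS` directly, which only works because every row of `replacement_df` maps n to the nth letter. If the real lookup table is arbitrary, a sketch that consults `replacement_df` through `match()` instead could look like this; `replace_with_lookup()` is a hypothetical helper, and the `character`/`num` column names are taken from the question.

```r
# Hypothetical helper: replace the last n pieces of each string using a
# lookup table with columns `character` (replacement) and `num` (code)
replace_with_lookup <- function(x, lookup, n = 2) {
  vapply(strsplit(x, "-"), function(parts) {
    k <- length(parts)
    idx <- seq.int(max(k - n + 1, 1), k)  # positions of the last n pieces
    # match() gives NA for codes missing from the table; paste() writes "NA"
    parts[idx] <- lookup$character[match(as.integer(parts[idx]), lookup$num)]
    paste(parts, collapse = "-")
  }, character(1))
}

replacement_df <- data.frame(
  character = c("A", "B", "D", "E", "Y", "Z"),
  num       = c(1, 2, 4, 5, 25, 26),
  stringsAsFactors = FALSE
)

replace_with_lookup(c("1-1-26-2", "1-2-2-4", "1-2-4-5", "1-3-25-1"), replacement_df)
# returns c("1-1-Z-B", "1-2-B-D", "1-2-D-E", "1-3-Y-A")
```

Because `match()` looks each code up by value, the table does not have to follow the alphabet; a code that is missing from it comes out as `"NA"` instead of raising an error.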


Answer 2

Score: 2

1. stringr solution

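Supply a custom function to `str_replace_all()`: the pattern `\\d+-\\d+$` matches the last two numbers, and the function converts only that match to letters.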

```r
library(dplyr)
library(stringr)

df %>%
  mutate(num_new = str_replace_all(num, "\\d+-\\d+$", \(x) {
    str_c(LETTERS[as.integer(str_split_1(x, '-'))], collapse = '-')
  }))
```
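If you would rather not depend on stringr, the same idea (match the last two numbers, then rewrite only that match) can be sketched in base R with `regexpr()` and the replacement form `regmatches<-`. This is a substitute technique rather than the answer's code, and it still assumes the codes map onto `LETTERS`.

```r
x <- c("1-1-26-2", "1-2-2-4", "1-2-3")

# Locate the last two dash-separated numbers in each string
m <- regexpr("[0-9]+-[0-9]+$", x)
regmatches(x, m)  # c("26-2", "2-4", "2-3")

# Convert each matched pair to letters and write it back in place
regmatches(x, m) <- vapply(
  strsplit(regmatches(x, m), "-"),
  function(p) paste(LETTERS[as.integer(p)], collapse = "-"),
  character(1)
)
x                 # c("1-1-Z-B", "1-2-B-D", "1-B-C")
```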

2. tidyr solution

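`separate_wider_regex()` + `unite()`: split each string into everything before the last two numbers (`col1`) and the last two numbers (`col2`, `col3`), convert those two columns to letters, then glue the pieces back together.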

```r
library(dplyr)
library(tidyr)

df %>%
  separate_wider_regex(
    num,
    patterns = c(col1 = ".+", "-", col2 = "\\d+", "-", col3 = "\\d+"),
    cols_remove = FALSE
  ) %>%
  mutate(across(col2:col3, ~ LETTERS[as.integer(.x)])) %>%
  unite(num_new, col1:col3, sep = '-')
```

Output:

```
# # A tibble: 4 × 2
#   num      num_new
#   <chr>    <chr>  
# 1 1-1-26-2 1-1-Z-B
# 2 1-2-2-4  1-2-B-D
# 3 1-2-4-5  1-2-D-E
# 4 1-3-25-1 1-3-Y-A
```
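The `patterns` argument behaves like one regular expression built from the pieces in order: the named pieces are captured and the unnamed `"-"` separators are matched but dropped. The greedy `.+` first grabs the whole string and then backtracks just far enough for the last two numbers to match. A base-R sketch of that same pattern with `regexec()`, meant to illustrate the regex rather than how tidyr is implemented:

```r
x <- c("1-1-26-2", "1-3-25-1", "1-2-3-4-5")

# col1 = ".+", "-", col2 = "\\d+", "-", col3 = "\\d+" written as one anchored regex
pattern <- "^(.+)-([0-9]+)-([0-9]+)$"
groups <- regmatches(x, regexec(pattern, x, perl = TRUE))
groups[[3]]  # c("1-2-3-4-5", "1-2-3", "4", "5"): full match, then col1..col3

# Rebuild each string, converting only col2 and col3 to letters
num_new <- vapply(groups, function(g) {
  paste(c(g[2], LETTERS[as.integer(g[3:4])]), collapse = "-")
}, character(1))
num_new      # c("1-1-Z-B", "1-3-Y-A", "1-2-3-D-E")
```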

For the general case, where the strings in the column do not all contain the same number of numbers:

```r
df <- data.frame(num = c("1-2-3", "1-2-3-4", "1-2-3-4-5"))
```

Both solutions above handle this case:

```
#         num   num_new
# 1     1-2-3     1-B-C
# 2   1-2-3-4   1-2-C-D
# 3 1-2-3-4-5 1-2-3-D-E
```
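Both solutions cope with varying lengths, but every answer relies on `LETTERS[as.integer(...)]`, which silently returns `NA` for a code outside 1 to 26 (for example `LETTERS[27]`). A small, hypothetical helper `to_letters()` that fails loudly instead could be swapped in for that expression:

```r
# Hypothetical helper: map numeric codes (stored as character) to letters,
# stopping instead of returning NA for anything outside 1..26
to_letters <- function(codes) {
  i <- suppressWarnings(as.integer(codes))  # non-numbers become NA
  if (anyNA(i) || any(i < 1L | i > 26L)) {
    stop("codes must be numbers between 1 and 26")
  }
  LETTERS[i]
}

to_letters(c("26", "2"))  # c("Z", "B")
LETTERS[27]               # NA, which is why the check is useful
```

For example, inside the tidyr solution `LETTERS[as.integer(.x)]` could become `to_letters(.x)`.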

Answer 3

Score: 2

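Alternatively, try the code below, which assumes that `replacement_df` is the same as `LETTERS` (1 = A, 2 = B, and so on).

Here I use the `separate()` and `unite()` functions.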

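```r
library(tidyverse)

# Identify the maximum number of pieces in any string
len <- max(lengths(strsplit(df$num, '-')))

# Create one column name per piece: l1, l2, ...
nam <- paste0('l', seq_len(len))

# Select the last 2 names
nam2 <- nam[(len-1):len]

# fill = 'left' pads shorter strings with NA on the left, so the last two
# columns always hold the last two numbers; na.rm = TRUE drops that padding
df %>% separate(num, into = nam, sep = '-', remove = FALSE, fill = 'left') %>%
  mutate(across(all_of(nam2), ~LETTERS[as.numeric(.x)])) %>%
  unite(num_new, all_of(nam), sep = '-', na.rm = TRUE)
```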

Created on 2023-07-10 with reprex v2.0.2

Output for the three-row `df` from the general case above:

```
        num   num_new
1     1-2-3     1-B-C
2   1-2-3-4   1-2-C-D
3 1-2-3-4-5 1-2-3-D-E
```
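To see what `separate(..., fill = 'left')` and `unite(..., na.rm = TRUE)` are doing here, the sketch below reproduces the same steps in base R on the three-row `df`: shorter strings are padded with `NA` on the left so that the last two columns always hold the last two numbers, and the `NA`s are skipped when the pieces are pasted back together. It illustrates the idea only; it is not tidyr's implementation.

```r
df <- data.frame(num = c("1-2-3", "1-2-3-4", "1-2-3-4-5"), stringsAsFactors = FALSE)

parts <- strsplit(df$num, "-")
len   <- max(lengths(parts))        # 5
nam   <- paste0("l", seq_len(len))  # "l1" ... "l5"
nam2  <- nam[(len - 1):len]         # c("l4", "l5")

# Pad each string's pieces with NA on the left (like fill = "left")
padded <- t(vapply(parts, function(p) {
  c(rep(NA_character_, len - length(p)), p)
}, character(len)))
colnames(padded) <- nam

# Convert only the last two columns to letters
padded[, nam2] <- LETTERS[as.integer(padded[, nam2])]

# Paste each row back together, skipping the NA padding (like na.rm = TRUE)
df$num_new <- apply(padded, 1, function(r) paste(r[!is.na(r)], collapse = "-"))
df$num_new  # c("1-B-C", "1-2-C-D", "1-2-3-D-E")
```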



huangapple
  • Published on 2023-07-10 22:01:43
  • Please keep this link when reposting: https://go.coder-hub.com/76654519.html