Using strsplit() to split up a numeric string and replace part with characters
Question
I have a string that I want to split so that I can replace the last two numbers with characters. For example, the string "1-1-2-2" would become "1-1-B-B". I have included a snippet of the data and my attempt so far, which hopefully makes this a bit clearer.
```
> df
num
1-1-26-2
1-2-2-4
1-2-4-5
1-3-25-1
```
So far I have tried splitting the `num` column with `strsplit(num, '-')`, but I am unsure how to replace the last two numbers with characters using the replacement data frame below.
```
> replacement_df
character num
A         1
B         2
D         4
E         5
Y         25
Z         26
```
Answer 1
Score: 2
Something like this?

```r
replace_nums <- function(x, n = 2) {
  x_split <- unlist(strsplit(x, "-"))
  x_tail <- tail(x_split, n)
  # LETTERS[i] maps 1 -> "A", ..., 26 -> "Z"
  paste(c(
    head(x_split, -n),
    LETTERS[as.integer(x_tail)]
  ), collapse = "-")
}

x <- c("1-1-2-2")
replace_nums(x)
# [1] "1-1-B-B"
```

Or for a vectorised version:

```r
replace_nums_df <- function(x, n = 2) {
  x_split <- strsplit(x, "-")
  x_tail <- lapply(x_split, \(x) tail(x, n))
  # unlist() turns the list returned by Map() into a character vector,
  # so the new column is a plain character column rather than a list column
  unlist(Map(\(split_str, tail_str) {
    paste(c(
      head(split_str, -n),
      LETTERS[as.integer(tail_str)]
    ), collapse = "-")
  }, x_split, x_tail))
}

df$replaced <- replace_nums_df(df$num)
df
#        num replaced
# 1 1-1-26-2  1-1-Z-B
# 2  1-2-2-4  1-2-B-D
# 3  1-2-4-5  1-2-D-E
# 4 1-3-25-1  1-3-Y-A
```
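Indexing `LETTERS` works when every number from 1 to 26 should map to its letter. If the mapping should instead come from the `replacement_df` shown in the question, a minimal sketch could look like the following. The helper name `replace_nums_lookup` and its `lookup` argument are made up for illustration and are not part of the answer above; the sketch assumes the lookup table has the `character` and `num` columns from the question.

```r
# Lookup table as shown in the question
replacement_df <- data.frame(
  character = c("A", "B", "D", "E", "Y", "Z"),
  num = c(1, 2, 4, 5, 25, 26),
  stringsAsFactors = FALSE
)

# Hypothetical helper: replaces the last n fields of each string
# using a lookup table instead of LETTERS
replace_nums_lookup <- function(x, lookup, n = 2) {
  vapply(strsplit(x, "-", fixed = TRUE), function(parts) {
    # TRUE for the last n fields of this string
    is_tail <- seq_along(parts) > length(parts) - n
    # match() gives NA for numbers that are missing from the lookup table
    hit <- match(as.integer(parts[is_tail]), lookup$num)
    parts[is_tail] <- lookup$character[hit]
    paste(parts, collapse = "-")
  }, character(1))
}

df <- data.frame(num = c("1-1-26-2", "1-2-2-4", "1-2-4-5", "1-3-25-1"),
                 stringsAsFactors = FALSE)
df$replaced <- replace_nums_lookup(df$num, replacement_df)
df
```

One difference from the `LETTERS` version: a number that has no row in the lookup table (for example 3 here) ends up as the text "NA" in the result, so it may be worth checking the table covers every value first.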
Answer 2
Score: 2
1. `stringr` solution

Pass a custom function to `str_replace_all()` to replace the match of the last 2 numbers.

```r
library(dplyr)
library(stringr)

df %>%
  mutate(num_new = str_replace_all(num, "\\d+-\\d+$", \(x) {
    str_c(LETTERS[as.integer(str_split_1(x, '-'))], collapse = '-')
  }))
```

2. `tidyr` solution

`separate_wider_regex()` + `unite()`

```r
library(dplyr)
library(tidyr)

df %>%
  separate_wider_regex(
    num,
    patterns = c(col1 = ".+", "-", col2 = "\\d+", "-", col3 = "\\d+"),
    cols_remove = FALSE
  ) %>%
  mutate(across(col2:col3, ~ LETTERS[as.integer(.x)])) %>%
  unite(num_new, col1:col3, sep = "-")
```

Output

```r
# # A tibble: 4 × 2
#   num      num_new
#   <chr>    <chr>
# 1 1-1-26-2 1-1-Z-B
# 2 1-2-2-4  1-2-B-D
# 3 1-2-4-5  1-2-D-E
# 4 1-3-25-1 1-3-Y-A
```

Now the more general case, where the strings in the column do not all contain the same number of fields:

```r
df <- data.frame(num = c("1-2-3", "1-2-3-4", "1-2-3-4-5"))
```

Both solutions above can deal with this:

```r
#         num   num_new
# 1     1-2-3     1-B-C
# 2   1-2-3-4   1-2-C-D
# 3 1-2-3-4-5 1-2-3-D-E
```
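To see why the pattern `\\d+-\\d+$` handles strings of different lengths, here is a small base R sketch. It is not part of the answer above, just an illustration using `regexpr()` and `regmatches()`, and it makes the same assumption that the numbers lie between 1 and 26.

```r
x <- c("1-2-3", "1-2-3-4", "1-2-3-4-5")

# regexpr() returns one match per string; the trailing `$` forces that
# match to be the last two hyphen-separated numbers, whatever comes before
m <- regexpr("\\d+-\\d+$", x)
matched <- regmatches(x, m)

# Convert each matched pair to letters, e.g. "3-4" -> "C-D"
as_letters <- vapply(strsplit(matched, "-", fixed = TRUE), function(p) {
  paste(LETTERS[as.integer(p)], collapse = "-")
}, character(1))

# Write the converted pairs back into a copy of the original strings
x_new <- x
regmatches(x_new, m) <- as_letters
x_new
```

Because the match is anchored at the end of the string, the number of leading fields does not matter, so `x_new` holds the same values as the `num_new` column shown above.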
Answer 3
Score: 2
Alternatively, please try the code below, where I assume that `replacement_df` is the same as `LETTERS`. Here I used the `separate` and `unite` functions.

```r
library(tidyverse)

# identify the maximum number of fields in the strings
len <- max(lengths(strsplit(df$num, '-')))

# create the variable names
nam <- paste0('l', seq_len(len))

# select the last 2 names
nam2 <- nam[(len - 1):len]

df %>%
  separate(num, into = nam, sep = '-', remove = FALSE, fill = 'left') %>%
  mutate(across(all_of(nam2), ~ LETTERS[as.numeric(.x)])) %>%
  unite(num_new, all_of(nam), sep = '-', na.rm = TRUE)
```

<sup>Created on 2023-07-10 with reprex v2.0.2</sup>

```
        num   num_new
1     1-2-3     1-B-C
2   1-2-3-4   1-2-C-D
3 1-2-3-4-5 1-2-3-D-E
```