How to rename identical values in a column within R?
Question
Say I have a data set:
```r
a <- c(101,101,102,102,103,103)
b <- c("M","M","P","P","M","M")
dt <- as.data.frame(cbind(a,b))
dt
#     a b
# 1 101 M
# 2 101 M
# 3 102 P
# 4 102 P
# 5 103 M
# 6 103 M
```
Column `a` is subject_ID, and column `b` is subject_name. I want to uniquely rename subject ID 101 to M1, and 103 to M2.
Is there a way to do this by indexing?
This does not work:
```r
dt.try1 <- gsub("M","M1",dt[1:2,c(2)])
dt.try1
# [1] "M1" "M1"
```
This would be the ideal result:
```
    a  b
1 101  M
2 101  M
3 102  P
4 102  P
5 103 M2
6 103 M2
```
Why doesn't this work?
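A minimal sketch of the indexing approach the question asks about, assuming the IDs to relabel (101 and 103) are known in advance. `gsub()` returns a modified copy of the two values it is given and never writes back into `dt`, so the new labels have to be assigned into the matching subset of `dt$b`. The data is rebuilt with `data.frame()` rather than `cbind()` here so that `a` stays numeric.

```r
# Rebuild the example data with data.frame() so `a` stays numeric
a <- c(101, 101, 102, 102, 103, 103)
b <- c("M", "M", "P", "P", "M", "M")
dt <- data.frame(a, b)

# gsub() only returns a modified copy; dt itself is left untouched
dt.try1 <- gsub("M", "M1", dt[1:2, 2])
print(dt$b)

# Assigning into a logical index writes the new labels back into dt
dt$b[dt$a == 101] <- "M1"
dt$b[dt$a == 103] <- "M2"
print(dt)
```

This hard-codes the IDs; the answers below derive the numbering automatically instead.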
Answer 1
Score: 5
Sample data.
```r
a <- c(101,101,102,102,103,103)
b <- c("M","M","P","P","M","M")
dt <- data.frame(a, b)
```
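FYI, never use `data.frame(cbind(..))` to create a frame: in this case, since at least one of the vectors is character, they all become character, because `cbind` creates a matrix by default (a matrix is limited to one class, unlike a frame). It's always better here to use `data.frame(..)` directly.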
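A quick check of the coercion described above, assuming R 4.0 or later (where `data.frame()` no longer converts strings to factors by default):

```r
a <- c(101, 101, 102, 102, 103, 103)
b <- c("M", "M", "P", "P", "M", "M")

# cbind() builds a single matrix, so the numeric IDs are coerced to character
m <- cbind(a, b)
print(is.matrix(m))
print(typeof(m))
print(sapply(as.data.frame(m), class))   # both columns come out as character

# data.frame() keeps each column's own type
print(sapply(data.frame(a, b), class))   # a stays numeric, b is character
```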
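Note: for clarity, your "ideal output" shows `M,M,P,P,M2,M2`, but your previous code block tries to change the first two to `M1`. I'm basing my code on the assumption that you need the first two to be `M1` instead of just `M`. (For the output exactly as shown, akrun's answer is correct, though this methodology could be adjusted.)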
dplyr
```r
library(dplyr)
dt %>%
  distinct(a, b) %>%
  group_by(b) %>%
  mutate(b = if (n() > 1) paste0(b, row_number()) else b) %>%
  left_join(dt, ., by = "a", suffix = c(".x", "")) %>%
  select(-b.x)
#     a  b
# 1 101 M1
# 2 101 M1
# 3 102  P
# 4 102  P
# 5 103 M2
# 6 103 M2
```
base R
```r
dt2 <- unique(dt[, c("a", "b")])
dt2$b <- ave(dt2$b, dt2$b, FUN = function(z) if (length(z) > 1) paste0(z, seq_along(z)) else z)
dt2
#     a  b
# 1 101 M1
# 3 102  P
# 5 103 M2
merge(subset(dt, select = -b), dt2, by = "a")
#     a  b
# 1 101 M1
# 2 101 M1
# 3 102  P
# 4 102  P
# 5 103 M2
# 6 103 M2
```
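The base R version relies on `ave()` applying `FUN` separately within each group and putting the results back in their original positions. A small standalone check on the de-duplicated names (the vector `subj` is a hypothetical stand-in for `dt2$b`):

```r
# Names after unique(): one row per subject
subj <- c("M", "P", "M")

# ave() splits by the grouping vector, applies FUN per group, and reassembles
# the results in the original order
numbered <- ave(subj, subj, FUN = function(z) {
  if (length(z) > 1) paste0(z, seq_along(z)) else z
})
print(numbered)  # "M1" "P" "M2"
```

Names that occur only once (here `P`) are returned unchanged, which is why `P` gets no suffix.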
<details>
Answer 2
Score: 3
Another option:
```r
within(dt, b[b == 'M'] <- paste0('M', with(rle(a[b == 'M']), rep(seq_along(lengths), lengths))))
```
Output:
```
    a  b
1 101 M1
2 101 M1
3 102  P
4 102  P
5 103 M2
6 103 M2
```
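The numbering in this answer comes from `rle()`: it collapses consecutive repeats of the IDs that carry the name `M` into runs, and `rep(seq_along(lengths), lengths)` gives every row of the first run a 1, every row of the second run a 2, and so on. Breaking that step out (this assumes rows with the same ID are adjacent, as in the sample data):

```r
a <- c(101, 101, 102, 102, 103, 103)
b <- c("M", "M", "P", "P", "M", "M")

ids_m <- a[b == "M"]                                   # 101 101 103 103
runs <- rle(ids_m)                                     # lengths 2 2, values 101 103
run_no <- rep(seq_along(runs$lengths), runs$lengths)   # run index repeated per row
print(run_no)                                          # 1 1 2 2
print(paste0("M", run_no))                             # "M1" "M1" "M2" "M2"
```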
Answer 3
Score: 3
Using `rle`/`inverse.rle` from base R:
```r
dt$b <- inverse.rle(within.list(rle(dt$b), values <- make.unique(values, sep = "")))
```
Output:
```
> dt
    a  b
1 101  M
2 101  M
3 102  P
4 102  P
5 103 M1
6 103 M1
```
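This gives `M1` rather than `M2` for subject 103 because `make.unique()` leaves the first occurrence unchanged and only suffixes later duplicates; here it runs on the run values `M`, `P`, `M`. Like any `rle()`-based approach, it also assumes each subject's rows are contiguous. A small check of the intermediate steps:

```r
runs <- rle(c("M", "M", "P", "P", "M", "M"))
print(runs$values)  # "M" "P" "M"

# make.unique() keeps the first "M" and numbers later duplicates starting at 1
vals <- make.unique(runs$values, sep = "")
print(vals)  # "M" "P" "M1"

# inverse.rle() expands the relabelled runs back to one value per row
runs$values <- vals
print(inverse.rle(runs))
```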
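Or using `rle` in the tidyverse:
```r
library(dplyr)
library(stringr)
dt %>%
  mutate(b = inverse.rle(within.list(rle(b),
    # bump the numeric suffix added by make.unique() by 1 (M1 -> M2);
    # as.character() keeps the replacement a string
    values <- str_replace_all(make.unique(values, sep = ""),
      "(\\d+)", function(x) as.character(as.numeric(x) + 1)))))
```
Output:
```
    a  b
1 101  M
2 101  M
3 102  P
4 102  P
5 103 M2
6 103 M2
```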
Answer 4
Score: 0
You can treat `a` as a factor and then check, within each name in `b`, whether it has more than one level:
```r
library(dplyr)
df |>
  group_by(b) |>
  mutate(n_lvl = length(levels(factor(a))),
         b = paste0(b, ifelse(n_lvl > 1, as.integer(factor(a)), ""))) |>
  select(-n_lvl)
```
```
# A tibble: 6 × 2
# Groups:   b [3]
      a b
  <dbl> <chr>
1   101 M1
2   101 M1
3   102 P
4   102 P
5   103 M2
6   103 M2
```
Data from @r2evans:
```r
a <- c(101,101,102,102,103,103)
b <- c("M","M","P","P","M","M")
df <- tibble(a, b)
```
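The suffix in this answer is the integer code of `factor(a)` within each name group. Factor levels are sorted, so the numbering follows sorted ID order rather than the order in which the IDs first appear. A small illustration with hypothetical IDs listed out of order:

```r
ids <- c(103, 103, 101, 101)   # hypothetical IDs, deliberately out of order
f <- factor(ids)
print(levels(f))               # "101" "103": levels are sorted
print(length(levels(f)) > 1)   # TRUE, so a suffix would be added
print(as.integer(f))           # 2 2 1 1: codes follow level order, not appearance
```

If numbering by first appearance matters, one option is `factor(a, levels = unique(a))`.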
</details>