How to rename identical values in a column within R?

Question

Say I have a data set:

```r
a <- c(101,101,102,102,103,103)
b <- c("M","M","P","P","M","M")
dt <- as.data.frame(cbind(a,b))
dt
#     a b
# 1 101 M
# 2 101 M
# 3 102 P
# 4 102 P
# 5 103 M
# 6 103 M
```

Column a is subject_ID, and column b is subject_name. I want to uniquely rename subject ID 101 to M1, and 103 to M2.

Is there a way to do this by indexing?

This does not work.

```r
dt.try1 <- gsub("M","M1",dt[1:2,c(2)])
dt.try1
# [1] "M1" "M1"
```

This would be the ideal result:

```
    a  b
1 101  M
2 101  M
3 102  P
4 102  P
5 103 M2
6 103 M2
```

Why doesn't this work?
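
The attempt above does compute "M1", but gsub() only returns a new character vector; nothing is written back into dt. It also targets rows 1:2, which belong to ID 101 rather than 103. A minimal sketch of the indexing approach, assuming the IDs to rename are hard-coded as in the question and that b is a character column (the default for data.frame() in R 4.0 and later):

```r
# Same data as the question, built with data.frame() so that a stays numeric
dt <- data.frame(a = c(101, 101, 102, 102, 103, 103),
                 b = c("M", "M", "P", "P", "M", "M"))

# gsub() returns a modified copy of its input; dt itself is left unchanged
gsub("M", "M1", dt[1:2, 2])

# Select rows by ID (not by position) and assign the new names back into dt
dt$b[dt$a == 101] <- "M1"
dt$b[dt$a == 103] <- "M2"
dt
```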


Answer 1

Score: 5

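Sample data.

```r
a <- c(101,101,102,102,103,103)
b <- c("M","M","P","P","M","M")
dt <- data.frame(a, b)
```

FYI, never use data.frame(cbind(..)) to create a frame: in this case, since at least one of the vectors is character, they will all become character, because cbind by default creates a matrix (which is limited to one class, unlike a frame). It's always better here to use data.frame(..) directly.

Note: for clarity, your "ideal output" shows M,M,P,P,M2,M2, but your previous code block tries to change the first two to M1. I'm basing my code on the assumption that you need the first two to be M1 instead of just M. (For the literal M,M,P,P,M2,M2 output, akrun's answer is correct, though this methodology could be adjusted.)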
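
To see the coercion described above, here is a small check, assuming R 4.0 or later (where as.data.frame() and data.frame() no longer convert strings to factors by default):

```r
a <- c(101, 101, 102, 102, 103, 103)
b <- c("M", "M", "P", "P", "M", "M")

m <- cbind(a, b)           # cbind() builds a matrix, which holds a single type
is.matrix(m)               # TRUE
typeof(m)                  # "character": the numeric IDs were coerced to text

dt_bad  <- as.data.frame(m)
dt_good <- data.frame(a, b)
sapply(dt_bad, class)      # both columns are "character"
sapply(dt_good, class)     # a is "numeric", b is "character"
```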

dplyr

```r
library(dplyr)
dt %>%
  distinct(a, b) %>%
  group_by(b) %>%
  mutate(b = if (n() > 1) paste0(b, row_number()) else b) %>%
  left_join(dt, ., by = "a", suffix = c(".x", "")) %>%
  select(-b.x)
#     a  b
# 1 101 M1
# 2 101 M1
# 3 102  P
# 4 102  P
# 5 103 M2
# 6 103 M2
```
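
A shorter variant of the same idea skips the distinct()/left_join() round trip by counting distinct IDs inside each name. This is a sketch, assuming dplyr 1.0 or later (which allows mutate() to modify a grouping variable, as the code above also does). Note that dense_rank() numbers the IDs in ascending order of a rather than by order of first appearance; the two coincide for this data.

```r
library(dplyr)

dt <- data.frame(a = c(101, 101, 102, 102, 103, 103),
                 b = c("M", "M", "P", "P", "M", "M"))

res <- dt %>%
  group_by(b) %>%
  # Only names shared by several IDs get a suffix; dense_rank() maps the
  # distinct IDs within the name to 1, 2, ... in ascending order of a
  mutate(b = if (n_distinct(a) > 1) paste0(b, dense_rank(a)) else b) %>%
  ungroup()
res
```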

base R

```r
dt2 <- unique(dt[, c("a", "b")])
dt2$b <- ave(dt2$b, dt2$b, FUN = function(z) if (length(z) > 1) paste0(z, seq_along(z)) else z)
dt2
#     a  b
# 1 101 M1
# 3 102  P
# 5 103 M2
merge(subset(dt, select = -b), dt2, by = "a")
#     a  b
# 1 101 M1
# 2 101 M1
# 3 102  P
# 4 102  P
# 5 103 M2
# 6 103 M2
```
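
merge() sorts its result by the key column, so if the original row order matters, the renamed lookup table can be applied with match() instead. A minimal sketch, assuming each ID in a maps to exactly one name in b (as in the sample data); the helper column new_b is only an illustrative name:

```r
dt <- data.frame(a = c(101, 101, 102, 102, 103, 103),
                 b = c("M", "M", "P", "P", "M", "M"))

# One row per (ID, name) pair, in order of first appearance
ids <- unique(dt[, c("a", "b")])
# Within each name, append 1, 2, ... only when several IDs share that name
ids$new_b <- ave(ids$b, ids$b,
                 FUN = function(z) if (length(z) > 1) paste0(z, seq_along(z)) else z)
# Look up each row's new name by its ID; dt keeps its original row order
dt$b <- ids$new_b[match(dt$a, ids$a)]
dt
```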

Answer 2

Score: 3

Another option:

```r
within(dt, b[b == 'M'] <- paste0('M', with(rle(a[b == 'M']), rep(seq_along(lengths), lengths))))
```

Output:

```
    a  b
1 101 M1
2 101 M1
3 102  P
4 102  P
5 103 M2
6 103 M2
```

Answer 3

Score: 3

Using rle/inverse.rle from base R:

```r
dt$b <- inverse.rle(within.list(rle(dt$b), values <- make.unique(values, sep = "")))
```

Output:

```
> dt
    a  b
1 101  M
2 101  M
3 102  P
4 102  P
5 103 M1
6 103 M1
```

Or using rle in the tidyverse:

```r
library(dplyr)
library(stringr)
dt %>%
  mutate(b = inverse.rle(within.list(rle(b),
    values <- str_replace_all(make.unique(values, sep = ""),
                              "(\\d+)", function(x) as.numeric(x) + 1))))
```

Output:

```
    a  b
1 101  M
2 101  M
3 102  P
4 102  P
5 103 M2
6 103 M2
```

Answer 4

Score: 0

You can treat a as a factor within each name and then check whether it has more than one level:

```r
library(dplyr)

df |>
  group_by(b) |>
  mutate(n_lvl = length(levels(factor(a))),
         b = paste0(b, ifelse(n_lvl > 1, as.integer(factor(a)), ""))) |>
  select(-n_lvl)
```

```
# A tibble: 6 × 2
# Groups:   b [3]
      a b
  <dbl> <chr>
1   101 M1
2   101 M1
3   102 P
4   102 P
5   103 M2
6   103 M2
```

Data from @r2evans:

```r
a <- c(101,101,102,102,103,103)
b <- c("M","M","P","P","M","M")
df <- tibble(a, b)
```
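
The factor-level idea from the last answer can also be expressed without dplyr. A minimal base R sketch, assuming b is a character column (the default for data.frame() in R 4.0 and later); as in that answer, IDs are numbered by their sorted factor level rather than by first appearance:

```r
dt <- data.frame(a = c(101, 101, 102, 102, 103, 103),
                 b = c("M", "M", "P", "P", "M", "M"))

# Per name in b: the integer code of each ID as a factor (1, 2, ... by sorted ID)
codes <- ave(dt$a, dt$b, FUN = function(id) as.integer(factor(id)))
# Per name in b: how many distinct IDs (factor levels) share that name
n_ids <- ave(dt$a, dt$b, FUN = function(id) length(levels(factor(id))))
# Suffix only the names that are shared by more than one ID
dt$b <- ifelse(n_ids > 1, paste0(dt$b, codes), dt$b)
dt
```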



huangapple
  • Published on 2023-02-27 02:06:29
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75574024.html