2023-02-27 02:06:29


How to rename identical values in a column within R?

Question

Say we have a data set:

a <- c(101,101,102,102,103,103)
b <- c("M","M","P","P","M","M")
dt <- as.data.frame(cbind(a,b))
dt
    a b
1 101 M
2 101 M
3 102 P
4 102 P
5 103 M
6 103 M

Column a is subject_ID and column b is subject_name. I want to uniquely rename the subject with ID 101 to M1 and the subject with ID 103 to M2.

Is there a way to do this by indexing?

This does not work:

dt.try1 <- gsub("M","M1",dt[1:2,c(2)])
dt.try1
[1] "M1" "M1"

This is what the ideal result would be:

Why doesn't this work?


Answer 1

Score: 5

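Sample data.
```r
a <- c(101,101,102,102,103,103)
b <- c("M","M","P","P","M","M")
dt <- data.frame(a, b)
```

FYI, never use data.frame(cbind(..)) to create a data frame. Here at least one of the vectors is character, so every column becomes character: cbind creates a matrix by default, and a matrix holds a single class, whereas a data frame can hold a different class per column. It is better to call data.frame(..) directly.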
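To see the coercion described above, the two constructions can be compared side by side. This is only a minimal sketch; the names via_cbind and direct are illustrative and not part of the original answer.

```r
a <- c(101, 101, 102, 102, 103, 103)
b <- c("M", "M", "P", "P", "M", "M")

# cbind() builds a character matrix first, so the IDs are turned into strings
via_cbind <- as.data.frame(cbind(a, b))

# data.frame() keeps each vector's own type
direct <- data.frame(a, b)

class(via_cbind$a)  # "character" on R >= 4.0 (older versions give "factor")
class(direct$a)     # "numeric"
```

With the cbind route, numeric operations on column a (comparisons, sorting, arithmetic) would silently work on strings instead of numbers.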

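Note: for clarity, your "ideal output" shows M, M, P, P, M2, M2, but your earlier code block tries to change the first two to M1. My code assumes you need the first two to be M1 rather than just M. (For the output exactly as shown, akrun's answer is correct, though this methodology could be adjusted.)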

dplyr

library(dplyr)
dt %>%
  distinct(a, b) %>%
  group_by(b) %>%
  mutate(b = if (n() > 1) paste0(b, row_number()) else b) %>%
  left_join(dt, ., by = "a", suffix = c(".x", "")) %>%
  select(-b.x)
#     a  b
# 1 101 M1
# 2 101 M1
# 3 102  P
# 4 102  P
# 5 103 M2
# 6 103 M2

base R

dt2 <- unique(dt[, c("a", "b")])
dt2$b <- ave(dt2$b, dt2$b, FUN = function(z) if (length(z) > 1) paste0(z, seq_along(z)) else z)
dt2
#     a  b
# 1 101 M1
# 3 102  P
# 5 103 M2
merge(subset(dt, select = -b), dt2, by = "a")
#     a  b
# 1 101 M1
# 2 101 M1
# 3 102  P
# 4 102  P
# 5 103 M2
# 6 103 M2


<details>
<summary>英文:</summary>

Answer 2

Score: 3

Another option:

within(dt, b[b == 'M'] <- paste0('M', with(rle(a[b == 'M']), rep(seq_along(lengths), lengths))))

Output:
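The one-liner above packs several steps into a single expression. As a rough sketch of what it does, here it is unpacked, assuming dt is built as in Answer 1; the names is_m, runs, run_id and result are illustrative only.

```r
a <- c(101, 101, 102, 102, 103, 103)
b <- c("M", "M", "P", "P", "M", "M")
dt <- data.frame(a, b)

is_m <- dt$b == "M"        # rows whose name is "M"
runs <- rle(dt$a[is_m])    # consecutive runs of IDs among those rows

# number the runs 1, 2, ... and repeat each number for the length of its run
run_id <- rep(seq_along(runs$lengths), runs$lengths)

result <- dt
result$b[is_m] <- paste0("M", run_id)
result$b
# "M1" "M1" "P" "P" "M2" "M2"
```

Because rle counts consecutive runs, this relies on rows with the same ID being adjacent; if the same ID showed up again further down, it would be numbered as a new run.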

Answer 3

Score: 3

Using rle/inverse.rle from base R

dt$b <- inverse.rle(within.list(rle(dt$b), values <- make.unique(values, sep = "")))

-output

Or using rle in tidyverse

library(dplyr)
library(stringr)
# as.character() makes the replacement function return a string
dt %>%
  mutate(b = inverse.rle(within.list(rle(b),
    values <- str_replace_all(make.unique(values, sep = ""),
      "(\\d+)", function(x) as.character(as.numeric(x) + 1)))))

-output
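The difference between the two variants comes down to how make.unique numbers repeated values. A small sketch of the intermediate values, assuming the same b vector as in the question; runs and unique_values are illustrative names.

```r
b <- c("M", "M", "P", "P", "M", "M")

runs <- rle(b)
runs$values
# "M" "P" "M"

# make.unique() leaves the first occurrence untouched and numbers later ones from 1
unique_values <- make.unique(runs$values, sep = "")
unique_values
# "M" "P" "M1"
```

So the base R version leaves the first run as plain M and labels the second run M1, which is why the second variant adds 1 to the numeric suffix to get M2.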


Answer 4

Score: 0

You can treat a as a factor and then check whether it has more than one level within each group of b:

library(dplyr)
df |>
  group_by(b) |>
  mutate(n_lvl = length(levels(factor(a))),
         b = paste0(b, ifelse(n_lvl > 1, as.integer(factor(a)), ""))) |>
  select(-n_lvl)
# A tibble: 6 × 2
# Groups:   b [3]
      a b
  <dbl> <chr>
1   101 M1
2   101 M1
3   102 P
4   102 P
5   103 M2
6   103 M2

Data from @r2evans:

a <- c(101,101,102,102,103,103)
b <- c("M","M","P","P","M","M")
df <- tibble(a, b)
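The numbering in this approach comes from the integer codes of the factor. A minimal sketch of that step in isolation, using the IDs that share the name "M"; a_in_group and f are illustrative names.

```r
a_in_group <- c(101, 101, 103, 103)  # the IDs that share the name "M"
f <- factor(a_in_group)

levels(f)      # "101" "103"
as.integer(f)  # 1 1 2 2
```

Factor levels are sorted, so the suffix follows the sorted order of the IDs rather than the order in which they first appear in the data.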

</details>
