Replacing column values that conditionally match a list of values in R

Question

I am trying to replace values in one data frame wherever its identifier matches one in a second, much smaller data frame. A toy example of what I've tried:

df1 = data.frame(row = seq(1,6),
                   x = c("a","b","c","d","e","f"))

df2 = data.frame(row = c(5,3,1,15,10),
                 x2 = c("g","h","i","j","k"))

df3 = df1 %>% mutate(x = case_when(
  df1$row == df2$row ~ df2$x2,
  .default = df1$x
))

The intent is: when df1$row matches df2$row, replace df1$x with the value from df2$x2; otherwise leave df1$x unchanged. The expected output:

df3
  row x
1   1 i
2   2 b
3   3 h
4   4 d
5   5 g
6   6 f

Any help appreciated.
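
A note on the attempt above: `df1$row == df2$row` compares the two vectors position by position (recycling the shorter one) rather than looking up each `df1$row` value in `df2$row`. As a minimal base R sketch of the intended lookup, assuming the `df1` and `df2` defined in the question and R 4.0 or later (so that strings are not converted to factors), `match()` can be used; the names `idx` and `df3` are only illustrative:

```r
df1 <- data.frame(row = seq(1, 6),
                  x = c("a", "b", "c", "d", "e", "f"))
df2 <- data.frame(row = c(5, 3, 1, 15, 10),
                  x2 = c("g", "h", "i", "j", "k"))

# For each df1$row, the position of the same value in df2$row (NA if absent)
idx <- match(df1$row, df2$row)

# Take the replacement where a match exists, otherwise keep the original value
df3 <- df1
df3$x <- ifelse(is.na(idx), df1$x, df2$x2[idx])
df3
```

Rows of `df2` with no counterpart in `df1` (here 15 and 10) are simply never looked up, so they have no effect on the result.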

Answer 1

Score: 1

We can join by row, then use coalesce:

library(dplyr)
df1 %>%
    left_join(df2, by = 'row') %>%
    mutate(x = coalesce(x2, x), .keep = 'unused')

  row x
1   1 i
2   2 b
3   3 h
4   4 d
5   5 g
6   6 f
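
To make the two steps visible, the same pipeline can be split up as in the sketch below. It assumes the `df1` and `df2` from the question; the intermediate name `joined` is only illustrative.

```r
library(dplyr)

df1 <- data.frame(row = seq(1, 6),
                  x = c("a", "b", "c", "d", "e", "f"))
df2 <- data.frame(row = c(5, 3, 1, 15, 10),
                  x2 = c("g", "h", "i", "j", "k"))

# Step 1: the left join keeps every row of df1 and adds x2,
# which is NA wherever df2 has no matching row value
joined <- left_join(df1, df2, by = "row")

# Step 2: coalesce() returns the first non-missing value per row,
# so x2 wins where it exists and x is kept otherwise;
# .keep = "unused" then drops the helper column x2
df3 <- joined %>%
  mutate(x = coalesce(x2, x), .keep = "unused")
df3
```

Because it is a left join, the rows of `df2` that do not occur in `df1` (15 and 10) are dropped rather than appended.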

Answer 2

Score: 1

We might use {powerjoin}:

df1 = data.frame(row = seq(1,6),
                 x = c("a","b","c","d","e","f"))

df2 = data.frame(row = c(5,3,1,15,10),
                 x2 = c("g","h","i","j","k"))

library(powerjoin)
power_left_join(df1, df2 |> dplyr::rename(x = x2), by = "row", conflict = coalesce_yx)
#>   row x
#> 1   1 i
#> 2   2 b
#> 3   3 h
#> 4   4 d
#> 5   5 g
#> 6   6 f

Created on 2023-03-17 with reprex v2.0.2

Answer 3

Score: 0

With dplyr 1.1.0:

df1 %>%
  rows_update(df2 %>% rename(x = x2), unmatched = "ignore")

Result:

Matching, by = "row"
  row x
1   1 i
2   2 b
3   3 h
4   4 d
5   5 g
6   6 f

If both tables had the same column names, it would be simpler:

df1 %>%
  rows_update(df2, unmatched = "ignore")
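
As a self-contained sketch of this approach, assuming dplyr 1.1.0 or later and the `df1` and `df2` from the question, the key column can also be named explicitly through `by`, which avoids the "Matching, by" message:

```r
library(dplyr)

df1 <- data.frame(row = seq(1, 6),
                  x = c("a", "b", "c", "d", "e", "f"))
df2 <- data.frame(row = c(5, 3, 1, 15, 10),
                  x2 = c("g", "h", "i", "j", "k"))

# rows_update() overwrites columns of the same name, so x2 is renamed to x;
# unmatched = "ignore" skips the df2 keys (15, 10) that are absent from df1
df3 <- df1 %>%
  rows_update(df2 %>% rename(x = x2), by = "row", unmatched = "ignore")
df3
```

Note that `rows_update()` expects the key values in the second table to be unique, so this fits a lookup table like `df2` where each `row` value appears once.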

huangapple
  • Published on 2023-03-04 05:57:49
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75632211.html