Substitute specific values in a dataframe by matching strings stored in another dataframe

huangapple go评论65阅读模式
英文:

Substitute specific values in a dataframe by matching strings stored in another dataframe

问题

你可以使用R中的dplyr包来实现这个任务,以下是如何用一行代码替换mydf中的1、2和3的值为match_df中的名称(将值0替换为NA):

mydf[] <- match_df$name[mydf[]]

这行代码会将mydf中的所有列都替换为match_df中对应的name,并将0替换为NA。

英文:

Say I have a data frame like the following:

mydf=data.frame(id=LETTERS, value=runif(26,0,1), match1=sample(c(0,1),26,replace=T), match2=sample(c(0,2),26,replace=T), match3=sample(c(0,3),26,replace=T), all_matches=sample(0:3,26,replace=T))

which looks like:

&gt; mydf
   id       value match1 match2 match3 all_matches
1   A 0.267675256      1      0      0           0
2   B 0.974518682      1      0      3           3
3   C 0.175529131      1      2      3           0
4   D 0.050552174      0      2      0           0
5   E 0.228286981      0      0      0           1
6   F 0.025520208      0      2      3           1
7   G 0.206697937      1      2      0           2
8   H 0.644523511      0      2      3           2
9   I 0.342110147      0      0      3           3
10  J 0.430250450      1      0      0           1
...

match1 column has 0 and 1 values, match2 has 0 and 2 values, match3 0 and 3, and all_matches values from 0 to 3.

The only thing I want to do here is to rewrite 1, 2, and 3 values in those columns by a name associated to those values, and stored in another data frame:

match_df=data.frame(match=1:3, name=c(&#39;ABC&#39;,&#39;XYZ&#39;,&#39;IJK&#39;))

which looks like this:

&gt; match_df
  match name
1     1  ABC
2     2  XYZ
3     3  IJK

What would be the best way to replace values 1, 2, 3 in columns match1, match2, match3, all_matches in mydf by names in match_df (leaving value 0 as NA)?

So far I'm merging match_df to each column of interest in mydf in a for loop, but I'm sure this can be done better in one line of code.

Any help appreciated! Thanks!

答案1

得分: 4

mydf %>%
mutate(across(contains('match'),~recode(.x,!!!deframe(match_df))))

id value match1 match2 match3 all_matches
1 A 0.26767526 ABC <NA> <NA> <NA>
2 B 0.97451868 ABC <NA> IJK IJK
3 C 0.17552913 ABC XYZ IJK <NA>
4 D 0.05055217 <NA> XYZ <NA> <NA>
5 E 0.22828698 <NA> <NA> <NA> ABC
6 F 0.02552021 <NA> XYZ IJK ABC
7 G 0.20669794 ABC XYZ <NA> XYZ
8 H 0.64452351 <NA> XYZ IJK XYZ
9 I 0.34211015 <NA> <NA> IJK IJK
10 J 0.43025045 ABC <NA> <NA> ABC

英文:
mydf %&gt;%
  mutate(across(contains(&#39;match&#39;),~recode(.x,!!!deframe(match_df))))

   id      value match1 match2 match3 all_matches
1   A 0.26767526    ABC   &lt;NA&gt;   &lt;NA&gt;        &lt;NA&gt;
2   B 0.97451868    ABC   &lt;NA&gt;    IJK         IJK
3   C 0.17552913    ABC    XYZ    IJK        &lt;NA&gt;
4   D 0.05055217   &lt;NA&gt;    XYZ   &lt;NA&gt;        &lt;NA&gt;
5   E 0.22828698   &lt;NA&gt;   &lt;NA&gt;   &lt;NA&gt;         ABC
6   F 0.02552021   &lt;NA&gt;    XYZ    IJK         ABC
7   G 0.20669794    ABC    XYZ   &lt;NA&gt;         XYZ
8   H 0.64452351   &lt;NA&gt;    XYZ    IJK         XYZ
9   I 0.34211015   &lt;NA&gt;   &lt;NA&gt;    IJK         IJK
10  J 0.43025045    ABC   &lt;NA&gt;   &lt;NA&gt;         ABC

答案2

得分: 2

一行代码使用`match`函数:
```r
mydf[-c(1,2)] <- match_df$name[match(unlist(mydf[-c(1,2)]), match_df$match)]

输出:

#    id      value match1 match2 match3 all_matches
# 1   A 0.17599087    ABC   <NA>   <NA>        <NA>
# 2   B 0.45899500   <NA>    XYZ   <NA>         XYZ
# 3   C 0.12762547    ABC   <NA>   <NA>         XYZ
# 4   D 0.67893265   <NA>    XYZ    IJK         IJK
# 5   E 0.64393827   <NA>   <NA>   <NA>        <NA>
# 6   F 0.93755603   <NA>   <NA>   <NA>         ABC
# 7   G 0.70161939    ABC    XYZ   <NA>        <NA>
# 8   H 0.81897072   <NA>   <NA>    IJK         XYZ
# 9   I 0.26734462   <NA>    XYZ    IJK         ABC
# 10  J 0.03569294   <NA>    XYZ    IJK        <NA>
# 11  K 0.08168074   <NA>   <NA>    IJK         IJK
# 12  L 0.67863032   <NA>   <NA>    IJK         ABC
# 13  M 0.79585738   <NA>    XYZ   <NA>         IJK
# 14  N 0.48506734    ABC    XYZ   <NA>         IJK
# 15  O 0.56177191    ABC   <NA>    IJK        <NA>
# 16  P 0.50113968    ABC    XYZ   <NA>        <NA>
# 17  Q 0.74527715   <NA>   <NA>   <NA>         XYZ
# 18  R 0.64572526   <NA>   <NA>   <NA>        <NA>
# 19  S 0.27640699   <NA>    XYZ    IJK         XYZ
# 20  T 0.76158656   <NA>    XYZ   <NA>         XYZ
# 21  U 0.44533420   <NA>   <NA>    IJK         IJK
# 22  V 0.17232906   <NA>   <NA>    IJK        <NA>
# 23  W 0.87758234    ABC    XYZ   <NA>         ABC
# 24  X 0.15478237   <NA>   <NA>    IJK        <NA>
# 25  Y 0.80055561   <NA>    XYZ    IJK         XYZ
# 26  Z 0.80190420    ABC   <NA>    IJK         ABC
英文:

A one-liner with match:

mydf[-c(1,2)] &lt;- match_df$name[match(unlist(mydf[-c(1,2)]), match_df$match)]

output

#    id      value match1 match2 match3 all_matches
# 1   A 0.17599087    ABC   &lt;NA&gt;   &lt;NA&gt;        &lt;NA&gt;
# 2   B 0.45899500   &lt;NA&gt;    XYZ   &lt;NA&gt;         XYZ
# 3   C 0.12762547    ABC   &lt;NA&gt;   &lt;NA&gt;         XYZ
# 4   D 0.67893265   &lt;NA&gt;    XYZ    IJK         IJK
# 5   E 0.64393827   &lt;NA&gt;   &lt;NA&gt;   &lt;NA&gt;        &lt;NA&gt;
# 6   F 0.93755603   &lt;NA&gt;   &lt;NA&gt;   &lt;NA&gt;         ABC
# 7   G 0.70161939    ABC    XYZ   &lt;NA&gt;        &lt;NA&gt;
# 8   H 0.81897072   &lt;NA&gt;   &lt;NA&gt;    IJK         XYZ
# 9   I 0.26734462   &lt;NA&gt;    XYZ    IJK         ABC
# 10  J 0.03569294   &lt;NA&gt;    XYZ    IJK        &lt;NA&gt;
# 11  K 0.08168074   &lt;NA&gt;   &lt;NA&gt;    IJK         IJK
# 12  L 0.67863032   &lt;NA&gt;   &lt;NA&gt;    IJK         ABC
# 13  M 0.79585738   &lt;NA&gt;    XYZ   &lt;NA&gt;         IJK
# 14  N 0.48506734    ABC    XYZ   &lt;NA&gt;         IJK
# 15  O 0.56177191    ABC   &lt;NA&gt;    IJK        &lt;NA&gt;
# 16  P 0.50113968    ABC    XYZ   &lt;NA&gt;        &lt;NA&gt;
# 17  Q 0.74527715   &lt;NA&gt;   &lt;NA&gt;   &lt;NA&gt;         XYZ
# 18  R 0.64572526   &lt;NA&gt;   &lt;NA&gt;   &lt;NA&gt;        &lt;NA&gt;
# 19  S 0.27640699   &lt;NA&gt;    XYZ    IJK         XYZ
# 20  T 0.76158656   &lt;NA&gt;    XYZ   &lt;NA&gt;         XYZ
# 21  U 0.44533420   &lt;NA&gt;   &lt;NA&gt;    IJK         IJK
# 22  V 0.17232906   &lt;NA&gt;   &lt;NA&gt;    IJK        &lt;NA&gt;
# 23  W 0.87758234    ABC    XYZ   &lt;NA&gt;         ABC
# 24  X 0.15478237   &lt;NA&gt;   &lt;NA&gt;    IJK        &lt;NA&gt;
# 25  Y 0.80055561   &lt;NA&gt;    XYZ    IJK         XYZ
# 26  Z 0.80190420    ABC   &lt;NA&gt;    IJK         ABC

huangapple
  • 本文由 发表于 2023年2月6日 15:59:48
  • 转载请务必保留本文链接:https://go.coder-hub.com/75358666.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定