Substitute specific values in a dataframe by matching strings stored in another dataframe
Question
Say I have a data frame like the following:
mydf=data.frame(id=LETTERS, value=runif(26,0,1), match1=sample(c(0,1),26,replace=T), match2=sample(c(0,2),26,replace=T), match3=sample(c(0,3),26,replace=T), all_matches=sample(0:3,26,replace=T))
which looks like:
> mydf
id value match1 match2 match3 all_matches
1 A 0.267675256 1 0 0 0
2 B 0.974518682 1 0 3 3
3 C 0.175529131 1 2 3 0
4 D 0.050552174 0 2 0 0
5 E 0.228286981 0 0 0 1
6 F 0.025520208 0 2 3 1
7 G 0.206697937 1 2 0 2
8 H 0.644523511 0 2 3 2
9 I 0.342110147 0 0 3 3
10 J 0.430250450 1 0 0 1
...
The `match1` column has 0 and 1 values, `match2` has 0 and 2, `match3` has 0 and 3, and `all_matches` has values from 0 to 3.
All I want to do is replace the 1, 2, and 3 values in those columns with the names associated with those values, which are stored in another data frame:
match_df=data.frame(match=1:3, name=c('ABC','XYZ','IJK'))
which looks like this:
> match_df
match name
1 1 ABC
2 2 XYZ
3 3 IJK
What would be the best way to replace the values 1, 2, and 3 in columns `match1`, `match2`, `match3`, and `all_matches` of `mydf` with the corresponding `name` from `match_df` (with 0 becoming `NA`)?
So far I'm merging `match_df` onto each column of interest in `mydf` in a for loop, but I'm sure this can be done better in one line of code.
Any help appreciated! Thanks!
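For comparison with merging in a loop, here is a minimal base-R sketch of a per-column lookup. It assumes the columns to recode are exactly those whose names contain "match", and it uses a small fixed data frame in place of the random `mydf` (the values are made up for illustration).

```r
# Lookup table from the question: code -> name
match_df <- data.frame(match = 1:3, name = c("ABC", "XYZ", "IJK"),
                       stringsAsFactors = FALSE)

# Hypothetical fixed stand-in for the question's random `mydf`
mydf <- data.frame(
  id          = c("A", "B", "C"),
  value       = c(0.27, 0.97, 0.18),
  match1      = c(1, 1, 0),
  match2      = c(0, 2, 2),
  match3      = c(0, 3, 3),
  all_matches = c(0, 3, 1),
  stringsAsFactors = FALSE
)

# Columns to recode: every column whose name contains "match"
cols <- grepl("match", names(mydf), fixed = TRUE)

# match() gives the row of match_df holding each code; 0 has no row, so it
# returns NA, and indexing `name` with NA yields NA
mydf[cols] <- lapply(mydf[cols], function(x) match_df$name[match(x, match_df$match)])

print(mydf)
```

The recoded columns come back as character vectors, and `id` and `value` are left untouched.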
Answer 1
Score: 4
library(dplyr)   # %>%, mutate(), across(), contains(), recode()
library(tibble)  # deframe()
mydf %>%
  mutate(across(contains('match'), ~recode(.x, !!!deframe(match_df))))
id value match1 match2 match3 all_matches
1 A 0.26767526 ABC <NA> <NA> <NA>
2 B 0.97451868 ABC <NA> IJK IJK
3 C 0.17552913 ABC XYZ IJK <NA>
4 D 0.05055217 <NA> XYZ <NA> <NA>
5 E 0.22828698 <NA> <NA> <NA> ABC
6 F 0.02552021 <NA> XYZ IJK ABC
7 G 0.20669794 ABC XYZ <NA> XYZ
8 H 0.64452351 <NA> XYZ IJK XYZ
9 I 0.34211015 <NA> <NA> IJK IJK
10 J 0.43025045 ABC <NA> <NA> ABC
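To see what `!!!deframe(match_df)` hands to `recode()`: the tibble documentation describes `deframe()` as using the first column as names and the second as values, so the spliced arguments are `` `1` = "ABC", `2` = "XYZ", `3` = "IJK" ``. The sketch below builds the same named vector with base R's `setNames()` (substituted here so the example runs without tibble) and uses it as a lookup directly; the `codes` vector is made up for illustration.

```r
match_df <- data.frame(match = 1:3, name = c("ABC", "XYZ", "IJK"),
                       stringsAsFactors = FALSE)

# Base-R equivalent of tibble::deframe(match_df):
# first column -> names, second column -> values
lookup <- setNames(match_df$name, match_df$match)

# Index the named vector with the codes as strings;
# "0" has no entry, so it gives NA
codes  <- c(0, 1, 2, 3, 1)
labels <- unname(lookup[as.character(codes)])
print(labels)
```

dplyr's `recode()` documentation says that when no `.default` is supplied and the replacements are not compatible with the type of `.x` (character names versus numeric codes here), unmatched values are replaced with `NA`, which is why the 0s appear as `<NA>` in the output above.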
Answer 2
Score: 2
A one-liner with `match`:
```r
# Columns 1 and 2 are `id` and `value`; the rest hold match codes.
# unlist() stacks those columns into one vector, match() looks each code up in
# match_df$match (0 has no match and gives NA), and the assignment fills the
# result back into the same columns in order.
mydf[-c(1,2)] <- match_df$name[match(unlist(mydf[-c(1,2)]), match_df$match)]

# Output:
# id value match1 match2 match3 all_matches
# 1 A 0.17599087 ABC <NA> <NA> <NA>
# 2 B 0.45899500 <NA> XYZ <NA> XYZ
# 3 C 0.12762547 ABC <NA> <NA> XYZ
# 4 D 0.67893265 <NA> XYZ IJK IJK
# 5 E 0.64393827 <NA> <NA> <NA> <NA>
# 6 F 0.93755603 <NA> <NA> <NA> ABC
# 7 G 0.70161939 ABC XYZ <NA> <NA>
# 8 H 0.81897072 <NA> <NA> IJK XYZ
# 9 I 0.26734462 <NA> XYZ IJK ABC
# 10 J 0.03569294 <NA> XYZ IJK <NA>
# 11 K 0.08168074 <NA> <NA> IJK IJK
# 12 L 0.67863032 <NA> <NA> IJK ABC
# 13 M 0.79585738 <NA> XYZ <NA> IJK
# 14 N 0.48506734 ABC XYZ <NA> IJK
# 15 O 0.56177191 ABC <NA> IJK <NA>
# 16 P 0.50113968 ABC XYZ <NA> <NA>
# 17 Q 0.74527715 <NA> <NA> <NA> XYZ
# 18 R 0.64572526 <NA> <NA> <NA> <NA>
# 19 S 0.27640699 <NA> XYZ IJK XYZ
# 20 T 0.76158656 <NA> XYZ <NA> XYZ
# 21 U 0.44533420 <NA> <NA> IJK IJK
# 22 V 0.17232906 <NA> <NA> IJK <NA>
# 23 W 0.87758234 ABC XYZ <NA> ABC
# 24 X 0.15478237 <NA> <NA> IJK <NA>
# 25 Y 0.80055561 <NA> XYZ IJK XYZ
# 26 Z 0.80190420 ABC <NA> IJK ABC
```