Match values in pandas dataframe and replace with matched values from master table
Question
I would like to replace each value in the Path columns of the Main Table with the matching detail from the Mapping Table, without using a for-loop.
Main Table:
Case | Path1 | Path2 | Path3 |
---|---|---|---|
1 | a | c | d |
2 | b | c | a |
3 | c | a | e |
4 | b | d | e |
5 | d | b | a |
Mapping Table:
factor | detail |
---|---|
a | sample A |
b | sample B |
c | sample C |
d | sample D |
e | sample E |
f | sample F |
The output should look like this.
Result:
Case | Path1 | Path2 | Path3 |
---|---|---|---|
1 | sample A | sample C | sample D |
2 | sample B | sample C | sample A |
3 | sample C | sample A | sample E |
4 | sample B | sample D | sample E |
5 | sample D | sample B | sample A |
Answer 1
Score: 2
You can use DataFrame.replace:
# df -> main table
# dmap -> map table
cols = df.filter(like='Path').columns
df[cols] = df[cols].replace(dmap.set_index('factor')['detail'])
print(df)
# Output
Case Path1 Path2 Path3
0 1 sample A sample C sample D
1 2 sample B sample C sample A
2 3 sample C sample A sample E
3 4 sample B sample D sample E
4 5 sample D sample B sample A
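For anyone who wants to run the snippet above end to end, here is a minimal self-contained sketch. The two frames are rebuilt by hand from the tables in the question; the intermediate name lookup is introduced only for readability and is not part of the original answer.

```python
import pandas as pd

# Main table, rebuilt from the question
df = pd.DataFrame({
    "Case": [1, 2, 3, 4, 5],
    "Path1": list("abcbd"),
    "Path2": list("ccadb"),
    "Path3": list("daeea"),
})

# Mapping table, rebuilt from the question
dmap = pd.DataFrame({
    "factor": list("abcdef"),
    "detail": [f"sample {c}" for c in "ABCDEF"],
})

# Series indexed by factor; replace() treats it like a dict (factor -> detail)
lookup = dmap.set_index("factor")["detail"]

# Only touch the columns whose name contains "Path"
cols = df.filter(like="Path").columns
df[cols] = df[cols].replace(lookup)
print(df)
```

The Case column is left alone because it is not part of cols.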
Answer 2
Score: 1
Use Series.map on every column except the first; values with no match become NaN:
df1.iloc[:, 1:] = df1.iloc[:, 1:].apply(lambda x: x.map(df2.set_index('factor')['detail']))
Or use DataFrame.replace; values with no match keep their original value:
df1.iloc[:, 1:] = df1.iloc[:, 1:].replace(df2.set_index('factor')['detail'])
print (df1)
Case Path1 Path2 Path3
0 1 sample A sample C sample D
1 2 sample B sample C sample A
2 3 sample C sample A sample E
3 4 sample B sample D sample E
4 5 sample D sample B sample A
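The difference between the two approaches only shows up when a value is missing from the mapping table. Here is a small sketch of that case; the value "z" and the two-row frame are made up for illustration and do not come from the question.

```python
import pandas as pd

# Reduced mapping: only "a" and "b" are known
lookup = pd.Series({"a": "sample A", "b": "sample B"})

# "z" is a hypothetical value with no entry in the mapping
paths = pd.DataFrame({"Path1": ["a", "z"], "Path2": ["b", "a"]})

# Series.map: unmatched values become NaN
mapped = paths.apply(lambda col: col.map(lookup))

# DataFrame.replace: unmatched values are left as they were
replaced = paths.replace(lookup)

print(mapped)
print(replaced)
```

If unmatched values should be flagged, map is probably the better fit; if they should pass through unchanged, replace is.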
If you want to update only the columns whose names start with Path, use DataFrame.update with DataFrame.filter and DataFrame.replace:
df1.update(df1.filter(regex=r'^Path').replace(df2.set_index('factor')['detail']))
print (df1)
Case Path1 Path2 Path3
0 1 sample A sample C sample D
1 2 sample B sample C sample A
2 3 sample C sample A sample E
3 4 sample B sample D sample E
4 5 sample D sample B sample A
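A short sketch of why the filter matters: DataFrame.update modifies the frame in place and returns None, and columns that do not match the regex are not touched even if they contain the same values. The Note column below is a hypothetical addition for illustration.

```python
import pandas as pd

df1 = pd.DataFrame({
    "Case": [1, 2],
    "Path1": ["a", "b"],
    "Note": ["a", "b"],  # hypothetical non-Path column that should stay unchanged
})
df2 = pd.DataFrame({"factor": ["a", "b"], "detail": ["sample A", "sample B"]})

# filter() keeps only columns starting with "Path"; update() writes them back in place
result = df1.update(
    df1.filter(regex=r"^Path").replace(df2.set_index("factor")["detail"])
)
print(result)  # update() returns None
print(df1)
```

Because update works in place, its return value should not be assigned back to df1.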