英文:
Match values in pandas dataframe and replace with matched values from master table
问题
我想要将主表中的值与映射表中的详细信息进行匹配和替换,而不使用for循环。
主表:
| Case | Path1 | Path2 | Path3 |
|---|---|---|---|
| 1 | a | c | d |
| 2 | b | c | a |
| 3 | c | a | e |
| 4 | b | d | e |
| 5 | d | b | a |
映射表:
| factor | detail |
|---|---|
| a | 样本A |
| b | 样本B |
| c | 样本C |
| d | 样本D |
| e | 样本E |
| f | 样本F |
我希望输出如下所示。
结果:
| Case | Path1 | Path2 | Path3 |
|---|---|---|---|
| 1 | 样本A | 样本C | 样本D |
| 2 | 样本B | 样本C | 样本A |
| 3 | 样本C | 样本A | 样本E |
| 4 | 样本B | 样本D | 样本E |
| 5 | 样本D | 样本B | 样本A |
英文:
I would like to match and replace values from Main Table to detail in Mapping Table without using for-loop.
Main Table:
| Case | Path1 | Path2 | Path3 |
|---|---|---|---|
| 1 | a | c | d |
| 2 | b | c | a |
| 3 | c | a | e |
| 4 | b | d | e |
| 5 | d | b | a |
Mapping Table:
| factor | detail |
|---|---|
| a | sample A |
| b | sample B |
| c | sample C |
| d | sample D |
| e | sample E |
| f | sample F |
I would like the output to be like this.
Result:
| Case | Path1 | Path2 | Path3 |
|---|---|---|---|
| 1 | sample A | sample C | sample D |
| 2 | sample B | sample C | sample A |
| 3 | sample C | sample A | sample E |
| 4 | sample B | sample D | sample E |
| 5 | sample D | sample B | sample A |
答案1
得分: 2
你可以使用replace:
# df -> 主表
# dmap -> 映射表
cols = df.filter(like='Path').columns
df[cols] = df[cols].replace(dmap.set_index('factor')['detail'])
print(df)
# 输出
Case Path1 Path2 Path3
0 1 sample A sample C sample D
1 2 sample B sample C sample A
2 3 sample C sample A sample E
3 4 sample B sample D sample E
4 5 sample D sample B sample A
英文:
You can use replace:
# df -> main table
# dmap -> map table
cols = df.filter(like='Path').columns
df[cols] = df[cols].replace(dmap.set_index('factor')['detail'])
print(df)
# Output
Case Path1 Path2 Path3
0 1 sample A sample C sample D
1 2 sample B sample C sample A
2 3 sample C sample A sample E
3 4 sample B sample D sample E
4 5 sample D sample B sample A
答案2
得分: 1
Use Series.map by all columns without first - if no match get NaNs:
df1.iloc[:, 1:] = df1.iloc[:, 1:].apply(lambda x: x.map(df2.set_index('factor')['detail']))
Or DataFrame.replace - if no match get original value:
df1.iloc[:, 1:] = df1.iloc[:, 1:].replace(df2.set_index('factor')['detail'])
print (df1)
Case Path1 Path2 Path3
0 1 sample A sample C sample D
1 2 sample B sample C sample A
2 3 sample C sample A sample E
3 4 sample B sample D sample E
4 5 sample D sample B sample A
If want update only columns starting Path use Series.update with DataFrame.filter and DataFrame.replace:
df1.update(df1.filter(regex=r'^Path').replace(df2.set_index('factor')['detail']))
print (df1)
Case Path1 Path2 Path3
0 1 sample A sample C sample D
1 2 sample B sample C sample A
2 3 sample C sample A sample E
3 4 sample B sample D sample E
4 5 sample D sample B sample A
英文:
Use Series.map by all columns without first - if no match get NaNs:
df1.iloc[:, 1:] = df1.iloc[:, 1:].apply(lambda x: x.map(df2.set_index('factor')['detail']))
Or DataFrame.replace - if no match get original value:
df1.iloc[:, 1:] = df1.iloc[:, 1:].replace(df2.set_index('factor')['detail'])
print (df1)
Case Path1 Path2 Path3
0 1 sample A sample C sample D
1 2 sample B sample C sample A
2 3 sample C sample A sample E
3 4 sample B sample D sample E
4 5 sample D sample B sample A
If want update only columns starting Path use Series.update with DataFrame.filter and DataFrame.replace:
df1.update(df1.filter(regex=r'^Path').replace(df2.set_index('factor')['detail']))
print (df1)
Case Path1 Path2 Path3
0 1 sample A sample C sample D
1 2 sample B sample C sample A
2 3 sample C sample A sample E
3 4 sample B sample D sample E
4 5 sample D sample B sample A
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。


评论