匹配 pandas 数据框中的值,并用主表中匹配的值替换。

huangapple go评论67阅读模式
英文:

Match values in pandas dataframe and replace with matched values from master table

问题

我想要将主表中的值与映射表中的详细信息进行匹配和替换,而不使用for循环。

主表:

Case Path1 Path2 Path3
1 a c d
2 b c a
3 c a e
4 b d e
5 d b a

映射表:

factor detail
a 样本A
b 样本B
c 样本C
d 样本D
e 样本E
f 样本F

我希望输出如下所示。

结果:

Case Path1 Path2 Path3
1 样本A 样本C 样本D
2 样本B 样本C 样本A
3 样本C 样本A 样本E
4 样本B 样本D 样本E
5 样本D 样本B 样本A
英文:

I would like to match and replace values from Main Table to detail in Mapping Table without using for-loop.

Main Table:

Case Path1 Path2 Path3
1 a c d
2 b c a
3 c a e
4 b d e
5 d b a

Mapping Table:

factor detail
a sample A
b sample B
c sample C
d sample D
e sample E
f sample F

I would like the output to be like this.

Result:

Case Path1 Path2 Path3
1 sample A sample C sample D
2 sample B sample C sample A
3 sample C sample A sample E
4 sample B sample D sample E
5 sample D sample B sample A

答案1

得分: 2

你可以使用replace

# df -> 主表
# dmap -> 映射表

cols = df.filter(like='Path').columns
df[cols] = df[cols].replace(dmap.set_index('factor')['detail'])
print(df)

# 输出
   Case     Path1     Path2     Path3
0     1  sample A  sample C  sample D
1     2  sample B  sample C  sample A
2     3  sample C  sample A  sample E
3     4  sample B  sample D  sample E
4     5  sample D  sample B  sample A
英文:

You can use replace:

# df -> main table
# dmap -> map table

cols = df.filter(like='Path').columns
df[cols] = df[cols].replace(dmap.set_index('factor')['detail'])
print(df)

# Output
   Case     Path1     Path2     Path3
0     1  sample A  sample C  sample D
1     2  sample B  sample C  sample A
2     3  sample C  sample A  sample E
3     4  sample B  sample D  sample E
4     5  sample D  sample B  sample A

答案2

得分: 1

Use Series.map by all columns without first - if no match get NaNs:

df1.iloc[:, 1:] = df1.iloc[:, 1:].apply(lambda x: x.map(df2.set_index('factor')['detail']))

Or DataFrame.replace - if no match get original value:

df1.iloc[:, 1:] = df1.iloc[:, 1:].replace(df2.set_index('factor')['detail'])
print (df1)
   Case     Path1     Path2     Path3
0     1  sample A  sample C  sample D
1     2  sample B  sample C  sample A
2     3  sample C  sample A  sample E
3     4  sample B  sample D  sample E
4     5  sample D  sample B  sample A

If want update only columns starting Path use Series.update with DataFrame.filter and DataFrame.replace:

df1.update(df1.filter(regex=r'^Path').replace(df2.set_index('factor')['detail']))
print (df1)
   Case     Path1     Path2     Path3
0     1  sample A  sample C  sample D
1     2  sample B  sample C  sample A
2     3  sample C  sample A  sample E
3     4  sample B  sample D  sample E
4     5  sample D  sample B  sample A
英文:

Use Series.map by all columns without first - if no match get NaNs:

df1.iloc[:, 1:] = df1.iloc[:, 1:].apply(lambda x: x.map(df2.set_index('factor')['detail']))

Or DataFrame.replace - if no match get original value:

df1.iloc[:, 1:] = df1.iloc[:, 1:].replace(df2.set_index('factor')['detail'])
print (df1)
   Case     Path1     Path2     Path3
0     1  sample A  sample C  sample D
1     2  sample B  sample C  sample A
2     3  sample C  sample A  sample E
3     4  sample B  sample D  sample E
4     5  sample D  sample B  sample A

If want update only columns starting Path use Series.update with DataFrame.filter and DataFrame.replace:

df1.update(df1.filter(regex=r'^Path').replace(df2.set_index('factor')['detail']))
print (df1)
   Case     Path1     Path2     Path3
0     1  sample A  sample C  sample D
1     2  sample B  sample C  sample A
2     3  sample C  sample A  sample E
3     4  sample B  sample D  sample E
4     5  sample D  sample B  sample A

huangapple
  • 本文由 发表于 2023年4月11日 14:51:19
  • 转载请务必保留本文链接:https://go.coder-hub.com/75983120.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定