匹配 pandas 数据框中的值,并用主表中匹配的值替换。

huangapple go评论96阅读模式
英文:

Match values in pandas dataframe and replace with matched values from master table

问题

我想要将主表中的值与映射表中的详细信息进行匹配和替换,而不使用for循环。

主表:

Case Path1 Path2 Path3
1 a c d
2 b c a
3 c a e
4 b d e
5 d b a

映射表:

factor detail
a 样本A
b 样本B
c 样本C
d 样本D
e 样本E
f 样本F

我希望输出如下所示。

结果:

Case Path1 Path2 Path3
1 样本A 样本C 样本D
2 样本B 样本C 样本A
3 样本C 样本A 样本E
4 样本B 样本D 样本E
5 样本D 样本B 样本A
英文:

I would like to match and replace values from Main Table to detail in Mapping Table without using for-loop.

Main Table:

Case Path1 Path2 Path3
1 a c d
2 b c a
3 c a e
4 b d e
5 d b a

Mapping Table:

factor detail
a sample A
b sample B
c sample C
d sample D
e sample E
f sample F

I would like the output to be like this.

Result:

Case Path1 Path2 Path3
1 sample A sample C sample D
2 sample B sample C sample A
3 sample C sample A sample E
4 sample B sample D sample E
5 sample D sample B sample A

答案1

得分: 2

你可以使用replace

  1. # df -> 主表
  2. # dmap -> 映射表
  3. cols = df.filter(like='Path').columns
  4. df[cols] = df[cols].replace(dmap.set_index('factor')['detail'])
  5. print(df)
  6. # 输出
  7. Case Path1 Path2 Path3
  8. 0 1 sample A sample C sample D
  9. 1 2 sample B sample C sample A
  10. 2 3 sample C sample A sample E
  11. 3 4 sample B sample D sample E
  12. 4 5 sample D sample B sample A
英文:

You can use replace:

  1. # df -> main table
  2. # dmap -> map table
  3. cols = df.filter(like='Path').columns
  4. df[cols] = df[cols].replace(dmap.set_index('factor')['detail'])
  5. print(df)
  6. # Output
  7. Case Path1 Path2 Path3
  8. 0 1 sample A sample C sample D
  9. 1 2 sample B sample C sample A
  10. 2 3 sample C sample A sample E
  11. 3 4 sample B sample D sample E
  12. 4 5 sample D sample B sample A

答案2

得分: 1

Use Series.map by all columns without first - if no match get NaNs:

  1. df1.iloc[:, 1:] = df1.iloc[:, 1:].apply(lambda x: x.map(df2.set_index('factor')['detail']))

Or DataFrame.replace - if no match get original value:

  1. df1.iloc[:, 1:] = df1.iloc[:, 1:].replace(df2.set_index('factor')['detail'])
  2. print (df1)
  3. Case Path1 Path2 Path3
  4. 0 1 sample A sample C sample D
  5. 1 2 sample B sample C sample A
  6. 2 3 sample C sample A sample E
  7. 3 4 sample B sample D sample E
  8. 4 5 sample D sample B sample A

If want update only columns starting Path use Series.update with DataFrame.filter and DataFrame.replace:

  1. df1.update(df1.filter(regex=r'^Path').replace(df2.set_index('factor')['detail']))
  2. print (df1)
  3. Case Path1 Path2 Path3
  4. 0 1 sample A sample C sample D
  5. 1 2 sample B sample C sample A
  6. 2 3 sample C sample A sample E
  7. 3 4 sample B sample D sample E
  8. 4 5 sample D sample B sample A
英文:

Use Series.map by all columns without first - if no match get NaNs:

  1. df1.iloc[:, 1:] = df1.iloc[:, 1:].apply(lambda x: x.map(df2.set_index('factor')['detail']))

Or DataFrame.replace - if no match get original value:

  1. df1.iloc[:, 1:] = df1.iloc[:, 1:].replace(df2.set_index('factor')['detail'])
  2. print (df1)
  3. Case Path1 Path2 Path3
  4. 0 1 sample A sample C sample D
  5. 1 2 sample B sample C sample A
  6. 2 3 sample C sample A sample E
  7. 3 4 sample B sample D sample E
  8. 4 5 sample D sample B sample A

If want update only columns starting Path use Series.update with DataFrame.filter and DataFrame.replace:

  1. df1.update(df1.filter(regex=r'^Path').replace(df2.set_index('factor')['detail']))
  2. print (df1)
  3. Case Path1 Path2 Path3
  4. 0 1 sample A sample C sample D
  5. 1 2 sample B sample C sample A
  6. 2 3 sample C sample A sample E
  7. 3 4 sample B sample D sample E
  8. 4 5 sample D sample B sample A

huangapple
  • 本文由 发表于 2023年4月11日 14:51:19
  • 转载请务必保留本文链接:https://go.coder-hub.com/75983120.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定