合并两个满足条件的数据框。

huangapple go评论111阅读模式
英文:

Merge two dataframes where requirements are met

问题

我有两个数据框,我想在满足特定条件的情况下将它们合并。

我想将DF2合并到DF1中,其中客户端MAC和AP MAC匹配,但同时"Description"包含"AP",且"Client VLAN"等于"1234"。然后,我想用相应的"Interface"填充"AP Interface"。

我的实际DF1有超过10,000行,而DF2通常只有300行或更少。

DF1:

Switch Interface Description Client VLAN Client MAC
SW1 Gi1/0/1 AP Port 1234 1234.1234.1234
SW1 Gi1/0/1 AP Port 2344 3210.3210.3210
SW1 Gi1/0/2 AP Port 1234 3456.3456.3456
SW2 Gi2/0/5 Computer 2343 6543.6543.6543
SW2 Gi2/0/8 Trunk 1234 2345.2345.2345
SW3 Te1/1/1 AP Port 1234 7890.7890.7890
SW4 Gi3/0/6 AP Port 1234 2345.2345.2345

DF2:

AP Name AP Intf AP MAC
AP1 1234.1234.1234
AP2 3456.3456.3456
AP3 2345.2345.2345
AP4 7890.7890.7890

期望的输出:

Switch Interface Description Client VLAN Client MAC AP Name AP Intf AP MAC
SW1 Gi1/0/1 AP Port 1234 1234.1234.1234 AP1 Gi1/0/1 1234.1234.1234
SW1 Gi1/0/1 AP Port 2344 3210.3210.3210
SW1 Gi1/0/2 AP Port 1234 3456.3456.3456 AP2 Gi1/0/2 3456.3456.3456
SW2 Gi2/0/5 Computer 2343 6543.6543.6543
SW2 Gi2/0/8 Trunk 1234 2345.2345.2345
SW3 Te1/1/1 AP Port 1234 7890.7890.7890 AP4 Te1/1/1 7890.7890.7890
SW4 Gi3/0/6 AP Port 1234 2345.2345.2345 AP3 Gi3/0/6 2345.2345.2345

我正在使用以下代码,但输出结果不正确:

  1. def helper(data, col): return data.groupby(col).cumcount()
  2. df1['Client MAC'] = df1['Client MAC'].fillna("")
  3. df2['AP MAC'] = df2['AP MAC'].fillna("")
  4. df = df1.merge(df2, left_on=['Client MAC', helper(df1, ['Client MAC'])], right_on=['AP MAC', helper(df2, \
  5. ['AP MAC'])], how='left').drop(columns='key_1')

当前输出:

Switch Interface Description Client VLAN Client MAC AP Name AP Intf AP MAC
SW1 Gi1/0/1 AP Port 1234 1234.1234.1234 AP1 Gi1/0/1 1234.1234.1234
SW1 Gi1/0/1 AP Port 2344 3210.3210.3210
SW1 Gi1/0/2 AP Port 1234 3456.3456.3456 AP2 Gi1/0/2 3456.3456.3456
SW2 Gi2/0/5 Computer 2343 6543.6543.6543
SW2 Gi2/0/8 Trunk 1234 2345.2345.2345 AP3 Gi3/0/6 2345.2345.2345
SW3 Te1/1/1 AP Port 1234 7890.7890.7890 AP4 Te1/1/1 7890.7890.7890
SW4 Gi3/0/6 AP Port 1234 2345.2345.2345
英文:

I have two dataframes that I am trying to merge where certain criteria is met.

I want to merge DF2 into DF1 where the Client MAC and AP MAC match but also where the 'Description' contains 'AP' and the 'Client VLAN' is equal to '1234'. I then want to populate the 'AP Interface' with the corresponding 'Interface'.

My actual DF1 is over 10k rows and DF2 is typically 300 or less.

DF1:

Switch Interface Description Client VLAN Client MAC
SW1 Gi1/0/1 AP Port 1234 1234.1234.1234
SW1 Gi1/0/1 AP Port 2344 3210.3210.3210
SW1 Gi1/0/2 AP Port 1234 3456.3456.3456
SW2 Gi2/0/5 Computer 2343 6543.6543.6543
SW2 Gi2/0/8 Trunk 1234 2345.2345.2345
SW3 Te1/1/1 AP Port 1234 7890.7890.7890
SW4 Gi3/0/6 AP Port 1234 2345.2345.2345

DF2:

AP Name AP Intf AP MAC
AP1 1234.1234.1234
AP2 3456.3456.3456
AP3 2345.2345.2345
AP4 7890.7890.7890

Desired output:

Switch Interface Description Client VLAN Client MAC AP Name AP Intf AP MAC
SW1 Gi1/0/1 AP Port 1234 1234.1234.1234 AP1 Gi1/0/1 1234.1234.1234
SW1 Gi1/0/1 AP Port 2344 3210.3210.3210
SW1 Gi1/0/2 AP Port 1234 3456.3456.3456 AP2 Gi1/0/2 3456.3456.3456
SW2 Gi2/0/5 Computer 2343 6543.6543.6543
SW2 Gi2/0/8 Trunk 1234 2345.2345.2345
SW3 Te1/1/1 AP Port 1234 7890.7890.7890 AP4 Te1/1/1 7890.7890.7890
SW4 Gi3/0/6 AP Port 1234 2345.2345.2345 AP3 Gi3/0/6 2345.2345.2345

The code I'm using that is giving me the below output:

  1. def helper(data, col): return data.groupby(col).cumcount()
  2. df1['Client MAC'] = df1['Client MAC'].fillna("")
  3. df2['AP MAC'] = df2['AP MAC'].fillna("")
  4. df = df1.merge(df2, left_on=['Client MAC', helper(df1, ['Client MAC'])], right_on=['AP MAC', helper(df2, \
  5. ['AP MAC'])], how='left').drop(columns='key_1')

Current output:

Switch Interface Description Client VLAN Client MAC AP Name AP Intf AP MAC
SW1 Gi1/0/1 AP Port 1234 1234.1234.1234 AP1 Gi1/0/1 1234.1234.1234
SW1 Gi1/0/1 AP Port 2344 3210.3210.3210
SW1 Gi1/0/2 AP Port 1234 3456.3456.3456 AP2 Gi1/0/2 3456.3456.3456
SW2 Gi2/0/5 Computer 2343 6543.6543.6543
SW2 Gi2/0/8 Trunk 1234 2345.2345.2345 AP3 Gi3/0/6 2345.2345.2345
SW3 Te1/1/1 AP Port 1234 7890.7890.7890 AP4 Te1/1/1 7890.7890.7890
SW4 Gi3/0/6 AP Port 1234 2345.2345.2345

答案1

得分: 3

使用merge + mask操作:

  1. df = df1.merge(df2, left_on='Client MAC', right_on='AP MAC', how='left')
  2. df['AP Intf'] = (df['AP Intf'].mask(df['Description'].str.contains('AP')
  3. & df['Client VLAN'].eq(1234), df['Interface']))

  1. Switch Interface Description Client VLAN Client MAC AP Name AP Intf AP MAC
  2. 0 SW1 Gi1/0/1 AP Port 1234 1234.1234.1234 AP1 Gi1/0/1 1234.1234.1234
  3. 1 SW1 Gi1/0/1 AP Port 2344 3210.3210.3210 NaN NaN NaN
  4. 2 SW1 Gi1/0/2 AP Port 1234 3456.3456.3456 AP2 Gi1/0/2 3456.3456.3456
  5. 3 SW2 Gi2/0/5 Computer 2343 6543.6543.6543 NaN NaN NaN
  6. 4 SW2 Gi2/0/8 Trunk 1234 2345.2345.2345 AP3 NaN 2345.2345.2345
  7. 5 SW3 Te1/1/1 AP Port 1234 7890.7890.7890 AP4 Te1/1/1 7890.7890.7890
  8. 6 SW4 Gi3/0/6 AP Port 1234 2345.2345.2345 AP3 Gi3/0/6 2345.2345.2345
英文:

With merge + mask operations:

  1. df = df1.merge(df2, left_on='Client MAC', right_on='AP MAC', how='left')
  2. df['AP Intf'] = (df['AP Intf'].mask(df['Description'].str.contains('AP')
  3. & df['Client VLAN'].eq(1234), df['Interface']))

  1. Switch Interface Description Client VLAN Client MAC AP Name AP Intf AP MAC
  2. 0 SW1 Gi1/0/1 AP Port 1234 1234.1234.1234 AP1 Gi1/0/1 1234.1234.1234
  3. 1 SW1 Gi1/0/1 AP Port 2344 3210.3210.3210 NaN NaN NaN
  4. 2 SW1 Gi1/0/2 AP Port 1234 3456.3456.3456 AP2 Gi1/0/2 3456.3456.3456
  5. 3 SW2 Gi2/0/5 Computer 2343 6543.6543.6543 NaN NaN NaN
  6. 4 SW2 Gi2/0/8 Trunk 1234 2345.2345.2345 AP3 NaN 2345.2345.2345
  7. 5 SW3 Te1/1/1 AP Port 1234 7890.7890.7890 AP4 Te1/1/1 7890.7890.7890
  8. 6 SW4 Gi3/0/6 AP Port 1234 2345.2345.2345 AP3 Gi3/0/6 2345.2345.2345

huangapple
  • 本文由 发表于 2023年8月8日 23:04:32
  • 转载请务必保留本文链接:https://go.coder-hub.com/76860835.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定