Merge two dataframes where requirements are met
Question
I have two DataFrames that I am trying to merge where certain criteria are met.
I want to merge DF2 into DF1 where 'Client MAC' matches 'AP MAC', but only where 'Description' contains 'AP' and 'Client VLAN' is equal to '1234'. I then want to populate the 'AP Intf' (AP interface) column with the corresponding 'Interface'.
My actual DF1 has over 10k rows, and DF2 typically has 300 rows or fewer.
DF1:
Switch | Interface | Description | Client VLAN | Client MAC |
---|---|---|---|---|
SW1 | Gi1/0/1 | AP Port | 1234 | 1234.1234.1234 |
SW1 | Gi1/0/1 | AP Port | 2344 | 3210.3210.3210 |
SW1 | Gi1/0/2 | AP Port | 1234 | 3456.3456.3456 |
SW2 | Gi2/0/5 | Computer | 2343 | 6543.6543.6543 |
SW2 | Gi2/0/8 | Trunk | 1234 | 2345.2345.2345 |
SW3 | Te1/1/1 | AP Port | 1234 | 7890.7890.7890 |
SW4 | Gi3/0/6 | AP Port | 1234 | 2345.2345.2345 |
DF2:
AP Name | AP Intf | AP MAC |
---|---|---|
AP1 | | 1234.1234.1234 |
AP2 | | 3456.3456.3456 |
AP3 | | 2345.2345.2345 |
AP4 | | 7890.7890.7890 |
Desired output:
Switch | Interface | Description | Client VLAN | Client MAC | AP Name | AP Intf | AP MAC |
---|---|---|---|---|---|---|---|
SW1 | Gi1/0/1 | AP Port | 1234 | 1234.1234.1234 | AP1 | Gi1/0/1 | 1234.1234.1234 |
SW1 | Gi1/0/1 | AP Port | 2344 | 3210.3210.3210 | |||
SW1 | Gi1/0/2 | AP Port | 1234 | 3456.3456.3456 | AP2 | Gi1/0/2 | 3456.3456.3456 |
SW2 | Gi2/0/5 | Computer | 2343 | 6543.6543.6543 | |||
SW2 | Gi2/0/8 | Trunk | 1234 | 2345.2345.2345 | |||
SW3 | Te1/1/1 | AP Port | 1234 | 7890.7890.7890 | AP4 | Te1/1/1 | 7890.7890.7890 |
SW4 | Gi3/0/6 | AP Port | 1234 | 2345.2345.2345 | AP3 | Gi3/0/6 | 2345.2345.2345 |
This is the code I'm using, which gives the output below:
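```python
# Number each repeated MAC 0, 1, ... in order of appearance
def helper(data, col):
    return data.groupby(col).cumcount()
df1['Client MAC'] = df1['Client MAC'].fillna("")
df2['AP MAC'] = df2['AP MAC'].fillna("")
# Join on (MAC, occurrence number); the unnamed cumcount key comes back as 'key_1'
df = df1.merge(df2, left_on=['Client MAC', helper(df1, ['Client MAC'])],
               right_on=['AP MAC', helper(df2, ['AP MAC'])], how='left').drop(columns='key_1')
```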
Current output:
Switch | Interface | Description | Client VLAN | Client MAC | AP Name | AP Intf | AP MAC |
---|---|---|---|---|---|---|---|
SW1 | Gi1/0/1 | AP Port | 1234 | 1234.1234.1234 | AP1 | Gi1/0/1 | 1234.1234.1234 |
SW1 | Gi1/0/1 | AP Port | 2344 | 3210.3210.3210 | |||
SW1 | Gi1/0/2 | AP Port | 1234 | 3456.3456.3456 | AP2 | Gi1/0/2 | 3456.3456.3456 |
SW2 | Gi2/0/5 | Computer | 2343 | 6543.6543.6543 | |||
SW2 | Gi2/0/8 | Trunk | 1234 | 2345.2345.2345 | AP3 | Gi3/0/6 | 2345.2345.2345 |
SW3 | Te1/1/1 | AP Port | 1234 | 7890.7890.7890 | AP4 | Te1/1/1 | 7890.7890.7890 |
SW4 | Gi3/0/6 | AP Port | 1234 | 2345.2345.2345 |
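
The Trunk row picking up AP3 is consistent with how the `cumcount` helper pairs rows: it numbers repeated MACs in order of appearance, so the first `2345.2345.2345` row in DF1 (SW2 Trunk) is paired with the single AP3 row in DF2, and the later SW4 row finds no partner. The helper never looks at 'Description' or 'Client VLAN'. The sketch below rebuilds a trimmed version of the sample data (assuming the MACs live in 'AP MAC') and shows the occurrence numbers; `client_seq` and `ap_seq` are illustrative column names, not part of the original code.

```python
import pandas as pd

# Trimmed sample data from the question (only columns relevant to the pairing).
df1 = pd.DataFrame({
    "Switch": ["SW1", "SW1", "SW1", "SW2", "SW2", "SW3", "SW4"],
    "Description": ["AP Port", "AP Port", "AP Port", "Computer", "Trunk", "AP Port", "AP Port"],
    "Client MAC": ["1234.1234.1234", "3210.3210.3210", "3456.3456.3456",
                   "6543.6543.6543", "2345.2345.2345", "7890.7890.7890",
                   "2345.2345.2345"],
})
df2 = pd.DataFrame({
    "AP Name": ["AP1", "AP2", "AP3", "AP4"],
    "AP MAC": ["1234.1234.1234", "3456.3456.3456", "2345.2345.2345", "7890.7890.7890"],
})

# Hypothetical helper columns: 0 for the first row with a given MAC, 1 for the second, ...
df1["client_seq"] = df1.groupby("Client MAC").cumcount()
df2["ap_seq"] = df2.groupby("AP MAC").cumcount()

# Same pairing logic as the helper-based merge: (MAC, occurrence) must match.
paired = df1.merge(df2, left_on=["Client MAC", "client_seq"],
                   right_on=["AP MAC", "ap_seq"], how="left")
print(paired[["Switch", "Description", "Client MAC", "client_seq", "AP Name"]])
```

Because only the MAC and its occurrence number take part in the join, a non-AP port that happens to appear first will always claim the AP.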
Answer 1
Score: 3
With `merge` + `mask` operations:
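```python
# Plain left merge on the MAC address
df = df1.merge(df2, left_on='Client MAC', right_on='AP MAC', how='left')
# On AP ports in VLAN 1234, replace 'AP Intf' with the switch 'Interface'
df['AP Intf'] = (df['AP Intf'].mask(df['Description'].str.contains('AP')
                                    & df['Client VLAN'].eq(1234), df['Interface']))
```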
```
  Switch Interface Description Client VLAN Client MAC AP Name AP Intf AP MAC
0 SW1 Gi1/0/1 AP Port 1234 1234.1234.1234 AP1 Gi1/0/1 1234.1234.1234
1 SW1 Gi1/0/1 AP Port 2344 3210.3210.3210 NaN NaN NaN
2 SW1 Gi1/0/2 AP Port 1234 3456.3456.3456 AP2 Gi1/0/2 3456.3456.3456
3 SW2 Gi2/0/5 Computer 2343 6543.6543.6543 NaN NaN NaN
4 SW2 Gi2/0/8 Trunk 1234 2345.2345.2345 AP3 NaN 2345.2345.2345
5 SW3 Te1/1/1 AP Port 1234 7890.7890.7890 AP4 Te1/1/1 7890.7890.7890
6 SW4 Gi3/0/6 AP Port 1234 2345.2345.2345 AP3 Gi3/0/6 2345.2345.2345
```
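
The `mask` step only rewrites 'AP Intf', so the SW2 Trunk row in the output above still carries AP3's name and MAC, while the desired output leaves all three DF2 columns empty there. A variation on the answer (a swapped-in filter-then-merge technique, not the answerer's code) is to apply the conditions before merging, so only qualifying DF1 rows can pick up an AP. This sketch assumes 'Client VLAN' is numeric, that each MAC appears at most once in DF2, and that 'AP Intf' is filled from 'Interface' after the merge; `eligible` and `matched` are illustrative names.

```python
import pandas as pd

# Sample data from the question.
df1 = pd.DataFrame({
    "Switch": ["SW1", "SW1", "SW1", "SW2", "SW2", "SW3", "SW4"],
    "Interface": ["Gi1/0/1", "Gi1/0/1", "Gi1/0/2", "Gi2/0/5", "Gi2/0/8", "Te1/1/1", "Gi3/0/6"],
    "Description": ["AP Port", "AP Port", "AP Port", "Computer", "Trunk", "AP Port", "AP Port"],
    "Client VLAN": [1234, 2344, 1234, 2343, 1234, 1234, 1234],
    "Client MAC": ["1234.1234.1234", "3210.3210.3210", "3456.3456.3456",
                   "6543.6543.6543", "2345.2345.2345", "7890.7890.7890",
                   "2345.2345.2345"],
})
df2 = pd.DataFrame({
    "AP Name": ["AP1", "AP2", "AP3", "AP4"],
    "AP MAC": ["1234.1234.1234", "3456.3456.3456", "2345.2345.2345", "7890.7890.7890"],
})

# Rows allowed to match an AP: descriptions containing 'AP' on VLAN 1234.
eligible = df1["Description"].str.contains("AP", regex=False) & df1["Client VLAN"].eq(1234)

# Merge only the eligible rows, carrying df1's original index along in an 'index' column.
matched = (
    df1[eligible]
    .reset_index()
    .merge(df2, left_on="Client MAC", right_on="AP MAC", how="inner")
    .set_index("index")
)
matched["AP Intf"] = matched["Interface"]

# Left-join the AP columns back onto every df1 row; the rest get NaN.
df = df1.join(matched[["AP Name", "AP Intf", "AP MAC"]])
print(df)
```

If 'Client VLAN' is read in as text (for example with `pd.read_csv(..., dtype=str)`), compare against `'1234'` instead of `1234`. Filtering first also means the merge only touches the eligible part of the 10k-row DF1.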