Update a DataFrame based on a subset DataFrame
Question
I have two DataFrames, one of which is a subset of the other.
Example:
data1 = [{'account': 'abc_test', 'value': 100, 'acc_id': '11'},
         {'account': 'def_test', 'value': 50, 'acc_id': '12'},
         {'account': 'ghi_badbad', 'value': 25, 'acc_id': '13'},
         {'account': 'hso', 'value': 22, 'acc_id': '14'},
         {'account': 'mko', 'value': 47.5, 'acc_id': '15'}]
data2 = [{'account': 'ghi_badbad', 'value': 25, 'acc_id': '13'},
         {'account': 'mko', 'value': 47.5, 'acc_id': '15'}]
The acc_id column is the key in both.
We need to update the value of account in data1 for every row whose acc_id also appears in data2.
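The snippets in this thread refer to df1 and df2, which are never defined in the post. A minimal setup sketch, assuming they are simply built from data1 and data2 with pd.DataFrame (that construction is an assumption, not something the post states):

```python
import pandas as pd

data1 = [{'account': 'abc_test', 'value': 100, 'acc_id': '11'},
         {'account': 'def_test', 'value': 50, 'acc_id': '12'},
         {'account': 'ghi_badbad', 'value': 25, 'acc_id': '13'},
         {'account': 'hso', 'value': 22, 'acc_id': '14'},
         {'account': 'mko', 'value': 47.5, 'acc_id': '15'}]
data2 = [{'account': 'ghi_badbad', 'value': 25, 'acc_id': '13'},
         {'account': 'mko', 'value': 47.5, 'acc_id': '15'}]

# Assumption: df1 and df2 are plain DataFrames built from the lists above,
# with acc_id kept as an ordinary column (not set as the index).
df1 = pd.DataFrame(data1)
df2 = pd.DataFrame(data2)

print(df1)
print(df2)
```

Note that acc_id holds strings here, so a comparison against integer IDs would not match.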
Options I tried: Merging
dfx = pd.merge(df1, df2, indicator=True, how='outer', on='acc_id')
dfx.loc[dfx['_merge'] == 'both', 'account_x'] = 'xxx'
Then I get output column names with _x and _y suffixes:
   account_x  value_x  acc_id   account_y  value_y     _merge
0   abc_test    100.0      11         NaN      NaN  left_only
1   def_test     50.0      12         NaN      NaN  left_only
2        xxx     25.0      13  ghi_badbad     25.0       both
3        hso     22.0      14         NaN      NaN  left_only
4        xxx     47.5      15         mko     47.5       both
I need output like this:
    account  value  acc_id
0  abc_test  100.0      11
1  def_test   50.0      12
2       xxx   25.0      13
3       hso   22.0      14
4       xxx   47.5      15
Answer 1
Score: 0
If you only need to test a single column, Series.isin is your friend:
df1.loc[df1['acc_id'].isin(df2['acc_id']), 'account'] = 'xxx'
print(df1)
    account  value  acc_id
0  abc_test  100.0      11
1  def_test   50.0      12
2       xxx   25.0      13
3       hso   22.0      14
4       xxx   47.5      15
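The one-liner above modifies df1 in place. If the original frame should stay untouched, the same isin idea can be wrapped in a small helper that works on a copy. This is only a sketch: the function name mask_accounts and its parameters are made up for illustration, and df1 and df2 are assumed to be built from the question's data.

```python
import pandas as pd


def mask_accounts(full, subset, key="acc_id", target="account", placeholder="xxx"):
    """Return a copy of `full` where `target` is overwritten with `placeholder`
    on every row whose `key` value also occurs in `subset[key]`."""
    out = full.copy()
    matches = out[key].isin(subset[key])  # boolean Series aligned with `out`
    out.loc[matches, target] = placeholder
    return out


# Assumed reconstruction of the question's data
df1 = pd.DataFrame({
    "account": ["abc_test", "def_test", "ghi_badbad", "hso", "mko"],
    "value": [100, 50, 25, 22, 47.5],
    "acc_id": ["11", "12", "13", "14", "15"],
})
df2 = pd.DataFrame({
    "account": ["ghi_badbad", "mko"],
    "value": [25, 47.5],
    "acc_id": ["13", "15"],
})

result = mask_accounts(df1, df2)
print(result)
```

Because the helper copies first, df1 still holds the original account names afterwards.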
Your merge-based solution is the better choice if you need to match on multiple columns. Just use a left join and pass only the columns used for the join from df2, so no _x/_y suffixes are created:
dfx = pd.merge(df1, df2[['acc_id']], indicator=True, how='left', on='acc_id')
# match by multiple columns
# dfx = pd.merge(df1, df2[['acc_id', 'another_id']], indicator=True, how='left', on=['acc_id', 'another_id'])
dfx.loc[dfx.pop('_merge') == 'both', 'account'] = 'xxx'
print(dfx)
    account  value  acc_id
0  abc_test  100.0      11
1  def_test   50.0      12
2       xxx   25.0      13
3       hso   22.0      14
4       xxx   47.5      15
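To make the commented-out multi-column variant concrete, here is a small sketch. The another_id column and its values are hypothetical (the question's data has no second key), and the drop_duplicates call is an extra precaution not in the answer: it keeps repeated key pairs in df2 from multiplying rows of df1 in the left join.

```python
import pandas as pd

# Question's data plus a hypothetical second key column, `another_id`
df1 = pd.DataFrame({
    "account": ["abc_test", "def_test", "ghi_badbad", "hso", "mko"],
    "value": [100, 50, 25, 22, 47.5],
    "acc_id": ["11", "12", "13", "14", "15"],
    "another_id": ["a", "a", "a", "b", "b"],
})
# Only ('13', 'a') matches a row of df1 on both keys; ('15', 'z') does not
df2 = pd.DataFrame({
    "acc_id": ["13", "15"],
    "another_id": ["a", "z"],
})

keys = ["acc_id", "another_id"]

# Left join on the key columns only, so no overlapping columns get suffixes
dfx = pd.merge(df1, df2[keys].drop_duplicates(), indicator=True, how="left", on=keys)

# `_merge` is 'both' where the key pair exists in df2; pop removes the helper column
dfx.loc[dfx.pop("_merge") == "both", "account"] = "xxx"
print(dfx)
```

Here only the row with acc_id '13' is overwritten, since '15' matches on acc_id but not on the second key.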