Update a DataFrame based on a subset DataFrame

Question

I have two DataFrames, one of which is a subset of the other.
For example:

import pandas as pd

data1 = [{'account': 'abc_test', 'value': 100, 'acc_id': '11'},
         {'account': 'def_test', 'value': 50, 'acc_id': '12'},
         {'account': 'ghi_badbad', 'value': 25, 'acc_id': '13'},
         {'account': 'hso', 'value': 22, 'acc_id': '14'},
         {'account': 'mko', 'value': 47.5, 'acc_id': '15'}]

data2 = [{'account': 'ghi_badbad', 'value': 25, 'acc_id': '13'},
         {'account': 'mko', 'value': 47.5, 'acc_id': '15'}]

# Assumed setup (not shown in the original post): build the two DataFrames
df1 = pd.DataFrame(data1)
df2 = pd.DataFrame(data2)

The acc_id column is the key in both DataFrames.

I need to update the account value in data1 for the rows whose acc_id also appears in data2 (setting it to 'xxx', as in the expected output below).

What I tried: merging

dfx = pd.merge(df1, df2, indicator=True, how='outer', on='acc_id')
dfx.loc[dfx['_merge'] == 'both', 'account_x'] = 'xxx'

But the output then has column names with _x and _y suffixes:

  account_x  value_x acc_id   account_y  value_y     _merge
0  abc_test    100.0     11         NaN      NaN  left_only
1  def_test     50.0     12         NaN      NaN  left_only
2       xxx     25.0     13  ghi_badbad     25.0       both
3       hso     22.0     14         NaN      NaN  left_only
4       xxx     47.5     15         mko     47.5       both

I need output like this:

    account  value acc_id
0  abc_test  100.0     11
1  def_test   50.0     12
2       xxx   25.0     13
3       hso   22.0     14
4       xxx   47.5     15

Answer 1

Score: 0

If you only need to test one column, Series.isin is your friend:

df1.loc[df1['acc_id'].isin(df2['acc_id']), 'account'] = 'xxx'
print(df1)
    account  value acc_id
0  abc_test  100.0     11
1  def_test   50.0     12
2       xxx   25.0     13
3       hso   22.0     14
4       xxx   47.5     15
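
For reference, here is a self-contained sketch of the isin approach. It assumes df1 and df2 are built from the question's data1 and data2 lists with pd.DataFrame, a step the question does not show. Writing into a copy named result is an addition here, so that df1 itself stays unchanged.

```python
import pandas as pd

data1 = [{'account': 'abc_test', 'value': 100, 'acc_id': '11'},
         {'account': 'def_test', 'value': 50, 'acc_id': '12'},
         {'account': 'ghi_badbad', 'value': 25, 'acc_id': '13'},
         {'account': 'hso', 'value': 22, 'acc_id': '14'},
         {'account': 'mko', 'value': 47.5, 'acc_id': '15'}]

data2 = [{'account': 'ghi_badbad', 'value': 25, 'acc_id': '13'},
         {'account': 'mko', 'value': 47.5, 'acc_id': '15'}]

# Assumption: the question's df1/df2 are plain DataFrames built from the lists.
df1 = pd.DataFrame(data1)
df2 = pd.DataFrame(data2)

# Boolean mask: True for rows of df1 whose acc_id also occurs in df2.
mask = df1['acc_id'].isin(df2['acc_id'])

# Work on a copy so that df1 is not modified (illustrative choice).
result = df1.copy()
result.loc[mask, 'account'] = 'xxx'

print(result)
```

Note that isin compares values exactly, so acc_id needs the same dtype in both frames. Here both are strings; the string '13' would not match the integer 13.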

Your merge approach is better if you need to match on multiple columns. Use only a left join, and pass just the join columns from df2 so that no _x/_y suffixed columns are created:

dfx = pd.merge(df1, df2[['acc_id']], indicator=True, how='left', on='acc_id')

# match on multiple columns
# dfx = pd.merge(df1, df2[['acc_id', 'another_id']], indicator=True, how='left', on=['acc_id', 'another_id'])

dfx.loc[dfx.pop('_merge') == 'both', 'account'] = 'xxx'
print(dfx)
    account  value acc_id
0  abc_test  100.0     11
1  def_test   50.0     12
2       xxx   25.0     13
3       hso   22.0     14
4       xxx   47.5     15
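
The multi-column variant can be sketched as follows. The another_id column and its values are hypothetical, invented only to illustrate a second key; the question's data has no such column. The drop_duplicates call is also an addition: a left merge repeats a row of df1 once for every matching row in df2, so de-duplicating the key columns first keeps the result the same length as df1.

```python
import pandas as pd

# Hypothetical data: 'another_id' is an illustrative second key column,
# not part of the original question.
df1 = pd.DataFrame({
    'account': ['abc_test', 'def_test', 'ghi_badbad', 'hso', 'mko'],
    'value': [100, 50, 25, 22, 47.5],
    'acc_id': ['11', '12', '13', '14', '15'],
    'another_id': ['a', 'a', 'b', 'a', 'b'],
})
df2 = pd.DataFrame({
    'acc_id': ['13', '15', '15', '14'],
    'another_id': ['b', 'b', 'b', 'z'],  # ('15', 'b') repeats; ('14', 'z') matches nothing
})

keys = ['acc_id', 'another_id']

# Keep only the join columns and drop repeated key pairs so the
# left merge cannot duplicate rows of df1.
right = df2[keys].drop_duplicates()

dfx = df1.merge(right, on=keys, how='left', indicator=True)

# '_merge' is 'both' where the full key pair was found in df2;
# pop() removes the helper column while returning it for the comparison.
dfx.loc[dfx.pop('_merge') == 'both', 'account'] = 'xxx'

print(dfx)
```

Row 3 (acc_id '14') is left alone because only one of its two keys matches, which is the behaviour that distinguishes this from testing acc_id alone.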

Posted by huangapple on June 29, 2023, 18:36:03
Source: https://go.coder-hub.com/76580233.html