June 29, 2023, 18:36:03

Update Dataframe on the basis of subset dataframe

Question

I have two dataframes, one of which is a subset of the other. For example:

data1 = [{'account': 'abc_test', 'value': 100, 'acc_id': '11'},
         {'account': 'def_test', 'value': 50, 'acc_id': '12'},
         {'account': 'ghi_badbad', 'value': 25, 'acc_id': '13'},
         {'account': 'hso', 'value': 22, 'acc_id': '14'},
         {'account': 'mko', 'value': 47.5, 'acc_id': '15'}]

data2 = [{'account': 'ghi_badbad', 'value': 25, 'acc_id': '13'},
         {'account': 'mko', 'value': 47.5, 'acc_id': '15'}]

Column acc_id is the index in both.

I need to update the value of account in data1 for every row whose acc_id also appears in data2.

What I tried: merging

dfx = pd.merge(df1, df2, indicator=True, how='outer', on='acc_id')
dfx.loc[dfx['_merge'] == 'both', 'account_x'] = 'xxx'

But then the output column names carry _x and _y suffixes:

  account_x  value_x acc_id   account_y  value_y     _merge
0  abc_test    100.0     11         NaN      NaN  left_only
1  def_test     50.0     12         NaN      NaN  left_only
2       xxx     25.0     13  ghi_badbad     25.0       both
3       hso     22.0     14         NaN      NaN  left_only
4       xxx     47.5     15         mko     47.5       both

I need output like this:

    account  value acc_id
0  abc_test  100.0     11
1  def_test   50.0     12
2       xxx   25.0     13
3       hso   22.0     14
4       xxx   47.5     15
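
For context, the _x and _y suffixes appear because account and value exist in both frames without being join keys, so the merge keeps both copies and renames them. The sketch below reproduces this on a cut-down version of the sample data; building the frames with pd.DataFrame is an assumption, since the question does not show that step.

```python
import pandas as pd

# Cut-down sample rebuilt from data1/data2 in the question (the use of
# pd.DataFrame here is an assumption; the question does not show it).
df1 = pd.DataFrame([
    {'account': 'abc_test', 'value': 100, 'acc_id': '11'},
    {'account': 'ghi_badbad', 'value': 25, 'acc_id': '13'},
])
df2 = pd.DataFrame([
    {'account': 'ghi_badbad', 'value': 25, 'acc_id': '13'},
])

# 'account' and 'value' are present in both frames but are not join keys,
# so merge keeps both copies and disambiguates them with suffixes.
with_suffixes = pd.merge(df1, df2, how='outer', on='acc_id')
print(list(with_suffixes.columns))

# Passing only the key column from df2 leaves nothing to disambiguate.
keys_only = pd.merge(df1, df2[['acc_id']], how='outer', on='acc_id')
print(list(keys_only.columns))
```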

Answer 1

Score: 0

If you only need to test a single column, Series.isin is your friend:

df1.loc[df1['acc_id'].isin(df2['acc_id']), 'account'] = 'xxx'
print(df1)
    account  value acc_id
0  abc_test  100.0     11
1  def_test   50.0     12
2       xxx   25.0     13
3       hso   22.0     14
4       xxx   47.5     15
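
As a self-contained sketch of the same idea, the snippet below rebuilds the frames from the question's data1 and data2 (constructing them with pd.DataFrame is an assumption) and uses Series.mask so that df1 itself is left untouched. The names in_subset and updated are illustrative only.

```python
import pandas as pd

data1 = [{'account': 'abc_test', 'value': 100, 'acc_id': '11'},
         {'account': 'def_test', 'value': 50, 'acc_id': '12'},
         {'account': 'ghi_badbad', 'value': 25, 'acc_id': '13'},
         {'account': 'hso', 'value': 22, 'acc_id': '14'},
         {'account': 'mko', 'value': 47.5, 'acc_id': '15'}]
data2 = [{'account': 'ghi_badbad', 'value': 25, 'acc_id': '13'},
         {'account': 'mko', 'value': 47.5, 'acc_id': '15'}]

# Assumption: the frames are built directly from the lists of dicts.
df1 = pd.DataFrame(data1)
df2 = pd.DataFrame(data2)

# Boolean mask: True where df1's acc_id also occurs in df2.
in_subset = df1['acc_id'].isin(df2['acc_id'])

# Non-mutating variant: Series.mask replaces values where the mask is True,
# and assign returns a new frame instead of writing into df1.
updated = df1.assign(account=df1['account'].mask(in_subset, 'xxx'))
print(updated)
```

Whether to assign in place with .loc or build a new frame is a matter of preference; the new-frame form can be easier to reason about when df1 is reused later.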

Your merge-based solution is the better choice if you need to match on multiple columns; just use a left join and select only the columns used for the join:

dfx = pd.merge(df1, df2[['acc_id']], indicator=True, how='left', on='acc_id')

# match by multiple columns
# dfx = pd.merge(df1, df2[['acc_id', 'another_id']], indicator=True, how='left', on=['acc_id', 'another_id'])

dfx.loc[dfx.pop('_merge') == 'both', 'account'] = 'xxx'
print(dfx)
    account  value acc_id
0  abc_test  100.0     11
1  def_test   50.0     12
2       xxx   25.0     13
3       hso   22.0     14
4       xxx   47.5     15
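
To make the multi-column case concrete, here is a minimal sketch on made-up data. The another_id column and its values are hypothetical, and the drop_duplicates call is an added precaution that is not part of the answer above: it keeps repeated key pairs in df2 from multiplying rows of df1 in the left join.

```python
import pandas as pd

# Hypothetical data: 'another_id' is an illustrative second key column.
df1 = pd.DataFrame({
    'account': ['abc_test', 'ghi_badbad', 'mko'],
    'value': [100.0, 25.0, 47.5],
    'acc_id': ['11', '13', '15'],
    'another_id': ['a', 'b', 'c'],
})
df2 = pd.DataFrame({
    'acc_id': ['13', '15'],
    'another_id': ['b', 'z'],  # ('15', 'z') does not match df1's ('15', 'c')
})

keys = ['acc_id', 'another_id']

# Left join on the key columns only, so no _x/_y suffixes are produced.
# drop_duplicates is an added safeguard against duplicated key pairs in df2.
dfx = pd.merge(df1, df2[keys].drop_duplicates(),
               indicator=True, how='left', on=keys)

# pop removes the helper '_merge' column while it is used as the row mask.
dfx.loc[dfx.pop('_merge') == 'both', 'account'] = 'xxx'
print(dfx)
```

Only the row whose acc_id and another_id both match is updated, which is the difference from testing acc_id alone with isin.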
