pandas:顺序合并会添加新列而不是替换NaN值。

huangapple go评论59阅读模式
英文:

pandas: sequential merge adds new columns instead of replacing NaN values

问题

你可以使用suffixes参数来指定合并时重名列的后缀。这样,你可以避免生成*_x和*_y这样的重复列名。以下是修改后的代码:

df_all = df_all.merge(df_additional_info1, how="left", on="uid", suffixes=("", "_additional1"))
df_all = df_all.merge(df_additional_info2, how="outer", on="uid", suffixes=("", "_additional2"))

这将给你所需的结果,使用空字符串作为后缀来避免生成重复的列名。

英文:
import pandas as pd

df_all = pd.DataFrame(columns=["uid", "a", "b"], data=[["uid1", 12, 15],
                                                    ["uid2", 13, 16],
                                                    ["uid3", 14, 17],
                                                    ["uid4", 15, 18]])

df_additional_info1 = pd.DataFrame(columns=["uid", "c", "d"], data=[["uid1", 12, 15],
                                                                   ["uid3", 14, 17]])

df_additional_info2 = pd.DataFrame(columns=["uid", "c", "d"], data=[["uid2", 12, 15]])

I need to merge df_all twice with additional information. First with df_additional_info1, then with df_additional_info2 and so on. They will always contain additional info for already existing rows and only for those rows that were not yet updated.

When I do following:

df_all = df_all.merge(df_additional_info1, how="left", on="uid")
df_all = df_all.merge(df_additional_info2, how="outer", on="uid")

I get duplicated columns (*_x, *_y):

pandas:顺序合并会添加新列而不是替换NaN值。

But I need this
pandas:顺序合并会添加新列而不是替换NaN值。

Any suggestion?

答案1

得分: 1

另一种可能的解决方案:

(pd.concat([df_all.set_index('uid'), df_additional_info1.set_index('uid'),
            df_additional_info2.set_index('uid')])
 .stack().unstack().reset_index())

或者,

(df_additional_info2.set_index('uid')
 .combine_first(
     pd.concat([df_all.set_index('uid'), 
                df_additional_info1.set_index('uid')], axis=1))
 .reset_index())

解释:

该代码首先将 uid 设置为所有数据帧的索引,便于合并操作。它使用 pd.concat() 沿着列轴连接 df_alldf_additional_info1,将 df_additional_info1 的列附加到 df_all,创建一个包含两者所有列的数据帧,对于 df_additional_info1 中缺少的 uid,用 NaN 填充。然后,combine_first() 用连接后的数据帧中的值替换 df_additional_info2 中的任何空值,实质上是使用 df_alldf_additional_info1 的数据填充 df_additional_info2 中的空白,基于 uid。最后一步,reset_index()uid 移回普通列并重新建立默认整数索引,生成一个合并的数据帧,包含所有可用的附加信息。


输出:

    uid   a   b     c     d
0  uid1  12  15  12.0  15.0
1  uid2  13  16  12.0  15.0
2  uid3  14  17  14.0  17.0
3  uid4  15  18   NaN   NaN
英文:

Another possible solution:

(pd.concat([df_all.set_index('uid'), df_additional_info1.set_index('uid'),
            df_additional_info2.set_index('uid')])
 .stack().unstack().reset_index())

Alternatively,

(df_additional_info2.set_index('uid')
 .combine_first(
     pd.concat([df_all.set_index('uid'), 
                df_additional_info1.set_index('uid')], axis=1))
 .reset_index())

EXPLANATION:

The code first sets uid as the index for all DataFrames, facilitating the merging operations. It concatenates df_all and df_additional_info1 along the column axis using pd.concat(), appending df_additional_info1's columns to df_all and creating a DataFrame containing all columns from both, filling with NaN for uids absent in df_additional_info1. combine_first() then replaces any null values in df_additional_info2 with values from the concatenated DataFrame, essentially filling in df_additional_info2's gaps with data from df_all and df_additional_info1, based on uid. The final step, reset_index(), moves uid back to a regular column and reestablishes the default integer index, producing a merged DataFrame with all available additional information.


Output:

    uid   a   b     c     d
0  uid1  12  15  12.0  15.0
1  uid2  13  16  12.0  15.0
2  uid3  14  17  14.0  17.0
3  uid4  15  18   NaN   NaN

答案2

得分: 0

你可以使用.concat连接额外的信息帧:

df_all.merge(
   pd.concat([df_additional_info1, df_additional_info2]),
   how="left"
)
    uid   a   b     c     d
0  uid1  12  15  12.0  15.0
1  uid2  13  16  12.0  15.0
2  uid3  14  17  14.0  17.0
3  uid4  15  18   NaN   NaN
英文:

You can .concat the additional info frames:

df_all.merge(
   pd.concat([df_additional_info1, df_additional_info2]),
   how = "left"
)
    uid   a   b     c     d
0  uid1  12  15  12.0  15.0
1  uid2  13  16  12.0  15.0
2  uid3  14  17  14.0  17.0
3  uid4  15  18   NaN   NaN

huangapple
  • 本文由 发表于 2023年6月8日 15:49:40
  • 转载请务必保留本文链接:https://go.coder-hub.com/76429688.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定