Update one Pandas dataframe from another and append rows if needed

Question

I have the following dataframes in Pandas:

df1:
index  column
1         A1
2         A2

df2:
index  column
2         A2_new
3         A3

I want to get the result:

index  column
1         A1
2         A2_new
3         A3

How can I achieve this?

df1.update(df2) does not help, because it only changes rows that already exist in df1, and I want the row with index 3 to appear in the result.
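To make the limitation concrete, here is a minimal sketch of what update() does with the two frames from the question. The frames are rebuilt from the tables above; the name `updated` is introduced here only for illustration.

```python
import pandas as pd

df1 = pd.DataFrame({"column": ["A1", "A2"]}, index=[1, 2])
df2 = pd.DataFrame({"column": ["A2_new", "A3"]}, index=[2, 3])

# update() works in place and aligns df2 to df1's existing index,
# so run it on a copy (hypothetical name) to leave df1 untouched.
updated = df1.copy()
updated.update(df2)

# Row 2 is overwritten, but row 3 never appears because
# label 3 is not part of df1's index.
print(updated)
```

The row for index 2 picks up the new value, while index 3 is silently dropped, which is why a different method is needed.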

Answer 1

Score: 1

Example

import pandas as pd

df1 = pd.DataFrame(['A1', 'A2'], columns=['column'], index=[1, 2])
df2 = pd.DataFrame(['A2_new', 'A3'], columns=['column'], index=[2, 3])

df1

   column
1      A1
2      A2

df2

   column
2  A2_new
3      A3

Code

df2.combine_first(df1)

Output

   column
1      A1
2  A2_new
3      A3
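One caveat worth knowing: combine_first takes the caller's value only where it is not null, and otherwise falls back to the other frame. The sketch below repeats the answer's call and then adds a hypothetical variant, `df2_with_nan`, which is not part of the original question, to show that a missing value in df2 does not overwrite df1.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({"column": ["A1", "A2"]}, index=[1, 2])
df2 = pd.DataFrame({"column": ["A2_new", "A3"]}, index=[2, 3])

# Values come from df2 where present; df1 fills the remaining labels.
# The result index is the union of both indexes.
result = df2.combine_first(df1)
print(result)

# Hypothetical variant: df2 holds a missing value at index 2.
df2_with_nan = pd.DataFrame({"column": [np.nan, "A3"]}, index=[2, 3])

# The NaN is treated as "no value", so df1's 'A2' survives at index 2.
result_nan = df2_with_nan.combine_first(df1)
print(result_nan)
```

If a NaN in df2 should replace the old value, a concat-based approach that keeps the last row per index label is likely a better fit.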

Answer 2

Score: 0

@Ars ML
You can concatenate the two DataFrames vertically and drop duplicates on the 'index' column (a regular column in this example), keeping only the last occurrence of each index value:

df1 = pd.DataFrame({'index': [1, 2], 'column': ['A1', 'A2']})
df2 = pd.DataFrame({'index': [2, 3], 'column': ['A2_new', 'A3']})

merged_df = pd.concat([df1, df2]).drop_duplicates(subset=['index'], keep='last')
merged_df.set_index('index', inplace=True)

This produces the desired output:

       column
index
1          A1
2      A2_new
3          A3

You can also use merge; it is more involved but produces the same result.

merge_chain = pd.merge(df1, df2, on='index', how='outer') \
                .assign(column=lambda x: x['column_y'].fillna(x['column_x'])) \
                .drop(['column_x', 'column_y'], axis=1) \
                .set_index('index')
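The merge chain is easier to follow when split into steps. The sketch below is an unrolled version of the same idea, assuming the df1 and df2 defined in this answer; the intermediate names `joined` and `result` are introduced here for illustration.

```python
import pandas as pd

df1 = pd.DataFrame({"index": [1, 2], "column": ["A1", "A2"]})
df2 = pd.DataFrame({"index": [2, 3], "column": ["A2_new", "A3"]})

# Step 1: the outer join keeps every key from both frames. The shared
# name 'column' receives the default suffixes: column_x (from df1)
# and column_y (from df2).
joined = pd.merge(df1, df2, on="index", how="outer")

# Step 2: prefer df2's value and fall back to df1's where df2 has none.
joined["column"] = joined["column_y"].fillna(joined["column_x"])

# Step 3: drop the helper columns and use 'index' as the row index.
result = joined.drop(columns=["column_x", "column_y"]).set_index("index")
print(result)
```

Because fillna is used in step 2, a missing value in df2 falls back to df1's value here, unlike the concat approach shown earlier in this answer.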

Answer 3

Score: 0

Another possible solution:

out = pd.concat([df1, df2])
out[~out.index.duplicated(keep='last')]

Output:

   column
1      A1
2  A2_new
3      A3
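If this pattern is needed in several places, it can be wrapped in a small helper. The sketch below assumes both frames carry the key in their index, as in the question; the function name `upsert` and the final sort_index() call are additions made here, not part of the answer.

```python
import pandas as pd


def upsert(base: pd.DataFrame, new: pd.DataFrame) -> pd.DataFrame:
    """Replace rows of `base` whose index label appears in `new`,
    and append the rows of `new` whose label is not in `base`."""
    combined = pd.concat([base, new])
    # keep='last' retains the row from `new` when a label is in both frames.
    deduped = combined[~combined.index.duplicated(keep="last")]
    # Sorting is optional; it restores label order when `new` holds
    # labels smaller than some labels in `base`.
    return deduped.sort_index()


df1 = pd.DataFrame({"column": ["A1", "A2"]}, index=[1, 2])
df2 = pd.DataFrame({"column": ["A2_new", "A3"]}, index=[2, 3])

out = upsert(df1, df2)
print(out)
```

Note that this approach replaces whole rows, so a NaN in df2 overwrites the old value, whereas combine_first would keep the value from df1.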

huangapple
  • Posted on May 7, 2023, 17:46:06
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76193161.html