Adding new column to merged DataFrame based on pre-merged DataFrames
Question
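I have two DataFrames, df1 and df2. In my code I use pd.concat to find the differences between them.
df1 = pd.read_excel(latest_file, 0)
df2 = pd.read_excel(latest_file, 1)
# Reads the first and second sheets of the spreadsheet.
new_dataframe = pd.concat([df1, df2]).drop_duplicates(keep=False)
This works perfectly, but I want to know which rows come from df1 and which come from df2. To show this, I want to add a column to new_dataframe that says "Removed" if the row is from df1 and "Added" if it is from df2. I can't seem to find any documentation on how to do this. Thanks in advance for any help.
Edit: My current code removes the rows that are identical in both DataFrames. The solution still has to remove these common rows.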
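# Answer 1
**Score**: 1
Consider using `pd.merge` with `indicator=True` instead. This creates a new column named `_merge` that indicates whether each row came from the left DataFrame, the right DataFrame, or both. You can then relabel those values as "Removed" and "Added".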
```python
import pandas as pd

df1 = pd.DataFrame({'col1': [1, 2, 3, 4, 5]})
df2 = pd.DataFrame({'col1': [3, 4, 5, 6, 7]})
# Map the indicator values to the labels we want.
m = {'left_only': 'Removed', 'right_only': 'Added'}
# Outer-merge, drop rows present in both frames, then relabel the indicator.
new_dataframe = pd.merge(df1, df2, how='outer', indicator=True) \
    .query('_merge != "both"') \
    .replace({'_merge': m})
```

Output:

```
   col1   _merge
0     1  Removed
1     2  Removed
5     6    Added
6     7    Added
```
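
The `_merge` column that `indicator=True` produces is categorical, and some recent pandas versions may emit a deprecation warning when `replace` is used to change its categories. Here is a minimal sketch of the same idea that relabels with `map` instead, reusing the example frames from this answer. The names `labels`, `diff` and `status` are assumptions chosen for illustration, not part of the original answer.

```python
import pandas as pd

df1 = pd.DataFrame({'col1': [1, 2, 3, 4, 5]})
df2 = pd.DataFrame({'col1': [3, 4, 5, 6, 7]})

# Hypothetical label mapping for the two "one side only" indicator values.
labels = {'left_only': 'Removed', 'right_only': 'Added'}

merged = pd.merge(df1, df2, how='outer', indicator=True)
# Keep only rows that appear in exactly one of the two frames.
diff = merged[merged['_merge'] != 'both'].copy()
# Convert the categorical indicator to plain strings before mapping.
diff['status'] = diff['_merge'].astype(str).map(labels)
# Drop the raw indicator and renumber the remaining rows.
diff = diff.drop(columns='_merge').reset_index(drop=True)
```

Writing the labels to a separate `status` column leaves the indicator's categories untouched, so the result does not depend on how `replace` treats categorical data.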
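
If you would rather stay with the `pd.concat` approach from the question, one possible sketch is to tag each frame before concatenating and then drop duplicates on the original columns only. It assumes both sheets share the same columns and that neither already has a column named `status` (a name picked here for illustration, as is `compare_cols`).

```python
import pandas as pd

df1 = pd.DataFrame({'col1': [1, 2, 3, 4, 5]})
df2 = pd.DataFrame({'col1': [3, 4, 5, 6, 7]})

# Columns used to decide whether two rows are "the same".
compare_cols = list(df1.columns)

# Tag each row with its origin before stacking the frames.
tagged = pd.concat(
    [df1.assign(status='Removed'), df2.assign(status='Added')],
    ignore_index=True,
)
# keep=False drops every row whose compare_cols values occur more than once,
# which removes the rows common to both frames.
new_dataframe = tagged.drop_duplicates(subset=compare_cols, keep=False)
```

As with the original `concat` code, a row that is duplicated within a single frame is also dropped, so this sketch fits best when each sheet has no internal duplicates.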