在合并后的数据帧中添加新列,基于预先合并的数据帧。

huangapple go评论51阅读模式
英文:

Adding new column to merged DataFrame based on pre-merged DataFrames

问题

我有两个DataFramedf1和df2在我的代码中我使用Pandas.concat方法来查找它们之间的差异

df1 = pd.read_excel(latest_file, 0)
df2 = pd.read_excel(latest_file, 1)

读取电子表格中的第一个和第二个工作表。

new_dataframe = pd.concat([df1, df2]).drop_duplicates(keep=False)


这个方法运行得很好,但是我想知道哪些行来自df1,哪些来自df2。为了显示这一点,我想在new_dataframe中添加一列,如果它来自df1,则在新列中写入"Removed",如果来自df2,则写入"Added"。我似乎找不到如何做到这一点的文档。在此先提前感谢任何帮助。

编辑:在我的当前代码中,它删除了每个DataFrame中相同的行。解决方案仍然需要删除共同的行。

<details>
<summary>英文:</summary>

I have two DataFrames, df1 and df2. In my code I used Pandas.concat method to find the differences between them. 

df1 = pd.read_excel(latest_file, 0)
df2 = pd.read_excel(latest_file, 1)
#Reads first and second sheet inside spreadsheet.

new_dataframe = pd.concat([df1,df2]).drop_duplicates(keep=False)


This works perfectly, however I want to know which rows are coming from df1, and which are coming from df2. to show this I want to add a column to new_dataframe, if it&#39;s from df1 to say &quot;Removed&quot; in the new column, and to say &#39;Added&#39; if it&#39;s from df2. I can&#39;t seem to find any documentation on how to do this. Thanks in advance for any help.

Edit: In my current code it removed all columns which are identical in each DataFrame. The solution has to still remove the common rows.

</details>


# 答案1
**得分**: 1

考虑使用 `pd.merge` 并将 `indicator=True` 一同使用。这将创建一个名为 `_merge` 的新列,指示了值来自哪一列。您可以将其修改为表示 "Removed" 和 "Added"。

```python
df1 = pd.DataFrame({'col1': [1, 2, 3, 4, 5]})
df2 = pd.DataFrame({'col1': [3, 4, 5, 6, 7})

m = {'left_only': 'Removed', 'right_only': 'Added'}

new_dataframe = pd.merge(df1, df2, how='outer', indicator=True) \
                  .query('_merge != "both"') \
                  .replace({'_merge': m})

输出结果:

   col1   _merge
0     1  Removed
1     2  Removed
5     6    Added
6     7    Added
英文:

Consider using pd.merge with indicator=True instead. This will create a new column named _merge that indicates which value came from which column. You can modify this to say Removed and Added

df1 = pd.DataFrame({&#39;col1&#39;: [1,2,3,4,5]})
df2 = pd.DataFrame({&#39;col1&#39;: [3,4,5,6,7]})

m = {&#39;left_only&#39;: &#39;Removed&#39;, &#39;right_only&#39;: &#39;Added&#39;}

new_dataframe = pd.merge(df1, df2, how=&#39;outer&#39;, indicator=True) \
                  .query(&#39;_merge != &quot;both&quot;&#39;) \ 
                  .replace({&#39;_merge&#39;: m})

Output:

   col1   _merge
0     1  Removed
1     2  Removed
5     6    Added
6     7    Added

huangapple
  • 本文由 发表于 2023年2月13日 23:30:34
  • 转载请务必保留本文链接:https://go.coder-hub.com/75437965.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定