在合并后的数据帧中添加新列,基于预先合并的数据帧。

huangapple go评论87阅读模式
英文:

Adding new column to merged DataFrame based on pre-merged DataFrames

问题

  1. 我有两个DataFramedf1df2在我的代码中我使用Pandas.concat方法来查找它们之间的差异

df1 = pd.read_excel(latest_file, 0)
df2 = pd.read_excel(latest_file, 1)

读取电子表格中的第一个和第二个工作表。

new_dataframe = pd.concat([df1, df2]).drop_duplicates(keep=False)

  1. 这个方法运行得很好,但是我想知道哪些行来自df1,哪些来自df2。为了显示这一点,我想在new_dataframe中添加一列,如果它来自df1,则在新列中写入"Removed",如果来自df2,则写入"Added"。我似乎找不到如何做到这一点的文档。在此先提前感谢任何帮助。
  2. 编辑:在我的当前代码中,它删除了每个DataFrame中相同的行。解决方案仍然需要删除共同的行。
  3. <details>
  4. <summary>英文:</summary>
  5. I have two DataFrames, df1 and df2. In my code I used Pandas.concat method to find the differences between them.

df1 = pd.read_excel(latest_file, 0)
df2 = pd.read_excel(latest_file, 1)
#Reads first and second sheet inside spreadsheet.

new_dataframe = pd.concat([df1,df2]).drop_duplicates(keep=False)

  1. This works perfectly, however I want to know which rows are coming from df1, and which are coming from df2. to show this I want to add a column to new_dataframe, if it&#39;s from df1 to say &quot;Removed&quot; in the new column, and to say &#39;Added&#39; if it&#39;s from df2. I can&#39;t seem to find any documentation on how to do this. Thanks in advance for any help.
  2. Edit: In my current code it removed all columns which are identical in each DataFrame. The solution has to still remove the common rows.
  3. </details>
  4. # 答案1
  5. **得分**: 1
  6. 考虑使用 `pd.merge` 并将 `indicator=True` 一同使用。这将创建一个名为 `_merge` 的新列,指示了值来自哪一列。您可以将其修改为表示 "Removed" "Added"
  7. ```python
  8. df1 = pd.DataFrame({'col1': [1, 2, 3, 4, 5]})
  9. df2 = pd.DataFrame({'col1': [3, 4, 5, 6, 7})
  10. m = {'left_only': 'Removed', 'right_only': 'Added'}
  11. new_dataframe = pd.merge(df1, df2, how='outer', indicator=True) \
  12. .query('_merge != "both"') \
  13. .replace({'_merge': m})

输出结果:

  1. col1 _merge
  2. 0 1 Removed
  3. 1 2 Removed
  4. 5 6 Added
  5. 6 7 Added
英文:

Consider using pd.merge with indicator=True instead. This will create a new column named _merge that indicates which value came from which column. You can modify this to say Removed and Added

  1. df1 = pd.DataFrame({&#39;col1&#39;: [1,2,3,4,5]})
  2. df2 = pd.DataFrame({&#39;col1&#39;: [3,4,5,6,7]})
  3. m = {&#39;left_only&#39;: &#39;Removed&#39;, &#39;right_only&#39;: &#39;Added&#39;}
  4. new_dataframe = pd.merge(df1, df2, how=&#39;outer&#39;, indicator=True) \
  5. .query(&#39;_merge != &quot;both&quot;&#39;) \
  6. .replace({&#39;_merge&#39;: m})

Output:

  1. col1 _merge
  2. 0 1 Removed
  3. 1 2 Removed
  4. 5 6 Added
  5. 6 7 Added

huangapple
  • 本文由 发表于 2023年2月13日 23:30:34
  • 转载请务必保留本文链接:https://go.coder-hub.com/75437965.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定