Python - Look up data in another sheet and copy to a new file if not found

Question

I am new to programming and I am trying to learn. I am comparing two documents that contain very similar data. I want to find out whether each value in the "concatenate" column of one document also appears in the "concatenate" column of the other, because I want to see what changed during the last update of the file.
If a value cannot be found, the whole row should be copied to a new document. That way I know that this row has changed.

Here is the code I have:

```python
import pandas as pd

# load the data from the two files into pandas DataFrames
df1 = pd.read_excel('/Users/bjoern/Desktop/PythonProjects/Comparison/MergedKeepa_2023-02-05.xlsx')
df2 = pd.read_excel('/Users/bjoern/Desktop/PythonProjects/Comparison/MergedKeepa_2023-02-04.xlsx')

# extract the values from the "concatenate" column in both DataFrames
col_a_df1 = df1['concatenate']
col_a_df2 = df2['concatenate']

# boolean mask: True for each row of df1 whose "concatenate" value also appears in df2
intersection = col_a_df1.isin(col_a_df2)

# filter the rows of df1 where the "concatenate" value is not found in df2
df1 = df1[intersection]

# write the filtered data to a new Excel file
df1.to_excel('/Users/bjoern/Desktop/PythonProjects/Comparison/filtered_data.xlsx', index=False)
```

I duplicated the two input files, which means I should get an empty document, but rows are still being copied to the new sheet.
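A minimal reproduction of this situation, assuming two identical in-memory DataFrames stand in for the duplicated Excel files (the `concatenate` and `price` values below are made up for illustration):

```python
import pandas as pd

# Two identical frames stand in for the duplicated Excel files (hypothetical data).
df1 = pd.DataFrame({"concatenate": ["A1", "B2", "C3"], "price": [10, 20, 30]})
df2 = df1.copy()

# Same mask as in the script above: True where the value in df1 also appears in df2.
intersection = df1["concatenate"].isin(df2["concatenate"])
print(intersection.tolist())  # [True, True, True]

# Indexing with the mask as-is keeps every row whose value WAS found.
filtered = df1[intersection]
print(len(filtered))  # 3
```

With identical inputs every value is found, so the mask is all `True` and all three rows end up in the output.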

What did I do wrong?

Many thanks for your support!

Answer 1

Score: 0

> If the value cannot be found, this whole row should be copied to a new
> document.

IIUC, you need `~` (the NOT operator) to negate your boolean mask:

```python
df1 = df1[~intersection]
```
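A minimal sketch of the script with this fix applied, using in-memory data instead of the Excel files so it runs on its own. The helper name `rows_missing_from` and the sample values are hypothetical, introduced only for illustration:

```python
import pandas as pd


def rows_missing_from(new: pd.DataFrame, old: pd.DataFrame, key: str = "concatenate") -> pd.DataFrame:
    """Return the rows of `new` whose `key` value does not appear anywhere in old[key]."""
    found = new[key].isin(old[key])  # True where the value exists in `old`
    return new[~found]               # ~ flips the mask: keep only the rows NOT found


# Hypothetical data: identical frames, as with the duplicated input files.
old = pd.DataFrame({"concatenate": ["A1", "B2", "C3"]})
new = old.copy()
print(len(rows_missing_from(new, old)))  # 0 -> empty result, as expected

# Hypothetical data: one value changed in the newer file.
new_changed = pd.DataFrame({"concatenate": ["A1", "B2", "C9"]})
print(rows_missing_from(new_changed, old)["concatenate"].tolist())  # ['C9']
```

With the real files, the helper would be applied to the frames loaded by `pd.read_excel`, and the result written with `to_excel(..., index=False)` as in the question (writing `.xlsx` requires an Excel engine such as openpyxl). Swapping the arguments, e.g. `rows_missing_from(df2, df1)`, would instead list rows that exist only in the older file.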

  • Posted on February 6, 2023 at 07:29:57
  • Link to this post: https://go.coder-hub.com/75356234.html