Python - Look up data in another sheet and copy the row to a new file if not found
Question
I am new to programming and trying to learn. I am comparing two documents that contain very similar data. I want to find out whether each value in the "concatenate" column of one document also appears in the "concatenate" column of the other document, because I want to see what changed during the last update of the file.
If a value cannot be found, the whole row should be copied to a new document. That way I know this dataset has changed.
Here is the code I have:
```python
import pandas as pd
# load the data from the two files into Pandas dataframes
df1 = pd.read_excel('/Users/bjoern/Desktop/PythonProjects/Comparison/MergedKeepa_2023-02-05.xlsx')
df2 = pd.read_excel('/Users/bjoern/Desktop/PythonProjects/Comparison/MergedKeepa_2023-02-04.xlsx')
# extract the values from column Concatenate in both dataframes
col_a_df1 = df1['concatenate']
col_a_df2 = df2['concatenate']
# find the intersection of the values in column A of both dataframes
intersection = col_a_df1.isin(col_a_df2)
# filter the rows of df1 where the value in column A is not found in df2
df1 = df1[intersection]
# write the filtered data to a new Excel file
df1.to_excel('/Users/bjoern/Desktop/PythonProjects/Comparison/filtered_data.xlsx', index=False)
```
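To see why identical inputs still produce rows, here is a minimal in-memory reproduction. It assumes small hand-made DataFrames in place of the two Excel files (the values and the `price` column are hypothetical), so it runs without any files on disk:

```python
import pandas as pd

# Hypothetical stand-ins for the two Excel files; both are identical,
# mirroring the "identical input files" test described in the question.
df1 = pd.DataFrame({"concatenate": ["A1", "B2", "C3"], "price": [10, 20, 30]})
df2 = df1.copy()

# isin() returns True for every row of df1 whose key IS present in df2.
intersection = df1["concatenate"].isin(df2["concatenate"])
print(intersection.tolist())  # [True, True, True]

# Indexing with the mask as-is keeps the matching rows, not the missing ones,
# so identical inputs produce a full copy instead of an empty sheet.
kept = df1[intersection]
print(len(kept))  # 3
```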
As a test, I made the two input files identical, which means I should get an empty document, but the script still copies data to the new sheet.
What did I do wrong?
Many thanks for your support!
Answer 1
Score: 0
> If the value cannot be found, this whole row should be copied to a new
> document.
IIUC, you need `~`, the NOT operator, to negate your boolean mask:
```python
df1 = df1[~intersection]
```
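The fix can be checked without the Excel files. Below is a minimal sketch that wraps the corrected filter in a small helper; `rows_missing_from`, the `price` column and the sample values are hypothetical, chosen only for illustration:

```python
import pandas as pd


def rows_missing_from(df_new: pd.DataFrame, df_old: pd.DataFrame,
                      key: str = "concatenate") -> pd.DataFrame:
    """Return the rows of df_new whose key value does not appear in df_old."""
    found = df_new[key].isin(df_old[key])
    # ~ flips the boolean mask: True now means "not found in df_old".
    return df_new[~found]


# Hypothetical sample data standing in for the two exported files.
old = pd.DataFrame({"concatenate": ["A1", "B2", "C3"], "price": [10, 20, 30]})

# Identical inputs -> nothing has changed -> empty result.
same = rows_missing_from(old.copy(), old)
print(same.empty)  # True

# One key changed in the newer file -> only that row is reported.
new = pd.DataFrame({"concatenate": ["A1", "B2", "C4"], "price": [10, 20, 35]})
changed = rows_missing_from(new, old)
print(changed["concatenate"].tolist())  # ['C4']

# In the original script the result would then be written out, e.g.
# changed.to_excel("filtered_data.xlsx", index=False)  # needs an Excel writer such as openpyxl
```

Swapping the arguments, `rows_missing_from(old, new)`, lists rows whose key disappeared from the newer file (here `C3`), which may also help when tracking what an update changed. A similar anti-join can be written with `DataFrame.merge(..., how="left", indicator=True)`, keeping the rows where `_merge == "left_only"`, though `~isin` is simpler for a single key column.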