2023-03-07 15:29:53

Compare the latest CSV with all CSVs in a directory, remove the matching rows from the latest, and write the new rows to a new file with Python

Question

The code does not work properly when the files have certain names.

For example, when the file is named carre123.csv, it won't compare correctly. But when I change the file name to test123.csv, it works fine.

Here is the code:

import os
import pandas as pd
# Set the directory where the CSV files are stored
directory = '/PATH/csv-files'
# Get a list of all the CSV files in the directory
csv_files = [os.path.join(directory, f) for f in os.listdir(directory) if f.endswith('.csv')]
#print(csv_files)
# Sort the CSV files by modification time and select the last file as the latest file
latest_file = sorted(csv_files, key=os.path.getmtime)[-1]
#print(latest_file)
# Read the contents of the latest CSV file into a pandas DataFrame
latest_data = pd.read_csv(latest_file)
#print(latest_data)
# Iterate over all the previous CSV files
for csv_file in csv_files[:-1]:
    # Read the contents of the previous CSV file into a pandas DataFrame
    prev_data = pd.read_csv(csv_file)
    #print(prev_data)
    # Identify the rows in the latest CSV file that match the rows in the previous CSV file
    matches = latest_data.isin(prev_data.to_dict('list')).all(axis=1)
    print(matches)
    # Remove the matching rows from the latest CSV file
    latest_data = latest_data[~matches]
# Write the remaining rows in the latest CSV file to a new file
latest_data.to_csv('/NEWPATH/diff.csv', index=False)

Answer 1

Score: 1

I think your code has a bug that may be causing the problem. The for loop iterates over csv_files[:-1], which is not sorted by modification time, so depending on the file names the loop may include latest_file (and skip one of the older files). Try storing the sorted list, sorted(csv_files, key=os.path.getmtime), then take the last element as latest_file and loop over the remaining files. There may be other problems too, but based on the example you provided, this is the only obvious one I can see.
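As a rough sketch of that suggestion: os.listdir returns entries in arbitrary order, so whether the newest file happens to sit at the end of the unsorted list can change with the file names. Sorting once and reusing the sorted list for both the latest file and the loop removes that dependency. The helper name split_latest is made up for illustration and is not part of the original code.

```python
import os


def split_latest(directory):
    """Return (latest_file, previous_files) for the CSV files in `directory`.

    Hypothetical helper: the name and return shape are assumptions made
    for this sketch, not something from the original script.
    """
    csv_files = [
        os.path.join(directory, name)
        for name in os.listdir(directory)
        if name.endswith(".csv")
    ]
    if not csv_files:
        raise FileNotFoundError(f"no CSV files found in {directory!r}")
    # Sort once by modification time (oldest first) and reuse this one
    # sorted list, so the last element is always the newest file and the
    # slice before it never contains that file.
    ordered = sorted(csv_files, key=os.path.getmtime)
    return ordered[-1], ordered[:-1]
```

In the script from the question, this would probably replace the latest_file line, with the loop becoming "for csv_file in previous_files:" after "latest_file, previous_files = split_latest(directory)". The row-matching logic itself is left unchanged here.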
