Looping through n number of CSV files and deleting columns in python

Question

I have a program to which I need to add a feature: stripping out the second column of each Event CSV file it processes. I've tried the solutions at this thread, but I haven't been able to get any of them to work.

My CSV files look like this:

Time/Date,Event #,Event Desc
05/19/2020 20:12:30,29,Advance Drive ON
05/19/2020 20:32:23,29,Advance Drive ON
05/19/2020 20:35:13,29,Advance Drive ON
05/19/2020 20:39:50,37,Discharge 1 Plug Chute Fault
05/19/2020 20:47:40,68,LMI is in OFF Mode

And here is my function:

import csv
import logging

# A function to clean the Event Files of raw data
def CleanEventFiles(EF_files, eventHeader, EFmachineID):
    logging.debug('Cleaning Event files...')                        # write to program logger
    for f in EF_files:                                              # for all files in Event files
        IsFileReadOnly(f)                                           # check to see if the file is READ ONLY
        print(f'\nCleaning file: {f}')                              # tell user which file is being cleaned
        print('\tReplacing new MachineIDs & File Headers...')       # print stuff to the user
        logging.debug(f'\tReplacing headers for file {f}')          # write to program logger
        with open(f, newline='', encoding='latin-1') as g:          # open file as read
            r = csv.reader((line.replace('\0', '') for line in g))  # declare read variable while removing NULLs
            next(r)                                                 # remove old machineID
            data = list(r)                                          # set list to all data in file (right-hand side lost in extraction; list(r) assumed)
            data[0] = eventHeader                                   # replace first line with new header
            data.insert(0, EFmachineID)                             # add line before header for machine ID
        WriteData(f, data)                                          # write data to the file

I know it's got to be something as simple as putting del r[1] into a loop somewhere, but for the life of me I can't seem to figure it out. The best I seem to get is to remove the Event # header on each file, but the data in data[1] remains after the file processes.

What would be the best way to go about removing the second column of data from these files?
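
One way to read the del r[1] idea: csv.reader returns an iterator that yields one list per row, so there is nothing to delete on r itself, and the deletion has to be applied to each row. Below is a minimal sketch of that, run against an in-memory sample rather than the real files. drop_column is a hypothetical helper name and is not part of the original program.

```python
import csv
import io

def drop_column(rows, index):
    """Return new rows with the column at `index` removed.

    Hypothetical helper for illustration only. Slicing is used instead of
    `del row[index]` so that short or empty rows do not raise IndexError.
    """
    return [row[:index] + row[index + 1:] for row in rows]

sample = (
    "Time/Date,Event #,Event Desc\n"
    "05/19/2020 20:12:30,29,Advance Drive ON\n"
    "05/19/2020 20:39:50,37,Discharge 1 Plug Chute Fault\n"
)

reader = csv.reader(io.StringIO(sample))
data = drop_column(list(reader), 1)  # index 1 is the second column
for row in data:
    print(row)
```

In CleanEventFiles, the same slice could probably be applied at the point where data is built, before data[0] is replaced. That assumes eventHeader already omits the Event # column.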

Answer 1

Score: 0

If you can read all the rows into a list via csv.DictReader, the solution is fairly straightforward. Note that this answer handles one file at a time, so you will want to call it once per file. Note as well that it is currently destructive: the original file is overwritten.

import csv

def clean_event_file(filename, column_to_remove):

    ##--------------------
    ## Read in all the rows at once.
    ## DictReader consumes the header line itself and exposes
    ## the column names as reader.fieldnames, even when the
    ## file has no data rows.
    ##--------------------
    with open(filename, "r", newline="") as file_in:
        reader = csv.DictReader(file_in)
        rows = list(reader)
        headers = [col for col in (reader.fieldnames or []) if col != column_to_remove]
    ##--------------------

    ##--------------------
    ## Write out the results again absent the given header.
    ## extrasaction="ignore" makes DictWriter skip the removed key.
    ##--------------------
    with open(filename, "w", newline="") as file_out:
        writer = csv.DictWriter(file_out, fieldnames=headers, extrasaction="ignore")
        writer.writeheader()
        writer.writerows(rows)
    ##--------------------

clean_event_file("in.csv", "Event #")
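
Since the function above handles a single file and overwrites it, a caller would loop over the files and might prefer to write each result to a separate copy first. Here is a rough sketch under those assumptions. remove_column_to_copy, clean_all and the "_clean" suffix are hypothetical names. The latin-1 encoding and the NUL stripping are borrowed from the question's code. The sketch also assumes the header is the first line of each file, which does not hold for files that still carry a machine-ID line above the header.

```python
import csv
from pathlib import Path

def remove_column_to_copy(src, column_to_remove, suffix="_clean"):
    """Write a copy of `src` without `column_to_remove`; return the new path.

    Hypothetical helper: the original file is left untouched.
    """
    src = Path(src)
    dst = src.with_name(f"{src.stem}{suffix}{src.suffix}")
    with open(src, newline="", encoding="latin-1") as file_in:
        # Strip NUL characters before parsing, as the question's code does.
        reader = csv.DictReader(line.replace("\0", "") for line in file_in)
        rows = list(reader)
        headers = [c for c in (reader.fieldnames or []) if c != column_to_remove]
    with open(dst, "w", newline="", encoding="latin-1") as file_out:
        writer = csv.DictWriter(file_out, fieldnames=headers, extrasaction="ignore")
        writer.writeheader()
        writer.writerows(rows)
    return dst

def clean_all(filenames, column_to_remove):
    """Call the helper once per file and collect the output paths."""
    return [remove_column_to_copy(name, column_to_remove) for name in filenames]
```

A call such as clean_all(["in.csv"], "Event #") would then leave in.csv as it was and write in_clean.csv next to it. Once the output looks right, the copy can replace the original.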

huangapple
  • Posted on February 24, 2023 at 00:37:46
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75547739.html