Deleting columns from a CSV file using an if statement

Question

(I am still really new to Python/coding.) I am working with a program that spits out a text file with thousands of lines of data, which I need to plot. I need to keep the first three columns of data and remove the rest. When the simulation finishes running, sometimes the program gives me three columns and sometimes it gives me up to 10 columns. I have been trying to figure out how to remove the columns I do not need. I have tried a for loop, and I have thought about a while loop, but I do not know what to do.

This is what I have for the while loop:

fin = "file location"
fout = "file location"

#delete unused columns
while cols_to_remove in fin >3:
    cols_to_remove = [3, 4, 5] # Column indexes to be removed (starts at 0)
    cols_to_remove = sorted(cols_to_remove, reverse=True) # Reverse so we remove from the end first
    row_count = 0 # Current amount of rows processed
    with open(fin, "r") as source:
        reader = csv.reader(source)
        with open(fout, "w", newline='') as result:
            writer = csv.writer(result)
            for row in reader:
                row_count += 1
                for col_index in cols_to_remove:
                    del row[col_index]
                writer.writerow(row)

The if loop I tried was:

for cols_to_remove in fin == 10:
    cols_to_remove = [3, 4, 5, 6, 7, 8, 9] # Column indexes to be removed (starts at 0)
    cols_to_remove = sorted(cols_to_remove, reverse=True) # Reverse so we remove from the end first
    row_count = 0 # Current amount of rows processed
    reader = csv.reader(source)
    with open(fout, "w", newline='') as result:
        writer = csv.writer(result)
        for row in reader:
            row_count += 1
            for col_index in cols_to_remove:
                del row[col_index]
            writer.writerow(row)
elif cols_to_remove in fin == 9:
    cols_to_remove = [3, 4, 5, 6, 7, 8] # Column indexes to be removed (starts at 0)
    cols_to_remove = sorted(cols_to_remove, reverse=True) # Reverse so we remove from the end first
    row_count = 0 # Current amount of rows processed
    reader = csv.reader(source)
    with open(fout, "w", newline='') as result:
        writer = csv.writer(result)
        for row in reader:
            row_count += 1
            for col_index in cols_to_remove:
                del row[col_index]
            writer.writerow(row)
else:
    break

I am not sure if I am on the right track. If I use the section of code that starts with cols_to_remove and change it to [3, 4] when there are five columns it works fine.

This is what I start with from the Text file (first five lines of data set):

X [m],Y [m],Z [m],Task #,Pulse #,Pixel X,Pixel Y,Pixel #,Return,Intensity
4.630,-5.078,16.517,0,0,0,30,960,0,0.211
4.937,-4.779,13.969,0,0,2,32,1026,0,0.106
4.630,-4.623,16.366,0,0,0,33,1056,0,0.205
4.937,-4.626,14.418,0,0,2,33,1058,0,0.296
5.087,-4.626,14.868,0,0,3,33,1059,0,0.109

This is what I want to end up with (in a CSV file):

X [m],Y [m],Z [m]
4.630,-5.078,16.517
4.937,-4.779,13.969
4.630,-4.623,16.366
4.937,-4.626,14.418
5.087,-4.626,14.868

Answer 1

Score: 2

I think you are approaching this the wrong way around: why read columns from the CSV if you are just going to delete them later? Instead, read only the columns you need, like so:

import csv

new_list = []
with open('example.csv', newline='') as f:
    # Create the reader object
    reader_obj = csv.reader(f)

    # Add next(reader_obj) here to skip the header row

    # Iterate over each row in the CSV file using the reader object
    for row in reader_obj:
        new_list.append([row[0], row[1], row[2]])  # keep the first 3 elements

print(new_list)

A simpler way:

with open('example.csv') as f:
    new_list = [[row[0], row[1], row[2]] for row in csv.reader(f)]

print(new_list)

Note that both of these methods include the header row. To remove it, add the next() call where I put the comment in the first snippet, or simply pop the first element of new_list afterwards:

new_list.pop(0) #Pop (remove) the first element of the list
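For reference, here is a minimal sketch of the next() variant. It assumes the data sits in an in-memory string (SAMPLE is a made-up stand-in for example.csv, built from the first rows in the question), so treat it as an illustration rather than a drop-in script:

```python
import csv
import io

# Hypothetical stand-in for example.csv, copied from the question's sample data.
SAMPLE = """X [m],Y [m],Z [m],Task #,Pulse #,Pixel X,Pixel Y,Pixel #,Return,Intensity
4.630,-5.078,16.517,0,0,0,30,960,0,0.211
4.937,-4.779,13.969,0,0,2,32,1026,0,0.106
"""

reader_obj = csv.reader(io.StringIO(SAMPLE))

# next() consumes the header row, so the loop below only sees data rows.
header = next(reader_obj)[:3]
new_list = [row[:3] for row in reader_obj]

print(header)
print(new_list)
```

Keeping the header in its own variable means it can still be written back out later if the plot or output file needs column names.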

Edit:
You can also use pandas, which is much simpler, to be honest:

import pandas as pd
df = pd.read_csv('example.csv', usecols=[0, 1, 2], header=0)

Answer 2

Score: 0

@Michael_Butscher's advice to use a slice of the row seems like the easiest approach:

for row in reader:
    writer.writerow(row[:3])

That will include the fields of a row starting at index 0 and going up to (but not including) index 3, or, "the first three fields". When I run that on your input of 10 columns I get:

X [m],Y [m],Z [m]
...
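Since the question mentions that the output sometimes has three columns and sometimes up to ten, here is a small sketch showing that the same slice handles both cases without any if statement. The helper name keep_first_three and the in-memory strings are assumptions made for the demo, not part of the original code:

```python
import csv
import io

def keep_first_three(text):
    """Return CSV text containing only the first three fields of each row."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for row in csv.reader(io.StringIO(text)):
        # A slice never raises, even when the row has three or fewer fields.
        writer.writerow(row[:3])
    return out.getvalue()

# Sample rows taken from the question: one 10-column input, one 3-column input.
ten_cols = (
    "X [m],Y [m],Z [m],Task #,Pulse #,Pixel X,Pixel Y,Pixel #,Return,Intensity\n"
    "4.630,-5.078,16.517,0,0,0,30,960,0,0.211\n"
)
three_cols = "X [m],Y [m],Z [m]\n4.630,-5.078,16.517\n"

print(keep_first_three(ten_cols))
print(keep_first_three(three_cols))
```

Both calls give the same three-column text, which is why no per-file column count check is needed.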

In the future, if you do need to delete certain fields from a row, the approach you showed of looping over indices backwards and deleting the element at that index should work:

for row in reader:
    for i in [9, 7, 2]:
        del row[i]
    writer.writerow(row)

When I run that on your input of 10 columns I get:

X [m],Y [m],Task #,Pulse #,Pixel X,Pixel Y,Return
...
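To see why the indices have to be processed from the end, here is a short sketch on the header row from the question. The helper drop_columns is a hypothetical name used only for this illustration:

```python
header = ["X [m]", "Y [m]", "Z [m]", "Task #", "Pulse #",
          "Pixel X", "Pixel Y", "Pixel #", "Return", "Intensity"]

def drop_columns(row, indices):
    """Return a copy of row without the fields at the given indices."""
    row = list(row)
    # Highest index first, so earlier deletions do not shift later targets.
    for i in sorted(indices, reverse=True):
        del row[i]
    return row

backwards = drop_columns(header, [2, 7, 9])

# Ascending order goes wrong: after index 2 is deleted, everything to its
# right moves one place left, so index 7 now points at "Return", not "Pixel #".
forwards = list(header)
for i in [2, 7]:
    del forwards[i]

print(backwards)
print(forwards)
```

In the ascending version, going on to delete index 9 would raise an IndexError, because only eight fields are left by then.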

I think the bigger problem with the code you showed is the "outer logic" of the while-loop. As it is written in your question:

while cols_to_remove in fin >3:
    ...

should produce an error because:

  • cols_to_remove is defined inside the while-loop, so it cannot be used in the loop's condition
  • even if cols_to_remove were declared before (outside) the loop, it'd still error because you cannot ask, "is this list of integers in the string of my filename?"
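Both failures can be reproduced in a few lines. This is only a sketch that catches the exceptions so they can be printed; the variable names are the ones from the question, and the errors list is added for the demo:

```python
fin = "file location"
errors = []

# Case 1: the name is used in the condition before it has been assigned.
try:
    cols_to_remove in fin > 3
except NameError as exc:
    errors.append(exc)

# Case 2: the name exists, but a list cannot be searched for inside a string.
# Python reads the chain as (cols_to_remove in fin) and (fin > 3),
# and the first half already fails.
cols_to_remove = [3, 4, 5]
try:
    cols_to_remove in fin > 3
except TypeError as exc:
    errors.append(exc)

for exc in errors:
    print(type(exc).__name__, exc)
```

Note also that fin is just the file name, so neither check ever looks at the file's contents; the column count has to come from the rows the reader yields.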

I think what you'd want in that case is:

for row in reader:
    if len(row) > 3:
        del row[3:]  # delete some columns (here: everything after the third)

Here's the small, complete program to keep just the first 3 fields of each row. I'm not using with open(...) to keep it looking neater:

import csv

f_in = open("input.csv", newline="")
f_out = open("output.csv", "w", newline="")

reader = csv.reader(f_in)
writer = csv.writer(f_out)

for row in reader:
    writer.writerow(row[:3])

I'm also not closing the files myself because Python will do that when it exits.

You can always do:

with open("input.csv", newline="") as f_in, open("output.csv", "w", newline="") as f_out:
    ...

and if you're using Python 3.10+ you can group them with parentheses and have them on multiple lines:

with (
    open("input.csv", newline="") as f_in,
    open("output.csv", "w", newline="") as f_out,
):
    ...
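Putting the pieces together, here is a sketch of the with-based version wrapped in a function. The function name keep_first_columns, the parameter n, and the temporary demo files are assumptions for illustration; swap in your real input and output paths:

```python
import csv
import os
import tempfile

def keep_first_columns(src_path, dst_path, n=3):
    """Copy src_path to dst_path, keeping only the first n fields of each row."""
    with open(src_path, newline="") as f_in, open(dst_path, "w", newline="") as f_out:
        writer = csv.writer(f_out)
        for row in csv.reader(f_in):
            writer.writerow(row[:n])

# Demo on throwaway files so nothing real is overwritten.
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "input.csv")
    dst = os.path.join(tmp, "output.csv")
    with open(src, "w", newline="") as f:
        f.write("X [m],Y [m],Z [m],Task #\n4.630,-5.078,16.517,0\n")

    keep_first_columns(src, dst)

    with open(dst, newline="") as f:
        result = f.read()

print(result)
```

Using with here means both files are closed as soon as the block ends, which matters if the output is read again (for plotting, say) in the same script.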

huangapple
  • Published on March 10, 2023 at 00:42:34
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75687581.html