How to read a CSV file in Python if the file may or may not contain headers in the first line?

Question

I am trying to read a CSV file in Python using csv.DictReader. I need to handle both cases:

  1. When headers are present in the first line:
col1,col2
foo,bar
  2. When they are omitted:
foo,bar

If headers are provided, I can assume they are always col1,col2.
I tried using the fieldnames parameter, but then the headers are treated as values when they are present:

reader = csv.DictReader(csv_file, fieldnames=['col1','col2'])
print(list(reader))

Output:

[{'col1': 'col1', 'col2': 'col2'}, {'col1': 'foo', 'col2': 'bar'}]

instead of:

[{'col1': 'foo', 'col2': 'bar'}]

Using csv.DictReader without the fieldnames parameter works when headers are present, but when there are no headers it consumes the first data row as the header, so for the second file it returns an empty list.
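
Both behaviours described above can be reproduced without touching the disk. The sketch below is only an illustration: it uses io.StringIO in place of real files, and the helper name read_rows is made up for this example.

```python
import csv
import io

WITH_HEADER = "col1,col2\nfoo,bar\n"
WITHOUT_HEADER = "foo,bar\n"


def read_rows(text, fieldnames=None):
    """Parse CSV text with csv.DictReader and return the rows as a list of dicts."""
    stream = io.StringIO(text, newline="")
    return list(csv.DictReader(stream, fieldnames=fieldnames))


# Explicit fieldnames: the header line comes back as an ordinary data row.
explicit_with_header = read_rows(WITH_HEADER, fieldnames=["col1", "col2"])

# No fieldnames: the only data row is consumed as the header, nothing is left.
implicit_without_header = read_rows(WITHOUT_HEADER)

print(explicit_with_header)
print(implicit_without_header)
```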

Answer 1

Score: 1

You could read the first line and, if it does not match your expected headers, go back to the start of the file via seek().

Given "in1.csv"

col1,col2
foo,bar

Given "in2.csv"

foo,bar

Then

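import csv

fieldnames = ['col1', 'col2']
for file_name in ["in1.csv", "in2.csv"]:
    # newline="" is the mode the csv module recommends for file objects
    with open(file_name, "r", newline="") as file_in:
        # Consume the first line; rewind only if it is not the expected header
        if file_in.readline().strip() != ",".join(fieldnames):
            file_in.seek(0)
        results = list(csv.DictReader(file_in, fieldnames=fieldnames))
    print(results)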

This should print:

[{'col1': 'foo', 'col2': 'bar'}]
[{'col1': 'foo', 'col2': 'bar'}]
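
One caveat with comparing the raw first line to ",".join(fieldnames): a header written as "col1","col2" (quoted) would not match. A possible variant, sketched here under the assumption that the stream is seekable and positioned at its start, parses the first row with csv.reader before comparing. The function name read_optional_header is hypothetical, and io.StringIO stands in for real files.

```python
import csv
import io


def read_optional_header(stream, fieldnames):
    """Return all data rows as dicts, skipping the first row if it equals fieldnames.

    Assumes `stream` is seekable and currently positioned at its start.
    """
    # Parse (rather than string-compare) the first row so quoting is handled.
    first_row = next(csv.reader(stream), None)
    if first_row != list(fieldnames):
        # Not a header (or an empty file): rewind so the row is read as data.
        stream.seek(0)
    return list(csv.DictReader(stream, fieldnames=fieldnames))


names = ["col1", "col2"]
rows_quoted = read_optional_header(io.StringIO('"col1","col2"\nfoo,bar\n', newline=""), names)
rows_plain = read_optional_header(io.StringIO("foo,bar\n", newline=""), names)

print(rows_quoted)
print(rows_plain)
```

With a real file, the same function can be called as read_optional_header(file_in, fieldnames) inside the with block, provided the file was opened in text mode with newline="".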

Answer 2

Score: 1

You can use DictReader and supply it with the fieldnames you ultimately want, which handles the implicit no-header case. Then handle the explicit case, where a header row is present and needs to be skipped:

import csv

header_vals = ["col1", "col2"]
# What a header line looks like after DictReader parses it as data
header_row = {x: x for x in header_vals}

for fname in ["input1.csv", "input2.csv"]:
    with open(fname, newline="") as f:
        reader = csv.DictReader(f, fieldnames=header_vals)

        print(f"{fname}:")
        for row in reader:
            # Any row that repeats the field names is treated as a header
            if row == header_row:
                print("  skipped explicit header")
                continue
            print(f"  {row}")

For these two files:

input1.csv
==========
col1,col2
r1c1,r1c2
r2c1,r2c2

input2.csv
==========
r1c1,r1c2
r2c1,r2c2

that prints:

input1.csv:
  skipped explicit header
  {'col1': 'r1c1', 'col2': 'r1c2'}
  {'col1': 'r2c1', 'col2': 'r2c2'}
input2.csv:
  {'col1': 'r1c1', 'col2': 'r1c2'}
  {'col1': 'r2c1', 'col2': 'r2c2'}
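
The loop above skips every row that repeats the field names, not just the first one. If a later data row could legitimately contain col1,col2, one option is to restrict the check to the first row. The generator below is a sketch of that idea, not part of the original answer: the name iter_rows is hypothetical, and io.StringIO replaces the input files. It never calls seek(), so it should also work on streams that cannot be rewound.

```python
import csv
import io


def iter_rows(stream, fieldnames):
    """Yield one dict per data row; drop the first row only if it repeats fieldnames."""
    header_row = dict(zip(fieldnames, fieldnames))
    reader = csv.DictReader(stream, fieldnames=fieldnames)
    for index, row in enumerate(reader):
        if index == 0 and row == header_row:
            continue  # explicit header: skip it
        yield row


names = ["col1", "col2"]

# Header present, and a later data row that happens to repeat the field names.
with_header = list(iter_rows(io.StringIO("col1,col2\nr1c1,r1c2\ncol1,col2\n", newline=""), names))

# Header omitted.
without_header = list(iter_rows(io.StringIO("r1c1,r1c2\nr2c1,r2c2\n", newline=""), names))

print(with_header)
print(without_header)
```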

huangapple
  • Posted on March 7, 2023 at 23:18:06
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75663837.html