How to read a CSV file in Python if the file may, but does not have to, contain headers in the first line?
Question
I am trying to read a CSV file in Python using `csv.DictReader`. I need to handle both cases:

- when headers are present in the first line:

      col1,col2
      foo,bar

- when they are omitted:

      foo,bar

I can assume that the headers are always `col1,col2` if they are provided.

I tried using the `fieldnames` parameter, but then the headers are treated as values when they are present:

    reader = csv.DictReader(csv_file, fieldnames=['col1','col2'])
    print(list(reader))

Output:

    [{'col1': 'col1', 'col2': 'col2'}, {'col1': 'foo', 'col2': 'bar'}]

instead of:

    [{'col1': 'foo', 'col2': 'bar'}]

Using `csv.DictReader` without the `fieldnames` parameter works when headers are present but returns an empty list when there are no headers.
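
Both behaviours described above can be reproduced without any files on disk. This is only a minimal sketch: `io.StringIO` stands in for the opened CSV file, and the helper name `read` and the sample strings are assumptions made for illustration, not part of the original post.

```python
import csv
import io

FIELDNAMES = ["col1", "col2"]
with_header = "col1,col2\nfoo,bar\n"
without_header = "foo,bar\n"


def read(text, **kwargs):
    # io.StringIO stands in for an opened CSV file (illustrative helper).
    return list(csv.DictReader(io.StringIO(text), **kwargs))


# Explicit fieldnames: the header line comes back as a data row.
explicit_on_header = read(with_header, fieldnames=FIELDNAMES)

# No fieldnames: the only line is consumed as the header, so no rows remain.
implicit_on_headerless = read(without_header)

print(explicit_on_header)
print(implicit_on_headerless)
```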
Answer 1
Score: 1
You could read the first line and, if it does not match the expected headers, return to the start of the file with `seek()`.

Given "in1.csv":

    col1,col2
    foo,bar

and "in2.csv":

    foo,bar

then:

    import csv

    fieldnames = ['col1', 'col2']

    for file_name in ["in1.csv", "in2.csv"]:
        with open(file_name, "r", newline="") as file_in:
            # Consume the first line; rewind only if it is not the header.
            if file_in.readline().strip() != ",".join(fieldnames):
                file_in.seek(0)
            results = list(csv.DictReader(file_in, fieldnames=fieldnames))
        print(results)

should give you:

    [{'col1': 'foo', 'col2': 'bar'}]
    [{'col1': 'foo', 'col2': 'bar'}]
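
The same peek-and-rewind idea could be wrapped in a reusable function. The sketch below is a variation rather than the answer's exact code: it parses the first row with `csv.reader` instead of comparing raw text, so a quoted header such as `"col1","col2"` is recognised too. The names `read_rows` and `EXPECTED_HEADER` are hypothetical, and the file object is assumed to be seekable.

```python
import csv
import io

EXPECTED_HEADER = ["col1", "col2"]  # assumed fixed, as stated in the question


def read_rows(file_obj, fieldnames=EXPECTED_HEADER):
    """Return rows as dicts, skipping a leading header row if one is present.

    file_obj must be seekable (a regular file or io.StringIO).
    """
    start = file_obj.tell()
    # Parse the first row with csv.reader so that quoting is handled
    # the same way as for the data rows.
    first_row = next(csv.reader(file_obj), None)
    if first_row != list(fieldnames):
        # Not a header (or an empty file): rewind so the row is read as data.
        file_obj.seek(start)
    return list(csv.DictReader(file_obj, fieldnames=fieldnames))


print(read_rows(io.StringIO("col1,col2\nfoo,bar\n")))
print(read_rows(io.StringIO("foo,bar\n")))
```

With a real file this would be called as `read_rows(open(path, newline=""))` inside a `with` block; `io.StringIO` is used here only to keep the example self-contained.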
Answer 2
Score: 1
You can use `DictReader` and give it the field names you ultimately want (which handles the implicit no-header case), then handle the explicit case, where a header is present and has to be skipped:

    import csv

    header_vals = ["col1", "col2"]
    # What a header line looks like once DictReader has parsed it as data.
    header_row = {x: x for x in header_vals}

    for fname in ["input1.csv", "input2.csv"]:
        with open(fname, newline="") as f:
            reader = csv.DictReader(f, fieldnames=header_vals)
            print(f"{fname}:")
            for row in reader:
                if row == header_row:
                    print(" skipped explicit header")
                    continue
                print(f" {row}")

For these two files:

    input1.csv
    ==========
    col1,col2
    r1c1,r1c2
    r2c1,r2c2

    input2.csv
    ==========
    r1c1,r1c2
    r2c1,r2c2

that prints:

    input1.csv:
     skipped explicit header
     {'col1': 'r1c1', 'col2': 'r1c2'}
     {'col1': 'r2c1', 'col2': 'r2c2'}
    input2.csv:
     {'col1': 'r1c1', 'col2': 'r1c2'}
     {'col1': 'r2c1', 'col2': 'r2c2'}
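
The loop above compares every row with the header, so a data row that happens to read `col1,col2` would be dropped as well. If that matters, one possible variation is to test only the first row. The sketch below assumes this is the desired behaviour; the generator name `iter_rows` is hypothetical. Because it never rewinds, it should also work on streams that cannot seek.

```python
import csv
import io

HEADER_VALS = ["col1", "col2"]  # assumed fixed, as stated in the question


def iter_rows(file_obj, fieldnames=HEADER_VALS):
    """Yield dict rows, dropping the first row only if it repeats the field names."""
    header_row = {name: name for name in fieldnames}
    reader = csv.DictReader(file_obj, fieldnames=fieldnames)
    for index, row in enumerate(reader):
        if index == 0 and row == header_row:
            continue  # explicit header line
        yield row


sample = io.StringIO("col1,col2\nr1c1,r1c2\nr2c1,r2c2\n")
for row in iter_rows(sample):
    print(row)
```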