March 7, 2023, 23:18:06

How to read a CSV file in Python when the first line may or may not contain headers?

Question

I am trying to read a CSV file in Python using csv.DictReader. I need to handle both of these cases:

when headers are present in the first line:

col1,col2
foo,bar

when they are omitted:

foo,bar

I can assume that the headers are always col1,col2 if they are provided.

I tried using the fieldnames parameter, but then the headers are treated as values when they are present:

reader = csv.DictReader(csv_file, fieldnames=['col1', 'col2'])
print(list(reader))

Output:

[{'col1': 'col1', 'col2': 'col2'}, {'col1': 'foo', 'col2': 'bar'}]

instead of:

[{'col1': 'foo', 'col2': 'bar'}]

Using csv.DictReader without the fieldnames parameter works when headers are present, but returns an empty list when there are no headers, because the only line is then consumed as the header row.
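
For reference, both behaviours described above can be reproduced without any files on disk. This is only a minimal sketch: the `parse` helper and the in-memory strings are illustrative names, not part of the original code.

```python
import csv
import io

FIELDNAMES = ["col1", "col2"]
with_header = "col1,col2\nfoo,bar\n"
without_header = "foo,bar\n"


def parse(text, fieldnames=None):
    """Hypothetical helper: parse CSV text held in memory with DictReader."""
    return list(csv.DictReader(io.StringIO(text), fieldnames=fieldnames))


# Passing fieldnames: the header line comes back as an ordinary data row.
header_as_data = parse(with_header, FIELDNAMES)

# Omitting fieldnames: the single data line is used as the header,
# so no rows are left to return.
lost_row = parse(without_header)

print(header_as_data)
print(lost_row)
```

The first call returns two dictionaries (the header row plus the data row) and the second returns an empty list, which is the mismatch the question is about.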

Answer 1

Score: 1

You could read the first line and, if it does not match your expected headers, go back to the start of the file via seek().

Given "in1.csv"

col1,col2
foo,bar

Given "in2.csv"

foo,bar

Then

import csv

fieldnames = ['col1', 'col2']
for file_name in ["in1.csv", "in2.csv"]:
    # newline="" is what the csv module docs recommend for files passed to a reader
    with open(file_name, "r", newline="") as file_in:
        # Consume the first line; rewind only if it is data rather than the header
        if file_in.readline().strip() != ",".join(fieldnames):
            file_in.seek(0)
        results = list(csv.DictReader(file_in, fieldnames=fieldnames))
    print(results)

Should give you:

[{'col1': 'foo', 'col2': 'bar'}]
[{'col1': 'foo', 'col2': 'bar'}]
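
If the input cannot be rewound (a pipe or a network stream, for example), seek() is not available. One possible variation, sketched here under that assumption, parses the first row with csv.reader and re-attaches it only when it is data. The function name `read_records` is made up for illustration, and the sketch builds the dictionaries itself, so unlike DictReader it has no restkey/restval handling for rows of the wrong length.

```python
import csv
import io
from itertools import chain


def read_records(stream, fieldnames):
    """Return rows as dicts, dropping a leading header row if one is present."""
    reader = csv.reader(stream)
    first = next(reader, None)  # peek at the first parsed row
    if first is None:           # empty input: nothing to return
        return []
    # Put the first row back in front only when it is data, not the header.
    rows = reader if first == list(fieldnames) else chain([first], reader)
    # Skip blank lines, which csv.reader reports as empty lists.
    return [dict(zip(fieldnames, row)) for row in rows if row]


fields = ["col1", "col2"]
print(read_records(io.StringIO("col1,col2\nfoo,bar\n"), fields))
print(read_records(io.StringIO("foo,bar\n"), fields))
```

Comparing the parsed row rather than the raw line also means a quoted header such as "col1","col2" would still be recognised.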

Answer 2

Score: 1

You can use DictReader and give it the fieldnames you ultimately want (which handles the implicit no-header case), then handle the explicit case, where a header is present, by skipping it:

import csv
header_vals = ["col1", "col2"]
header_row = {x: x for x in header_vals}
for fname in ["input1.csv", "input2.csv"]:
    with open(fname, newline="") as f:
        reader = csv.DictReader(f, fieldnames=header_vals)
        print(f"{fname}:")
        for row in reader:
            if row == header_row:
                print("  skipped explicit header")
                continue
            print(f"  {row}")

For these two files:

input1.csv
==========
col1,col2
r1c1,r1c2
r2c1,r2c2

input2.csv
==========
r1c1,r1c2
r2c1,r2c2

that prints:

input1.csv:
  skipped explicit header
  {'col1': 'r1c1', 'col2': 'r1c2'}
  {'col1': 'r2c1', 'col2': 'r2c2'}
input2.csv:
  {'col1': 'r1c1', 'col2': 'r1c2'}
  {'col1': 'r2c1', 'col2': 'r2c2'}
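
One thing to keep in mind with the loop above is that it skips every row equal to the header, wherever it appears. If a later data row could legitimately contain col1,col2, a variation that only inspects the first row might be safer. The generator below is a sketch of that idea; `iter_records` and the sample data are hypothetical names, not part of the answer.

```python
import csv
import io


def iter_records(lines, fieldnames):
    """Yield dict rows, skipping a header only if it is the very first row."""
    reader = csv.DictReader(lines, fieldnames=fieldnames)
    header_row = {name: name for name in fieldnames}
    for index, row in enumerate(reader):
        if index == 0 and row == header_row:
            continue  # explicit header on the first line: drop it
        yield row


# The third line repeats the header text but is kept, because only row 0 is checked.
sample = io.StringIO("col1,col2\nr1c1,r1c2\ncol1,col2\n")
records = list(iter_records(sample, ["col1", "col2"]))
print(records)
```

Because it accepts any iterable of lines, the same generator works with an open file object (opened with newline="") as well as with in-memory text.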
