July 18, 2023, 01:55:56

When using pd.read_csv, is there a way to exclude certain rows based on their contents when identifying the header?

Question

I am trying to use pd.read_csv to open and modify many different .dat files. An example of what these files look like is as follows:

  #Data file
  #Information on column 1
  #Information on column 2
  #Information on column 3
  col1 col2 col3
  data data data

Different .dat files share the same general format but may have different columns, and therefore a different number of initial rows describing the columns and the data they contain. This means I can't simply hardcode the header parameter when iterating over the files.

I have tried hardcoding header to the smallest row number I'd use, and that seems to put everything in the right order, but I want to exclude the information rows when the header is created. Is there a way to do this with the read function? I'd also like to keep the information rows at the top of the file where they are.

Answer 1

Score: 2

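You can try specifying the comment= parameter. Lines that begin with the comment character are skipped entirely, so the header is taken from the first non-comment line, however many information rows come before it: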

import pandas as pd

df = pd.read_csv('your_data.csv', sep=r'\s+', comment='#')
print(df)

Prints:

   col1  col2  col3
0  data  data  data

Contents of your_data.csv:

#Data file
#Information on column 1
#Information on column 2
#Information on column 3
col1 col2 col3
data data data
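
The question also asks to keep the information rows at the top of the file after modifying it. The answer above does not cover that part, so here is a minimal sketch of one way it might be done, assuming every information row starts with # and sits before the header. The helper names read_dat and write_dat and the file name example.dat are hypothetical, not part of pandas or the original answer.

```python
from pathlib import Path

import pandas as pd


def read_dat(path):
    """Hypothetical helper: return (info_lines, DataFrame) for one .dat file."""
    # Collect the leading '#' lines verbatim so they can be written back later.
    info_lines = []
    with open(path) as fh:
        for line in fh:
            if not line.lstrip().startswith("#"):
                break
            info_lines.append(line.rstrip("\n"))
    # comment='#' makes pandas skip those lines, so the header is the first
    # non-comment row regardless of how many information rows precede it.
    df = pd.read_csv(path, sep=r"\s+", comment="#")
    return info_lines, df


def write_dat(path, info_lines, df):
    """Hypothetical helper: write the information rows back, then the table."""
    # newline="" lets to_csv control the line endings of the table itself.
    with open(path, "w", newline="") as fh:
        for line in info_lines:
            fh.write(line + "\n")
        df.to_csv(fh, sep=" ", index=False)


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "example.dat"
        path.write_text("#Data file\n#Information on column 1\ncol1 col2\n1 2\n")
        info, df = read_dat(path)
        df["col2"] = df["col2"] * 10  # stand-in for whatever modification is needed
        write_dat(path, info, df)
        print(path.read_text())
```

Because the number of information rows is discovered per file, the same two calls can be used in a loop over many .dat files. Note that writing with sep=" " will quote any value that itself contains a space, so this sketch suits simple numeric or single-token data.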
