2023-05-17 23:24:03

Pandas: read_csv with usecols that may not exist in the csv file

Question

I have several large CSV files with many columns, but they do not all have the same set of columns, resulting in a different number of columns in each file. I only want to read columns that contain a certain string, let's call it "abc-".

I created a list of possible column names that contain the string "abc-" and passed it as the usecols parameter of pd.read_csv. However, because some of the listed columns are missing from some files, it raised an error:

ValueError: Usecols do not match columns, columns expected but not found: ['abc-150']

What can I do?
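For reference, the situation described above can be reproduced with a small in-memory file. This is only a sketch: the CSV contents and the candidate list are made-up stand-ins for the real files, and only the column name abc-150 comes from the error message above.

```python
from io import StringIO

import pandas as pd

# Hypothetical stand-in for one of the CSV files; it has no "abc-150" column.
csv_text = "id,abc-100,abc-120,other\n1,2,3,4\n"

# Hypothetical candidate list that also names a column only some files contain.
candidates = ["abc-100", "abc-120", "abc-150"]

try:
    # A list passed to usecols must be fully present in the file.
    pd.read_csv(StringIO(csv_text), usecols=candidates)
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(error_message)
```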

Answer 1

Score: 1

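You can pass a callable to usecols. pandas evaluates it against each column name in the file and keeps the columns for which it returns True, so columns that do not exist are never requested:

import pandas as pd
from io import StringIO

s = """col1,abc,col3,abcde,fghabc"""

df = pd.read_csv(StringIO(s), usecols=lambda x: "abc" in x)

Output:

print(df.columns)

Index(['abc', 'abcde', 'fghabc'], dtype='object')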
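If you would rather keep the list of candidate names from the question, the same callable form can test membership in that list instead of a substring. The sketch below assumes made-up file contents and candidate names; only the "abc-" prefix and the column abc-150 come from the question.

```python
from io import StringIO

import pandas as pd

# Hypothetical file contents; "abc-150" is deliberately absent.
csv_text = "id,abc-100,abc-120,other\n1,2,3,4\n5,6,7,8\n"

# Hypothetical candidate names; a set gives fast membership checks.
candidates = {"abc-100", "abc-120", "abc-150"}

# The callable only sees names that exist in the file, so the missing
# "abc-150" does not cause an error.
df = pd.read_csv(StringIO(csv_text), usecols=lambda name: name in candidates)

# Alternative: match on the prefix instead of maintaining a list at all.
by_prefix = pd.read_csv(StringIO(csv_text), usecols=lambda name: name.startswith("abc-"))

print(list(df.columns))  # ['abc-100', 'abc-120']
```

Both calls keep the columns in the order they appear in the file.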

Answer 2

Score: 1

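Another possible solution is to read only the header first, filter the column names, and then pass the filtered names to usecols:

import pandas as pd
from io import StringIO

data = """
col1, col2, abc-col, abc-col2
val1, val2, val3, val4
val5, val6, val7, val8
"""

# nrows=0 parses only the header row; skipinitialspace=True drops the
# spaces after the commas so the column names have no leading blanks.
columns = pd.read_csv(StringIO(data), nrows=0, skipinitialspace=True).columns

# Second pass: load only the columns whose name contains 'abc'.
df = pd.read_csv(StringIO(data), skipinitialspace=True,
                 usecols=[col for col in columns if 'abc' in col])
print(df.head())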
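Since the question involves several files with different column sets, the two-pass idea can be wrapped in a small helper and applied per file. This is a rough sketch, not a tested recipe for your data: the helper name read_matching_columns, the file names, and the file contents are all invented for illustration.

```python
import tempfile
from pathlib import Path

import pandas as pd


def read_matching_columns(path, marker="abc-"):
    """Hypothetical helper: load only the columns whose name contains `marker`."""
    # First pass: nrows=0 parses just the header, which is cheap for large files.
    header = pd.read_csv(path, nrows=0).columns
    wanted = [name for name in header if marker in name]
    # Second pass: every name in `wanted` is known to exist in this file.
    return pd.read_csv(path, usecols=wanted)


with tempfile.TemporaryDirectory() as tmp:
    # Two made-up files with different column sets.
    first = Path(tmp) / "first.csv"
    second = Path(tmp) / "second.csv"
    first.write_text("id,abc-100,abc-150\n1,2,3\n")
    second.write_text("id,abc-100,other\n4,5,6\n")

    frames = [read_matching_columns(p) for p in (first, second)]
    # concat aligns on column names; columns missing from a file become NaN.
    combined = pd.concat(frames, ignore_index=True)

print(combined)
```

Compared with the callable form of usecols, this reads each header twice, but it gives you the matching names as a list that you can inspect or log before loading the data.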
