Pandas: read_csv with usecols that may not exist in the CSV file

Question

I have several large CSV files with many columns, but they do not all have the same set of columns, resulting in a different number of columns in each file. I only want to read columns that contain a certain string, let's call it "abc-".

I created a list of possible column names that contain the string "abc-" and passed it as the usecols parameter to pd.read_csv. However, because some of the listed columns are missing from a given file, it raised an error:

ValueError: Usecols do not match columns, columns expected but not found: ['abc-150']

What can I do?

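Answer 1

Score: 1

You can pass a callable to usecols. It is called with each column name found in the file, and only the columns for which it returns True are read:

from io import StringIO

import pandas as pd

# Header-only sample; a real file would also contain data rows.
s = """col1,abc,col3,abcde,fghabc"""

# Keep every column whose name contains "abc".
df = pd.read_csv(StringIO(s), usecols=lambda x: "abc" in x)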

Output:

print(df.columns)

Index(['abc', 'abcde', 'fghabc'], dtype='object')
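
If you already have a list of possible column names, as in the question, the same callable approach can test membership in that list instead of a substring. The following is a minimal sketch; the names in `candidates` and the sample data are made up for illustration, and 'abc-150' is deliberately absent from the data.

```python
from io import StringIO

import pandas as pd

# Hypothetical list of columns that may or may not be present in a given file.
candidates = {"abc-100", "abc-150", "abc-200"}

# Sample data standing in for one of the CSV files; it has no 'abc-150' column.
csv_text = "id,abc-100,other,abc-200\n1,10,x,20\n2,30,y,40\n"

# The callable is evaluated against the names that exist in the file,
# so a candidate that is missing from the file is never looked up.
df = pd.read_csv(StringIO(csv_text), usecols=lambda name: name in candidates)

print(list(df.columns))  # ['abc-100', 'abc-200']
```

Because the callable only filters names the parser actually sees, no ValueError is raised for the missing 'abc-150'.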

Answer 2

Score: 1

Another possible solution is to read the column names first, filter them, and then pass the filtered list to usecols:

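import pandas as pd
from io import StringIO

data = """
col1, col2, abc-col, abc-col2
val1, val2, val3, val4
val5, val6, val7, val8
"""
# First pass: nrows=0 parses only the header, which is enough to get the column names.
columns = pd.read_csv(StringIO(data), nrows=0).columns
# Second pass: read the data, restricted to the columns whose name contains 'abc'.
df = pd.read_csv(StringIO(data), usecols=[col for col in columns if 'abc' in col])
df.head()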
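
For several files on disk, the two-step read can be wrapped in a small helper. This is only a sketch under assumptions: `read_matching` is a hypothetical name, and the two temporary files stand in for the real CSVs with differing column sets.

```python
import tempfile
from pathlib import Path

import pandas as pd


def read_matching(path, marker="abc-"):
    """Read only the columns of `path` whose name contains `marker`."""
    # nrows=0 parses just the header, so no data rows are loaded in this pass.
    header = pd.read_csv(path, nrows=0).columns
    wanted = [col for col in header if marker in col]
    return pd.read_csv(path, usecols=wanted)


# Two small stand-in files that do not share the same columns.
with tempfile.TemporaryDirectory() as tmp:
    first = Path(tmp) / "first.csv"
    second = Path(tmp) / "second.csv"
    first.write_text("id,abc-100,abc-150\n1,10,15\n")
    second.write_text("id,abc-100,note\n2,20,hello\n")

    frames = {p.name: read_matching(p) for p in (first, second)}

for name, frame in frames.items():
    print(name, list(frame.columns))
```

Each file is opened twice, but the first pass touches only the header, so the extra cost should stay small even for large files.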

Posted by huangapple on May 17, 2023, 23:24:03
Source: https://go.coder-hub.com/76273725.html