Pandas: read_csv with usecols that may not exist in the csv file
Question
I have several large CSV files with many columns, but they do not all share the same set of columns, so the number of columns differs from file to file. I only want to read the columns whose names contain a certain string, call it "abc-".
I created a list of possible column names that contain the string "abc-" and passed it as the usecols parameter of pd.read_csv. However, this raised an error:
ValueError: Usecols do not match columns, columns expected but not found: ['abc-150']
What can I do?
Answer 1
Score: 1
You can pass a callable to usecols; it is evaluated against each column name in the file:
import pandas as pd
from io import StringIO
s = """col1,abc,col3,abcde,fghabc"""
df = pd.read_csv(StringIO(s), usecols=lambda x: "abc" in x)
Output:
print(df.columns)
Index(['abc', 'abcde', 'fghabc'], dtype='object')
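The same idea can be applied to the list of candidate names from the question. The following is a minimal sketch, assuming a hypothetical candidate set and a small in-memory CSV in which "abc-150" is absent; the callable only keeps names that actually occur in the file, so the missing candidate does not raise a ValueError.

```python
from io import StringIO

import pandas as pd

# Hypothetical candidate names; "abc-150" does not appear in the sample data.
candidates = {"abc-100", "abc-150", "abc-200"}

# Hypothetical stand-in for one of the CSV files.
csv_text = "id,abc-100,other,abc-200\n1,2,3,4\n5,6,7,8\n"

# The callable is evaluated against each column name present in the file,
# so candidates that are missing from the file are simply never matched.
df = pd.read_csv(StringIO(csv_text), usecols=lambda name: name in candidates)

print(list(df.columns))  # ['abc-100', 'abc-200']
```

Using a set for the candidates keeps the membership check cheap when the list of possible names is long.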
Answer 2
Score: 1
Another possible solution is to read the column names first, filter them, and then pass the filtered list to usecols, as follows:
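import pandas as pd
from io import StringIO
data = """
col1, col2, abc-col, abc-col2
val1, val2, val3, val4
val5, val6, val7, val8
"""
# Read only the header (nrows=0) to get the column names present in this file
columns = pd.read_csv(StringIO(data), nrows=0).columns
# Keep only the columns whose names contain 'abc'
# (names keep their leading space here because the sample uses ", " as separator)
df = pd.read_csv(StringIO(data), usecols=[col for col in columns if 'abc' in col])
df.head()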
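Since the question involves several files with different columns, the header-first approach can be wrapped in a small helper and applied per file. This is only a sketch under stated assumptions: read_abc_columns, the marker argument, and the in-memory sources are hypothetical names for illustration, and the final pd.concat step is an optional extra that the question does not ask for.

```python
from io import StringIO

import pandas as pd

# Hypothetical stand-ins for CSV files with different column sets.
sources = {
    "a.csv": "id,abc-100,abc-150\n1,10,15\n",
    "b.csv": "id,abc-100,other\n2,20,99\n",
}


def read_abc_columns(buffer, marker="abc-"):
    """Read only the columns whose names contain `marker` (hypothetical helper)."""
    # First pass: parse the header only to learn which columns exist.
    header = pd.read_csv(buffer, nrows=0).columns
    # Rewind the in-memory buffer before the second pass.
    buffer.seek(0)
    wanted = [col for col in header if marker in col]
    # Second pass: load just the matching columns.
    return pd.read_csv(buffer, usecols=wanted)


frames = [read_abc_columns(StringIO(text)) for text in sources.values()]

# Optional: stack the results; columns missing from a file become NaN.
combined = pd.concat(frames, ignore_index=True, sort=False)
print(list(combined.columns))  # ['abc-100', 'abc-150']
```

With real files, a path can be passed to pd.read_csv twice instead of rewinding a buffer.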