pandas .drop(columns=[]) is returning KeyError when columns are in the csv and dataframe
Question
I'm trying to import market data from a csv to run some backtests.
I wrote the following code:
import pandas as pd
import numpy as np
df = pd.read_csv("30mindata.csv")
df = df.drop(columns=['Volume', 'NumberOfTrades', 'BidVolume', 'AskVolume'])
print(df)
I'm getting the error:
> KeyError: "['Volume', 'NumberOfTrades', 'BidVolume', 'AskVolume'] not found in axis"
When I remove the line of code containing drop(), the DataFrame prints as follows:
Date Time Open High Low Last Volume NumberOfTrades BidVolume AskVolume
0 2018/2/18 14:00:00 2734.50 2741.00 2734.00 2739.75 5304 2787 2299 3005
1 2018/2/18 14:30:00 2739.75 2741.00 2739.25 2740.25 1402 815 648 754
2 2018/2/18 15:00:00 2740.25 2743.50 2739.25 2742.00 4536 2301 2074 2462
3 2018/2/18 15:30:00 2742.25 2744.75 2742.25 2744.00 4102 1826 1949 2153
4 2018/2/18 16:00:00 2744.00 2744.25 2742.25 2742.25 2492 1113 1551 941
... ... ... ... ... ... ... ... ... ... ...
59074 2023/2/17 10:30:00 4076.25 4088.00 4076.00 4086.50 92507 54379 44917 47590
59075 2023/2/17 11:00:00 4086.50 4090.50 4079.25 4081.00 107233 67968 55784 51449
59076 2023/2/17 11:30:00 4081.00 4090.50 4079.50 4088.25 171507 92705 86022 85485
59077 2023/2/17 12:00:00 4088.00 4089.00 4085.25 4086.00 41032 17210 21176 19856
59078 2023/2/17 12:30:00 4086.25 4088.00 4085.25 4085.75 5164 2922 2818 2346
I have another file that uses this exact form of pd.read_csv() followed by df.drop(columns=[]), and it works just fine. I also tried df.loc[:, 'Volume'] and got the same KeyError, saying 'Volume' was not found in the axis. I really don't understand how the labels can be missing from the DataFrame when they print correctly without the .drop() call.
Answer 1
Score: 1
It's very likely that your column names contain leading or trailing spaces, which print(df) does not make visible.
Try stripping them like this:
import pandas as pd
df = pd.read_csv("30mindata.csv")
df.columns = [col.strip() for col in df.columns]
Then try dropping the columns as before.
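To check whether stray whitespace really is the cause, it can help to look at the raw column labels rather than the printed frame. The sketch below is only an illustration: it uses a small in-memory CSV (hypothetical data standing in for 30mindata.csv, with a space after each comma) and assumes the problem is whitespace around the header names. It shows the failure, the strip-based fix, and an alternative that uses the skipinitialspace option of pd.read_csv.

```python
import io
import pandas as pd

# Hypothetical sample standing in for 30mindata.csv; note the space after each comma.
raw = "Date, Time, Last, Volume\n2018/2/18, 14:00:00, 2739.75, 5304\n"

df = pd.read_csv(io.StringIO(raw))

# Printing the labels as a list shows their quotes, which exposes padding
# that print(df) hides.
print(df.columns.tolist())

# Dropping by the clean name fails because the actual label is ' Volume'.
try:
    df.drop(columns=["Volume"])
    drop_failed = False
except KeyError:
    drop_failed = True

# Fix 1: strip leading/trailing whitespace from the labels after loading.
df.columns = df.columns.str.strip()
cleaned = df.drop(columns=["Volume"])

# Fix 2: ask the parser to skip spaces that follow the delimiter.
df2 = pd.read_csv(io.StringIO(raw), skipinitialspace=True)
```

Here df.columns.str.strip() does the same job as the list comprehension above. If the labels turn out to be clean, the cause is something else (for example a different delimiter or an invisible character in the header) and this fix will not help.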