Pandas dataframe - what causes this error?

Question

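The error occurs because a where argument is passed to store.select while the data under that key is stored in Fixed format, and a Fixed-format store does not support partial selection with where. Note that the store.put call in the code below spells the keyword as foramt rather than format, so the 'table' setting may never have been applied. There are two ways to avoid the error:

  1. Do not pass where: if you want the whole dataset without filtering, call store.select('obj2') with no where argument.
store.select('obj2')
  2. Store the data in Table format: if you want to select with where, store the data in Table format rather than Fixed format by setting the format parameter to 'table' when writing.
store.put('obj2', frame, format='table')

After that, you can select with where, as the original code attempts.

With either approach the error goes away and you can select the data you need.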
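The difference between the two formats can be checked directly. The following is a minimal sketch, assuming pandas, NumPy, and PyTables are installed. The file name demo.h5 and the keys fixed_obj and table_obj are made up for illustration, and the is_table attribute on the object returned by get_storer is used here only as a convenient way to inspect how a key was written.

```python
import os
import tempfile

import numpy as np
import pandas as pd

frame = pd.DataFrame({"a": np.random.randn(100)})

# Write to a temporary file so the sketch does not touch existing data.
path = os.path.join(tempfile.mkdtemp(), "demo.h5")

with pd.HDFStore(path) as store:
    store.put("fixed_obj", frame)                  # no format given: Fixed format
    store.put("table_obj", frame, format="table")  # queryable Table format

    # Inspect how each key was actually written.
    fixed_is_table = store.get_storer("fixed_obj").is_table
    table_is_table = store.get_storer("table_obj").is_table

    # A where clause works on the Table-format key.
    subset = store.select("table_obj", where="index >= 10 and index <= 15")

    # The same query on the Fixed-format key raises TypeError.
    try:
        store.select("fixed_obj", where="index >= 10 and index <= 15")
        fixed_error = None
    except TypeError as exc:
        fixed_error = str(exc)

print(fixed_is_table, table_is_table)
print(subset)
print(fixed_error)
```

Checking the stored format before querying is one way to tell whether a key was written as intended.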

My code:

frame = pd.DataFrame({'a': np.random.randn(100)})
store = pd.HDFStore('mydata.h5')
store['obj1'] = frame
store['obj1_col'] = frame['a']
store.put('obj2',frame,foramt='table')
store.select('obj2',where=['index >= 10 and index <= 15'])

Gives this error message:

TypeError: cannot pass a where specification when reading from 
a Fixed format store. this store must be selected in its entirety

Why does this code give this error if every piece of code is right? How do I avoid similar errors in the future?

Answer 1

Score: 0

(I wanted to comment, but I can't yet due to reputation...)

Hello, this is interesting: for some reason it works on my machine. For completeness, here is the code I ran (with the imports added).

import pandas as pd
import numpy as np
frame = pd.DataFrame({'a': np.random.randn(100)})
store = pd.HDFStore('mydata.h5')
store['obj1'] = frame
store['obj1_col'] = frame['a']
store.put('obj2', frame, format='table')
store.select('obj2', where=['index >= 10 and index <= 15'])

Returns

           a
10 -0.049168
11  0.130048
12 -1.553641
13 -0.978392
14  0.723070
15  0.066814

Could you share the versions of the libraries you're using? We might be on different versions.
Here are mine:

import tables
import sys
print(sys.version) # 3.10.6 (main, Mar 10 2023, 10:55:28) [GCC 11.3.0]
print(pd.__version__) # 1.5.0
print(np.__version__) # 1.23.3
print(tables.__version__) # 3.8.0 (this one is a dependency)

To clarify: I suspect it might be connected to the pytables version, as mentioned in this possibly related answer.

Could you try upgrading pytables (e.g. with pip install --upgrade tables) and running the code again?
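One difference worth noting is that the code in the question spells the keyword as foramt, while the code above spells it format. Whether a misspelled keyword is rejected or silently ignored depends on the signature of the function being called. The following is a minimal sketch of that behavior using two hypothetical stand-in functions, put_lenient and put_strict. They are not pandas' implementation and only illustrate how a signature with **kwargs can swallow a typo.

```python
def put_lenient(key, value, format="fixed", **kwargs):
    """Hypothetical stand-in: unknown keywords are accepted and ignored."""
    return {"key": key, "format": format, "ignored": sorted(kwargs)}


def put_strict(key, value, format="fixed"):
    """Hypothetical stand-in: unknown keywords raise TypeError."""
    return {"key": key, "format": format}


# The misspelled keyword lands in **kwargs, so format keeps its default.
lenient_result = put_lenient("obj2", [1, 2, 3], foramt="table")

# Without **kwargs, Python itself rejects the misspelled keyword.
try:
    put_strict("obj2", [1, 2, 3], foramt="table")
    strict_error = None
except TypeError as exc:
    strict_error = str(exc)

print(lenient_result)
print(strict_error)
```

If the installed library version behaves like the lenient variant, the data would be written in the default format with no warning, which would be consistent with the error message in the question.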

Posted by huangapple on 2023-05-28 16:21:28
Source: https://go.coder-hub.com/76350562.html