Python: Fast subsetting of pandas dataframe


Question

I have a large pandas dataframe with ~500k rows and 40 columns:

```
>>> data
        ColA  ColB        ColC  ...  ColX  ColY        ColZ
445828     A    10  2020-02-21  ...     6   nan  2019-08-13
445829     B    12  2020-02-21  ...     8   nan  2019-08-13
445830     C    13  2020-02-21  ...    10   nan  2019-08-13
445831     D    15  2020-02-21  ...    12   nan  2019-08-13
445832     E    17  2020-02-21  ...    15   nan  2019-08-13
```

I use this dataframe inside a class. One of the methods of this class is `get_property(self, A, B, C)`:

```python
def get_property(self, A, B, C):
    # Boolean-mask filter: each comparison scans the whole column
    data_subset = self.data[(self.data.ColA == A) &
                            (self.data.ColB == B) &
                            (self.data.ColC == C)]
    return data_subset
```
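A self-contained version of the method above, useful for reproducing the behaviour. The class name `PropertyStore` and the small sample frame are hypothetical stand-ins for the real ~500k-row, 40-column data; only the three lookup columns and one value column are modeled, and `ColC` is kept as a plain string for simplicity.

```python
import pandas as pd


class PropertyStore:
    """Hypothetical wrapper reproducing the question's setup."""

    def __init__(self, data: pd.DataFrame):
        self.data = data

    def get_property(self, A, B, C):
        # Each comparison builds a boolean Series over every row, so a single
        # call costs three full-column scans plus the final row selection.
        mask = ((self.data.ColA == A) &
                (self.data.ColB == B) &
                (self.data.ColC == C))
        return self.data[mask]


# Hypothetical sample rows modeled on the printed frame above.
sample = pd.DataFrame({
    "ColA": ["A", "B", "C", "D", "E"],
    "ColB": [10, 12, 13, 15, 17],
    "ColC": ["2020-02-21"] * 5,
    "ColX": [6, 8, 10, 12, 15],
})

store = PropertyStore(sample)
print(store.get_property("B", 12, "2020-02-21"))
```

Because the cost of each call grows with the number of rows, calling it hundreds of times on 500k rows adds up.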

I run this query hundreds of times, and it is relatively time-consuming. Is there a way to speed it up?

I have already used `data.set_index(['ColA', 'ColB', 'ColC'])`.
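One thing worth checking: `DataFrame.set_index` returns a new frame unless `inplace=True` is passed, so a call whose result is not assigned back leaves `data` unchanged. Also, even with the index installed, the boolean-mask filter in `get_property` compares column values and does not use the index; only label lookups such as `.loc` benefit from it. A minimal sketch with a hypothetical three-row frame:

```python
import pandas as pd

data = pd.DataFrame({
    "ColA": ["A", "B", "C"],
    "ColB": [10, 12, 13],
    "ColC": ["2020-02-21"] * 3,
})

# Result discarded: `data` still has its default RangeIndex afterwards.
data.set_index(["ColA", "ColB", "ColC"])
print(type(data.index).__name__)

# Assigning the result is what actually installs the MultiIndex.
# Without drop=False the three columns move into the index and are
# no longer available as columns (e.g. data.ColA would fail).
data = data.set_index(["ColA", "ColB", "ColC"])
print(type(data.index).__name__)
```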

Answer 1

Score: 0

Setting the index and looking rows up with `.loc` massively improved the performance of the query. I passed `drop=False` so the indexed columns are also kept as regular columns, for backward compatibility:

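```python
# set_index returns a new DataFrame, so assign the result back
data = data.set_index(['ColA', 'ColB', 'ColC'], drop=False)

# Label-based lookup on the MultiIndex instead of a boolean mask
data.loc[A, B, C]
```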
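A sketch of how the answer's approach might slot into the question's class, under a few assumptions not in the original: the class name `IndexedPropertyStore` and the sample data are hypothetical; the index is also sorted, since pandas can then locate keys in a sorted MultiIndex efficiently and does not emit its lexsort-depth `PerformanceWarning`; and lookups use a list containing one tuple so the result is always a DataFrame, like the original boolean-mask version (a bare tuple can return a Series when exactly one row matches).

```python
import pandas as pd


class IndexedPropertyStore:
    """Hypothetical class applying the answer's MultiIndex approach."""

    def __init__(self, data: pd.DataFrame):
        # Build and sort the MultiIndex once, up front. drop=False keeps
        # ColA/ColB/ColC as ordinary columns too, as in the answer.
        self.data = (data.set_index(["ColA", "ColB", "ColC"], drop=False)
                         .sort_index())

    def get_property(self, A, B, C):
        key = (A, B, C)
        if key not in self.data.index:
            # Mirror the boolean-mask version, which returns an empty frame
            # (.loc with a missing key in a list would raise KeyError).
            return self.data.iloc[0:0]
        # A one-element list of tuples always yields a DataFrame and
        # returns every row that carries this key.
        return self.data.loc[[key]]


# Hypothetical sample with a duplicated key ("B", 12, "2020-02-21").
sample = pd.DataFrame({
    "ColA": ["A", "B", "B", "C"],
    "ColB": [10, 12, 12, 13],
    "ColC": ["2020-02-21"] * 4,
    "ColX": [6, 8, 9, 10],
})

store = IndexedPropertyStore(sample)
print(store.get_property("B", 12, "2020-02-21"))
```

The indexing cost is paid once in `__init__`, which fits the question's pattern of building the frame once and querying it hundreds of times.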

  • Posted by huangapple on January 7, 2020 at 00:56:10
  • Please keep the link to this article when reposting: https://go.coder-hub.com/59616048.html