Question

I have a large pandas DataFrame with ~500k rows and 40 columns.

>>> data
       ColA  ColB        ColC  ...  ColX  ColY        ColZ
445828    A    10  2020-02-21  ...     6   NaN  2019-08-13
445829    B    12  2020-02-21  ...     8   NaN  2019-08-13
445830    C    13  2020-02-21  ...    10   NaN  2019-08-13
445831    D    15  2020-02-21  ...    12   NaN  2019-08-13
445832    E    17  2020-02-21  ...    15   NaN  2019-08-13

I use this DataFrame inside a class. One of the methods of this class is `get_property(self, A, B, C)`.

def get_property(self, A, B, C):
    # Keep only the rows where all three key columns match.
    data_subset = self.data[(self.data.ColA == A) &
                            (self.data.ColB == B) &
                            (self.data.ColC == C)]
    return data_subset

I run this query hundreds of times, and it is relatively time-consuming. Is there a way to speed it up?

I have already tried `data.set_index(['ColA', 'ColB', 'ColC'])`.

Answer 1

Score: 0

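Setting the index massively improved the performance of the query. Passing drop=False keeps the three key columns in the DataFrame as regular columns, for backward compatibility with code that still reads them. Note that set_index returns a new DataFrame, so the result has to be assigned.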

> data = data.set_index(['ColA', 'ColB', 'ColC'], drop=False)
>
> data.loc[(A, B, C), :]
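
For anyone who wants to try this end to end, here is a minimal sketch of the indexed lookup. It rests on a few assumptions that are not in the original post: the small random frame is a made-up stand-in for the real data, `get_property` is written as a free function rather than a method, the index is sorted once up front (pandas can look up keys in a sorted MultiIndex more efficiently), and a missing key is mapped to an empty result, which is what the boolean mask would return.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the real 500k x 40 frame (all values are made up).
rng = np.random.default_rng(0)
n = 10_000
data = pd.DataFrame({
    "ColA": rng.choice(list("ABCDE"), size=n),
    "ColB": rng.integers(10, 20, size=n),
    "ColC": rng.choice(["2020-02-19", "2020-02-20", "2020-02-21"], size=n),
    "ColX": rng.integers(0, 100, size=n),
})

# Build the index once; drop=False keeps ColA/ColB/ColC as regular columns.
# Sorting is an assumption of this sketch, not part of the original answer.
indexed = data.set_index(["ColA", "ColB", "ColC"], drop=False).sort_index()


def get_property(indexed, A, B, C):
    """Return all rows whose (ColA, ColB, ColC) key equals (A, B, C)."""
    try:
        # A list holding one full key tuple always yields a DataFrame.
        return indexed.loc[[(A, B, C)]]
    except KeyError:
        # Unknown key: return an empty frame with the same columns.
        return indexed.iloc[0:0]


# The indexed lookup should select the same rows as the boolean mask.
mask_result = data[(data.ColA == "B") & (data.ColB == 12) & (data.ColC == "2020-02-21")]
index_result = get_property(indexed, "B", 12, "2020-02-21")
print(len(mask_result), len(index_result))
```

The one-off cost of `set_index` and `sort_index` only pays off when the same frame is queried many times, which is the situation described in the question.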
