Pandas DataFrames as class properties: how should I initialize them in the class constructor `__init__()`?

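Question

I have a class that manages multiple pandas DataFrames, which are exposed as properties of the class. In the constructor I initialize each one with an empty DataFrame, because the real data is not available when the instance is created and some of the DataFrames will later be built from other DataFrames of the same class, like this: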

```python
import pandas as pd


class MyClass:
    """
    Handle data cached in csv files
    """

    def __init__(self):
        """
        Initialize MyClass
        """

        self._parent_df = pd.DataFrame()
        self._child_df = pd.DataFrame()
        self.stored_data_df = pd.DataFrame()
        self.personnel_data_df = pd.DataFrame()
        self.salary_df = pd.DataFrame()
        self.settings = {}
        self._last_update = self._get_last_upd()  # helper defined elsewhere in the class (not shown)
        self._last_events_df = pd.DataFrame()

    @property
    def parent_df(self):
        if not self._parent_df.empty:
            return self._parent_df
        else:
            raise AttributeError()

    @parent_df.setter
    def parent_df(self, value: pd.DataFrame):
        self._parent_df = value

    # more properties getting and setting DataFrames

    # … and some methods working with data in multiple DataFrames
```

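What is the best practice for writing this class? Since initializing and assigning DataFrames are resource-heavy tasks, is this approach considered "Pythonic"? Should I avoid defining them in `__init__`, or assign `None` as the initial value instead of an empty DataFrame?
`self._parent_df = None`
Also, if anybody knows a good open-source package that has a class working like this, I'd be happy to look at it.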
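
To make the two options concrete, here is a minimal sketch of the `None` variant. The class name `FrameHolder` and the error message are hypothetical and not part of the original code. Using `None` as the "not set yet" marker lets the getter tell a DataFrame that was never assigned apart from one that was assigned but happens to be empty:

```python
from typing import Optional

import pandas as pd


class FrameHolder:
    """Hypothetical holder that uses None to mean 'not set yet'."""

    def __init__(self) -> None:
        self._parent_df: Optional[pd.DataFrame] = None

    @property
    def parent_df(self) -> pd.DataFrame:
        # Only an unset attribute is an error; an empty frame is valid data.
        if self._parent_df is None:
            raise AttributeError("parent_df has not been set yet")
        return self._parent_df

    @parent_df.setter
    def parent_df(self, value: pd.DataFrame) -> None:
        self._parent_df = value


holder = FrameHolder()
holder.parent_df = pd.DataFrame()  # assigned, but empty
print(holder.parent_df.empty)      # the getter returns the frame instead of raising
```

With the empty-DataFrame default from the question, the same assignment would still make the getter raise, because `.empty` cannot distinguish "no data yet" from "no rows".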




# Answer 1
**Score**: 1

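I believe this approach is quite expensive, because creating the object also runs every initialization up front (plenty of DataFrames, etc.), whether or not the data is ever used.

The usual remedy is called lazy initialization, in which the getter of a property is responsible for initializing the property itself if it has not been initialized yet.
Sample code: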

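```python
import pandas as pd


def expensive_initialization() -> pd.DataFrame:
    # Placeholder for whatever costly work builds the value.
    return pd.DataFrame({"a": [1, 2, 3]})


class MyClass:
    def __init__(self):
        self._value = None  # None marks "not initialized yet"

    @property
    def value(self):
        # Build the value on first access, then reuse the cached result.
        if self._value is None:
            self._value = expensive_initialization()
        return self._value


my_instance = MyClass()
```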

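The first time `my_instance.value` is accessed, `my_instance._value` is still `None`, so the getter calls `expensive_initialization()` (whatever that is for the property in question) and stores the result; later accesses return the stored value.
This way, each property is initialized only when it is first needed.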
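
Applied to the DataFrames from the question, the same pattern might look like the sketch below. The class name `LazyFrames`, the loader `_load_parent()`, the `load_count` counter, and the sample data are all assumptions made for illustration; a real loader would read the cached csv files instead.

```python
import pandas as pd


class LazyFrames:
    """Hypothetical class whose DataFrames are built on first access."""

    def __init__(self) -> None:
        self._parent_df = None
        self._child_df = None
        self.load_count = 0  # only here to show how often the loader runs

    def _load_parent(self) -> pd.DataFrame:
        # Stand-in for an expensive step such as reading a csv file.
        self.load_count += 1
        return pd.DataFrame({"id": [1, 2, 3], "salary": [10, 20, 30]})

    @property
    def parent_df(self) -> pd.DataFrame:
        if self._parent_df is None:
            self._parent_df = self._load_parent()
        return self._parent_df

    @property
    def child_df(self) -> pd.DataFrame:
        # Derived from parent_df, so accessing it loads the parent if needed.
        if self._child_df is None:
            self._child_df = self.parent_df[self.parent_df["salary"] > 10]
        return self._child_df


frames = LazyFrames()     # no DataFrame has been built yet
child = frames.child_df   # builds parent_df first, then child_df
again = frames.child_df   # served from the cached attribute
print(len(child), frames.load_count)
```

Because `child_df` goes through the `parent_df` property rather than `_parent_df`, the dependency between the two frames is resolved automatically in whatever order they are first requested.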

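As for packages, there is lazy_python. For further explanation, the article "How to Create Lazy Attributes to Improve Performance in Python" covers this approach in depth.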
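
Besides third-party packages, the standard library offers `functools.cached_property` (Python 3.8+), which gives the same compute-once behaviour without a hand-written `None` check. A minimal sketch, using a hypothetical `Report` class and made-up data:

```python
from functools import cached_property

import pandas as pd


class Report:
    """Hypothetical example class; not part of the original question."""

    @cached_property
    def salary_df(self) -> pd.DataFrame:
        # Runs once per instance; the result is stored in the instance's __dict__.
        return pd.DataFrame({"name": ["a", "b"], "salary": [100, 200]})


report = Report()
first = report.salary_df
second = report.salary_df
print(first is second)  # both accesses return the same cached object

# Deleting the attribute clears the cache, so the next access recomputes it.
del report.salary_df
```

Note that `cached_property` needs a writable instance `__dict__`, so it does not work on classes that define `__slots__` without including `__dict__`.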

I hope this helps!


huangapple
  • Published on June 22, 2023, 07:50:41
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76527812.html