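Pandas DataFrames as class properties: how should I initialize them in the class constructor `__init__()`?
Question
I have a class that manages several pandas DataFrames. The DataFrames are properties of the class. In the constructor I assign an empty DataFrame to each of them, because the data is not available when the instance is created and some of the DataFrames will later be built from other DataFrames of the same class: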
```python
import pandas as pd


class MyClass:
    """
    Handle data cached in csv files
    """

    def __init__(self):
        """
        Initialize MyClass
        """
        self._parent_df = pd.DataFrame()
        self._child_df = pd.DataFrame()
        self.stored_data_df = pd.DataFrame()
        self.personnel_data_df = pd.DataFrame()
        self.salary_df = pd.DataFrame()
        self.settings = {}
        self._last_update = self._get_last_upd()  # helper not shown in the question
        self._last_events_df = pd.DataFrame()

    @property
    def parent_df(self):
        if not self._parent_df.empty:
            return self._parent_df
        else:
            raise AttributeError()

    @parent_df.setter
    def parent_df(self, value: pd.DataFrame):
        self._parent_df = value

    # more properties getting and setting DataFrames
    # … and some methods working with data in multiple DataFrames
```
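What is the best practice for writing this class? Since initializing and assigning DataFrames is resource-heavy, is this approach considered "Pythonic"? Should I avoid defining them in `__init__`, or assign `None` as the initial value instead of an empty DataFrame?
`self._parent_df = None`
Also, if anybody knows a good open-source package with a class that works like this, I would be happy to look at it.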
# Answer 1
**Score**: 1
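I believe this approach is fairly expensive, because constructing the object also performs every initialization up front (many DataFrames, and so on), whether or not the results are ever used.
The usual alternative is called lazy initialization, in which the getter of a property is responsible for initializing the value itself if it has not been initialized yet.
Sample code: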
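```python
def expensive_initialization():
    # Placeholder for whatever costly work builds the real value.
    return [1, 2, 3]


class MyClass:
    def __init__(self):
        self._value = None  # nothing is built yet

    @property
    def value(self):
        if self._value is None:
            self._value = expensive_initialization()  # runs on first access only
        return self._value


my_instance = MyClass()
```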
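The first time `my_instance.value` is accessed, the getter sees that `my_instance._value` is still `None` and calls `expensive_initialization()` (whatever that initialization is for the property in question). Later accesses return the stored value.
This way, each property is initialized only when it is first needed.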
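Applied to the class in the question, the same pattern might look like the sketch below. It is only an illustration under assumptions: `_load_parent`, the column names, and the `salary > 150` filter are hypothetical stand-ins for the real csv loading and derivation logic, which the question does not show.

```python
from typing import Optional

import pandas as pd


class MyClass:
    """Handle data cached in csv files, building each DataFrame on first use."""

    def __init__(self) -> None:
        # None means "not built yet"; no DataFrame is created here.
        self._parent_df: Optional[pd.DataFrame] = None
        self._child_df: Optional[pd.DataFrame] = None

    def _load_parent(self) -> pd.DataFrame:
        # Hypothetical stand-in for reading the cached csv file.
        return pd.DataFrame({"id": [1, 2, 3], "salary": [100, 200, 300]})

    @property
    def parent_df(self) -> pd.DataFrame:
        if self._parent_df is None:
            self._parent_df = self._load_parent()
        return self._parent_df

    @parent_df.setter
    def parent_df(self, value: pd.DataFrame) -> None:
        self._parent_df = value
        # Drop data derived from the old parent so it is rebuilt on demand.
        self._child_df = None

    @property
    def child_df(self) -> pd.DataFrame:
        if self._child_df is None:
            # Derived from another DataFrame of the same class.
            parent = self.parent_df
            self._child_df = parent[parent["salary"] > 150]
        return self._child_df


data = MyClass()
print(len(data.child_df))  # 2
```

Using `None` as the "not built yet" marker also avoids treating a legitimately empty DataFrame as missing, which a check on `.empty` cannot distinguish.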
As for packages, there is lazy_python. For a deeper explanation of this approach, see the article "How to Create Lazy Attributes to Improve Performance in Python".
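Without any third-party package, the standard library's `functools.cached_property` (Python 3.8+) gives much the same behaviour. The following is a minimal sketch, not a recommendation from the article above; `Report`, `rows`, and `build_count` are hypothetical names, and the list of dicts merely stands in for an expensive DataFrame.

```python
from functools import cached_property


class Report:
    """Hypothetical class; `rows` stands in for an expensive DataFrame."""

    def __init__(self) -> None:
        self.build_count = 0  # counts how often the expensive step runs

    @cached_property
    def rows(self) -> list:
        # Runs once per instance; the result is then stored on the instance.
        self.build_count += 1
        return [{"id": i, "salary": 100 * i} for i in range(1, 4)]


report = Report()
first = report.rows
second = report.rows  # served from the cache, no rebuild
print(report.build_count)  # 1
```

Deleting the attribute (`del report.rows`) clears the cached value, so the next access rebuilds it.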
I hope this helps!