How can I add new attributes to a pandas.DataFrame derived class?

Question

I want to create a class derived from `pandas.DataFrame` with a slightly different `__init__()`. I'll store some additional data in new attributes and finally call `DataFrame.__init__()`.

```python
from pandas import DataFrame

class DataFrameDerived(DataFrame):
    def __init__(self, *args, **kwargs):
        self.derived = True
        super().__init__(*args, **kwargs)

DataFrameDerived({'a':[1,2,3]})
```

This code gives the following error when creating the new attribute (`self.derived = True`):

> RecursionError: maximum recursion depth exceeded while calling a Python object




# Answer 1
**Score**: 0

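It is *possible*, but the implementation isn't very open to extension. Indeed, the [official docs](https://pandas.pydata.org/docs/development/extending.html#subclassing-pandas-data-structures) suggest using alternatives. The implementation of `pd.DataFrame` is complex: it involves multiple inheritance with various mixins, and it uses the attribute hooks `__getattr__` and `__setattr__` to, among other things, provide syntactic sugar such as `df.some_column` and `df.some_column = whatever` in place of the `df['some_column']` syntax. If you look at the stack trace, you can see that *something* is going on with `__setattr__`: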

    RecursionError                            Traceback (most recent call last)
    Cell In[1], line 8
          5         self.derived = True
          6         super().__init__(*args, **kwargs)
    ----> 8 DataFrameDerived({'a':[1,2,3]})

    Cell In[1], line 5, in DataFrameDerived.__init__(self, *args, **kwargs)
          4 def __init__(self, *args, **kwargs):
    ----> 5     self.derived = True
          6     super().__init__(*args, **kwargs)

    File ~/miniconda3/envs/py311/lib/python3.11/site-packages/pandas/core/generic.py:6014, in NDFrame.__setattr__(self, name, value)
       6012 else:
       6013     try:
    -> 6014         existing = getattr(self, name)
       6015         if isinstance(existing, Index):
       6016             object.__setattr__(self, name, value)

    File ~/miniconda3/envs/py311/lib/python3.11/site-packages/pandas/core/generic.py:5986, in NDFrame.__getattr__(self, name)
       5976 """
       5977 After regular attribute access, try looking up the name
       5978 This allows simpler access to columns for interactive use.
       5979 """
       5980 # Note: obj.x will always call obj.__getattribute__('x') prior to
       5981 # calling obj.__getattr__('x').
       5982 if (
       5983     name not in self._internal_names_set
       5984     and name not in self._metadata
       5985     and name not in self._accessors
    -> 5986     and self._info_axis._can_hold_identifiers_and_holds_name(name)
       5987 ):
       5988     return self[name]
       5989 return object.__getattribute__(self, name)

Knowing this, one might *blindly* just use `object.__setattr__` instead, to bypass this:

    In [1]: from pandas import DataFrame
       ...:
       ...: class DataFrameDerived(DataFrame):
       ...:     def __init__(self, *args, **kwargs):
       ...:         object.__setattr__(self, 'derived', True)
       ...:         super().__init__(*args, **kwargs)
       ...:
       ...: DataFrameDerived({'a':[1,2,3]})
    Out[1]:
       a
    0  1
    1  2
    2  3


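But again, without really understanding the implementation, you are just crossing your fingers and hoping "it works". Which it may. But as noted in the linked docs, you will possibly also want to [override the "constructor" properties, so that your data frame type returns data frames of its own type when using DataFrame methods](https://pandas.pydata.org/docs/development/extending.html#override-constructor-properties).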
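
A minimal sketch of what that section of the docs describes, assuming the documented `_metadata` class attribute and `_constructor` property behave as stated there. Reusing the attribute name `derived` from the question and assigning it after `super().__init__()` are choices made for this illustration, not something the docs prescribe.

```python
import pandas as pd


class DataFrameDerived(pd.DataFrame):
    # Names listed in _metadata are handled as ordinary attributes
    # rather than as column lookups (documented extension point).
    _metadata = ["derived"]

    @property
    def _constructor(self):
        # Ask pandas to build results of DataFrame methods as this subclass.
        return DataFrameDerived

    def __init__(self, *args, **kwargs):
        # Let pandas set up its internal state before adding attributes
        # (ordering chosen for this sketch).
        super().__init__(*args, **kwargs)
        self.derived = True


df = DataFrameDerived({"a": [1, 2, 3]})
subset = df[df["a"] > 1]  # expected to stay a DataFrameDerived
```

Without the `_constructor` override, operations such as the boolean filter above would hand back a plain `pd.DataFrame`, which is the situation the answer warns about.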

Instead of using inheritance, [an alternative is to register custom accessor namespaces](https://pandas.pydata.org/docs/development/extending.html#registering-custom-accessors). This is a simpler way to extend pandas, if that works for you.
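
A rough sketch of the accessor approach, using `pd.api.extensions.register_dataframe_accessor` from the linked docs. The namespace name `extra`, the class `ExtraAccessor`, and its `column_count` method are hypothetical and exist only for this example.

```python
import pandas as pd


@pd.api.extensions.register_dataframe_accessor("extra")  # "extra" is a made-up name
class ExtraAccessor:
    def __init__(self, pandas_obj):
        # pandas passes in the DataFrame the accessor is attached to.
        self._obj = pandas_obj
        self.derived = True  # extra data lives on the accessor, not the frame

    def column_count(self):
        """Hypothetical helper: number of columns in the wrapped frame."""
        return len(self._obj.columns)


df = pd.DataFrame({"a": [1, 2, 3]})
flag = df.extra.derived
n_cols = df.extra.column_count()
```

The frame stays a plain `pd.DataFrame`, so none of the attribute hooks are involved. As far as I know the accessor is created per DataFrame object, so state stored on it should not be expected to follow copies or slices.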

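Without knowing more about what exactly you are trying to accomplish, it is difficult to suggest the best way forward. But you should definitely start by reading the whole of the docs I've linked to on [Extending pandas](https://pandas.pydata.org/docs/development/extending.html#extending-pandas).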




huangapple
  • Published on June 12, 2023, 11:10:55
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76453441.html