TypeError: Index(...) must be called with a collection of some kind, 'Mcap' was passed while constructing a DataFrame

Question

I'm trying to winsorize a dataset, and I do it in several steps.

First, I need to winsorize a ratio rather than the raw values: each column divided by TotalAssets (a column in my dataset).

FirmMonthlyAccountingData[item].div(FirmMonthlyAccountingData['TotalAssets'])
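To make the ratio step concrete, here is a small sketch on hypothetical toy data (the three rows and their numbers are made up; only the column names come from the question). Dividing by TotalAssets and then multiplying back recovers the original column, which is the round trip the later steps rely on.

```python
import pandas as pd

# Hypothetical toy frame; only the column names come from the question
FirmMonthlyAccountingData = pd.DataFrame({
    "TotalAssets": [100.0, 200.0, 400.0],
    "Mcap": [50.0, 100.0, 100.0],
})

# Row-by-row division, aligned on the shared row index
ratio = FirmMonthlyAccountingData["Mcap"].div(FirmMonthlyAccountingData["TotalAssets"])
print(ratio.tolist())  # [0.5, 0.5, 0.25]

# Multiplying back by TotalAssets restores the original scale
restored = ratio * FirmMonthlyAccountingData["TotalAssets"]
print(restored.tolist())  # [50.0, 100.0, 100.0]
```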

Then I reuse the same expression (to keep memory usage down), extract the values as a NumPy array, and apply the winsorization, capping the top and bottom 5%.

winsorize(FirmMonthlyAccountingData[item].div(FirmMonthlyAccountingData['TotalAssets'], axis=0).values, limits=[0.05, 0.05])
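Winsorizing does not drop the extreme observations; it replaces them with the nearest value inside the limits, so the array keeps its length. A quick check on 20 hypothetical values, assuming winsorize here is scipy.stats.mstats.winsorize (the question does not show the import):

```python
import numpy as np
from scipy.stats.mstats import winsorize

values = np.arange(1.0, 21.0)  # 20 hypothetical ratios: 1.0, 2.0, ..., 20.0

# limits=[0.05, 0.05] caps the lowest 5% and the highest 5% (one value each here)
capped = np.asarray(winsorize(values, limits=[0.05, 0.05]))

print(len(capped))                 # 20, nothing removed
print(capped.min(), capped.max())  # 2.0 19.0
```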

Then I need to turn this back into a DataFrame. This is actually where the error happens.

pd.DataFrame(winsorize(FirmMonthlyAccountingData[item].div(FirmMonthlyAccountingData['TotalAssets'], axis=0).values, limits=[0.05, 0.05]), columns=item)
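The error can be reproduced without winsorize at all. As the traceback shows, pandas passes columns through ensure_index, and the Index constructor rejects a bare string such as 'Mcap' as a scalar, while a one-element list is accepted. A minimal sketch with a made-up array:

```python
import numpy as np
import pandas as pd

values = np.array([0.5, 0.5, 0.25])  # hypothetical stand-in for the winsorized ratios
item = "Mcap"

error_message = None
try:
    pd.DataFrame(values, columns=item)  # a string is a scalar, not a collection of labels
except TypeError as exc:
    error_message = str(exc)
print(error_message)

frame = pd.DataFrame(values, columns=[item])  # a list holding one label is accepted
print(frame.shape)  # (3, 1)
```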

Then I multiply the result by FirmMonthlyAccountingData['TotalAssets'] to get back to the original values.

Copy_of_firmmonthlydata[item] = pd.DataFrame(winsorize(FirmMonthlyAccountingData[item].div(FirmMonthlyAccountingData['TotalAssets'], axis=0).values, limits=[0.05, 0.05]), columns=item) * FirmMonthlyAccountingData['TotalAssets']
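Even with columns=[item], this multiplication would not work row by row: a DataFrame times a Series matches the Series index against the DataFrame's column labels, not its rows. A small sketch with hypothetical numbers; DataFrame.mul with axis=0 is one way to multiply along the rows instead.

```python
import pandas as pd

frame = pd.DataFrame({"Mcap": [0.5, 0.5, 0.25]})  # one-column frame, as after columns=[item]
total_assets = pd.Series([100.0, 200.0, 400.0])  # hypothetical TotalAssets column

# frame * series aligns the series index (0, 1, 2) with the column labels ('Mcap')
misaligned = frame * total_assets
print(misaligned.shape)               # (3, 4): 'Mcap' plus the labels 0, 1, 2
print(misaligned.isna().all().all())  # True

# axis=0 aligns the series with the rows instead
aligned = frame.mul(total_assets, axis=0)
print(aligned["Mcap"].tolist())  # [50.0, 100.0, 100.0]
```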

Finally, I need to do this for all the columns in a for loop, to save as much memory as possible.

columns_to_winsorize = ['Mcap', 'first', 'second', 'third']

for item in columns_to_winsorize:
    Copy_of_firmmonthlydata[item] = pd.DataFrame(winsorize(FirmMonthlyAccountingData[item].div(FirmMonthlyAccountingData['TotalAssets'], axis=0).values, limits=[0.05, 0.05]), columns=item) * FirmMonthlyAccountingData['TotalAssets']

But I get this error:

  TypeError                                 Traceback (most recent call last)
Cell In[27], line 10
      3 columns_to_winsorize= ['Mcap', 'first', 'second']
      9 for item in columns_to_winsorize:
---> 10     Copy_of_firmmonthlydata=pd.DataFrame(winsorize(FirmMonthlyAccountingData[f'{item}'].div(FirmMonthlyAccountingData['TotalAssets'], axis=0).values,limits=[0.05,0.05]),columns=item)*FirmMonthlyAccountingData['totalAssets']

File c:\Users\anaconda3\envs\PythonCourse2023\Lib\site-packages\pandas\core\frame.py:722, in DataFrame.__init__(self, data, index, columns, dtype, copy)
    720     # a masked array
    721     data = sanitize_masked_array(data)
--> 722     mgr = ndarray_to_mgr(
    723         data,
    724         index,
    725         columns,
    726         dtype=dtype,
    727         copy=copy,
    728         typ=manager,
    729     )
    731 elif isinstance(data, (np.ndarray, Series, Index, ExtensionArray)):
    732     if data.dtype.names:
    733         # i.e. numpy structured array

File c:\Users\anaconda3\envs\PythonCourse2023\Lib\site-packages\pandas\core\internals\construction.py:333, in ndarray_to_mgr(values, index, columns, dtype, copy, typ)
    324     values = sanitize_array(
    325         values,
    326         None,
   (...)
    329         allow_2d=True,
    330     )
    332 # _prep_ndarraylike ensures that values.ndim == 2 at this point
--> 333 index, columns = _get_axes(
    334     values.shape[0], values.shape[1], index=index, columns=columns
    335 )
    337 _check_values_indices_shape_match(values, index, columns)
    339 if typ == "array":

File c:\Users\anaconda3\envs\PythonCourse2023\Lib\site-packages\pandas\core\internals\construction.py:738, in _get_axes(N, K, index, columns)
    736     columns = default_index(K)
    737 else:
--> 738     columns = ensure_index(columns)
    739 return index, columns
...
   5066         f"{cls.__name__}(...) must be called with a collection of some "
   5067         f"kind, {repr(data)} was passed"
   5068     )

TypeError: Index(...) must be called with a collection of some kind, 'Mcap' was passed

Any help would be appreciated.

Answer 1

Score: 2

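A DataFrame is not necessary here. The error comes from columns=item: columns expects a collection of labels, but item is the single string 'Mcap'. Skip the DataFrame and also convert the TotalAssets column to a NumPy array: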

for item in columns_to_winsorize:
    Copy_of_firmmonthlydata[item] = winsorize(FirmMonthlyAccountingData[item].div(FirmMonthlyAccountingData['TotalAssets'], axis=0).values, limits=[0.05, 0.05]) * FirmMonthlyAccountingData['TotalAssets'].values
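An end-to-end sketch of the NumPy approach on hypothetical random data (the frame size, seed, and value ranges are invented, and winsorize is assumed to be scipy.stats.mstats.winsorize). Because both operands are plain arrays, the multiplication is positional and no index alignment can introduce NaN.

```python
import numpy as np
import pandas as pd
from scipy.stats.mstats import winsorize

# Hypothetical data: 40 firm-months with made-up values
rng = np.random.default_rng(0)
FirmMonthlyAccountingData = pd.DataFrame({
    "TotalAssets": rng.uniform(100, 1000, size=40),
    "Mcap": rng.uniform(10, 500, size=40),
    "first": rng.uniform(1, 50, size=40),
})
Copy_of_firmmonthlydata = FirmMonthlyAccountingData.copy()

columns_to_winsorize = ["Mcap", "first"]  # hypothetical subset of the question's list
total_assets = FirmMonthlyAccountingData["TotalAssets"].to_numpy()

for item in columns_to_winsorize:
    ratio = FirmMonthlyAccountingData[item].div(FirmMonthlyAccountingData["TotalAssets"]).to_numpy()
    # winsorize returns a masked array; np.asarray keeps just the data
    capped = np.asarray(winsorize(ratio, limits=[0.05, 0.05]))
    # array * array is positional, so rows cannot be misaligned
    Copy_of_firmmonthlydata[item] = capped * total_assets
```

With 40 rows, limits=[0.05, 0.05] caps two observations at each end of every ratio column.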

Or use a Series:

for item in columns_to_winsorize:
    Copy_of_firmmonthlydata[item] = pd.Series(winsorize(FirmMonthlyAccountingData[item].div(FirmMonthlyAccountingData['TotalAssets'], axis=0).values, limits=[0.05, 0.05])) * FirmMonthlyAccountingData['TotalAssets']
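One caveat with the Series variant, assuming the real frame may not have a default 0..n-1 index (for example after filtering rows): a Series built from a bare array gets a fresh 0..n-1 index, and the multiplication aligns on index labels. A small sketch with a hypothetical frame; passing the frame's own index to pd.Series avoids the mismatch.

```python
import pandas as pd

# Hypothetical frame whose index does not start at 0 (e.g. after filtering)
df = pd.DataFrame({"TotalAssets": [100.0, 200.0, 400.0]}, index=[10, 11, 12])
capped = [0.5, 0.5, 0.25]  # stand-in for the winsorized ratios

# No index given: labels 0, 1, 2 never match 10, 11, 12, so every row is NaN
misaligned = pd.Series(capped) * df["TotalAssets"]
print(misaligned.isna().all(), len(misaligned))  # True 6

# Reusing the frame's index lines the rows up again
aligned = pd.Series(capped, index=df.index) * df["TotalAssets"]
print(aligned.tolist())  # [50.0, 100.0, 100.0]
```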

huangapple
  • Posted on 2023-08-10 14:22:12
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76873078.html