TypeError: Index(...) must be called with a collection of some kind, 'Mcap' was passed while constructing a DataFrame
Question
I'm trying to winsorize a dataset, and I do it in several steps.
First, I need the winsorization to be based on a ratio, which is computed relative to TotalAssets (a column in my dataset):
FirmMonthlyAccountingData[item].div(FirmMonthlyAccountingData['TotalAssets'])
Then I reuse the same expression (to use less memory), extract the values as a NumPy array, and apply the winsorization (on the top and bottom 5%):
winsorize(FirmMonthlyAccountingData[item].div(FirmMonthlyAccountingData['TotalAssets'], axis=0).values, limits=[0.05, 0.05])
Then I need to turn this back into a DataFrame. This is where the error actually happens:
pd.DataFrame(winsorize(FirmMonthlyAccountingData[item].div(FirmMonthlyAccountingData['TotalAssets'], axis=0).values, limits=[0.05, 0.05]), columns=item)
Then I multiply it by FirmMonthlyAccountingData['totalAssets'] so that I get back to the original scale:
Copy_of_firmmonthlydata[item] = pd.DataFrame(winsorize(FirmMonthlyAccountingData[item].div(FirmMonthlyAccountingData['TotalAssets'], axis=0).values, limits=[0.05, 0.05]), columns=item) * FirmMonthlyAccountingData['totalAssets']
Finally, I need to do this for all the columns with a for loop, in order to save as much memory as possible:
columns_to_winsorize = ['Mcap', 'first', 'second', 'third']
for item in columns_to_winsorize:
    Copy_of_firmmonthlydata[item] = pd.DataFrame(winsorize(FirmMonthlyAccountingData[item].div(FirmMonthlyAccountingData['TotalAssets'], axis=0).values, limits=[0.05, 0.05]), columns=item) * FirmMonthlyAccountingData['totalAssets']
But I get this error:
TypeError Traceback (most recent call last)
Cell In[27], line 10
3 columns_to_winsorize= ['Mcap', 'first', 'second']
9 for item in columns_to_winsorize:
---> 10 Copy_of_firmmonthlydata=pd.DataFrame(winsorize(FirmMonthlyAccountingData[f'{item}'].div(FirmMonthlyAccountingData['TotalAssets'], axis=0).values,limits=[0.05,0.05]),columns=item)*FirmMonthlyAccountingData['totalAssets']
File c:\Users\anaconda3\envs\PythonCourse2023\Lib\site-packages\pandas\core\frame.py:722, in DataFrame.__init__(self, data, index, columns, dtype, copy)
720 # a masked array
721 data = sanitize_masked_array(data)
--> 722 mgr = ndarray_to_mgr(
723 data,
724 index,
725 columns,
726 dtype=dtype,
727 copy=copy,
728 typ=manager,
729 )
731 elif isinstance(data, (np.ndarray, Series, Index, ExtensionArray)):
732 if data.dtype.names:
733 # i.e. numpy structured array
File c:\Users\anaconda3\envs\PythonCourse2023\Lib\site-packages\pandas\core\internals\construction.py:333, in ndarray_to_mgr(values, index, columns, dtype, copy, typ)
324 values = sanitize_array(
325 values,
326 None,
(...)
329 allow_2d=True,
330 )
332 # _prep_ndarraylike ensures that values.ndim == 2 at this point
--> 333 index, columns = _get_axes(
334 values.shape[0], values.shape[1], index=index, columns=columns
335 )
337 _check_values_indices_shape_match(values, index, columns)
339 if typ == "array":
File c:\Users\anaconda3\envs\PythonCourse2023\Lib\site-packages\pandas\core\internals\construction.py:738, in _get_axes(N, K, index, columns)
736 columns = default_index(K)
737 else:
--> 738 columns = ensure_index(columns)
739 return index, columns
...
5066 f"{cls.__name__}(...) must be called with a collection of some "
5067 f"kind, {repr(data)} was passed"
5068 )
TypeError: Index(...) must be called with a collection of some kind, 'Mcap' was passed
Any help would be appreciated.
Answer 1
Score: 2
A DataFrame is not necessary here. Also convert the TotalAssets column to a NumPy array:
for item in columns_to_winsorize:
    Copy_of_firmmonthlydata[item] = winsorize(FirmMonthlyAccountingData[item].div(FirmMonthlyAccountingData['TotalAssets'], axis=0).values, limits=[0.05, 0.05]) * FirmMonthlyAccountingData['TotalAssets'].values
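For illustration, here is a minimal self-contained sketch of the NumPy-array approach on toy data. The frame `firm_data` and its values are made up for the example and only stand in for FirmMonthlyAccountingData. Note that `winsorize` caps the extreme values at the nearest retained value rather than dropping rows.

```python
import numpy as np
import pandas as pd
from scipy.stats.mstats import winsorize

# Hypothetical toy data standing in for FirmMonthlyAccountingData.
firm_data = pd.DataFrame(
    {
        "TotalAssets": np.full(20, 10.0),
        "Mcap": np.arange(1, 21, dtype=float) * 10.0,
    }
)

winsorized = firm_data.copy()
total_assets = firm_data["TotalAssets"].to_numpy()

for item in ["Mcap"]:
    # Ratio of the column to total assets, as a plain NumPy array.
    ratio = firm_data[item].to_numpy() / total_assets
    # Cap the lowest and highest 5% of the ratios.
    clipped = winsorize(ratio, limits=[0.05, 0.05])
    # Scale back to the original units; np.asarray drops the mask wrapper.
    winsorized[item] = np.asarray(clipped) * total_assets
```

With 20 rows, a 5% limit on each side caps exactly one value at the bottom and one at the top, so the number of rows is unchanged.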
Or use a Series (passing the original index so that the multiplication aligns row by row):
for item in columns_to_winsorize:
    Copy_of_firmmonthlydata[item] = pd.Series(winsorize(FirmMonthlyAccountingData[item].div(FirmMonthlyAccountingData['TotalAssets'], axis=0).values, limits=[0.05, 0.05]), index=FirmMonthlyAccountingData.index) * FirmMonthlyAccountingData['TotalAssets']
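As for the TypeError itself, it most likely comes from `columns=item`: the `columns` argument of `pd.DataFrame` expects a list-like, and a bare string such as 'Mcap' is rejected. A small sketch with a made-up array (`values` is hypothetical) that reproduces the problem and shows the `[item]` form working:

```python
import numpy as np
import pandas as pd

values = np.array([1.0, 2.0, 3.0])  # hypothetical data for the demo

try:
    # A bare string is not a collection of column labels.
    pd.DataFrame(values, columns="Mcap")
    error_message = None
except TypeError as exc:
    error_message = str(exc)

# Wrapping the name in a list gives a one-column DataFrame.
fixed = pd.DataFrame(values, columns=["Mcap"])
```

So if a DataFrame is really wanted at that step, `columns=[item]` should avoid the error, although the array or Series variants above skip the intermediate DataFrame altogether.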